dataio.dataio module¶

Module for DataIO class.

The metadata spec is documented as a JSON schema, stored under schema/.

read_metadata(filename)[source]¶

Read the metadata as a dictionary given a filename.

If the filename is e.g. /some/path/mymap.gri, the associated metadata file will be /some/path/.mymap.gri.yml (or possibly a JSON equivalent).

Parameters:: filename (str | Path) – The full path filename to the data-object.
Return type:: dict
Returns:: A dictionary with metadata read from the associated metadata file.
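
The naming rule above can be sketched with the standard library. This is only an illustration of the dot-prefix and .yml suffix convention; metadata_path_for is a hypothetical helper and not part of fmu-dataio, and PurePosixPath is used so the result does not depend on the operating system.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def metadata_path_for(data_file: str | PurePosixPath) -> PurePosixPath:
    """Return the hidden metadata path for a data file (hypothetical helper)."""
    data_file = PurePosixPath(data_file)
    # Keep the directory, prefix the file name with a dot and append ".yml".
    return data_file.with_name(f".{data_file.name}.yml")


print(metadata_path_for("/some/path/mymap.gri"))  # /some/path/.mymap.gri.yml
```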

class ExportData[source]¶

Bases: object

This class provides context for the metadata generated when data is exported.

Here is a complete example of how it is used:

for name in ["TopOne", "TopTwo", "TopThree"]:
    poly = xtgeo.polygons_from_roxar(project, name, POL_FOLDER)

    ed = dataio.ExportData(
        config=CFG,
        content="depth",
        unit="m",
        vertical_domain="depth",
        domain_reference="msl",
        timedata=None,
        is_observation=False,
        tagname="faultlines",
        workflow="rms structural model",
        name=name
    )
    out = ed.export(poly)

In general, fmu-dataio tries to take care of exporting data automatically to conventional and standard locations. In the documentation below you might find references to the following terms.

pwd: The present working directory. This is the directory a script or application is started from.
rootpath: The directory that relative file names are resolved against. This is auto-detected by fmu-dataio.
casepath: The path where the FMU case originates (is started from). This should be equivalent to the rootpath in most circumstances.

Examples:

/project/foo/resmod/ff/2022.1.0/rms/model                   # pwd
/project/foo/resmod/ff/2022.1.0/                            # rootpath

A file:

/project/foo/resmod/ff/2022.1.0/share/results/maps/xx.gri   # example absolute
                                share/results/maps/xx.gri   # example relative

When running an Ert forward job using a normal Ert job (e.g. a script):

/scratch/nn/case/realization-44/iter-2                      # pwd
/scratch/nn/case                                            # rootpath

A file:

/scratch/nn/case/realization-44/iter-2/share/results/maps/xx.gri  # absolute
                 realization-44/iter-2/share/results/maps/xx.gri  # relative

When running an Ert forward job but here executed from RMS:

/scratch/nn/case/realization-44/iter-2/rms/model            # pwd
/scratch/nn/case                                            # rootpath

A file:

/scratch/nn/case/realization-44/iter-2/share/results/maps/xx.gri  # absolute
                 realization-44/iter-2/share/results/maps/xx.gri  # relative
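
The relationship between an absolute path, the rootpath and the relative path in the examples above can be reproduced with pathlib. This is a minimal sketch using the paths from the Ert example; it does not show how fmu-dataio detects the rootpath.

```python
from pathlib import PurePosixPath

# Paths taken from the Ert forward job example above.
rootpath = PurePosixPath("/scratch/nn/case")
absolute = PurePosixPath(
    "/scratch/nn/case/realization-44/iter-2/share/results/maps/xx.gri"
)

# The relative file name is the absolute path with the rootpath stripped off.
relative = absolute.relative_to(rootpath)
print(relative)  # realization-44/iter-2/share/results/maps/xx.gri
```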

config: dict[str, Any] | GlobalConfiguration¶

Required in order to produce valid metadata.

This global config must be provided either as an input value here or through an environment variable.

This value should be a dictionary with static settings. In the standard case this is read from FMU global variables produced by fmuconfig. The dictionary must contain some predefined main level keys to work with fmu-dataio.

Note

If missing or empty, an export() may still be done, but without any metadata produced.

content: str | dict | None = None¶

A required string describing the content of the data, e.g. "volumes".

Warning

Using the content argument as a dict to set both the content and the content metadata will be deprecated. Set the content argument to a valid content string, and provide the extra information through the content_metadata argument instead.

Some content types, like "seismic", require additional information. This should be provided through the content_metadata argument described below.

The list of content types that can be provided is controlled and input values are validated against a current list of them. In the following enumeration you would use only the string values of the content type.

class Content

The content type of a given data object.

Content.depth = 'depth'

A data object representing depth values.

Typically provided as an xtgeo.RegularSurface or xtgeo.Grid for export.

Content.facies_thickness = 'facies_thickness'

Thickness map representing facies thickness, derived from a 3D grid.

Typically provided as an xtgeo.RegularSurface for export.

Content.fault_lines = 'fault_lines'

Intersections between fault planes and horizons.

Typically provided as an xtgeo.Polygons for export.

Content.fault_surface = 'fault_surface'

A surface representing a fault plane.

Typically provided either as an RMS FaultRoom GeoJSON surface or an fmu-dataio TSurfData for export.

Content.fault_properties = 'fault_properties'

Properties, such as permeability and porosity, on a fault.

Typically provided as a GeoJSON file derived from RMS FaultRoom for export.

Content.field_outline = 'field_outline'

Polygons representing the outline of a field, initial (static) conditions.

Typically provided as an xtgeo.Polygons for export.

Content.field_region = 'field_region'

Delineated or named region within a field.

Typically provided as an xtgeo.Polygons for export.

Content.fluid_contact = 'fluid_contact'

Depth surface representing a fluid contact used per realization.

Typically provided as an xtgeo.RegularSurface for export.

Content.khproduct = 'khproduct'

The product of permeability (k) and reservoir thickness (h).

Typically provided as an xtgeo.RegularSurface for export.

Content.lift_curves = 'lift_curves'

Table representing the relationship between production rates and pressures.

Typically provided as a Pandas Dataframe for export.

Content.mapping = 'mapping'

Tabular cross-references used to translate between different naming conventions or identifiers.

Acts as a bridge to align data across different domains, such as:

* Official stratigraphy to model zonation.
* Static reservoir regions/zones to simulator-specific identifiers (e.g., FIPGRP).
* Unique Well Identifiers (UWI) to simulation well names.

Typically provided as a Pandas Dataframe for export.

Content.named_area = 'named_area'

A named area within a field that is not a region.

Typically provided as an xtgeo.Polygons for export.

Content.observations = 'observations'

ERT observations generated for the ensemble.

Typically provided as a Pandas Dataframe for export.

Tip

You should not export this manually. This is done automatically by the CREATE_CASE_METADATA ERT workflow.

Content.production_network = 'production_network'

Tabular data representing the production group structure.

Typically provided as a Pandas Dataframe.

Tip

You should not export this manually. Use SIM2SUMO.

Content.pinchout = 'pinchout'

Polygons designating a pinchout.

Typically provided as an xtgeo.Polygons for export.

Content.property = 'property'

A property, like permeability or porosity, belonging to a 3D grid.

Typically provided as an xtgeo.GridProperty.

Tip

This content type requires additional input in the content_metadata field.

Grid property data handling is still immature. More comprehensive data categorization will come in the future.

Content.pvt = 'pvt'

Tabular pressure-volume-temperature data.

Typically provided as a Pandas Dataframe for export.

Tip

You should not export this manually. Use SIM2SUMO.

Content.regions = 'regions'

Distinct areas within the field that have different characteristics.

Examples may be volume regions or contact regions.

Typically provided as an xtgeo.Polygons or xtgeo.GridProperty.

Content.relperm = 'relperm'

Tabular relative permeability data.

Typically provided as a Pandas Dataframe for export.

Tip

You should not export this manually. Use SIM2SUMO.

Content.rft = 'rft': Tabular reservoir formation tests data.

Tip

You should not export this manually. Use SIM2SUMO.

Content.seismic = 'seismic'

Data that is seismic in nature, including seismic cubes and surface data derived from seismic cubes.

Typically provided as an xtgeo.Cube, xtgeo.RegularSurface, or other.

Tip

This content type requires additional input in the content_metadata field.

Seismic data handling is still immature. More comprehensive data categorization will come in the future.

Content.simulationtimeseries = 'simulationtimeseries'

Time-series data generated by a reservoir simulator like OPM Flow or Eclipse.

For example, a summary file parsed into a Pandas Dataframe by res2df.

Tip

You should not export this manually. Use SIM2SUMO.

Content.subcrop = 'subcrop'

Surface or polygon representing a subcrop area.

Typically provided as an xtgeo.RegularSurface or xtgeo.Polygons for export.

Content.thickness = 'thickness'

A thickness map.

Typically provided as an xtgeo.RegularSurface for export.

Content.time = 'time'

A seismic time surface or seismic cube in time domain.

Typically provided as an xtgeo.RegularSurface or xtgeo.Cube.

Content.transmissibilities = 'transmissibilities'

Tabular data containing transmissibilities (neighbour and non-neighbour connections).

Typically provided as a Pandas Dataframe.

Tip

You should not export this manually. Use SIM2SUMO.

Content.velocity = 'velocity'

A seismic velocity map represented as a regular surface or a cube.

Typically provided as an xtgeo.RegularSurface or xtgeo.Cube for export.

Content.volumes = 'volumes'

Tabulated in-place volumes per grid, initial (static) conditions.

Typically provided as a Pandas Dataframe.

Content.well_completions = 'well_completions'

Tabular data representing well completions.

Typically provided as a Pandas Dataframe.

Tip

You should not export this manually. Use SIM2SUMO.

Content.wellpicks = 'wellpicks'

Tabular data representing wellpicks.

Typically provided as a Pandas Dataframe.
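
Validation against a controlled list of string values can be illustrated with a standard Python Enum. The sketch below uses only a small subset of the content types listed above, and validate_content is a hypothetical helper; it is not the validation code used by fmu-dataio.

```python
from enum import Enum


class Content(str, Enum):
    """Illustrative subset of the content types documented above."""

    depth = "depth"
    fault_lines = "fault_lines"
    seismic = "seismic"
    volumes = "volumes"


def validate_content(value: str) -> Content:
    """Return the matching member or raise a readable error (hypothetical)."""
    try:
        return Content(value)
    except ValueError:
        valid = ", ".join(member.value for member in Content)
        raise ValueError(
            f"Unknown content {value!r}; expected one of: {valid}"
        ) from None


print(validate_content("depth").value)  # depth
```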

content_metadata: dict | None = None¶

Optional. Dictionary with additional information about the provided content. Only required for some content types, e.g. "seismic".

Example:

content_metadata={"attribute": "amplitude", "calculation": "mean"},

classification: str | None = None¶

Optional. Security classification level of the data object.

If present it will override the default found in the config.

The list of classification types that can be provided is controlled and input values are validated against a current list of them. In the following enumeration you would use only the string values of the classification type.

class Classification

The security classification for a given data object.

Classification.internal = 'internal'

Grants access to all users with READ access to the asset.

The READ role is an access role defined by the asset’s Unix and Sumo groups. This is the default for most data.

Classification.restricted = 'restricted'

Grants access to all users with WRITE access to the asset.

The WRITE role is an access role defined by the asset’s Unix and Sumo groups. This is the default for some sensitive data, like volumes, but in general must be explicitly set when restricted access is desired.

domain_reference: str = 'msl'¶

Optional. Reference to the vertical scale of the data.

class DomainReference

DomainReference.msl = 'msl': In reference to Mean Sea Level.

DomainReference.sb = 'sb': In reference to Sea Bottom.

DomainReference.rkb = 'rkb': In reference to Rotary Kelly Bushing (RKB).

Note

Use the vertical_domain key to set the domain (depth or time).

vertical_domain: str | dict = 'depth'¶

Optional. The vertical domain of the data.

class VerticalDomain

VerticalDomain.depth = 'depth': In the domain of depth.

VerticalDomain.time = 'time': In the domain of time.

A reference for the vertical scale can be provided with the domain_reference value.

Note

If the content is "depth" or "time" this value will be set accordingly.

Warning

Providing a dictionary as a value is deprecated.
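
The interaction between content and vertical_domain described in the note can be sketched as follows. resolve_vertical_domain is a hypothetical helper written only to illustrate the documented rule; the actual implementation in fmu-dataio may differ.

```python
VALID_DOMAINS = ("depth", "time")


def resolve_vertical_domain(content: str, vertical_domain: str = "depth") -> str:
    """Pick the vertical domain for a given content string (hypothetical)."""
    # Content "depth" or "time" determines the domain directly.
    if content in VALID_DOMAINS:
        return content
    # Otherwise the explicit value is used, which must be a known domain.
    if vertical_domain not in VALID_DOMAINS:
        raise ValueError(f"Invalid vertical_domain: {vertical_domain!r}")
    return vertical_domain


print(resolve_vertical_domain("time"))  # time
print(resolve_vertical_domain("seismic", "depth"))  # depth
```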

geometry: str | None = None¶

Optional. Only for grid properties, which need a reference to the 3D grid geometry object.

The value must point to an existing file which has already been exported with fmu-dataio, and hence has an associated metadata file. The grid name will be derived from the grid metadata, if present, and applied as part of the grid property file name.

Note

This value may replace the usage of both the parent value and the grid_model value in the near future.

is_observation: bool = False¶

If True then data will be exported to the share/observations/ directory.

By default this is False which will export results to the share/results/ directory.

However, if preprocessed is True, then the export directory will be set to share/preprocessed/ irrespective of the value of is_observation.
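
The precedence between is_observation and preprocessed can be summarized in a few lines. export_root is a hypothetical helper that mirrors the rules stated above; it ignores other inputs, such as forcefolder, that may also affect the final location.

```python
def export_root(is_observation: bool = False, preprocessed: bool = False) -> str:
    """Return the top-level export directory (hypothetical helper)."""
    # preprocessed takes precedence over is_observation.
    if preprocessed:
        return "share/preprocessed"
    return "share/observations" if is_observation else "share/results"


print(export_root())  # share/results
print(export_root(is_observation=True))  # share/observations
print(export_root(is_observation=True, preprocessed=True))  # share/preprocessed
```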

is_prediction: bool = True¶: Indicates if the exported data is model prediction data.

timedata: list[str] | list[list[str]] | None = None¶

Optional. List of dates, where the dates are strings of the form "YYYYMMDD".

timedata=["20200101"],

timedata=["20200101", "20180101"],

A maximum of two dates can be input. The oldest date will be set as t0 in the metadata and the latest date will be t1.

Note

It is also possible to provide a label to each date by using a list of lists, e.g. [["20200101", "monitor"], ["20180101", "base"]].
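
How one or two dates, with or without labels, map to t0 and t1 can be sketched in plain Python. split_timedata is a hypothetical helper that only illustrates the ordering rule described above; the structure it returns is an assumption and not the metadata layout produced by fmu-dataio.

```python
from __future__ import annotations

from datetime import datetime


def split_timedata(timedata: list) -> tuple:
    """Return (t0, t1) as (date, label) pairs; t1 is None for a single date.

    Hypothetical helper, not part of fmu-dataio.
    """
    if not 1 <= len(timedata) <= 2:
        raise ValueError("timedata must contain one or two dates")

    entries = []
    for item in timedata:
        # Accept either "YYYYMMDD" or ["YYYYMMDD", "label"].
        if isinstance(item, str):
            date, label = item, None
        else:
            date, label = item[0], (item[1] if len(item) > 1 else None)
        if len(date) != 8:
            raise ValueError(f"Expected a YYYYMMDD string, got {date!r}")
        datetime.strptime(date, "%Y%m%d")  # raises ValueError on invalid dates
        entries.append((date, label))

    # "YYYYMMDD" strings sort chronologically, so the oldest date comes first.
    entries.sort(key=lambda entry: entry[0])
    t0 = entries[0]
    t1 = entries[1] if len(entries) == 2 else None
    return t0, t1


print(split_timedata([["20200101", "monitor"], ["20180101", "base"]]))
# (('20180101', 'base'), ('20200101', 'monitor'))
```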

unit: str | None = ''¶

Optional. The measurement unit relevant to the exported data.

For example, "m" would be set if the measurement unit is meters.

Caution

This value is not currently controlled by a known list but will be in the future.

table_index: list[str] | None = None¶

Optional. A list of strings indicating the index columns for tabular data.

This value should be set for tabular data like Pandas data frames only.

Example:

table_index=["ZONE", "REGION"],

This can also be applied to points or polygons objects that are exported in table format to specify attributes that should act as index columns.

Tip

Index columns in tabular data refer to one or more columns that uniquely identify each row in the dataset. They serve as a reference point for data retrieval and manipulation, enabling simple and efficient access to specific rows.
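
Whether a set of index columns uniquely identifies each row can be checked before export. The sketch below uses plain dictionaries instead of a Pandas data frame to stay dependency-free; index_is_unique is a hypothetical helper and the rows are made-up illustration values.

```python
def index_is_unique(rows: list, table_index: list) -> bool:
    """Return True if the index columns identify each row uniquely."""
    keys = [tuple(row[column] for column in table_index) for row in rows]
    return len(keys) == len(set(keys))


# Made-up rows for illustration only.
rows = [
    {"ZONE": "Upper", "REGION": "North", "VALUE": 1.0},
    {"ZONE": "Upper", "REGION": "South", "VALUE": 2.0},
    {"ZONE": "Lower", "REGION": "North", "VALUE": 3.0},
]

print(index_is_unique(rows, ["ZONE", "REGION"]))  # True
print(index_is_unique(rows, ["ZONE"]))  # False
```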

preprocessed: bool = False¶

If True, data is exported to the "share/preprocessed/" directory.

This metadata can be partially re-used in an Ert model run using the ExportPreprocessedData class.

Note

Most data are not preprocessed data, and as such this key shouldn’t often be used. An example of preprocessed data is seismic data.

description: str | list[str] = ''¶: Optional. A multi-line description of the data either as a string or a list of strings.

Tip

You do not need to set this.

display_name: str | None = None¶: Optional. Set a display name for clients to use when visualizing.

Tip

You do not need to set this.

name: str = ''¶

Optional. The name of the data object being exported.

If not set, fmu-dataio infers it from the object's data type. If the name is found in the stratigraphy static metadata list, the official stratigraphic name will be used.

For example, if "TopValysar" is the model name and the actual name is "Valysar Top Fm.", the latter name will be used.

Tip

You do not need to set this.

tagname: str = ''¶

Optional. A short tag description which will be a part of the file name.

As an example, when exporting a fault polygon from a horizon named "TopVolantis" with

tagname="faultlines",

the exported filename will be volantis_gp_top--faultlines.csv.
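
The file name pattern visible in this example, a name and a tag joined by a double hyphen, can be sketched as below. build_filename is a hypothetical helper; how fmu-dataio normalizes the name and what it does when no tagname is given are assumptions here.

```python
def build_filename(name: str, tagname: str, extension: str) -> str:
    """Join name and tagname with "--" and add the extension (hypothetical)."""
    # Assumption: without a tagname, the file name is just the name.
    stem = f"{name}--{tagname}" if tagname else name
    return f"{stem}.{extension}"


print(build_filename("volantis_gp_top", "faultlines", "csv"))
# volantis_gp_top--faultlines.csv
```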

Tip

You do not need to set this, but it may be useful for local workflows.

workflow: str | dict[str, str] | None = None¶: Optional. Short string description of workflow.

Warning

Providing a dictionary as a value is deprecated.

Tip

You do not need to set this.

forcefolder: str = ''¶

Optional. This value allows exporting to a non-standard directory relative to the casepath/rootpath.

Warning

Using this option is generally not recommended.

This option is dependent upon the FMU context (case or realization) and the is_observation boolean value.

Example:

forcefolder="seismic",

This will replace the cubes/ standard directory for xtgeo.Cube output with seismic/.

Caution

Use with care and avoid if possible!

parent: str = ''¶

Optional. This value is required for datatype xtgeo.GridProperty, unless the geometry value is given.

“Parent” refers to the name of the grid geometry. It will only be added to the filename, and not as a genuine metadata entry.

Warning

This value is a candidate for deprecation. Use geometry instead.

If both parent and geometry are given, the grid name derived from the geometry object will have precedence.

casepath: str | Path | None = None¶: Optional. Path to a case directory that contains valid case metadata fmu_case.yml in folder <CASE_DIR>/share/metadata/.

Tip

You typically do not need to set this.

aggregation: bool = False¶

fmu_context: str | None = None¶

rep_include: bool | None = None¶

subfolder: str = ''¶

undef_is_zero: bool = False¶

case_folder: ClassVar[str] = 'share/metadata'¶

polygons_fformat: ClassVar[str] = 'csv'¶

points_fformat: ClassVar[str] = 'csv'¶

table_fformat: ClassVar[str] = 'csv'¶

access_ssdl: dict¶

depth_reference: str | None = None¶

realization: int | None = None¶

reuse_metadata_rule: str | None = None¶

runpath: str | Path | None = None¶

verbosity: str = 'DEPRECATED'¶

grid_model: str | None = None¶

__init__(config=<factory>, content=None, content_metadata=None, classification=None, domain_reference='msl', vertical_domain='depth', geometry=None, is_observation=False, is_prediction=True, timedata=None, unit='', table_index=None, preprocessed=False, description='', display_name=None, name='', tagname='', workflow=None, forcefolder='', parent='', casepath=None, aggregation=False, fmu_context=None, rep_include=None, subfolder='', undef_is_zero=False, access_ssdl=<factory>, depth_reference=None, realization=None, reuse_metadata_rule=None, runpath=None, verbosity='DEPRECATED', grid_model=None)¶

allow_forcefolder_absolute: ClassVar[bool] = False¶

arrow_fformat: ClassVar[str | None] = None¶

createfolder: ClassVar[bool] = True¶

cube_fformat: ClassVar[str | None] = None¶

filename_timedata_reverse: ClassVar[bool] = False¶

grid_fformat: ClassVar[str | None] = None¶

include_ertjobs: ClassVar[bool] = False¶

legacy_time_format: ClassVar[bool] = False¶

meta_format: ClassVar[Literal['yaml', 'json'] | None] = None¶

surface_fformat: ClassVar[str | None] = None¶

dict_fformat: ClassVar[str | None] = None¶

table_include_index: ClassVar[bool] = False¶

verifyfolder: ClassVar[bool] = True¶

generate_metadata(obj, compute_md5=True, **kwargs)[source]¶

Generate and return the complete metadata for a provided object.

An object may be a map, 3D grid, cube, table, etc., which is of a known and supported type.

Examples of such known types are XTGeo objects (e.g. a RegularSurface), a Pandas Dataframe, a PyArrow table, etc.

Parameters:

obj (Annotated[Cube | GridProperty | Grid | Points | Polygons | RegularSurface | DataFrame | FaultRoomSurface | TriangulatedSurface | MutableMapping | Table | Path | str]) – XTGeo instance, a Pandas Dataframe instance or other supported object.
compute_md5 (bool) – Deprecated, an MD5 checksum will always be computed.
**kwargs (Any) – Using other ExportData() input keys is now deprecated, input the arguments when initializing the ExportData() instance instead.

Return type:

dict

Returns:

A dictionary with all metadata.

export(obj, **kwargs)[source]¶

Export supported data objects with metadata.

This function exports data without changing the content of the data. The file format of the data may be determined by values set in the class.

A file containing metadata will be exported next to it. It will have the same name as the data file, but prefixed with a dot (.), so the metadata file is not listed by a plain ls command. The metadata is stored in a YAML file.

top_volantis--depth.gri
.top_volantis--depth.gri.yml

Parameters:: obj (Annotated[Cube | GridProperty | Grid | Points | Polygons | RegularSurface | DataFrame | FaultRoomSurface | TriangulatedSurface | MutableMapping | Table | Path | str]) – An xtgeo object, Pandas dataframe, or other supported object. A full list of supported data types can be found in the documentation.
Returns:: The full path to the exported item.
Return type:: str

Note

Providing **kwargs is deprecated and will be removed in a later version.