dataio.dataio module
Module for DataIO class.
The metadata spec is documented as a JSON schema, stored under schema/.
- exception dataio.dataio.ValidationError[source]
Bases:
ValueError, KeyError
Raise error while validating.
- dataio.dataio.read_metadata(filename)[source]
Read the metadata as a dictionary given a filename.
If the filename is e.g. /some/path/mymap.gri, the associated metadata file will be /some/path/.mymap.gri.yml (or json?)
- Parameters:
filename (Union[str, Path]) – The full path filename to the data-object.
- Return type:
dict
- Returns:
A dictionary with metadata read from the associated metadata file.
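The naming rule described above can be sketched with pathlib. This is a minimal illustration and not the library's implementation: `metadata_path_for` is a hypothetical helper, and the `.yml` suffix is an assumption, since the text leaves open whether JSON may also be used.

```python
from pathlib import Path


def metadata_path_for(filename, suffix=".yml"):
    """Return the hidden metadata path that sits next to a data file.

    Hypothetical helper: prefixes the file name with "." and appends the suffix.
    """
    path = Path(filename)
    return path.with_name("." + path.name + suffix)


print(metadata_path_for("/some/path/mymap.gri").as_posix())
```

A dictionary could then be loaded from that path with a YAML reader of choice before being returned to the caller.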
- class dataio.dataio.ExportData(access_ssdl=<factory>, aggregation=False, casepath=None, config=<factory>, content=None, depth_reference='msl', description='', fmu_context='realization', forcefolder='', grid_model=None, is_observation=False, is_prediction=True, name='', undef_is_zero=False, parent='', realization=-999, reuse_metadata_rule=None, runpath=None, subfolder='', tagname='', timedata=None, unit='', verbosity='CRITICAL', vertical_domain=<factory>, workflow='', table_index=None, table_index_values=None)[source]
Bases:
object
Class for exporting data with rich metadata in FMU.
This class sets up the general metadata content to be applied in export. The idea is that one ExportData instance can be re-used for several similar export() jobs. For example:
    edata = dataio.ExportData(
        config=CFG,
        content="depth",
        unit="m",
        vertical_domain={"depth": "msl"},
        timedata=None,
        is_prediction=True,
        is_observation=False,
        tagname="faultlines",
        workflow="rms structural model",
    )
    # Re-use the same instance for several similar exports
    for name in ["TopOne", "TopTwo", "TopThree"]:
        poly = xtgeo.polygons_from_roxar(PRJ, name, POL_FOLDER)
        out = edata.export(poly, name=name)
Almost all keyword settings like name, tagname etc. can be set both in the ExportData instance and directly in the generate_metadata() or export() function, to provide flexibility for different use cases. If both are set, the export() setting will win, followed by generate_metadata() and finally ExportData().
A note on 'pwd', 'rootpath' and 'casepath': The 'pwd' is the process working directory, which is the folder where the process (script) starts. The 'rootpath' is the folder that relative file names are relative to, and it is normally auto-detected. The user can however force the 'actual' rootpath by providing the input casepath. When running an RMS project interactively on disk:
    /project/foo/resmod/ff/2022.1.0/rms/model   << pwd
    /project/foo/resmod/ff/2022.1.0/            << rootpath
    A file:
    /project/foo/resmod/ff/2022.1.0/share/results/maps/xx.gri  << example absolute
    share/results/maps/xx.gri                                  << example relative
When running an ERT2 forward job using a normal ERT job (e.g. a script):
    /scratch/nn/case/realization-44/iter-2   << pwd
    /scratch/nn/case                         << rootpath
    A file:
    /scratch/nn/case/realization-44/iter-2/share/results/maps/xx.gri  << absolute
    realization-44/iter-2/share/results/maps/xx.gri                   << relative
When running an ERT2 forward job but here executed from RMS:
    /scratch/nn/case/realization-44/iter-2/rms/model   << pwd
    /scratch/nn/case                                   << rootpath
    A file:
    /scratch/nn/case/realization-44/iter-2/share/results/maps/xx.gri  << absolute
    realization-44/iter-2/share/results/maps/xx.gri                   << relative
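The relation between the absolute and relative paths in the listings above can be sketched with pathlib. This is only an illustration of the path arithmetic, assuming the rootpath is already known; `relative_to_rootpath` is a hypothetical helper, and the sketch does not show how fmu-dataio auto-detects the rootpath.

```python
from pathlib import PurePosixPath


def relative_to_rootpath(absolute_file, rootpath):
    """Express an absolute file path relative to the given rootpath."""
    return PurePosixPath(absolute_file).relative_to(PurePosixPath(rootpath))


# ERT forward job example taken from the listings above
rootpath = "/scratch/nn/case"
absolute = "/scratch/nn/case/realization-44/iter-2/share/results/maps/xx.gri"
print(relative_to_rootpath(absolute, rootpath))
```

Note that the result does not depend on the pwd, which is why the two ERT cases above give the same relative path.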
- Parameters:
access_ssdl (dict) – Optional. A dictionary that will overwrite or append to the default ssdl settings read from the config. Example: {"access_level": "restricted", "rep_include": False}
casepath (Union[str, Path, None]) – To override the automatic and actual rootpath. Absolute path to the case root. If not provided, fmu-dataio will attempt to parse the rootpath from the file structure or by other means. See also fmu_context, where "case" may need an explicit casepath!
config (dict) – Required in order to produce valid metadata, either as key (here) or through an environment variable. A dictionary with static settings. In the standard case this is read from FMU global variables (via fmuconfig). The dictionary must contain some predefined main level keys to work with fmu-dataio. If the key is missing or the key value is None, then it will look for the environment variable FMU_GLOBAL_CONFIG to detect the file. If the file cannot be found, a UserWarning is issued. If both a valid config and FMU_GLOBAL_CONFIG are provided, the latter will be used. Note that this key shall be set while initializing the instance, i.e. it cannot be used in generate_metadata() or export(). Note also: if missing or empty, export() may still be done, but without a metadata file (this feature may change in future releases).
content (Union[dict, str, None]) – Optional, default is "depth". Is a string or a dictionary with one key. Example is "depth" or {"fluid_contact": {"xxx": "yyy", "zzz": "uuu"}}. Content is checked against a white-list for validation!
fmu_context (str) – In normal forward models, the fmu_context is realization, which is the default and will put data per realization. Other contexts may be case, which will put data relative to the case root (see also casepath). Another important context is "preprocessed", which will output to a dedicated "preprocessed" folder instead, and metadata will be partially re-used in an ERT model run. If a non-FMU run is detected (e.g. you run from project), fmu-dataio will detect that and set the actual context to None as fall-back (unless preprocessed is specified). If the value is "preprocessed", see also the reuse_metadata_rule key.
description (Union[str, list]) – A multiline description of the data, either as a string or a list of strings.
display_name – Optional, set name for clients to use when visualizing.
forcefolder (str) – This setting shall only be used as an exception, and makes it possible to output to a non-standard folder. A / in front will indicate an absolute path (*); otherwise it will be relative to casepath or rootpath, depending on both fmu_context and the is_observation boolean value. A typical use-case is forcefolder="seismic", which will replace the "cubes" standard folder for Cube output with "seismic". Use with care and avoid if possible! (*) For absolute paths, the class variable allow_forcefolder_absolute must be set to True.
grid_model (Optional[str]) – Currently allowed but planned for deprecation.
include_index – This applies to Pandas (table) data only, and if True then the index column will be exported. Deprecated, use class variable table_include_index instead.
is_prediction (bool) – True (default) if model prediction data.
is_observation (bool) – Default is False. If True, then disk storage will be on the "share/observations" folder, otherwise on "share/results". An exception arises if fmu_context is "preprocessed": the folder will then be set to "share/processed" irrespective of the value of is_observation.
name (str) – Optional but recommended. The name of the object. If not set, fmu-dataio tries to infer it from the xtgeo/pandas/… object. The name is then checked against the stratigraphy list, and is replaced with the official stratigraphic name if found in the static metadata stratigraphy. For example, if "TopValysar" is the model name and the actual name is "Valysar Top Fm.", the latter name will be used.
parent (str) – Optional. This key is required for datatype GridProperty, and refers to the name of the grid geometry.
realization (int) – Optional, default is -999 which means that realization shall be detected automatically from the FMU run. Can be used to override in rare cases. If so, numbers must be >= 0.
reuse_metadata_rule (Optional[str]) – This input is None or a string describing the rule for reusing metadata. Default is None, but if the input is a file string or object with already valid metadata, then it is assumed to be "preprocessed", which merges the metadata according to predefined rules.
runpath (Union[str, Path, None]) – TODO! Optional and deprecated. The relative location of the current run root. In most cases it will be auto-detected, assuming that FMU folder conventions are followed. For an ERT run this is e.g. /scratch/xx/nn/case/realization-0/iter-0/, while in a revision on the project disk it will be the revision root, e.g. /project/xx/resmod/ff/21.1.0/.
subfolder (str) – It is possible to set one level of subfolders for file output. The input should only be a single folder name, i.e. no paths. If paths are present, a deprecation warning will be raised.
tagname (str) – This is a short tag description which will be a part of the file name.
timedata (Optional[List[list]]) – If given, a list of lists with dates, e.g. [[20200101, "monitor"], [20180101, "base"]] or just [[2021010]]. The output to metadata will from version 0.9 be different (API change).
verbosity (str) – Is logging/message level for this module. Input as in standard python logging; e.g. "WARNING", "INFO", "DEBUG". Default is "CRITICAL".
vertical_domain (dict) – This is a dictionary with a key and a reference, e.g. {"depth": "msl"}, which is the default if missing.
workflow (str) – Short tag description of workflow (as description).
undef_is_zero (bool) – Flags that NaNs should be considered as zero in aggregations.
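The storage-folder rule described for is_observation and fmu_context can be sketched as a small function. This is a hedged illustration rather than the library's code: `storage_folder` is a hypothetical helper, the folder names are copied from the parameter text (with "share/results" spelled as in the path listings earlier), and forcefolder and subfolder handling are left out.

```python
def storage_folder(is_observation=False, fmu_context="realization"):
    """Pick the share folder following the is_observation description above."""
    if fmu_context == "preprocessed":
        # "preprocessed" takes priority irrespective of is_observation
        return "share/processed"
    return "share/observations" if is_observation else "share/results"


for context in ("realization", "case", "preprocessed"):
    print(context, storage_folder(is_observation=True, fmu_context=context))
```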
Note
Comment on time formats
If two dates are present (i.e. the element represents a difference), the input time format is on the form:
timedata: [[20200101, "monitor"], [20180101, "base"]]
Hence the last date (monitor) usually comes first.
In the new version this will be shown in metadata files as follows, where the oldest date is shown as t0:
    data:
      t0:
        value: 20180101T00:00:00
        description: base
      t1:
        value: 20200101T00:00:00
        description: monitor
The output files will be on the form: somename--t1_t0.ext
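How t0, t1 and the file name suffix follow from the timedata input can be sketched as below. This is an assumption-laden illustration, not the library's logic: `time_suffix` is a hypothetical helper, dates are assumed to be integers on YYYYMMDD form, the double-hyphen separator is taken from the `top_volantis--depth.gri` example elsewhere on this page, and the single-date behaviour is a guess.

```python
def time_suffix(timedata):
    """Return (t0, t1, suffix) where t0 is the oldest date.

    timedata is a list of [date, label] or [date] entries.
    """
    ordered = sorted(timedata, key=lambda item: item[0])
    dates = [str(item[0]) for item in ordered]
    t0 = dates[0]
    t1 = dates[1] if len(dates) > 1 else None
    # Assumption: with a single date the suffix is just that date
    suffix = f"{t1}_{t0}" if t1 else t0
    return t0, t1, suffix


t0, t1, suffix = time_suffix([[20200101, "monitor"], [20180101, "base"]])
print(f"somename--{suffix}.ext")
```

Sorting first means the result is the same whether the monitor or the base date is listed first in the input.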
Note
Using config from file
Optionally, the keys can be stored in a YAML file, and you can let the environment variable FMU_DATAIO_CONFIG point to that file. This can e.g. make it possible for ERT jobs to point to external input configs. For example:
    export FMU_DATAIO_CONFIG="/path/to/mysettings.yml"
    export FMU_GLOBAL_CONFIG="/path/to/global_variables.yml"
In Python:
    eda = ExportData()
    eda.export(obj)
- allow_forcefolder_absolute: ClassVar[bool] = False
- arrow_fformat: ClassVar[str] = 'arrow'
- case_folder: ClassVar[str] = 'share/metadata'
- createfolder: ClassVar[bool] = True
- cube_fformat: ClassVar[str] = 'segy'
- filename_timedata_reverse: ClassVar[bool] = False
- grid_fformat: ClassVar[str] = 'roff'
- include_ert2jobs: ClassVar[bool] = False
- legacy_time_format: ClassVar[bool] = False
- meta_format: ClassVar[str] = 'yaml'
- polygons_fformat: ClassVar[str] = 'csv'
- points_fformat: ClassVar[str] = 'csv'
- surface_fformat: ClassVar[str] = 'irap_binary'
- table_fformat: ClassVar[str] = 'csv'
- dict_fformat: ClassVar[str] = 'json'
- table_include_index: ClassVar[bool] = False
- verifyfolder: ClassVar[bool] = True
- access_ssdl: dict
- aggregation: bool = False
- casepath: Union[str, Path, None] = None
- config: dict
- content: Union[dict, str, None] = None
- depth_reference: str = 'msl'
- description: Union[str, list] = ''
- fmu_context: str = 'realization'
- forcefolder: str = ''
- grid_model: Optional[str] = None
- is_observation: bool = False
- is_prediction: bool = True
- name: str = ''
- undef_is_zero: bool = False
- parent: str = ''
- realization: int = -999
- reuse_metadata_rule: Optional[str] = None
- runpath: Union[str, Path, None] = None
- subfolder: str = ''
- tagname: str = ''
- timedata: Optional[List[list]] = None
- unit: str = ''
- verbosity: str = 'CRITICAL'
- vertical_domain: dict
- workflow: str = ''
- table_index: Optional[list] = None
- table_index_values: Optional[dict] = None
- generate_metadata(obj, compute_md5=True, **kwargs)[source]
Generate and return the complete metadata for a provided object.
An object may be a map, 3D grid, cube, table, etc which is of a known and supported type.
Examples of such known types are XTGeo objects (e.g. a RegularSurface), a Pandas Dataframe, a PyArrow table, etc.
If the key reuse_metadata_rule is applied with a legal value, the object may also be a reference to a file with existing metadata, which will then be re-used.
- Parameters:
obj (Any) – XTGeo instance, a Pandas Dataframe instance or other supported object.
compute_md5 (bool) – If True, compute an MD5 checksum for the exported file.
**kwargs – For other arguments, see ExportData() input keys. If they exist in both places, this function will override!
- Return type:
dict
- Returns:
A dictionary with all metadata.
Note
If the compute_md5 key is False, the file.checksum_md5 will be empty. If True, the MD5 checksum will be generated based on export to a temporary file, which may be time-consuming if the file is large.
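The cost mentioned in the note comes from writing the data to disk and reading it back. A minimal sketch of that pattern with the standard library is given below; `md5_via_tempfile` is a hypothetical helper that takes already serialized bytes, whereas the real export format depends on the object type.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_via_tempfile(payload: bytes) -> str:
    """Write payload to a temporary file and return the MD5 hex digest of that file."""
    with tempfile.TemporaryDirectory() as tmpdir:
        tmpfile = Path(tmpdir) / "export.tmp"
        tmpfile.write_bytes(payload)
        digest = hashlib.md5()
        with tmpfile.open("rb") as stream:
            # Read in chunks so large files do not need to fit in memory
            for chunk in iter(lambda: stream.read(65536), b""):
                digest.update(chunk)
    return digest.hexdigest()


print(md5_via_tempfile(b"some exported bytes"))
```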
- export(obj, return_symlink=False, **kwargs)[source]
Export data objects of ‘known’ type to FMU storage solution with metadata.
This function will also collect the data-specific class metadata. For "classic" files, the metadata will be stored in a YAML file with the same name stem as the data, but with a "." in front and "yml" as suffix, e.g.:
    top_volantis--depth.gri
    .top_volantis--depth.gri.yml
- Parameters:
obj – XTGeo instance, a Pandas Dataframe instance or other supported object.
return_symlink – If fmu_context is 'case_symlink_realization' then the link address will be returned if this is True; otherwise the physical file path will be returned.
**kwargs – For other arguments, see ExportData() input keys. If they exist in both places, this function will override!
- Returns:
full path to exported item.
- Return type:
String
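The override order stated for ExportData, where export() settings win over generate_metadata() settings, which in turn win over the instance settings, can be illustrated with a ChainMap. This only demonstrates the lookup order using made-up values; it is not how fmu-dataio stores or merges its settings internally.

```python
from collections import ChainMap

# Illustrative values only
instance_settings = {"name": "", "tagname": "faultlines", "unit": "m"}
generate_kwargs = {"tagname": "polygons"}
export_kwargs = {"name": "TopOne"}

# Earlier maps win: export() over generate_metadata() over ExportData()
effective = dict(ChainMap(export_kwargs, generate_kwargs, instance_settings))
print(effective)
```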
- class dataio.dataio.InitializeCase(config, rootfolder=None, casename=None, caseuser=None, restart_from=None, description=None, verbosity='CRITICAL')[source]
Bases:
object
Instantiate InitializeCase object.
In ERT this is typically run as a hook workflow in advance.
- Parameters:
config (dict) – A configuration dictionary. In the standard case this is read from FMU global variables (via fmuconfig). The dictionary must contain some predefined main level keys. If config is None or the env variable FMU_GLOBAL_CONFIG pointing to a file is provided, then it will attempt to parse that file instead.
rootfolder (Union[str, Path, None]) – To override the automatic and actual rootpath. Absolute path to the case root, including case name. If not provided (which is not recommended), an attempt will be made to parse the rootpath from the file structure or by other means.
casename (Optional[str]) – Name of case (experiment)
caseuser (Optional[str]) – Username provided
restart_from (Optional[str]) – ID of eventual restart (deprecated)
description (Union[str, list, None]) – Description text as string or list of strings.
verbosity (str) – Is logging/message level for this module. Input as in standard python logging; e.g. "WARNING", "INFO".
- meta_format: ClassVar[str] = 'yaml'
- config: dict
- rootfolder: Union[str, Path, None] = None
- casename: Optional[str] = None
- caseuser: Optional[str] = None
- restart_from: Optional[str] = None
- description: Union[str, list, None] = None
- verbosity: str = 'CRITICAL'
- generate_metadata(force=False, skip_null=True, **kwargs)[source]
Generate case metadata.
- Parameters:
force (bool) – Overwrite existing case metadata if True. Default is False. If force is False and case metadata already exists, a warning will be issued and None will be returned.
skip_null – Fields with None/missing values will be skipped if True (default)
**kwargs – See InitializeCase() arguments; initial settings will be overridden by settings here.
- Return type:
Optional[dict]
- Returns:
A dictionary with case metadata or None
- generate_case_metadata(force=False, skip_null=True, **kwargs)
Generate case metadata.
- Parameters:
force (bool) – Overwrite existing case metadata if True. Default is False. If force is False and case metadata already exists, a warning will be issued and None will be returned.
skip_null – Fields with None/missing values will be skipped if True (default)
**kwargs – See InitializeCase() arguments; initial settings will be overridden by settings here.
- Return type:
Optional[dict]
- Returns:
A dictionary with case metadata or None
- export(force=False, skip_null=True, **kwargs)[source]
Export case metadata to file.
- Parameters:
force (bool) – Overwrite existing case metadata if True. Default is False. If force is False and case metadata already exists, a warning will be issued and None will be returned.
skip_null – Fields with None/missing values will be skipped if True (default)
**kwargs – See InitializeCase() arguments; initial settings will be overridden by settings here.
- Return type:
Optional[str]
- Returns:
Full path of metadata file or None
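The behaviour of the force flag, which warns and returns None when case metadata already exists, can be sketched as follows. This is a simplified stand-in rather than the actual implementation: `write_case_metadata` and the file name `case_metadata.yml` are hypothetical, and the metadata is treated as plain text.

```python
import tempfile
import warnings
from pathlib import Path


def write_case_metadata(metafile, text, force=False):
    """Write text to metafile; warn and return None if it exists and force is False."""
    metafile = Path(metafile)
    if metafile.exists() and not force:
        warnings.warn(f"Case metadata already exists: {metafile}", UserWarning)
        return None
    metafile.parent.mkdir(parents=True, exist_ok=True)
    metafile.write_text(text)
    return str(metafile)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "share" / "metadata" / "case_metadata.yml"
    print(write_case_metadata(target, "case: demo\n") is not None)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        print(write_case_metadata(target, "case: other\n"))
    print(write_case_metadata(target, "case: other\n", force=True) is not None)
```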
- class dataio.dataio.AggregatedData(aggregation_id=None, casepath=None, source_metadata=<factory>, name='', operation='', tagname='', verbosity='CRITICAL')[source]
Bases:
object
Instantiate AggregatedData object.
- Parameters:
aggregation_id (Optional[str]) – Give an explicit ID for the aggregation. If None, an ID will be made based on existing realization uuids.
casepath (Union[str, Path, None]) – The root folder to the case, default is None. If None, the casepath is derived from the first input metadata paths (cf. source_metadata) if possible. If given explicitly, the physical casepath folder must exist in advance, otherwise a ValueError will be raised.
source_metadata (list) – A list of individual metadata dictionaries, coming from the valid metadata per input element that forms the aggregation.
operation (str) – A string that describes the operation, e.g. "mean". This is mandatory and there is no default.
verbosity (str) – Is logging/message level for this module. Input as in standard python logging; e.g. "WARNING", "INFO".
tagname (str) – Additional name, as part of file name
- meta_format: ClassVar[str] = 'yaml'
- aggregation_id: Optional[str] = None
- casepath: Union[str, Path, None] = None
- source_metadata: list
- name: str = ''
- operation: str = ''
- tagname: str = ''
- verbosity: str = 'CRITICAL'
- generate_metadata(obj, compute_md5=True, skip_null=True, **kwargs)[source]
Generate metadata for the aggregated data.
This is a quite different and much simpler operation than the ExportData() version, as here most metadata for each input element are already known. Hence, the metadata for the first element in the input list is used as template.
- Parameters:
obj (Any) – The map, 3D grid, table, etc instance.
compute_md5 (bool) – If True, an MD5 sum for the file will be created. This involves a temporary export of the data, and may be time-consuming for large data.
skip_null (bool) – If True (default), None values in output will be skipped.
**kwargs – See AggregatedData() arguments; initial settings will be overridden by settings here.
- Return type:
dict
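Two ideas from this method, using the first source metadata as a template and skipping None values, can be sketched as below. This is a rough illustration under stated assumptions: `drop_none` and `template_from_sources` are hypothetical helpers, and the real method also fills in aggregation-specific fields that are not shown here.

```python
import copy


def drop_none(value):
    """Recursively remove dictionary entries whose value is None."""
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(v) for v in value]
    return value


def template_from_sources(source_metadata, skip_null=True):
    """Copy the first source metadata dictionary to use as a template."""
    template = copy.deepcopy(source_metadata[0])
    return drop_none(template) if skip_null else template


sources = [
    {"data": {"name": "x", "unit": None}, "tagname": None},
    {"data": {"name": "y", "unit": "m"}},
]
print(template_from_sources(sources))
```

The deep copy keeps the input metadata untouched, so the same source list can be reused for several aggregation operations.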