The FMU results data model¶
This section describes the data model used for FMU results when exporting with fmu-dataio. For the time being, the data model is hosted as part of fmu-dataio.
The data model described herein is new and shiny, and experimental in many aspects. Any feedback on this is greatly appreciated. The most effective feedback is to apply the data model, then use the resulting metadata.
The FMU data model is described using a Pydantic model which programmatically generates a JSON Schema.
This schema contains rules and definitions for all attributes in the data model. This means, in practice, that outgoing metadata from FMU needs to comply with the schema. If data is uploaded to e.g. Sumo, validation will be done on the incoming data to ensure consistency.
About the data model¶
Why is it made?¶
FMU is a mighty system developed by and for the subsurface community in Equinor, to make reservoir modeling more efficient, less error-prone and more repeatable with higher quality, mainly through automation of cross-disciplinary workflows. It combines off-the-shelf software with in-house components such as the ERT orchestrator.
FMU is defined more and more by the data it produces, and direct and indirect dependencies on output from FMU are increasing. When FMU results started to be regularly transferred to cloud storage for direct consumption from 2017/2018 onwards, the need for stable metadata on outgoing data became imminent. Local development on Johan Sverdrup was initiated to cater for the digital ecosystem evolving in and around that particular project, and the need for generalizing became apparent with the development of Sumo, Webviz and other initiatives.
The purpose of the data model is to cater for the existing dependencies, as well as enable more direct usage of FMU results in different contexts. The secondary objective of this data model is to create a normalization layer between the components that create data and the components that use those data. The data model is designed to also be adapted to other sources of data than FMU.
Scope of this data model¶
This data model covers data produced by FMU workflows. This includes data generated by direct runs of model templates, data produced by pre-processing workflows, data produced in individual realizations or hooked workflows, and data produced by post-processing workflows.
Note
An example of a pre-processing workflow is a set of jobs modifying selected input data for later use in the FMU workflows and/or for comparison with other results in a QC context.
Note
An example of a post-processing workflow is a script that aggregates results across many realizations and/or iterations of an FMU case.
This data model covers data that, in the FMU context, can be linked to a specific case.
Note that e.g. ERT and other components will, and should, have their own data models to cater for their needs. It is not the intention of this data model to cover all aspects of data in the FMU context. The scope is primarily data going out of FMU to be used elsewhere.
A denormalized data model¶
The data model used for FMU results is a denormalized data model, at least to a certain point. This means that the static data will be repeated many times. Example: Each exported data object contains basic information about the FMU case it belongs to, such as a unique ID for this case, its name, the user that made it, which model template was used, etc. This information is stored in every exported .yml file. This may seem counterintuitive, and differs from a relational database (where this information would typically be stored once, and referred to when needed).
There are a few reasons for choosing a denormalized data model:
First, the components for creating a relational database containing these data are not in place and would be extremely difficult to implement fast. Also, the nature of data in an FMU context is very distributed, with lots of data spread across many files and folders (currently).
Second, a denormalized data model enables us to utilize search engine technologies for indexing. This is not efficient for a normalized data model. The penalty for duplicating metadata across many individual files is returned in speed and ease-of-use.
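As a minimal sketch of this trade-off, the snippet below copies the same case block into every metadata document and then filters the documents directly, without any join against a separate case table. The key names under fmu.case and data, and all values, are illustrative assumptions and are not taken from the schema.

```python
import copy

# Illustrative case block; key names and values are assumptions for this sketch.
CASE = {"uuid": "hypothetical-case-uuid", "name": "mycase", "user": {"id": "jdoe"}}


def make_metadata(name: str, content: str) -> dict:
    """Build a denormalized metadata document for one exported object."""
    return {
        "fmu": {"case": copy.deepcopy(CASE)},  # repeated in every document
        "data": {"name": name, "content": content},
    }


documents = [
    make_metadata("TopVolantis", "depth"),
    make_metadata("BaseVolantis", "depth"),
    make_metadata("geogrid", "property"),
]


def find_names(docs: list, case_name: str, content: str) -> list:
    """Filter on case and content using only what each document carries itself."""
    return [
        doc["data"]["name"]
        for doc in docs
        if doc["fmu"]["case"]["name"] == case_name
        and doc["data"]["content"] == content
    ]


depth_names = find_names(documents, "mycase", "depth")
print(depth_names)
```

Because every document is self-contained, the same filter can be handed to a search index as-is, which is the property the second reason above relies on.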
Note
The data model is only denormalized to a certain point. Most likely, it is better described as a hybrid. Example: The concept of a case is used in FMU context. In the outgoing metadata for FMU results, some information about the current case is included. However, details about the case are out of scope. For this, a consumer would have to refer to the owner of the case definition. In FMU contexts, this will be the workflow manager (ERT).
Standardized vs anarchy¶
Creating a data model for FMU results brings with it some standardization. In essence, this represents the next evolution of the existing FMU standard. We haven’t called it “FMU standard 2.0” because although this would resonate with many people, many would find it revolting. But, sure, if you are so inclined you are allowed to think of it this way. The FMU standard 1.0 is centered around folder structure and file names - a prerequisite for standardizing in the good old days when files were files, folders were folders, and data could be consumed by double-clicking. Or, by traversing the mounted file system.
With the transition to a cloud-native state comes numerous opportunities - but also great responsibilities. Some of them are visible in the data model, and the data model is in itself a testament to the most important of them: We need to get our data straight.
There are many challenges. Aligning with everyone and everything is one. We probably won’t succeed with that in the first iteration(s). Materializing metadata effectively, and without hassle, during FMU runs (meaning that everything must be fully automated) is another. This is what fmu-dataio solves. But, finding the balance between retaining flexibility and enforcing a standard is perhaps the most tricky of all.
This data model has been designed with the great flexibility of FMU in mind. If you are a geologist on an asset using FMU for something important, you need to be able to export any data from your workflow and use that data without having to wait for someone else to rebuild something. For FMU, one glove certainly does not fit all, and this has been taken into account. While the data model and the associated validation will set some requirements that you need to follow, you are still free to do more or less what you want.
We do, however, STRONGLY ENCOURAGE you to not invent too many private wheels. The risk is that your data cannot be used by others.
The materialized metadata has a nested structure which can be represented by Python dictionaries, yaml or json formats. The root level only contains key attributes, where most are nested sub-dictionaries.
Relations to other data models¶
The data model for FMU results is designed with generalization in mind. While in practice this data model covers data produced by, or in direct relation to, an FMU workflow - in theory it relates more to subsurface predictive modeling generally, than FMU specifically.
In Equinor, FMU is the primary system for creating, maintaining and using 3D predictive numerical models for the subsurface. Therefore, FMU is the main use case for this data model.
There are plenty of other data models in play in the complex world of subsurface predictive modeling. Each software applies its own data model, and in FMU this encompasses multiple different systems.
Similarly, there are other data models in the larger scope where FMU workflows represent one out of many providers/consumers of data. A significant motivation for defining this data model is to ensure consistency towards other systems and enable stable conditions for integration.
fmu-dataio has three important roles in this context:
Be a translating layer between individual softwares’ data models and the FMU results data model.
Enable fully-automated materialization of metadata during FMU runs (hundreds of thousands of files being made)
Abstract the FMU results data model through Python methods and functions, allowing them to be embedded into other systems - helping maintain a centralized definition of this data model.
The parent/child principle¶
In the FMU results data model, the traditional hierarchy of an FMU setup is not continued. An individual file produced by an FMU workflow and exported to disk can be seen in relation to a hierarchy looking something like this: case > iteration > realization > file
Many reading this will instinctively disagree with this definition, and significant confusion arises from trying to have meaningful discussions around this. There is no unified definition of this hierarchy (despite many claiming to have such a definition).
In the FMU results data model, this hierarchy is flattened down to two levels: The Parent (case) and children to that parent (files). From this, it follows that the most fundamental definition in this context is a case. To a large degree, this definition belongs to the ERT workflow manager in the FMU context. For now, however, the case definitions are extracted by-proxy from the file structure and from arguments passed to fmu-dataio.
Significant confusion can also arise from discussing the definition of a case, and the validity of this hierarchy, of course. But the consensus (albeit probably a local minimum) is that this serves the needs.
Each file produced in relation to an FMU case (meaning before, during or after) is tagged with information about the case - signalling that this entity belongs to this case. It is not the intention of the FMU results data model to maintain all information about a case, and in the future it is expected that ERT will serve case information beyond the basics.
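The two-level parent/child view can be sketched as a simple grouping: every child document points to its parent through the case information it carries. This is a hedged illustration only. It assumes the case identifier is found at fmu.case.uuid, and the identifiers and file paths are made up.

```python
from collections import defaultdict

# Hypothetical child documents; identifiers and paths are made up for this sketch.
documents = [
    {"fmu": {"case": {"uuid": "case-1"}},
     "file": {"relative_path": "realization-0/iter-0/share/results/maps/top.gri"}},
    {"fmu": {"case": {"uuid": "case-1"}},
     "file": {"relative_path": "realization-1/iter-0/share/results/maps/top.gri"}},
    {"fmu": {"case": {"uuid": "case-2"}},
     "file": {"relative_path": "realization-0/iter-0/share/results/maps/top.gri"}},
]


def group_by_case(docs: list) -> dict:
    """Map each parent (case) to the relative paths of its children (files)."""
    children = defaultdict(list)
    for doc in docs:
        case_uuid = doc["fmu"]["case"]["uuid"]  # assumed location of the case ID
        children[case_uuid].append(doc["file"]["relative_path"])
    return dict(children)


children_by_case = group_by_case(documents)
for case_uuid, paths in children_by_case.items():
    print(case_uuid, len(paths))
```

Iteration and realization do not appear as levels in this grouping; in the flattened model they are attributes of each child rather than containers.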
Note
Dot-annotation - we like it and use it. This is what it means:
The metadata structure is a dictionary-like structure, e.g.
{
    "myfirstkey": {
        "mykey": "myvalue",
        "anotherkey": "anothervalue"
    }
}
Annotating tracks along a dictionary can be tricky. With dot-annotation, we can
refer to mykey in the example above as myfirstkey.mykey. This will be a
pointer to myvalue in this case. You will see dot annotation in the
explanations of the various metadata blocks below: Now you know what it means!
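For readers who want to follow a dot-annotated key programmatically, a small helper could look like the sketch below. The function name resolve is hypothetical and is not part of fmu-dataio.

```python
from functools import reduce


def resolve(metadata: dict, dotted_key: str):
    """Follow a dot-annotated key through nested dictionaries.

    Raises KeyError if any part of the path is missing.
    """
    return reduce(lambda node, key: node[key], dotted_key.split("."), metadata)


metadata = {
    "myfirstkey": {
        "mykey": "myvalue",
        "anotherkey": "anothervalue",
    }
}

print(resolve(metadata, "myfirstkey.mykey"))
```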
Weaknesses¶
uniqueness
The data model currently has challenges with respect to ensuring uniqueness. Uniqueness is a challenge in this context, as a centralized data model cannot (and should not!) dictate in detail nor define in detail which data an FMU user should be able to export from local workflows.
understanding validation errors
When validating against the current schema, understanding the reasons for non-validation can be tricky. The root cause of this is the use of conditional logic in the schemas - a functionality JSON Schema is not designed for. See Logical rules below.
Logical rules¶
The schema contains some logical rules which are applied during validation. These are rules of type “if this, then that”. They are, however, not explicitly written (nor readable) as such directly. This type of logic is implemented in the schema by explicitly generating subschemas that A) are only valid for specific conditions, and B) contain requirements for that specific situation. In this manner, one can ensure that if a specific condition is met, the associated requirements for that condition are used.
Example:
"oneOf": [
    {
        "$comment": "Conditional schema A - 'if class == case make myproperty required'",
        "required": [
            "class",
            "myproperty"
        ],
        "properties": {
            "class": {
                "enum": ["case"]
            },
            "myproperty": {
                "type": "string",
                "example": "sometext"
            }
        }
    },
    {
        "$comment": "Conditional schema B - 'if class != case do NOT make myproperty required'",
        "properties": {
            "class": {
                "not": {
                    "enum": ["case"]
                }
            },
            "myproperty": {
                "type": "string",
                "example": "sometext"
            }
        }
    }
]
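The intent of the two subschemas in this example can be paraphrased in plain Python. This is a hand-written illustration of the "exactly one branch must accept the document" behaviour of oneOf, using hypothetical function names. It is not how fmu-dataio or a JSON Schema validator is implemented.

```python
def matches_case_branch(doc: dict) -> bool:
    """Branch A: class is 'case' and myproperty is present as a string."""
    return doc.get("class") == "case" and isinstance(doc.get("myproperty"), str)


def matches_other_branch(doc: dict) -> bool:
    """Branch B: class is not 'case'; myproperty is optional, a string if present."""
    return doc.get("class") != "case" and isinstance(doc.get("myproperty", ""), str)


def is_valid(doc: dict) -> bool:
    """oneOf semantics: exactly one branch must accept the document."""
    results = [matches_case_branch(doc), matches_other_branch(doc)]
    return results.count(True) == 1


print(is_valid({"class": "case", "myproperty": "sometext"}))
print(is_valid({"class": "case"}))
print(is_valid({"class": "surface"}))
```

When a document is rejected, a validator can only report that no branch, or more than one branch, matched. That is one reason the validation errors mentioned under Weaknesses can be hard to interpret.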
For metadata describing a case, requirements are different compared to
metadata describing data objects.
For selected contents, a content-specific block under data is required. This
is implemented for fluid_contact, field_outline and seismic.
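The content-specific rule can be sketched as a plain Python check. The sketch assumes that data.content holds the content name as a string and that the required block sits under data with the same name (for example data.fluid_contact). The function name is hypothetical.

```python
from typing import Optional

# Contents named in the text as requiring a content-specific block under data.
CONTENTS_WITH_BLOCK = {"fluid_contact", "field_outline", "seismic"}


def missing_content_block(metadata: dict) -> Optional[str]:
    """Return the dot-annotated key of a required but missing block, else None."""
    data = metadata.get("data", {})
    content = data.get("content")
    if content in CONTENTS_WITH_BLOCK and content not in data:
        return f"data.{content}"
    return None


print(missing_content_block({"data": {"content": "fluid_contact"}}))
```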
Validation of data¶
When fmu-dataio exports data from FMU workflows, it produces a pair of data + metadata. The two are
considered one entity. Data consumers who wish to validate the correct match of data and metadata can
do so by recreating file.checksum_md5 from the data object only. Metadata is not considered
when generating the checksum.
This checksum is the string representation of the hash created using RSA’s MD5 algorithm. This hash
was created from the file that fmu-dataio exported. In most cases, this is the same file that is
provided to the consumer. However, there are some exceptions:
Seismic data may be transformed to other formats when stored out of FMU context and the checksum may be invalid.
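A consumer-side check could be sketched as follows, using only the Python standard library. It assumes that file.checksum_md5 stores the hexadecimal digest of the exported file. The helper names and the throwaway file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_hexdigest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def data_matches_metadata(data_path: Path, metadata: dict) -> bool:
    """Compare the recomputed checksum with file.checksum_md5 in the metadata."""
    return md5_hexdigest(data_path) == metadata["file"]["checksum_md5"]


# Usage with a throwaway file standing in for an exported data object.
with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "surface.gri"
    data_path.write_bytes(b"example payload")
    metadata = {"file": {"checksum_md5": hashlib.md5(b"example payload").hexdigest()}}
    matches = data_matches_metadata(data_path, metadata)
    tampered = data_matches_metadata(data_path, {"file": {"checksum_md5": "0" * 32}})

print(matches, tampered)
```

Only the data file is hashed, in line with the statement above that metadata is not considered when generating the checksum.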
Changes and revisions¶
The only constant is change, as we know, and in the case of the FMU results data model - definitely so. The learning component here is huge, and there will be iterations. This poses a challenge, given that there are existing dependencies on top of this data model already, and more are arriving.
To handle this, two important concepts have been introduced.
Versioning. The current version of the FMU metadata is 0.22.0.
Contractual attributes. Within the FMU ecosystem, we need to retain the ability to do rapid changes to the data model. As we are in early days, unknowns will become knowns and unknown unknowns will become known unknowns. However, from the outside perspective some stability is required. Therefore, we have labelled some key attributes as contractual. They are listed at the top of the schema. This is not to say that they will never change - but they should not change erratically, and when we need to change them, this needs to be subject to alignment.
Schema version changelog¶
0.22.0¶
Added ‘observations_breakthrough’ standard result for Ert breakthrough observations.
Added index columns for lift curves table.
Added fmu.ensemble.description, an optional field for free-text description of ensemble.
0.21.0¶
Added observations as new content type
Added ‘observations_rft’ standard result for Ert rft observations.
Added ‘observations_summary’ standard result for Ert summary observations.
0.20.0¶
Added constraint to disallow empty list for these fields in masterdata.smda:
field
country
Added new standard result for grid_model_static
Added list of known property attributes
Added ‘source’ field in tracklog events
Added optional ‘value_statistics’ to ‘data.spec’ for grid properties and surfaces.
Removed unused ‘stratigraphic_alias’ from global configuration stratigraphy
0.19.0¶
Added new standard result for GridExtractedDepthSurfaceStandardResult
Added ‘mapping’ as a new content type
Added ‘codenames’ to ‘data.spec’ for discrete grid properties.
Added standard results for the following simulator tables:
Lift curves
Production network
Pvt
Relperm
Rft
Timeseries
Well completions
Added standard result ‘simulator_fipregions_mapping’.
0.18.0¶
Added dedicated link to SMDA stratigraphic column
Added ‘parameters’ standard result for Ert parameters.
Added ‘ert.ensemble’
Made ‘fmu.ensemble.id’ optional
0.17.0¶
Removed ‘faultroom_triangulated’ as a possible option in ‘data.Layout’, ‘triangulated’ should be used instead
0.16.1¶
New ert simulation mode ‘manual_enif_update’ added to ErtSimulationMode enum.
0.16.0¶
‘well_completions’ is added as a new content type
‘production_network’ is added as a new content type
0.15.1¶
‘data.fluid_contact.contact’ is added to $contractual
0.15.0¶
‘ObjectMetadataClass.triangulated_surface’ is removed, replaced by ‘ObjectMetadataClass.surface’
‘Layout.triangulated_surface’ is renamed to ‘Layout.triangulated’
0.14.0¶
Correct example name for realization class
Add new standard result type StructureDepthFaultSurfaceStandardResult
0.13.0¶
Content ‘fault_triangulated_surface’ renamed to ‘fault_surface’
0.12.0¶
fmu.ert.simulation_mode now supports ensemble_information_filter
0.11.0¶
data.standard_result now supports FluidContactSurfaceStandardResult
fmu.entity.uuid added as optional field
file.runpath_relative_path added as optional field
fmu.ert.experiment.id is added as contractual field
improved validation of grid numbering
improved validation of grid increments
fmu.ert.simulation_mode no longer supports iterative_ensemble_smoother
added TSurf to list of supported file formats
data.standard_result now supports StructureTimeSurfaceStandardResult
0.10.0¶
triangulated_surface added as a new object class
Ensemble objects with class=ensemble is now supported, and will in the future replace Iteration objects
fmu.context.stage now supports option ensemble
$contractual.fmu.ensemble.uuid and $contractual.fmu.ensemble.name added
fmu.ensemble added as duplicate and future replacement of fmu.iteration
data.property added as optional field for data of content property
data.property.attribute added as optional field.
data.property.is_discrete added as optional field.
data.standard_result now supports StructureDepthIsochoreStandardResult
data.standard_result now supports StructureDepthFaultLinesStandardResult
data.spec.columns added as optional field for points, polygons
data.spec.num_columns added as optional field for points, polygons
data.spec.num_rows added as optional field for points, polygons
data.spec.size added as optional field for polygons
0.9.0¶
This is the first versioned update to the schema and contains numerous changes.
$contractual.stratigraphic_alias has been removed. It was never used.
data.product: renamed to data.standard_result
data.spec.nrow must be greater or equal to 0 for cubes, surfaces
data.spec.ncol must be greater or equal to 0 for cubes, surfaces
data.spec.nlay must be greater or equal to 0 for cubes
data.spec.xinc must be greater or equal to 0 for cubes, surfaces
data.spec.yinc must be greater or equal to 0 for cubes, surfaces
data.spec.zinc must be greater or equal to 0 for cubes
data.spec.npolys must be greater or equal to 0 for polygons
data.spec.num_columns is no longer optional and must be greater or equal to 0 for tables
data.spec.num_rows is no longer optional and must be greater or equal to 0 for tables
data.spec.size must be greater or equal 0 for tables, points
data.time.t0 is no longer optional
data.time.t0.value is no longer optional
data.time.t1.value is no longer optional (data.time.t1 remains optional)
data.stratigraphic_alias has been removed
file.absolute_path_symlink has been removed
file.relative_path_symlink has been removed
fmu.aggregation.parameters has been removed
fmu.ert.experiment is no longer optional
fmu.ert.experiment.id is no longer optional
fmu.ert.simulation_mode is no longer optional
fmu.iteration.id is no longer optional and must be greater or equal to 0
fmu.realization.id must be greater or equal to 0
fmu.realization.parameters has been removed
fmu.realization.jobs has been removed
0.8.0¶
This is the initial schema version.
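Consumers that read metadata written under different schema versions may need to branch on the version. The sketch below assumes that the root-level version attribute holds the schema version as a dotted string, and uses the 0.9.0 rename of data.product to data.standard_result as its example. The helper names are hypothetical. Versions are compared as integer tuples because a plain string comparison would rank "0.22.0" below "0.9.0".

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '0.22.0' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def standard_result_block(metadata: dict):
    """Return the standard result block under its pre- or post-0.9.0 name."""
    data = metadata.get("data", {})
    if parse_version(metadata["version"]) >= (0, 9, 0):
        return data.get("standard_result")
    return data.get("product")  # name used before the 0.9.0 rename


old_doc = {"version": "0.8.0", "data": {"product": {"name": "example"}}}
new_doc = {"version": "0.22.0", "data": {"standard_result": {"name": "example"}}}

print(standard_result_block(old_doc) == standard_result_block(new_doc))
```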
Contractual attributes¶
The following attributes are contractual:
access
class
data.alias
data.bbox
data.content
data.fluid_contact.contact
data.format
data.geometry
data.grid_model
data.is_observation
data.is_prediction
data.name
data.offset
data.seismic.attribute
data.spec.columns
data.standard_result.name
data.stratigraphic
data.tagname
data.time
data.vertical_domain
file.checksum_md5
file.relative_path
file.size_bytes
fmu.aggregation.operation
fmu.aggregation.realization_ids
fmu.case
fmu.context.stage
fmu.entity.uuid
fmu.ensemble.name
fmu.ensemble.uuid
fmu.ert.experiment.id
fmu.iteration.name
fmu.iteration.uuid
fmu.model
fmu.realization.id
fmu.realization.is_reference
fmu.realization.name
fmu.realization.uuid
fmu.workflow
masterdata
source
tracklog.datetime
tracklog.event
tracklog.user.id
version
Metadata example¶
Examples of valid metadata, including a full example for a surface exported from FMU, can be found in the fmu-dataio GitHub repository.
FAQ¶
We won’t claim that these questions are really very frequently asked, but these are some key questions you may have along the way.
My existing FMU workflow does not produce any metadata. Now I am told that it has to. What do I do?
First step: Start using fmu-dataio in your workflow. You will get a lot for free by using it; amongst other things, metadata will start to appear from your workflow. To get started with fmu-dataio, see the overview section.
This data model is not what I would have chosen. How can I change it?
The FMU community (almost always) builds what the FMU community wants. The first step would be to define what you are unhappy with, preferably formulated as an issue in the fmu-dataio GitHub repository.
This data model allows me to create a smashing data visualisation component, but I fear that it is so immature that it will not be stable - will it change all the time?
Yes, and no. It is definitely experimental and these are early days. Therefore, changes will occur as learning is happening. Part of that learning comes from development of components utilizing the data model, so your feedback may contribute to evolving this data model. However, you should not expect erratic changes. The concept of Contractual attributes is introduced for this exact purpose. We have also chosen to version the metadata - partly to clearly separate from previous versions, but also to allow smooth evolution going forward. We don’t yet know exactly how this will be done in practice, but perhaps you will tell us!