Overview¶
The essential application logic, helper code, maintenance utilities, and sample
data needed to define and maintain a flexible metadata expression and
intregration framework is encapsulated in the DataCatalog
package. It is
built for use in both interactive and automated manners by data
consumers data producers alike.
Key Features¶
The Data Catalog design is based on experience with dozens of projects that need to integrate and manage large, measured data and complex analytical processes. Its data model includes:
- Topics, designs, experiments, samples, and measurement artifacts
- Reference data files
- Analysis and ETL pipelines and jobs
- Linkages with external resources
- Complex types like temperature, liquid media component, and time point
- Sophistcated parentage and derivation relationships
- Multiple data processing levels
In addition, it is designed to be easy to extend and maintain, with declarative representations of system behavior, a strong model of document history and change tracking, delegated or deferred edit authority, and detailed knowledge of record-level attribution and ownership. This is accomplished by:
- A data model that is defined and extended using only JSON schema
- Document that mantain creation, update, and revision history
- Support for document- and role-level update authorizations
- Logical isolatation of data across tenants, projects, and users
Use Cases¶
- Data Catalog code and services are used for many purposes:
- Transform and load lab-provided metadata traces into a project Data Catalog
- Capture and verify fixity for raw and processed data products
- Describe and manage ETL and processing pipelines and jobs
- Support aggregate reporting and integrity checking
- Enable data discovery amd exploration
Where to Find It¶
The datacatalog
package is installed in the sd2e/python3
and
sd2e/reactors:python3-edge
Docker images. It will soon be available by
default inside the Jupyter notebooks enviroment. You can also install it
on your own local system or embed it in projects.