Task Area 1
Summary
Managing data production is an increasingly difficult challenge, particularly in the neutron and x-ray science community, which has seen a rapid increase in both the quality and the quantity of data. The aim of TA1 is to develop tools and recommend best practices for capturing complete data, metadata and experiment-relevant information, which together form the basis for FAIR data. Achieving this entails recording experiment workflows and sample information in electronic laboratory notebooks (ELNs), saving raw and processed data in standardized formats such as NeXus, and providing links to metadata catalogues such as SciCat. Together, these measures provide a comprehensive record of experimental information, improve reproducibility and support adherence to FAIR principles.
Quality metadata is a requirement to find data
Metadata from the experiment, data storage and subsequent data processing is at the core of FAIR science. DAPHNE4NFDI has formulated recommendations for the PaN community on metadata capture for experiments at large-scale facilities <https://doi.org/10.5281/zenodo.12169109>. These include considerations for:
- Administrative, experimental and sample metadata
- Provenance of research data
- Curation, archiving and licensing for access
The presented metadata standards are built upon the output of PaNOSC and EXPaNDS <https://doi.org/10.5281/zenodo.6821676> and follow extensive discussions within DAPHNE4NFDI and the Use Case communities.
Electronic Laboratory Notebooks document experiment workflows
Electronic Laboratory Notebooks (ELNs) are a building block of FAIR research data management: they document the sample, experimental details, data collection and analysis process. DAPHNE4NFDI mapped out and published detailed specifications for the use of ELNs in photon and neutron research <https://doi.org/10.1080/08940886.2024.2432265>, including:
- Access management
- User experience
- Automated ingestion
Electronic Laboratory Notebooks are the key to understanding experiments
ELNs are the entry point to an experiment for any scientist keen to understand and reproduce it. DAPHNE4NFDI aims to develop ELNs that promote good scientific practice and enable FAIR research while being user-friendly and meeting legal and ethical standards. Building on existing ideas and developments, DAPHNE4NFDI brings promising solutions to a mature state. Facility-wide deployment requires integrating the developed software into the research data management infrastructure. The following ELN solutions are currently being developed within DAPHNE4NFDI:
- MLZ ELN <https://forge.frm2.tum.de/review/plugins/gitiles/mlz/eln/>
- Snip <https://snip.roentgen.physik.uni-goettingen.de/frontpage>
- MyLog <>
- SciLog <https://github.com/paulscherrerinstitute/scilog>
- Mediawiki <https://www.hzdr.de/db/Cms?pOid=67705&pNid=0>
Metadata capture
To ensure complete and comprehensive metadata capture at the time of the experiment, DAPHNE4NFDI is developing and promoting tools to facilitate metadata aggregation. The metadata to be aggregated includes:
- source and instrument settings,
- sample and sample environment information,
- experiment and experimentalist data (user office),
- manual inputs.
These will be ingested into a metadata aggregator according to the metadata schema, and the aggregator can automatically distribute subsets to various consumers, such as:
- a metadata catalogue,
- analysis pipelines,
- file writers,
- automated ELN entries.
Such tools are tightly integrated with the operations of the large-scale facilities, and DAPHNE4NFDI promotes common standards and solutions where possible.
By separating the metadata capture process into small and independent tasks which can be reused in different settings and at different facilities, DAPHNE4NFDI delivers individual packages (e.g. SciCat <https://www.scicatproject.org/documentation/> or ELNs) to the community.
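The aggregation flow described above can be sketched in a few lines of Python. This is a minimal illustration only, not DAPHNE4NFDI software: the class names `MetadataAggregator` and `Consumer`, the key names and the required-key set are all hypothetical, and in-memory lists stand in for a real metadata catalogue and ELN.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Metadata = Dict[str, object]


@dataclass
class Consumer:
    """A downstream system that receives only the keys it asks for."""
    name: str
    wanted_keys: Set[str]
    handler: Callable[[Metadata], None]


@dataclass
class MetadataAggregator:
    """Collects metadata from several sources and fans out subsets."""
    required_keys: Set[str]  # stands in for a (hypothetical) metadata schema
    consumers: List[Consumer] = field(default_factory=list)
    record: Metadata = field(default_factory=dict)

    def ingest(self, source: str, values: Metadata) -> None:
        # Namespace each key by its source, e.g. "sample.name".
        for key, value in values.items():
            self.record[f"{source}.{key}"] = value

    def missing_keys(self) -> Set[str]:
        # Keys the schema requires but no source has supplied yet.
        return self.required_keys - set(self.record)

    def distribute(self) -> None:
        # Refuse to distribute an incomplete record.
        missing = self.missing_keys()
        if missing:
            raise ValueError(f"incomplete metadata: {sorted(missing)}")
        for consumer in self.consumers:
            subset = {k: v for k, v in self.record.items()
                      if k in consumer.wanted_keys}
            consumer.handler(subset)


# Usage: in-memory lists stand in for real consumers (assumption).
catalogue_entries: List[Metadata] = []
eln_entries: List[Metadata] = []

aggregator = MetadataAggregator(
    required_keys={"instrument.wavelength_nm", "sample.name",
                   "user_office.proposal"},
)
aggregator.consumers.append(
    Consumer("catalogue", {"sample.name", "user_office.proposal"},
             catalogue_entries.append))
aggregator.consumers.append(
    Consumer("eln", {"sample.name", "manual.comment"}, eln_entries.append))

aggregator.ingest("instrument", {"wavelength_nm": 0.154})
aggregator.ingest("sample", {"name": "example sample"})
aggregator.ingest("user_office", {"proposal": "EXAMPLE-0001"})
aggregator.ingest("manual", {"comment": "first test run"})
aggregator.distribute()
```

Keeping each consumer as a small handler with its own key list mirrors the idea of independent, reusable tasks: a facility could swap in a different catalogue or ELN without touching the capture side.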
Standardized file formats are crucial for interoperability
The ultimate output produced by photon and neutron experiments is data. This is typically stored in files. Standardizing the formats for data and metadata files has benefits for both scientists and facilities.
DAPHNE4NFDI members are actively involved in developing tools to convert data into preferred common formats, such as NeXus/HDF5 <https://www.nexusformat.org/> and OpenPMD <https://www.openpmd.org/>. Adoption of such formats is accelerating as the large-scale facilities align their outputs along common standards.
For the scientific community, standards result in directly comparable data: files are transferable between facilities and can easily be compared and analyzed together. In addition, less software is needed to access the data, libraries can be reused, and users work with programs they already know.
The facilities benefit from suitable file compression built into the format, and features such as multiple-reader access can speed up processing. DAPHNE4NFDI encourages wide adoption of common formats across communities to drive sustainable development.
All stakeholders benefit from the ability to develop schemas that map directly into the file structure.
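As a rough illustration of a schema that maps directly into a file structure, the sketch below flattens a nested dictionary into HDF5-style paths. It uses only the Python standard library and writes no file. The `@` prefix for attributes and the helper name `flatten` are conventions invented for this example; `NXentry`, `NXsample` and `NXinstrument` are NeXus base classes, while the values are placeholders.

```python
from typing import Iterator, Optional, Tuple

# Hypothetical schema: nested dicts are groups, "@" keys are attributes,
# everything else is a dataset.
schema = {
    "entry": {
        "@NX_class": "NXentry",
        "title": "example scan",
        "sample": {
            "@NX_class": "NXsample",
            "name": "hypothetical sample",
        },
        "instrument": {
            "@NX_class": "NXinstrument",
            "name": "hypothetical beamline",
        },
    }
}


def flatten(node: dict, prefix: str = "") -> Iterator[Tuple[str, Optional[str], object]]:
    """Yield (path, attribute_name_or_None, value) for every leaf."""
    for key, value in node.items():
        if key.startswith("@"):
            # Attribute attached to the group at `prefix` (root if empty).
            yield (prefix or "/", key[1:], value)
        elif isinstance(value, dict):
            # A nested dict becomes a group one level deeper.
            yield from flatten(value, f"{prefix}/{key}")
        else:
            # A plain value becomes a dataset at this path.
            yield (f"{prefix}/{key}", None, value)


layout = list(flatten(schema))
for path, attribute, value in layout:
    label = f"{path}@{attribute}" if attribute else path
    print(f"{label} = {value!r}")
```

With a library such as h5py, each yielded path could then be written as a group attribute or dataset, so the schema and the file layout stay in step.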
Persistent Sample Identifiers: The Birth Certificate for Samples
TA1 has made significant advancements in the implementation and use of Persistent Identifiers (PIDs) for samples to enhance data management in photon and neutron science. Sample PIDs, e.g. International Generic Sample Numbers (IGSN) <https://doi.org/10.60578/ceuz-rq0x>, ensure unique and persistent identification for physical samples. By linking these identifiers to experimental data, metadata catalog entries, and derived publications, DAPHNE4NFDI promotes improved data Findability, Reuse, and Repeatability of results for PaN research <link to TA2>. Key initiatives in this area include:
- Incorporation of PIDs in metadata standards and in establishing workflows
- Fostering dedicated sample database systems
- Unified sample description schemes
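The linking idea can be sketched as a small registry that ties one sample identifier to datasets, catalogue entries and publications. This is an assumption-laden toy, not an IGSN client or a DAPHNE4NFDI tool: `SampleRegistry`, the link kinds and every identifier below are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SampleRegistry:
    """Toy registry: one PID per sample, with typed links to related records."""
    descriptions: Dict[str, str] = field(default_factory=dict)
    links: Dict[str, Dict[str, List[str]]] = field(default_factory=dict)

    def register(self, pid: str, description: str) -> None:
        # A persistent identifier must be unique, so reject duplicates.
        if pid in self.descriptions:
            raise ValueError(f"PID already registered: {pid}")
        self.descriptions[pid] = description
        self.links[pid] = {}

    def link(self, pid: str, kind: str, target: str) -> None:
        # Only registered samples can be linked to other records.
        if pid not in self.descriptions:
            raise KeyError(f"unknown sample PID: {pid}")
        self.links[pid].setdefault(kind, []).append(target)

    def resolve(self, pid: str) -> Dict[str, List[str]]:
        # Everything known to be connected to this sample, grouped by kind.
        return self.links[pid]


# Usage with placeholder identifiers (not real PIDs, paths or DOIs).
registry = SampleRegistry()
PID = "example-pid/sample-0001"
registry.register(PID, "hypothetical thin-film sample")
registry.link(PID, "dataset", "raw/scan_0001.nxs")
registry.link(PID, "dataset", "raw/scan_0002.nxs")
registry.link(PID, "catalogue_entry", "catalogue/entry/42")
registry.link(PID, "publication", "doi:placeholder")

related = registry.resolve(PID)
```

Starting from the sample identifier, `related` lists every dataset, catalogue entry and publication tied to that sample, which is the lookup a reader needs to trace results back to the physical specimen.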