SciCat: The key to efficient metadata management in photon and neutron research
The need for a metadata catalogue has become increasingly apparent, with one main task: finding data. One solution is SciCat, an open-source scientific metadata catalogue tailored to the needs of photon and neutron research. It supports compliance with the FAIR principles, provides a ready path for metadata publication, and is flexible enough to integrate easily with other infrastructures. In 2021, DAPHNE4NFDI recommended SciCat for photon and neutron facilities that had no existing metadata catalogue. As a result, many German facilities, such as DESY, MLZ, KIT and HZDR, are in advanced stages of setting up SciCat for their users.
SciCat at DESY
A scalable Kubernetes cluster was set up, on which SciCat is deployed in test mode and collects metadata at dedicated beamlines. SciCat was integrated into the authentication and authorisation system using the DESY registry and combined with the Helmholtz AAI login, enabling federated login. Initial experience has been gained at the FLASH and PETRA III beamlines (DESY and Hereon), where metadata from experiments is regularly ingested. In 2023, efforts focused on software migration within SciCat, standardising scientific metadata, and understanding user needs. In 2024, the focus shifted to DOI minting, ensuring a performant and reliable service, and maintaining valuable contact with users. DAPHNE4NFDI funding has accelerated SciCat development, addressing urgent issues and fostering collaboration within the SciCat project and among DAPHNE4NFDI members.
SciCat at MLZ
Starting in Q2/2023, two Kubernetes clusters based on Red Hat OpenShift, one for development and one for production, have been provisioned to orchestrate container workloads. SciCat instances are deployed using a GitOps approach, which allows reliable operation of the production catalogue alongside further development on separate catalogue instances. A loosely coupled data pipeline between the MLZ instruments and the clusters has been set up. It facilitates the implementation of secondary user services, such as the MLZ SciCat ingestor. This service populates the SciCat catalogue by aggregating metadata from three sources: proposal information from the user office system GhOST, instrument and sample environment parameters captured automatically by the instrument control software NICOS, and sample information from the sample registration system at MLZ. A full demonstrator has been implemented with single sign-on based on Keycloak (AAI). It starts at the proposal in the GhOST system and uses NICOS with virtual instruments to stream messages to the cluster, where the MLZ SciCat ingestor processes those messages and finally populates the SciCat development instance. In 2025, the MLZ agenda includes fine-tuning the aforementioned services towards a production deployment of SciCat before user operation resumes, as well as DOI/PID minting.
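The aggregation step performed by an ingestor of this kind can be illustrated with a minimal Python sketch. It is not the MLZ SciCat ingestor and does not use the SciCat, GhOST or NICOS interfaces; the record type, the field names and the key prefixes are all hypothetical and chosen only to show how metadata from three sources might be merged into one catalogue entry.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DatasetRecord:
    """Hypothetical catalogue entry; field names are illustrative only."""
    proposal_id: str
    title: str
    sample_id: str
    scientific_metadata: dict[str, Any] = field(default_factory=dict)


def build_record(proposal: dict, instrument: dict, sample: dict) -> DatasetRecord:
    """Merge proposal, instrument and sample metadata into one record."""
    # Prefix each key with its origin so that instrument and sample
    # parameters with the same name cannot overwrite each other.
    merged = {f"instrument.{key}": value for key, value in instrument.items()}
    merged.update(
        {f"sample.{key}": value for key, value in sample.items() if key != "id"}
    )
    return DatasetRecord(
        proposal_id=proposal["id"],
        title=proposal["title"],
        sample_id=sample["id"],
        scientific_metadata=merged,
    )


if __name__ == "__main__":
    # Example input values are invented for illustration.
    record = build_record(
        proposal={"id": "P-1", "title": "Demo proposal"},
        instrument={"wavelength": 4.5},
        sample={"id": "S-9", "formula": "Fe2O3"},
    )
    print(record)
```

Keeping the merge in a pure function, separate from message transport and from the catalogue upload, is one way to let such a step be tested against virtual instruments before a production deployment.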
SciCat PaN Reflectometry database
DESY provides a SciCat catalogue for open data. One example is the PaN Reflectometry data published at public-data.desy.de, which compiles reflectometry data and metadata as references for high-quality datasets and AI-driven analyses. Researchers are invited to upload their data via sisyphos.desy.de; curated entries will then be made available under the CC-BY license. Open instances of SciCat are also running at HZDR, …
Sample Data catalogues
Metadata management also involves persistent identification of samples and linking related resources, which are crucial for maintaining a healthy research data environment. International Generic Sample Numbers (IGSNs), based on DOI infrastructure, improve sample data discoverability and connect resources like ELNs and data catalogues.
DAPHNE4NFDI supports efforts at several institutions to implement IGSNs as a key element of research data management workflows. Kiel University hosts a service for creating and registering IGSNs for university members, allowing researchers to record samples easily. This service has been used in FAIR publications, such as the Bio SAX use case.