
SciOp & NWB-LinkML

Resilient Data Infrastructure in an Era of Informational Crisis

Authors
Affiliations
University of California, Los Angeles
Safeguarding Research and Culture
Institute of Pirate Technology
Safeguarding Research and Culture
University of California, Los Angeles
University of California, Los Angeles
Safeguarding Research and Culture
Safeguarding Research and Culture
Scientific Data Division, Lawrence Berkeley National Laboratory
Safeguarding Research and Culture
The University of Tokyo
Humboldt-Universität zu Berlin
Safeguarding Research and Culture
European Molecular Biology Laboratory (EMBL)
University of Southampton
Safeguarding Research and Culture
University of California, Los Angeles

Abstract

Our archives are too often single points of failure that can disappear in one lapsed funding cycle. This year has shown that this fragility is anything but theoretical, with foundational health, climate, and neuroscientific research vanishing from archives overnight. Traditional archives also limit our ambitions for “open data” as a living, collaborative process: instead, data is overwhelmingly used once and stockpiled, frozen in time at the “end” of an experiment. The moment demands that we rearchitect our informational systems in every domain of research, from the benchtop to the library.

We will present our work across two of those domains:

  1. as sciop: we are responding to the immediate crisis with hybrid peer-to-peer/federated archives. With thousands of participants contributing whatever they can, sciop is a different kind of archive that is rapidly eclipsing the scale and flexibility of our largest data repositories. We will show ongoing work that extends and bridges the peer-to-peer BitTorrent and federated ActivityPub protocols with the arbitrary metadata of the Semantic Web to make a new kind of distributed archive without single points of failure.
  2. as nwb-linkml: we are redesigning neuroscientific data formats for distributed, fluid use spanning experimentation and archives. We use linkml to unbundle the typically vertically integrated data standards stack, modularizing schemas, implementations, and formats, and giving a clear path towards interoperability between diverse biomedical data formats. We will show work that allows NWB to be stored in flexible, lightweight metadata files with arbitrary backends for arrays, rendering NWB as a chunked, content-addressed format suitable for a new generation of metadata-enriched p2p protocols.
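
To make "chunked, content-addressed" concrete, here is a minimal sketch of the general idea. It is not the nwb-linkml or sciop implementation. It assumes fixed-size chunks addressed by SHA-256, a plain dictionary standing in for an array backend, and a small JSON document standing in for a lightweight metadata file. The names `chunk_and_address`, `reassemble`, `CHUNK_SIZE`, and `example_timeseries` are hypothetical and chosen only for illustration.

```python
import hashlib
import json

CHUNK_SIZE = 4  # bytes per chunk; unrealistically small, for illustration only


def chunk_and_address(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks, each keyed by its SHA-256 digest."""
    store = {}     # hypothetical array backend: digest -> chunk bytes
    manifest = []  # ordered digests, enough to rebuild the original bytes
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store[digest] = chunk
        manifest.append(digest)
    return manifest, store


def reassemble(manifest, store) -> bytes:
    """Rebuild the data, checking every chunk against its address."""
    parts = []
    for digest in manifest:
        chunk = store[digest]
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError(f"chunk {digest} failed verification")
        parts.append(chunk)
    return b"".join(parts)


# Stand-in for raw array bytes from an experiment.
samples = bytes(range(10))
manifest, store = chunk_and_address(samples)

# The metadata file carries only names and chunk addresses, not the array itself.
metadata = json.dumps({"name": "example_timeseries", "chunks": manifest})

# Any peer holding the chunks can rebuild and verify the array from the metadata.
restored = reassemble(json.loads(metadata)["chunks"], store)
```

Because each chunk is named by its own hash, the metadata can travel separately from the arrays, and chunks fetched from any untrusted peer can be verified on arrival. BitTorrent verifies pieces by hash in a similar way.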

Together, these two pieces hint at how reimagining archives and data formats can dissolve basic problems in research data management. There is no need for “conversion” to a format when the format supports fluid schemas that can evolve along with the experiment while maintaining rigor. There is no need for a separate archival step when your data is shared from the moment it’s acquired in federated archives that span scales from labs to disciplines. There is no risk of our work disappearing when we rebuild our data infrastructures to reflect the social nature of data production, where communication and distribution are built into every stage of the research lifecycle.

At SfN

Poster

Alt Text Forthcoming, the sciop/nwb/mio poster

Sciop

NWB-LinkML

Standards

Specifications for extensions to the ActivityPub protocol and ActivityStreams vocabulary