SciOp & NWB-LinkML
Resilient Data Infrastructure in an Era of Informational Crisis
Abstract
Our archives are too often single points of failure that can disappear in one lapsed funding cycle. This year has shown that fragility is anything but theoretical, with foundational health, climate, and neuroscientific research vanishing from archives overnight. Traditional archives also limit our ambitions for “open data” as a living, collaborative process: instead, data is overwhelmingly used once and stockpiled, frozen in time at the “end” of an experiment. The moment demands that we rearchitect our informational systems in every domain of research, from the benchtop to the library.
We will present our work across two of those domains:
- as sciop: we are responding to immediate crisis with hybrid peer-to-peer/federated archives. With thousands of participants contributing whatever they can, sciop is a different kind of archive that is rapidly eclipsing the scale and flexibility of our largest data repositories. We will show ongoing work extending and bridging the peer-to-peer BitTorrent and federated ActivityPub protocols with the arbitrary metadata of the Semantic Web to make a new kind of distributed archive without single points of failure.
- as nwb-linkml: we are redesigning neuroscientific data formats for distributed, fluid use spanning experimentation and archives. We use linkml to unbundle the typically vertically-integrated data standards stack, modularizing schemas, implementations, and formats, giving a clear path towards interoperability between diverse biomedical data formats. We will show work that allows NWB to be stored in flexible, lightweight metadata files with arbitrary backends for arrays, rendering NWB as a chunked, content-addressed format suitable for a new generation of metadata-enriched p2p protocols.
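The idea of a chunked, content-addressed format paired with a lightweight metadata file can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the nwb-linkml implementation: the function names, the metadata field names, and the in-memory `store` dictionary standing in for an "arbitrary backend" are all assumptions made for the example.

```python
import hashlib
import json
import struct


def chunk_and_address(data: bytes, chunk_size: int) -> tuple[list[str], dict[str, bytes]]:
    """Split raw bytes into fixed-size chunks, each keyed by its SHA-256 digest."""
    order: list[str] = []          # digests in the order needed to rebuild the data
    store: dict[str, bytes] = {}   # hypothetical backend: digest -> chunk bytes
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        order.append(digest)
        store[digest] = chunk      # identical chunks are stored only once
    return order, store


def reassemble(order: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild the data, verifying every chunk against its address."""
    parts = []
    for digest in order:
        chunk = store[digest]
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError(f"chunk {digest} failed verification")
        parts.append(chunk)
    return b"".join(parts)


# Hypothetical array: 1024 float64 samples with a repeating pattern.
samples = [float(i % 16) for i in range(1024)]
raw = struct.pack(f"<{len(samples)}d", *samples)
order, store = chunk_and_address(raw, chunk_size=128)

# The metadata file stays small: it only describes the array and lists chunk addresses.
metadata = {
    "name": "example_timeseries",  # illustrative field names, not an NWB schema
    "dtype": "float64",
    "shape": [len(samples)],
    "chunk_size": 128,
    "chunks": order,
}
metadata_text = json.dumps(metadata, indent=2)

# Anyone holding the metadata can fetch chunks from any backend and verify them.
restored = reassemble(json.loads(metadata_text)["chunks"], store)
```

Because chunks are addressed by their content, the repeated pattern in this example collapses to a single stored chunk, and a reader can verify each chunk independently of where it was fetched from.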
Together, these two pieces hint at how reimagining archives and data formats can dissolve basic problems in research data management. There is no need for “conversion” to a format when the format supports fluid schemas that can evolve along with the experiment while maintaining rigor. There is no need for a separate archival step when your data is shared from the moment it’s acquired in federated archives that span scales from labs to disciplines. There is no risk of our work disappearing when we rebuild our data infrastructures to reflect the social nature of data production, where communication and distribution are built into every stage of the research lifecycle.
At SfN
- Meeting planner: https://www.abstractsonline.com/pp8/#!/21171/presentation/42182
- Time: Monday, November 17th, 1PM-5PM
- Location: SDCC Halls B-H, PSTR254.08 / ZZ14
Poster
Links
Sciop
- sciop
- sciop-cli: https://codeberg.org/safeguarding/sciop-cli
- sciop-scraping: https://codeberg.org/safeguarding/sciop-scraping
- torrent-models
- fastapi-activitypub: https://github.com/p2p-ld/fastapi-activitypub
(Note that at the time of presentation, this package has barely been started; we are finalizing a few features on sciop and will then get cracking on it.)
NWB-LinkML
- nwb-linkml: https://github.com/p2p-ld/nwb-linkml
- numpydantic
Miniscope-Related
Standards
Specifications for extensions to the ActivityPub protocol and ActivityStreams vocabulary:
- FEP-d8c8: Torrent types - https://fediverse.codeberg.page/fep/fep/d8c8/
- FEP-1580: Moving objects - https://fediverse.codeberg.page/fep/fep/1580/
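
As a rough illustration of what bridging BitTorrent and ActivityPub could involve, the sketch below computes a BitTorrent v1 infohash (the SHA-1 of the bencoded `info` dictionary) for a small payload and wraps it in an ActivityStreams `Create` activity. The `Torrent` type name and the `infohash` property are assumptions made for the example and are not taken from the text of FEP-d8c8; the actor URL and file name are placeholders.

```python
import hashlib
import json


def bencode(value) -> bytes:
    """Minimal bencoder covering the types needed for a torrent info dict."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Bencoded dictionaries must have keys sorted as raw byte strings.
        encoded = {(k.encode("utf-8") if isinstance(k, str) else k): v
                   for k, v in value.items()}
        body = b"".join(bencode(k) + bencode(encoded[k]) for k in sorted(encoded))
        return b"d" + body + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_info(name: str, payload: bytes, piece_length: int) -> dict:
    """Build a single-file v1 info dict: SHA-1 per piece, concatenated."""
    pieces = b"".join(
        hashlib.sha1(payload[i:i + piece_length]).digest()
        for i in range(0, len(payload), piece_length)
    )
    return {"name": name, "length": len(payload),
            "piece length": piece_length, "pieces": pieces}


payload = b"\x00" * 40000                       # placeholder dataset bytes
info = make_info("example.bin", payload, piece_length=16384)
infohash = hashlib.sha1(bencode(info)).hexdigest()

actor = "https://example.org/users/archivist"  # placeholder actor
torrent_object = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Torrent",        # assumed type name, see the FEP for the real vocabulary
    "name": info["name"],
    "infohash": infohash,     # hypothetical property name
    "attributedTo": actor,
}
activity = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Create",
    "actor": actor,
    "object": torrent_object,
}
activity_json = json.dumps(activity, indent=2)
```

The point of the sketch is the division of labor: the infohash pins down exactly which bytes are being shared, while the surrounding ActivityStreams object carries the human- and machine-readable metadata that federated servers can pass along.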
