Dataset
Business usage
A dataset is the primary unit of publication and citation. It carries its own metadata (public or private) and a list of Arbogen IDs, which act as pointers to sequence records rather than embedding the sequences themselves.
Datasets can be shared across multiple Working Groups and, once stable, can be assigned a DOI for citation.
Visibility:
Datasets are public by design: both the dataset and the Arbogen IDs it references are discoverable. An Arbogen ID is only a pointer, however, and does not itself expose FASTA data or sensitive metadata.
- If a sequence is public, it can be accessed or downloaded via its Arbogen ID.
- If a sequence is private, its ID may appear in a dataset, but the sequence and metadata remain inaccessible unless explicitly shared by the owner (see Data Sharing).
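The visibility rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the platform's implementation: the `SequenceRecord` class, the `shared_with` set, the function names, and the `ARB-0001`-style IDs are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SequenceRecord:
    """Hypothetical stand-in for a sequence record behind an Arbogen ID."""
    arbogen_id: str
    owner: str
    is_public: bool
    # Assumption: explicit sharing is modelled as a set of user names.
    shared_with: set[str] = field(default_factory=set)


def can_access_sequence(record: SequenceRecord, user: str) -> bool:
    """Return True if `user` may read the sequence and its metadata."""
    if record.is_public:
        return True
    # Private: only the owner, or users the owner explicitly shared with.
    return user == record.owner or user in record.shared_with


def resolve_dataset(ids: list[str],
                    registry: dict[str, SequenceRecord],
                    user: str) -> dict[str, SequenceRecord | None]:
    """Every ID stays visible; the record is returned only when accessible."""
    return {
        arbogen_id: (registry[arbogen_id]
                     if can_access_sequence(registry[arbogen_id], user)
                     else None)
        for arbogen_id in ids
    }
```

In this sketch a user without access still sees every ID in the dataset, but a private ID resolves to `None` instead of a record, which mirrors the "ID is discoverable, content is not" behavior described above.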
Technical behavior
- Versioning: each validated modification (addition/removal) creates a new version of the dataset, linked to its predecessor.
- Provenance & traceability: because each version links to its predecessor, the full history of a dataset can be traced back through earlier versions.
- Access control: public visibility includes only Arbogen IDs, creation date, and description.
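The versioning behavior can be pictured as a chain of immutable versions, each pointing at the one before it. The sketch below is hypothetical: `DatasetVersion`, `apply_change`, and `history` are illustrative names, and it assumes that a change which leaves the ID list unchanged does not produce a new version.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class DatasetVersion:
    """One immutable version of a dataset (hypothetical model)."""
    number: int
    arbogen_ids: frozenset[str]
    predecessor: Optional[DatasetVersion] = None


def apply_change(current: DatasetVersion,
                 add: Iterable[str] = (),
                 remove: Iterable[str] = ()) -> DatasetVersion:
    """Return a new version linked to `current`, or `current` if nothing changed."""
    new_ids = (current.arbogen_ids | frozenset(add)) - frozenset(remove)
    if new_ids == current.arbogen_ids:
        return current  # assumption: a no-op is not a validated modification
    return DatasetVersion(current.number + 1, new_ids, predecessor=current)


def history(version: DatasetVersion) -> list[int]:
    """Walk the predecessor links from the given version back to the first."""
    numbers = []
    node: Optional[DatasetVersion] = version
    while node is not None:
        numbers.append(node.number)
        node = node.predecessor
    return numbers
```

For example, starting from an empty version 1, adding two IDs and then removing one yields version 3, and `history` walks the links back through versions 3, 2, and 1.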
Categories:
- Mine: Datasets created by you.
- Shared: Datasets shared with you via Working Groups.
- Reference: Platform reference datasets.
- All: Registry of all datasets on the platform.
Actions:
- Share: Share a dataset with a Working Group.
- Download:
  - Arbogen IDs (CSV)
  - Metadata (CSV)
  - FASTA sequences
  - Full export packages
- Open permalink: Open a dataset detail page via its permalink.
- Delete: Only the Owner can delete a dataset, and only if it does not have a DOI. Deletion removes it from all Working Groups.
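The deletion rule combines two checks (ownership and absence of a DOI) with one side effect (removal from all Working Groups). A minimal sketch follows, assuming a hypothetical `Dataset` class, an in-memory `registry` dictionary keyed by dataset name, and a placeholder DOI string; none of these names come from the platform itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dataset:
    """Hypothetical dataset model with only the fields the rule needs."""
    name: str
    owner: str
    doi: Optional[str] = None
    working_groups: set[str] = field(default_factory=set)


def delete_dataset(dataset: Dataset, user: str,
                   registry: dict[str, Dataset]) -> None:
    """Delete `dataset` if `user` is its owner and it has no DOI."""
    if user != dataset.owner:
        raise PermissionError("only the Owner can delete a dataset")
    if dataset.doi is not None:
        raise ValueError("a dataset with a DOI cannot be deleted")
    # Deletion removes the dataset from every Working Group it was shared with.
    dataset.working_groups.clear()
    registry.pop(dataset.name, None)
```

Checking ownership before the DOI keeps the error a non-owner sees independent of the dataset's citation status; that ordering is a design choice of this sketch, not documented platform behavior.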
Creation:
- Click Create Dataset.
- Fill in:
  - Name
  - Public Description
  - Private Description
- A newly created dataset is empty. Use Manage Data to add sequences by Arbogen ID, then create a DOI when appropriate.