Dataset

Business usage

A dataset is the primary unit of publication and citation. It carries its own metadata (public or private) and a list of Arbogen IDs, which act as pointers to sequence records rather than embedding the sequences themselves.
Datasets can be shared across multiple Working Groups and, once stable, can obtain a DOI for citation.

Visibility:
Datasets are public by design: the dataset and the Arbogen IDs it references are discoverable. However, an Arbogen ID only points to a sequence; it does not carry FASTA data or sensitive metadata.

  • If a sequence is public, it can be accessed or downloaded via its Arbogen ID.
  • If a sequence is private, its ID may appear in a dataset, but the sequence and metadata remain inaccessible unless explicitly shared by the owner (see Data Sharing).
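The visibility rules above can be sketched in a few lines of Python. This is an illustrative model only, not the platform's implementation: `SequenceRecord`, `can_access`, and `resolve` are hypothetical names, and treating "explicitly shared by the owner" as a set of user names is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SequenceRecord:
    # Hypothetical record model; field names are assumptions for illustration.
    arbogen_id: str
    owner: str
    is_public: bool
    fasta: str
    shared_with: frozenset = frozenset()  # users the owner has explicitly shared with


def can_access(record: SequenceRecord, user: str) -> bool:
    """A sequence is readable if it is public, owned by the user, or shared with them."""
    return record.is_public or user == record.owner or user in record.shared_with


def resolve(dataset_ids, records, user):
    """Map every Arbogen ID in a dataset to its FASTA text, or None if inaccessible.

    The ID itself is always listed (datasets are discoverable); only the
    sequence content is withheld for private records.
    """
    return {
        aid: (records[aid].fasta if can_access(records[aid], user) else None)
        for aid in dataset_ids
    }
```

In this sketch a private ID still appears in the result, but its value is `None` for anyone other than the owner or a user the owner has shared it with.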

Technical behavior

  • Versioning: each validated modification (addition/removal) creates a new version of the dataset, linked to its predecessor.
  • Provenance & traceability: because every version links to its predecessor, the full history of changes to a dataset can be followed back to the first version.
  • Access control: public visibility includes only Arbogen IDs, creation date, and the public description.
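The versioning behavior described above resembles an immutable linked chain. The following is a minimal sketch under that assumption; `DatasetVersion`, `apply_change`, and `history` are hypothetical names, and the platform's actual storage model is not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetVersion:
    # Hypothetical model: each version is immutable and points at its predecessor.
    number: int
    arbogen_ids: frozenset
    previous: Optional["DatasetVersion"] = None


def apply_change(current, added=(), removed=()):
    """Return a new version linked to `current`; the old version is left untouched."""
    new_ids = (current.arbogen_ids | frozenset(added)) - frozenset(removed)
    return DatasetVersion(current.number + 1, new_ids, previous=current)


def history(version):
    """Walk the predecessor links and return version numbers, newest first."""
    numbers = []
    while version is not None:
        numbers.append(version.number)
        version = version.previous
    return numbers
```

Because earlier versions are never modified in this model, any version a reader has cited keeps pointing at exactly the same set of Arbogen IDs.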

Categories:

  • Mine: Datasets created by you.
  • Shared: Datasets shared with you via Working Groups.
  • Reference: Platform reference datasets.
  • All: Registry of all datasets on the platform.

Actions:

  • Share: Share a dataset with a Working Group.
  • Download:
    • Arbogen IDs (CSV)
    • Metadata (CSV)
    • FASTA sequences
    • Full export packages
  • Open permalink: Open a dataset detail page via its permalink.
  • Delete: Only the Owner can delete a dataset, and only if it does not have a DOI. Deletion removes it from all Working Groups.
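The deletion rule can be expressed as two guard checks followed by a cleanup step. This is a hedged sketch, not the platform's code: `delete_dataset`, the dictionary keys, and the representation of Working Groups as sets of dataset IDs are all assumptions made for illustration.

```python
def delete_dataset(dataset, user, working_groups):
    """Delete a dataset if the rules allow it.

    dataset: dict with hypothetical keys "id", "owner", and optional "doi".
    working_groups: dict mapping a group name to the set of dataset IDs it holds.
    """
    # Rule 1: only the Owner may delete.
    if user != dataset["owner"]:
        raise PermissionError("only the Owner can delete a dataset")
    # Rule 2: a dataset with a DOI is citable and cannot be deleted.
    if dataset.get("doi"):
        raise ValueError("a dataset with a DOI cannot be deleted")
    # Deletion removes the dataset from every Working Group it was shared with.
    for dataset_ids in working_groups.values():
        dataset_ids.discard(dataset["id"])
```

Checking both conditions before touching any Working Group means a refused deletion leaves every group unchanged.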

Creation:

  1. Click Create Dataset.
  2. Fill in:
    • Name
    • Public Description
    • Private Description
  3. A newly created dataset is empty. Use Manage Data to add sequences by Arbogen ID, then create a DOI when appropriate.