Sequence

Business usage

A sequence is the core biological record managed in Arbogen.
It represents genetic information that can be imported, validated, enriched with metadata, and associated with a dataset.
Sequences are the building blocks of research projects and publications, and can be shared within Working Groups for collaborative analysis.

To contribute new sequences to the platform, users must import them through the dedicated workflow described in the Import Process Description.
Sequences cannot be added manually: all contributions must follow this standardized pipeline to ensure validation, quality control, and traceability.

Once deposited, sequences can be:

shared within Working Groups, enabling collaborative review and curation;
referenced inside datasets, through their Arbogen ID only (see Dataset).

Important: datasets include only the identifiers of sequences.
They embed neither the genetic material itself nor the associated metadata.
Access to a sequence depends strictly on its sharing status and the rights of the user viewing the dataset (see Data Sharing).
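
The reference-only model can be pictured with a minimal sketch. It is an illustration under stated assumptions, not Arbogen's actual implementation: the class names, the three sharing values, and the can_view rule are hypothetical. Only the list of basic identifiers is taken from the Technical behavior section. The point shown is that a dataset holds IDs only, and that each ID is resolved at read time against the rights of the person viewing it.

```python
from dataclasses import dataclass, field
from enum import Enum

# Fields this document lists as accessible to all users.
BASIC_FIELDS = ("Sample ID", "Infection Loc Admin0", "Sampling Date",
                "Serotype", "Genotype", "Lineage", "Virus Species")


class Sharing(Enum):
    # Hypothetical status values; the platform's real statuses may differ.
    PRIVATE = "private"
    WORKING_GROUP = "working_group"
    PUBLIC = "public"


@dataclass
class Sequence:
    arbogen_id: str
    owner: str
    sharing: Sharing
    residues: str                                    # the genetic material
    metadata: dict = field(default_factory=dict)
    shared_groups: set = field(default_factory=set)


@dataclass
class User:
    name: str
    groups: set = field(default_factory=set)


@dataclass
class Dataset:
    name: str
    sequence_ids: list                               # identifiers only


def can_view(user, seq):
    """Assumed rule: owners and public sequences always pass;
    group-shared sequences need a common Working Group."""
    if seq.owner == user.name or seq.sharing is Sharing.PUBLIC:
        return True
    if seq.sharing is Sharing.WORKING_GROUP:
        return bool(user.groups & seq.shared_groups)
    return False


def view(user, seq):
    """Full record for authorized viewers, basic identifiers otherwise."""
    if can_view(user, seq):
        return {"arbogen_id": seq.arbogen_id, "residues": seq.residues,
                "metadata": dict(seq.metadata)}
    basic = {k: v for k, v in seq.metadata.items() if k in BASIC_FIELDS}
    return {"arbogen_id": seq.arbogen_id, "metadata": basic}


def resolve(dataset, store, user):
    """Look up each referenced ID at read time and apply the viewer's rights."""
    return [view(user, store[i]) for i in dataset.sequence_ids if i in store]
```

Because the dataset never copies a record, a later change to a sequence's sharing status takes effect the next time the dataset is opened.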

Technical behavior

Validation: automatic checks (format, duplicates, taxonomy, quality) and complementary analyses (serotyping, genotyping) run before a sequence is accepted.
Provenance & traceability: each sequence links to its import files, processing logs, and sharing status.
Access control: a sequence's visibility depends on its sharing status; the basic identifiers (Sample ID, Infection Loc Admin0, Sampling Date, Serotype, Genotype, Lineage, Virus Species) remain accessible to all users.
Editing metadata after import: once a sequence has been successfully imported, some of its associated metadata can be corrected directly within the platform.
To do this, open the Data Grid in the Explore Data section and click the Pencil icon (edit) on the row corresponding to your sequence.
This allows you to update editable metadata fields and re-run validation before saving the changes.
See the detailed interface description in the Data Grid & Table Mode section.
Nomenclature: the genetic sequence and its metadata share the same status and are treated as one entity.
Referencing in datasets: datasets store only Arbogen IDs, not the sequence or metadata themselves (see Dataset).
Automatic external publication: any private sequence deposited on the platform is automatically published to GenBank after 2 years unless the depositor has already made it public or deleted it (when allowed).
Users may also force immediate GenBank publication from the interface if they wish to make their sequence available earlier.
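
The publication rule can be expressed as a small date calculation. This is a sketch under assumptions, not the platform's code: the Deposit fields and function names are hypothetical, and the two-year period is assumed to be counted from the deposit date, which the text implies but does not state outright.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

EMBARGO_YEARS = 2  # from the text: private sequences are published after 2 years


@dataclass
class Deposit:
    deposited_on: date
    is_public: bool = False
    is_deleted: bool = False
    forced_on: Optional[date] = None  # set when the depositor forces publication


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def genbank_due_date(dep: Deposit) -> Optional[date]:
    """Date on which this sketch would publish to GenBank, or None."""
    if dep.is_deleted:
        return None                   # nothing left to publish
    if dep.forced_on is not None:
        return dep.forced_on          # depositor chose immediate publication
    if dep.is_public:
        return None                   # already public, no automatic step
    return add_years(dep.deposited_on, EMBARGO_YEARS)
```

Under these assumptions, a private sequence deposited on 10 May 2023 and left untouched would be due for automatic publication on 10 May 2025.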