Database Configuration
LAPIS and SILO need a database_config.yaml.
Its main purpose is to define the database schema for the sequence metadata.
See the tutorial for an example,
or use our config generator to generate your own config.
More examples can be found in our tests.
The database config is considered static configuration that doesn’t change with data updates. This page contains the technical specification of the database config.
The Schema Object
The database_config.yaml must contain a schema object at the top level.
It permits the following fields:
Key | Type | Required | Description |
---|---|---|---|
instanceName | string | true | The name assigned to the instance. Only used for display purposes. |
metadata | array | true | A list of metadata objects describing the fields that are available on the underlying sequence data. |
opennessLevel | enum | true | Possible values: OPEN. To be extended in the future. |
primaryKey | string | true | The field that serves as the primary key in SILO for the data. |
dateToSortBy | string | false | The field used to sort the data by date. Queries on this column will be faster. |
partitionBy | string | false | The field used to partition the data. Used by SILO for overall query optimization. |
features | array | false | A list of feature objects. |
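
As an illustration of the table above, a complete config might look like the following sketch. The keys come from this specification. The instance name and the field names (accession, collection_date, country, lineage) are hypothetical, and which fields to use for dateToSortBy and partitionBy is an assumption made for this example.

```yaml
# Sketch of a database_config.yaml; all field names are hypothetical.
schema:
  instanceName: Example instance        # only used for display purposes
  opennessLevel: OPEN                   # currently the only possible value
  metadata:
    - name: accession
      type: string
    - name: collection_date
      type: date
    - name: country
      type: string
    - name: lineage
      type: pango_lineage
  primaryKey: accession                 # must name one of the metadata fields
  dateToSortBy: collection_date         # optional
  partitionBy: lineage                  # optional
  features:                             # optional
    - name: sarsCoV2VariantQuery
```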
The Metadata Object
The metadata object permits the following fields:
Key | Type | Required | Description |
---|---|---|---|
name | string | true | The name of the metadata field. |
type | enum | true | The type of the metadata. |
generateIndex | boolean | false | See Generating an Index below. |
lapisAllowsRegexSearch | boolean | false | If true, LAPIS will autogenerate a filter ${name}.regex. See String search. |
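
For example, a single metadata entry with regex search enabled could be declared as sketched below. The field name division is hypothetical, and using a string field here is an assumption. Following the ${name}.regex pattern from the table, LAPIS would then offer a filter named division.regex in addition to the regular division filter.

```yaml
# Fragment of the metadata list inside the schema object.
metadata:
  - name: division                      # hypothetical field name
    type: string
    lapisAllowsRegexSearch: true        # autogenerates the filter "division.regex"
```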
Metadata Types
SILO currently supports the following metadata types:
- string
- int
- float
- pango_lineage: Systematic classification of lineage with inheritance structure that can be computed for some pathogens.
- date: Values must be valid dates in the format YYYY-MM-DD.
- insertion: A comma-separated list of nucleotide insertions. Each insertion has the form <segment>:<position>:<symbols>. Example value: segment1:123:CCG,segment2:501:AAAGGG. If there is only one segment, the segment name can be omitted: 123:CCG,501:AAAGGG.
- aaInsertion: A comma-separated list of amino acid insertions. Each insertion has the form <gene>:<position>:<symbols>. Example value: S:123:CCG,ORF1A:501:AAAGGG.
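
The following sketch declares one metadata field per supported type. The type values are taken from the list above; every field name is hypothetical and chosen only for illustration.

```yaml
# Fragment of the metadata list inside the schema object; names are hypothetical.
metadata:
  - name: host
    type: string
  - name: age
    type: int
  - name: coverage
    type: float
  - name: lineage
    type: pango_lineage
  - name: collection_date
    type: date                          # values such as 2021-03-18
  - name: nucleotide_insertions
    type: insertion                     # values such as 123:CCG,501:AAAGGG
  - name: amino_acid_insertions
    type: aaInsertion                   # values such as S:123:CCG,ORF1A:501:AAAGGG
```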
Generating an Index
Columns of type string support generating an index.
For columns of type pango_lineage, an index is always generated.
SILO internally stores precomputed bitmaps for those columns so that a query on that column becomes a trivial lookup.
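
A minimal sketch of requesting an index on a string column, assuming a hypothetical field named country that is frequently used as a filter:

```yaml
# Fragment of the metadata list inside the schema object; names are hypothetical.
metadata:
  - name: country
    type: string
    generateIndex: true                 # SILO precomputes bitmaps for this column
  - name: lineage
    type: pango_lineage                 # indexed automatically, no flag needed
```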
Features
The feature object permits the following fields:
Key | Type | Required | Description |
---|---|---|---|
name | string | true | The name of the feature. |
Currently, there is only one available feature: sarsCoV2VariantQuery.
This enables a specialized query language for SARS-CoV-2 instances.
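
Enabling the feature would then look like this fragment of the schema object, shown without the other required fields:

```yaml
schema:
  # ... instanceName, metadata, opennessLevel, primaryKey as usual ...
  features:
    - name: sarsCoV2VariantQuery
```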