
Advanced query

Advanced queries allow arbitrary combinations of filters. Standard LAPIS filters take the form filter1 AND filter2 AND ... filterN and cannot express more custom cases such as filter1 OR (filter2 AND NOT filter3). The advanced query feature allows such combinations.

The formal specification of the query language is available here as an ANTLR v4 grammar. In the following, we provide an informal description and examples. The respective unit test provides a full list of possible atomic queries.

Advanced queries are case-insensitive: all operators and field names can be written in upper or lower case. The only exception is the values of metadata fields, which are case-sensitive.

We support mutation and insertion queries for both nucleotide and amino acid sequences; see the mutation filter page for more details. Note the addition of the MAYBE operator to query ambiguous nucleotide symbols.

Standard metadata queries take the form metadataField=query, for example

country=Ghana

Note that if the value contains anything other than letters and numbers (for example spaces), it must be enclosed in single quotes, for example

country='United States of America'
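The quoting rule above can be sketched as a small Python helper. This is only an illustration: the function name metadata_filter is hypothetical, and because this page does not describe how to escape a single quote inside a value, the sketch simply rejects such values.

```python
import re

# Values made up only of ASCII letters and digits can be written unquoted.
_PLAIN_VALUE = re.compile(r"[A-Za-z0-9]+")


def metadata_filter(field: str, value: str) -> str:
    """Build a `field=value` expression, quoting the value when needed.

    Hypothetical helper, not part of LAPIS.
    """
    if _PLAIN_VALUE.fullmatch(value):
        return f"{field}={value}"
    if "'" in value:
        # Escaping rules are not covered here, so refuse instead of guessing.
        raise ValueError("values containing a single quote are not handled")
    return f"{field}='{value}'"


print(metadata_filter("country", "Ghana"))
print(metadata_filter("country", "United States of America"))
```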

To search for empty fields (fields that are null) use the IsNull operator:

IsNull(host)

For dates and numbers (int or float) we allow queries for ranges, using the >= and <= operators, for example:

date>=2021-01-01
date<=2021-12-31

For string fields we also allow regex search. To use the regex substring search on a metadata string field you must append .regex to the end of the metadata field name and enclose the query in single quotes:

host.regex='.*bos.*'

For regex searches the advanced queries use the google/re2 regex syntax.

Boolean fields can be queried using true and false (again case-insensitive) values, for example:

isLabHost=true
isLabHost=false

The query language understands Boolean logic. Expressions can be connected with & (and), | (or) and ! (not). Both & and AND are recognized as and, | and OR are recognized as or, and ! and NOT are recognized as not. Parentheses ( and ) can be used to define the order of the operations.
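When queries are assembled programmatically, it can help to wrap every combination in parentheses so that the order of operations never depends on operator precedence. The following is a minimal sketch; the helper names and_, or_ and not_ are hypothetical and only build query strings.

```python
def and_(*exprs: str) -> str:
    """Join expressions with & and wrap the result in parentheses."""
    return "(" + " & ".join(exprs) + ")"


def or_(*exprs: str) -> str:
    """Join expressions with | and wrap the result in parentheses."""
    return "(" + " | ".join(exprs) + ")"


def not_(expr: str) -> str:
    """Negate an expression, parenthesizing it so the ! covers all of it."""
    return "!(" + expr + ")"


query = and_("300G", not_("400-"), or_("S:123T", "S:234A"))
print(query)  # (300G & !(400-) & (S:123T | S:234A))
```

The extra parentheses are redundant for simple expressions, but they keep the generated query unambiguous.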

We also provide the custom syntax N-of and exactly-N-of, which matches sequences that fulfill at least N or exactly N of a list of expressions, respectively. The syntax is as follows, where expr1, …, expr5 are any valid expressions:

[3-of: expr1, expr2, expr3, expr4, expr5]
[exactly-2-of: expr1, expr2, expr3, expr4]
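The intended semantics of the two forms can be illustrated in Python. In this sketch, each expression is assumed to have already been evaluated to a Boolean for a single sequence; the function names are hypothetical and this is not how LAPIS implements the operators.

```python
from typing import Iterable


def at_least_n_of(n: int, results: Iterable[bool]) -> bool:
    """True if at least n of the evaluated expressions are fulfilled."""
    return sum(bool(r) for r in results) >= n


def exactly_n_of(n: int, results: Iterable[bool]) -> bool:
    """True if exactly n of the evaluated expressions are fulfilled."""
    return sum(bool(r) for r in results) == n


# A sequence that fulfills expr1, expr2 and expr4, but not expr3 and expr5:
results = [True, True, False, True, False]
print(at_least_n_of(3, results))  # True: three expressions are fulfilled
print(exactly_n_of(2, results))   # False: three are fulfilled, not two
```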

Advanced queries can be sent in GET and POST requests just like standard filters.
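Because queries contain characters such as &, | and spaces, they need to be URL-encoded when sent in a GET request. The sketch below only builds the request data and does not send anything. BASE_URL and ENDPOINT are placeholders, not values taken from this page, and the JSON body layout for POST is an assumption; only the advancedQuery parameter name comes from this documentation.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://lapis.example.org"  # placeholder host (assumption)
ENDPOINT = "/sample/aggregated"         # assumed endpoint path


def build_get_url(query: str) -> str:
    """Return a GET URL with the query URL-encoded as advancedQuery."""
    return f"{BASE_URL}{ENDPOINT}?{urlencode({'advancedQuery': query})}"


def build_post_body(query: str) -> str:
    """Return a JSON body carrying the query (assumed body layout)."""
    return json.dumps({"advancedQuery": query})


def read_back_query(url: str) -> str:
    """Decode advancedQuery from a URL, to check the encoding round-trips."""
    return parse_qs(urlparse(url).query)["advancedQuery"][0]


query = "300G & !400- & (S:123T | S:234A)"
url = build_get_url(query)
print(url)
print(build_post_body(query))
assert read_back_query(url) == query
```

Encoding matters most for the & operator: left unencoded in a URL, it would be read as a separator between request parameters.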

Some example queries that can be plugged into the advancedQuery parameter:

  • Get the sequences with the nucleotide mutation 300G, without a deletion at position 400 and either the AA change S:123T or the AA change S:234A:

    300G & !400- & (S:123T | S:234A)

    This can also be written as

    300G AND NOT 400- AND (S:123T OR S:234A)
  • Get all sequences from the USA that do not have cows as a host and that also have the mutation 300G:

    NOT host='bos taurus' AND 300G AND country=USA
  • Get the sequences with at least 3 out of 5 mutations/deletions:

    [3-of: 123A, 234T, S:345-, ORF1a:456K, ORF7A:100-]
  • Get the sequences that fulfill exactly 2 out of 4 conditions:

    [exactly-2-of: 123A & 234T, !234T, S:345- | S:346-, [2-of: 222T, 333G, 444A, 555C]]