Advanced query
Advanced queries allow arbitrary combinations of filters. Standard LAPIS filters are always combined as
filter1 AND filter2 AND ... AND filterN, so they cannot express more custom cases such as filter1 OR (filter2 AND NOT filter3).
The advanced queries feature supports such combinations and makes more flexible custom queries possible.
The formal specification of the query language is available as an ANTLR v4 grammar. In the following, we provide an informal description and examples. The respective unit test provides a full list of possible atomic queries.
Features
Advanced queries are case-insensitive. All operators and field names can be written in upper or lower case. Only the values of metadata fields are case-sensitive.
Variant Queries
We support mutation and insertion queries for both nucleotide and amino acid sequences; see the mutation filter page for more details. Note the addition of the MAYBE operator for querying ambiguous nucleotide symbols.
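Ambiguous nucleotide symbols are commonly written as IUPAC codes, where one letter stands for several possible bases (for example, R means A or G). The following minimal Python sketch illustrates the idea behind an ambiguity-aware match. It assumes that a MAYBE-style query for a base matches any observed symbol that could stand for that base. The names `IUPAC`, `could_be` and `is_exactly` are hypothetical and are not part of LAPIS. See the mutation filter page for the actual MAYBE syntax and semantics.

```python
# IUPAC nucleotide codes mapped to the set of bases each may represent.
# Illustrative table for this sketch only; not taken from LAPIS.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}


def could_be(observed: str, queried: str) -> bool:
    """Hypothetical MAYBE-style check: True if the queried base is one of the
    bases the observed (possibly ambiguous) symbol stands for. Symbols not in
    the table, such as a deletion '-', never match."""
    return queried in IUPAC.get(observed, set())


def is_exactly(observed: str, queried: str) -> bool:
    """Plain (non-MAYBE) check: the observed symbol must equal the query."""
    return observed == queried
```

For an observed R at some position, `could_be("R", "G")` is true while `is_exactly("R", "G")` is false. This is the distinction an ambiguity-aware operator adds on top of a plain mutation query.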
Metadata Queries
Standard metadata queries take the form metadataField=value, for example:
country=Ghana
If the value contains anything other than letters and numbers (for example, spaces), it must be enclosed in single quotes, for example:
country='United States of America'
To search for empty fields (fields that are null), use the IsNull operator:
IsNull(host)
For dates and numbers (int or float), ranges can be queried with the >= and <= operators, for example:
date>=2021-01-01
date<=2021-12-31
For string fields, regex search is also available. To run a regex substring search on a metadata string field, append .regex to the metadata field name
and enclose the query in single quotes:
host.regex='.*bos.*'
Regex searches in advanced queries use the google/re2 regex syntax.
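When advanced queries are generated from code, the metadata rules above can be wrapped in small string builders. These rules are: quote values that are not purely alphanumeric, use >= and <= for ranges, append .regex for regex search, and use IsNull for empty fields. The following Python sketch shows one way to do this. All helper names are hypothetical, not a LAPIS API. The sketch assumes that values never contain a single quote, because the escaping rule for that case is not described here. `regex_matches` uses Python's `re.search` only to illustrate substring matching. LAPIS itself uses RE2, which lacks some Python features such as backreferences and lookaround.

```python
import re
from typing import Optional

# Hypothetical helpers (not a LAPIS API) that build advanced-query strings
# for metadata filters, following the rules described above.


def _reject_single_quotes(text: str) -> None:
    # Assumption: escaping single quotes inside values is not covered here.
    if "'" in text:
        raise ValueError("values containing single quotes are not handled")


def quote_value(value: str) -> str:
    """Leave purely alphanumeric values bare; wrap anything else in quotes."""
    _reject_single_quotes(value)
    return value if value.isalnum() else f"'{value}'"


def equals(field: str, value: str) -> str:
    """Equality filter, intended for string fields."""
    return f"{field}={quote_value(value)}"


def is_null(field: str) -> str:
    return f"IsNull({field})"


def boolean(field: str, value: bool) -> str:
    return f"{field}={'true' if value else 'false'}"


def in_range(field: str, lower: Optional[str] = None, upper: Optional[str] = None) -> str:
    """Combine optional >= and <= bounds on a date or number field with &."""
    parts = []
    if lower is not None:
        parts.append(f"{field}>={lower}")
    if upper is not None:
        parts.append(f"{field}<={upper}")
    if not parts:
        raise ValueError("at least one bound is required")
    return " & ".join(parts)


def regex(field: str, pattern: str) -> str:
    """Regex substring search: append .regex and always single-quote the pattern."""
    _reject_single_quotes(pattern)
    return f"{field}.regex='{pattern}'"


def regex_matches(pattern: str, value: str) -> bool:
    # Illustrates substring semantics with Python's re module, not RE2.
    return re.search(pattern, value) is not None
```

For example, `in_range("date", "2021-01-01", "2021-12-31")` yields `date>=2021-01-01 & date<=2021-12-31`, and `equals("country", "United States of America")` adds the required quotes.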
Boolean fields can be queried with the values true and false (again case-insensitive), for example:
isLabHost=true
isLabHost=false
Boolean operators
The query language understands Boolean logic. Expressions can be connected with & (and), | (or) and ! (not).
Both & and AND are recognized as and, | and OR are recognized as or, and ! and NOT are recognized as not.
Parentheses ( and ) can be used to define the order of the operations.
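Symbolic and keyword operators are interchangeable, and keywords are case-insensitive. Two queries that differ only in this respect should therefore tokenize the same way once keywords are mapped to symbols. The following naive Python sketch makes this concrete. It is not the ANTLR grammar LAPIS actually uses. It splits a query into tokens, keeps single-quoted values whole, and maps AND, OR and NOT in any case to &, | and !. It assumes the bare words AND, OR and NOT never appear as unquoted values, and it does not handle the N-of bracket syntax. The names `TOKEN`, `KEYWORDS` and `normalize` are hypothetical.

```python
import re

# A token is either a single operator/parenthesis character, or a run of other
# non-space characters in which single-quoted strings are kept whole
# (so host='bos taurus' stays one token).
TOKEN = re.compile(r"[()&|!]|(?:[^\s()&|!']|'[^']*')+")

# Keyword spellings (matched case-insensitively) and their symbolic equivalents.
KEYWORDS = {"AND": "&", "OR": "|", "NOT": "!"}


def normalize(query: str) -> list[str]:
    """Tokenize a query and replace keyword operators with symbols."""
    return [KEYWORDS.get(tok.upper(), tok) for tok in TOKEN.findall(query)]
```

With this, `300G & !400- & (S:123T | S:234A)` and `300G and not 400- AND (S:123T Or S:234A)` produce identical token lists. Words such as `Andorra` or `ORF1a` are left alone because only whole tokens are compared.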
The language also provides a custom N-of and exactly-N-of syntax to match sequences for which at least N, or exactly N, out of a list of expressions are fulfilled.
The syntax is as follows, where expr1, …, expr5 are any valid expressions:
[3-of: expr1, expr2, expr3, expr4, expr5]
[exactly-2-of: expr1, expr2, expr3, expr4]
Examples
Advanced queries can be sent in GET and POST requests, just like standard filters.
For example, a POST request:
curl -X POST "https://lapis.cov-spectrum.org/open/v2/sample/aggregated" \
  -H "Content-Type: application/json" \
  -d '{"advancedQuery": "501T and country=Switzerland"}'
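The same request can be prepared from Python with only the standard library. The minimal sketch below builds, but does not send, the POST request from the curl example. It also builds an equivalent GET URL that passes advancedQuery as a percent-encoded query parameter. This relies on the statement above that GET works just like it does for standard filters. The helper names `post_request` and `get_url` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
URL = "https://lapis.cov-spectrum.org/open/v2/sample/aggregated"


def post_request(advanced_query: str) -> urllib.request.Request:
    """Build (but do not send) a POST request with a JSON body."""
    body = json.dumps({"advancedQuery": advanced_query}).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_url(advanced_query: str) -> str:
    """Build a GET URL; spaces, '=', '&', '|' and '!' are percent-encoded."""
    params = urllib.parse.urlencode(
        {"advancedQuery": advanced_query}, quote_via=urllib.parse.quote
    )
    return f"{URL}?{params}"


if __name__ == "__main__":
    req = post_request("501T and country=Switzerland")
    print(req.get_method(), req.full_url)
    print(get_url("501T and country=Switzerland"))
    # To actually send the request:
    # with urllib.request.urlopen(req) as response:
    #     print(json.load(response))
```

Percent-encoding matters for GET, because operators such as & would otherwise be read as URL parameter separators.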
Some example queries that can be plugged into the advancedQuery parameter:
- Get the sequences with the nucleotide mutation 300G, without a deletion at position 400, and with either the amino acid change S:123T or the amino acid change S:234A:
  300G & !400- & (S:123T | S:234A)
  This can also be written as:
  300G AND NOT 400- AND (S:123T OR S:234A)
- Get all sequences from the USA that do not have cows as a host and that also have the mutation 300G:
  NOT host='bos taurus' AND 300G AND country=USA
- Get the sequences with at least 3 out of 5 mutations/deletions:
  [3-of: 123A, 234T, S:345-, ORF1a:456K, ORF7A:100-]
- Get the sequences that fulfill exactly 2 out of 4 conditions:
  [exactly-2-of: 123A & 234T, !234T, S:345- | S:346-, [2-of: 222T, 333G, 444A, 555C]]
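To make the semantics of the variant-only examples concrete, here is a minimal Python sketch that evaluates them against toy samples. It uses a deliberately simplified model of our own, not LAPIS's internal representation. A sample is a set of position/symbol strings such as "300G" or "S:123T", and a bare mutation matches if that string is in the set. References, ambiguous symbols and the MAYBE operator are ignored. The hypothetical helpers `at_least` and `exactly` mirror [N-of: ...] and [exactly-N-of: ...] by counting how many sub-expressions are true.

```python
from typing import Callable

# Toy model (assumption): a sample is a frozenset of observed
# position/symbol strings, and a predicate decides whether it matches.
Predicate = Callable[[frozenset], bool]


def has(mutation: str) -> Predicate:
    """Bare mutation: true if the sample contains it."""
    return lambda s: mutation in s


def at_least(n: int, *exprs: Predicate) -> Predicate:
    """[n-of: ...]: true if at least n sub-expressions are true."""
    return lambda s: sum(e(s) for e in exprs) >= n


def exactly(n: int, *exprs: Predicate) -> Predicate:
    """[exactly-n-of: ...]: true if exactly n sub-expressions are true."""
    return lambda s: sum(e(s) for e in exprs) == n


def example_1(s: frozenset) -> bool:
    # 300G & !400- & (S:123T | S:234A)
    return "300G" in s and "400-" not in s and ("S:123T" in s or "S:234A" in s)


# [3-of: 123A, 234T, S:345-, ORF1a:456K, ORF7A:100-]
example_3 = at_least(
    3, has("123A"), has("234T"), has("S:345-"), has("ORF1a:456K"), has("ORF7A:100-")
)

# [exactly-2-of: 123A & 234T, !234T, S:345- | S:346-, [2-of: 222T, 333G, 444A, 555C]]
example_4 = exactly(
    2,
    lambda s: "123A" in s and "234T" in s,
    lambda s: "234T" not in s,
    lambda s: "S:345-" in s or "S:346-" in s,
    at_least(2, has("222T"), has("333G"), has("444A"), has("555C")),
)
```

For instance, a sample containing only 222T and 333G satisfies the last query. !234T and the nested [2-of: ...] are the two true conditions, and the other two are false.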