Ambiguous symbols
The symbols page
lists all symbols that the underlying sequence files in .FASTA
format can contain.
The ambiguous symbols arise from imperfect reads in the sequencer.
While one mostly queries for the symbols A
, C
, G
, T
and -
to look for specific features and mutations of a sequence,
or N
for quality control of the underlying data,
the ambiguous symbols R
through V
are often too cumbersome to consider in analyses.
LAPIS supports the flexible consideration of these ambiguous symbols through an extension of the boolean logic syntax in the variant queries.
Here we introduce a new expression MAYBE
to consider sequences that have an ambiguous code which maybe matches the queried value.
Example
Consider the following sequences:
A filter for the mutation 3G
returns only the sequence AAGCG
, as it is the only sequence with the symbol G
at position 3.
The filter MAYBE(3G)
however also considers that the sequences AARCG
and AANCG
may have the symbol G
at position 3,
because the symbols R
and N
can represent Guanine.