Introduction and Goals
This document was inspired by the arc42 template.
It describes LAPIS (Lightweight API for Sequences) and SILO, which is a platform to give easy access to genomic sequence data alongside metadata of the sequenced probes. It is used to filter potentially large sequence data and return the result to the user through web access, so that a user can develop their own evaluation of the data.
The initial implementation of LAPIS is specialized for SARS-CoV-2 and is used by CoV-Spectrum. In this approach, we want to develop a generalized API that is configurable for a wider range of organisms. The solution consists of two systems, LAPIS and SILO, which are designed to work together, but they could be operated independently of each other. SILO serves as a database with a specialized query language (SILO queries), which could in principle be exchanged by any other database.
This document focuses on LAPIS, but we usually think of it as operated as a single unit with SILO. We refer to this special configuration as SILO-LAPIS.
The following goals have been established for this system:
Priority | |
---|---|
1 | Provide an API to query large sets of genomic sequence data in a very efficient way. |
2 | Provide the infrastructure such that other researchers (“maintainers”) can easily setup their own instance. |
Requirements Overview
Requirement | |
---|---|
Create an instance for a given organism | Create an instance of the whole system by giving a configuration for a organism. |
Store data efficiently | Store data in compressed form. |
Provide web access to data | Provide endpoints for custom user queries to the data. |
Quality Goals
Quality Category | Quality | Description |
---|---|---|
Usability | Ease of use | Ease of use for the user to hand in queries. |
Ease of use | Ease of use for maintainers to create a new instance for a new organism. | |
Ease of learning | The queries should be as easy to write as possible. We provide material to assist in learning the query language. | |
Performance efficiency | Time behaviour | It is possible to query millions of sequences in less than a second. |
Scalability | Performance (query response time, memory usage) grows at most linearly with the number of stored sequences. | |
Maintainability | Reusability | It is possible to use LAPIS with any other database that implements the SILO query language. |
Testability | SILO-LAPIS is well tested on end to end scope. The tests serve as examples for users and maintainers. |
Stakeholders
Role | Expectations |
---|---|
Database researcher | Can develop new genomic data engineering algorithms for LAPIS. |
Developer | Can fix bugs and add new features to LAPIS. |
User (beginner level) | Can write simple queries to LAPIS by hand and get a fast result. |
User (advanced level/tool developers) | Can write advanced, possibly programmatically generated queries to LAPIS and get a fast result. |
Maintainer | Can host the software on their own servers for their own organism configuration. |