To accelerate research on the COVID-19 global pandemic, we compiled this quick start guide to IBM Functional Genomics Platform resources.
SARS-CoV-2 Data Summary:
- The IBM Functional Genomics Platform contains over 22,000 viral genome assemblies retrieved from GenBank and GISAID. We have annotated all of the genes, proteins, and functional domains contained in those genomes. We have also pre-computed all of the connections among those biological entities, so you can traverse across any data type you’re interested in. These biological entities are also connected to Gene Ontology (GO) and KEGG pathway information.
- For SARS-CoV-2 specifically, over 5,000 genomes have been annotated (as of 2020-04-09), and their corresponding molecular targets are shown below. This data is actively updated as new sequences become available.
- Even in this early wave of sequencing data, we’re already observing variants of key SARS-CoV-2 proteins (the distinct protein count is the number of unique sequences with that name).
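To make the "distinct protein count" concrete, here is a minimal sketch of how such a count could be computed from (name, sequence) pairs. It is an illustration only: the function name `distinct_protein_counts` and the toy records are assumptions made for this example, not the platform's API or data.

```python
from collections import defaultdict


def distinct_protein_counts(records):
    """Count the unique sequences observed for each protein name."""
    sequences_by_name = defaultdict(set)
    for name, sequence in records:
        # Normalize whitespace and case so identical sequences collapse.
        sequences_by_name[name].add(sequence.strip().upper())
    return {name: len(seqs) for name, seqs in sequences_by_name.items()}


# Hypothetical toy records (short illustrative strings, not platform data).
records = [
    ("spike glycoprotein", "MFVFLVLLPLVSSQ"),
    ("spike glycoprotein", "MFVFLVLLPLVSSQ"),  # exact duplicate, counted once
    ("spike glycoprotein", "MFVFLVLLPLVSSR"),  # differs by one residue
    ("nucleocapsid phosphoprotein", "MSDNGPQNQRNAPR"),
]

counts = distinct_protein_counts(records)
print(counts)
# → {'spike glycoprotein': 2, 'nucleocapsid phosphoprotein': 1}
```

A count above 1 for a given name means at least two different sequences share that name, which is what signals a variant.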
How can I ask and answer my research questions?
- Check out our example COVID-19 Python notebook on GitHub
- We’ve created a collection of COVID-19 data. You can search for it in the UI using these terms:
  - COVID-19, SARS-CoV-2, spike glycoprotein, betacoronavirus
- If you can’t find the sequence you’re looking for, you can use our BLAST app to search against all viral genes and proteins
- Check out our general Best Practices and Getting Started Guide