COVID-19 Quick Start

To accelerate research toward solutions for the global COVID-19 pandemic, we compiled this quick start guide to IBM Functional Genomics Platform resources.

SARS-CoV-2 Data Summary:

  • IBM Functional Genomics Platform contains over 22,000 viral genome assemblies retrieved from GenBank and GISAID. We have annotated all of the genes, proteins, and functional domains contained in those genomes. We have also pre-computed all of the connections among those biological entities, so you can traverse across any data type you’re interested in. These biological entities are also connected to Gene Ontology (GO) and KEGG pathway information.
  • For SARS-CoV-2 specifically, there are over 5,000 genomes annotated (as of 2020-04-09) and their corresponding molecular targets are shown below. This data is actively updated as new sequences become available.
  • Even in this early wave of sequencing data, we’re already observing variants present for key SARS-CoV-2 proteins (distinct protein count indicates number of unique sequences with that name).

How can I ask and answer my research questions?

  • Check out our example COVID-19 Python notebook on GitHub
  • We’ve created a collection of COVID-19 data. You can search for it in the UI using these terms:
    • COVID-19, SARS-CoV-2, spike glycoprotein, betacoronavirus
  • If you can’t find the sequence you’re looking for, you can use our BLAST app to search against all viral genes and proteins

Other tips:

In the GettingStarted notebook, I am unable to get genomes for the genus. I get this error: KeyError: "['metadata'] not in index". Only Genera, genome_type, id and taxid can be retrieved, and only if I omit metadata. Is anyone else facing this?

Hi @alerod, it looks like new data has come in for that example and not all the metadata has been processed for it. Removing the metadata key for the time being will allow you to continue.
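If it helps, here is a minimal sketch of that workaround in pandas. It assumes the notebook holds the results in a DataFrame and selects columns by name. The `genomes` frame, its sample row, and the `select_available` helper are hypothetical stand-ins, not the notebook's actual code. Only the column names are taken from the post above.

```python
import pandas as pd


def select_available(df, wanted):
    """Return df restricted to the wanted columns that actually exist.

    Selecting a missing column with df[wanted] raises
    KeyError: "['metadata'] not in index", so missing names are skipped.
    """
    present = [col for col in wanted if col in df.columns]
    return df[present]


# Hypothetical stand-in for the genome results; the real notebook data differs.
genomes = pd.DataFrame(
    {
        "Genera": ["Betacoronavirus"],
        "genome_type": ["assembly"],
        "id": ["example-genome-1"],
        "taxid": [2697049],
    }
)

# "metadata" is requested but absent, as in the error reported above.
wanted = ["Genera", "genome_type", "id", "taxid", "metadata"]
subset = select_available(genomes, wanted)
print(list(subset.columns))
```

Once the metadata has been processed, the same call will pick the column up again without further changes.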

Ed

yep, thanks @eseabolt