This document is a running list of features and data updates made to the system. Please refer to the following for the latest changes and improvements.
May 29, 2020 (v1.2.0) - Improvements and updates to SARS-CoV-2 data, Variant Browser Application
A new application for browsing variants across SARS-CoV-2 genomes has been deployed in collaboration with the IBM Tokyo research lab
Genome quality inclusion criteria for SARS-CoV-2 have been improved in the following ways:
- identify and mark genomes removed from GISAID as inactive (by default, this data is not shown to users; in subsequent releases we will allow users to filter data by active or inactive status)
- process all SARS-CoV-2 genomes with additional stringency on genome quality:
  - genome length must be > 29,000 bp AND
  - Ns must make up < 1% of the sequence AND
  - must meet the GISAID high-coverage definition (the limit on Ns shown above, < 0.05% unique amino acid mutations not seen in other GISAID sequences, and no insertions/deletions unless verified by the submitter)
- genomes not meeting these criteria are now marked as inactive (see notes above on expansion of this status in future releases)
Note: after tightening the genome quality criteria, we observed a significant reduction in truncated gene and protein products for SARS-CoV-2, which indicates that the increased stringency of these criteria was necessary
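The inclusion criteria above can be sketched as a simple filter. This is only an illustration, not the platform's actual pipeline: the `GenomeRecord` class, its field names, and the `is_active` helper are hypothetical, and the GISAID high-coverage status is assumed to arrive as a precomputed flag rather than being recomputed here.

```python
from dataclasses import dataclass

MIN_LENGTH_BP = 29_000   # genome length must be > 29,000 bp
MAX_N_FRACTION = 0.01    # Ns must make up < 1% of the sequence


@dataclass
class GenomeRecord:
    """Hypothetical record; field names are assumptions for this sketch."""
    accession: str
    sequence: str
    gisaid_high_coverage: bool          # assumed to be supplied by the data source
    removed_from_gisaid: bool = False   # genomes withdrawn from GISAID


def n_fraction(sequence: str) -> float:
    """Return the fraction of bases that are N (an empty sequence counts as all N)."""
    if not sequence:
        return 1.0
    return sequence.upper().count("N") / len(sequence)


def is_active(record: GenomeRecord) -> bool:
    """Apply the criteria listed above; anything failing them is inactive."""
    if record.removed_from_gisaid:
        return False
    return (
        len(record.sequence) > MIN_LENGTH_BP
        and n_fraction(record.sequence) < MAX_N_FRACTION
        and record.gisaid_high_coverage
    )
```

A record is treated as active only when all three conditions hold, mirroring the AND between the criteria.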
All bacterial and viral genomes now have a unique hash (MD5) to track modifications of genome sequences and data updates from all data sources
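As a rough illustration of how an MD5 hash can flag a changed genome sequence, the sketch below hashes a normalized sequence string with Python's standard `hashlib`. The normalization step (dropping whitespace and upper-casing) and the helper names `sequence_md5` and `has_changed` are assumptions made for this example; the platform's actual hashing input is not described here.

```python
import hashlib


def sequence_md5(sequence: str) -> str:
    """Return the MD5 hex digest of a sequence.

    Assumption: whitespace is removed and bases are upper-cased first,
    so line wrapping or letter case alone does not change the hash.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def has_changed(stored_md5: str, new_sequence: str) -> bool:
    """Compare a previously stored digest with the digest of a new download."""
    return stored_md5 != sequence_md5(new_sequence)
```

Storing the digest alongside each genome makes it cheap to detect whether a re-downloaded sequence differs from the one processed earlier, without comparing full sequences.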
Improvements in our gene and protein identification pipeline for SARS-CoV-2:
- Updated references to use the UniProt SARS-CoV-2 proteins released on 4/29/2020
- Better handling of short viral sequences in Prokka and Prodigal
Updated our functional domain annotation pipeline to use InterProScan 5-44.79 (latest), which includes content specific to SARS-CoV-2 (see the InterPro 78.1 release note, "A special update for SARS-CoV-2")
Retrieved and processed the latest collection of SARS-CoV-2 genomes as of 05-18-2020. This update has been extended across all biological entities in our UI, apps, and developer toolkit.
March 27, 2020 (v1.1.0) - COVID spotlight
- OMXWare is now formally known as the IBM Functional Genomics Platform. This new name is reflected across the UI and other text areas. The SDK, REST services, and other programmatic endpoints still use the OMXWare-based name for backwards compatibility and to minimize breakages.
- The platform has been expanded to include gene, protein, functional domain, and pathway information for over 22,000 viral GenBank and GISAID assemblies. This includes annotation of over 1,000 SARS-CoV-2 genomes in response to the COVID-19 global pandemic. (Note: redistribution of genome assemblies is not available at this time due to data usage requirements, but the source accession IDs are provided so that you can retrieve the raw data from the originating source. The genes, proteins, domains, etc. are all available for your research through our UI or developer toolkit.)
- We have created new BLAST indices with the latest bacterial and viral gene and protein data.
- The Functional Genomics Platform now contains a notion of “collections” to help surface relevant slices of the data. Our first collection aggregates all SARS-CoV-2 data. You can access this in the UI search (keywords COVID-19, SARS-CoV-2, etc.) as well as in our Python SDK and REST Services. These collections also allow you to retrieve bacterial-only, viral-only, or all data in our developer tools.
- To support usage of this new data, we have also provided quick start guidelines for COVID-19 and summary visualizations in our Explore gallery.
March 1, 2020 (v1.0.0)
- Our UI search, services, and SDK now contain the latest data, expanding the IBM Functional Genomics Platform by over 50K genomes and tens of millions of genes, proteins, and functional domains
- New BLAST gene and protein indices have been deployed containing the latest data
- Updated Docker documentation is now accessible to all users
- In the Explore gallery, our biological entity summary now plots the latest data live from our services
- We have corrected the display of taxonomy at kingdom and species rank
- Plus minor bug fixes and improvements to IBM Functional Genomics Platform system stability