Unable to retrieve proteins

I have a fresh version of omxware lib from today, loaded in Jupyter Notebook. Token is accepted, help(omx) prints expected text.

I run this example taken straight from the COVID-19 Quick Start Notebook:
search_term = 'Replicase polyprotein 1a'
proteins = omx.proteins(protein_name=search_term,page_size=25,classification='virus',collection = ['sars-cov-2'])
total = proteins.total_results()

… and get 0 results.

No error, just 0 results.

What I really need are the unique protein sequences for all SARS-CoV-2 variants, i.e. 1198 sequences.

I tried retrieving proteins from the COVID-19 collection using ids=all and that returned a ridiculous 55M records.

I now get 0 results for everything I run, so I assume I have been blocked in some way or blacklisted because the examples should work… or has the API changed?

I came here in desperation after realizing that ViPR is only returning a fraction of available protein sequences. I have people waiting in the lab for data that relies on those sequences.

Any help greatly appreciated.

I have checked that I can retrieve genes, but not proteins. Are you limiting access to protein data?

This returns 591 items:

search_term = 'Spike Glycoprotein'
genes_sars_cov2 = omx.genes(gene_name=search_term,page_size=25,collection=['covid19'],classification='virus')
total = genes_sars_cov2.total_results()

This returns 0 results:

search_term = 'Replicase polyprotein 1a'
proteins = omx.proteins(protein_name=search_term,page_size=25,classification='virus',collection = ['sars-cov-2'])
print('Total Results for %s: %d' % (search_term, proteins.total_results()))

Hi Isabelle,
Thank you for posting and welcome to the community! I was able to reproduce what you’re seeing. We are not limiting protein search results, so you should be seeing non-zero counts. We’re actively looking into this and will follow up shortly. Sorry for the inconvenience.

Also, collection='all' searches all bacteria and viruses, which is why so many results are returned.

I did see that you can return gene sequences for that search term if that helps you in the interim.

search_term = 'Replicase polyprotein 1a'
genes = omx.genes(gene_name=search_term,page_size=25,classification='virus',collection = ['sars-cov-2'])
print('Total Results for %s: %d' % (search_term, genes.total_results()))

Thank you for your patience while we look into this!

@isaphan it looks like the latest data was in the middle of getting pushed out to our production servers at the same time you were requesting results. If you could try again now, you should be all set. I’m seeing 1,435 protein results for ‘Replicase polyprotein 1a’ and this has been verified by two other team members.

Thanks so much @superman , I confirm it is working now. Awesome!

Most certainly! And as an added bonus you’ve got data that’s hot off the presses :)

The result count jumped from 2656 (after your last message) to 3547 this morning, after the omxware upgrade to 0.1.38. The jump seems a bit excessive. Is that number correct, or am I now seeing duplicates?

On a minor note, the otherwise excellent curation appears to include rogue double quotes that muddle the count by protein name:
34 "Replicase polyprotein 1a
29 "Replicase polyprotein 1ab
837 Replicase polyprotein 1a
1050 Replicase polyprotein 1ab

The annotation is still so much cleaner and more consistent than ViPR's that I feel bad for pointing this out. You guys are doing a fantastic job. Keep it up!

Please ignore, I realized that some polyproteins have the 2 names:
Replicase polyprotein 1a, Replicase polyprotein 1ab
Replicase polyprotein 1ab, Replicase polyprotein 1a
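For anyone who runs into the same stray quotes: one plausible cause is counting names by splitting an exported file on commas. A name field that holds two names gets wrapped in double quotes, so the first half keeps its opening quote. The sketch below is only an illustration of that effect. The file layout, the `id` and `name` columns, and both function names are assumptions made up for the example; nothing here comes from the omxware API.

```python
import csv
from collections import Counter
from io import StringIO

# Hypothetical export (assumed layout): one row per protein.
# A name field holding two names is quoted because it contains a comma.
raw = '''id,name
p1,Replicase polyprotein 1a
p2,"Replicase polyprotein 1a, Replicase polyprotein 1ab"
p3,"Replicase polyprotein 1ab, Replicase polyprotein 1a"
p4,Replicase polyprotein 1ab
'''

def naive_name_counts(text):
    """Count the second comma-separated field of each line, ignoring quoting."""
    lines = text.strip().splitlines()[1:]  # skip the header row
    return Counter(line.split(",")[1] for line in lines)

def csv_name_counts(text):
    """Parse quoted fields with the csv module, then count each name once per row."""
    counts = Counter()
    for row in csv.DictReader(StringIO(text)):
        # A row may list several names; strip whitespace and de-duplicate per row.
        for name in {part.strip() for part in row["name"].split(",")}:
            counts[name] += 1
    return counts

if __name__ == "__main__":
    print(naive_name_counts(raw))
    print(csv_name_counts(raw))
```

With this toy input, the naive count produces separate entries for `"Replicase polyprotein 1a` and `"Replicase polyprotein 1ab`, while the csv-aware count reports each of the two names three times with no quote characters.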
