I am trying to find a way to retrieve genes and their prevalence in different sample types. For example, I am trying to figure out what the most prevalent genes are in human fecal samples. I am only looking at the OMXWare Services endpoints but can’t quite figure out whether it is possible at all.
It is great to see that you’re exploring OMXWare. Just a reminder: the OMXWare data consists of genome isolates for now. However, to your question, we do have metadata on these isolates, which includes where the bacteria were isolated from. This is surfaced in the Search endpoint.
There you can search keywords across any NCBI BioSample metadata, including biosample.isolation_source. I’d recommend trying “feces” and “fecal” as keywords, along with any other synonyms that fit what you’re thinking of. From the results, you can retrieve the IDs of the genomes isolated from feces and then pull the corresponding genes to analyze their prevalence.
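Once you have the genes for each fecal genome, the prevalence step itself is small. Here is a minimal Python sketch, assuming you have already collected the genes into a dictionary keyed by genome id. The function names and the example data are made up for illustration and are not part of the OMXWare API; fetching the genes from the services is left out.

```python
from collections import Counter


def gene_prevalence(genes_by_genome):
    """Return {gene: fraction of genomes that contain it}.

    genes_by_genome is a hypothetical structure: {genome_id: [gene names]}.
    """
    n_genomes = len(genes_by_genome)
    if n_genomes == 0:
        return {}
    counts = Counter()
    for genes in genes_by_genome.values():
        # Count each gene at most once per genome, even if it has several copies.
        counts.update(set(genes))
    return {gene: count / n_genomes for gene, count in counts.items()}


def top_genes(genes_by_genome, k=10):
    """Return the k most prevalent genes, ties broken alphabetically."""
    prevalence = gene_prevalence(genes_by_genome)
    return sorted(prevalence.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Made-up example data, not real OMXWare output.
example = {
    "genome_A": ["geneX", "geneY", "geneY"],
    "genome_B": ["geneX"],
    "genome_C": ["geneX", "geneZ"],
    "genome_D": ["geneY"],
}
print(top_genes(example, k=2))
```

Prevalence here means the fraction of genomes carrying a gene at least once. If you care about copy number instead, drop the `set(...)` call.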
Check out the services endpoint and let us know if this fits your need. We’re happy to provide more information too.
Here is a sample output, using the endpoint /api/secure/search/fecal . On line 29, you can see biosample.isolation_source: fecal, and on line 58 you can see id: GCF_001649215.1, which is the OMXWare identifier for this particular genome. This id can then be used to pull the genes, proteins, and domains that are contained in the genome.
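If you want to collect the ids programmatically, something along these lines may help. It is only a sketch: it assumes each search hit has been parsed into a flat Python dict with the keys "id" and "biosample.isolation_source", which may not match how the real response is nested, so adjust the lookups to the JSON you actually get back. The function name and the second and third records are hypothetical.

```python
def fecal_genome_ids(records, terms=("fecal", "feces")):
    """Return ids of records whose isolation source mentions any of the terms.

    Assumes flat dicts; the real search response may be structured differently.
    """
    ids = []
    for record in records:
        source = str(record.get("biosample.isolation_source", "")).lower()
        genome_id = record.get("id")
        if genome_id and any(term in source for term in terms):
            ids.append(genome_id)
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(ids))


# The first record mirrors the sample output above; the others are placeholders.
records = [
    {"id": "GCF_001649215.1", "biosample.isolation_source": "fecal"},
    {"id": "placeholder_1", "biosample.isolation_source": "soil"},
    {"id": "placeholder_2", "biosample.isolation_source": "Human feces"},
]
print(fecal_genome_ids(records))
```

Filtering on the field itself is useful because a keyword search can also match other metadata fields, not just the isolation source.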