Proposed feature: how would you like to search your sequence against IBM Functional Genomics Platform?

We’re designing and implementing a way for users to search their nucleotide or protein sequences against IBM Functional Genomics Platform data. We’d love to hear more about the types of queries and search spaces you’re interested in. Respond below to help us answer these questions…

Do you want to search against all of IBM Functional Genomics Platform, or against a specific slice of it? If a specific slice, what kind (e.g., by genus, gene name, etc.)? How often would you hope or expect the search data to be updated?

Thanks so much for all of your help making IBM Functional Genomics Platform an even better resource!

-The IBM Functional Genomics Platform Team

That’s a great step forward, Superman and team! Sequence matching is the most fundamental operation in the bioinformatics space, so enabling search against OMXWare sequences is very welcome.

Search operations can be divided into hierarchical levels, as your question suggests. Given that OMXWare contains only bacterial sequences, it would be useful to search the entire space (call it a level 0 search). A reduced search based on taxonomic structure (level 1) could include or exclude certain genomes and search against the rest. It will be very useful to keep the search "interface" and queries similar to what is considered standard in the bioinformatics space, so more biologists can use it naturally without an additional learning curve. Some examples of how the community uses sequence searches: Nucleotide BLAST: Search nucleotide databases using a nucleotide query, and jackhmmer search | HMMER.
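To make the level 0 / level 1 idea concrete, here is a minimal sketch of how the search space might be chosen before any alignment runs. It assumes each genome carries a taxonomy lookup keyed by rank; the `Genome` record, its fields, and `select_search_space` are hypothetical names for illustration, not an existing OMXWare API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Genome:
    accession: str            # hypothetical genome identifier
    taxonomy: Dict[str, str]  # hypothetical rank -> taxon map, e.g. {"genus": "Escherichia"}


def select_search_space(
    genomes: Iterable[Genome],
    rank: Optional[str] = None,
    include: Optional[Set[str]] = None,
    exclude: Optional[Set[str]] = None,
) -> List[Genome]:
    """Pick the genomes a query will be searched against.

    Level 0: no rank given, so the whole collection is searched.
    Level 1: keep genomes whose taxon at `rank` is in `include` (if given)
    and not in `exclude` (if given).
    """
    if rank is None:
        return list(genomes)  # level 0: entire space
    selected = []
    for g in genomes:
        taxon = g.taxonomy.get(rank)
        if include is not None and taxon not in include:
            continue  # not in the requested slice
        if exclude is not None and taxon in exclude:
            continue  # explicitly excluded
        selected.append(g)
    return selected
```

For example, `select_search_space(genomes, rank="genus", exclude={"Escherichia"})` would search every genome except those in *Escherichia*, while calling it with no filters gives the level 0 search.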

I would discourage gene-level searches, simply because OMXWare already has APIs to extract gene-level information. Users can download the sequences they want to use as a reference and do the sequence matching on their own platform. Furthermore, OMXWare provides various metadata along with the sequences, so a user should be well equipped to run the analyses of interest locally without overburdening the OMXWare compute infrastructure.

In terms of usage, the searches will let users ask questions about alignments and homologs of their input sequences.

Thank you @ritesh for your very thoughtful response. I like your description of leveled searching; it makes perfect sense.

It sounds like you’d prefer genome searching to gene searching; is that right? The results would then include the genomes with the best matches to the query sequence.

That’s right. Genome search is preferred because gene-based searches would be very limiting. Furthermore, gene sequences keep changing as new annotations and experimental evidence come in; even the number of genes is not static. As a result of a search, it would be useful to know which genomes were matched and which regions in those genomes were matched, i.e., the genome coordinates. If one knows the coordinates, it is trivial to check whether a gene exists there, so genome search automatically encompasses gene searching as well.
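The "trivial" step of going from match coordinates to genes is an interval-overlap check. Below is a minimal sketch, assuming hits and gene annotations both use 1-based, inclusive coordinates on a named contig; the `Hit`/`Gene` records and `genes_overlapping` are hypothetical and not part of any OMXWare API. Strand is ignored for simplicity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    genome: str  # hypothetical genome accession
    contig: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


@dataclass
class Gene:
    genome: str
    contig: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    name: str


def genes_overlapping(hit: Hit, genes: List[Gene]) -> List[Gene]:
    """Return annotated genes on the same genome and contig that overlap the hit.

    Two closed intervals [a, b] and [c, d] overlap exactly when a <= d and c <= b.
    """
    return [
        g for g in genes
        if g.genome == hit.genome
        and g.contig == hit.contig
        and g.start <= hit.end
        and hit.start <= g.end
    ]
```

For a hit at 1200..1800 on `contig1`, genes annotated at 1000..1500 and 1600..2400 would be reported, while one at 3000..3500 would not. Because the hit coordinates do not depend on the annotation, only the gene table needs refreshing when annotations change. A linear scan is shown for clarity; a large annotation set would more likely use an interval index.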

Good point. Thank you so much!