Overcoming Engineering Challenges: Taming ‘-omics’ Data

Posted in Information technology by Brian Buntz on January 23, 2013

Syapse (Palo Alto, CA), a startup trying to bring genomic data into routine clinical use, is developing a data-management platform that lets clinicians sift through the mounds of data generated by the "-omics" fields (genomics, proteomics, metabolomics, and transcriptomics), with the ultimate goal of improving the diagnosis of a variety of illnesses. As a GigaOm article puts it, the aim is "to make mining omics data as simple for its users as Salesforce.com makes [customer relationship management] for its users."

MPMN reached out to Syapse for a technical perspective on how the company is going about this. Specifically, we asked its chief technology officer, Tony Loeser, what the company's primary engineering hurdles are (which, unsurprisingly, are software-related rather than mechanical) and how it has overcome them.

"An example challenge is the complexity of the content that we are dealing with. Biology and medicine are precise sciences, with tens of thousands of concepts and interrelationships between those," says Loeser. "There are dozens of public ontologies describing the structure of biomedical knowledge. Every company we work with sees this data landscape differently and has different data integration needs. We need to be able to accommodate all of that within our application suite."

"Our approach has been to build our biomedical information store on a semantic RDF (Resource Description Framework) back end. Working together with customers, we are able to craft ontologies that describe precisely their view of the data that they work with and interact with," Loeser continues. "Much of the functionality in the application is configured automatically based on the custom ontologies, providing a fully customized interface for each customer." For high-volume -omics information that is not amenable to RDF storage, the company is building specialized data stores that allow both Big Data-style data manipulation and a semantic view of relevant subsets of the data, he explains.
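To make the idea of an ontology-driven interface more concrete, here is a minimal sketch in Python. It is not Syapse's implementation, which the company has not described in detail. It assumes RDF triples can be modeled as plain (subject, predicate, object) tuples rather than held in a real RDF store, and every ex: class and property name, the form_fields() helper, and its "widget" output are hypothetical. Only rdf:type, rdfs:domain, rdfs:range and rdfs:label are standard RDF Schema terms. The helper reads which properties a customer's ontology attaches to a class and turns each one into a form field, so editing the ontology changes the interface without any code changes.

```python
# A minimal sketch, NOT Syapse's implementation: triples are plain
# (subject, predicate, object) tuples, and all "ex:" names are hypothetical.
RDF_TYPE = "rdf:type"
RDF_PROPERTY = "rdf:Property"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"
RDFS_LABEL = "rdfs:label"

# Hypothetical customer ontology: a Patient class and three properties on it.
ontology = {
    ("ex:Patient", RDF_TYPE, "rdfs:Class"),
    ("ex:hasDiagnosis", RDF_TYPE, RDF_PROPERTY),
    ("ex:hasDiagnosis", RDFS_DOMAIN, "ex:Patient"),
    ("ex:hasDiagnosis", RDFS_RANGE, "ex:Diagnosis"),
    ("ex:hasDiagnosis", RDFS_LABEL, "Diagnosis"),
    ("ex:hasVariant", RDF_TYPE, RDF_PROPERTY),
    ("ex:hasVariant", RDFS_DOMAIN, "ex:Patient"),
    ("ex:hasVariant", RDFS_RANGE, "ex:GeneVariant"),
    ("ex:hasVariant", RDFS_LABEL, "Gene variant"),
    ("ex:birthYear", RDF_TYPE, RDF_PROPERTY),
    ("ex:birthYear", RDFS_DOMAIN, "ex:Patient"),
    ("ex:birthYear", RDFS_RANGE, "xsd:integer"),
    ("ex:birthYear", RDFS_LABEL, "Birth year"),
}


def objects(triples, subject, predicate):
    """Return, sorted, every o such that (subject, predicate, o) is in triples."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def form_fields(triples, cls):
    """Derive an input form for instances of `cls` from the ontology alone.

    Each property whose rdfs:domain is `cls` becomes one field. XSD or
    untyped ranges become literal inputs; class ranges become links to
    instances of that class. (Hypothetical helper for illustration.)
    """
    fields = []
    props = sorted(s for s, p, o in triples if p == RDFS_DOMAIN and o == cls)
    for prop in props:
        ranges = objects(triples, prop, RDFS_RANGE)
        rng = ranges[0] if ranges else "rdfs:Literal"  # default when no range is declared
        labels = objects(triples, prop, RDFS_LABEL)
        label = labels[0] if labels else prop  # fall back to the property name
        is_literal = rng.startswith("xsd:") or rng == "rdfs:Literal"
        fields.append({
            "property": prop,
            "label": label,
            "range": rng,
            "widget": "literal" if is_literal else "link",
        })
    return fields


# A second (hypothetical) customer adds one property to the same class;
# form_fields() picks it up with no code change.
custom = ontology | {
    ("ex:tumorStage", RDF_TYPE, RDF_PROPERTY),
    ("ex:tumorStage", RDFS_DOMAIN, "ex:Patient"),
    ("ex:tumorStage", RDFS_LABEL, "Tumor stage"),
}

for field in form_fields(custom, "ex:Patient"):
    print(field["label"], "->", field["widget"], field["range"])
```

The design choice being illustrated is that the interface logic never names a specific biomedical concept; it only interprets schema-level RDFS terms, which is one plausible way a single application could present a different, customer-specific view per ontology.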

"Looking ahead, we feel that this data architecture uniquely prepares us for the ever-present challenges such as data integration, customer-driven model customization, or precise management of data central to a collaboration," Loeser adds.

Brian Buntz is the editor-in-chief of MPMN. Follow him on Twitter at @brian_buntz.