Far too many people have had the unfortunate experience – for a friend, a relative, or themselves – of a medical diagnosis that comes too late. It is difficult to stay on top of everything that could go wrong with the human body, particularly when you don’t know what you are looking for. This is why I was excited to learn how, through the Intel Science and Technology Center for Big Data, Intel Labs is helping to make a new contribution to the field of genomics.
Genomics is exciting because it offers the potential to assess your risk of contracting a disease by comparing characteristics expressed by your own DNA to patterns found in others. Admittedly, I’m not certain I want to know what fate may have in store for me. But I would want to know as much as I could about my parents, my wife, or my children, so we could actively search for warning signs and take the quickest possible action when and if the time comes.
In a recent ISTC blog, MIT’s Manasi Vartak noted that a single facility can now sequence the genomes of more than 2,000 people per day, creating about six trillion bytes of data in the process. To understand patterns of gene expression, one must look across samples from many people, further compounding the Big Data challenge. It is becoming too much for today’s computing systems. According to Ketan Paranjape, Intel’s Global Director of Healthcare and Life Sciences, as we approach the $1000-per-genome mark, downstream analytics and a final diagnosis still cost somewhere in the $100K-$300K range. New applications, tools, and compute paradigms are needed to make this more affordable.
One major challenge for genomics, as with many Big Data applications, is a lack of standard benchmarks – a set of algorithms that represent an application’s technical needs. Researchers and computer architects need these benchmarks to test new approaches and to compare results with those of their peers.
Manasi and the MIT Database Group are working with our Parallel Computing Lab (led by Pradeep Dubey), Novartis, and the Broad Institute to publish “GenMark” this fall. This new benchmark suite will represent common genomics tasks such as:
- Given the expression of certain genes, predict the expression of other genes
- Find genes that behave similarly in the context of a specific disease (aka biclustering)
- Simplify complex genetic models to a smaller number of key genes (aka SVD)
- Find genes whose expressions are correlated in certain diseases (aka covariance)
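To make the last two tasks concrete, here is a minimal sketch (not part of GenMark, and not its actual code) of how covariance and SVD apply to a gene-expression matrix. The data here is synthetic, and all names and shapes are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_genes = 100, 8  # hypothetical sizes; real studies are far larger

# Synthetic expression matrix: rows are samples (people), columns are genes.
expr = rng.normal(size=(n_samples, n_genes))
# Make genes 0 and 1 co-vary, mimicking co-expression in a disease cohort.
expr[:, 1] = expr[:, 0] + 0.1 * rng.normal(size=n_samples)

# Covariance task: find genes whose expression levels vary together.
cov = np.cov(expr, rowvar=False)  # (n_genes, n_genes) gene-gene covariance

# SVD task: summarize the dataset by its few dominant expression patterns.
centered = expr - expr.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
# The largest singular values mark the key patterns; the rest can be dropped
# to reduce the model to a handful of gene combinations.
print("cov(gene0, gene1):", cov[0, 1])
print("top singular values:", s[:2])
```

A real benchmark would run these same kernels at population scale, which is exactly where the compute challenge described above comes from.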
Building better systems that can find these correlations among greater populations of genetic data will yield actionable knowledge that doctors can use to guide testing and treatment.
Additionally, Ketan is leading a cross-Intel initiative called “Compute for Personalized Medicine” to help identify and solve the affordability challenges in genomics. He believes that the entire patient ecosystem, comprising payers (insurance companies, government), providers (hospitals, clinics), the pharmaceutical industry, and patients themselves, must collaborate across these silos to realize the vision of precision medicine. I look forward to a day when we can apply this knowledge to eliminate as many surprises and lost treatment opportunities as possible.