Human genetics holds the key to understanding the pathogenesis of many devastating diseases, such as type 2 diabetes and Alzheimer’s disease. The discovery, development, and commercialization of a new class of drugs can take 10 to 15 years and more than $5 billion in R&D investment, yet fewer than 5% of those drugs make it to market. Committed to creating therapeutic innovations, Regeneron has built one of the world’s most comprehensive genetics databases to supplement its state-of-the-art drug development pipeline. While these massive volumes of data provide an unprecedented opportunity to gain novel therapeutic insights, Regeneron has encountered a number of challenges on the road to delivering on the promises of big data and genomics in drug discovery. For example, how do you enable fast and accurate queries over more than 80 billion data points? And how do you expedite novel statistical tests on terabyte-scale data?
This presentation will share Regeneron’s vision for building a scalable and performant informatics infrastructure to accelerate genetics-driven drug development. Specifically, we will highlight key challenges in establishing the world’s largest clinical genetics databases, provide an overview of how Regeneron leverages Databricks’ Unified Analytics Platform and Apache Spark, and discuss in detail the key engineering innovations that have already come out of this collaboration.