How Databricks and Machine Learning Are Powering the Future of Genomics

Presented by

Frank Austin Nothaft, Genomics Data Engineer at Databricks

About this talk

With the drastic drop in the cost of sequencing a single genome, many organizations across biotechnology, pharmaceuticals, biomedical research, and agriculture have begun to make use of genome sequencing. While the sequence of a single genome may provide insight about the individual who was sequenced, deriving maximal insight from genomic data requires querying across a cohort of many hundreds to thousands of individuals.

Join this webinar to learn how Databricks, powered by Apache Spark, enables queries across a database of genomic data in interactive time and simplifies the application of machine learning models and statistical tests to genomics data across patients, to derive more insight into the biological processes driven by genomic alterations.

In this webinar, we will:

- Demonstrate how Databricks can rapidly query annotated variants across a cohort of 1,000 samples.
- Look at a case study using Databricks to improve the performance of running an expression quantitative trait loci (eQTL) test across samples from the GEUVADIS project.
- Show how we can parallelize conventional genomics tools using Databricks.
