"A molecule in a haystack: Find drug targets in cancer omics data"
Background
Twenty-first-century biology has developed large-scale methods for genomics, epigenomics and proteomics that generate vast amounts of data. These data include DNA sequences, gene mutations, epigenetic modifications, gene expression, post-transcriptional regulation, protein levels, drug-protein interactions, clinical parameters and much more. But while we generate lots of data, we lack methods to efficiently store, manage and analyze them.
We need better solutions to robustly link the knowledge from molecular omics data to clinically relevant covariates. In this challenge, we try to identify new drug targets by integrating public data sets from cancer-related omics experiments.
Problem
Before we can do anything with the data, we need to get an overview, see the connections and understand the relevant biological questions. To do that, we have to clean, structure and integrate everything and enrich it with prior knowledge. This enables the first steps in data analysis, such as identifying relevant genes and disease-specific regulation. To facilitate this, we will develop new ways to store the data and design NoSQL database models. Options are Elasticsearch, MongoDB, Cassandra or Neo4j.
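To make the idea of an integrated data model more concrete, here is a minimal sketch of how expression data and drug-target annotations could be combined in a graph-like structure and queried for candidate targets. It uses plain Python rather than any of the databases named above. All gene, drug and cancer names, the numeric values, the relationship names (EXPRESSED_IN, TARGETS) and the threshold are made-up assumptions for illustration, not results or a proposed schema.

```python
from collections import defaultdict

# Hypothetical toy records standing in for cleaned omics and drug data.
# Names and values are illustrative assumptions, not real measurements.
expression = [
    {"gene": "GENE_A", "cancer": "cancer_x", "log2_fold_change": 2.5},
    {"gene": "GENE_B", "cancer": "cancer_x", "log2_fold_change": 0.1},
    {"gene": "GENE_C", "cancer": "cancer_x", "log2_fold_change": -1.8},
]
drug_targets = [
    {"drug": "drug_1", "gene": "GENE_A"},
    {"drug": "drug_2", "gene": "GENE_B"},
]


class Graph:
    """Minimal in-memory property graph: labeled nodes and typed edges."""

    def __init__(self):
        self.nodes = {}                 # (label, name) -> properties
        self.edges = defaultdict(list)  # source -> [(rel_type, target, props)]

    def add_node(self, label, name, **props):
        # Re-adding an existing node merges its properties instead of duplicating it.
        self.nodes.setdefault((label, name), {}).update(props)
        return (label, name)

    def add_edge(self, source, rel_type, target, **props):
        self.edges[source].append((rel_type, target, props))


def build_graph(expression, drug_targets):
    """Integrate both record sets into one graph keyed by shared gene names."""
    g = Graph()
    for row in expression:
        gene = g.add_node("Gene", row["gene"])
        cancer = g.add_node("Cancer", row["cancer"])
        g.add_edge(gene, "EXPRESSED_IN", cancer,
                   log2_fold_change=row["log2_fold_change"])
    for row in drug_targets:
        drug = g.add_node("Drug", row["drug"])
        gene = g.add_node("Gene", row["gene"])
        g.add_edge(drug, "TARGETS", gene)
    return g


def candidate_targets(g, cancer, threshold=1.0):
    """Map each strongly regulated gene in `cancer` to its known drugs."""
    drugs_by_gene = defaultdict(list)
    for source, rels in g.edges.items():
        for rel_type, target, _ in rels:
            if rel_type == "TARGETS":
                drugs_by_gene[target[1]].append(source[1])
    result = {}
    for source, rels in g.edges.items():
        for rel_type, target, props in rels:
            if (rel_type == "EXPRESSED_IN"
                    and target == ("Cancer", cancer)
                    and abs(props["log2_fold_change"]) >= threshold):
                result[source[1]] = sorted(drugs_by_gene.get(source[1], []))
    return result


graph = build_graph(expression, drug_targets)
print(candidate_targets(graph, "cancer_x"))
```

In this toy example, a strongly regulated gene with an empty drug list is the kind of entry one might inspect further as a possible new target. In a real graph database the same question would be expressed as a query over nodes and relationships instead of nested loops.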
Actual Challenge
Challenge owner: Martin Preusse (Neo4j)