Disease modules in molecular interaction maps have proven useful for characterizing diseases. Yet the biological networks commonly used to define such modules are incomplete and biased toward well-studied disease genes. Here we ask whether disease-relevant modules of genes can be discovered without assuming prior knowledge of a biological network. To this end, we train a deep auto-encoder on a large transcriptional dataset. Our hypothesis is that such modules can be discovered in the deep representations within the auto-encoder when it is trained to capture the variance in the input-output map of the transcriptional profiles. Using a three-layer deep auto-encoder, we find a statistically significant enrichment of GWAS-relevant genes in the third layer, and successively weaker enrichment in the second and first layers. In contrast, we find an opposite gradient for the modular protein-protein interaction signal, which is strongest in the first layer and vanishes smoothly deeper in the network. We conclude that a data-driven discovery approach, without assuming a particular biological network, is sufficient to discover groups of disease-related genes.
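To make the notion of "statistically significant enrichment" concrete, the following is a minimal sketch of a one-sided hypergeometric over-representation test. This is an assumption for illustration only: the abstract does not state which enrichment test was used, and the function name, parameter names, and toy numbers are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_background, n_disease, n_module, n_overlap):
    """One-sided hypergeometric p-value (illustrative, not the authors' method).

    Returns P(X >= n_overlap), where X is the number of disease-associated
    genes obtained when n_module genes are drawn without replacement from
    n_background genes, n_disease of which are disease-associated.
    """
    max_overlap = min(n_disease, n_module)
    total = comb(n_background, n_module)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(n_disease, k) * comb(n_background - n_disease, n_module - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical toy numbers: 10 background genes, 3 of them GWAS-associated,
# and a candidate module of 3 genes that are all GWAS-associated.
p_value = hypergeom_enrichment_pvalue(10, 3, 3, 3)
print(p_value)
```

Under these assumptions, a small p-value for the genes associated with a given layer would indicate that the overlap with GWAS-relevant genes is larger than expected by chance; in practice, p-values computed across many modules would also need a multiple-testing correction.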
Deriving Disease Modules from the Compressed Transcriptional Space Embedded in a Deep Auto-encoder