Bayesian Approaches to Functional Integration of Genomic Data

Dr. Jingjing Yang
Department of Human Genetics
Emory University

Although genome-wide association studies (GWAS) have identified thousands of SNP-trait associations (>55K reported in the GWAS Catalog), the biological mechanisms underlying these associations are largely unknown. Here, we propose a Bayesian variable selection model that integrates variant functional annotations to help understand and prioritize causal variants and mechanisms. Our method improves upon previous approaches by accounting for multiple categories of functional annotations and for genotype correlation due to linkage disequilibrium (LD) and, importantly, by quantifying the proportion of causal variants and the relative effect sizes of variants with different functional annotations. To apply our model to very large GWAS and sequencing data sets, we present a novel, scalable Bayesian computation method based on a block-wise expectation-maximization Markov chain Monte Carlo (EM-MCMC) algorithm. Our algorithm dramatically improves both computational speed and posterior sampling convergence by taking advantage of the block-like LD structure of the human genome. In simulations, we show that our method increases power and identifies more true signals than competing methods. In real data, we show that previous greedy approaches and MCMC implementations lead to apparently sub-optimal sets of likely causal variants because they fail to fully explore the set of possible causal variants. We applied our method to a genome-wide association study of age-related macular degeneration with ~33,000 individuals and >12 million genotyped and imputed variants. Our results show that non-synonymous markers are about 20 times more likely to be causal than other markers, and that the effect size of associated non-synonymous variants is about 3 times larger than that of other variants. Importantly, our method can help prioritize likely functional candidates for follow-up while disentangling the effects of genotype, linkage disequilibrium, and functional annotation.
Further, we implemented this method using only summary-level data from standard GWAS, which saves up to 85% of CPU time while producing the same results as the analysis of individual-level data. In conclusion, our method has the potential to shed light on the biological mechanisms of SNP associations and can help prioritize SNPs for downstream analysis.

Host: Greg Gibson

Event Details


  • Thursday, February 15, 2018
    10:55 pm
Location: Room 1005, Roger A. and Helen B. Krone Engineered Biosystems Building (EBB), 950 Atlantic Dr NW, Atlanta, GA 30332

For More Information Contact

If you have questions about logistics or would like to set up an appointment with the speaker, please contact the School of Biological Sciences' administrative office at