Genetics; Public Health; Computational Biology; Statistics; Genomics; Herbal Medicine; Proteomics; Biostatistics; Microbiota
Center for Polycystic Kidney Disease Research
Keck: High Performance Computation | NIDA Neuroproteomics Center
Our research is driven by the need to analyze and interpret large and complex data sets in biomedical research. For example, in genome-wide association studies involving thousands of individuals, millions of DNA variants are analyzed for each person. Such data offer the opportunity to identify variants affecting disease susceptibility and to develop risk prediction models that facilitate disease prevention and treatment. Many statistical challenges arise in the analysis of such data, including the very high dimensionality, the relatively weak signals, and the need to incorporate prior knowledge and other data sets into the analysis. Another example is the analysis of next-generation sequencing data, which present even greater statistical and computational challenges. Our group has been developing statistical methods to address these challenges, such as empirical Bayes methods to borrow information across data sets, various generalizations of Gaussian graphical models for network inference, Markov random field models for spatial and temporal modeling, and general machine learning methods for high-dimensional data.
Specialized Terms: Statistical genomics and proteomics; Bioinformatics; Data integration; High dimensional data; Network and graphical models; Disease risk prediction; Herbal medicine; Microbiome
Extensive Research Description
- Genome-Wide Association Studies: We are developing statistical methods that integrate diverse data types and prior biological knowledge to identify genes for common diseases and to build risk prediction models. The diseases we work on include Crohn's disease, substance abuse, schizophrenia, bipolar disorder, obesity, aneurysm, and autism.
- Network Modeling: We are developing statistical methods to model biological networks under the general framework of Gaussian and other graphical models. Specific networks we are working on include gene expression regulatory networks, signaling networks, and eQTL networks.
- Cancer Genomics: We are developing statistical and computational methods to analyze cancer genomics data, e.g. from microarrays and next-generation sequencing, to identify cancer subtypes, driver mutations, and appropriate treatments for cancer patients.
- Microbiome: We are developing modeling and analysis approaches for microbiome data generated by next-generation sequencing.
- Proteomics: Our current focus is on targeted proteomics, such as Multiple Reaction Monitoring.
- Herbal Medicine: Through a systems biology approach, we are identifying tissue-specific target pathways of herbal medicine.
Selected Publications
- D. Chung, C. Yang, C. Li, J. Gelernter, H. Zhao (2014) GPA: A statistical approach to prioritizing GWAS results by integrating pleiotropy and annotation. PLoS Genetics, in press.
- B. Li, H. Chun, H. Zhao (2014) On an additive semi-graphoid model for statistical networks with application to pathway analysis. Journal of the American Statistical Association, in press.
- X. Chen, D. Chung, G. Stefani, F. J. Slack, H. Zhao (2014) Statistical issues in binding site identification through CLIP-seq. Statistics and Its Interface, in press.
- L. Chung, C. Colangelo, H. Zhao (2014) Data pre-processing for label-free multiple reaction monitoring (MRM) experiments. Biology, 3: 383-402.
- L. Hou, M. Chen, C. K. Zhang, J. Cho, H. Zhao (2014) Guilt by Rewiring: Gene prioritization through network rewiring in genome wide association studies. Human Molecular Genetics, 23: 2780-2790.
- C. Li, C. Yang, J. Gelernter, H. Zhao (2014) Improving genetic risk prediction by leveraging pleiotropy. Human Genetics, 133: 639-650.
- C. Yang, L. Wang, S. Zhang, H. Zhao (2013) Accounting for non-genetic factors by low-rank representation and sparse regression for eQTL mapping. Bioinformatics, 29: 1026-1034.
- L. Wang, W. Zheng, H. Zhao, M. Deng (2013) Statistical analysis reveals co-expression patterns of many pairs of genes in yeast are jointly regulated by interacting loci. PLoS Genetics, 9: e1003414.
- X. Qi, H. Zhao (2013) Sparse principal component analysis by choice of norm. Journal of Multivariate Analysis, 114: 127-160.
- J. Ferguson, C. Yang, J. Cho, H. Zhao (2013) Empirical Bayes correction for the winner's curse in genetic association studies. Genetic Epidemiology, 37: 60-68.
- B. Li, H. Chun, H. Zhao (2012) Sparse estimation of conditional graphical models with application to gene networks. Journal of the American Statistical Association, 107: 152-167.
- H. Ma, H. Zhao (2012) iFad: an integrative factor analysis model for drug-pathway association inference. Bioinformatics, 28: 1911-1918.
- R. Luo, H. Zhao (2011) Bayesian hierarchical modeling for signaling pathway inference from single cell interventional data. Annals of Applied Statistics, 5: 725-745.
- M. Chen, J. Cho, H. Zhao (2011) Incorporating biological pathways via a Markov random field model in genome-wide association studies. PLoS Genetics, 7: e1001353.