Biochemistry; Biophysics; DNA; Medical Informatics; Computational Biology; Genomics
We do research in bioinformatics, applying computational approaches to problems in molecular biology. Broadly, we are interested in large-scale analyses of genome sequences, macromolecular structures, and functional-genomics datasets. We hope these analyses will allow us to address a number of overall statistical questions about macromolecules, relating to their physical properties, cellular function, interactions, and phylogenetic distribution. We are especially focused on the human genome and proteome. Our research involves a number of quantitative techniques, including database design, systematic data mining and machine learning, visualization of high-dimensional data, and molecular simulation. More specifically, we focus on three questions. First, we are interested in annotating the raw human genome sequence, especially in characterizing the vast intergenic regions. Next, we are trying to determine the function of all the genes encoded by the genome. Here, we try to characterize function on a large scale through the use of molecular networks. Finally, for the group of protein-coding genes that have known 3D structures, we are trying to see how their function is carried out through motion and how motion can be predicted from packing geometry.
Specialized Terms: Biochemistry; Bioinformatics; Biophysics; Computational Biology; DNA; Genomics; Molecular Simulation; Proteins; Sequence Alignment; Structural Biology
Extensive Research Description
Soon, sequencing one’s genome may become as commonplace as getting an X-ray. Consequently, personal genomes will increasingly serve as the lenses through which the public views biology. In response, the Gerstein Lab focuses on interpreting personal genomes, particularly in relation to disorders such as cancer. This endeavor has a number of related aspects, described below. Moreover, the approaches we take have broad connections to a variety of data-intensive fields within the emerging discipline of data science.
Personal Genome Variation: SVs
We are involved in finding variants in personal genomes. We focus on particular types of variants that involve the rearrangement of large blocks of the genome (structural variation). Structural variants are believed to involve as many nucleotides in the genome as the better-known SNPs. Moreover, rearrangements are very prevalent in genomic diseases such as cancer, and we have developed tools for identifying them (e.g. using split reads and fusion genes).
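To illustrate the split-read idea in the simplest terms: a read that spans a deletion will not align in one piece, but its two parts align to reference positions separated by the deleted block. The sketch below is a toy under strong assumptions (exact string matching, first hit wins, made-up sequences). The function name `find_deletion_by_split_read` and the `min_anchor` parameter are hypothetical and do not correspond to any published tool.

```python
def find_deletion_by_split_read(ref, read, min_anchor=5):
    """Return (start, end) of the reference block skipped by a split read, or None.

    Toy model: exact matching only; real callers work from alignments
    and tolerate mismatches, repeats, and sequencing errors.
    """
    # Try every split point that leaves at least min_anchor bases on each side.
    for i in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:i], read[i:]
        p = ref.find(prefix)
        if p < 0:
            continue
        # Look for the suffix at or after the end of the prefix match.
        q = ref.find(suffix, p + i)
        # q == p + i means the read is contiguous; q > p + i implies a gap.
        if q > p + i:
            return (p + i, q)
    return None


# Made-up sequences for illustration only.
left = "TTGACCATGCAGGCTA"
deleted = "CCCGGGTTT"
right = "AGTCCGATAGCTTGA"
ref = left + deleted + right

# A read from a genome lacking the middle block.
read = left[-10:] + right[:10]
breakpoints = find_deletion_by_split_read(ref, read)
```

Here `breakpoints` marks the half-open reference interval that the read skips, so `ref[breakpoints[0]:breakpoints[1]]` recovers the deleted block.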
Human Genome Annotation: Processing Next-Gen Sequencing Data
After one has determined all of the variants in an individual’s genome, the next step is understanding what they mean. This involves genome annotation, where one places each base within a biochemical context. Our focus has been on transcription-factor binding sites and non-coding RNAs (ncRNAs). We have carried out this effort by processing next-generation sequencing data (i.e. RNA-seq and ChIP-seq). We have developed tools to identify ncRNAs and regions of intragenic transcription. We have also developed methods for finding transcription-factor binding sites by processing ChIP-seq reads and for using the level of this binding to statistically predict the expression of target genes.
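As a rough illustration of predicting expression from binding level, the sketch below fits an ordinary least-squares line to paired values. This is a generic stand-in, not the statistical model used in the work described above, which this description does not specify. The numbers are made up, and `fit_line` is a hypothetical helper.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up values: binding signal near each gene and that gene's expression level.
binding = [0.0, 1.0, 2.0, 3.0, 4.0]
expression = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(binding, expression)

# Predicted expression for a gene with an unseen binding level.
predicted = slope * 2.5 + intercept
```

In practice one would expect many binding features per gene and a noisier relationship, so a multivariate or non-linear model would likely replace this single-variable fit.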
Comparative Genomics: Pseudogenes as Molecular Fossils
Pseudogenes provide a contrasting annotation to binding sites and ncRNAs in being derived from comparative rather than functional genomics data. They provide information about human molecular history. We have developed methods for identifying them. We were one of the first groups to perform comprehensive surveys, illustrating the different pseudogene repertoires in different organisms. Moreover, we have found hints that some supposedly "dead" pseudogenes may actually harbor biochemical activity.
Protein Structure and Function: Macromolecular Motions
While non-coding regions play an important, if underappreciated, role in genome function and disease, we also work on characterizing coding sequences, drilling deep into their protein products. We have a particular focus on loss-of-function mutations. Moreover, by analyzing protein motions we can better predict how a mutation affects function. This effort involves devising a system for characterizing motions in a standardized fashion in terms of key statistics, such as the degree of rotation about hinges. It is guided by the fact that protein mobility is highly restricted by tight packing. We have developed tools for measuring packing efficiency using specialized geometric constructions (e.g. Voronoi polyhedra).
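One of the statistics mentioned above, the degree of rotation about a hinge, can be read off a rotation matrix using the standard relation between a rotation's angle and its trace, trace(R) = 1 + 2 cos(theta). The sketch below assumes the 3x3 matrix relating a domain's orientation in two conformations has already been obtained (for example by structural superposition, which is not shown). The helper names are hypothetical.

```python
import math


def rotation_angle_deg(R):
    """Rotation angle in degrees of a 3x3 rotation matrix given as nested lists."""
    trace = R[0][0] + R[1][1] + R[2][2]
    # Clamp to [-1, 1] to guard against floating-point drift.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def rot_z(deg):
    """Rotation about the z axis, used here only to build a test matrix."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


# A domain that swings 30 degrees about a hinge axis aligned with z.
angle = rotation_angle_deg(rot_z(30.0))
```

The trace formula gives only the magnitude of the rotation; recovering the hinge axis itself would require the antisymmetric part of the matrix or an eigenvector computation.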
Analysis of Diverse Networks
Networks are a way of tying together much of our research. Network representations can be applied consistently to many different types of biological data; thus, we have developed tools to build and analyze regulatory networks, protein-protein interactions and metabolic pathways, identifying key nodes such as hubs and bottlenecks. Moreover, because they are a generic and flexible representation, networks provide an ideal framework for data integration. We have integrated networks with dynamic gene-expression data (identifying transient hubs), 3D-protein structures, and even satellite imagery. Finally, as people have more intuition for commonplace networks, such as those in social and computer systems, we have found cross-disciplinary comparisons helpful in elucidating system-level properties of biological networks, such as the association of greater connectivity with more evolutionary constraint.
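To make the terms concrete: hubs are commonly taken to be nodes with high degree, and bottlenecks nodes with high betweenness (many shortest paths pass through them). The sketch below computes both on a made-up toy network, using Brandes' algorithm for betweenness on an unweighted, undirected graph. It is an illustration under those common definitions, not the lab's own software, and the node names are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    return {node: len(neighbors) for node, neighbors in adj.items()}


def betweenness(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                      # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}    # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}     # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        # Accumulate dependencies from the farthest nodes back toward s.
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


# Toy network: "hub" has many partners; "bridge" has few but links two regions.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"),
         ("hub", "bridge"), ("bridge", "k"), ("k", "x"), ("k", "y")]
adj = build_adjacency(edges)
deg = degree(adj)
bc = betweenness(adj)
```

In this toy network `bridge` has only two partners yet ranks second in betweenness, which is the sense in which a bottleneck need not be a hub.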
Genomics at the Forefront of Data Science
Overall, the Gerstein lab acts as a connector, bringing quantitative approaches from disciplines such as computer science and statistics to bear on practical questions and large-scale data in molecular biology. In particular, we have focused on applying technical approaches in simulation, machine learning, and knowledgebase design. Often, we carry out our work in multi-disciplinary teams. Some of the key collaborative efforts that we are involved in include KBase, Brainspan, ENCODE, modENCODE, 1000 Genomes, PCAWG, the exRNA Consortium and the Centers for Mendelian Genomics.
As a discipline, genomics is an exemplar for using big data to construct a resource and answer questions. Consequently, it is at the forefront in the emerging field of data science and provides an ideal training for future data scientists.
Personal genomics also acts as a bridge connecting the biological sciences to larger issues facing other big-data disciplines. For instance, data mining generally poses questions related to privacy. We study the fundamental privacy implications of mining personal genomes, which contain immutable information shared amongst relatives, information that will be increasingly revealing in generations to come. Also, we have examined how general knowledge-representation issues associated with publishing and digital libraries relate to biological databases. We envision a future of structured literature, with less distinction between databases and journals.
The real life of pseudogenes
Gerstein, M. and Zheng, D. (2006). The real life of pseudogenes. Sci. Am. 295: 48-55.
Integrative annotation of variants from 1092 humans: application to cancer genomics
E Khurana, Y Fu, V Colonna, XJ Mu, HM Kang, T Lappalainen, A Sboner, L Lochovsky, J Chen, A Harmanci, J Das, A Abyzov, S Balasubramanian, K Beal, D Chakravarty, D Challis, Y Chen, D Clarke, L Clarke, F Cunningham, US Evani, P Flicek, R Fragoza, E Garrison, R Gibbs, ZH Gumus, J Herrero, N Kitabayashi, Y Kong, K Lage, V Liluashvili, SM Lipkin, DG MacArthur, G Marth, D Muzny, TH Pers, GR Ritchie, JA Rosenfeld, C Sisu, X Wei, M Wilson, Y Xue, F Yu, 1000 Genomes Project Consortium, ET Dermitzakis, H Yu, MA Rubin, C Tyler-Smith, M Gerstein (2013). Science 342: 1235587
Architecture of the human regulatory network derived from ENCODE data
MB Gerstein, A Kundaje, M Hariharan, SG Landt, KK Yan, C Cheng, XJ Mu, E Khurana, J Rozowsky, R Alexander, R Min, P Alves, A Abyzov, N Addleman, N Bhardwaj, AP Boyle, P Cayting, A Charos, DZ Chen, Y Cheng, D Clarke, C Eastman, G Euskirchen, S Frietze, Y Fu, J Gertz, F Grubert, A Harmanci, P Jain, M Kasowski, P Lacroute, J Leng, J Lian, H Monahan, H O'Geen, Z Ouyang, EC Partridge, D Patacsil, F Pauli, D Raha, L Ramirez, TE Reddy, B Reed, M Shi, T Slifer, J Wang, L Wu, X Yang, KY Yip, G Zilberman-Schapira, S Batzoglou, A Sidow, PJ Farnham, RM Myers, SM Weissman, M Snyder (2012). Nature 489:91-100.
Relating three-dimensional structures to protein networks provides evolutionary insights
PM Kim, LJ Lu, Y Xia, MB Gerstein (2006). Science 314:1938-41.
Full List of PubMed Publications
- Despic V, Dejung M, Gu M, Krishnan J, Zhang J, Herzel L, Straube K, Gerstein MB, Butter F, Neugebauer KM: Dynamic RNA-protein interactions underlie the zebrafish maternal-to-zygotic transition. Genome Res. 2017 Apr 5; 2017 Apr 5. PMID: 28381614
- Li S, Shuch BM, Gerstein MB: Whole-genome analysis of papillary kidney cancer finds significant noncoding alterations. PLoS Genet. 2017 Mar; 2017 Mar 30. PMID: 28358873
- Choo SW, Rayko M, Tan TK, Hari R, Komissarov A, Wee WY, Yurchenko AA, Kliver S, Tamazian G, Antunes A, Wilson RK, Warren WC, Koepfli KP, Minx P, Krasheninnikova K, Kotze A, Dalton DL, Vermaak E, Paterson IC, Dobrynin P, Sitam FT, Rovie-Ryan JJ, Johnson WE, Yusoff AM, Luo SJ, Karuppannan KV, Fang G, Zheng D, Gerstein MB, Lipovich L, O'Brien SJ, Wong GJ: Pangolin genomes and the evolution of mammalian scales and immunity. Genome Res. 2016 Oct; 2016 Aug 10. PMID: 27510566
- Cheung KH, Keerthikumar S, Roncaglia P, Subramanian SL, Roth ME, Samuel M, Anand S, Gangoda L, Gould S, Alexander R, Galas D, Gerstein MB, Hill AF, Kitchen RR, Lötvall J, Patel T, Procaccini DC, Quesenberry P, Rozowsky J, Raffai RL, Shypitsyna A, Su AI, Théry C, Vickers K, Wauben MH, Mathivanan S, Milosavljevic A, Laurent LC: Extending gene ontology in the context of extracellular RNA and vesicle communication. J Biomed Semantics. 2016; 2016 Apr 12. PMID: 27076901
- Abyzov A, Li S, Gerstein MB: Understanding genome structural variations. Oncotarget. 2016 Feb 16. PMID: 26657727
- PsychENCODE Consortium., Akbarian S, Liu C, Knowles JA, Vaccarino FM, Farnham PJ, Crawford GE, Jaffe AE, Pinto D, Dracheva S, Geschwind DH, Mill J, Nairn AC, Abyzov A, Pochareddy S, Prabhakar S, Weissman S, Sullivan PF, State MW, Weng Z, Peters MA, White KP, Gerstein MB, Amiri A, Armoskus C, Ashley-Koch AE, Bae T, Beckel-Mitchener A, Berman BP, Coetzee GA, Coppola G, Francoeur N, Fromer M, Gao R, Grennan K, Herstein J, Kavanagh DH, Ivanov NA, Jiang Y, Kitchen RR, Kozlenkov A, Kundakovic M, Li M, Li Z, Liu S, Mangravite LM, Mattei E, Markenscoff-Papadimitriou E, Navarro FC, North N, Omberg L, Panchision D, Parikshak N, Poschmann J, Price AJ, Purcaro M, Reddy TE, Roussos P, Schreiner S, Scuderi S, Sebra R, Shibata M, Shieh AW, Skarica M, Sun W, Swarup V, Thomas A, Tsuji J, van Bakel H, Wang D, Wang Y, Wang K, Werling DM, Willsey AJ, Witt H, Won H, Wong CC, Wray GA, Wu EY, Xu X, Yao L, Senthil G, Lehner T, Sklar P, Sestan N: The PsychENCODE project. Nat Neurosci. 2015 Dec. PMID: 26605881
- Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Hsi-Yang Fritz M, Konkel MK, Malhotra A, Stütz AM, Shi X, Paolo Casale F, Chen J, Hormozdiari F, Dayama G, Chen K, Malig M, Chaisson MJ, Walter K, Meiers S, Kashin S, Garrison E, Auton A, Lam HY, Jasmine Mu X, Alkan C, Antaki D, Bae T, Cerveira E, Chines P, Chong Z, Clarke L, Dal E, Ding L, Emery S, Fan X, Gujral M, Kahveci F, Kidd JM, Kong Y, Lameijer EW, McCarthy S, Flicek P, Gibbs RA, Marth G, Mason CE, Menelaou A, Muzny DM, Nelson BJ, Noor A, Parrish NF, Pendleton M, Quitadamo A, Raeder B, Schadt EE, Romanovitch M, Schlattl A, Sebra R, Shabalin AA, Untergasser A, Walker JA, Wang M, Yu F, Zhang C, Zhang J, Zheng-Bradley X, Zhou W, Zichner T, Sebat J, Batzer MA, McCarroll SA, 1000 Genomes Project Consortium., Mills RE, Gerstein MB, Bashir A, Stegle O, Devine SE, Lee C, Eichler EE, Korbel JO: An integrated map of structural variation in 2,504 human genomes. Nature. 2015 Oct 1. PMID: 26432246
- Mu JC, Tootoonchi Afshar P, Mohiyuddin M, Chen X, Li J, Bani Asadi N, Gerstein MB, Wong WH, Lam HY: Leveraging long read sequencing from a single individual to provide a comprehensive resource for benchmarking variant calling methods. Sci Rep. 2015 Sep 28; 2015 Sep 28. PMID: 26412485
- Fang LT, Afshar PT, Chhibber A, Mohiyuddin M, Fan Y, Mu JC, Gibeling G, Barr S, Asadi NB, Gerstein MB, Koboldt DC, Wang W, Wong WH, Lam HY: An ensemble approach to accurately detect somatic mutations using SomaticSeq. Genome Biol. 2015 Sep 17; 2015 Sep 17. PMID: 26381235
- Duffy EE, Rutenberg-Schoenberg M, Stark CD, Kitchen RR, Gerstein MB, Simon MD: Tracking Distinct RNA Populations Using Efficient and Reversible Covalent Chemistry. Mol Cell. 2015 Sep 3. PMID: 26340425
- Subramanian SL, Kitchen RR, Alexander R, Carter BS, Cheung KH, Laurent LC, Pico A, Roberts LR, Roth ME, Rozowsky JS, Su AI, Gerstein MB, Milosavljevic A: Integration of extracellular RNA profiling data using metadata, biomedical ontologies and Linked Data technologies. J Extracell Vesicles. 2015; 2015 Aug 28. PMID: 26320941
- Mohiyuddin M, Mu JC, Li J, Bani Asadi N, Gerstein MB, Abyzov A, Wong WH, Lam HY: MetaSV: an accurate and integrative structural-variant caller for next generation sequencing. Bioinformatics. 2015 Aug 15; 2015 Apr 10. PMID: 25861968
- Abyzov A, Li S, Kim DR, Mohiyuddin M, Stütz AM, Parrish NF, Mu XJ, Clark W, Chen K, Hurles M, Korbel JO, Lam HY, Lee C, Gerstein MB: Analysis of deletion breakpoints from 1,092 humans reveals details of mutation mechanisms. Nat Commun. 2015 Jun 1; 2015 Jun 1. PMID: 26028266
- Mu JC, Mohiyuddin M, Li J, Bani Asadi N, Gerstein MB, Abyzov A, Wong WH, Lam HY: VarSim: a high-fidelity simulation and validation framework for high-throughput genome sequencing with cancer applications. Bioinformatics. 2015 May 1; 2014 Dec 17. PMID: 25524895
- Wang D, Yan KK, Sisu C, Cheng C, Rozowsky J, Meyerson W, Gerstein MB: Loregic: a method to characterize the cooperative logic of regulatory factors. PLoS Comput Biol. 2015 Apr; 2015 Apr 17. PMID: 25884877
- Kitchen RR, Rozowsky JS, Gerstein MB, Nairn AC: Decoding neuroproteomics: integrating the genome, translatome and functional anatomy. Nat Neurosci. 2014 Nov; 2014 Oct 28. PMID: 25349915
- Sisu C, Pei B, Leng J, Frankish A, Zhang Y, Balasubramanian S, Harte R, Wang D, Rutenberg-Schoenberg M, Clark W, Diekhans M, Rozowsky J, Hubbard T, Harrow J, Gerstein MB: Comparative analysis of pseudogenes across three phyla. Proc Natl Acad Sci U S A. 2014 Sep 16; 2014 Aug 25. PMID: 25157146
- Gerstein MB, Rozowsky J, Yan KK, Wang D, Cheng C, Brown JB, Davis CA, Hillier L, Sisu C, Li JJ, Pei B, Harmanci AO, Duff MO, Djebali S, Alexander RP, Alver BH, Auerbach R, Bell K, Bickel PJ, Boeck ME, Boley NP, Booth BW, Cherbas L, Cherbas P, Di C, Dobin A, Drenkow J, Ewing B, Fang G, Fastuca M, Feingold EA, Frankish A, Gao G, Good PJ, Guigó R, Hammonds A, Harrow J, Hoskins RA, Howald C, Hu L, Huang H, Hubbard TJ, Huynh C, Jha S, Kasper D, Kato M, Kaufman TC, Kitchen RR, Ladewig E, Lagarde J, Lai E, Leng J, Lu Z, MacCoss M, May G, McWhirter R, Merrihew G, Miller DM, Mortazavi A, Murad R, Oliver B, Olson S, Park PJ, Pazin MJ, Perrimon N, Pervouchine D, Reinke V, Reymond A, Robinson G, Samsonova A, Saunders GI, Schlesinger F, Sethi A, Slack FJ, Spencer WC, Stoiber MH, Strasbourger P, Tanzer A, Thompson OA, Wan KH, Wang G, Wang H, Watkins KL, Wen J, Wen K, Xue C, Yang L, Yip K, Zaleski C, Zhang Y, Zheng H, Brenner SE, Graveley BR, Celniker SE, Gingeras TR, Waterston R: Comparative analysis of the transcriptome across distant species. Nature. 2014 Aug 28. PMID: 25164755
- Miller JA, Ding SL, Sunkin SM, Smith KA, Ng L, Szafer A, Ebbert A, Riley ZL, Royall JJ, Aiona K, Arnold JM, Bennet C, Bertagnolli D, Brouner K, Butler S, Caldejon S, Carey A, Cuhaciyan C, Dalley RA, Dee N, Dolbeare TA, Facer BA, Feng D, Fliss TP, Gee G, Goldy J, Gourley L, Gregor BW, Gu G, Howard RE, Jochim JM, Kuan CL, Lau C, Lee CK, Lee F, Lemon TA, Lesnar P, McMurray B, Mastan N, Mosqueda N, Naluai-Cecchini T, Ngo NK, Nyhus J, Oldre A, Olson E, Parente J, Parker PD, Parry SE, Stevens A, Pletikos M, Reding M, Roll K, Sandman D, Sarreal M, Shapouri S, Shapovalova NV, Shen EH, Sjoquist N, Slaughterbeck CR, Smith M, Sodt AJ, Williams D, Zöllei L, Fischl B, Gerstein MB, Geschwind DH, Glass IA, Hawrylycz MJ, Hevner RF, Huang H, Jones AR, Knowles JA, Levitt P, Phillips JW, Sestan N, Wohnoutka P, Dang C, Bernard A, Hohmann JG, Lein ES: Transcriptional landscape of the prenatal human brain. Nature. 2014 Apr 10; 2014 Apr 2. PMID: 24695229
- Pei B, Sisu C, Frankish A, Howald C, Habegger L, Mu XJ, Harte R, Balasubramanian S, Tanzer A, Diekhans M, Reymond A, Hubbard TJ, Harrow J, Gerstein MB: The GENCODE pseudogene resource. Genome Biol. 2012 Sep 26; 2012 Sep 26. PMID: 22951037
- Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, Mu XJ, Khurana E, Rozowsky J, Alexander R, Min R, Alves P, Abyzov A, Addleman N, Bhardwaj N, Boyle AP, Cayting P, Charos A, Chen DZ, Cheng Y, Clarke D, Eastman C, Euskirchen G, Frietze S, Fu Y, Gertz J, Grubert F, Harmanci A, Jain P, Kasowski M, Lacroute P, Leng J, Lian J, Monahan H, O'Geen H, Ouyang Z, Partridge EC, Patacsil D, Pauli F, Raha D, Ramirez L, Reddy TE, Reed B, Shi M, Slifer T, Wang J, Wu L, Yang X, Yip KY, Zilberman-Schapira G, Batzoglou S, Sidow A, Farnham PJ, Myers RM, Weissman SM, Snyder M: Architecture of the human regulatory network derived from ENCODE data. Nature. 2012 Sep 6. PMID: 22955619
- Clarke D, Bhardwaj N, Gerstein MB: Novel insights through the integration of structural and functional genomics data with protein networks. J Struct Biol. 2012 Sep; 2012 Feb 11. PMID: 22343087
- Cotney J, Leng J, Oh S, DeMare LE, Reilly SK, Gerstein MB, Noonan JP: Chromatin state signatures associated with tissue-specific gene expression and enhancer activity in the embryonic limb. Genome Res. 2012 Jun; 2012 Mar 15. PMID: 22421546
- Cheng C, Shou C, Yip KY, Gerstein MB: Genome-wide analysis of chromatin features identifies histone modification sensitive and insensitive yeast transcription factors. Genome Biol. 2011 Nov 7; 2011 Nov 7. PMID: 22060676
- Bhardwaj N, Abyzov A, Clarke D, Shou C, Gerstein MB: Integration of protein motions with molecular networks reveals different mechanisms for permanent and transient interactions. Protein Sci. 2011 Oct; 2011 Sep 15. PMID: 21826754
- Mu XJ, Lu ZJ, Kong Y, Lam HY, Gerstein MB: Analysis of genomic variation in non-coding elements using population-scale sequencing data from the 1000 Genomes Project. Nucleic Acids Res. 2011 Sep 1; 2011 May 19. PMID: 21596777
- Sboner A, Mu XJ, Greenbaum D, Auerbach RK, Gerstein MB: The real cost of sequencing: higher than you think! Genome Biol. 2011 Aug 25; 2011 Aug 25. PMID: 21867570
- Lu ZJ, Yip KY, Wang G, Shou C, Hillier LW, Khurana E, Agarwal A, Auerbach R, Rozowsky J, Cheng C, Kato M, Miller DM, Slack F, Snyder M, Waterston RH, Reinke V, Gerstein MB: Prediction and characterization of noncoding RNAs in C. elegans by integrating conservation, secondary structure, and high-throughput sequencing and array data. Genome Res. 2011 Feb; 2010 Dec 22. PMID: 21177971
- Shou C, Bhardwaj N, Lam HY, Yan KK, Kim PM, Snyder M, Gerstein MB: Measuring the evolutionary rewiring of biological networks. PLoS Comput Biol. 2011 Jan 6; 2011 Jan 6. PMID: 21253555
- Gerstein MB, Lu ZJ, Van Nostrand EL, Cheng C, Arshinoff BI, Liu T, Yip KY, Robilotto R, Rechtsteiner A, Ikegami K, Alves P, Chateigner A, Perry M, Morris M, Auerbach RK, Feng X, Leng J, Vielle A, Niu W, Rhrissorrakrai K, Agarwal A, Alexander RP, Barber G, Brdlik CM, Brennan J, Brouillet JJ, Carr A, Cheung MS, Clawson H, Contrino S, Dannenberg LO, Dernburg AF, Desai A, Dick L, Dosé AC, Du J, Egelhofer T, Ercan S, Euskirchen G, Ewing B, Feingold EA, Gassmann R, Good PJ, Green P, Gullier F, Gutwein M, Guyer MS, Habegger L, Han T, Henikoff JG, Henz SR, Hinrichs A, Holster H, Hyman T, Iniguez AL, Janette J, Jensen M, Kato M, Kent WJ, Kephart E, Khivansara V, Khurana E, Kim JK, Kolasinska-Zwierz P, Lai EC, Latorre I, Leahey A, Lewis S, Lloyd P, Lochovsky L, Lowdon RF, Lubling Y, Lyne R, MacCoss M, Mackowiak SD, Mangone M, McKay S, Mecenas D, Merrihew G, Miller DM 3rd, Muroyama A, Murray JI, Ooi SL, Pham H, Phippen T, Preston EA, Rajewsky N, Rätsch G, Rosenbaum H, Rozowsky J, Rutherford K, Ruzanov P, Sarov M, Sasidharan R, Sboner A, Scheid P, Segal E, Shin H, Shou C, Slack FJ, Slightam C, Smith R, Spencer WC, Stinson EO, Taing S, Takasaki T, Vafeados D, Voronina K, Wang G, Washington NL, Whittle CM, Wu B, Yan KK, Zeller G, Zha Z, Zhong M, Zhou X, modENCODE Consortium., Ahringer J, Strome S, Gunsalus KC, Micklem G, Liu XS, Reinke V, Kim SK, Hillier LW, Henikoff S, Piano F, Snyder M, Stein L, Lieb JD, Waterston RH: Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science. 2010 Dec 24; 2010 Dec 22. PMID: 21177976
- Bhardwaj N, Kim PM, Gerstein MB: Rewiring of transcriptional regulatory networks: hierarchy, rather than connectivity, better reflects the importance of regulators. Sci Signal. 2010 Nov 2; 2010 Nov 2. PMID: 21045205
- Khurana E, Lam HY, Cheng C, Carriero N, Cayting P, Gerstein MB: Segmental duplications in the human genome reveal details of pseudogene formation. Nucleic Acids Res. 2010 Nov; 2010 Jul 8. PMID: 20615899
- Cheung KH, Samwald M, Auerbach RK, Gerstein MB: Structured digital tables on the Semantic Web: toward a structured digital literature. Mol Syst Biol. 2010 Aug 24. PMID: 20739925
- Alexander RP, Fang G, Rozowsky J, Snyder M, Gerstein MB: Annotating non-coding regions of the genome. Nat Rev Genet. 2010 Aug; 2010 Jul 13. PMID: 20628352
- Patel PV, Gianoulis TA, Bjornson RD, Yip KY, Engelman DM, Gerstein MB: Analysis of membrane proteins in metagenomics: networks of correlated environmental features and protein families. Genome Res. 2010 Jul; 2010 Apr 29. PMID: 20430783
- Bhardwaj N, Carson MB, Abyzov A, Yan KK, Lu H, Gerstein MB: Analysis of combinatorial regulation: scaling of partnerships between regulators with the number of governed targets. PLoS Comput Biol. 2010 May 27; 2010 May 27. PMID: 20523742
- Lam HY, Kim PM, Mok J, Tonikian R, Sidhu SS, Turk BE, Snyder M, Gerstein MB: MOTIPS: automated motif analysis for predicting targets of modular protein domains. BMC Bioinformatics. 2010 May 11; 2010 May 11. PMID: 20459839
- Bhardwaj N, Yan KK, Gerstein MB: Analysis of diverse regulatory networks in a hierarchical context shows consistent tendencies for collaboration in the middle levels. Proc Natl Acad Sci U S A. 2010 Apr 13; 2010 Mar 29. PMID: 20351254
- Kasowski M, Grubert F, Heffelfinger C, Hariharan M, Asabere A, Waszak SM, Habegger L, Rozowsky J, Shi M, Urban AE, Hong MY, Karczewski KJ, Huber W, Weissman SM, Gerstein MB, Korbel JO, Snyder M: Variation in transcription factor binding among humans. Science. 2010 Apr 9; 2010 Mar 18. PMID: 20299548
- Fang G, Bhardwaj N, Robilotto R, Gerstein MB: Getting started in gene orthology and functional analysis. PLoS Comput Biol. 2010 Mar 26; 2010 Mar 26. PMID: 20361041
- Mok J, Kim PM, Lam HY, Piccirillo S, Zhou X, Jeschke GR, Sheridan DL, Parker SA, Desai V, Jwa M, Cameroni E, Niu H, Good M, Remenyi A, Ma JL, Sheu YJ, Sassi HE, Sopko R, Chan CS, De Virgilio C, Hollingsworth NM, Lim WA, Stern DF, Stillman B, Andrews BJ, Gerstein MB, Snyder M, Turk BE: Deciphering protein kinase specificity through large-scale analysis of yeast phosphorylation site motifs. Sci Signal. 2010 Feb 16; 2010 Feb 16. PMID: 20159853
- Sboner A, Habegger L, Pflueger D, Terry S, Chen DZ, Rozowsky JS, Tewari AK, Kitabayashi N, Moss BJ, Chee MS, Demichelis F, Rubin MA, Gerstein MB: FusionSeq: a modular framework for finding gene fusions by analyzing paired-end RNA-sequencing data. Genome Biol. 2010; 2010 Oct 21. PMID: 20964841
- Lam HY, Mu XJ, Stütz AM, Tanzer A, Cayting PD, Snyder M, Kim PM, Korbel JO, Gerstein MB: Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library. Nat Biotechnol. 2010 Jan; 2009 Dec 27. PMID: 20037582
- Arinaminpathy Y, Khurana E, Engelman DM, Gerstein MB: Computational analysis of membrane proteins: the largest class of drug targets. Drug Discov Today. 2009 Dec; 2009 Sep 3. PMID: 19733256
- Sboner A, Karpikov A, Chen G, Smith M, Mattoon D, Freeman-Cook L, Schweitzer B, Gerstein MB: Robust-linear-model normalization to reduce technical variability in functional protein microarrays. J Proteome Res. 2009 Dec. PMID: 19817483
- Liu YJ, Zheng D, Balasubramanian S, Carriero N, Khurana E, Robilotto R, Gerstein MB: Comprehensive analysis of the pseudogenes of glycolytic enzymes in vertebrates: the anomalously high number of GAPDH pseudogenes highlights a recent burst of retrotrans-positional activity. BMC Genomics. 2009 Oct 16; 2009 Oct 16. PMID: 19835609
- Alexander RP, Kim PM, Emonet T, Gerstein MB: Understanding modularity in molecular networks requires dynamics. Sci Signal. 2009 Jul 28; 2009 Jul 28. PMID: 19638611
- Korbel JO, Tirosh-Wagner T, Urban AE, Chen XN, Kasowski M, Dai L, Grubert F, Erdman C, Gao MC, Lange K, Sobel EM, Barlow GM, Aylsworth AS, Carpenter NJ, Clark RD, Cohen MY, Doran E, Falik-Zaccai T, Lewin SO, Lott IT, McGillivray BC, Moeschler JB, Pettenati MJ, Pueschel SM, Rao KW, Shaffer LG, Shohat M, Van Riper AJ, Warburton D, Weissman S, Gerstein MB, Snyder M, Korenberg JR: The genetic architecture of Down syndrome phenotypes revealed by high-resolution analysis of human segmental trisomies. Proc Natl Acad Sci U S A. 2009 Jul 21; 2009 Jul 13. PMID: 19597142
- Du J, Bjornson RD, Zhang ZD, Kong Y, Snyder M, Gerstein MB: Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants. PLoS Comput Biol. 2009 Jul; 2009 Jul 10. PMID: 19593373
- Ni L, Bruce C, Hart C, Leigh-Bell J, Gelperin D, Umansky L, Gerstein MB, Snyder M: Dynamic and complex transcription factor binding during an inducible response in yeast. Genes Dev. 2009 Jun 1. PMID: 19487574
- Gianoulis TA, Raes J, Patel PV, Bjornson R, Korbel JO, Letunic I, Yamada T, Paccanaro A, Jensen LJ, Snyder M, Bork P, Gerstein MB: Quantifying environmental adaptation of metabolic pathways in metagenomics. Proc Natl Acad Sci U S A. 2009 Feb 3; 2009 Jan 22. PMID: 19164758
- Keating KS, Flores SC, Gerstein MB, Kuhn LA: StoneHinge: hinge prediction by network analysis of individual protein structures. Protein Sci. 2009 Feb. PMID: 19180449
- Rozowsky J, Euskirchen G, Auerbach RK, Zhang ZD, Gibson T, Bjornson R, Carriero N, Snyder M, Gerstein MB: PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat Biotechnol. 2009 Jan; 2009 Jan 4. PMID: 19122651
- Lam HY, Khurana E, Fang G, Cayting P, Carriero N, Cheung KH, Gerstein MB: Pseudofam: the pseudogene families database. Nucleic Acids Res. 2009 Jan; 2008 Oct 28. PMID: 18957444
- Kim PM, Lam HY, Urban AE, Korbel JO, Affourtit J, Grubert F, Chen X, Weissman S, Snyder M, Gerstein MB: Analysis of copy number variants and segmental duplications in the human genome: Evidence for a change in the process of formation in recent evolutionary history. Genome Res. 2008 Dec; 2008 Oct 8. PMID: 18842824
- Flores SC, Keating KS, Painter J, Morcos F, Nguyen K, Merritt EA, Kuhn LA, Gerstein MB: HingeMaster: normal mode hinge prediction approach and integration of complementary predictors. Proteins. 2008 Nov 1. PMID: 18433058
- Korbel JO, Kim PM, Chen X, Urban AE, Weissman S, Snyder M, Gerstein MB: The current excitement about copy-number variation: how it relates to gene duplications and protein families. Curr Opin Struct Biol. 2008 Jun; 2008 May 27. PMID: 18511261
- Seringhaus MR, Cayting PD, Gerstein MB: Uncovering trends in gene naming. Genome Biol. 2008 Jan 31; 2008 Jan 31. PMID: 18254929
- Kim PM, Korbel JO, Gerstein MB: Positive selection at the protein network periphery: evaluation in terms of structural constraints and cellular context. Proc Natl Acad Sci U S A. 2007 Dec 18; 2007 Dec 12. PMID: 18077332
- Korbel JO, Urban AE, Affourtit JP, Godwin B, Grubert F, Simons JF, Kim PM, Palejev D, Carriero NJ, Du L, Taillon BE, Chen Z, Tanzer A, Saunders AC, Chi J, Yang F, Carter NP, Hurles ME, Weissman SM, Harkins TT, Gerstein MB, Egholm M, Snyder M: Paired-end mapping reveals extensive structural variation in the human genome. Science. 2007 Oct 19; 2007 Sep 27. PMID: 17901297
- Lu LJ, Sboner A, Huang YJ, Lu HX, Gianoulis TA, Yip KY, Kim PM, Montelione GT, Gerstein MB: Comparing classical pathways and modern networks: towards the development of an edge ontology. Trends Biochem Sci. 2007 Jul; 2007 Jun 20. PMID: 17583513
- Flores SC, Gerstein MB: FlexOracle: predicting flexible hinges by identification of stable domains. BMC Bioinformatics. 2007 Jun 22; 2007 Jun 22. PMID: 17587456
- Korbel JO, Urban AE, Grubert F, Du J, Royce TE, Starr P, Zhong G, Emanuel BS, Weissman SM, Snyder M, Gerstein MB: Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome. Proc Natl Acad Sci U S A. 2007 Jun 12; 2007 Jun 5. PMID: 17551006
- Royce TE, Carriero NJ, Gerstein MB: An efficient pseudomedian filter for tiling microrrays. BMC Bioinformatics. 2007 Jun 7; 2007 Jun 7. PMID: 17555595
- Euskirchen GM, Rozowsky JS, Wei CL, Lee WH, Zhang ZD, Hartman S, Emanuelsson O, Stolc V, Weissman S, Gerstein MB, Ruan Y, Snyder M: Mapping of transcription factor binding regions in mammalian cells by ChIP: comparison of array- and sequencing-based technologies. Genome Res. 2007 Jun. PMID: 17568005
- Zheng D, Frankish A, Baertsch R, Kapranov P, Reymond A, Choo SW, Lu Y, Denoeud F, Antonarakis SE, Snyder M, Ruan Y, Wei CL, Gingeras TR, Guigó R, Harrow J, Gerstein MB: Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution. Genome Res. 2007 Jun. PMID: 17568002
- Zhang ZD, Paccanaro A, Fu Y, Weissman S, Weng Z, Chang J, Snyder M, Gerstein MB: Statistical analysis of the genomic distribution and correlation of regulatory elements in the ENCODE regions. Genome Res. 2007 Jun. PMID: 17567997
- Rozowsky JS, Newburger D, Sayward F, Wu J, Jordan G, Korbel JO, Nagalakshmi U, Yang J, Zheng D, Guigó R, Gingeras TR, Weissman S, Miller P, Snyder M, Gerstein MB: The DART classification of unannotated transcription within the ENCODE regions: associating transcription with known and novel loci. Genome Res. 2007 Jun. PMID: 17567993
- Gerstein MB, Bruce C, Rozowsky JS, Zheng D, Du J, Korbel JO, Emanuelsson O, Zhang ZD, Weissman S, Snyder M: What is a gene, post-ENCODE? History and updated definition. Genome Res. 2007 Jun. PMID: 17567988
- Emanuelsson O, Nagalakshmi U, Zheng D, Rozowsky JS, Urban AE, Du J, Lian Z, Stolc V, Weissman S, Snyder M, Gerstein MB: Assessing the performance of different high-density tiling microarray strategies for mapping transcribed regions of the human genome. Genome Res. 2007 Jun; 2006 Nov 21. PMID: 17119069
- Flores SC, Lu LJ, Yang J, Carriero N, Gerstein MB: Hinge Atlas: relating protein sequence to sites of structural flexibility. BMC Bioinformatics. 2007 May 22; 2007 May 22. PMID: 17519025
- Zheng D, Gerstein MB: The ambiguous boundary between genes and pseudogenes: the dead rise up, or do they? Trends Genet. 2007 May; 2007 Mar 26. PMID: 17382428
- Royce TE, Rozowsky JS, Gerstein MB: Assessing the need for sequence-based normalization in tiling microarray experiments. Bioinformatics. 2007 Apr 15; 2007 Mar 25. PMID: 17387113
- Seringhaus MR, Gerstein MB: Publishing perishing? Towards tomorrow's information architecture. BMC Bioinformatics. 2007 Jan 19; 2007 Jan 19. PMID: 17239245
- Royce TE, Rozowsky JS, Gerstein MB: Toward a universal microarray: prediction of gene expression through nearest-neighbor probe sequence identification. Nucleic Acids Res. 2007; 2007 Aug 7. PMID: 17686789
- Kim PM, Lu LJ, Xia Y, Gerstein MB: Relating three-dimensional structures to protein networks provides evolutionary insights. Science. 2006 Dec 22. PMID: 17185604
- Urban AE, Korbel JO, Selzer R, Richmond T, Hacker A, Popescu GV, Cubells JF, Green R, Emanuel BS, Gerstein MB, Weissman SM, Snyder M: High-resolution mapping of DNA copy alterations in human chromosome 22 using high-density tiling oligonucleotide arrays. Proc Natl Acad Sci U S A. 2006 Mar 21; 2006 Mar 14. PMID: 16537408
- Zheng D, Gerstein MB: A computational approach for identifying pseudogenes in the ENCODE regions. Genome Biol. 2006; 2006 Aug 7. PMID: 16925835
- Royce TE, Rozowsky JS, Luscombe NM, Emanuelsson O, Yu H, Zhu X, Snyder M, Gerstein MB: Extrapolating traditional DNA microarray statistics to tiling and protein microarray technologies. Methods Enzymol. 2006. PMID: 16939796
- Kumar A, Seringhaus M, Biery MC, Sarnovsky RJ, Umansky L, Piccirillo S, Heidtman M, Cheung KH, Dobry CJ, Gerstein MB, Craig NL, Snyder M: Large-scale mutagenesis of the yeast genome using a Tn7-derived multipurpose transposon. Genome Res. 2004 Oct. PMID: 15466296
- Kumar A, Harrison PM, Cheung KH, Lan N, Echols N, Bertone P, Miller P, Gerstein MB, Snyder M: An integrated approach for finding overlooked genes in yeast. Nat Biotechnol. 2002 Jan. PMID: 11753363