Fast, sensitive detection of protein homologs using deep dense retrieval
Hong L, Hu Z, Sun S, Tang X, Wang J, Tan Q, Zheng L, Wang S, Xu S, King I, Gerstein M, Li Y. Fast, sensitive detection of protein homologs using deep dense retrieval. Nature Biotechnology 2024, 1-13. PMID: 39123049, DOI: 10.1038/s41587-024-02353-6.
Peer-Reviewed Original Research
Concepts: Protein language models, Remote homologs, Protein homologs, Protein sequence comparisons, Alignment-based approaches, Well-characterized proteins, PSI-BLAST, Superfamily level, Protein evolution, Sequence comparison, Protein sequences, Homology, Protein, Sensitivity compared to previous methods, Sensitive detection, HMMER, Superfamily, Structural information, Sequence

MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations
Tang X, Tran A, Tan J, Gerstein M. MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations. Bioinformatics 2024, 40: i357-i368. PMID: 38940177, PMCID: PMC11256921, DOI: 10.1093/bioinformatics/btae260.
Peer-Reviewed Original Research
Concepts: Transformer encoder, Downstream tasks, Language model, Biomedical text, Self-supervised pre-training, Explicit 3D representation, Representation improves performance, Deep learning models, Representation of molecules, Contrastive learning, Supervisory signal, Extract embeddings, Representation capability, Joint representation, Biomedical domain, Pre-training, Textual data, Learning models, Molecular representations, Model weights, Jupyter Notebook, Step-by-step guidance, Encoding, Property prediction, Structural information