ProGen, a generative AI language model developed by Salesforce Research, is fundamentally transforming protein design and biomedical research by learning to speak the “language of biology”. By adapting the predictive mechanics used in large language models (LLMs) like ChatGPT, ProGen designs functional, synthetic proteins from scratch. This approach sidesteps billions of years of natural evolution to create targeted medical and industrial solutions in a matter of weeks. The Shift from Alphabet to Amino Acids
Traditional protein engineering relies on modifying existing, naturally occurring proteins or using directed evolution—a tedious process of manual trial and error. ProGen approaches this challenge by treating amino acids as letters and protein sequences as sentences.
Massive Datasets: The model was trained on 280 million protein sequences spanning 19,000 distinct families.
Contextual Tagging: Researchers feed the model conditioning tags like taxonomic, functional, or locational data.
Next-Token Prediction: Just as a text AI predicts the next word in a sentence, ProGen predicts the next amino acid in a molecular sequence.
Through this unsupervised sequence learning, ProGen bypasses the need for intensive, expensive structural imaging to map how a protein will fold. Outperforming Nature in the Lab
What sets ProGen apart from early AI models is its remarkable success rate during physical laboratory testing. Researchers partnered with Tierra Biosciences and the University of California, San Francisco (UCSF) to synthesize ProGen’s designs. They tasked the AI with creating synthetic lysozymes—antibacterial enzymes found in human tears and saliva.
Progen: Ushering in a New Era in Protein Design – Distilled Post
Leave a Reply