Sharing Evo 2, a new foundation model for biomolecular sciences, now available on NVIDIA BioNeMo. It's a collaboration between the Arc Institute, Stanford, UC Berkeley, UCSF, and NVIDIA. What makes it significant is the scale of its training data: nearly 9 trillion nucleotides of genomic sequence from across the tree of life.
Key aspects:
🧬 Genomic Scale: Trained on an enormous dataset covering diverse species.
🔬 Multimodal: Provides insights across DNA, RNA, and proteins.
🧠 Long Context: Can process sequences of up to 1 million nucleotides at once.
🚀 Powerful Architecture: Uses the StripedHyena 2 architecture for efficiency.
✅ Open Components: Key parts, including fine-tuning, are available via the open-source NVIDIA BioNeMo Framework.
🔓 Available as an NVIDIA NIM microservice.
The team has already shown it can predict the effects of gene mutations with high accuracy and even design functional CRISPR-Cas systems. It's a powerful tool for anyone working with biological sequence data.
So, while AlphaFold primarily predicts the structures of existing proteins, Evo 2 opens the door to designing entirely new biological sequences for things like drug discovery, agriculture, and materials science. What new possibilities does this unlock?