ZONTAL’s Post


4,617 followers

1mo Edited

Unlock the potential of Meta’s Llama3 for biotech applications with torchtune, a PyTorch-native library that simplifies fine-tuning of large language models. Whether you're looking to enhance drug discovery, genetic analysis, or scientific writing, our latest blog provides a guide to leverage this powerful tool effectively. 🔗 Learn more here: https://lnkd.in/g9WnrNSN #Biotechnology #Llama3 #torchtune #AI #MachineLearning

Getting Started with Fine-Tuning Llama3 Using Torchtune

https://zontal.io


More Relevant Posts

Diogo Camacho

Biotech Executive | Comp Bio | AI/ML | Computing Biology
3mo
Because the answer was not explicit and Matt Goddeeris was curious, here is a follow-up on how much data we need. Follow me for more explorations of Computational Biology and ML for drug discovery and beyond (we'll find out together what's beyond...) Happy computing! #datascience #machinelearning #ai #computationalbiology #drugdiscovery https://lnkd.in/ePuH3UUz

How much data, again?

http://computingbiology.blog
Hilbert Lam

Part-time researcher and biological sciences student at Nanyang Technological University
7mo
📝 Did you know that large language models (LLMs) are not restricted to generating conversational text but can also be used to analyse nucleic acid sequences, amino acid sequences and even single-cell multi-omics data?
👷 By deconstructing LLMs pre-trained on biological data and leveraging their internal workings, numerical representations of sequences can be obtained. These representations can then be used to decipher phylogeny and even biological function.
🧬 Intuitively, these LLMs can be trained and used to understand the fundamental structure of protein and nucleic acid sequences by treating them as a "language", and can be deployed to annotate, compare, and analyse genomes.
If you are interested and would like to know more, please check out this recent review entitled "Large Language Models in Plant Biology", which I wrote with A/Prof. Marek Mutwil and Ong Xing Er. The review covers:
💻 1) Different types of language models and their usage in biology
📊 2) Transformer-based models and their anatomy
📈 3) How the models can be interpreted to gain new insights
☘ 4) How plant scientists could deploy LLMs
https://lnkd.in/gX_UrA8q

Large Language Models in Plant Biology

arxiv.org
Felipe Villena, PhD

Target discovery | Data science | Biology | NGS
9mo
🔬 Navigating biological metadata with AI! In the vast landscape of biological data, repositories like the Gene Expression Omnibus (GEO), which have minimal data entry standards, pose a significant challenge for researchers. Delving into the metadata to grasp the essence and rationale of experiments often becomes an arduous task. The unstructured nature of this data demands substantial manual effort as scientists strive to identify potentially valuable information.
🎯 To address this, I turned to large language models. By integrating GPT-4 with LangChain, I've built an approach that streamlines metadata processing and surfaces insights quickly. To demonstrate its capabilities, I've spotlighted metadata related to NGS experiments in the context of Non-Alcoholic SteatoHepatitis (NASH).
🔗 Curious about the transformation? Dive into the deployed app: https://lnkd.in/esWzvwDa
🚀 Future developments: if this is of interest to some of you, I may consider incorporating the following features: automating the differential expression workflow; introducing support for proteomics identification; UI improvements; and refining the approach with Llama 2 for optimal cost and efficiency.
Feedback and potential collaborations are all welcome. #TargetDiscovery #Bioinformatics #GPT4 #NGS #LargeLanguageModels
Ravi Shankar

Principal Scientist at CSIR-Institute of Himalayan Bioresource Technology
3w Edited
Our very important work on discovering plant transcription factor binding sites is out now. It is a revolutionary algorithm that will enable plant scientists to accurately identify DNA binding sites for any given transcription factor in any plant genome, whether previously studied or entirely unseen. This work also highlights that the immense genomic variability in plants has been grossly neglected, and that a flawed practice of TFBS finding became an accepted norm, producing largely wrong interpretations and misleading findings over many years. The software developed here, PTFSpot, will also empower plant biologists to bypass costly experiments such as DAP-seq and to identify TFs and their preferred binding regions across any genome, without restrictions, without any need for prior annotations or knowledge, and with utmost accuracy. Thanks to the Department of Biotechnology 🙏🏽, whose funding and support help us bring such work to the world. Here is the publication:

PTFSpot: deep co-learning on transcription factors and their binding regions attains impeccable universality in plants

academic.oup.com

Lies Benmiloud-Béchet

Digital Excellence Center Director
6mo Edited
Biomimicry for Coding: "Romera-Paredes and colleagues’ work is the latest step in a long line of research that attempts to create programs automatically by taking inspiration from biological evolution, a field called genetic programming"

Large language models help computer programs to evolve

nature.com

Vector Laboratories, Inc.

8,746 followers
8mo
We get by with a little help from our robot friends. Given that post-translational modifications, such as glycosylation, add a new layer of complexity to analyzing protein-protein and protein-drug interactions, the application of AI to glycobiology is necessary to understand, and may help predict, the role of glycans in various forms of cellular behavior. Dive deeper into the world of bioinformatics with our newest article: https://bit.ly/47T5q6N #VectorLaboratories #Bioinformatics #Glycoinformatics

SugarGPT: Envisioning the Future of Glycoinformatics

https://vectorlabs.com
Dharmesh Patel, Ph.D

Cell Therapy | Gene editing | CMC | Process development | Cancer Immunotherapies | MBA Candidate
3mo
Machine learning, particularly through genomic language models (gLMs), has been effectively used to learn latent functional and regulatory relationships between genes by training on millions of metagenomic scaffolds. gLM not only captures the genomic context and protein sequence but also encodes biologically meaningful information, such as enzymatic functions and taxonomy, through contextualized protein embeddings. The analysis of gLM’s attention patterns reveals that it successfully learns co-regulated functional modules, like operons, demonstrating its potential to uncover complex relationships between genes in genomic regions.

New Harvard-Developed AI System Unlocks Biology’s Source Code

https://scitechdaily.com
Sergei RYBALKO

Senior Data Scientist, PhD @SAP Labs
5mo
"achieved remarkable success in various domains such as language and computer vision. Specifically, the combination of large-scale diverse datasets and pretrained transformers has emerged as a promising approach for developing foundation models. Drawing parallels between language and cellular biology (in which texts comprise words; similarly, cells are defined by genes), our study probes the applicability of foundation models to advance cellular biology and genetic research. Using burgeoning single-cell sequencing data, we have constructed a foundation model for single-cell biology, scGPT, based on a generative pretrained transformer across a repository of over 33 million cells. Our findings illustrate that scGPT effectively distills critical biological insights concerning genes and cells. Through further adaptation of transfer learning, scGPT can be optimized to achieve superior performance across diverse downstream applications. This includes tasks such as cell type annotation, multi-batch integration, multi-omic integration, perturbation response prediction and gene network inference" https://lnkd.in/dzVmmM7P

scGPT: toward building a foundation model for single-cell multi-omics using generative AI - Nature Methods

nature.com
Ahmed Essaghir

Product Director, Tech R&D @ GSK
1mo
We all understand that a protein's sequence determines its 3D structure, which in turn dictates its function. The interplay of sequence, structure, and function is the key to understanding every mechanism proteins are involved in, including diseases. A multi-modal protein language model that « understands » this relationship is poised to revolutionize bioinformatics and potentially become the 'intelligent Swiss knife' for protein design. It’s amazing how biology is now being demystified by matrix multiplications inside a GPU ;) https://lnkd.in/eixAj9wC https://lnkd.in/e925u_EF

Ex-Meta scientists debut gigantic AI protein design model

nature.com

Syngens

211 followers
9mo
In the fast-growing landscape of synthetic biology, the conventional trial-and-error methods often act as roadblocks, slowing our momentum in organism engineering and biomanufacturing. While these traditional approaches have served us, there's an undeniable need for more predictive and efficient strategies. Inspired by the precision and computational prowess observed in sectors like architecture and aerospace, we're poised to take synthetic biology to the next level. We're pleased to announce that our latest preprint* is now available on bioRxiv. The paper outlines the innovative mechanics behind our DNA Design Platform, a system that leverages the power of artificial intelligence to enable context-sensitive decision-making. This capability allows us to introduce a more nuanced, scalable approach to gene expression optimisation across a variety of host organisms. Our platform signifies a crucial step forward in the field. It employs advanced deep learning algorithms, setting the stage for a transformative change in the way we approach the 'design, build, test, learn' cycle. This leads to a more efficient, predictive approach that constitutes a significant leap forward in the existing paradigms of synthetic biology. -- * From Context to Code: Rational De Novo DNA Design and Predicting Cross-Species DNA Functionality Using Deep Learning Transformer Models https://lnkd.in/d3qwj4CJ

From Context to Code: Rational De Novo DNA Design and Predicting Cross-Species DNA Functionality Using Deep Learning Transformer Models

biorxiv.org

