# TxGemma Model Card
TxGemma is a collection of lightweight, state-of-the-art open language models fine-tuned for therapeutic development. It offers high versatility, data efficiency, and conversational capabilities, serving as a powerful tool for drug discovery and related research.
## Quick Start
To get started with TxGemma, refer to the following code snippets to run the model locally on a GPU. If you plan to run inference on a large number of inputs, it is recommended to create a production version using Model Garden.
## Usage Examples

### Basic Usage
```python
import json
from huggingface_hub import hf_hub_download

# Download the prompt templates for the Therapeutics Data Commons (TDC) tasks
tdc_prompts_filepath = hf_hub_download(
    repo_id="google/txgemma-27b-chat",
    filename="tdc_prompts.json",
)
with open(tdc_prompts_filepath, "r") as f:
    tdc_prompts_json = json.load(f)

# Format the prompt for a specific task, e.g. blood-brain barrier penetration
task_name = "BBB_Martins"
input_type = "{Drug SMILES}"
drug_smiles = "CN1C(=O)CN=C(C2=CCCCC2)c2cc(Cl)ccc21"
TDC_PROMPT = tdc_prompts_json[task_name].replace(input_type, drug_smiles)
print(TDC_PROMPT)
```
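Since the snippet above indexes `tdc_prompts_json` by task name, the file appears to be a mapping from task names to prompt templates. Assuming that flat structure, a minimal sketch for browsing the available tasks before picking one:

```python
# Sketch: tdc_prompts_json maps TDC task names to prompt templates,
# so its keys enumerate the supported tasks.
print(f"{len(tdc_prompts_json)} tasks available")
print(sorted(tdc_prompts_json)[:5])  # preview a few task names
```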
### Advanced Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model (device_map="auto" places weights on available GPUs)
tokenizer = AutoTokenizer.from_pretrained("google/txgemma-27b-chat")
model = AutoModelForCausalLM.from_pretrained(
    "google/txgemma-27b-chat",
    device_map="auto",
)

# Use the TDC prompt formatted in the Basic Usage example
prompt = TDC_PROMPT
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
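The 27B checkpoint requires substantial GPU memory in half precision. As one option not covered by the snippet above, here is a hedged sketch of loading it with 4-bit quantization via bitsandbytes (assuming `bitsandbytes` is installed; this is an illustrative alternative, not the model card's prescribed setup):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch: 4-bit quantized loading to reduce GPU memory footprint.
# Requires `pip install bitsandbytes`.
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "google/txgemma-27b-chat",
    device_map="auto",
    quantization_config=quantization_config,
)
```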
## Features
- Versatility: Demonstrates strong performance across a wide range of therapeutic tasks, surpassing or matching best-in-class performance on many benchmarks.
- Data Efficiency: Shows competitive performance with limited data compared to larger models, and offers improvements over its predecessors.
- Conversational Capability (TxGemma-Chat): Includes conversational variants that can engage in natural language dialogue and explain the reasoning behind their predictions.
- Foundation for Fine-tuning: Can be used as a pre-trained foundation for specialized use cases.
## Installation
The code snippets provided assume you have installed the necessary libraries. You can install them using the following commands:
```bash
pip install accelerate transformers
```
## Documentation

### Model Information

TxGemma is a collection of lightweight, state-of-the-art, open language models built upon Gemma 2, fine-tuned for therapeutic development. It comes in three sizes: 2B, 9B, and 27B.
Potential Applications:
- Accelerated Drug Discovery: Streamline the therapeutic development process by predicting properties of therapeutics and targets for various tasks, such as target identification, drug-target interaction prediction, and clinical trial approval prediction.
### How to Use

- Formatting prompts for therapeutic tasks: Refer to the code example above for formatting prompts according to the Therapeutics Data Commons (TDC) structure.
- Running the model on predictive tasks: You can use the `AutoTokenizer` and `AutoModelForCausalLM` classes from the `transformers` library, or the `pipeline` API, to run the model; a pipeline sketch follows this list.
- Applying the chat template for conversational use: Use the tokenizer's built-in chat template to format prompts for conversational use; see the sketch under Examples below.
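As a sketch of the `pipeline` route mentioned above, reusing the TDC prompt from the Quick Start (the generation parameters are illustrative, not prescribed by this card):

```python
from transformers import pipeline

# Sketch: run the same TDC prompt through the high-level pipeline API.
pipe = pipeline(
    "text-generation",
    model="google/txgemma-27b-chat",
    device_map="auto",
)
outputs = pipe(TDC_PROMPT, max_new_tokens=8)
print(outputs[0]["generated_text"])
```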
### Examples
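A minimal sketch of conversational use with the tokenizer's built-in chat template, assuming the `tokenizer` and `model` loaded in the Advanced Usage example (the message content and token budget are illustrative):

```python
# Sketch: format a single-turn conversation with the chat template.
messages = [
    {"role": "user", "content": TDC_PROMPT},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```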
### Model architecture overview

- Base Model: Gemma 2 (2B, 9B, and 27B parameter versions).
- Fine-tuning Data: Therapeutics Data Commons.
- Training Approach: Instruction fine-tuning using a mixture of therapeutic data (TxT) and, for conversational variants, general instruction-tuning data.
### Technical Specifications

### Performance & Validation

TxGemma's performance has been validated on a comprehensive benchmark of 66 therapeutic tasks derived from TDC.

#### Key performance metrics

- Aggregated Improvement: Improves over the original Tx-LLM paper on 45 of 66 therapeutic tasks.
- Best-in-Class Performance: Surpasses or matches best-in-class performance on 50 of 66 tasks, exceeding specialist models on 26 tasks.
### Inputs and outputs
- Input: Text. For best performance, text prompts should be formatted according to the TDC structure.
- Output: Text.
### Dataset details

- Training dataset: Therapeutics Data Commons and general instruction-tuning data (for TxGemma-Chat).
- Evaluation dataset: Therapeutics Data Commons, using the same 66 tasks for evaluation.
### Implementation information
Training was done using JAX, which allows for faster and more efficient training of large models on the latest generation of hardware, including TPUs.
### Use and limitations
- Intended use: Research and development of therapeutics.
- Benefits: Strong performance, data efficiency, fine - tuning foundation, and integration into agentic workflows.
- Limitations: Trained on public data from TDC; task-specific validation is required, and downstream applications need to be validated with appropriate data.
## Technical Details

TxGemma is based on the Gemma 2 family of lightweight, state-of-the-art open LLMs. It utilizes a decoder-only transformer architecture. The fine-tuning data comes from the Therapeutics Data Commons, which covers diverse therapeutic modalities and targets. The training approach involves instruction fine-tuning using a mixture of therapeutic data and, for conversational variants, general instruction-tuning data.
## License
The use of TxGemma is governed by the Health AI Developer Foundations terms of use.
## Citation

```bibtex
@article{wang2025txgemma,
  title={TxGemma: Efficient and Agentic LLMs for Therapeutics},
  author={Wang, Eric and Schmidgall, Samuel and Jaeger, Paul F. and Zhang, Fan and Pilgrim, Rory and Matias, Yossi and Barral, Joelle and Fleet, David and Azizi, Shekoofeh},
  year={2025},
}
```
Find the paper here.