🚀 Cephalo-Gemma-3-4b
This project focuses on the Cephalo-Gemma-3-4b checkpoint, which is more intensively fine-tuned on biological materials and spider silk datasets than lamm-mit/Cephalo-Gemma-3-4b-it-04-15-2025. It shows how to load the model and run inference, and includes a sample result and the related reference.
🚀 Quick Start
Load the model and run inference
import torch
from PIL import Image as PILImage
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

# Checkpoint to load from the Hugging Face Hub
ckpt = "lamm-mit/Cephalo-Gemma-3-4b-it-04-16-2025"
model = Gemma3ForConditionalGeneration.from_pretrained(
    ckpt, device_map="auto", torch_dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained(ckpt)

# Load the input image (expects spiderweb.png in the working directory)
image = PILImage.open("./spiderweb.png").convert("RGB")

# The system prompt and the user turn are two separate messages
messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "You are a materials scientist."}
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What does this image show? Provide a detailed analysis."},
        ],
    },
]

# Apply the chat template, tokenize, and move the tensors to the model's device
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device)

# Generate greedily, then keep only the tokens produced after the prompt
input_len = inputs["input_ids"].shape[-1]
generation = model.generate(**inputs, max_new_tokens=512, do_sample=False)
generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)
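The slice `generation[0][input_len:]` in the code above relies on `generate` returning the prompt tokens followed by the newly generated ones. The sketch below illustrates that slicing step on plain Python lists. The helper name `strip_prompt` and the token ids are made up for illustration and are not part of the model or the `transformers` API.

```python
def strip_prompt(sequence, prompt_len):
    """Return only the items that follow the first `prompt_len` positions."""
    return sequence[prompt_len:]


# Made-up token ids standing in for a tokenized prompt (assumption, not real ids)
prompt_ids = [2, 105, 2364]
# A generated sequence starts with the prompt, then continues with new tokens
full_output = prompt_ids + [818, 2471, 1]

new_tokens = strip_prompt(full_output, len(prompt_ids))
print(new_tokens)  # [818, 2471, 1]
```

The same slice works on a tensor row, which is why the prompt length is measured before calling `generate`.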

💻 Usage Examples
Basic Usage
The code above demonstrates the basic process of loading the Cephalo-Gemma-3-4b model and running inference on an image of a spider's web. It first loads the model and processor, then prepares the input messages (a system prompt plus a user turn containing the image and a question), and finally generates the output and decodes it to text.
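When asking several questions about different images, the message structure described above can be wrapped in a small helper. This is a minimal sketch: the function name `build_messages` and its parameters are hypothetical and not part of the released code. The string passed as `image` is only a placeholder; in real use it would be the PIL image loaded earlier.

```python
from typing import Any


def build_messages(
    image: Any,
    question: str,
    system_prompt: str = "You are a materials scientist.",
) -> list:
    """Build a chat with a system message and a user message (image + text)."""
    return [
        {
            "role": "system",
            "content": [{"type": "text", "text": system_prompt}],
        },
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": question},
            ],
        },
    ]


# Placeholder string stands in for a PIL image in this illustration
messages = build_messages(image="<PIL image>", question="What does this image show?")
print([m["role"] for m in messages])  # ['system', 'user']
```

The returned list can be passed to `processor.apply_chat_template` in the same way as the hand-written `messages` in the Quick Start.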
Results
The image shows a spider's web, which is a structure of silk, in a red-lit, glass-enclosed cube. The web is the result of a spider's natural behavior and is a complex, three-dimensional pattern. The cube, which is a 3D-printed structure, is the environment in which the spider has created the web. The red lighting and the glass enclosure are used to highlight the web and the cube, and the lighting and the cube's material (glass) are used to show the web's structure.
The spider's web is a natural and intricate design, and the cube is a man-made, 3D-printed structure. The image is a combination of the natural and the artificial, and the red lighting and the glass enclosure are used to show the web and the cube in a new and interesting way.
The image is a reminder of the beauty and complexity of the natural world and the possibilities of the artificial world. The spider's web is a natural and intricate design, and the cube is a man-made, 3D-printed structure. The image is a combination of the natural and the artificial, and the red lighting and the glass enclosure are used to show the web and the cube in a new and interesting way.
📚 Documentation
Reference
@article{Buehler_Cephalo_2024_journal,
title={Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design},
author={Markus J. Buehler},
journal={Advanced Functional Materials},
year={2024},
volume={34},
issue={49},
doi={10.1002/adfm.202409531},
url={https://advanced.onlinelibrary.wiley.com/doi/full/10.1002/adfm.202409531}
}