G

Gemma 3n E4B It

Developed by google
Gemma 3n is a lightweight and state-of-the-art open-source multimodal model family launched by Google. It is built on the same research and technology as the Gemini model and supports text, audio, and visual inputs.
Downloads 1,690
Release Time : 6/3/2025

Model Overview

Gemma 3n is a multimodal model capable of processing text, audio, image, and video inputs, suitable for various tasks such as automatic speech recognition and automatic speech translation.

Model Features

Multimodal input support
Capable of simultaneously processing text, audio, image, and video inputs to achieve cross-modal understanding and generation.
Efficient resource utilization
Adopts selective parameter activation technology to achieve high performance with 4B effective parameters, and the memory usage is comparable to that of traditional 4B models.
Extensive training data
Trained on a diverse dataset containing approximately 11 trillion tokens, covering web documents, code, mathematics, images, and audio.
Architectural innovation
Adopts the MatFormer architecture, allowing nested sub - models in the E4B model to improve model efficiency.

Model Capabilities

Text generation
Image content analysis
Speech recognition
Multilingual translation
Code generation
Mathematical reasoning
Visual question answering

Use Cases

Content creation and communication
Creative text generation
Generate creative text formats such as poems, scripts, and marketing copy.
Can generate diverse text content that meets the theme and style
Image description generation
Generate detailed descriptions based on the input image.
Can accurately identify objects, scenes, and activities in the image
Research and education
NLP research
Serve as a base model for natural language processing and generation model research.
Supports experiments and development of various NLP tasks
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025AIbase