Granite Speech 3.3 8b

Developed by ibm-granite
A compact and efficient speech-language model designed for Automatic Speech Recognition (ASR) and Automatic Speech Translation (AST), featuring a two-stage design for processing audio and text
Downloads 5,532
Release Time: 4/14/2025

Model Overview

A speech-language model adapted from Granite-3.3-8b-instruct, excelling in English speech-to-text and English-to-multilingual speech translation, trained with modality alignment techniques

Model Features

Two-stage processing design
The model first transcribes audio into text; the text is then processed by the underlying language model as a separate stage, which reduces the risk of modality interference
Multi-task support
Supports both automatic speech recognition (ASR) and automatic speech translation (AST)
Efficient architecture
A 10-layer Conformer encoder combined with a 2-layer Transformer downsampler achieves 10x temporal compression
Enterprise-grade optimization
Optimized for enterprise speech processing scenarios; performs particularly well in English and major European languages
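
To make the 10x temporal compression figure concrete, the following is a minimal sketch of the arithmetic, not the model's actual code. It assumes the acoustic front end emits 100 feature frames per second (a 10 ms hop), which is a common ASR setting but is not stated in this listing; the function name and constants are hypothetical.

```python
import math

# Assumption (not stated in the source): the acoustic front end emits
# 100 feature frames per second, i.e. one frame every 10 ms.
ASSUMED_INPUT_FRAMES_PER_SECOND = 100

# The 10x temporal compression figure quoted in the feature list.
TEMPORAL_COMPRESSION = 10


def embeddings_for_audio(duration_seconds: float) -> int:
    """Estimate how many audio embeddings are passed to the language model."""
    input_frames = duration_seconds * ASSUMED_INPUT_FRAMES_PER_SECOND
    return math.ceil(input_frames / TEMPORAL_COMPRESSION)


if __name__ == "__main__":
    for seconds in (1, 30, 600):
        count = embeddings_for_audio(seconds)
        print(f"{seconds:>4} s of audio -> about {count} embeddings")
```

Under that assumed frame rate, each second of audio would cost the language model roughly 10 embedding positions rather than 100, which is the practical benefit of the downsampler.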

Model Capabilities

English speech-to-text
English-to-multilingual speech translation
Plain text task processing
Long audio processing (supports 128k context)

Use Cases

Speech transcription
Meeting minutes automation
Real-time transcription of English meeting recordings into text records
Achieves state-of-the-art (SOTA) performance on the CommonVoice-17 test set
Cross-language communication
Real-time speech translation
Real-time translation of English speech into French, Spanish, and other languages
Outperforms similar 8B-parameter models on the IWSLT test set
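
The meeting-minutes use case maps onto the two-stage design described under Model Features: one stage turns audio into a transcript, and a second stage hands that transcript to the language model as plain text. The sketch below only illustrates that control flow. `Transcriber`, `TextModel`, and `meeting_minutes` are hypothetical names, and the two stages are passed in as plain callables rather than calls to any real library API.

```python
from typing import Callable, Dict

# Hypothetical type aliases for illustration; not part of any real library.
Transcriber = Callable[[bytes], str]  # stage 1: raw audio -> transcript text
TextModel = Callable[[str], str]      # stage 2: text prompt -> generated text


def meeting_minutes(
    audio: bytes, transcribe: Transcriber, generate: TextModel
) -> Dict[str, str]:
    """Run the two stages in sequence and return both outputs."""
    # Stage 1: speech recognition produces plain text.
    transcript = transcribe(audio)
    # Stage 2: the transcript is processed as an ordinary text task.
    prompt = (
        "Summarize the following meeting transcript as minutes:\n\n"
        + transcript
    )
    minutes = generate(prompt)
    return {"transcript": transcript, "minutes": minutes}


if __name__ == "__main__":
    # Stand-in stubs so the sketch runs without loading a model.
    def fake_transcribe(audio: bytes) -> str:
        return "Alice: ship on Friday. Bob: agreed."

    def fake_generate(prompt: str) -> str:
        return "Decision: ship on Friday."

    result = meeting_minutes(b"\x00\x01", fake_transcribe, fake_generate)
    print(result["minutes"])
```

Keeping the two stages as separate callables means the transcript is available on its own, and the second stage never sees audio input.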