🚀 Litus-ai/whisper-small-ita
This model is an optimized version of openai/whisper-small for the Italian language, offering an excellent balance between transcription quality and computational cost. It is ideal for scenarios with a limited computational budget that still require accurate speech transcription.
✨ Features
litus-ai/whisper-small-ita is a version of openai/whisper-small optimized for the Italian language, trained on a portion of Litus AI's proprietary data. It offers a strong quality/cost trade-off and is well suited to contexts where the computational budget is limited but accurate speech transcription is still required.
Special Tokens
The model's main distinguishing feature is the integration of special tokens that enrich the transcription with meta-information:
- Paralinguistic elements: `[LAUGH]`, `[MHMH]`, `[SIGH]`, `[UHM]`
- Audio quality: `[NOISE]`, `[UNINT]` (unintelligible)
- Speech characteristics: `[AUTOCOR]` (self-corrections), `[L-EN]` (English code-switching)
These tokens allow for a richer transcription that captures not only the verbal content but also relevant contextual elements.
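If you want to confirm how these markers are handled by the tokenizer, a minimal sketch like the following can help. The card does not state whether each marker is a dedicated vocabulary entry, so the check below treats that as something to verify rather than assume.

```python
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("litus-ai/whisper-small-ita")

# Annotation markers documented above; whether each one is a single vocabulary
# entry or gets split into sub-tokens is not stated in the card, so we check.
markers = ["[LAUGH]", "[MHMH]", "[SIGH]", "[UHM]",
           "[NOISE]", "[UNINT]", "[AUTOCOR]", "[L-EN]"]
vocab = processor.tokenizer.get_vocab()
for marker in markers:
    status = "single vocabulary entry" if marker in vocab else "split into sub-tokens"
    print(f"{marker}: {status}")
```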
Evaluation
The following graph shows the accuracy of `openai/whisper-small`, `openai/whisper-medium`, `litus-ai/whisper-small-ita`, and Litus AI's proprietary model, `litus-proprietary`, on proprietary benchmarks for Italian meetings and voice calls.
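The metadata table at the end of this card lists `wer` as the tracked metric. The proprietary meeting and voice-call benchmarks are not public, so as a purely illustrative sketch, here is one way to compute WER on the public Italian VoxPopuli test split with the `evaluate` library; the `normalized_text` field name is an assumption about that dataset's schema.

```python
import evaluate
from datasets import load_dataset
from transformers import pipeline

# Illustrative only: the proprietary benchmark is not public, so the public
# Italian VoxPopuli test split is used here as a stand-in.
asr = pipeline("automatic-speech-recognition", model="litus-ai/whisper-small-ita")
wer = evaluate.load("wer")

ds = load_dataset("facebook/voxpopuli", "it", split="test").select(range(50))
predictions = [
    asr({"raw": s["audio"]["array"], "sampling_rate": s["audio"]["sampling_rate"]})["text"]
    for s in ds
]
references = [s["normalized_text"] for s in ds]  # field name assumed from the VoxPopuli schema

print("WER:", wer.compute(predictions=predictions, references=references))
```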
📦 Installation
Since this model uses the `transformers` library, you need to install it if you haven't already; the usage example below also relies on `datasets`. You can install both with pip:

```bash
pip install transformers datasets
```
💻 Usage Examples
Basic Usage
You can use `litus-ai/whisper-small-ita` through Hugging Face's `automatic-speech-recognition` pipeline or by loading the processor and model directly, as in the example below (a pipeline sketch follows it).
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset

model_id = "litus-ai/whisper-small-ita"

# Load the processor (feature extractor + tokenizer) and the fine-tuned model.
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Load an Italian audio sample from the VoxPopuli test split.
ds = load_dataset("facebook/voxpopuli", "it", split="test")
sample = ds[171]["audio"]

# Convert the raw waveform into log-mel input features.
input_features = processor(
    sample["array"],
    sampling_rate=sample["sampling_rate"],
    return_tensors="pt",
).input_features

# Generate token IDs and decode them, keeping the special annotation tokens.
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
print(transcription)
```
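If you prefer the pipeline route mentioned above, a minimal sketch follows; `sample.wav` is a hypothetical local audio file. Note that the pipeline's default decoding may strip the special annotation tokens, so the explicit `batch_decode(..., skip_special_tokens=False)` call above is the safer way to keep them.

```python
from transformers import pipeline

# Minimal pipeline sketch; "sample.wav" is a hypothetical local audio file.
asr = pipeline("automatic-speech-recognition", model="litus-ai/whisper-small-ita")
result = asr("sample.wav")
print(result["text"])
```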
📚 Documentation
For more information on the architecture, the data used for training, and the intended use, please refer to the Paper, the Model Card, and the Repository.
📄 License
This model is licensed under the Apache-2.0 license.
| Property | Details |
|---|---|
| Model Type | Optimized version of openai/whisper-small for Italian |
| Training Data | Part of the proprietary data of Litus AI |
| Pipeline Tag | automatic-speech-recognition |
| Tags | audio, automatic-speech-recognition, hf-asr-leaderboard |
| Library Name | transformers |
| Metrics | wer |