# BioLingual

Transferable Models for Bioacoustics with Human Language Supervision: an audio-text model for bioacoustics based on contrastive language-audio pretraining (CLAP).
## Quick Start

This model can be used for zero-shot audio classification in bioacoustics, or fine-tuned on bioacoustic tasks.
## Features

- An audio-text model for bioacoustics based on contrastive language-audio pretraining.
- Enables zero-shot audio classification in bioacoustics.
- Can be fine-tuned on bioacoustic tasks (see the fine-tuning sketch under Usage Examples below).
## Installation

The usage examples below rely on the `transformers` and `datasets` libraries with PyTorch as the backend, installable with `pip install transformers datasets torch`.
## Usage Examples

### Basic Usage

Perform zero-shot audio classification with the `pipeline` API:
```python
from datasets import load_dataset
from transformers import pipeline

# Load an example clip from the ESC-50 environmental-sound dataset
dataset = load_dataset("ashraq/esc50")
audio = dataset["train"]["audio"][-1]["array"]

audio_classifier = pipeline(task="zero-shot-audio-classification", model="davidrrobinson/BioLingual")
output = audio_classifier(audio, candidate_labels=["Sound of a sperm whale", "Sound of a sea lion"])
print(output)
# >>> [{"score": 0.999, "label": "Sound of a sperm whale"}, {"score": 0.001, "label": "Sound of a sea lion"}]
```
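Under the hood, the pipeline scores the audio against each candidate label in CLAP's shared audio-text embedding space. Below is a minimal sketch of the same computation using `ClapModel` directly; it reuses the `audio` array from the snippet above, and assumes the audio is at the sampling rate the feature extractor expects (resample first if needed).

```python
import torch
from transformers import ClapModel, ClapProcessor

model = ClapModel.from_pretrained("davidrrobinson/BioLingual")
processor = ClapProcessor.from_pretrained("davidrrobinson/BioLingual")

labels = ["Sound of a sperm whale", "Sound of a sea lion"]
# `audio` is the 1-D numpy array from the pipeline example above
inputs = processor(text=labels, audios=audio, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(**inputs).logits_per_audio  # audio-to-text similarity scores

probs = logits.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```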
### Advanced Usage

Run the model on CPU to extract audio embeddings:
```python
from datasets import load_dataset
from transformers import ClapModel, ClapProcessor

# Dummy speech dataset, used here only as a small example audio source
librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
audio_sample = librispeech_dummy[0]

model = ClapModel.from_pretrained("davidrrobinson/BioLingual")
processor = ClapProcessor.from_pretrained("davidrrobinson/BioLingual")

inputs = processor(audios=audio_sample["audio"]["array"], return_tensors="pt")
audio_embed = model.get_audio_features(**inputs)
```
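Text embeddings live in the same space and can be extracted analogously with `get_text_features`; a short sketch reusing `model` and `processor` from the snippet above:

```python
# Reuses `model` and `processor` from the CPU example above
text_inputs = processor(text=["Sound of a sperm whale"], return_tensors="pt", padding=True)
text_embed = model.get_text_features(**text_inputs)
```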
Run the model on GPU:
```python
from datasets import load_dataset
from transformers import ClapModel, ClapProcessor

librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
audio_sample = librispeech_dummy[0]

# Move the model and the inputs to the first CUDA device
model = ClapModel.from_pretrained("davidrrobinson/BioLingual").to(0)
processor = ClapProcessor.from_pretrained("davidrrobinson/BioLingual")

inputs = processor(audios=audio_sample["audio"]["array"], return_tensors="pt").to(0)
audio_embed = model.get_audio_features(**inputs)
```
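As noted under Features, the model can also be fine-tuned on bioacoustic tasks. One option, assuming paired audio clips and text captions, is to continue contrastive training: `ClapModel` returns the CLAP contrastive loss when called with `return_loss=True`. A minimal single-step sketch, where `audio_batch` and `caption_batch` are placeholders for your own data:

```python
import torch
from transformers import ClapModel, ClapProcessor

model = ClapModel.from_pretrained("davidrrobinson/BioLingual")
processor = ClapProcessor.from_pretrained("davidrrobinson/BioLingual")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Placeholders: a list of 1-D numpy arrays and a list of matching captions
inputs = processor(text=caption_batch, audios=audio_batch, return_tensors="pt", padding=True)

model.train()
outputs = model(**inputs, return_loss=True)  # contrastive audio-text loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```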
## Documentation

### Datasets
- davidrrobinson/AnimalSpeak
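The AnimalSpeak dataset listed above is available on the Hugging Face Hub and can be inspected with `datasets`. A quick sketch; the `train` split name is an assumption, so check the dataset card for the exact splits and schema:

```python
from datasets import load_dataset

# The split name is an assumption; see the dataset card for details
animalspeak = load_dataset("davidrrobinson/AnimalSpeak", split="train")
print(animalspeak.column_names)  # inspect the available fields
print(animalspeak[0])            # look at one example record
```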