BLOOMZ Open-Source Large Language Model - Supports 46 Languages and 13 Programming Languages, Focuses on Text Generation Tasks

Bloomz

Developed by bigscience

BLOOMZ is a multilingual large language model supporting 46 languages and 13 programming languages, specializing in text generation tasks.

Large Language Model

Transformers

Supports Multiple Languages

Open Source License: OpenRAIL

#Multilingual text generation #Cross-lingual reasoning #Coreference resolution

Downloads 1,223

Release Time: 9/17/2022

Model Overview

BLOOMZ is a multilingual large language model developed by the BigScience team, featuring robust cross-lingual text generation capabilities suitable for various natural language processing tasks.

Model Features

Multilingual support

Supports 46 human languages and 13 programming languages with strong cross-lingual processing capabilities

Diverse task handling

Capable of performing various NLP tasks including text generation, sentiment analysis, Q&A, and code generation

Cross-lingual transfer ability

Evaluated on cross-lingual coreference resolution with XWinograd, reaching 59.65 to 70.24 accuracy depending on the language

Model Capabilities

Text generation

Sentiment analysis

Q&A systems

Code generation

Multilingual translation

Fable creation

Mathematical problem solving

Use Cases

Sentiment analysis

Chinese-English sentiment analysis

Analyze sentiment polarity (positive/neutral/negative) of given text

Can accurately identify sentiment polarity in Chinese and English texts

Educational assistance

Multilingual Q&A

Answer various knowledge questions in different languages

Can accurately answer multilingual questions across fields like science and mathematics

Content creation

Multilingual story generation

Generate stories or fables based on given themes and languages

Can produce multilingual stories that meet thematic requirements, including moral lessons

Programming assistance

Code generation

Generate code snippets with specific algorithmic complexity based on requirements

Can produce executable code that meets complexity requirements

🚀 Project Introduction

This project is a text-generation model that supports a wide range of natural languages and programming languages. It has been evaluated on multiple datasets covering coreference resolution, natural language inference, program synthesis, and sentence completion.

📚 Documentation

Datasets

The model uses the following dataset:

bigscience/xP3

Supported Languages

The model supports the following languages:

ak, ar, as, bm, bn, ca, code, en, es, eu, fon, fr, gu, hi, id, ig, ki, kn, lg, ln, ml, mr, ne, nso, ny, or, pa, pt, rn, rw, sn, st, sw, ta, te, tn, ts, tum, tw, ur, vi, wo, xh, yo, zh, zu

Supported Programming Languages

The model supports the following programming languages:

C, C++, C#, Go, Java, JavaScript, Lua, PHP, Python, Ruby, Rust, Scala, TypeScript

Pipeline Tag

text-generation

Inference

false

Widget Examples

Example Title	Text
zh-en sentiment	"A legendary beginning, an immortal myth. This is not just a movie, but a label for entering a new era, always inscribed in the annals of history. Would you rate the previous review as positive, neutral or negative?"
zh-zh sentiment	"一个传奇的开端，一个不灭的神话，这不仅仅是一部电影，而是作为一个走进新时代的标签，永远彪炳史册。你认为这句话的立场是赞扬、中立还是批评？"
vi-en query	"Suggest at least five related search terms to "Mạng neural nhân tạo"."
fr-fr query	"Proposez au moins cinq mots clés concernant «Réseau de neurones artificiels»."
te-en qa	"Explain in a sentence in Telugu what is backpropagation in neural networks."
en-en qa	"Why is the sky blue?"
zh-en qa	"Explain to me in Traditional Chinese what is the difference between Bitcoin and Ethereum."
code-en	"Write a code snippet with O(log(n)) computational complexity."
es-en fable	"Write a fairy tale about a troll saving a princess from a dangerous dragon. The fairy tale is a masterpiece that has achieved praise worldwide and its moral is "Heroes Come in All Shapes and Sizes". Story (in Spanish):"
hi-en fable	"Write a fable about wood elves living in a forest that is suddenly invaded by ogres. The fable is a masterpiece that has achieved praise worldwide and its moral is "Violence is the last refuge of the incompetent". Fable (in Hindi):"
en-de-ar-fr-zh math	"How many sides does a rectangle and heptagon have, when combined? Answer this question with some math. Ein Rechteck hat 4 Seiten. Ein Siebeneck hat 7 Seiten. In Kombination haben sie 4 + 7 = 11 Seiten. كم عدد الأضلاع التي يجمعها المربع والمثلث؟ Répondez à cette question en chinois."
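
The widget prompts above share a simple pattern: the task is stated as a plain instruction, and the desired output language is named inside the prompt itself (for example "Fable (in Hindi):"). The sketch below shows one possible way to build such prompts and pass them to a checkpoint with the Hugging Face transformers library. The helper names `sentiment_prompt`, `fable_prompt`, and `generate` are hypothetical, and the default checkpoint `bigscience/bloomz-560m` is an assumption chosen only because it is small enough to try locally; this page does not prescribe any of them.

```python
def sentiment_prompt(review: str) -> str:
    """Append the question used in the English sentiment widget example."""
    return (
        f"{review.strip()} "
        "Would you rate the previous review as positive, neutral or negative?"
    )


def fable_prompt(plot: str, moral: str, language: str) -> str:
    """Mirror the fable widget examples: plot, moral, then the target language."""
    return (
        f"Write a fable about {plot}. "
        "The fable is a masterpiece that has achieved praise worldwide "
        f'and its moral is "{moral}". Fable (in {language}):'
    )


def generate(prompt: str, checkpoint: str = "bigscience/bloomz-560m",
             max_new_tokens: int = 64) -> str:
    """Run one prompt through a causal LM checkpoint (downloads weights on first use).

    The checkpoint name is an assumption for illustration, not taken from this page.
    """
    # Imported lazily so the prompt helpers work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # For a causal LM the decoded text starts with the prompt itself.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


prompt = fable_prompt(
    "wood elves living in a forest that is suddenly invaded by ogres",
    "Violence is the last refuge of the incompetent",
    "Hindi",
)
print(prompt)
# To query a model (requires transformers and torch): print(generate(prompt))
```

Keeping prompt construction separate from generation makes it easy to check the exact prompt text before spending time on model inference.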

Model Index

The "bloomz" model reports the following results, grouped by task and dataset:

Coreference Resolution

Dataset	Name	Config	Split	Revision	Accuracy
winogrande	Winogrande XL (xl)	xl	validation	a80f460359d1e9a67c006011c94de42a8759430c	59.27
Muennighoff/xwinograd	XWinograd (en)	en	test	9dd5ea5505fad86b7bedad667955577815300cee	69.08
Muennighoff/xwinograd	XWinograd (fr)	fr	test	9dd5ea5505fad86b7bedad667955577815300cee	68.67
Muennighoff/xwinograd	XWinograd (jp)	jp	test	9dd5ea5505fad86b7bedad667955577815300cee	59.65
Muennighoff/xwinograd	XWinograd (pt)	pt	test	9dd5ea5505fad86b7bedad667955577815300cee	64.26
Muennighoff/xwinograd	XWinograd (ru)	ru	test	9dd5ea5505fad86b7bedad667955577815300cee	60.95
Muennighoff/xwinograd	XWinograd (zh)	zh	test	9dd5ea5505fad86b7bedad667955577815300cee	70.24

Natural Language Inference

Dataset	Name	Config	Split	Revision	Accuracy
anli	ANLI (r1)	r1	validation	9dbd830a06fea8b1c49d6e5ef2004a08d9f45094	48.6
anli	ANLI (r2)	r2	validation	9dbd830a06fea8b1c49d6e5ef2004a08d9f45094	44.1
anli	ANLI (r3)	r3	validation	9dbd830a06fea8b1c49d6e5ef2004a08d9f45094	45.5
super_glue	SuperGLUE (cb)	cb	validation	9e12063561e7e6c79099feb6d5a493142584e9e2	82.14
super_glue	SuperGLUE (rte)	rte	validation	9e12063561e7e6c79099feb6d5a493142584e9e2	85.56
xnli	XNLI (ar)	ar	validation	a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16	60.68
xnli	XNLI (bg)	bg	validation	a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16	48.43
xnli	XNLI (de)	de	validation	a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16	54.38
xnli	XNLI (el)	el	validation	a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16	47.43
xnli	XNLI (en)	en	validation	a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16	67.47
xnli	XNLI (es)	es	validation	a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16	61.24
xnli	XNLI (fr)	fr	validation	a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16	61.37
xnli	XNLI (hi)	hi	validation	a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16	60.2
xnli	XNLI (ru)	ru	validation	a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16	54.02
xnli	XNLI (sw)	sw	validation	a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16	52.09
xnli	XNLI (th)	th	validation	a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16	43.78
xnli	XNLI (tr)	tr	validation	a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16	45.7
xnli	XNLI (ur)	ur	validation	a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16	50.8
xnli	XNLI (vi)	vi	validation	a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16	61.0
xnli	XNLI (zh)	zh	validation	a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16	56.91
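
Several XNLI configurations in this table (bg, de, el, ru, th, tr) do not appear in the Supported Languages list, so the table gives a rough view of cross-lingual transfer. As an illustrative sketch, the snippet below copies the XNLI accuracies from the table and averages them separately for listed and unlisted languages. The grouping and the names `XNLI_ACCURACY`, `SUPPORTED`, and `mean_accuracy` are assumptions made for this example, not part of the original evaluation.

```python
from statistics import mean

# XNLI validation accuracy per language config, copied from the table above.
XNLI_ACCURACY = {
    "ar": 60.68, "bg": 48.43, "de": 54.38, "el": 47.43, "en": 67.47,
    "es": 61.24, "fr": 61.37, "hi": 60.2, "ru": 54.02, "sw": 52.09,
    "th": 43.78, "tr": 45.7, "ur": 50.8, "vi": 61.0, "zh": 56.91,
}

# Language codes from the Supported Languages list on this page.
SUPPORTED = set(
    "ak ar as bm bn ca code en es eu fon fr gu hi id ig ki kn lg ln ml mr ne "
    "nso ny or pa pt rn rw sn st sw ta te tn ts tum tw ur vi wo xh yo zh zu".split()
)


def mean_accuracy(scores: dict, languages: list) -> float:
    """Average the accuracy over the given language codes."""
    return mean(scores[lang] for lang in languages)


listed = sorted(lang for lang in XNLI_ACCURACY if lang in SUPPORTED)
unlisted = sorted(lang for lang in XNLI_ACCURACY if lang not in SUPPORTED)

listed_mean = mean_accuracy(XNLI_ACCURACY, listed)
unlisted_mean = mean_accuracy(XNLI_ACCURACY, unlisted)

print(f"listed   {listed}: {listed_mean:.2f}")
print(f"unlisted {unlisted}: {unlisted_mean:.2f}")
```

A plain mean weights every language equally; it is a summary of the table, not a statistical test.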

Program Synthesis

Dataset	Name	Config	Split	Revision	Pass@1	Pass@10	Pass@100
openai_humaneval	HumanEval	None	test	e8dc562f5de170c54b5481011dd9f4fa04845771	12.06	26.53	48.44
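
Pass@k is the estimated probability that at least one of k sampled completions passes a problem's unit tests. The table does not say how its numbers were computed; the sketch below shows the unbiased estimator commonly used with HumanEval, 1 - C(n-c, k) / C(n, k) for n samples of which c pass, as an assumption about the general technique rather than a record of this evaluation. The function names `pass_at_k` and `mean_pass_at_k` and the sample counts in the usage lines are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which passed."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets made only of failing samples;
    # it is 0 when fewer than k samples failed, so the estimate becomes 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list, k: int) -> float:
    """Average the per-problem estimates; results holds (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts: three problems, 10 samples each, with 5, 0 and 10 passing.
example = [(10, 5), (10, 0), (10, 10)]
print(mean_pass_at_k(example, k=1))
```

The estimate is computed per problem and then averaged, which is why pass@10 and pass@100 can be much higher than pass@1 when only a few samples per problem succeed.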

Sentence Completion

Dataset	Name	Config	Split	Revision	Accuracy
story_cloze	StoryCloze (2016)	"2016"	validation	e724c6f8cdf7c7a2fb229d862226e15b023ee4db	96.26
super_glue	SuperGLUE (copa)	copa	validation	9e12063561e7e6c79099feb6d5a493142584e9e2	91.0
xcopa	XCOPA (et)	et	validation	37f73c60fb123111fa5af5f9b705d0b3747fd187	51.0
xcopa	XCOPA (ht)	ht	validation	37f73c60fb123111fa5af5f9b705d0b3747fd187	58.0
xcopa	XCOPA (id)	id	validation	37f73c60fb123111fa5af5f9b705d0b3747fd187	86.0
xcopa	XCOPA (it)	it	validation	37f73c60fb123111fa5af5f9b705d0b3747fd187	74.0
xcopa	XCOPA (qu)	qu	validation	37f73c60fb123111fa5af5f9b705d0b3747fd187	56.0
xcopa	XCOPA (sw)	sw	validation	37f73c60fb123111fa5af5f9b705d0b3747fd187	64.0
xcopa	XCOPA (ta)	ta	validation	37f73c60fb123111fa5af5f9b705d0b3747fd187	69.0
xcopa	XCOPA (th)	th	validation	37f73c60fb123111fa5af5f9b705d0b3747fd187	58.0
xcopa	XCOPA (tr)	tr	validation	37f73c60fb123111fa5af5f9b705d0b3747fd187	57.0
xcopa	XCOPA (vi)	vi	validation	37f73c60fb123111fa5af5f9b705d0b3747fd187	87.0
xcopa	XCOPA (zh)	zh	validation	37f73c60fb123111fa5af5f9b705d0b3747fd187	90.0
Muennighoff/xstory_cloze	XStoryCloze (ar)	ar	validation	8bb76e594b68147f1a430e86829d07189622b90d	...

License

bigscience-bloom-rail-1.0
