Galactica-6.7b Open-source Scientific Language Model - Free deployment for tasks such as citation prediction and scientific Q&A

Galactica 6.7b

Developed by facebook

GALACTICA is a language model trained on a large-scale scientific corpus and can perform various scientific tasks, such as citation prediction and scientific Q&A.

Large Language Model

Transformers

#Scientific literature processing #Multimodal scientific tasks #Academic citation prediction

Release Time: 11/16/2022

Model Overview

The GALACTICA model is designed to perform scientific tasks, including citation prediction, scientific Q&A, mathematical reasoning, summary generation, document generation, molecular property prediction, and entity extraction.

Model Features

Scientific task execution

Trained on a large-scale scientific corpus and capable of performing various scientific tasks.

Multiple parameter scales

Offers parameter scales ranging from 125M to 120B to meet different application requirements.

Low toxicity rate

Exhibits substantially lower toxicity rates than other large language models, although some bias remains on certain measures.

Model Capabilities

Citation prediction

Scientific Q&A

Mathematical reasoning

Summary generation

Document generation

Molecular property prediction

Entity extraction

Use Cases

Academic research

Academic literature search

As an alternative to standard search tools, it helps discover academic literature.

Outperforms several existing language models on knowledge probes, reasoning, and knowledge-intensive scientific tasks.

Scientific tool development

Scientific Q&A system

Build a scientific Q&A tool to answer questions in professional fields.

Outperforms other open-source general language models in general NLP tasks.

🚀 GALACTICA 6.7B (standard)

This model card provides information about the GALACTICA model, including its training details and intended use cases. The model is designed for scientific tasks and trained on a large-scale scientific corpus.

Model card from the original repo

Following Mitchell et al. (2018), this model card offers details about the GALACTICA model, its training process, and the intended use cases. Full details about its training and evaluation can be found in the release paper.

✨ Features

The GALACTICA models are trained on a large-scale scientific corpus. They are designed to perform various scientific tasks, such as citation prediction, scientific QA, mathematical reasoning, summarization, document generation, molecular property prediction, and entity extraction. The models were developed by the Papers with Code team at Meta AI to study the use of language models for the automatic organization of science. We train models with parameter sizes ranging from 125M to 120B.

Size	Parameters
`mini`	125 M
`base`	1.3 B
`standard`	6.7 B
`large`	30 B
`huge`	120 B

📦 Installation

No specific installation steps are provided in the original README. However, the usage examples imply that you need to install the relevant libraries: transformers (with PyTorch) for all examples, accelerate for the GPU examples, and bitsandbytes for the INT8 example.
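
Since no installation steps are given, a quick check of the environment can save a failed run. The sketch below only tests whether the packages named in the usage examples are importable. The mapping of packages to examples is an assumption read off the `pip install` comments in this card, and `missing_packages` is a hypothetical helper name, not part of any library.

```python
import importlib.util

# Packages the usage examples appear to rely on. Which example needs which
# package is an assumption based on the `pip install` comments in this card.
REQUIRED = {
    "transformers": "all examples",
    "torch": "all examples (PyTorch backend)",
    "accelerate": "GPU examples using device_map='auto'",
    "bitsandbytes": "INT8 example",
}


def missing_packages(required=REQUIRED):
    """Return the names in `required` that cannot be imported in this environment."""
    return [name for name in required if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; try: pip install " + " ".join(missing))
    else:
        print("All packages used by the examples are importable.")
```

The check does not verify versions or GPU availability; it only reports whether each top-level package can be found.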

💻 Usage Examples

Basic Usage

Running the model on a CPU

from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b")

input_text = "The Transformer architecture [START_REF]"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

Advanced Usage

Running the model on a GPU

# pip install accelerate
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b", device_map="auto")

input_text = "The Transformer architecture [START_REF]"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

Running the model on a GPU using different precisions

FP16

# pip install accelerate
import torch
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b", device_map="auto", torch_dtype=torch.float16)

input_text = "The Transformer architecture [START_REF]"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

INT8

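# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
# load_in_8bit=True quantizes the weights via bitsandbytes; recent transformers
# releases prefer quantization_config=BitsAndBytesConfig(load_in_8bit=True).
model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b", device_map="auto", load_in_8bit=True)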

input_text = "The Transformer architecture [START_REF]"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
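
The precision variants above mainly trade memory for numerical fidelity. As a back-of-the-envelope sketch, the weights-only footprint can be estimated as parameter count times bytes per weight. The parameter counts come from the size table in this card; the estimate ignores activations, the key-value cache, and any layers that a quantization library keeps in higher precision, so actual usage will be higher.

```python
# Approximate parameter counts from the size table in this card.
PARAMS = {
    "mini": 125e6,
    "base": 1.3e9,
    "standard": 6.7e9,
    "large": 30e9,
    "huge": 120e9,
}

# Bytes used to store one weight at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(size: str, precision: str) -> float:
    """Rough weights-only footprint in gigabytes (10**9 bytes)."""
    return PARAMS[size] * BYTES_PER_PARAM[precision] / 1e9


for precision in ("fp32", "fp16", "int8"):
    gb = weight_memory_gb("standard", precision)
    print(f"standard @ {precision}: ~{gb:.1f} GB")
```

For the 6.7B model this gives roughly 26.8 GB in FP32, 13.4 GB in FP16, and 6.7 GB in INT8 for the weights alone.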

📚 Documentation

Release Date

November 2022

Model Type

Transformer-based architecture in a decoder-only setup with a few modifications (see paper for more details).

Paper & Demo

Paper / Demo

Model Use

The primary intended users of the GALACTICA models are researchers studying language models applied to the scientific domain. We also anticipate the model will be useful for developers who wish to build scientific tooling. However, we caution against production use without safeguards given the potential of language models to hallucinate.

The models are made available under a non-commercial CC BY-NC 4.0 license. More information about how to use the model can be found in the README.md of this repository.

Training Data

The GALACTICA models are trained on 106 billion tokens of open-access scientific text and data. This includes papers, textbooks, scientific websites, encyclopedias, reference material, knowledge bases, and more. We tokenize different modalities to provide a natural language interface for different tasks. See the README.md for more information and the paper for full details on the training data.
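
To illustrate what a natural language interface over tokenized modalities can look like, here is a minimal sketch of prompt builders that wrap input in marker tokens. Only `[START_REF]` appears in this card's examples. The `<work>` and `[START_SMILES]`/`[END_SMILES]` markers are taken from the release paper, and the exact prompt layouts and function names here are illustrative assumptions that should be checked against the paper and README before use.

```python
def citation_prompt(context: str) -> str:
    """Append the reference marker so the model continues with a citation."""
    return f"{context} [START_REF]"


def reasoning_prompt(question: str) -> str:
    """Append the working-memory marker (assumed layout, see the paper)."""
    return f"{question}\n\n<work>"


def smiles_prompt(smiles: str, question: str) -> str:
    """Wrap a SMILES string in its modality markers (assumed layout)."""
    return f"[START_SMILES]{smiles}[END_SMILES]\n\n{question}"


# The first prompt reproduces the input_text used in the usage examples.
print(citation_prompt("The Transformer architecture"))
print(smiles_prompt("CCO", "What is the boiling point of this molecule?"))
```

Each returned string can be passed to the tokenizer in place of `input_text` in the usage examples.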

🔧 Technical Details

Performance and Limitations

The model outperforms several existing language models on a range of knowledge probes, reasoning, and knowledge-intensive scientific tasks. This also extends to general NLP tasks, where GALACTICA outperforms other open-source general language models.

However, like other language models, GALACTICA is often prone to hallucination, especially for less popular and less cited scientific concepts. There are no guarantees of truthful output when generating from the model, which also applies to specific modalities such as citation prediction. While GALACTICA's citation behaviour approaches the ground truth citation behaviour with scale, the model continues to exhibit a popularity bias at larger scales.
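
Because predicted citations carry no guarantee of being real, one cautious pattern is to pull them out of the generated text and treat each as an unverified candidate to look up manually. The sketch below assumes the decoded output keeps the `[START_REF]` and `[END_REF]` markers as plain text; the sample string is written by hand for illustration and is not actual model output, and `extract_candidate_references` is a hypothetical helper name.

```python
import re

# Matches text the model emitted between the reference markers.
REF_PATTERN = re.compile(r"\[START_REF\](.*?)\[END_REF\]", re.DOTALL)


def extract_candidate_references(generated: str) -> list[str]:
    """Return the non-empty strings found between [START_REF] and [END_REF]."""
    candidates = (match.strip() for match in REF_PATTERN.findall(generated))
    return [c for c in candidates if c]


# Hand-written sample standing in for decoded model output (illustration only).
sample = (
    "The Transformer architecture [START_REF] Attention is All you Need, "
    "Vaswani[END_REF] is widely used."
)
for ref in extract_candidate_references(sample):
    print("UNVERIFIED:", ref)
```

References that are cut off before `[END_REF]` (for example, by a short generation length) are skipped by this pattern, which is a deliberate choice here: a truncated citation is even harder to verify.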

In addition, we evaluated the model on several types of benchmarks related to stereotypes and toxicity. Overall, the model exhibits substantially lower toxicity rates compared to other large language models. However, the model continues to exhibit bias on certain measures (see the paper for details), so we recommend care when using the model for generation.

Broader Implications

GALACTICA can potentially be used as a new way to discover academic literature. We also expect a lot of downstream use for application to particular domains, such as mathematics, biology, and chemistry. In the paper, we demonstrated several examples of the model acting as an alternative to standard search tools. We expect a new generation of scientific tools to be built upon large language models such as GALACTICA.

We encourage researchers to investigate beneficial and new use cases for these models. However, it is important to be aware of the current limitations of large language models. Researchers should pay attention to common issues such as hallucination and biases that could emerge from using these models.

📄 License

The models are made available under a non-commercial CC BY-NC 4.0 license.

📖 Citation

@inproceedings{GALACTICA,
    title={GALACTICA: A Large Language Model for Science},
    author={Ross Taylor and Marcin Kardas and Guillem Cucurull and Thomas Scialom and Anthony Hartshorn and Elvis Saravia and Andrew Poulton and Viktor Kerkez and Robert Stojnic},
    year={2022}
}
