🚀 NuExtract 2.0 8B by NuMind
NuExtract 2.0 is a family of models designed specifically for structured information extraction. The models support multimodal (text and image) inputs, are multilingual, and come in several sizes based on pre-trained models from the QwenVL family.
API / Platform   |   Blog   |   Discord
🚀 Quick Start
To get started with NuExtract 2.0, follow the steps and examples in this README. The key things to understand are how to prepare input templates and how to process the different input types (text or images).
✨ Features
- Multimodal Support: Supports both text and image inputs, enabling more comprehensive information extraction.
- Multilingual: Can handle information extraction tasks in multiple languages.
- Multiple Model Sizes: Offers different model sizes (2B, 4B, 8B) to meet various resource and performance requirements.
- Flexible Template Definition: Allows users to define custom JSON templates for information extraction.
📦 Installation
The original README doesn't provide detailed installation steps, but you can load the model with the `transformers` library. Here is an example of loading the model:
```python
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

model_name = "numind/NuExtract-2.0-2B"
# model_name = "numind/NuExtract-2.0-8B"

model = AutoModelForVision2Seq.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires flash-attn; omit if unavailable
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    model_name,
    trust_remote_code=True,
    padding_side='left',
    use_fast=True,
)
```
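All of the usage examples below rely on a `process_all_vision_info` helper that gathers image inputs from the chat messages and from any in-context examples. Its definition is not reproduced in this README; the following is a minimal sketch, assuming the `qwen-vl-utils` package from the QwenVL ecosystem is installed (`pip install qwen-vl-utils`):

```python
from qwen_vl_utils import fetch_image, process_vision_info

def process_all_vision_info(messages, examples=None):
    """Collect image inputs from chat messages and optional in-context examples.

    Minimal sketch, not the official implementation: `messages` may be a single
    conversation or a batch (a list of conversations); `examples` may be a flat
    list of example dicts or a per-conversation list of lists.
    """
    # Images referenced in the user messages (video inputs are ignored here)
    image_inputs, _ = process_vision_info(messages)
    image_inputs = list(image_inputs or [])

    # Flatten examples into a single list of example dicts
    flat_examples = []
    for e in examples or []:
        if e is None:
            continue
        flat_examples.extend(e if isinstance(e, list) else [e])

    # Load images that appear as in-context example inputs
    for example in flat_examples:
        ex_input = example.get("input")
        if isinstance(ex_input, dict) and ex_input.get("type") == "image":
            image_inputs.append(fetch_image(ex_input))

    return image_inputs or None
```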
💻 Usage Examples
Basic Usage
To perform a basic extraction of names from a text document:
```python
template = """{"names": ["string"]}"""
document = "John went to the restaurant with Mary. James went to the cinema."

# prepare the user message content
messages = [{"role": "user", "content": document}]
text = processor.tokenizer.apply_chat_template(
    messages,
    template=template,  # template is specified here
    tokenize=False,
    add_generation_prompt=True,
)
print(text)
"""<|im_start|>user
# Template:
{"names": ["string"]}
# Context:
John went to the restaurant with Mary. James went to the cinema.<|im_end|>
<|im_start|>assistant"""

image_inputs = process_all_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

# we choose greedy decoding here, which works well for most information extraction tasks
generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}

# Inference: generation of the output
generated_ids = model.generate(
    **inputs,
    **generation_config
)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
# ['{"names": ["John", "Mary", "James"]}']
```
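The returned value is a JSON string, so you can parse it with the standard library:

```python
import json

# Parse the model output into a Python object
result = json.loads(output_text[0])
print(result["names"])  # ['John', 'Mary', 'James']
```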
Advanced Usage
In-Context Examples
Providing in-context examples can help the model better understand the task; the model will also mimic the formatting used in the example outputs. Here is an example:
```python
template = """{"names": ["string"]}"""
document = "John went to the restaurant with Mary. James went to the cinema."
examples = [
    {
        "input": "Stephen is the manager at Susan's store.",
        "output": """{"names": ["-STEPHEN-", "-SUSAN-"]}"""
    }
]

messages = [{"role": "user", "content": document}]
text = processor.tokenizer.apply_chat_template(
    messages,
    template=template,
    examples=examples,  # examples provided here
    tokenize=False,
    add_generation_prompt=True,
)

image_inputs = process_all_vision_info(messages, examples)
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

# greedy decoding works well for most information extraction tasks
generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}

# Inference: generation of the output
generated_ids = model.generate(
    **inputs,
    **generation_config
)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
# ['{"names": ["-JOHN-", "-MARY-", "-JAMES-"]}']
```
Image Inputs
If you want to give image inputs to NuExtract:
```python
template = """{"store": "verbatim-string"}"""
document = {"type": "image", "image": "file://1.jpg"}

messages = [{"role": "user", "content": [document]}]
text = processor.tokenizer.apply_chat_template(
    messages,
    template=template,
    tokenize=False,
    add_generation_prompt=True,
)

image_inputs = process_all_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}

# Inference: generation of the output
generated_ids = model.generate(
    **inputs,
    **generation_config
)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
# ['{"store": "Trader Joe\'s"}']
```
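The image content dict follows the QwenVL message convention, so sources other than local `file://` paths should also work (support depends on the image-loading helper you use):

```python
# Alternative image sources (illustrative; support depends on the vision-loading helper)
document = {"type": "image", "image": "https://example.com/receipt.jpg"}  # remote URL
# document = {"type": "image", "image": "data:image/jpeg;base64,..."}     # base64-encoded image
```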
Batch Inference
NuExtract 2.0 also supports batched requests that mix text and image inputs, with or without in-context examples:
```python
inputs = [
    # image input with no ICL examples
    {
        "document": {"type": "image", "image": "file://0.jpg"},
        "template": """{"store_name": "verbatim-string"}""",
    },
    # image input with 1 ICL example
    {
        "document": {"type": "image", "image": "file://0.jpg"},
        "template": """{"store_name": "verbatim-string"}""",
        "examples": [
            {
                "input": {"type": "image", "image": "file://1.jpg"},
                "output": """{"store_name": "Trader Joe's"}""",
            }
        ],
    },
    # text input with no ICL examples
    {
        "document": {"type": "text", "text": "John went to the restaurant with Mary. James went to the cinema."},
        "template": """{"names": ["string"]}""",
    },
    # text input with 1 ICL example
    {
        "document": {"type": "text", "text": "John went to the restaurant with Mary. James went to the cinema."},
        "template": """{"names": ["string"]}""",
        "examples": [
            {
                "input": "Stephen is the manager at Susan's store.",
                "output": """{"names": ["STEPHEN", "SUSAN"]}"""
            }
        ],
    },
]

# messages should be a list of lists for batch processing
messages = [
    [
        {
            "role": "user",
            "content": [x['document']],
        }
    ]
    for x in inputs
]

# apply the chat template to each example individually
texts = [
    processor.tokenizer.apply_chat_template(
        messages[i],  # a list containing one message
        template=x['template'],
        examples=x.get('examples', None),
        tokenize=False,
        add_generation_prompt=True,
    )
    for i, x in enumerate(inputs)
]

image_inputs = process_all_vision_info(messages, [x.get('examples') for x in inputs])
inputs = processor(  # note: this rebinds `inputs` from the request list to the tensor batch
    text=texts,
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}

# Batch inference
generated_ids = model.generate(**inputs, **generation_config)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_texts = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
for y in output_texts:
    print(y)
# {"store_name": "WAL-MART"}
# {"store_name": "Walmart"}
# {"names": ["John", "Mary", "James"]}
# {"names": ["JOHN", "MARY", "JAMES"]}
```
Template Generation
You can use NuExtract 2.0 to generate templates from existing schema files or natural language descriptions.
Convert XML into a NuExtract template
```python
xml_template = """<SportResult>
<Date></Date>
<Sport></Sport>
<Venue></Venue>
<HomeTeam></HomeTeam>
<AwayTeam></AwayTeam>
<HomeScore></HomeScore>
<AwayScore></AwayScore>
<TopScorer></TopScorer>
</SportResult>"""

messages = [
    {
        "role": "user",
        "content": [{"type": "text", "text": xml_template}],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
)

image_inputs = process_all_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}
generated_ids = model.generate(
    **inputs,
    **generation_config
)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text[0])
# {
#   "Date": "date-time",
#   "Sport": "verbatim-string",
#   "Venue": "verbatim-string",
#   "HomeTeam": "verbatim-string",
#   "AwayTeam": "verbatim-string",
#   "HomeScore": "integer",
#   "AwayScore": "integer",
#   "TopScorer": "verbatim-string"
# }
```
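The generated template is plain text, so it can be fed straight back into the extraction workflow shown earlier. A sketch (the document text here is made up for illustration):

```python
# Reuse the generated template for extraction (hypothetical document text)
template = output_text[0]
document = "On Saturday, Arsenal beat Chelsea 2-1 at the Emirates Stadium; Saka was the top scorer."
messages = [{"role": "user", "content": document}]
text = processor.tokenizer.apply_chat_template(
    messages, template=template, tokenize=False, add_generation_prompt=True
)
# ... then tokenize and generate exactly as in the Basic Usage example
```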
Generate a template from a natural language description
```python
description = "I would like to extract important details from the contract."

messages = [
    {
        "role": "user",
        "content": [{"type": "text", "text": description}],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
)

image_inputs = process_all_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}
generated_ids = model.generate(
    **inputs,
    **generation_config
)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text[0])
# {
#   "Contract": {
#     "Title": "verbatim-string",
#     "Description": "verbatim-string",
#     "Terms": [
#       {
#         "Term": "verbatim-string",
#         "Description": "verbatim-string"
#       }
#     ],
#     "Date": "date-time",
#     "Signatory": "verbatim-string"
#   }
# }
```
📚 Documentation
Model Overview
The NuExtract 2.0 family of models is designed for structured information extraction. It supports multimodal inputs (text and images) and is multilingual. The models are based on pre-trained models from the QwenVL family, and different sizes are available to meet various needs.
Input Template
To use the model, you need to provide an input text/image and a JSON template describing the information you need to extract. The template should be a JSON object, specifying field names and their expected type.
Supported Types
- `verbatim-string`: extract text that is present verbatim in the input.
- `string`: a generic string field that can incorporate paraphrasing/abstraction.
- `integer`: a whole number.
- `number`: a whole or decimal number.
- `date-time`: an ISO-formatted date.
- Array of any of the above types (e.g. `["string"]`).
- `enum`: a choice from a set of possible answers (represented in the template as an array of options, e.g. `["yes", "no", "maybe"]`).
- `multi-label`: an enum that can have multiple answers (represented in the template as a double-wrapped array, e.g. `[["A", "B", "C"]]`).

If the model does not identify relevant information for a field, it will return `null` or `[]` (for arrays and multi-labels).
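For illustration, a hypothetical template combining several of these types might look like this (field names are invented, not from the original README):

```python
# Hypothetical invoice template combining several supported types
template = """{
    "invoice_id": "verbatim-string",
    "issue_date": "date-time",
    "total_amount": "number",
    "item_count": "integer",
    "summary": "string",
    "payment_status": ["paid", "unpaid", "partial"],
    "tags": [["urgent", "recurring", "disputed"]],
    "line_items": [{"description": "string", "quantity": "integer"}]
}"""
```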
🔧 Technical Details
NuExtract 2.0 builds on the `transformers` library, using the `AutoProcessor` and `AutoModelForVision2Seq` classes to load the processor and model. It supports multimodal inputs by handling both text and image data; the `process_all_vision_info` helper (sketched in the Installation section above) handles the loading and preprocessing of image inputs.
📄 License
The `NuExtract-2.0-2B` and `NuExtract-2.0-8B` models are released under the MIT license. The `NuExtract-2.0-4B` model is under the Qwen Research License.
| Property | Details |
|---|---|
| Model Type | Structured information extraction model |
| Training Data | Not provided in the original README |
⚠️ Important Note
We recommend using NuExtract with a temperature at or very close to 0. Some inference frameworks, such as Ollama, use a default of 0.7, which is not well suited to many extraction tasks.
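For example, when serving the model through Ollama's HTTP API, you can override the default temperature per request. A sketch, where the model tag "nuextract" is hypothetical (use whatever name you registered):

```python
import requests

# Hypothetical local Ollama call; "nuextract" is an assumed tag, not an official model name.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "nuextract",
        # Prompt follows the "# Template: ... # Context: ..." layout shown in Basic Usage
        "prompt": '# Template:\n{"names": ["string"]}\n# Context:\nJohn went to the restaurant with Mary.',
        "options": {"temperature": 0},  # override Ollama's 0.7 default
        "stream": False,
    },
)
print(response.json()["response"])
```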
💡 Usage Tip
When using NuExtract, providing multiple in-context examples usually leads to better results.