NuExtract 2.0 4B
NuExtract 2.0 is a family of models specifically designed for structured information extraction tasks. It supports multimodal (text and image) inputs, is multilingual, and comes in several sizes built on pre-trained models from the QwenVL family.
API / Platform   |   Blog   |   Discord
🚀 Quick Start
To use the NuExtract model, you need to provide an input text/image and a JSON template describing the information to be extracted. The template should be a JSON object specifying field names and their expected types.
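For example, a minimal template (with illustrative field names, not taken from the source) for pulling a name and an age out of a document would be:

# Illustrative template: field names mapped to their expected types
template = """{
    "name": "verbatim-string",
    "age": "integer"
}"""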
✨ Features
- Multimodal Support: Supports both text and image inputs.
- Multilingual: Suitable for various languages.
- Multiple Model Sizes: Offers models of different sizes (2B, 4B, 8B) to meet different needs.
- In-Context Examples: Can use in-context examples to improve performance.
- Template Generation: Can automatically generate templates from existing schema files or natural language descriptions.
📦 Installation
This README does not provide specific installation steps.
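A typical environment for the usage examples below (an assumption — the source does not specify packages or versions) can be set up with:

pip install torch transformers accelerate qwen-vl-utils
# Optional, only needed for the flash_attention_2 code path used below:
pip install flash-attn --no-build-isolation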
💻 Usage Examples
Basic Usage
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
model_name = "numind/NuExtract-2.0-2B"
# model_name = "numind/NuExtract-2.0-4B"
# model_name = "numind/NuExtract-2.0-8B"

model = AutoModelForVision2Seq.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    model_name,
    trust_remote_code=True,
    padding_side='left',
    use_fast=True,
)

# You can set min_pixels and max_pixels according to your needs, such as a
# token range of 256-1280, to balance performance and cost.
# min_pixels = 256*28*28
# max_pixels = 1280*28*28
# processor = AutoProcessor.from_pretrained(model_name, min_pixels=min_pixels, max_pixels=max_pixels)
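The examples that follow also rely on a process_all_vision_info helper whose definition is not included in this README. The sketch below is a plausible reconstruction, not the official helper: it gathers image inputs from in-context examples first and then from the chat messages, matching their order in the prompt, and covers the single-conversation case only (the full helper in the model card also handles batched, nested lists). It assumes qwen-vl-utils is installed.

from qwen_vl_utils import fetch_image, process_vision_info

def process_all_vision_info(messages, examples=None):
    """Minimal sketch (assumption, not the official helper): collect image
    inputs from in-context examples and chat messages, in prompt order."""
    images = []
    # In-context example inputs may be image dicts ({"type": "image", ...});
    # they appear before the main input in the prompt, so collect them first.
    if examples:
        for example in examples:
            ex_input = example.get("input")
            if isinstance(ex_input, dict) and ex_input.get("type") == "image":
                images.append(fetch_image(ex_input))
    # Then collect any images referenced inside the chat messages themselves.
    message_images = process_vision_info(messages)[0]
    if message_images:
        images.extend(message_images)
    return images or None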
Advanced Usage
In-Context Examples
template = """{"names": ["string"]}"""
document = "John went to the restaurant with Mary. James went to the cinema."
examples = [
    {
        "input": "Stephen is the manager at Susan's store.",
        "output": """{"names": ["-STEPHEN-", "-SUSAN-"]}"""
    }
]

messages = [{"role": "user", "content": document}]

text = processor.tokenizer.apply_chat_template(
    messages,
    template=template,
    examples=examples,  # examples provided here
    tokenize=False,
    add_generation_prompt=True,
)

image_inputs = process_all_vision_info(messages, examples)
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

# We choose greedy sampling here, which works well for most information extraction tasks.
generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}

# Inference: generation of the output
generated_ids = model.generate(
    **inputs,
    **generation_config
)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
# ['{"names": ["-JOHN-", "-MARY-", "-JAMES-"]}']
Image Inputs
template = """{"store": "verbatim-string"}"""
document = {"type": "image", "image": "file://1.jpg"}
messages = [{"role": "user", "content": [document]}]
text = processor.tokenizer.apply_chat_template(
    messages,
    template=template,
    tokenize=False,
    add_generation_prompt=True,
)

image_inputs = process_all_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}

# Inference: generation of the output
generated_ids = model.generate(
    **inputs,
    **generation_config
)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
# ['{"store": "Trader Joe\'s"}']
Batch Inference
inputs = [
    # image input with no ICL examples
    {
        "document": {"type": "image", "image": "file://0.jpg"},
        "template": """{"store_name": "verbatim-string"}""",
    },
    # image input with 1 ICL example
    {
        "document": {"type": "image", "image": "file://0.jpg"},
        "template": """{"store_name": "verbatim-string"}""",
        "examples": [
            {
                "input": {"type": "image", "image": "file://1.jpg"},
                "output": """{"store_name": "Trader Joe's"}""",
            }
        ],
    },
    # text input with no ICL examples
    {
        "document": {"type": "text", "text": "John went to the restaurant with Mary. James went to the cinema."},
        "template": """{"names": ["string"]}""",
    },
    # text input with ICL example
    {
        "document": {"type": "text", "text": "John went to the restaurant with Mary. James went to the cinema."},
        "template": """{"names": ["string"]}""",
        "examples": [
            {
                "input": "Stephen is the manager at Susan's store.",
                "output": """{"names": ["STEPHEN", "SUSAN"]}"""
            }
        ],
    },
]

# messages should be a list of lists for batch processing
messages = [
    [
        {
            "role": "user",
            "content": [x['document']],
        }
    ]
    for x in inputs
]

# apply chat template to each example individually
texts = [
    processor.tokenizer.apply_chat_template(
        messages[i],  # Now this is a list containing one message
        template=x['template'],
        examples=x.get('examples', None),
        tokenize=False,
        add_generation_prompt=True)
    for i, x in enumerate(inputs)
]

image_inputs = process_all_vision_info(messages, [x.get('examples') for x in inputs])
inputs = processor(
    text=texts,
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}

# Batch inference
generated_ids = model.generate(**inputs, **generation_config)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_texts = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
for y in output_texts:
    print(y)
# {"store_name": "WAL-MART"}
# {"store_name": "Walmart"}
# {"names": ["John", "Mary", "James"]}
# {"names": ["JOHN", "MARY", "JAMES"]}
Template Generation
xml_template = """<SportResult>
    <Date></Date>
    <Sport></Sport>
    <Venue></Venue>
    <HomeTeam></HomeTeam>
    <AwayTeam></AwayTeam>
    <HomeScore></HomeScore>
    <AwayScore></AwayScore>
    <TopScorer></TopScorer>
</SportResult>"""
messages = [
    {
        "role": "user",
        "content": [{"type": "text", "text": xml_template}],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
)

image_inputs = process_all_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}

generated_ids = model.generate(
    **inputs,
    **generation_config
)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text[0])
# {
#     "Date": "date-time",
#     "Sport": "verbatim-string",
#     "Venue": "verbatim-string",
#     "HomeTeam": "verbatim-string",
#     "AwayTeam": "verbatim-string",
#     "HomeScore": "integer",
#     "AwayScore": "integer",
#     "TopScorer": "verbatim-string"
# }
description = "I would like to extract important details from the contract."
messages = [
    {
        "role": "user",
        "content": [{"type": "text", "text": description}],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
)

image_inputs = process_all_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

generated_ids = model.generate(
    **inputs,
    **generation_config
)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text[0])
# {
#     "Contract": {
#         "Title": "verbatim-string",
#         "Description": "verbatim-string",
#         "Terms": [
#             {
#                 "Term": "verbatim-string",
#                 "Description": "verbatim-string"
#             }
#         ],
#         "Date": "date-time",
#         "Signatory": "verbatim-string"
#     }
# }
📚 Documentation
Model Information
| Property | Details |
|---|---|
| Model Type | Structured information extraction model |
| Base Model | Qwen/Qwen2.5-VL-3B-Instruct |
| License | MIT (NuExtract-2.0-2B, NuExtract-2.0-8B); Qwen Research License (NuExtract-2.0-4B) |
Supported Types
- `verbatim-string`: Extract text that is present verbatim in the input.
- `string`: A generic string field that can incorporate paraphrasing/abstraction.
- `integer`: A whole number.
- `number`: A whole or decimal number.
- `date-time`: ISO formatted date.
- Array of any of the above types (e.g. `["string"]`).
- `enum`: A choice from a set of possible answers (represented in the template as an array of options, e.g. `["yes", "no", "maybe"]`).
- `multi-label`: An enum that can have multiple possible answers (represented in the template as a double-wrapped array, e.g. `[["A", "B", "C"]]`).
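As an illustration (the field names here are invented for this example, not from the source), a template combining several of these types could look like:

# Illustrative template mixing verbatim, abstractive, numeric, enum,
# and multi-label fields
template = """{
    "store": "verbatim-string",
    "summary": "string",
    "total_items": "integer",
    "total_price": "number",
    "purchase_date": "date-time",
    "payment_method": ["credit card", "cash", "other"],
    "categories": [["groceries", "electronics", "clothing"]]
}"""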
Important Notes
⚠️ Important Note
We recommend using NuExtract with a temperature at or very close to 0. Some inference frameworks, such as Ollama, use a default of 0.7 which is not well suited to many extraction tasks.
Usage Tips
💡 Usage Tip
Providing multiple in-context examples usually leads to better results.
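For instance (a hypothetical extension of the earlier in-context snippet), the examples list can simply carry several entries:

# Hypothetical: two in-context examples instead of one
examples = [
    {
        "input": "Stephen is the manager at Susan's store.",
        "output": """{"names": ["STEPHEN", "SUSAN"]}""",
    },
    {
        "input": "Carol flew to Berlin to meet Dave.",
        "output": """{"names": ["CAROL", "DAVE"]}""",
    },
]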
🔧 Technical Details
This README does not provide detailed technical implementation information.
📄 License
The models in this project are under different licenses:
- `NuExtract-2.0-2B` and `NuExtract-2.0-8B` are under the MIT license.
- `NuExtract-2.0-4B` is under the Qwen Research License.
Fine-Tuning
You can find a fine-tuning tutorial notebook in the cookbooks folder of the GitHub repo.
vLLM Deployment
Serve an OpenAI-compatible API
vllm serve numind/NuExtract-2.0-8B --trust_remote_code --limit-mm-per-prompt image=6 --chat-template-content-format openai
If you encounter memory issues, set `--max-model-len` accordingly.
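For example, with an illustrative context length (the value 8192 is an assumption, not from the source):

vllm serve numind/NuExtract-2.0-8B --trust_remote_code --limit-mm-per-prompt image=6 \
    --chat-template-content-format openai --max-model-len 8192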
Send requests to the model
import json
from openai import OpenAI
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="numind/NuExtract-2.0-8B",
    temperature=0,
    messages=[
        {
            "role": "user",
            "content": [{"type": "text", "text": "Yesterday I went shopping at Bunnings"}],
        },
    ],
    extra_body={
        "chat_template_kwargs": {
            "template": json.dumps(json.loads("""{\"store\": \"verbatim-string\"}"""), indent=4)
        },
    }
)
print("Chat response:", chat_response)
For image inputs, structure requests as shown in the code examples above. Make sure the images in "content" are ordered as they appear in the prompt (i.e. any in-context examples before the main input).
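A request with an image document might look like the following sketch, which assumes the standard OpenAI image_url content format supported by vLLM (the URL is a placeholder, not from the source):

# Sketch: image extraction request against the vLLM OpenAI-compatible API
chat_response = client.chat.completions.create(
    model="numind/NuExtract-2.0-8B",
    temperature=0,
    messages=[
        {
            "role": "user",
            "content": [
                # Placeholder URL; any in-context example images would go first
                {"type": "image_url", "image_url": {"url": "https://example.com/receipt.jpg"}},
            ],
        },
    ],
    extra_body={
        "chat_template_kwargs": {
            "template": json.dumps({"store": "verbatim-string"}, indent=4)
        },
    }
)
print("Chat response:", chat_response)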