🚀 NuExtract 2.0 2B by NuMind
NuExtract 2.0 is a family of models trained specifically for structured information extraction tasks. It supports multimodal inputs and is multilingual.
API / Platform   |   Blog   |   Discord
✨ Key Features
- Specifically trained for structured information extraction tasks
- Support for multimodal input (text and images)
- Multilingual support
📦 Model Information
Property | Details |
---|---|
Model Type | NuExtract 2.0 |
Base Model | e.g. Qwen2-VL-2B-Instruct |
License | MIT or Qwen Research License |
Huggingface Link | e.g. NuExtract-2.0-2B |
🚀 Quick Start
To use the model, provide an input text or image together with a JSON template describing the information you want to extract. The template should be a JSON object specifying field names and their expected types.
The supported types are:
- verbatim-string - instructs the model to extract text that appears verbatim in the input.
- string - a generic string field that may involve paraphrasing or abstraction.
- integer - a whole number.
- number - a whole or decimal number.
- date-time - an ISO-formatted date.
- Array of any of the above types (e.g. ["string"])
- enum - a choice from a set of possible answers (represented in the template as an array of options, e.g. ["yes", "no", "maybe"])
- multi-label - an enum that can have multiple answers (represented in the template as a doubly wrapped array, e.g. [["A", "B", "C"]])
If the model cannot identify information relevant to a field, it returns null, or [] for arrays and multi-labels.
An example template:
{
"first_name": "verbatim-string",
"last_name": "verbatim-string",
"description": "string",
"age": "integer",
"gpa": "number",
"birth_date": "date-time",
"nationality": ["France", "England", "Japan", "USA", "China"],
"languages_spoken": [["English", "French", "Japanese", "Mandarin", "Spanish"]]
}
Example output:
{
"first_name": "Susan",
"last_name": "Smith",
"description": "A student studying computer science.",
"age": 20,
"gpa": 3.7,
"birth_date": "2005-03-01",
"nationality": "England",
"languages_spoken": ["English", "French"]
}
⚠️ Important Note
We recommend using NuExtract with a temperature of 0 or very close to 0. Some inference frameworks, such as Ollama, default to 0.7, which is poorly suited to many extraction tasks.
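For instance, with the ollama Python client the temperature can be pinned through the options field. This is a minimal sketch: the local model tag "nuextract" and the manually written Template/Context prompt are illustrative assumptions only.
import ollama

# Illustrative prompt in the Template/Context format shown later in this card
prompt = "# Template:\n{\"names\": [\"string\"]}\n# Context:\nJohn went to the restaurant with Mary."
response = ollama.chat(
    model="nuextract",  # illustrative tag; use whatever name the model has in your Ollama setup
    messages=[{"role": "user", "content": prompt}],
    options={"temperature": 0},  # override Ollama's 0.7 default for extraction tasks
)
print(response["message"]["content"])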
💻 Usage Examples
Basic Usage
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
model_name = "numind/NuExtract-2.0-2B"
# model_name = "numind/NuExtract-2.0-8B"
model = AutoModelForVision2Seq.from_pretrained(model_name,
                                                trust_remote_code=True,
                                                torch_dtype=torch.bfloat16,
                                                attn_implementation="flash_attention_2",
                                                device_map="auto")
processor = AutoProcessor.from_pretrained(model_name,
                                           trust_remote_code=True,
                                           padding_side='left',
                                           use_fast=True)
# You can set min_pixels and max_pixels according to your needs, such as a token range of 256-1280, to balance performance and cost.
# min_pixels = 256*28*28
# max_pixels = 1280*28*28
# processor = AutoProcessor.from_pretrained(model_name, min_pixels=min_pixels, max_pixels=max_pixels)
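Note that attn_implementation="flash_attention_2" requires the flash-attn package and a compatible GPU. If flash-attn is not available, a fallback (a minimal sketch, not required by the model) is to load with PyTorch's built-in SDPA attention instead:
# Fallback without flash-attn: same call, using the standard SDPA attention implementation
model = AutoModelForVision2Seq.from_pretrained(model_name,
                                                trust_remote_code=True,
                                                torch_dtype=torch.bfloat16,
                                                attn_implementation="sdpa",
                                                device_map="auto")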
The following function is needed to handle loading of image input data:
def process_all_vision_info(messages, examples=None):
    """
    Process vision information from both messages and in-context examples, supporting batch processing.

    Args:
        messages: List of message dictionaries (single input) OR list of message lists (batch input)
        examples: Optional list of example dictionaries (single input) OR list of example lists (batch)

    Returns:
        A flat list of all images in the correct order:
        - For single input: example images followed by message images
        - For batch input: interleaved as (item1 examples, item1 input, item2 examples, item2 input, etc.)
        - Returns None if no images were found
    """
    from qwen_vl_utils import process_vision_info, fetch_image

    # Helper function to extract images from examples
    def extract_example_images(example_item):
        if not example_item:
            return []
        # Handle both list of examples and single example
        examples_to_process = example_item if isinstance(example_item, list) else [example_item]
        images = []
        for example in examples_to_process:
            if isinstance(example.get('input'), dict) and example['input'].get('type') == 'image':
                images.append(fetch_image(example['input']))
        return images

    # Normalize inputs to always be batched format
    is_batch = messages and isinstance(messages[0], list)
    messages_batch = messages if is_batch else [messages]
    is_batch_examples = examples and isinstance(examples, list) and (isinstance(examples[0], list) or examples[0] is None)
    examples_batch = examples if is_batch_examples else ([examples] if examples is not None else None)

    # Ensure examples batch matches messages batch if provided
    if examples and len(examples_batch) != len(messages_batch):
        if not is_batch and len(examples_batch) == 1:
            # Single example set for a single input is fine
            pass
        else:
            raise ValueError("Examples batch length must match messages batch length")

    # Process all inputs, maintaining correct order
    all_images = []
    for i, message_group in enumerate(messages_batch):
        # Get example images for this input
        if examples and i < len(examples_batch):
            input_example_images = extract_example_images(examples_batch[i])
            all_images.extend(input_example_images)

        # Get message images for this input
        input_message_images = process_vision_info(message_group)[0] or []
        all_images.extend(input_message_images)

    return all_images if all_images else None
For example, to perform a basic extraction of names from a text document:
template = """{"names": ["string"]}"""
document = "John went to the restaurant with Mary. James went to the cinema."
# prepare the user message content
messages = [{"role": "user", "content": document}]
text = processor.tokenizer.apply_chat_template(
messages,
template=template, # template is specified here
tokenize=False,
add_generation_prompt=True,
)
print(text)
""""<|im_start|>user
# Template:
{"names": ["string"]}
# Context:
John went to the restaurant with Mary. James went to the cinema.<|im_end|>
<|im_start|>assistant"""
image_inputs = process_all_vision_info(messages)
inputs = processor(
text=[text],
images=image_inputs,
padding=True,
return_tensors="pt",
).to("cuda")
# we choose greedy sampling here, which works well for most information extraction tasks
generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}
# Inference: Generation of the output
generated_ids = model.generate(
**inputs,
**generation_config
)
generated_ids_trimmed = [
out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
# ['{"names": ["John", "Mary", "James"]}']
Advanced Usage
In-Context Examples
template = """{"names": ["string"]}"""
document = "John went to the restaurant with Mary. James went to the cinema."
examples = [
{
"input": "Stephen is the manager at Susan's store.",
"output": """{"names": ["-STEPHEN-", "-SUSAN-"]}"""
}
]
messages = [{"role": "user", "content": document}]
text = processor.tokenizer.apply_chat_template(
messages,
template=template,
examples=examples, # examples provided here
tokenize=False,
add_generation_prompt=True,
)
image_inputs = process_all_vision_info(messages, examples)
inputs = processor(
text=[text],
images=image_inputs,
padding=True,
return_tensors="pt",
).to("cuda")
# we choose greedy sampling here, which works well for most information extraction tasks
generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}
# Inference: Generation of the output
generated_ids = model.generate(
**inputs,
**generation_config
)
generated_ids_trimmed = [
out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
# ['{"names": ["-JOHN-", "-MARY-", "-JAMES-"]}']
Image Inputs
template = """{"store": "verbatim-string"}"""
document = {"type": "image", "image": "file://1.jpg"}
messages = [{"role": "user", "content": [document]}]
text = processor.tokenizer.apply_chat_template(
messages,
template=template,
tokenize=False,
add_generation_prompt=True,
)
image_inputs = process_all_vision_info(messages)
inputs = processor(
text=[text],
images=image_inputs,
padding=True,
return_tensors="pt",
).to("cuda")
generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}
# Inference: Generation of the output
generated_ids = model.generate(
**inputs,
**generation_config
)
generated_ids_trimmed = [
out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
# ['{"store": "Trader Joe\'s"}']
Batch Inference
inputs = [
# image input with no ICL examples
{
"document": {"type": "image", "image": "file://0.jpg"},
"template": """{"store_name": "verbatim-string"}""",
},
# image input with 1 ICL example
{
"document": {"type": "image", "image": "file://0.jpg"},
"template": """{"store_name": "verbatim-string"}""",
"examples": [
{
"input": {"type": "image", "image": "file://1.jpg"},
"output": """{"store_name": "Trader Joe's"}""",
}
],
},
# text input with no ICL examples
{
"document": {"type": "text", "text": "John went to the restaurant with Mary. James went to the cinema."},
"template": """{"names": ["string"]}""",
},
# text input with ICL example
{
"document": {"type": "text", "text": "John went to the restaurant with Mary. James went to the cinema."},
"template": """{"names": ["string"]}""",
"examples": [
{
"input": "Stephen is the manager at Susan's store.",
"output": """{"names": ["STEPHEN", "SUSAN"]}"""
}
],
},
]
# messages should be a list of lists for batch processing
messages = [
[
{
"role": "user",
"content": [x['document']],
}
]
for x in inputs
]
# apply chat template to each example individually
texts = [
processor.tokenizer.apply_chat_template(
messages[i], # Now this is a list containing one message
template=x['template'],
examples=x.get('examples', None),
tokenize=False,
add_generation_prompt=True)
for i, x in enumerate(inputs)
]
image_inputs = process_all_vision_info(messages, [x.get('examples') for x in inputs])
inputs = processor(
text=texts,
images=image_inputs,
padding=True,
return_tensors="pt",
).to("cuda")
generation_config = {"do_sample": False, "num_beams": 1, "max_new_tokens": 2048}
# Batch Inference
generated_ids = model.generate(**inputs, **generation_config)
generated_ids_trimmed = [
out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_texts = processor.batch_decode(
generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
for y in output_texts:
    print(y)
# {"store_name": "WAL-MART"}
# {"store_name": "Walmart"}
# {"names": ["John", "Mary", "James"]}
# {"names": ["JOHN", "MARY", "JAMES"]}
Template Generation
# Example of converting XML into a NuExtract template
xml_template = """<SportResult>
<Date></Date>
<Sport></Sport>
<Venue></Venue>
<HomeTeam></HomeTeam>
<AwayTeam></AwayTeam>
<HomeScore></HomeScore>
<AwayScore></AwayScore>
<TopScorer></TopScorer>
</SportResult>"""
messages = [
{
"role": "user",
"content": [{"type": "text", "text": xml_template}],
}
]
text = processor.apply_chat_template(
messages, tokenize=False, add_generation_prompt=True,
)
image_inputs = process_all_vision_info(messages)
inputs = processor(
text=[text],
images=image_inputs,
padding=True,
return_tensors="pt",
).to("cuda")
generated_ids = model.generate(
**inputs,
**generation_config
)
generated_ids_trimmed = [
out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text[0])
# {
# "Date": "date-time",
# "Sport": "verbatim-string",
# "Venue": "verbatim-string",
# "HomeTeam": "verbatim-string",
# "AwayTeam": "verbatim-string",
# "HomeScore": "integer",
# "AwayScore": "integer",
# "TopScorer": "verbatim-string"
# }
# Example of generating a template from a natural language description
description = "I would like to extract important details from the contract."
messages = [
{
"role": "user",
"content": [{"type": "text", "text": description}],
}
]
text = processor.apply_chat_template(
messages, tokenize=False, add_generation_prompt=True,
)
image_inputs = process_all_vision_info(messages)
inputs = processor(
text=[text],
images=image_inputs,
padding=True,
return_tensors="pt",
).to("cuda")
generated_ids = model.generate(
**inputs,
**generation_config
)
generated_ids_trimmed = [
out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text[0])
# {
# "Contract": {
# "Title": "verbatim-string",
# "Description": "verbatim-string",
# "Terms": [
# {
# "Term": "verbatim-string",
# "Description": "verbatim-string"
# }
# ],
# "Date": "date-time",
# "Signatory": "verbatim-string"
# }
# }
🔧 Fine-Tuning
A fine-tuning tutorial notebook is available in the cookbooks folder of the GitHub repository.
🔧 vLLM Deployment
Run the following command to serve an OpenAI-compatible API:
vllm serve numind/NuExtract-2.0-8B --trust_remote_code --limit-mm-per-prompt image=6 --chat-template-content-format openai
If you run into memory issues, set --max-model-len accordingly.
To send a request to the model:
import json
from openai import OpenAI
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
client = OpenAI(
api_key=openai_api_key,
base_url=openai_api_base,
)
chat_response = client.chat.completions.create(
model="numind/NuExtract-2.0-8B",
temperature=0,
messages=[
{
"role": "user",
"content": [{"type": "text", "text": "Yesterday I went shopping at Bunnings"}],
},
],
extra_body={
"chat_template_kwargs": {
"template": json.dumps(json.loads("""{\"store\": \"verbatim-string\"}"""), indent=4)
},
}
)
print("Chat response:", chat_response)
For image inputs, structure the request as shown below. Make sure the images in "content" appear in the order they should occur in the prompt (i.e., any in-context example images before the main input).
import base64
def encode_image(image_path):
    """
    Encode the image file to base64 string
    """
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')
base64_image = encode_image("0.jpg")
base64_image2 = encode_image("1.jpg")
chat_response = client.chat.completions.create(
model="numind/NuExtract-2.0-8B",
temperature=0,
messages=[
{
"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}, # first ICL example image
# add any other images and text inputs here
]
}
]
)
print("Chat response:", chat_response)
📄 License
This project is licensed under the MIT License.