JPharmatron - 7Bオープンソース大規模言語モデル - 製薬文書の作成と研究作業を無料でサポート

ホーム

Jpharmatron 7B

EQUESによって開発

JPharmatron-7Bは、製薬アプリケーションと研究用に特別に設計された70億パラメータの大規模言語モデルで、製薬文書や研究分野で重要な役割を果たすことができます。

大規模言語モデル

Transformers

複数言語対応#製薬分野専用 #日英バイリンガル対応 #薬学文書処理

ダウンロード数 749

リリース時間 : 4/22/2025

モデル概要

Qwen2.5 - 7Bをベースに継続的に事前学習を行い、日本語と英語のデータセットからの88億個のトークンを使用し、製薬文書や研究アプリケーション用に特別に設計されています。

モデル特徴

強化されたチャット能力

Qwen2.5 - 7B - Instructのチャットベクトルによりチャット能力が強化されました。

専門分野最適化

製薬文書や研究アプリケーション用に特別に設計され、日本語と英語のデータセットからの88億個のトークンを使用して継続的に事前学習を行っています。

高性能表現

同規模の他の汎用/特定分野モデルとの評価比較で、5つのベンチマークテストすべてで最高得点を獲得しました。

モデル能力

製薬文書生成

製薬研究支援

多言語対応（日本語、英語）

使用事例

製薬研究

製薬文献分析

研究者が製薬分野の文献資料を分析し理解するのを支援します。

研究効率を向上させ、知識獲得を加速します。

製薬文書作成

製薬関連のレポート、論文やその他の専門文書の作成を支援します。

文書の質と作成速度を向上させます。

🚀 JPharmatron-7B

JPharmatron-7Bは、医薬品のアプリケーションや研究に特化した70億パラメータの大規模言語モデルです。

🚀 クイックスタート

JPharmatron-7Bは、医薬品分野の文書作成や研究に最適化された大規模言語モデルです。以下に、このモデルの詳細を説明します。

✨ 主な機能

日本語と英語の医薬品データセットを用いて継続的に事前学習されています。
Qwen2.5-7Bをベースに開発され、チャット機能が強化されています。
医薬品の文書作成や研究に適した性能を備えています。

📦 インストール

このドキュメントには具体的なインストール手順が記載されていないため、このセクションをスキップします。

📚 ドキュメント

モデルの詳細

モデルの説明

JPharmatron-7Bは、Qwen2.5-7Bをベースに、日本語と英語のデータセットから88億トークンを用いて継続的に事前学習されています。JPharmatron-7B-baseモデルと比較して、Qwen2.5-7B-Instructのチャットベクトルを取得することで、チャット機能が強化されています。

プロパティ	詳細
開発元	EQUES Inc.
資金提供元	GENIAC Project
モデルタイプ	Causal decoder-only
言語	日本語、英語
ライセンス	CC-BY-SA-4.0

モデルのソース

リポジトリ: https://github.com/EQUES-Inc/pharma-LLM-eval
論文: A Japanese Language Model and Three New Evaluation Benchmarks for Pharmaceutical NLP

モデルの用途

このモデルは、医薬品の文書作成や研究に使用することを想定しています。医療用途やその他のリスクの高い用途での使用は検証されていません。

評価

我々は、JPharmatron-7Bを同規模の他の汎用/ドメイン特化モデルと比較して評価しました。

テストデータ

JPharmaBenchと2つの既存のベンチマーク（JMMLU (pharma) とIgakuQA）を使用しました。

評価結果

JPharmatron-7Bは、Meditron3-Qwen2.5-7BやLlama3.1-Swallow-8B-Instruct-v0.3と比較して、5つのベンチマークすべてで最高のスコアを達成しました。

評価結果

引用

@misc{sukeda_japanese_2025,
  title     = {A {Japanese} {Language} {Model} and {Three} {New} {Evaluation} {Benchmarks} for {Pharmaceutical} {NLP}},
  url       = {http://arxiv.org/abs/2505.16661},
  doi       = {10.48550/arXiv.2505.16661},
  abstract  = {We present a Japanese domain-specific language model for the pharmaceutical field, developed through continual pretraining on 2 billion Japanese pharmaceutical tokens and 8 billion English biomedical tokens. To enable rigorous evaluation, we introduce three new benchmarks: YakugakuQA, based on national pharmacist licensing exams; NayoseQA, which tests cross-lingual synonym and terminology normalization; and SogoCheck, a novel task designed to assess consistency reasoning between paired statements. We evaluate our model against both open-source medical LLMs and commercial models, including GPT-4o. Results show that our domain-specific model outperforms existing open models and achieves competitive performance with commercial ones, particularly on terminology-heavy and knowledge-based tasks. Interestingly, even GPT-4o performs poorly on SogoCheck, suggesting that cross-sentence consistency reasoning remains an open challenge. Our benchmark suite offers a broader diagnostic lens for pharmaceutical NLP, covering factual recall, lexical variation, and logical consistency. This work demonstrates the feasibility of building practical, secure, and cost-effective language models for Japanese domain-specific applications, and provides reusable evaluation resources for future research in pharmaceutical and healthcare NLP. Our model, codes, and datasets are released at https://github.com/EQUES-Inc/pharma-LLM-eval.},
  urldate   = {2025-05-30},
  publisher = {arXiv},
  author    = {Sukeda, Issey and Fujii, Takuro and Buma, Kosei and Sasaki, Shunsuke and Ono, Shinnosuke},
  month     = may,
  year      = {2025},
  note      = {arXiv:2505.16661 [cs]},
  annote    = {Comment: 15 pages, 9 tables, 5 figures}
}