xquad-th-mbert-base: an open-source, case-sensitive Thai extractive question answering model

Xquad Th Mbert Base

Developed by zhufy

A Thai extractive question answering model based on multilingual BERT, case-sensitive, suitable for Thai Q&A tasks.

Question Answering System

Transformers

#Thai Q&A #Multilingual BERT #Extractive Question Answering

Downloads 16

Release Time: 3/11/2022

Model Overview

This model is designed for Thai extractive question answering. It is based on the bert-base-multilingual-cased model and is case-sensitive.

Model Features

Multilingual Support

Based on the multilingual BERT model, supporting multiple languages including Thai.

Case Sensitivity

Capable of distinguishing between uppercase and lowercase, such as differentiating 'english' from 'English'.

Question Answering Capability

Fine-tuned on a split of the XQuAD dataset for Thai extractive question answering.

Model Capabilities

Thai Text Understanding

Extractive Question Answering

Contextual Understanding

Use Cases

Question Answering Systems

Thai Q&A Application

Building a Thai question answering system to respond to user queries based on given text.

High-accuracy answer extraction
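
A system of this kind usually has to choose among several candidate passages. The following is a minimal sketch, not the author's method: it assumes a callable `qa` that accepts `question=` and `context=` keyword arguments and returns a dict with `score` and `answer` keys, which is the shape shown in the usage example on this page. The helper name `best_answer`, the `min_score` threshold, and the stand-in `fake_qa` function are hypothetical and exist only for illustration.

```python
def best_answer(qa, question, passages, min_score=0.5):
    """Run `qa` over each passage and return the highest-scoring result, or None."""
    best = None
    for index, passage in enumerate(passages):
        result = qa(question=question, context=passage)
        if best is None or result["score"] > best["score"]:
            # Remember which passage produced the best answer so far.
            best = {**result, "passage_index": index}
    # Reject low-confidence answers; the 0.5 default is an arbitrary assumption.
    if best is None or best["score"] < min_score:
        return None
    return best


def fake_qa(question, context):
    """Stand-in for a real QA pipeline: 'answers' only if a fixed keyword is present."""
    keyword = "Bangkok"
    start = context.find(keyword)
    if start == -1:
        return {"score": 0.0, "start": 0, "end": 0, "answer": ""}
    return {"score": 0.9, "start": start, "end": start + len(keyword), "answer": keyword}


passages = [
    "Chiang Mai is in the north.",
    "The capital of Thailand is Bangkok.",
]
print(best_answer(fake_qa, "What is the capital of Thailand?", passages))
```

In practice, `fake_qa` would be replaced by the question-answering pipeline built from this model, and the threshold would need tuning on held-out questions.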

🚀 Thai Extractive Question Answering Model

This model is designed for Thai extractive question answering. It fine-tunes the multilingual BERT architecture to extract the answer to a Thai question from a given context passage.

✨ Features

Thai Question Answering: Tailored for extractive question answering in Thai.
Multilingual BERT Base: Built on the bert-base-multilingual-cased model, which supports multiple languages including Thai.
Case-Sensitive: Distinguishes uppercase from lowercase, for example "english" versus "English".

📦 Installation

The model card does not give installation instructions. To use the model, you need the transformers library, which you can install with pip:

pip install transformers
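
The question-answering pipeline also needs a deep learning backend at runtime. The model card does not say which one; the sketch below assumes PyTorch (`torch`). It only checks that the packages are importable before the model is loaded, and `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec


def missing_packages(names=("transformers", "torch")):
    """Return the top-level packages from `names` that cannot be imported."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```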

💻 Usage Examples

Basic Usage

>>> from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

>>> tokenizer = AutoTokenizer.from_pretrained("zhufy/xquad-th-mbert-base")
>>> model = AutoModelForQuestionAnswering.from_pretrained("zhufy/xquad-th-mbert-base")
>>> nlp = pipeline("question-answering", model=model, tokenizer=tokenizer)

>>> context = "ดินดอนสามเหลี่ยม ไรน์-เมิส ซึ่งเป็นภูมิภาคทางธรรมชาติที่สำคัญของเนเธอร์แลนด์เริ่มต้น \
               ใกล้มิลลิงเงิน อาน เดอ เรน ใกล้ชายแดนเนเธอร์แลนด์ติดกับเยอรมัน \
               โดยมีสาขาของไรน์ไหลเข้าสู่แม่น้ำวาลและเนเดอร์เรน เนื่องจากน้ำส่วนใหญ่จากแม่น้ำไรน์ \
               คำว่า ดินดอนสามเหลี่ยมไรน์ ซึ่งสั้นกว่าจึงเป็นคำที่ใช้เรียกกันทั่วไป อย่างไรก็ดี \
               ชื่อนี้ยังใช้เรียกดินดอนสามเหลี่ยมบริเวณแม่น้ำซึ่งแม่น้ำไรน์ไหลเข้าสู่ทะเลสาบคอนสแตนซ์อีกด้วย \
               ดังนั้นการเรียกดินดอนสามเหลี่ยมซึ่งใหญ่กว่าว่าไรน์-เมิส หรือแม้กระทั่งดินแดนสามเหลี่ยมไรน์ \
               -เมิส-สเกลต์จึงชัดเจนกว่า เนื่องจากแม่น้ำสเกลต์สิ้นสุดที่ดินดอนสามเหลี่ยมเดียวกัน"
              
>>> question = "ดินดอนสามเหลี่ยมในเนเธอร์แลนด์มีชื่อว่าอะไร?"

>>> inputs = {"question": question, "context": context}

>>> nlp(inputs)

{'score': 0.9426798224449158,
 'start': 17,
 'end': 84,
 'answer': 'ไรน์-เมิส ซึ่งเป็นภูมิภาคทางธรรมชาติที่สำคัญของเนเธอร์แลนด์เริ่มต้น'}
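
In the output above, `start` and `end` are character offsets into `context`, so slicing the context with them should reproduce `answer`. The following is a minimal sketch of that check. It does not call the model: it uses only the opening of the example context, computes the offsets with `str.find`, and fills in a placeholder score. The helper name `span_matches` is hypothetical.

```python
def span_matches(context: str, result: dict) -> bool:
    """Check that slicing context with result's start/end yields result's answer."""
    return context[result["start"]:result["end"]] == result["answer"]


# Answer text from the example output, preceded by the opening words of the context.
answer = "ไรน์-เมิส ซึ่งเป็นภูมิภาคทางธรรมชาติที่สำคัญของเนเธอร์แลนด์เริ่มต้น"
context = "ดินดอนสามเหลี่ยม " + answer

# Build a dict with the same keys as the pipeline output; the score is a placeholder.
start = context.find(answer)
result = {"score": 1.0, "start": start, "end": start + len(answer), "answer": answer}

print(span_matches(context, result))
```

The same check can be run on a real pipeline result to confirm that the offsets refer to the exact string passed as `context`.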

📚 Documentation

Model Description

This model is designed for Thai extractive question answering. It is based on the multilingual BERT model bert-base-multilingual-cased and is case-sensitive: it distinguishes between "english" and "English".

Training data

We split the original XQuAD dataset into training, validation, and test sets. In total, there are 876/161/153 question-answer pairs from 34/7/7 articles in the training/validation/test sets, respectively. Details of the split are available in xquad_split.
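
The counts above add up to 1,190 question-answer pairs from 48 articles. The model card does not describe how the split was produced. The sketch below assumes it was made at the article level, so that all questions about one article land in the same subset. The record fields (`article`, `question`), the seed, the synthetic data, and the function name `split_by_article` are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_article(records, sizes=(34, 7, 7), seed=0):
    """Split QA records into three subsets so that no article spans two subsets."""
    by_article = defaultdict(list)
    for record in records:
        by_article[record["article"]].append(record)

    articles = sorted(by_article)  # fixed order before shuffling, for reproducibility
    if sum(sizes) != len(articles):
        raise ValueError("sizes must add up to the number of articles")
    random.Random(seed).shuffle(articles)

    splits, offset = [], 0
    for size in sizes:
        chosen = articles[offset:offset + size]
        splits.append([r for a in chosen for r in by_article[a]])
        offset += size
    return splits  # [training, validation, test]


# Synthetic stand-in data: 48 articles with 3 questions each.
records = [
    {"article": f"article-{i}", "question": f"q{i}-{j}"}
    for i in range(48)
    for j in range(3)
]
train_set, val_set, test_set = split_by_article(records)
print(len(train_set), len(val_set), len(test_set))
```

Splitting by article rather than by individual question keeps paragraphs from the same article out of both the training and evaluation subsets.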

Information Table

Property	Details
Model Type	Thai extractive question answering model based on multilingual BERT
Training Data	Split of the XQuAD dataset: 876/161/153 question-answer pairs from 34/7/7 articles in the training/validation/test sets, respectively. Details are available in xquad_split.
