🚀 bert-base-japanese-wikipedia-ud-head
This is a BERT model for dependency parsing (head detection on long-unit words) of Japanese text, framed as question answering and pretrained on Japanese Wikipedia. Given a word as the question and a sentence as the context, the model returns the word's syntactic head as the answer span.
🚀 Quick Start
This section gives a quick overview of how to use the bert-base-japanese-wikipedia-ud-head model for question-answering-style head detection; runnable code is shown in the Usage Examples section below.
✨ Features
- Pretrained on Japanese Wikipedia: trained on a large corpus of Japanese Wikipedia texts, making it well suited to Japanese language tasks.
- Dependency parsing via question answering: head detection is cast as a question-answering task, with the question word's syntactic head returned as the answer.
- Derived from reliable sources: based on [bert-base-japanese-char-extended](https://huggingface.co/KoichiYasuoka/bert-base-japanese-char-extended) and [UD_Japanese-GSDLUW](https://github.com/UniversalDependencies/UD_Japanese-GSDLUW).
📦 Installation
No specific installation steps are given in the original model card. To use this model you need the transformers library, which you can install with pip install transformers.
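The advanced usage example below additionally imports torch, numpy, and ufal.chu_liu_edmonds (a Chu-Liu/Edmonds maximum-spanning-tree decoder), so running it also requires those packages, for example via pip install torch numpy ufal.chu_liu_edmonds.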
💻 Usage Examples
Basic Usage
# Load the tokenizer and the question-answering (head-detection) model
from transformers import AutoTokenizer,AutoModelForQuestionAnswering,QuestionAnsweringPipeline
tokenizer=AutoTokenizer.from_pretrained("KoichiYasuoka/bert-base-japanese-wikipedia-ud-head")
model=AutoModelForQuestionAnswering.from_pretrained("KoichiYasuoka/bert-base-japanese-wikipedia-ud-head")
qap=QuestionAnsweringPipeline(tokenizer=tokenizer,model=model,align_to_words=False)
# Ask for the syntactic head of 国語 in the sentence given as context
print(qap(question="国語",context="全学年にわたって小学校の国語の教科書に挿し絵が用いられている"))
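The pipeline returns a dict with the answer span, its character offsets, and a score; the answer is the word predicted to be the head of the question word (for this sentence, the head of 国語 should be 教科書). align_to_words=False is used because the pipeline's word alignment helps on space-separated languages but can hurt on unsegmented Japanese text.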
Advanced Usage
class TransformersUD(object):
  # UD dependency parser combining the head-detection QA model with the
  # deprel/tagger token-classification sub-models stored in the same repository
  def __init__(self,bert):
    import os
    from transformers import (AutoTokenizer,AutoModelForQuestionAnswering,
      AutoModelForTokenClassification,AutoConfig,TokenClassificationPipeline)
    self.tokenizer=AutoTokenizer.from_pretrained(bert)
    self.model=AutoModelForQuestionAnswering.from_pretrained(bert)
    x=AutoModelForTokenClassification.from_pretrained
    if os.path.isdir(bert):
      d,t=x(os.path.join(bert,"deprel")),x(os.path.join(bert,"tagger"))
    else:
      from transformers.utils import cached_file
      c=AutoConfig.from_pretrained(cached_file(bert,"deprel/config.json"))
      d=x(cached_file(bert,"deprel/pytorch_model.bin"),config=c)
      s=AutoConfig.from_pretrained(cached_file(bert,"tagger/config.json"))
      t=x(cached_file(bert,"tagger/pytorch_model.bin"),config=s)
    self.deprel=TokenClassificationPipeline(model=d,tokenizer=self.tokenizer,
      aggregation_strategy="simple")
    self.tagger=TokenClassificationPipeline(model=t,tokenizer=self.tokenizer)
  def __call__(self,text):
    import numpy,torch,ufal.chu_liu_edmonds
    # Segment the text into long-unit words (with dependency labels) and POS-tag them
    w=[(t["start"],t["end"],t["entity_group"]) for t in self.deprel(text)]
    z,n={t["start"]:t["entity"].split("|") for t in self.tagger(text)},len(w)
    r,m=[text[s:e] for s,e,p in w],numpy.full((n+1,n+1),numpy.nan)
    v,c=self.tokenizer(r,add_special_tokens=False)["input_ids"],[]
    # One QA input per word: [CLS] word [SEP] sentence-with-that-word-masked [SEP]
    for i,t in enumerate(v):
      q=[self.tokenizer.cls_token_id]+t+[self.tokenizer.sep_token_id]
      c.append([q]+v[0:i]+[[self.tokenizer.mask_token_id]]+v[i+1:]+[[q[-1]]])
    b=[[len(sum(x[0:j+1],[])) for j in range(len(x))] for x in c]
    with torch.no_grad():
      d=self.model(input_ids=torch.tensor([sum(x,[]) for x in c]),
        token_type_ids=torch.tensor([[0]*x[0]+[1]*(x[-1]-x[0]) for x in b]))
    s,e=d.start_logits.tolist(),d.end_logits.tolist()
    # m[i+1,j+1] scores word j as the head of word i; column 0 scores the root arc
    for i in range(n):
      for j in range(n):
        m[i+1,0 if i==j else j+1]=s[i][b[i][j]]+e[i][b[i][j+1]-1]
    # Decode the highest-scoring dependency tree with the Chu-Liu/Edmonds algorithm
    h=ufal.chu_liu_edmonds.chu_liu_edmonds(m)[0]
    if [0 for i in h if i==0]!=[0]:
      # More than one word attached to the root: keep a single root and re-decode
      i=([p for s,e,p in w]+["root"]).index("root")
      j=i+1 if i<n else numpy.nanargmax(m[:,0])
      m[0:j,0]=m[j+1:,0]=numpy.nan
      h=ufal.chu_liu_edmonds.chu_liu_edmonds(m)[0]
    # Emit the parse in CoNLL-U format
    u="# text = "+text.replace("\n"," ")+"\n"
    for i,(s,e,p) in enumerate(w,1):
      p="root" if h[i]==0 else "dep" if p=="root" else p
      u+="\t".join([str(i),r[i-1],"_",z[s][0][2:],"_","|".join(z[s][1:]),
        str(h[i]),p,"_","_" if i<n and e<w[i][0] else "SpaceAfter=No"])+"\n"
    return u+"\n"

nlp=TransformersUD("KoichiYasuoka/bert-base-japanese-wikipedia-ud-head")
print(nlp("全学年にわたって小学校の国語の教科書に挿し絵が用いられている"))
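In this pipeline, the deprel sub-model segments the text into long-unit words and proposes dependency labels, the tagger sub-model supplies UPOS tags and features, the question-answering model scores every word-head pair (masking the query word in the context, as noted under Model Description), and ufal.chu_liu_edmonds decodes the highest-scoring tree. The output is a CoNLL-U formatted parse of the input sentence.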
📚 Documentation
Model Description
This is a BERT model pretrained on Japanese Wikipedia texts for dependency parsing (head detection on long-unit words), posed as question answering. It is derived from [bert-base-japanese-char-extended](https://huggingface.co/KoichiYasuoka/bert-base-japanese-char-extended) and [UD_Japanese-GSDLUW](https://github.com/UniversalDependencies/UD_Japanese-GSDLUW). Use [MASK] inside the context to avoid ambiguity when the word given as the question occurs more than once in the context.
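To illustrate the [MASK] convention, the sketch below asks for the head of a specific occurrence of the repeated particle の in the example sentence. It assumes, as the advanced usage code suggests, that the occurrence you want to query is replaced by [MASK] in the context, and it reuses the qap pipeline from Basic Usage.
# Query the head of the の between 国語 and 教科書 by masking that occurrence in the context
print(qap(question="の",context="全学年にわたって小学校の国語[MASK]教科書に挿し絵が用いられている"))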
📄 License
The model is licensed under cc-by-sa-4.0.
Additional Information
| Property | Details |
|----------|---------|
| Model Type | BERT model for Japanese dependency parsing via question answering |
| Training Data | Japanese Wikipedia texts, [UD_Japanese-GSDLUW](https://github.com/UniversalDependencies/UD_Japanese-GSDLUW) |
| Tags | japanese, wikipedia, question-answering, dependency-parsing |
| Base Model | KoichiYasuoka/bert-base-japanese-char-extended |
| Pipeline Tag | question-answering |