
Bert Base Japanese Basic Char V2

Developed by hiroshi-matsuda-rit
This is a Japanese BERT pre-trained model that combines character-level tokenization with whole word masking and has no dependency on the `fugashi` or `unidic_lite` tokenization packages.
Downloads: 14
Release Date: 3/2/2022

Model Overview

This model is a Japanese BERT encoder pre-trained with character-level tokenization and whole word masking, suitable for a wide range of Japanese natural language processing tasks.

Model Features

Character-level Tokenization
Treats each character as a token, so no external morphological analyzer is needed to segment the input.
Whole Word Masking
During pre-training, all characters belonging to a word are masked together (whole word masking), which strengthens the model's grasp of Japanese words despite the character-level input.
Lightweight Dependencies
Has no dependency on the `fugashi` or `unidic_lite` packages, simplifying deployment and usage; a loading sketch follows this list.
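
A minimal loading sketch, assuming the model is published on the Hugging Face Hub under the repo id `hiroshi-matsuda-rit/bert-base-japanese-basic-char-v2` (inferred from the developer name and model title above; verify on the Hub before use):

```python
# Minimal sketch: load the model with Hugging Face transformers.
# The repo id is an assumption inferred from the developer and model name.
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_ID = "hiroshi-matsuda-rit/bert-base-japanese-basic-char-v2"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)

# Character-level tokenization: each Japanese character becomes one token,
# so no external morphological analyzer (fugashi / unidic_lite) is required.
print(tokenizer.tokenize("日本語の文章を解析する"))
```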

Model Capabilities

Text Classification
Named Entity Recognition
Question Answering
Masked Language Modeling (fill-mask)
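
Because the encoder is pre-trained as a masked language model, the quickest way to exercise it is the `fill-mask` pipeline. A hedged sketch, using the same assumed repo id as above:

```python
from transformers import pipeline

# Fill-mask sketch; the repo id is the same assumption as in the loading example.
fill_mask = pipeline(
    "fill-mask",
    model="hiroshi-matsuda-rit/bert-base-japanese-basic-char-v2",
)

# With character-level tokenization, [MASK] stands in for a single character.
for candidate in fill_mask("日本の首都は[MASK]京です。"):
    print(candidate["token_str"], round(candidate["score"], 3))
```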

Use Cases

Natural Language Processing
Japanese Text Classification
Used for classifying Japanese text, such as sentiment analysis and topic classification; a fine-tuning sketch follows this list.
Japanese Named Entity Recognition
Used to identify named entities in Japanese text, such as person names, locations, and organization names.
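
For the text classification use case, the sketch below attaches a sequence-classification head to the pre-trained encoder. The head is randomly initialized and the `num_labels=2` binary-sentiment setup is illustrative; fine-tune on labeled data before relying on any predictions:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "hiroshi-matsuda-rit/bert-base-japanese-basic-char-v2"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# num_labels=2 is an illustrative choice for binary sentiment analysis;
# the classification head starts untrained and must be fine-tuned.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

inputs = tokenizer("この映画は素晴らしかった。", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # untrained-head output, for illustration only
```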