## NERmemBERTa-3entities
This project presents **NERmemBERTa-3entities**, a fine-tuned model based on [CamemBERTa v2 base](https://huggingface.co/almanach/camembertav2-base) for Named Entity Recognition in French. It is trained on five French NER datasets covering three entity types (LOC, PER, ORG), and aims to provide high-quality entity recognition for the French language.
## Quick Start
The model can be used directly through the `transformers` library: load it and run entity recognition as shown below. For detailed usage, please refer to the official `transformers` documentation.
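A minimal sketch using the `token-classification` pipeline; the model id `CATIE-AQ/NERmemberta-3entities` is taken from this card's links, and the example sentence is purely illustrative:

```python
from transformers import pipeline

# Load the fine-tuned NER model from the Hugging Face Hub
ner = pipeline(
    "token-classification",
    model="CATIE-AQ/NERmemberta-3entities",
    aggregation_strategy="simple",  # merge sub-word tokens into whole entity spans
)

# Illustrative French sentence
results = ner("Emmanuel Macron a visité le siège de Renault à Boulogne-Billancourt.")
for entity in results:
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```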
## Features
- **Fine-tuned for French**: Specifically optimized for the French language, it accurately identifies three types of named entities (LOC, PER, ORG).
- **Large-scale dataset**: Trained on a combined French NER dataset of 420,264 rows, supporting the model's generalization ability.
- **High performance**: Achieves strong results on precision, recall, and F1-score.
## Documentation
### Model Description
We present **NERmemBERTa-3entities**, a [CamemBERTa v2 base](https://huggingface.co/almanach/camembertav2-base) model fine-tuned for Named Entity Recognition in French on five French NER datasets covering three entities (LOC, PER, ORG).
All these datasets were concatenated and cleaned into a single dataset that we called [frenchNER_3entities](https://huggingface.co/datasets/CATIE-AQ/frenchNER_3entities).
This represents a total of **420,264 rows, of which 346,071 are for training, 32,951 for validation and 41,242 for testing.**
Our methodology is described in a blog post available in [English](https://blog.vaniila.ai/en/NER_en/) or [French](https://blog.vaniila.ai/NER/).
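For reference, the concatenated dataset can be loaded with the `datasets` library; a minimal sketch (the standard train/validation/test split names are an assumption):

```python
from datasets import load_dataset

# Load the combined three-entity corpus from the Hugging Face Hub
dataset = load_dataset("CATIE-AQ/frenchNER_3entities")

# Print the size of each available split
for split_name, split in dataset.items():
    print(split_name, len(split))
```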
### Evaluation results
#### frenchNER_3entities
For space reasons, we show only the F1 scores of the different models; the full results are available in the collapsible section below the table.
| Model | Parameters | Context | PER | LOC | ORG |
| ---- | ---- | ---- | ---- | ---- | ---- |
| [Jean-Baptiste/camembert-ner](https://hf.co/Jean-Baptiste/camembert-ner) | 110M | 512 tokens | 0.941 | 0.883 | 0.658 |
| [cmarkea/distilcamembert-base-ner](https://hf.co/cmarkea/distilcamembert-base-ner) | 67.5M | 512 tokens | 0.942 | 0.882 | 0.647 |
| [NERmembert-base-3entities](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | 110M | 512 tokens | 0.966 | 0.940 | 0.876 |
| [NERmembert2-3entities](https://hf.co/CATIE-AQ/NERmembert2-3entities) | 111M | 1024 tokens | 0.967 | 0.942 | 0.875 |
| NERmemberta-3entities (this model) | 111M | 1024 tokens | **0.970** | 0.943 | 0.881 |
| [NERmembert-large-3entities](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | 336M | 512 tokens | 0.969 | **0.947** | **0.890** |
<details>
<summary>Full results</summary>

| Model | Metrics | PER | LOC | ORG | O | Overall |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Precision | 0.918 | 0.860 | 0.831 | 0.992 | 0.974 |
| | Recall | 0.964 | 0.908 | 0.544 | 0.964 | 0.948 |
| | F1 | 0.941 | 0.883 | 0.658 | 0.978 | 0.961 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Precision | 0.929 | 0.861 | 0.813 | 0.991 | 0.974 |
| | Recall | 0.956 | 0.905 | 0.956 | 0.965 | 0.948 |
| | F1 | 0.942 | 0.882 | 0.647 | 0.978 | 0.961 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Precision | 0.961 | 0.935 | 0.877 | 0.995 | 0.986 |
| | Recall | 0.972 | 0.946 | 0.876 | 0.994 | 0.986 |
| | F1 | 0.966 | 0.940 | 0.876 | 0.994 | 0.986 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Precision | 0.964 | 0.935 | 0.872 | 0.995 | 0.985 |
| | Recall | 0.967 | 0.949 | 0.878 | 0.993 | 0.984 |
| | F1 | 0.967 | 0.942 | 0.875 | 0.994 | 0.985 |
| [NERmemberta-3entities (111M) (this model)](https://hf.co/CATIE-AQ/NERmemberta-3entities) | Precision | 0.966 | 0.934 | 0.880 | 0.995 | 0.985 |
| | Recall | 0.973 | 0.952 | 0.883 | 0.993 | 0.985 |
| | F1 | 0.970 | 0.943 | 0.881 | 0.994 | 0.985 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | Precision | 0.946 | 0.884 | 0.859 | 0.993 | 0.971 |
| | Recall | 0.955 | 0.904 | 0.550 | 0.993 | 0.971 |
| | F1 | 0.951 | 0.894 | 0.671 | 0.988 | 0.971 |
</details>
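Note that the tables report scores for the O class as well, which is consistent with token-level metrics; a minimal sketch of computing such per-class scores with scikit-learn (the labels are illustrative, and it is an assumption that this mirrors the exact evaluation tooling used):

```python
from sklearn.metrics import classification_report

# Illustrative flattened token-level gold and predicted labels
y_true = ["PER", "PER", "O", "LOC", "ORG", "O", "O"]
y_pred = ["PER", "PER", "O", "LOC", "ORG", "O", "LOC"]

# Per-class precision, recall and F1 for PER, LOC, ORG and O
print(classification_report(y_true, y_pred, digits=3))
```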
#### multiconer
For space reasons, we show only the F1 scores of the different models; the full results are available in the collapsible section below the table.
| Model | PER | LOC | ORG |
| ---- | ---- | ---- | ---- |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | 0.940 | 0.761 | 0.723 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | 0.921 | 0.748 | 0.694 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | 0.960 | 0.887 | 0.876 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | 0.958 | 0.876 | 0.863 |
| NERmemberta-3entities (111M) (this model) | 0.964 | 0.865 | 0.859 |
| [NERmembert-large-3entities (336M)](https://hf.co/CATIE-AQ/NERmembert-large-3entities) | **0.965** | **0.902** | **0.896** |
<details>
<summary>Full results</summary>

| Model | Metrics | PER | LOC | ORG | O | Overall |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| [Jean-Baptiste/camembert-ner (110M)](https://hf.co/Jean-Baptiste/camembert-ner) | Precision | 0.908 | 0.717 | 0.753 | 0.987 | 0.947 |
| | Recall | 0.975 | 0.811 | 0.696 | 0.878 | 0.880 |
| | F1 | 0.940 | 0.761 | 0.723 | 0.929 | 0.912 |
| [cmarkea/distilcamembert-base-ner (67.5M)](https://hf.co/cmarkea/distilcamembert-base-ner) | Precision | 0.885 | 0.738 | 0.737 | 0.983 | 0.943 |
| | Recall | 0.960 | 0.759 | 0.655 | 0.882 | 0.877 |
| | F1 | 0.921 | 0.748 | 0.694 | 0.930 | 0.909 |
| [NERmembert-base-3entities (110M)](https://hf.co/CATIE-AQ/NERmembert-base-3entities) | Precision | 0.957 | 0.894 | 0.876 | 0.986 | 0.972 |
| | Recall | 0.962 | 0.880 | 0.878 | 0.985 | 0.972 |
| | F1 | 0.960 | 0.887 | 0.876 | 0.985 | 0.972 |
| [NERmembert2-3entities (111M)](https://hf.co/CATIE-AQ/NERmembert2-3entities) | Precision | 0.951 | 0.906 | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... |
</details>
## License
This project is licensed under the MIT license.