Distilbert Base Multilingual Cased Pii
Developed by yonigo
A multilingual PII recognition model fine-tuned from distilbert-base-multilingual-cased to identify personally identifiable information in text.
Downloads 531
Release Time: 6/25/2024
Model Overview
This model is fine-tuned on the ai4privacy/pii-masking-300k dataset to identify and classify personally identifiable information (PII) in text, such as names, addresses, and phone numbers.
Model Features
Multilingual Support
Based on the multilingual DistilBERT model, it supports PII recognition in multiple languages.
High-precision Recognition
Achieves high F1 scores across many PII categories, e.g., an Email F1 of 0.9833 and an Ip F1 of 0.9842.
Lightweight Model
Based on the DistilBERT architecture, it is lighter than the full BERT model while maintaining high performance.
Model Capabilities
Identify personally identifiable information
Multilingual text processing
Entity classification
Use Cases
Data Privacy Protection
Automatic PII Masking
Automatically identifies personally identifiable information in text and masks it to protect user privacy. Accurately identifies PII types such as names, phone numbers, and addresses; see the masking sketch after this list.
Compliance Checking
Document Compliance Review
Checks whether documents contain sensitive information that needs protection, to ensure compliance with privacy regulations.
Identifies various PII types with high accuracy, helping ensure regulatory compliance.
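As a sketch of the masking use case: the snippet below replaces each detected entity span with a bracketed label placeholder. It builds on the pipeline usage shown later in this card; the mask_pii helper and the label strings in the sample output are illustrative assumptions, not part of the model's documented API.

from transformers import pipeline

pii_pipe = pipeline("token-classification", model="yonigo/distilbert-base-multilingual-cased-pii", aggregation_strategy="first")

def mask_pii(text):
    # Replace each detected PII span with a [LABEL] placeholder.
    entities = pii_pipe(text)
    # Work backwards so earlier character offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + "[" + ent["entity_group"] + "]" + text[ent["end"]:]
    return text

print(mask_pii("My name is Yoni Go and I live in Israel. My phone number is 054-1234567"))
# e.g. "My name is [GIVENNAME1] [LASTNAME1] and I live in [COUNTRY]. ..." (labels illustrative)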
license: apache-2.0
base_model: distilbert-base-multilingual-cased
tags:
- generated_from_trainer
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: distilbert-base-multilingual-cased-pii
  results: []
datasets:
- ai4privacy/pii-masking-300k
pipeline_tag: token-classification
widget:
- text: "My name is Yoni Go and I live in Israel. My phone number is 054-1234567"
inference:
  parameters:
    aggregation_strategy: "first"
Usage:
from transformers import pipeline

# aggregation_strategy="first" merges sub-word tokens into whole entity spans
pipe = pipeline("token-classification", model="yonigo/distilbert-base-multilingual-cased-pii", aggregation_strategy="first")
print(pipe("My name is Yoni Go and I live in Israel. My phone number is 054-1234567"))
Training code: git
distilbert-base-multilingual-cased-pii
This model is a fine-tuned version of distilbert-base-multilingual-cased on ai4privacy/pii-masking-300k. It achieves the following results on the evaluation set:
- Loss: 0.0470
- Bod F1: 0.9642
- Building F1: 0.9789
- Cardissuer F1: 0.9697
- City F1: 0.9566
- Country F1: 0.9737
- Date F1: 0.9264
- Driverlicense F1: 0.9633
- Email F1: 0.9833
- Geocoord F1: 0.9654
- Givenname1 F1: 0.8653
- Givenname2 F1: 0.8170
- Idcard F1: 0.9390
- Ip F1: 0.9842
- Lastname1 F1: 0.8495
- Lastname2 F1: 0.7609
- Lastname3 F1: 0.7281
- Pass F1: 0.9247
- Passport F1: 0.9540
- Postcode F1: 0.9808
- Secaddress F1: 0.9732
- Sex F1: 0.9700
- Socialnumber F1: 0.9689
- State F1: 0.9761
- Street F1: 0.9609
- Tel F1: 0.9777
- Time F1: 0.9701
- Title F1: 0.9572
- Username F1: 0.9594
- Precision: 0.9428
- Recall: 0.9582
- F1: 0.9504
- Accuracy: 0.9909
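The per-entity names above (Bod, Givenname1, Ip, ...) come from the model's label set, which can be read straight from the checkpoint config; a minimal sketch:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("yonigo/distilbert-base-multilingual-cased-pii")
print(config.id2label)  # maps class indices to the PII label names reported above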
Training results
Training Loss | Epoch | Step | Validation Loss | Bod F1 | Building F1 | Cardissuer F1 | City F1 | Country F1 | Date F1 | Driverlicense F1 | Email F1 | Geocoord F1 | Givenname1 F1 | Givenname2 F1 | Idcard F1 | Ip F1 | Lastname1 F1 | Lastname2 F1 | Lastname3 F1 | Pass F1 | Passport F1 | Postcode F1 | Secaddress F1 | Sex F1 | Socialnumber F1 | State F1 | Street F1 | Tel F1 | Time F1 | Title F1 | Username F1 | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.2604 | 0.3601 | 1000 | 0.1439 | 0.8486 | 0.8928 | 0.0 | 0.6347 | 0.7409 | 0.6650 | 0.4865 | 0.9454 | 0.8685 | 0.4884 | 0.0 | 0.4298 | 0.9051 | 0.4869 | 0.0 | 0.0 | 0.6948 | 0.5073 | 0.7842 | 0.4352 | 0.6765 | 0.7223 | 0.7680 | 0.6802 | 0.8438 | 0.9211 | 0.5403 | 0.8180 | 0.6715 | 0.7248 | 0.6971 | 0.9663 |
0.0866 | 0.7202 | 2000 | 0.0707 | 0.9385 | 0.9611 | 0.0 | 0.9027 | 0.9564 | 0.8655 | 0.8200 | 0.9750 | 0.9546 | 0.7057 | 0.2081 | 0.8231 | 0.9689 | 0.6300 | 0.1133 | 0.0 | 0.8483 | 0.8467 | 0.9453 | 0.9564 | 0.9319 | 0.8831 | 0.9450 | 0.9101 | 0.9487 | 0.9529 | 0.8716 | 0.9285 | 0.8700 | 0.8839 | 0.8769 | 0.9839 |
0.0659 | 1.0803 | 3000 | 0.0554 | 0.9507 | 0.9705 | 0.0 | 0.9241 | 0.9644 | 0.8952 | 0.8736 | 0.9792 | 0.9280 | 0.8046 | 0.6345 | 0.8698 | 0.9748 | 0.7571 | 0.5305 | 0.0 | 0.8533 | 0.8883 | 0.9659 | 0.9678 | 0.9571 | 0.9209 | 0.9615 | 0.9303 | 0.9617 | 0.9630 | 0.9145 | 0.9455 | 0.9014 | 0.9216 | 0.9114 | 0.9868 |
0.0523 | 1.4404 | 4000 | 0.0484 | 0.9553 | 0.9766 | 0.0 | 0.9358 | 0.9677 | 0.9017 | 0.8924 | 0.9758 | 0.9645 | 0.8305 | 0.7005 | 0.8966 | 0.9765 | 0.7978 | 0.5920 | 0.0 | 0.8963 | 0.9195 | 0.9741 | 0.9688 | 0.9644 | 0.9266 | 0.9696 | 0.9421 | 0.9706 | 0.9656 | 0.9301 | 0.9520 | 0.9183 | 0.9325 | 0.9253 | 0.9884 |
0.0465 | 1.8005 | 5000 | 0.0467 | 0.9576 | 0.9759 | 0.0 | 0.9400 | 0.9701 | 0.9138 | 0.9209 | 0.9837 | 0.9568 | 0.8423 | 0.7384 | 0.9088 | 0.9835 | 0.8042 | 0.6235 | 0.2139 | 0.8985 | 0.9308 | 0.9711 | 0.9673 | 0.9649 | 0.9450 | 0.9714 | 0.9471 | 0.9708 | 0.9672 | 0.9447 | 0.9532 | 0.9206 | 0.9445 | 0.9324 | 0.9890 |
0.0401 | 2.1606 | 6000 | 0.0441 | 0.9629 | 0.9755 | 0.0 | 0.9486 | 0.9700 | 0.9154 | 0.9288 | 0.9809 | 0.9619 | 0.8485 | 0.7652 | 0.9180 | 0.9826 | 0.8231 | 0.6677 | 0.4724 | 0.8883 | 0.9343 | 0.9777 | 0.9734 | 0.9685 | 0.9490 | 0.9733 | 0.9529 | 0.9743 | 0.9672 | 0.9482 | 0.9555 | 0.9300 | 0.9454 | 0.9377 | 0.9895 |
0.0401 | 2.5207 | 7000 | 0.0428 | 0.9619 | 0.9769 | 0.0 | 0.9492 | 0.9709 | 0.9206 | 0.9401 | 0.9795 | 0.9615 | 0.8550 | 0.7776 | 0.9274 | 0.9827 | 0.8267 | 0.6742 | 0.5845 | 0.9085 | 0.9427 | 0.9798 | 0.9755 | 0.9690 | 0.9515 | 0.9736 | 0.9557 | 0.9764 | 0.9700 | 0.9479 | 0.9580 | 0.9340 | 0.9491 | 0.9415 | 0.9900 |
0.0394 | 2.8808 | 8000 | 0.0420 | 0.9616 | 0.9770 | 0.0 | 0.9481 | 0.9730 | 0.9185 | 0.9451 | 0.9832 | 0.9569 | 0.8526 | 0.7895 | 0.9269 | 0.9852 | 0.8312 | 0.7121 | 0.6234 | 0.9168 | 0.9441 | 0.9778 | 0.9737 | 0.9700 | 0.9514 | 0.9738 | 0.9565 | 0.9751 | 0.9674 | 0.9512 | 0.9562 | 0.9324 | 0.9535 | 0.9429 | 0.9901 |
0.0323 | 3.2409 | 9000 | 0.0422 | 0.9575 | 0.9781 | 0.0 | 0.9521 | 0.9725 | 0.9215 | 0.9445 | 0.9787 | 0.9601 | 0.8459 | 0.7863 | 0.9238 | 0.9834 | 0.8189 | 0.7040 | 0.6460 | 0.9117 | 0.9393 | 0.9792 | 0.9748 | 0.9679 | 0.9575 | 0.9746 | 0.9569 | 0.9732 | 0.9688 | 0.9509 | 0.9557 | 0.9336 | 0.9500 | 0.9418 | 0.9899 |
0.0313 | 3.6010 | 10000 | 0.0412 | 0.9630 | 0.9784 | 0.0 | 0.9551 | 0.9741 | 0.9235 | 0.9460 | 0.9826 | 0.9646 | 0.8619 | 0.7991 | 0.9277 | 0.9829 | 0.8386 | 0.7306 | 0.6767 | 0.9199 | 0.9454 | 0.9810 | 0.9746 | 0.9692 | 0.9598 | 0.9746 | 0.9589 | 0.9731 | 0.9685 | 0.9547 | 0.9583 | 0.9390 | 0.9527 | 0.9458 | 0.9904 |
0.0304 | 3.9611 | 11000 | 0.0404 | 0.9587 | 0.9792 | 0.1333 | 0.9511 | 0.9725 | 0.9219 | 0.9538 | 0.9769 | 0.9578 | 0.8589 | 0.8061 | 0.9255 | 0.9845 | 0.8402 | 0.7395 | 0.6790 | 0.9136 | 0.9479 | 0.9801 | 0.9748 | 0.9698 | 0.9628 | 0.9752 | 0.9581 | 0.9775 | 0.9695 | 0.9501 | 0.9597 | 0.9373 | 0.9540 | 0.9456 | 0.9904 |
0.0264 | 4.3212 | 12000 | 0.0416 | 0.9599 | 0.9794 | 0.5 | 0.9547 | 0.9735 | 0.9271 | 0.9557 | 0.9809 | 0.9537 | 0.8510 | 0.8016 | 0.9316 | 0.9816 | 0.8358 | 0.7412 | 0.6877 | 0.9212 | 0.9476 | 0.9779 | 0.9729 | 0.9682 | 0.9611 | 0.9748 | 0.9593 | 0.9742 | 0.9697 | 0.9551 | 0.9590 | 0.9370 | 0.9550 | 0.9459 | 0.9904 |
0.0266 | 4.6813 | 13000 | 0.0412 | 0.9629 | 0.9800 | 0.5 | 0.9511 | 0.9697 | 0.9276 | 0.9564 | 0.9826 | 0.9578 | 0.8590 | 0.8078 | 0.9303 | 0.9830 | 0.8423 | 0.7470 | 0.6945 | 0.9162 | 0.9468 | 0.9789 | 0.9713 | 0.9692 | 0.9597 | 0.9748 | 0.9584 | 0.9759 | 0.9698 | 0.9555 | 0.9575 | 0.9355 | 0.9579 | 0.9466 | 0.9905 |
0.0236 | 5.0414 | 14000 | 0.0414 | 0.9614 | 0.9786 | 0.6061 | 0.9562 | 0.9736 | 0.9223 | 0.9595 | 0.9821 | 0.9537 | 0.8673 | 0.8108 | 0.9367 | 0.9811 | 0.8422 | 0.7523 | 0.7140 | 0.9190 | 0.9503 | 0.9807 | 0.9679 | 0.9689 | 0.9676 | 0.9750 | 0.9611 | 0.9758 | 0.9699 | 0.9556 | 0.9589 | 0.9426 | 0.9543 | 0.9484 | 0.9907 |
0.0221 | 5.4015 | 15000 | 0.0420 | 0.9597 | 0.9797 | 0.6667 | 0.9554 | 0.9734 | 0.9210 | 0.9587 | 0.9832 | 0.9667 | 0.8637 | 0.8121 | 0.9367 | 0.9852 | 0.8449 | 0.7509 | 0.7145 | 0.9178 | 0.9498 | 0.9808 | 0.9746 | 0.9707 | 0.9650 | 0.9746 | 0.9604 | 0.9749 | 0.9692 | 0.9556 | 0.9591 | 0.9405 | 0.9563 | 0.9484 | 0.9906 |
0.021 | 5.7616 | 16000 | 0.0421 | 0.9613 | 0.9794 | 0.6667 | 0.9532 | 0.9736 | 0.9287 | 0.9554 | 0.9792 | 0.9599 | 0.8624 | 0.8146 | 0.9334 | 0.9790 | 0.8445 | 0.7534 | 0.7154 | 0.9181 | 0.9487 | 0.9791 | 0.9721 | 0.9691 | 0.9646 | 0.9748 | 0.9534 | 0.9757 | 0.9693 | 0.9561 | 0.9586 | 0.9403 | 0.9545 | 0.9473 | 0.9905 |
0.0174 | 6.1217 | 17000 | 0.0433 | 0.9617 | 0.9788 | 0.7879 | 0.9545 | 0.9738 | 0.9241 | 0.9598 | 0.9829 | 0.9589 | 0.8570 | 0.8131 | 0.9369 | 0.9838 | 0.8449 | 0.7581 | 0.7242 | 0.9230 | 0.9488 | 0.9798 | 0.9690 | 0.9691 | 0.9652 | 0.9759 | 0.9563 | 0.9769 | 0.9700 | 0.9556 | 0.9581 | 0.9403 | 0.9563 | 0.9482 | 0.9907 |
0.017 | 6.4818 | 18000 | 0.0442 | 0.9623 | 0.9790 | 0.9697 | 0.9566 | 0.9744 | 0.9258 | 0.9608 | 0.9833 | 0.9574 | 0.8565 | 0.8130 | 0.9350 | 0.9845 | 0.8450 | 0.7552 | 0.7329 | 0.9216 | 0.9519 | 0.9800 | 0.9723 | 0.9703 | 0.9675 | 0.9762 | 0.9605 | 0.9775 | 0.9713 | 0.9545 | 0.9582 | 0.9398 | 0.9582 | 0.9489 | 0.9907 |
0.017 | 6.8419 | 19000 | 0.0431 | 0.9639 | 0.9778 | 0.9697 | 0.9562 | 0.9738 | 0.9286 | 0.9612 | 0.9842 | 0.9607 | 0.8641 | 0.8160 | 0.9363 | 0.9828 | 0.8481 | 0.7610 | 0.7292 | 0.9198 | 0.9531 | 0.9800 | 0.9757 | 0.9699 | 0.9657 | 0.9751 | 0.9600 | 0.9767 | 0.9705 | 0.9565 | 0.9587 | 0.9414 | 0.9577 | 0.9495 | 0.9909 |
0.015 | 7.2020 | 20000 | 0.0438 | 0.9645 | 0.9795 | 0.9091 | 0.9550 | 0.9734 | 0.9295 | 0.9605 | 0.9824 | 0.9605 | 0.8594 | 0.8120 | 0.9382 | 0.9837 | 0.8452 | 0.7571 | 0.7222 | 0.9220 | 0.9540 | 0.9810 | 0.9745 | 0.9700 | 0.9672 | 0.9758 | 0.9599 | 0.9783 | 0.9702 | 0.9551 | 0.9596 | 0.9414 | 0.9576 | 0.9494 | 0.9908 |
0.0152 | 7.5621 | 21000 | 0.0451 | 0.9644 | 0.9795 | 0.9697 | 0.9570 | 0.9741 | 0.9271 | 0.9616 | 0.9826 | 0.9597 | 0.8649 | 0.8121 | 0.9374 | 0.9848 | 0.8469 | 0.7612 | 0.7261 | 0.9231 | 0.9530 | 0.9809 | 0.9747 | 0.9704 | 0.9661 | 0.9756 | 0.9618 | 0.9769 | 0.9706 | 0.9570 | 0.9601 | 0.9427 | 0.9573 | 0.9499 | 0.9908 |
0.0137 | 7.9222 | 22000 | 0.0450 | 0.9628 | 0.9780 | 0.9697 | 0.9565 | 0.9742 | 0.9289 | 0.9627 | 0.9832 | 0.9613 | 0.8643 | 0.8169 | 0.9374 | 0.9840 | 0.8497 | 0.7632 | 0.7292 | 0.9234 | 0.9514 | 0.9807 | 0.9737 | 0.9695 | 0.9674 | 0.9758 | 0.9610 | 0.9778 | 0.9701 | 0.9572 | 0.9596 | 0.9420 | 0.9582 | 0.9501 | 0.9908 |
0.0122 | 8.2823 | 23000 | 0.0463 | 0.9646 | 0.9789 | 0.9697 | 0.9560 | 0.9738 | 0.9276 | 0.9628 | 0.9835 | 0.9602 | 0.8643 | 0.8176 | 0.9386 | 0.9838 | 0.8494 | 0.7638 | 0.7275 | 0.9233 | 0.9519 | 0.9806 | 0.9739 | 0.9696 | 0.9682 | 0.9762 | 0.9604 | 0.9769 | 0.9698 | 0.9577 | 0.9592 | 0.9426 | 0.9578 | 0.9502 | 0.9908 |
0.0123 | 8.6424 | 24000 | 0.0459 | 0.9626 | 0.9782 | 0.9697 | 0.9566 | 0.9743 | 0.9276 | 0.9628 | 0.9839 | 0.9613 | 0.8670 | 0.8163 | 0.9394 | 0.9850 | 0.8487 | 0.7635 | 0.7357 | 0.9241 | 0.9539 | 0.9810 | 0.9737 | 0.9701 | 0.9680 | 0.9757 | 0.9617 | 0.9780 | 0.9702 | 0.9574 | 0.9601 | 0.9436 | 0.9578 | 0.9506 | 0.9909 |
0.0133 | 9.0025 | 25000 | 0.0462 | 0.9636 | 0.9788 | 0.9697 | 0.9563 | 0.9731 | 0.9273 | 0.9631 | 0.9835 | 0.9625 | 0.8672 | 0.8157 | 0.9393 | 0.9837 | 0.8495 | 0.7609 | 0.7289 | 0.9236 | 0.9541 | 0.9814 | 0.9737 | 0.9698 | 0.9684 | 0.9761 | 0.9618 | 0.9776 | 0.9698 | 0.9570 | 0.9591 | 0.9435 | 0.9574 | 0.9504 | 0.9909 |
0.0112 | 9.3626 | 26000 | 0.0467 | 0.9624 | 0.9789 | 0.9697 | 0.9567 | 0.9740 | 0.9243 | 0.9635 | 0.9832 | 0.9654 | 0.8643 | 0.8170 | 0.9375 | 0.9844 | 0.8489 | 0.7603 | 0.7303 | 0.9248 | 0.9534 | 0.9812 | 0.9735 | 0.9701 | 0.9685 | 0.9762 | 0.9617 | 0.9784 | 0.9698 | 0.9563 | 0.9594 | 0.9428 | 0.9576 | 0.9501 | 0.9909 |
0.0116 | 9.7227 | 27000 | 0.0464 | 0.9628 | 0.9789 | 0.9697 | 0.9562 | 0.9741 | 0.9260 | 0.9633 | 0.9826 | 0.9643 | 0.8637 | 0.8138 | 0.9379 | 0.9843 | 0.8492 | 0.7610 | 0.7278 | 0.9245 | 0.9536 | 0.9808 | 0.9725 | 0.9702 | 0.9686 | 0.9761 | 0.9613 | 0.9778 | 0.9698 | 0.9564 | 0.9591 | 0.9419 | 0.9583 | 0.9500 | 0.9908 |
0.011 | 10.0828 | 28000 | 0.0470 | 0.9637 | 0.9790 | 0.9697 | 0.9561 | 0.9736 | 0.9266 | 0.9632 | 0.9831 | 0.9646 | 0.8656 | 0.8160 | 0.9384 | 0.9843 | 0.8494 | 0.7597 | 0.7281 | 0.9239 | 0.9537 | 0.9805 | 0.9731 | 0.9701 | 0.9685 | 0.9759 | 0.9611 | 0.9778 | 0.9698 | 0.9573 | 0.9591 | 0.9423 | 0.9583 | 0.9502 | 0.9909 |
0.011 | 10.4429 | 29000 | 0.0469 | 0.9642 | 0.9790 | 0.9697 | 0.9567 | 0.9738 | 0.9267 | 0.9632 | 0.9834 | 0.9654 | 0.8653 | 0.8172 | 0.9393 | 0.9842 | 0.8495 | 0.7609 | 0.7287 | 0.9247 | 0.9544 | 0.9809 | 0.9732 | 0.9699 | 0.9687 | 0.9762 | 0.9614 | 0.9777 | 0.9699 | 0.9574 | 0.9596 | 0.9430 | 0.9581 | 0.9505 | 0.9909 |
0.0106 | 10.8030 | 30000 | 0.0470 | 0.9642 | 0.9789 | 0.9697 | 0.9566 | 0.9737 | 0.9264 | 0.9633 | 0.9833 | 0.9654 | 0.8653 | 0.8170 | 0.9390 | 0.9842 | 0.8495 | 0.7609 | 0.7281 | 0.9247 | 0.9540 | 0.9808 | 0.9732 | 0.9700 | 0.9689 | 0.9761 | 0.9609 | 0.9777 | 0.9701 | 0.9572 | 0.9594 | 0.9428 | 0.9582 | 0.9504 | 0.9909 |
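The per-entity F1 values in the table are entity-level (not token-level) scores. For cards generated from the Trainer these are commonly computed with seqeval, though this card does not say so explicitly; a minimal sketch with illustrative BIO tags:

from seqeval.metrics import classification_report, f1_score

# Gold and predicted tag sequences in BIO format (labels illustrative).
y_true = [["O", "B-GIVENNAME1", "I-GIVENNAME1", "O", "B-TEL"]]
y_pred = [["O", "B-GIVENNAME1", "I-GIVENNAME1", "O", "O"]]

print(f1_score(y_true, y_pred))               # overall entity-level F1
print(classification_report(y_true, y_pred))  # per-entity precision/recall/F1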
Framework versions
- Transformers 4.41.2
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
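To reproduce this environment, the listed versions can be pinned with pip (an assumption for convenience; CUDA-specific torch wheels may require the PyTorch package index):

pip install "transformers==4.41.2" "torch==2.3.1" "datasets==2.20.0" "tokenizers==0.19.1"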
Featured Recommended AI Models
Indonesian Roberta Base Posp Tagger
MIT
A POS tagging model fine-tuned from the Indonesian RoBERTa model on the indonlu dataset for Indonesian POS tagging tasks.
Sequence Labeling
Transformers Other
w11wo
2.2M
7
Bert Base NER
MIT
A BERT model fine-tuned for named entity recognition, capable of identifying four entity types: Location (LOC), Organization (ORG), Person (PER), and Miscellaneous (MISC).
Sequence Labeling English
dslim
1.8M
592
Deid Roberta I2b2
MIT
A sequence labeling model fine-tuned from RoBERTa, designed to identify and remove Protected Health Information (PHI/PII) from medical records.
Sequence Labeling
Transformers Supports Multiple Languages
obi
1.1M
33
Ner English Fast
Flair's built-in fast English 4-class named entity recognition model, based on Flair embeddings and LSTM-CRF architecture, achieving an F1 score of 92.92 on the CoNLL-03 dataset.
Sequence Labeling
PyTorch English
flair
978.01k
24
French Camembert Postag Model
A French POS tagging model based on Camembert-base, trained on the free-french-treebank dataset.
Sequence Labeling
Transformers French
gilf
950.03k
9
Xlm Roberta Large Ner Spanish
A Spanish named entity recognition model fine-tuned from the XLM-Roberta-large architecture, with excellent performance on the CoNLL-2002 dataset.
Sequence Labeling
Transformers Spanish
MMG
767.35k
29
Nusabert Ner V1.3
MIT
A named entity recognition model fine-tuned from NusaBert-v1.3 for Indonesian NER tasks.
Sequence Labeling
Transformers Other
cahya
759.09k
3
Ner English Large
The Flair framework's built-in large English NER model for 4 entity types, using document-level XLM-R embeddings and the FLERT technique, achieving an F1 score of 94.36 on the CoNLL-03 dataset.
Sequence Labeling
PyTorch English
flair
749.04k
44
Punctuate All
MIT
A multilingual punctuation prediction model fine-tuned from xlm-roberta-base, supporting automatic punctuation restoration for 12 European languages.
Sequence Labeling
Transformers
kredor
728.70k
20
Xlm Roberta Ner Japanese
MIT
A Japanese named entity recognition model fine-tuned from xlm-roberta-base.
Sequence Labeling
Transformers Supports Multiple Languages
tsmatz
630.71k
25