Distilbert Base Multilingual Cased Pii
Developed by yonigo
A multilingual PII recognition model fine-tuned from distilbert-base-multilingual-cased to identify personally identifiable information in text.
Downloads 531
Release Time: 6/25/2024
Model Overview
This model is fine-tuned on the ai4privacy/pii-masking-300k dataset to identify and classify personally identifiable information (PII) in text, such as names, addresses, and phone numbers.
Model Features
Multilingual Support
Based on the multilingual DistilBERT model, it supports PII recognition in multiple languages.
High-precision Recognition
Achieves high F1 scores across many PII categories, e.g., an Email F1 of 0.9833 and an Ip F1 of 0.9842.
Lightweight Model
Based on the DistilBERT architecture, it is lighter than the full BERT model while maintaining high performance.
Model Capabilities
Identify personally identifiable information
Multilingual text processing
Entity classification
Use Cases
Data Privacy Protection
Automatic PII Masking
Automatically identifies personally identifiable information in text and masks it to protect user privacy. Accurately identifies PII types such as names, phone numbers, and addresses; see the masking sketch after this list.
Compliance Checking
Document Compliance Review
Checks whether documents contain sensitive information that needs protection, to ensure compliance with privacy regulations.
Identifies various PII types with high accuracy, helping ensure regulatory compliance.
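As a sketch of the masking use case: the snippet below replaces each detected entity span with a bracketed label placeholder. It builds on the pipeline usage shown later in this card; the mask_pii helper and the label strings in the sample output are illustrative assumptions, not part of the model's documented API.

from transformers import pipeline

pii_pipe = pipeline("token-classification", model="yonigo/distilbert-base-multilingual-cased-pii", aggregation_strategy="first")

def mask_pii(text):
    # Replace each detected PII span with a [LABEL] placeholder.
    entities = pii_pipe(text)
    # Work backwards so earlier character offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + "[" + ent["entity_group"] + "]" + text[ent["end"]:]
    return text

print(mask_pii("My name is Yoni Go and I live in Israel. My phone number is 054-1234567"))
# e.g. "My name is [GIVENNAME1] [LASTNAME1] and I live in [COUNTRY]. ..." (labels illustrative)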
license: apache-2.0
base_model: distilbert-base-multilingual-cased
tags:
- generated_from_trainer
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: distilbert-base-multilingual-cased-pii
  results: []
datasets:
- ai4privacy/pii-masking-300k
pipeline_tag: token-classification
widget:
- text: "My name is Yoni Go and I live in Israel. My phone number is 054-1234567"
inference:
  parameters:
    aggregation_strategy: "first"
Usage:
from transformers import pipeline

# aggregation_strategy="first" merges sub-word tokens into whole entity spans
pipe = pipeline("token-classification", model="yonigo/distilbert-base-multilingual-cased-pii", aggregation_strategy="first")
print(pipe("My name is Yoni Go and I live in Israel. My phone number is 054-1234567"))
Training code: git
distilbert-base-multilingual-cased-pii
This model is a fine-tuned version of distilbert-base-multilingual-cased on ai4privacy/pii-masking-300k. It achieves the following results on the evaluation set:
- Loss: 0.0470
- Bod F1: 0.9642
- Building F1: 0.9789
- Cardissuer F1: 0.9697
- City F1: 0.9566
- Country F1: 0.9737
- Date F1: 0.9264
- Driverlicense F1: 0.9633
- Email F1: 0.9833
- Geocoord F1: 0.9654
- Givenname1 F1: 0.8653
- Givenname2 F1: 0.8170
- Idcard F1: 0.9390
- Ip F1: 0.9842
- Lastname1 F1: 0.8495
- Lastname2 F1: 0.7609
- Lastname3 F1: 0.7281
- Pass F1: 0.9247
- Passport F1: 0.9540
- Postcode F1: 0.9808
- Secaddress F1: 0.9732
- Sex F1: 0.9700
- Socialnumber F1: 0.9689
- State F1: 0.9761
- Street F1: 0.9609
- Tel F1: 0.9777
- Time F1: 0.9701
- Title F1: 0.9572
- Username F1: 0.9594
- Precision: 0.9428
- Recall: 0.9582
- F1: 0.9504
- Accuracy: 0.9909
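The per-entity names above (Bod, Givenname1, Ip, ...) come from the model's label set, which can be read straight from the checkpoint config; a minimal sketch:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("yonigo/distilbert-base-multilingual-cased-pii")
print(config.id2label)  # maps class indices to the PII label names reported above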
Training results
Training Loss | Epoch | Step | Validation Loss | Bod F1 | Building F1 | Cardissuer F1 | City F1 | Country F1 | Date F1 | Driverlicense F1 | Email F1 | Geocoord F1 | Givenname1 F1 | Givenname2 F1 | Idcard F1 | Ip F1 | Lastname1 F1 | Lastname2 F1 | Lastname3 F1 | Pass F1 | Passport F1 | Postcode F1 | Secaddress F1 | Sex F1 | Socialnumber F1 | State F1 | Street F1 | Tel F1 | Time F1 | Title F1 | Username F1 | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.2604 | 0.3601 | 1000 | 0.1439 | 0.8486 | 0.8928 | 0.0 | 0.6347 | 0.7409 | 0.6650 | 0.4865 | 0.9454 | 0.8685 | 0.4884 | 0.0 | 0.4298 | 0.9051 | 0.4869 | 0.0 | 0.0 | 0.6948 | 0.5073 | 0.7842 | 0.4352 | 0.6765 | 0.7223 | 0.7680 | 0.6802 | 0.8438 | 0.9211 | 0.5403 | 0.8180 | 0.6715 | 0.7248 | 0.6971 | 0.9663 |
0.0866 | 0.7202 | 2000 | 0.0707 | 0.9385 | 0.9611 | 0.0 | 0.9027 | 0.9564 | 0.8655 | 0.8200 | 0.9750 | 0.9546 | 0.7057 | 0.2081 | 0.8231 | 0.9689 | 0.6300 | 0.1133 | 0.0 | 0.8483 | 0.8467 | 0.9453 | 0.9564 | 0.9319 | 0.8831 | 0.9450 | 0.9101 | 0.9487 | 0.9529 | 0.8716 | 0.9285 | 0.8700 | 0.8839 | 0.8769 | 0.9839 |
0.0659 | 1.0803 | 3000 | 0.0554 | 0.9507 | 0.9705 | 0.0 | 0.9241 | 0.9644 | 0.8952 | 0.8736 | 0.9792 | 0.9280 | 0.8046 | 0.6345 | 0.8698 | 0.9748 | 0.7571 | 0.5305 | 0.0 | 0.8533 | 0.8883 | 0.9659 | 0.9678 | 0.9571 | 0.9209 | 0.9615 | 0.9303 | 0.9617 | 0.9630 | 0.9145 | 0.9455 | 0.9014 | 0.9216 | 0.9114 | 0.9868 |
0.0523 | 1.4404 | 4000 | 0.0484 | 0.9553 | 0.9766 | 0.0 | 0.9358 | 0.9677 | 0.9017 | 0.8924 | 0.9758 | 0.9645 | 0.8305 | 0.7005 | 0.8966 | 0.9765 | 0.7978 | 0.5920 | 0.0 | 0.8963 | 0.9195 | 0.9741 | 0.9688 | 0.9644 | 0.9266 | 0.9696 | 0.9421 | 0.9706 | 0.9656 | 0.9301 | 0.9520 | 0.9183 | 0.9325 | 0.9253 | 0.9884 |
0.0465 | 1.8005 | 5000 | 0.0467 | 0.9576 | 0.9759 | 0.0 | 0.9400 | 0.9701 | 0.9138 | 0.9209 | 0.9837 | 0.9568 | 0.8423 | 0.7384 | 0.9088 | 0.9835 | 0.8042 | 0.6235 | 0.2139 | 0.8985 | 0.9308 | 0.9711 | 0.9673 | 0.9649 | 0.9450 | 0.9714 | 0.9471 | 0.9708 | 0.9672 | 0.9447 | 0.9532 | 0.9206 | 0.9445 | 0.9324 | 0.9890 |
0.0401 | 2.1606 | 6000 | 0.0441 | 0.9629 | 0.9755 | 0.0 | 0.9486 | 0.9700 | 0.9154 | 0.9288 | 0.9809 | 0.9619 | 0.8485 | 0.7652 | 0.9180 | 0.9826 | 0.8231 | 0.6677 | 0.4724 | 0.8883 | 0.9343 | 0.9777 | 0.9734 | 0.9685 | 0.9490 | 0.9733 | 0.9529 | 0.9743 | 0.9672 | 0.9482 | 0.9555 | 0.9300 | 0.9454 | 0.9377 | 0.9895 |
0.0401 | 2.5207 | 7000 | 0.0428 | 0.9619 | 0.9769 | 0.0 | 0.9492 | 0.9709 | 0.9206 | 0.9401 | 0.9795 | 0.9615 | 0.8550 | 0.7776 | 0.9274 | 0.9827 | 0.8267 | 0.6742 | 0.5845 | 0.9085 | 0.9427 | 0.9798 | 0.9755 | 0.9690 | 0.9515 | 0.9736 | 0.9557 | 0.9764 | 0.9700 | 0.9479 | 0.9580 | 0.9340 | 0.9491 | 0.9415 | 0.9900 |
0.0394 | 2.8808 | 8000 | 0.0420 | 0.9616 | 0.9770 | 0.0 | 0.9481 | 0.9730 | 0.9185 | 0.9451 | 0.9832 | 0.9569 | 0.8526 | 0.7895 | 0.9269 | 0.9852 | 0.8312 | 0.7121 | 0.6234 | 0.9168 | 0.9441 | 0.9778 | 0.9737 | 0.9700 | 0.9514 | 0.9738 | 0.9565 | 0.9751 | 0.9674 | 0.9512 | 0.9562 | 0.9324 | 0.9535 | 0.9429 | 0.9901 |
0.0323 | 3.2409 | 9000 | 0.0422 | 0.9575 | 0.9781 | 0.0 | 0.9521 | 0.9725 | 0.9215 | 0.9445 | 0.9787 | 0.9601 | 0.8459 | 0.7863 | 0.9238 | 0.9834 | 0.8189 | 0.7040 | 0.6460 | 0.9117 | 0.9393 | 0.9792 | 0.9748 | 0.9679 | 0.9575 | 0.9746 | 0.9569 | 0.9732 | 0.9688 | 0.9509 | 0.9557 | 0.9336 | 0.9500 | 0.9418 | 0.9899 |
0.0313 | 3.6010 | 10000 | 0.0412 | 0.9630 | 0.9784 | 0.0 | 0.9551 | 0.9741 | 0.9235 | 0.9460 | 0.9826 | 0.9646 | 0.8619 | 0.7991 | 0.9277 | 0.9829 | 0.8386 | 0.7306 | 0.6767 | 0.9199 | 0.9454 | 0.9810 | 0.9746 | 0.9692 | 0.9598 | 0.9746 | 0.9589 | 0.9731 | 0.9685 | 0.9547 | 0.9583 | 0.9390 | 0.9527 | 0.9458 | 0.9904 |
0.0304 | 3.9611 | 11000 | 0.0404 | 0.9587 | 0.9792 | 0.1333 | 0.9511 | 0.9725 | 0.9219 | 0.9538 | 0.9769 | 0.9578 | 0.8589 | 0.8061 | 0.9255 | 0.9845 | 0.8402 | 0.7395 | 0.6790 | 0.9136 | 0.9479 | 0.9801 | 0.9748 | 0.9698 | 0.9628 | 0.9752 | 0.9581 | 0.9775 | 0.9695 | 0.9501 | 0.9597 | 0.9373 | 0.9540 | 0.9456 | 0.9904 |
0.0264 | 4.3212 | 12000 | 0.0416 | 0.9599 | 0.9794 | 0.5 | 0.9547 | 0.9735 | 0.9271 | 0.9557 | 0.9809 | 0.9537 | 0.8510 | 0.8016 | 0.9316 | 0.9816 | 0.8358 | 0.7412 | 0.6877 | 0.9212 | 0.9476 | 0.9779 | 0.9729 | 0.9682 | 0.9611 | 0.9748 | 0.9593 | 0.9742 | 0.9697 | 0.9551 | 0.9590 | 0.9370 | 0.9550 | 0.9459 | 0.9904 |
0.0266 | 4.6813 | 13000 | 0.0412 | 0.9629 | 0.9800 | 0.5 | 0.9511 | 0.9697 | 0.9276 | 0.9564 | 0.9826 | 0.9578 | 0.8590 | 0.8078 | 0.9303 | 0.9830 | 0.8423 | 0.7470 | 0.6945 | 0.9162 | 0.9468 | 0.9789 | 0.9713 | 0.9692 | 0.9597 | 0.9748 | 0.9584 | 0.9759 | 0.9698 | 0.9555 | 0.9575 | 0.9355 | 0.9579 | 0.9466 | 0.9905 |
0.0236 | 5.0414 | 14000 | 0.0414 | 0.9614 | 0.9786 | 0.6061 | 0.9562 | 0.9736 | 0.9223 | 0.9595 | 0.9821 | 0.9537 | 0.8673 | 0.8108 | 0.9367 | 0.9811 | 0.8422 | 0.7523 | 0.7140 | 0.9190 | 0.9503 | 0.9807 | 0.9679 | 0.9689 | 0.9676 | 0.9750 | 0.9611 | 0.9758 | 0.9699 | 0.9556 | 0.9589 | 0.9426 | 0.9543 | 0.9484 | 0.9907 |
0.0221 | 5.4015 | 15000 | 0.0420 | 0.9597 | 0.9797 | 0.6667 | 0.9554 | 0.9734 | 0.9210 | 0.9587 | 0.9832 | 0.9667 | 0.8637 | 0.8121 | 0.9367 | 0.9852 | 0.8449 | 0.7509 | 0.7145 | 0.9178 | 0.9498 | 0.9808 | 0.9746 | 0.9707 | 0.9650 | 0.9746 | 0.9604 | 0.9749 | 0.9692 | 0.9556 | 0.9591 | 0.9405 | 0.9563 | 0.9484 | 0.9906 |
0.021 | 5.7616 | 16000 | 0.0421 | 0.9613 | 0.9794 | 0.6667 | 0.9532 | 0.9736 | 0.9287 | 0.9554 | 0.9792 | 0.9599 | 0.8624 | 0.8146 | 0.9334 | 0.9790 | 0.8445 | 0.7534 | 0.7154 | 0.9181 | 0.9487 | 0.9791 | 0.9721 | 0.9691 | 0.9646 | 0.9748 | 0.9534 | 0.9757 | 0.9693 | 0.9561 | 0.9586 | 0.9403 | 0.9545 | 0.9473 | 0.9905 |
0.0174 | 6.1217 | 17000 | 0.0433 | 0.9617 | 0.9788 | 0.7879 | 0.9545 | 0.9738 | 0.9241 | 0.9598 | 0.9829 | 0.9589 | 0.8570 | 0.8131 | 0.9369 | 0.9838 | 0.8449 | 0.7581 | 0.7242 | 0.9230 | 0.9488 | 0.9798 | 0.9690 | 0.9691 | 0.9652 | 0.9759 | 0.9563 | 0.9769 | 0.9700 | 0.9556 | 0.9581 | 0.9403 | 0.9563 | 0.9482 | 0.9907 |
0.017 | 6.4818 | 18000 | 0.0442 | 0.9623 | 0.9790 | 0.9697 | 0.9566 | 0.9744 | 0.9258 | 0.9608 | 0.9833 | 0.9574 | 0.8565 | 0.8130 | 0.9350 | 0.9845 | 0.8450 | 0.7552 | 0.7329 | 0.9216 | 0.9519 | 0.9800 | 0.9723 | 0.9703 | 0.9675 | 0.9762 | 0.9605 | 0.9775 | 0.9713 | 0.9545 | 0.9582 | 0.9398 | 0.9582 | 0.9489 | 0.9907 |
0.017 | 6.8419 | 19000 | 0.0431 | 0.9639 | 0.9778 | 0.9697 | 0.9562 | 0.9738 | 0.9286 | 0.9612 | 0.9842 | 0.9607 | 0.8641 | 0.8160 | 0.9363 | 0.9828 | 0.8481 | 0.7610 | 0.7292 | 0.9198 | 0.9531 | 0.9800 | 0.9757 | 0.9699 | 0.9657 | 0.9751 | 0.9600 | 0.9767 | 0.9705 | 0.9565 | 0.9587 | 0.9414 | 0.9577 | 0.9495 | 0.9909 |
0.015 | 7.2020 | 20000 | 0.0438 | 0.9645 | 0.9795 | 0.9091 | 0.9550 | 0.9734 | 0.9295 | 0.9605 | 0.9824 | 0.9605 | 0.8594 | 0.8120 | 0.9382 | 0.9837 | 0.8452 | 0.7571 | 0.7222 | 0.9220 | 0.9540 | 0.9810 | 0.9745 | 0.9700 | 0.9672 | 0.9758 | 0.9599 | 0.9783 | 0.9702 | 0.9551 | 0.9596 | 0.9414 | 0.9576 | 0.9494 | 0.9908 |
0.0152 | 7.5621 | 21000 | 0.0451 | 0.9644 | 0.9795 | 0.9697 | 0.9570 | 0.9741 | 0.9271 | 0.9616 | 0.9826 | 0.9597 | 0.8649 | 0.8121 | 0.9374 | 0.9848 | 0.8469 | 0.7612 | 0.7261 | 0.9231 | 0.9530 | 0.9809 | 0.9747 | 0.9704 | 0.9661 | 0.9756 | 0.9618 | 0.9769 | 0.9706 | 0.9570 | 0.9601 | 0.9427 | 0.9573 | 0.9499 | 0.9908 |
0.0137 | 7.9222 | 22000 | 0.0450 | 0.9628 | 0.9780 | 0.9697 | 0.9565 | 0.9742 | 0.9289 | 0.9627 | 0.9832 | 0.9613 | 0.8643 | 0.8169 | 0.9374 | 0.9840 | 0.8497 | 0.7632 | 0.7292 | 0.9234 | 0.9514 | 0.9807 | 0.9737 | 0.9695 | 0.9674 | 0.9758 | 0.9610 | 0.9778 | 0.9701 | 0.9572 | 0.9596 | 0.9420 | 0.9582 | 0.9501 | 0.9908 |
0.0122 | 8.2823 | 23000 | 0.0463 | 0.9646 | 0.9789 | 0.9697 | 0.9560 | 0.9738 | 0.9276 | 0.9628 | 0.9835 | 0.9602 | 0.8643 | 0.8176 | 0.9386 | 0.9838 | 0.8494 | 0.7638 | 0.7275 | 0.9233 | 0.9519 | 0.9806 | 0.9739 | 0.9696 | 0.9682 | 0.9762 | 0.9604 | 0.9769 | 0.9698 | 0.9577 | 0.9592 | 0.9426 | 0.9578 | 0.9502 | 0.9908 |
0.0123 | 8.6424 | 24000 | 0.0459 | 0.9626 | 0.9782 | 0.9697 | 0.9566 | 0.9743 | 0.9276 | 0.9628 | 0.9839 | 0.9613 | 0.8670 | 0.8163 | 0.9394 | 0.9850 | 0.8487 | 0.7635 | 0.7357 | 0.9241 | 0.9539 | 0.9810 | 0.9737 | 0.9701 | 0.9680 | 0.9757 | 0.9617 | 0.9780 | 0.9702 | 0.9574 | 0.9601 | 0.9436 | 0.9578 | 0.9506 | 0.9909 |
0.0133 | 9.0025 | 25000 | 0.0462 | 0.9636 | 0.9788 | 0.9697 | 0.9563 | 0.9731 | 0.9273 | 0.9631 | 0.9835 | 0.9625 | 0.8672 | 0.8157 | 0.9393 | 0.9837 | 0.8495 | 0.7609 | 0.7289 | 0.9236 | 0.9541 | 0.9814 | 0.9737 | 0.9698 | 0.9684 | 0.9761 | 0.9618 | 0.9776 | 0.9698 | 0.9570 | 0.9591 | 0.9435 | 0.9574 | 0.9504 | 0.9909 |
0.0112 | 9.3626 | 26000 | 0.0467 | 0.9624 | 0.9789 | 0.9697 | 0.9567 | 0.9740 | 0.9243 | 0.9635 | 0.9832 | 0.9654 | 0.8643 | 0.8170 | 0.9375 | 0.9844 | 0.8489 | 0.7603 | 0.7303 | 0.9248 | 0.9534 | 0.9812 | 0.9735 | 0.9701 | 0.9685 | 0.9762 | 0.9617 | 0.9784 | 0.9698 | 0.9563 | 0.9594 | 0.9428 | 0.9576 | 0.9501 | 0.9909 |
0.0116 | 9.7227 | 27000 | 0.0464 | 0.9628 | 0.9789 | 0.9697 | 0.9562 | 0.9741 | 0.9260 | 0.9633 | 0.9826 | 0.9643 | 0.8637 | 0.8138 | 0.9379 | 0.9843 | 0.8492 | 0.7610 | 0.7278 | 0.9245 | 0.9536 | 0.9808 | 0.9725 | 0.9702 | 0.9686 | 0.9761 | 0.9613 | 0.9778 | 0.9698 | 0.9564 | 0.9591 | 0.9419 | 0.9583 | 0.9500 | 0.9908 |
0.011 | 10.0828 | 28000 | 0.0470 | 0.9637 | 0.9790 | 0.9697 | 0.9561 | 0.9736 | 0.9266 | 0.9632 | 0.9831 | 0.9646 | 0.8656 | 0.8160 | 0.9384 | 0.9843 | 0.8494 | 0.7597 | 0.7281 | 0.9239 | 0.9537 | 0.9805 | 0.9731 | 0.9701 | 0.9685 | 0.9759 | 0.9611 | 0.9778 | 0.9698 | 0.9573 | 0.9591 | 0.9423 | 0.9583 | 0.9502 | 0.9909 |
0.011 | 10.4429 | 29000 | 0.0469 | 0.9642 | 0.9790 | 0.9697 | 0.9567 | 0.9738 | 0.9267 | 0.9632 | 0.9834 | 0.9654 | 0.8653 | 0.8172 | 0.9393 | 0.9842 | 0.8495 | 0.7609 | 0.7287 | 0.9247 | 0.9544 | 0.9809 | 0.9732 | 0.9699 | 0.9687 | 0.9762 | 0.9614 | 0.9777 | 0.9699 | 0.9574 | 0.9596 | 0.9430 | 0.9581 | 0.9505 | 0.9909 |
0.0106 | 10.8030 | 30000 | 0.0470 | 0.9642 | 0.9789 | 0.9697 | 0.9566 | 0.9737 | 0.9264 | 0.9633 | 0.9833 | 0.9654 | 0.8653 | 0.8170 | 0.9390 | 0.9842 | 0.8495 | 0.7609 | 0.7281 | 0.9247 | 0.9540 | 0.9808 | 0.9732 | 0.9700 | 0.9689 | 0.9761 | 0.9609 | 0.9777 | 0.9701 | 0.9572 | 0.9594 | 0.9428 | 0.9582 | 0.9504 | 0.9909 |
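The per-entity F1 values in the table are entity-level (not token-level) scores. For cards generated from the Trainer these are commonly computed with seqeval, though this card does not say so explicitly; a minimal sketch with illustrative BIO tags:

from seqeval.metrics import classification_report, f1_score

# Gold and predicted tag sequences in BIO format (labels illustrative).
y_true = [["O", "B-GIVENNAME1", "I-GIVENNAME1", "O", "B-TEL"]]
y_pred = [["O", "B-GIVENNAME1", "I-GIVENNAME1", "O", "O"]]

print(f1_score(y_true, y_pred))               # overall entity-level F1
print(classification_report(y_true, y_pred))  # per-entity precision/recall/F1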
Framework versions
- Transformers 4.41.2
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
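To reproduce this environment, the listed versions can be pinned with pip (an assumption for convenience; CUDA-specific torch wheels may require the PyTorch package index):

pip install "transformers==4.41.2" "torch==2.3.1" "datasets==2.20.0" "tokenizers==0.19.1"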
Featured Recommended AI Models
Indonesian Roberta Base Posp Tagger
MIT
A POS tagging model fine-tuned from the Indonesian RoBERTa model on the indonlu dataset for Indonesian POS tagging tasks.
Sequence Labeling
Transformers Other
w11wo
2.2M
7
Bert Base NER
MIT
A BERT model fine-tuned for named entity recognition, capable of identifying four entity types: Location (LOC), Organization (ORG), Person (PER), and Miscellaneous (MISC).
Sequence Labeling English
dslim
1.8M
592
Deid Roberta I2b2
MIT
A sequence labeling model fine-tuned from RoBERTa, designed to identify and remove Protected Health Information (PHI/PII) from medical records.
Sequence Labeling
Transformers Supports Multiple Languages
obi
1.1M
33
Ner English Fast
Flair's built-in fast English 4-class named entity recognition model, based on Flair embeddings and LSTM-CRF architecture, achieving an F1 score of 92.92 on the CoNLL-03 dataset.
Sequence Labeling
PyTorch English
flair
978.01k
24
French Camembert Postag Model
A French POS tagging model based on Camembert-base, trained on the free-french-treebank dataset.
Sequence Labeling
Transformers French
gilf
950.03k
9
Xlm Roberta Large Ner Spanish
A Spanish named entity recognition model fine-tuned from the XLM-Roberta-large architecture, with excellent performance on the CoNLL-2002 dataset.
Sequence Labeling
Transformers Spanish
MMG
767.35k
29
Nusabert Ner V1.3
MIT
A named entity recognition model fine-tuned from NusaBert-v1.3 for Indonesian NER tasks.
Sequence Labeling
Transformers Other
cahya
759.09k
3
Ner English Large
The Flair framework's built-in large English NER model for 4 entity types, using document-level XLM-R embeddings and the FLERT technique, achieving an F1 score of 94.36 on the CoNLL-03 dataset.
Sequence Labeling
PyTorch English
flair
749.04k
44
Punctuate All
MIT
A multilingual punctuation prediction model fine-tuned from xlm-roberta-base, supporting automatic punctuation restoration for 12 European languages.
Sequence Labeling
Transformers
kredor
728.70k
20
Xlm Roberta Ner Japanese
MIT
A Japanese named entity recognition model fine-tuned from xlm-roberta-base.
Sequence Labeling
Transformers Supports Multiple Languages
tsmatz
630.71k
25