🚀 Llama-3-70B-Special-Tokens-Adjusted
This is an ideal and stable Llama-3-70B model for fine-tuning, with adjusted special tokens.

This model was generously created and open-sourced by Astronomer.
Astronomer is the de facto company for Apache Airflow, the most trusted open-source framework for data orchestration and MLOps.
🚀 Quick Start
This model is an adjusted version of [meta-llama/Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B), aiming to solve the training instability issues caused by untrained special tokens.
✨ Features
- Ideal for fine-tuning, providing a stable Llama-3-70B base.
- Addresses the problem of training instabilities caused by untrained special tokens.
- Allows the community to use the Llama 3 model without complex pre-tuning fixes.
📦 Information Table
Property | Details |
---|---|
Base Model | meta-llama/Meta-Llama-3-70B |
Model Creator | astronomer-io |
Model Name | Meta-Llama-3-70B |
Model Type | llama |
Pipeline Tag | text-generation |
License | other |
License Name | llama-3 |
License Link | [https://huggingface.co/meta-llama/Meta-Llama-3-70B/blob/main/README.md](https://huggingface.co/meta-llama/Meta-Llama-3-70B/blob/main/README.md) |
Tags | llama, llama-3, facebook, meta, astronomer, pretrained, finetuned, autotrain_compatible, endpoints_compatible |
📚 Documentation
Description
This is the exact same model as [meta-llama/Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B), except that the weights of the input embedding matrix and the output embeddings (the LM head) have been adjusted for certain untrained tokens. The adjustment is based on the mean of the trained tokens' embeddings. Left as they were, the untrained tokens could cause widespread issues when fine-tuning this base model, for example when adding custom tokens or using the existing special tokens.
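The mean-based adjustment described above can be sketched on a toy matrix. This is a minimal illustration, not the exact script used for this release: the function name `patch_with_trained_mean` is hypothetical, and it assumes each untrained row is simply overwritten with the mean of all trained rows, applied to the input and output matrices separately.

```python
import torch

def patch_with_trained_mean(embeddings: torch.Tensor, untrained_ids: torch.Tensor) -> torch.Tensor:
    """Return a copy of `embeddings` whose untrained rows hold the mean of the trained rows.

    Hypothetical helper: assumes the adjustment is a plain overwrite with the mean.
    """
    patched = embeddings.clone()

    # Mark every row as trained, then switch off the untrained token ids.
    trained_mask = torch.ones(embeddings.shape[0], dtype=torch.bool, device=embeddings.device)
    trained_mask[untrained_ids] = False

    # Average in float32 for stability, then cast back to the original dtype.
    mean_vector = embeddings[trained_mask].float().mean(dim=0).to(embeddings.dtype)

    # The single mean vector is broadcast to every untrained row.
    patched[untrained_ids] = mean_vector
    return patched

# Toy 4-token vocabulary with 2 features; rows 1 and 3 play the untrained tokens.
toy = torch.tensor([
    [1.0, 3.0],
    [0.0, 0.0],
    [3.0, 5.0],
    [0.0, 0.0],
])
patched = patch_with_trained_mean(toy, torch.tensor([1, 3]))
print(patched)
```

Working on a copy keeps the original matrix available for comparison; on a real checkpoint one would more likely write the result back into the model's weights in place.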
Why We Made This Model
The Llama 3 base (non-instruct) model, though powerful, has a significant oversight: some of its special tokens for instruction following were left untrained, as first noted by Daniel Han on X. This oversight can lead to training instabilities, such as sudden gradient explosions or NaN gradients.

The main goal of releasing this patched model is to solve this problem, so that the community can use the Llama 3 model without facing training instabilities or having to apply complex fixes themselves before fine-tuning.
Note: for the 70B model specifically, the untrained special tokens did not have all-zero embedding weights, so the problem may be less severe than in the base 8B model. In theory, fine-tuning the original model directly should be okay; this model was created at the request of the community.
Details of the Adjustment
The [meta-llama/Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) model was pulled directly from Hugging Face and loaded using transformers. The input and output embedding values were then retrieved using `model.get_input_embeddings().weight.data` and `model.get_output_embeddings().weight.data`. These two matrices have the same shape, with each row representing a token id and each column representing an embedding feature.
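As a rough illustration of the two accessor calls, the sketch below builds a tiny, randomly initialized Llama model so it runs without downloading the 70B checkpoint. The config sizes are made up for the example and have nothing to do with the real model.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# For the real checkpoint one would load the pretrained weights instead, e.g.
#   from transformers import AutoModelForCausalLM
#   model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-70B")
# which downloads the full model and needs enough memory to hold it.

# Tiny stand-in model; every size below is an arbitrary illustrative value.
config = LlamaConfig(
    vocab_size=64,
    hidden_size=32,
    intermediate_size=64,
    num_hidden_layers=1,
    num_attention_heads=2,
    num_key_value_heads=2,
)
model = LlamaForCausalLM(config)

# Rows are token ids, columns are embedding features.
input_embeddings = model.get_input_embeddings().weight.data
output_embeddings = model.get_output_embeddings().weight.data

print(input_embeddings.shape, output_embeddings.shape)
```

Both tensors come back with shape `(vocab_size, hidden_size)`, which is what allows the same row-wise check to be applied to each of them.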
The special (untrained and problematic) tokens can be found by locating the rows in which every embedding value is less than 9e-7. (For the 70B model, no row was all zeros, so a threshold of 9e-7 was used to find under-trained tokens.) These untrained tokens could lead to serious numerical issues, such as gradient explosions or NaN gradients, during downstream fine-tuning on specific tasks.
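The row-wise threshold check can be sketched as follows. The helper name `find_untrained_rows` is hypothetical, and comparing absolute values against the threshold is an assumption; the text above only states that every value in the row is below 9e-7.

```python
import torch

def find_untrained_rows(embeddings: torch.Tensor, eps: float = 9e-7) -> torch.Tensor:
    """Return the token ids (row indices) whose every feature has magnitude below `eps`.

    Hypothetical helper; using absolute values is an assumption of this sketch.
    """
    # A row qualifies only if all of its entries are near zero.
    near_zero_rows = (embeddings.abs() < eps).all(dim=1)
    return torch.nonzero(near_zero_rows).squeeze(1)

# Toy 4-token vocabulary with 3 features.
toy = torch.tensor([
    [0.12, -0.30, 0.05],   # trained
    [1e-8, -2e-8, 0.0],    # tiny but non-zero: flagged by the threshold
    [0.0, 0.0, 0.0],       # all zeros: flagged
    [1e-8, 0.40, 0.0],     # one large value: not flagged
])
untrained_ids = find_untrained_rows(toy)
print(untrained_ids.tolist())
```

A threshold, rather than an exact comparison with zero, is what catches rows like the second one, which matches the situation described for the 70B model.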
The tokens we found that fit the "untrained" profile described above:
['À', 'Á', 'õ', 'ö', '÷', 'ø', 'ù', 'ú', 'û', 'ü', 'ý', 'þ', 'ÿ', '">ččĊ', ';čččĊ', 'ĉTokenNameIdentifier', 'ĠForCanBeConverted', 'ĠForCanBeConvertedToF', 'PostalCodesNL', '$PostalCodesNL', 'useRalative', 'Û±Û', 'аÑĢакÑĤ', 'аÑĤиÑģÑı', 'иÑĤиÑģÑı', 'ávajÃŃcÃŃ', 'İTESİ', 'илакÑĤи', 'илаÑģÑı', 'ÑĭÑŁN', 'ÐİÑĭÑŁN', 'ılmaktadır', 'ÐİÑĭÑŁNÐİÑĭÑŁN', 'ıldıģında', '<|reserved_special_token_0|>', '<|reserved_special_token_1|>', '<|reserved_special_token_2|>', '<|reserved_special_token_3|>', '<|start_header_id|>', '<|end_header_id|>', '<|reserved_special_token_4|>', '<|eot_id|>', '<|reserved_special_token_5|>', '<|reserved_special_token_6|>', '<|reserved_special_token_7|>', '<|reserved_special_token_8|>', '<|reserved_special_token_9|>', '<|reserved_special_token_10|>', '<|reserved_special_token_11|>', '<|reserved_special_token_12|>', '<|reserved_special_token_13|>', '<|reserved_special_token_14|>', '<|reserved_special_token_15|>', '<|reserved_special_token_16|>', '<|reserved_special_token_17|>', '<|reserved_special_token_18|>', '<|reserved_special_token_19|>', '<|reserved_special_token_20|>', '<|reserved_special_token_21|>', '<|reserved_special_token_22|>', '<|reserved_special_token_23|>', '<|reserved_special_token_24|>', '<|reserved_special_token_25|>', '<|reserved_special_token_26|>', '<|reserved_special_token_27|>', '<|reserved_special_token_28|>', '<|reserved_special_token_29|>', '<|reserved_special_token_30|>', '<|reserved_special_token_31|>', '<|reserved_special_token_32|>', '<|reserved_special_token_33|>', '<|reserved_special_token_34|>', '<|reserved_special_token_35|>', '<|reserved_special_token_36|>', '<|reserved_special_token_37|>', '<|reserved_special_token_38|>', '<|reserved_special_token_39|>', '<|reserved_special_token_40|>', '<|reserved_special_token_41|>', '<|reserved_special_token_42|>', '<|reserved_special_token_43|>', '<|reserved_special_token_44|>', '<|reserved_special_token_45|>', '<|reserved_special_token_46|>', '<|reserved_special_token_47|>', 
'<|reserved_special_token_48|>', '<|reserved_special_token_49|>', '<|reserved_special_token_50|>', '<|reserved_special_token_51|>', '<|reserved_special_token_52|>', '<|reserved_special_token_53|>', '<|reserved_special_token_54|>', '<|reserved_special_token_55|>', '<|reserved_special_token_56|>', '<|reserved_special_token_57|>', '<|reserved_special_token_58|>', '<|reserved_special_token_59|>', '<|reserved_special_token_60|>', '<|reserved_special_token_61|>', '<|reserved_special_token_62|>', '<|reserved_special_token_63|>', '<|reserved_special_token_64|>', '<|reserved_special_token_65|>', '<|reserved_special_token_66|>', '<|reserved_special_token_67|>', '<|reserved_special_token_68|>', '<|reserved_special_token_69|>', '<|reserved_special_token_70|>', '<|reserved_special_token_71|>', '<|reserved_special_token_72|>', '<|reserved_special_token_73|>', '<|reserved_special_token_74|>', '<|reserved_special_token_75|>', '<|reserved_special_token_76|>', '<|reserved_special_token_77|>', '<|reserved_special_token_78|>', '<|reserved_special_token_79|>', '<|reserved_special_token_80|>', '<|reserved_special_token_81|>', '<|reserved_special_token_82|>', '<|reserved_special_token_83|>', '<|reserved_special_token_84|>', '<|reserved_special_token_85|>', '<|reserved_special_token_86|>', '<|reserved_special_token_87|>', '<|reserved_special_token_88|>', '<|reserved_special_token_89|>', '<|reserved_special_token_90|>', '<|reserved_special_token_91|>', '<|reserved_special_token_92|>', '<|reserved_special_token_93|>', '<|reserved_special_token_94|>', '<|reserved_special_token_95|>', '<|reserved_special_token_96|>', '<|reserved_special_token_97|>', '<|reserved_special_token_98|>', '<|reserved_special_token_99|>', '<|reserved_special_token_100|>', '<|reserved_special_token_101|>', '<|reserved_special_token_102|>', '<|reserved_special_token_103|>', '<|reserved_special_token_104|>', '<|reserved_special_token_105|>', '<|reserved_special_token_106|>', '<|reserved_special_token_107|>', 
'<|reserved_special_token_108|>', '<|reserved_special_token_109|>', '<|reserved_special_token_110|>', '<|reserved_special_token_111|>', '<|reserved_special_token_112|>', '<|reserved_special_token_113|>', '<|reserved_special_token_114|>', '<|reserved_special_token_115|>', '<|reserved_special_token_116|>', '<|reserved_special_token_117|>', '<|reserved_special_token_118|>', '<|reserved_special_token_119|>', '<|reserved_special_token_120|>', '<|reserved_special_token_121|>', '<|reserved_special_token_122|>', '<|reserved_special_token_123|>', '<|reserved_special_token_124|>', '<|reserved_special_token_125|>', '<|reserved_special_token_126|>', '<|reserved_special_token_127|>', '<|reserved_special_token_128|>', '<|reserved_special_token_129|>', '<|reserved_special_token_130|>', '<|reserved_special_token_131|>', '<|reserved_special_token_132|>', '<|reserved_special_token_133|>', '<|reserved_special_token_134|>', '<|reserved_special_token_135|>', '<|reserved_special_token_136|>', '<|reserved_special_token_137|>', '<|reserved_special_token_138|>', '<|reserved_special_token_139|>', '<|reserved_special_token_140|>', '<|reserved_special_token_141|>', '<|reserved_special_token_142|>', '<|reserved_special_token_143|>', '<|reserved_special_token_144|>', '<|reserved_special_token_145|>', '<|reserved_special_token_146|>', '<|reserved_special_token_147|>', '<|reserved_special_token_148|>', '<|reserved_special_token_149|>', '<|reserved_special_token_150|>', '<|reserved_special_token_151|>', '<|reserved_special_token_152|>', '<|reserved_special_token_153|>', '<|reserved_special_token_154|>', '<|reserved_special_token_155|>', '<|reserved_special_token_156|>', '<|reserved_special_token_157|>', '<|reserved_special_token_158|>', '<|reserved_special_token_159|>', '<|reserved_special_token_160|>', '<|reserved_special_token_161|>', '<|reserved_special_token_162|>', '<|reserved_special_token_163|>', '<|reserved_special_token_164|>', '<|reserved_special_token_165|>', 
'<|reserved_special_token_166|>', '<|reserved_special_token_167|>', '<|reserved_special_token_168|>', '<|reserved_special_token_169|>', '<|reserved_special_token_170|>', '<|reserved_special_token_171|>', '<|reserved_special_token_172|>', '<|reserved_special_token_173|>', '<|reserved_special_token_174|>', '<|reserved_special_token_175|>', '<|reserved_special_token_176|>', '<|reserved_special_token_177|>', '<|reserved_special_token_178|>', '<|reserved_special_token_179|>', '<|reserved_special_token_180|>', '<|reserved_special_token_181|>', '<|reserved_special_token_182|>', '<|reserved_special_token_183|>', '<|reserved_special_token_184|>', '<|reserved_special_token_185|>', '<|reserved_special_token_186|>', '<|reserved_special_token_187|>', '<|reserved_special_token_188|>', '<|reserved_special_token_189|>', '<|reserved_special_token_190|>', '<|reserved_special_token_191|>', '<|reserved_special_token_192|>', '<|reserved_special_token_193|>', '<|reserved_special_token_194|>', '<|reserved_special_token_195|>', '<|reserved_special_token_196|>', '<|reserved_special_token_197|>', '<|reserved_special_token_198|>', '<|reserved_special_token_199|>', '<|reserved_special_token_200|>', '<|reserved_special_token_201|>', '<|reserved_special_token_202|>', '<|reserved_special_token_203|>', '<|reserved_special_token_204|>', '<|reserved_special_token_205|>', '<|reserved_special_token_206|>', '<|reserved_special_token_207|>', '<|reserved_special_token_208|>', '<|reserved_special_token_209|>', '<|reserved_special_token_210|>', '<|reserved_special_token_211|>', '<|reserved_special_token_212|>', '<|reserved_special_token_213|>', '<|reserved_special_token_214|>', '<|reserved_special_token_215|>', '<|reserved_special_token_216|>', '<|reserved_special_token_217|>', '<|reserved_special_token_218|>', '<|reserved_special_token_219|>', '<|reserved_special_token_220|>', '<|reserved_special_token_221|>', '<|reserved_special_token_222|>', '<|reserved_special_token_223|>', 
'<|reserved_special_token_224|>', '<|reserved_special_token_225|>', '<|reserved_special_token_226|>']
📄 License
The usage of this model must abide by the [Llama 3 Community License](https://huggingface.co/meta-llama/Meta-Llama-3-70B/blob/main/LICENSE).

