🚀 OpenHathi-7B English to Hinglish Translation Model
This model translates English to Hinglish. It is a merge of a LoRA trained on an English-to-Hinglish translation dataset and the OpenHathi-7B base model. It is a practical option for translation tasks, though it still has room for improvement, particularly on idioms.
🚀 Quick Start
You can start using this model by following the steps in the sample code below.
✨ Features
- Enhanced Translation: A merge of a LoRA trained by nateraw on llama2-7b with an English-to-Hinglish translation dataset, and OpenHathi-7B-Base. OpenHathi has more Hindi data in its pretraining than llama2, which significantly improves the translation quality.
- Prompt Template: Use the prompt template provided by nateraw (see Prompting below).
💻 Usage Examples
Basic Usage
from transformers import LlamaForCausalLM, AutoTokenizer
import torch

device = "cuda:0"  # assumes a CUDA-capable GPU is available
tokenizer = AutoTokenizer.from_pretrained('akashgoel-id/OpenHathi-7B-English-to-Hinglish')
model = LlamaForCausalLM.from_pretrained('akashgoel-id/OpenHathi-7B-English-to-Hinglish', torch_dtype=torch.bfloat16).to(device)

# Doubled braces inside the f-string leave a literal {en} placeholder for .format()
PROMPT_TEMPLATE = (
    f"Translate from english to hinglish:\n{{en}}\n---\nTranslation:\n"
)

# Read English text in a loop and print the decoded output (prompt followed by the translation)
while True:
    userInput = input("Enter input: ")
    prompt = PROMPT_TEMPLATE.format(en=userInput)
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    # max_length counts the prompt tokens plus the generated tokens
    generate_ids = model.generate(inputs.input_ids, max_length=500)
    print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
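Because a causal language model returns the prompt tokens followed by the continuation, the decoded string printed above contains the prompt as well as the translation. The following is a minimal sketch of separating the two. The helper name `extract_translation` is hypothetical (it is not part of this model's repository), and it assumes the decoded text still contains the `Translation:` marker from the prompt template.

```python
MARKER = "Translation:"  # marker text taken from the prompt template


def extract_translation(decoded: str) -> str:
    """Return the text after the first 'Translation:' marker.

    Falls back to the whole (stripped) string if the marker is missing.
    """
    _, found, tail = decoded.partition(MARKER)
    return tail.strip() if found else decoded.strip()


# Example with a decoded string shaped like the model output
decoded = (
    "Translate from english to hinglish:\n"
    "The ball is in your court\n---\nTranslation:\n"
    "Gend aapke court mein hai"
)
print(extract_translation(decoded))
```

In the loop above, this would be applied to the string returned by `tokenizer.batch_decode(...)[0]` before printing.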
📚 Documentation
Prompting
You can use the following prompt template provided by nateraw, where {en} is replaced with the English text to translate:
"Translate from english to hinglish:\n{en}\n---\nTranslation:\n"
Limitations
The model has some limitations, especially when dealing with idioms:
| Input | Response | Evaluation |
|---|---|---|
| When it rains, it pours | Jab baarish hoti hai, to baarish hoti hai | This translation is quite literal and misses the idiomatic meaning. The original phrase implies that when something happens, it often happens to an extreme degree. The provided translation simply states that "when it rains, it rains," which doesn't capture the essence of the idiom. |
| Don't count your chickens before they hatch | Apne murgon ko ande se pahle na ginein | This is a reasonable translation of the idiom, maintaining the metaphorical meaning about not making plans based on assumptions of uncertain future events. |
| Biting off more than you can chew | Aap jo chaba sakte hain usse adhik kaatna | This translation captures the literal aspect of biting and chewing but may not fully convey the idiomatic sense of taking on a task that is too big or difficult to handle. |
| The ball is in your court | Gend aapke court mein hai | This translation effectively communicates the meaning of the idiom, which is about it being someone else's turn to make a decision or take an action. |
| Beating around the bush | Bush ke chaaron or peetna | This is a literal translation and doesn't quite capture the idiomatic meaning of avoiding the main point or not speaking directly about a subject. The phrase "Ghumaphira ke baat karna" would be more appropriate. |
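The idioms listed here can double as a small regression set when trying a new checkpoint. The sketch below is illustrative only: `still_literal` and `fake_translate` are hypothetical names, the check is a crude string comparison against the literal outputs recorded in this section, and `translate` stands for any callable that wraps the model.

```python
from typing import Callable, Dict, List

# Literal renderings recorded in this section as missing the idiomatic meaning.
KNOWN_LITERAL: Dict[str, str] = {
    "When it rains, it pours": "Jab baarish hoti hai, to baarish hoti hai",
    "Biting off more than you can chew": "Aap jo chaba sakte hain usse adhik kaatna",
    "Beating around the bush": "Bush ke chaaron or peetna",
}


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def still_literal(translate: Callable[[str], str]) -> List[str]:
    """Return the idioms whose translation still equals the known literal output."""
    return [
        en for en, literal in KNOWN_LITERAL.items()
        if _norm(translate(en)) == _norm(literal)
    ]


def fake_translate(text: str) -> str:
    """Stand-in for the model: fixes one idiom and repeats the old output otherwise."""
    fixed = {"Beating around the bush": "Ghumaphira ke baat karna"}
    return fixed.get(text, KNOWN_LITERAL[text])


print(still_literal(fake_translate))
```

An exact-match check like this only catches outputs that have not changed at all; judging whether a new output is actually idiomatic still needs a human reader.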
Next steps
- Reduce Censorship: The model seems to be highly censored due to its use of llama2. The next step is to remove some of the censorship by fine-tuning on more uncensored data, similar to what WizardLM has done for llama2.
- Fine-tune on Idioms: Fine-tune the model on idioms to improve its performance on idiomatic translations.
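As a rough sketch of how idiom data for that second step might be prepared, the snippet below formats English/Hinglish pairs with the same prompt template so the training text matches what the model sees at inference time. The record layout (a single `text` field, one JSON object per line) and the function names are assumptions for illustration, not a format required by any particular trainer. The one example pair is the suggestion from the Limitations section.

```python
import json
from typing import Dict, Iterable, Tuple

PROMPT_TEMPLATE = "Translate from english to hinglish:\n{en}\n---\nTranslation:\n"

# Idiomatic pairs to train on; this single pair comes from the Limitations section.
IDIOM_PAIRS = [
    ("Beating around the bush", "Ghumaphira ke baat karna"),
]


def to_training_record(english: str, hinglish: str) -> Dict[str, str]:
    """Join the filled prompt and the target translation into one text field."""
    return {"text": PROMPT_TEMPLATE.format(en=english) + hinglish}


def to_jsonl(pairs: Iterable[Tuple[str, str]]) -> str:
    """Serialize the pairs as JSON Lines, one training record per line."""
    return "\n".join(
        json.dumps(to_training_record(en, hi), ensure_ascii=False)
        for en, hi in pairs
    )


print(to_jsonl(IDIOM_PAIRS))
```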
