Usage Guide
This section explains how to run the evaluation script, configure model selection, and generate reports for translation model performance.
The main entry point for the framework is main.py. It handles configuration loading, dataset preparation, translation execution, and report generation.
Running the Evaluation
To start the evaluation, run the following command from the project root:
python main.py
This will:
- Load the language mappings from `configs/language_mappings.yaml`
- Instantiate a `TranslationEvaluator` object
- Evaluate each model listed in `MODELS_TO_EVALUATE` across supported language pairs
- Generate a `.csv` report for each model-language pair in the `reports/` directory
Configuration Overview
All paths and evaluation parameters are defined in `config.py`:
from functools import partial
from pathlib import Path

# NllbTranslator, MBartTranslator and LLMTranslator are imported from the
# project's translator modules.
MODELS_TO_EVALUATE = ["mbart50", "nllb", "mistral"]

MODEL_REGISTRY = {
    "nllb": NllbTranslator,
    "mbart50": MBartTranslator,
    "mistral": partial(LLMTranslator, model_name="mistral:7b"),
}

DATASET_NAME = "wmt19"
DATASET_SPLIT = "train[:1000]"
OUTPUT_DIR = Path("reports")
LANGUAGE_MAPPING_PATH = Path("configs/language_mappings.yaml")
To evaluate different models or use a smaller dataset split, update these values before running main.py.
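For example, a quick smoke test that evaluates only mBART-50 on the first 100 examples could look like this (the specific values are only a suggestion):

# config.py — example of a reduced run
MODELS_TO_EVALUATE = ["mbart50"]
DATASET_SPLIT = "train[:100]"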
What Happens Internally
Load Language Mappings
mappings = load_language_mappings(LANGUAGE_MAPPING_PATH)
Loads a YAML file that maps each language pair to the model-specific source and target language codes.
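The exact contents of the mappings file are project-specific, but given how `mappings` is indexed in the loop below (first by language pair, then by model name), the loaded object is presumably a nested dict that resolves to model-specific source and target codes. A purely illustrative sketch of that shape:

# Hypothetical shape of the loaded mappings; keys and codes are illustrative only
mappings = {
    "de-en": {
        "mbart50": {"source": "de_DE", "target": "en_XX"},
        "nllb": {"source": "deu_Latn", "target": "eng_Latn"},
    },
    "ru-en": {
        "mistral": {"source": "Russian", "target": "English"},
    },
}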
Loop Through Models and Language Pairs
Each model in MODELS_TO_EVALUATE is evaluated on each language pair for which it has a mapping:
for model_name in models:
    for lang_pair in mappings:
        # Only evaluate pairs the current model has a mapping for
        if model_name in mappings[lang_pair]:
            ...
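The `model_id` used in the later steps is not shown being built here; the report filenames in the output listing suggest it combines the model name and the language pair, so a plausible construction inside the inner loop (field names follow the illustrative mapping shape above) is:

# Hypothetical: look up the model-specific codes and build an id such as "mbart50_de-en"
src_code = mappings[lang_pair][model_name]["source"]
tgt_code = mappings[lang_pair][model_name]["target"]
model_id = f"{model_name}_{lang_pair}"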
Load Dataset
Source texts and reference translations are pulled from the Hugging Face dataset:
ds = load_dataset(dataset_name, lang_pair, split=split)
inputs = [ex["translation"][src_lang] for ex in ds]
refs = [ex["translation"][tgt_lang] for ex in ds]
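For the `de-en` configuration of `wmt19`, each example stores both sides of the pair under a single `translation` field, so `src_lang` and `tgt_lang` here are the plain pair codes (`"de"` and `"en"`). A quick way to sanity-check this before a full run:

from datasets import load_dataset

# Peek at a single record to confirm the expected structure
ds = load_dataset("wmt19", "de-en", split="train[:1]")
print(ds[0]["translation"])  # {'de': '...', 'en': '...'}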
Initialize Translator
Each translator class is looked up in `MODEL_REGISTRY` and instantiated with the model-specific language codes:
translator_cls = MODEL_REGISTRY[model_name]
translator = translator_cls(source_lang=src_code, target_lang=tgt_code)
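The translator classes themselves are part of the project and are not reproduced here. Since every registry entry is constructed with the same keyword arguments, they presumably share a small common interface; a hypothetical minimal shape, for orientation only:

# Hypothetical interface; the real classes live in the project's translator modules
class ExampleTranslator:
    def __init__(self, source_lang: str, target_lang: str):
        self.source_lang = source_lang
        self.target_lang = target_lang

    def translate(self, texts: list[str]) -> list[str]:
        # A real implementation would run the underlying model here
        raise NotImplementedError

Note that the `partial(LLMTranslator, model_name="mistral:7b")` entry behaves the same way at this point: `functools.partial` pre-binds `model_name` and simply forwards `source_lang` and `target_lang` to `LLMTranslator`.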
Evaluate Translations
The TranslationEvaluator handles translation and scoring:
evaluator.register_model(model_id, translator)
evaluator.evaluate(inputs, refs, model_names=[model_id])
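The metrics and internal bookkeeping are implemented inside `TranslationEvaluator` and are not documented here. As a rough mental model only, the two calls above behave like the following sketch (the class name, storage, and use of sacrebleu are all assumptions for illustration):

import sacrebleu

class EvaluatorSketch:
    """Illustrative stand-in for TranslationEvaluator, not the real implementation."""

    def __init__(self):
        self.models = {}
        self.results = {}

    def register_model(self, model_id, translator):
        self.models[model_id] = translator

    def evaluate(self, inputs, refs, model_names):
        for model_id in model_names:
            hypotheses = self.models[model_id].translate(inputs)
            # Corpus-level BLEU as an example metric
            self.results[model_id] = {"bleu": sacrebleu.corpus_bleu(hypotheses, [refs]).score}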
Generate Report
Results are saved as CSV files in the output directory:
report_file = output_dir / f"{model_id}_report.csv"
evaluator.generate_report(report_file, models=model_id)
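The report columns depend on the metrics the evaluator computes, but each report is a plain CSV and can be inspected with standard tooling, for example:

import pandas as pd

# Inspect one of the generated reports (filename taken from the listing below)
df = pd.read_csv("reports/mbart50_de-en_report.csv")
print(df.head())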
Output Files
After a successful run, the following will be available:
reports/
├── mbart50_de-en_report.csv
├── nllb_fi-en_report.csv
├── mistral_ru-en_report.csv
...
Common Issues
| Issue | Explanation |
|---|---|
| `KeyError: 'language_mappings'` | The mappings YAML is malformed or missing the `language_mappings` top-level key. |
| `FileNotFoundError` | Make sure the paths in `config.py` are correct. |
| `Unknown model 'xyz'` | Ensure all models listed in `MODELS_TO_EVALUATE` exist in `MODEL_REGISTRY`. |
| Hugging Face dataset error | The dataset or language pair may not be supported. Try a different one or check availability on huggingface.co/datasets. |