TranslationEvaluator

Evaluate translation models against reference translations.

This class orchestrates the evaluation of multiple translation models, computing metrics such as BLEU and METEOR on the provided datasets. Results can be retrieved programmatically or saved as a report.

Typical usage

    evaluator = TranslationEvaluator()
    evaluator.register_model('model1', translator1)
    results = evaluator.evaluate(inputs, references, model_names=['model1'])
    evaluator.generate_report('results.csv', models=['model1'])

__init__()

Initialize the evaluator with default metrics.

evaluate(inputs, references, model_names=None)

Evaluate registered models on inputs against reference translations.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| inputs | List[str] | List of source texts to translate. | required |
| references | List[str] | List of corresponding reference translations. | required |
| model_names | Optional[List[str]] | Subset of registered model names to evaluate. If None, evaluates all registered models. | None |

Returns:

| Type | Description |
| --- | --- |
| Dict[str, Dict[str, float]] | Mapping from model name to a dict of metric scores. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If inputs and references lengths differ. |
| KeyError | If a specified model or metric is not registered. |
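The contract described above (length check, optional model subset, nested result mapping) can be illustrated with a small stand-in. This is only a sketch, not the library's implementation: `evaluate_sketch`, the `Translator` alias, and the toy `exact_match` metric are hypothetical names introduced here, and the real class computes BLEU and METEOR rather than exact match.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in: a "model" here is any callable mapping str -> str.
Translator = Callable[[str], str]


def exact_match(hypotheses: List[str], references: List[str]) -> float:
    """Toy metric: fraction of hypotheses identical to their reference."""
    if not references:
        return 0.0
    hits = sum(h == r for h, r in zip(hypotheses, references))
    return hits / len(references)


def evaluate_sketch(
    models: Dict[str, Translator],
    inputs: List[str],
    references: List[str],
    model_names: Optional[List[str]] = None,
) -> Dict[str, Dict[str, float]]:
    # Documented ValueError: inputs and references must align one-to-one.
    if len(inputs) != len(references):
        raise ValueError("inputs and references must have the same length")

    # None means "evaluate every registered model".
    names = list(models) if model_names is None else model_names

    results: Dict[str, Dict[str, float]] = {}
    for name in names:
        # Documented KeyError: the requested model was never registered.
        if name not in models:
            raise KeyError(f"model not registered: {name}")
        hypotheses = [models[name](text) for text in inputs]
        results[name] = {"exact_match": exact_match(hypotheses, references)}
    return results


# Two toy "models" evaluated on a two-sentence dataset.
toy_models = {"upper": str.upper, "identity": lambda s: s}
scores = evaluate_sketch(toy_models, ["a", "b"], ["A", "B"])
```

The returned mapping has the documented shape: one entry per model name, each holding a dict of metric name to score.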

generate_report(file_path, models=None)

Save the evaluation report for BLEU and METEOR to a CSV file and print it.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| file_path | Union[str, Path] | Destination CSV file path. | required |
| models | Optional[Union[str, List[str]]] | Single model name, list of names, or None for all. | None |

Returns:

| Type | Description |
| --- | --- |
| None | The method returns nothing. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If no evaluation results are available. |
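The `models` argument accepts a single name, a list of names, or None. One way such an argument might be normalized before writing the CSV is sketched below. `write_report_sketch` is a hypothetical helper, and the column layout and the `"BLEU"`/`"METEOR"` result keys are assumptions; the actual report format is not specified here.

```python
import csv
from pathlib import Path
from typing import Dict, List, Optional, Union


def write_report_sketch(
    results: Dict[str, Dict[str, float]],
    file_path: Union[str, Path],
    models: Optional[Union[str, List[str]]] = None,
) -> None:
    # Documented ValueError: nothing to report before an evaluation has run.
    if not results:
        raise ValueError("no evaluation results are available")

    # Normalize `models`: a bare string becomes a one-element list,
    # and None selects every model that has results.
    if models is None:
        selected = list(results)
    elif isinstance(models, str):
        selected = [models]
    else:
        selected = list(models)

    # Assumed layout: one row per model, one column per metric.
    with open(file_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["model", "BLEU", "METEOR"])
        for name in selected:
            row = [name, results[name]["BLEU"], results[name]["METEOR"]]
            writer.writerow(row)
            # Echo each row so the report is printed as well as saved.
            print(*row, sep="\t")
```

Checking `isinstance(models, str)` before treating the value as a list matters because a string is itself iterable; without the check, `'model1'` would be split into single characters.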

register_model(name, model)

Register a translation model for evaluation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | Unique identifier for the model. | required |
| model | BaseTranslator | Instance implementing BaseTranslator interface. | required |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If name is empty or model is None. |
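As a rough illustration of registration, the sketch below defines a stand-in `BaseTranslator` and a registry that applies the documented validation. The `translate` method, `EchoTranslator`, and `RegistrySketch` are hypothetical; the real `BaseTranslator` interface is not described in this page and may differ.

```python
from abc import ABC, abstractmethod
from typing import Dict


class BaseTranslator(ABC):
    """Hypothetical interface; the real BaseTranslator may differ."""

    @abstractmethod
    def translate(self, text: str) -> str:
        """Return the translation of a single source text."""


class EchoTranslator(BaseTranslator):
    """Trivial translator that returns its input unchanged."""

    def translate(self, text: str) -> str:
        return text


class RegistrySketch:
    """Stand-in for the model registry kept by the evaluator."""

    def __init__(self) -> None:
        self._models: Dict[str, BaseTranslator] = {}

    def register_model(self, name: str, model: BaseTranslator) -> None:
        # Documented ValueError: empty name or missing model.
        if not name or model is None:
            raise ValueError("name must be non-empty and model must not be None")
        # Assumption: re-registering a name overwrites the earlier entry;
        # the documented behavior for duplicate names is not stated.
        self._models[name] = model


registry = RegistrySketch()
registry.register_model("echo", EchoTranslator())
```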