Models for Non-LLM translations

MBARTTranslator

MBART model for translation. It supports many-to-many translation across multiple languages and can be run locally.

__init__(target_lang, source_lang=None, device='cuda' if torch.cuda.is_available() else 'cpu', max_length=512, num_beams=4, tokenizer_kwargs=None, model_kwargs=None)

Initializes the MBARTTranslator.

All parameters are used to configure the underlying Hugging Face model and tokenizer as defined in the HuggingFaceTranslator base class.

Parameters:

Name Type Description Default
target_lang str

The target language code for translation (e.g., 'de_DE' for German). Use MBART-specific codes, or generic codes if CODE_MAPPER handles the conversion.

required
source_lang Optional[str]

The source language code for translation (e.g., 'en_XX' for English). If not provided, the MBART model's default behavior for source language detection applies.

None
device Optional[Union[str, torch.device]]

The device (e.g., "cpu", "cuda", "mps") on which the MBART model and tokenizer will be loaded. Defaults to "cuda" if a CUDA-enabled GPU is available, otherwise "cpu".

'cuda' if torch.cuda.is_available() else 'cpu'
max_length Optional[int]

The maximum sequence length of translations generated by MBART. Defaults to 512.

512
num_beams Optional[int]

The number of beams for beam search decoding with MBART. Defaults to 4.

4
tokenizer_kwargs Optional[Dict[str, Any]]

Additional keyword arguments for the MBART tokenizer. Defaults to None.

None
model_kwargs Optional[Dict[str, Any]]

Additional keyword arguments for the MBART model. Defaults to None.

None
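The description of target_lang mentions that generic codes may be converted by CODE_MAPPER. As a rough illustration of that idea, here is a minimal sketch of such a conversion. The names `GENERIC_TO_MBART` and `to_mbart_code` are hypothetical and are not taken from the library; the actual CODE_MAPPER may differ in shape and coverage. The MBART-style codes shown (such as 'en_XX' and 'de_DE') follow the MBART-50 naming scheme.

```python
# Hypothetical mapping from generic ISO 639-1 codes to MBART-50 language codes.
# The real CODE_MAPPER in the library may differ in name, shape, and coverage.
GENERIC_TO_MBART = {
    "en": "en_XX",
    "de": "de_DE",
    "fr": "fr_XX",
    "es": "es_XX",
    "it": "it_IT",
    "ja": "ja_XX",
}


def to_mbart_code(code: str) -> str:
    """Return an MBART-style code for either a generic or an MBART code."""
    if code in GENERIC_TO_MBART.values():
        return code  # already MBART-specific, pass through unchanged
    try:
        return GENERIC_TO_MBART[code.lower()]
    except KeyError:
        raise ValueError(f"Unsupported language code: {code!r}") from None


print(to_mbart_code("de"))     # de_DE
print(to_mbart_code("en_XX"))  # en_XX
```

Passing MBART-specific codes straight through keeps both styles of input usable, which matches the behavior the parameter description suggests.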

detect_language(text)

Detects the language of the given text using langdetect.

Parameters:

Name Type Description Default
text str

The text whose language is to be detected.

required

Returns:

Type Description
str

The detected language code (e.g., 'en', 'fr').

Raises:

Type Description
ValueError

If the text is empty or invalid for detection.

LangDetectException

If language detection by the langdetect library fails for other reasons (e.g., text too short, no features).

ValueError

If the detected language is not in self.LANGUAGE_CODES.
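The error contract above can be sketched as a small standalone function. This is an illustration under stated assumptions, not the library's implementation: `detect_supported_language`, its `detector` parameter, and `fake_detector` are hypothetical names, and the detector is injected so the sketch runs without third-party packages.

```python
from typing import Callable, Iterable


def detect_supported_language(
    text: str,
    detector: Callable[[str], str],
    supported_codes: Iterable[str],
) -> str:
    """Detect the language of `text` and check it against `supported_codes`."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("Text is empty or invalid for language detection.")
    # Any exception raised by the detector (for langdetect, LangDetectException)
    # is deliberately left to propagate to the caller.
    code = detector(text)
    if code not in set(supported_codes):
        raise ValueError(f"Detected language {code!r} is not supported.")
    return code


# Stand-in detector so the sketch runs without third-party packages.
# With langdetect installed, `from langdetect import detect` could be passed instead.
def fake_detector(text: str) -> str:
    return "fr" if "bonjour" in text.lower() else "en"


print(detect_supported_language("Bonjour tout le monde", fake_detector, {"en", "fr"}))  # fr
```

Callers that want to tell the two ValueError cases apart would need to inspect the message, since both share the same exception type.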

translate(text)

Translates the input text from the source language to the target language.

Parameters:

Name Type Description Default
text str

The text to translate.

required

Returns:

Type Description
str

The translated text.

translate_batch(texts)

Translates a batch of texts from the source language to the target language.

Parameters:

Name Type Description Default
texts list

A list of texts to be translated.

required

Returns:

Type Description
list

A list of translated texts.
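For long input lists, one possible usage pattern is to call translate_batch on fixed-size chunks to keep memory use bounded. The sketch below assumes only what is documented above, namely that translate_batch takes a list of texts and returns a list of translations. The helpers `chunked` and `translate_in_chunks`, the `chunk_size` parameter, and the `EchoTranslator` stand-in are hypothetical and not part of the library.

```python
from typing import Iterator, List, Protocol


class BatchTranslator(Protocol):
    """Anything exposing translate_batch(texts) -> list, as documented above."""

    def translate_batch(self, texts: list) -> list: ...


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_in_chunks(
    translator: BatchTranslator, texts: List[str], chunk_size: int = 16
) -> List[str]:
    """Translate `texts` chunk by chunk, preserving the input order."""
    results: List[str] = []
    for chunk in chunked(texts, chunk_size):
        results.extend(translator.translate_batch(chunk))
    return results


class EchoTranslator:
    """Stand-in for MBARTTranslator so the sketch runs without a model."""

    def translate_batch(self, texts: list) -> list:
        return [f"[de_DE] {t}" for t in texts]


print(translate_in_chunks(EchoTranslator(), ["a", "b", "c"], chunk_size=2))
# ['[de_DE] a', '[de_DE] b', '[de_DE] c']
```

With a real translator, a suitable chunk size would depend on the device and on max_length, so the default of 16 here is only a placeholder.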