
Model Implementation

BaseTranslator

Abstract base class for translator models. Ensures a consistent interface for all translators, whether local LLM-based or via an API.

translate(text) abstractmethod

Translate a single piece of text from the source language to the target language.
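Here is a minimal sketch of what such an abstract base class could look like, built on Python's `abc` module. Only `BaseTranslator.translate(text)` comes from these docs. The toy `UppercaseTranslator` subclass and the `translate_all` helper are hypothetical and exist only to show the shared contract.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseTranslator(ABC):
    """Common interface: every translator exposes translate(text) -> str."""

    @abstractmethod
    def translate(self, text: str) -> str:
        """Translate one piece of text from the source to the target language."""


class UppercaseTranslator(BaseTranslator):
    """Hypothetical toy subclass, used only to illustrate the contract."""

    def translate(self, text: str) -> str:
        return text.upper()


def translate_all(translator: BaseTranslator, texts: List[str]) -> List[str]:
    # Callers depend only on the base interface, so any backend
    # (local model or API-based) can be swapped in without changes here.
    return [translator.translate(t) for t in texts]
```

Because `translate` is abstract, `BaseTranslator()` itself cannot be instantiated. A subclass that forgets to implement `translate` fails at construction time, not on first use.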

MBartTranslator

Translator using Hugging Face’s mBART-50 model for multilingual translation.

__init__(source_lang, target_lang, device='cpu', max_length=512, num_beams=4, tokenizer_kwargs=None, model_kwargs=None)

Initialize the mBART-50 translator.

Parameters:

source_lang (str, required): Source language code (e.g. "en_XX").
target_lang (str, required): Target language code (e.g. "de_DE").
device (Union[str, torch.device], default 'cpu'): "cpu", "cuda", or a torch.device.
max_length (int, default 512): Maximum length of generated sequences.
num_beams (int, default 4): Number of beams for beam search.
tokenizer_kwargs (Optional[Dict[str, Any]], default None): Extra keyword arguments passed to MBart50TokenizerFast.from_pretrained.
model_kwargs (Optional[Dict[str, Any]], default None): Extra keyword arguments passed to MBartForConditionalGeneration.from_pretrained.

Raises:

ValueError: If source_lang or target_lang is empty, or if max_length or num_beams is not positive.
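The documented argument checks and the two `from_pretrained` calls that `tokenizer_kwargs` and `model_kwargs` feed into might look roughly like the sketch below. It rests on several assumptions. The checkpoint `facebook/mbart-large-50-many-to-many-mmt` is a guess, since these docs do not name the weights. The helper names `validate_mbart_args` and `load_mbart` are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

# Assumed checkpoint: the docs do not say which mBART-50 weights are loaded.
DEFAULT_MODEL_ID = "facebook/mbart-large-50-many-to-many-mmt"


def validate_mbart_args(source_lang: str, target_lang: str,
                        max_length: int, num_beams: int) -> None:
    """Raise ValueError under the conditions documented for __init__."""
    if not source_lang or not target_lang:
        raise ValueError("source_lang and target_lang must be non-empty")
    if max_length <= 0 or num_beams <= 0:
        raise ValueError("max_length and num_beams must be positive")


def load_mbart(source_lang: str, target_lang: str, device: Any = "cpu",
               model_id: str = DEFAULT_MODEL_ID,
               tokenizer_kwargs: Optional[Dict[str, Any]] = None,
               model_kwargs: Optional[Dict[str, Any]] = None) -> Tuple[Any, Any]:
    """Hypothetical loader returning (tokenizer, model) ready for generation."""
    # Imported lazily so the validation above works without transformers installed.
    from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

    # src_lang/tgt_lang select the language-code tokens the tokenizer adds;
    # user-supplied tokenizer_kwargs override them if keys overlap.
    tok_kwargs = {"src_lang": source_lang, "tgt_lang": target_lang,
                  **(tokenizer_kwargs or {})}
    tokenizer = MBart50TokenizerFast.from_pretrained(model_id, **tok_kwargs)
    model = MBartForConditionalGeneration.from_pretrained(model_id, **(model_kwargs or {}))
    model.to(device)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

In this sketch, an `__init__` would call `validate_mbart_args` before `load_mbart`, so bad arguments fail fast without downloading weights. The lazy import keeps that check usable on its own.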

translate(text)

Translate a single sentence using the mBART-50 model.

Parameters:

text (str, required): The input sentence in the source language.

Returns:

str: The translated sentence.

Raises:

TranslationError: If text is empty or generation fails.
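A hedged sketch of how `translate` could drive the model through the Transformers `generate` API. In mBART-50 the output language is chosen by forcing the first generated token to the target language code, which is looked up here with `convert_tokens_to_ids`. The function name `mbart_translate` and the local `TranslationError` class are hypothetical stand-ins. The tokenizer is assumed to already have `src_lang` set to the source language code.

```python
from typing import Any


class TranslationError(RuntimeError):
    """Hypothetical stand-in for the project's TranslationError."""


def mbart_translate(text: str, tokenizer: Any, model: Any, target_lang: str,
                    device: Any = "cpu", max_length: int = 512,
                    num_beams: int = 4) -> str:
    if not text or not text.strip():
        raise TranslationError("input text must be a non-empty string")
    try:
        # BatchEncoding.to moves input_ids/attention_mask onto the model's device.
        encoded = tokenizer(text, return_tensors="pt").to(device)
        generated = model.generate(
            **encoded,
            # Force the target language code as the first decoder token.
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(target_lang),
            max_length=max_length,
            num_beams=num_beams,
        )
        # One input sentence in, so take the first decoded sequence.
        return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
    except Exception as exc:
        raise TranslationError(f"mBART generation failed: {exc}") from exc
```

Wrapping every generation failure in one exception type lets callers handle errors the same way, whichever backend they use. The original error stays available as `__cause__`.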

LLMTranslator

Translator that drives any Ollama-hosted model via the Ollama Python client.

__init__(model_name='llama3.1:8b', num_predict=512, source_lang='English', target_lang='German', stop=None, client=None, prompt_template=None)

Initialize an Ollama-based translator.

Parameters:

model_name (str, default 'llama3.1:8b'): Ollama model ID (e.g. "llama3.1:8b").
num_predict (int, default 512): Maximum number of tokens to predict per call.
source_lang (str, default 'English'): Name of the source language.
target_lang (str, default 'German'): Name of the target language.
stop (Optional[List[str]], default None): Optional list of stop sequences; if None, defaults to ["—"].
client (Optional[Client], default None): Optional pre-configured Ollama Client; if None, a new one is constructed.
prompt_template (Optional[str], default None): Optional prompt template with the placeholders {source_lang}, {target_lang}, and {text}; if None, DEFAULT_TEMPLATE is used.

Raises:

ValueError: If any of model_name, source_lang, or target_lang is empty, or if num_predict is not positive.
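The sketch below shows one way the constructor's checks, the stop-sequence default, and the `{source_lang}`/`{target_lang}`/`{text}` placeholders could fit together. The wording of `DEFAULT_TEMPLATE` here is purely hypothetical, because the real template is not shown in these docs. The helpers `build_llm_config` and `render_prompt` are likewise illustrative names.

```python
from typing import Any, Dict, List, Optional

# Hypothetical template: the project's real DEFAULT_TEMPLATE is not documented here.
DEFAULT_TEMPLATE = (
    "Translate the following {source_lang} text into {target_lang}. "
    "Reply with the translation only.\n\n{text}"
)
# Documented default stop sequence: a single em dash (U+2014).
DEFAULT_STOP = ["\u2014"]


def build_llm_config(model_name: str = "llama3.1:8b", num_predict: int = 512,
                     source_lang: str = "English", target_lang: str = "German",
                     stop: Optional[List[str]] = None,
                     prompt_template: Optional[str] = None) -> Dict[str, Any]:
    """Validate arguments as documented and collect per-call settings."""
    if not model_name or not source_lang or not target_lang:
        raise ValueError("model_name, source_lang and target_lang must be non-empty")
    if num_predict <= 0:
        raise ValueError("num_predict must be positive")
    return {
        "model": model_name,
        "template": prompt_template or DEFAULT_TEMPLATE,
        "source_lang": source_lang,
        "target_lang": target_lang,
        # Copy the list so later mutation by the caller has no effect.
        "options": {"num_predict": num_predict,
                    "stop": list(stop) if stop is not None else list(DEFAULT_STOP)},
    }


def render_prompt(config: Dict[str, Any], text: str) -> str:
    # str.format inserts `text` verbatim, so braces inside the text are safe;
    # literal braces in a custom template would need doubling ({{ }}).
    return config["template"].format(source_lang=config["source_lang"],
                                     target_lang=config["target_lang"],
                                     text=text)
```

A custom `prompt_template` only has to contain the three named placeholders. Any template works as long as it formats cleanly with `str.format`.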

translate(text)

Translate the given text via the Ollama LLM.

Parameters:

text (str, required): A single sentence to translate.

Returns:

str: The translated sentence, stripped of surrounding whitespace.

Raises:

TranslationError: On any failure from the Ollama client.
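A minimal sketch of the request itself, assuming the translator uses the client's `generate` endpoint (these docs do not say whether `generate` or `chat` is used). `num_predict` and `stop` are passed as standard Ollama model options. The function name `ollama_translate` and the local `TranslationError` class are hypothetical stand-ins.

```python
from typing import Any, List, Optional


class TranslationError(RuntimeError):
    """Hypothetical stand-in for the project's TranslationError."""


def ollama_translate(prompt: str, model: str = "llama3.1:8b", num_predict: int = 512,
                     stop: Optional[List[str]] = None,
                     client: Optional[Any] = None) -> str:
    if client is None:
        # Lazy import: only needed when no pre-configured client is injected.
        from ollama import Client
        client = Client()  # default host (local Ollama server)
    options = {
        "num_predict": num_predict,  # cap on generated tokens
        "stop": stop if stop is not None else ["\u2014"],  # documented default
    }
    try:
        response = client.generate(model=model, prompt=prompt, options=options)
        # Indexing works for plain dicts and the client's response objects.
        return response["response"].strip()
    except Exception as exc:
        raise TranslationError(f"Ollama request failed: {exc}") from exc
```

Accepting an injected `client` mirrors the documented `client` parameter. It also makes the function easy to exercise with a fake client when no Ollama server is running.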