TUKE-KEMT/slavic-t5-base

Text2Text Generation



Slavic T5 Base

The aim of this model is to achieve the best possible results for Slavic languages written in the Latin script.

It is suitable for tasks such as:

summarization,
extractive question answering,
machine translation between Slavic languages written in the Latin script.
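A minimal loading sketch follows, assuming the checkpoint works with the standard Hugging Face `transformers` Auto classes and that `transformers`, PyTorch and the tokenizer's dependencies are installed. The helper names `load_model` and `generate` are hypothetical. The card does not say whether this checkpoint is already fine-tuned for the tasks above or which input prefixes it expects, so the raw checkpoint will probably need task-specific fine-tuning before its outputs are useful.

```python
from functools import lru_cache

MODEL_ID = "TUKE-KEMT/slavic-t5-base"  # repository id from this model card


@lru_cache(maxsize=1)
def load_model():
    """Download (on first call) and cache the tokenizer and model (hypothetical helper)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    return tokenizer, model


def generate(text: str, max_new_tokens: int = 64) -> str:
    """Encode `text`, generate an output sequence, and decode it back to a string."""
    tokenizer, model = load_model()
    inputs = tokenizer(text, return_tensors="pt")  # PyTorch tensors
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Usage, with a hypothetical Slovak input and no task prefix: `generate("Bratislava je hlavné mesto Slovenska.")`. Caching the loaded model with `lru_cache` avoids re-reading the weights on every call.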

The model was trained on selected parts of the OSCAR and MaCoCu corpora.

It supports these languages: Czech, Croatian, Polish, Slovak, and Slovenian.

The vocabulary has 120 000 tokens and is cased, i.e. it contains capital letters.
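To give a sense of what a 120 000-token vocabulary costs in parameters, the sketch below works out the size of one embedding matrix. The card does not state the hidden size, so `D_MODEL = 768` (the usual T5-Base width) is an assumption, as are the constant and function names.

```python
VOCAB_SIZE = 120_000  # from the model card
D_MODEL = 768  # assumption: standard T5-Base hidden size, not stated in the card
TOTAL_PARAMS = 383_000_000  # "383M params" as reported on the card (rounded)


def embedding_params(vocab_size: int, d_model: int) -> int:
    """Number of parameters in one vocab_size x d_model embedding matrix."""
    return vocab_size * d_model


emb = embedding_params(VOCAB_SIZE, D_MODEL)
print(f"{emb:,} parameters per embedding matrix")  # 92,160,000
print(f"{emb / TOTAL_PARAMS:.1%} of the reported total")  # 24.1%
```

If the input and output embeddings are untied, as in T5 v1.1 checkpoints, the model would hold two matrices of this size; the card does not say which layout it uses.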


Weights format: Safetensors

Model size: 383M params

Tensor type: F32
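As a rough worked estimate, F32 stores each parameter as a 4-byte float, so the weights alone take about 383M × 4 bytes. The sketch below assumes the rounded "383M" figure and ignores activations, optimizer state and framework overhead; the names are illustrative.

```python
NUM_PARAMS = 383_000_000  # "383M params" from the model card (rounded)
BYTES_PER_F32 = 4  # F32 = 32-bit float = 4 bytes per parameter


def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Memory needed to hold the raw weights, in bytes."""
    return num_params * bytes_per_param


total = weight_bytes(NUM_PARAMS, BYTES_PER_F32)
print(f"{total / 1e9:.2f} GB ({total / 2**30:.2f} GiB)")  # 1.53 GB (1.43 GiB)
```

Loading the weights in a 2-byte format such as F16 would roughly halve this figure.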
