EU4Digital’s parallel text project lays foundations for improved translation tools

The EU4Digital Initiative has laid the foundations for improved machine translation of languages of selected EU candidate countries, to support their digital and legal integration into the broader European ecosystem. The ‘Parallel Text Corpora Inventory in EU Candidate Countries’ project concluded in July 2025, delivering an inventory, report and recommendations to strengthen multilingual digital tools that bridge language and data gaps for the Romanian (for Moldova) and Ukrainian languages, as well as for Albanian, Bosnian, Montenegrin, Macedonian, and Serbian.

The project aims to improve eTranslation, the machine translation tool developed by the European Commission and managed by the Directorate-General for Translation (DGT). EU translators use this tool when translating legal and regulatory texts between EU Member State languages. Accurate translation of these texts is essential for ensuring legal certainty and effective implementation of EU law across Member States, and for avoiding confusion, disputes or obstacles to the Single Market.

“Expanding the EU eTranslation tool to include the languages of candidate countries would be especially valuable in supporting their legislative alignment with the EU,” EU4Digital says in a press release.

It also explains that developing and training high-quality machine translation systems, especially ones tailored for legal and administrative content, requires ‘parallel text corpora’ – large collections of texts that are translated between two or more languages and aligned sentence-by-sentence in a machine-readable format.
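To make the idea of a sentence-aligned, machine-readable corpus concrete, here is a minimal Python sketch. It is not the format used by eTranslation or by the EU4Digital inventory; the tab-separated layout, the `SentencePair` type, the function names and the two English-Romanian example sentences are all illustrative assumptions.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """One aligned unit: a source sentence and its translation (hypothetical type)."""
    source: str
    target: str


def align(source_sentences, target_sentences):
    """Pair sentences by position; assumes both sides are already split 1:1."""
    if len(source_sentences) != len(target_sentences):
        raise ValueError("both sides must contain the same number of sentences")
    return [SentencePair(s, t) for s, t in zip(source_sentences, target_sentences)]


def to_tsv(pairs):
    """Serialise pairs as tab-separated text, one aligned pair per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for pair in pairs:
        writer.writerow([pair.source, pair.target])
    return buffer.getvalue()


def from_tsv(text):
    """Read the tab-separated text back into a list of sentence pairs."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [SentencePair(row[0], row[1]) for row in reader if row]


# Illustrative sentences only; they are not taken from the project inventory.
english = ["The law enters into force.", "This regulation is binding."]
romanian = ["Legea intră în vigoare.", "Prezentul regulament este obligatoriu."]

pairs = align(english, romanian)
corpus_text = to_tsv(pairs)
print(corpus_text, end="")
```

Real corpora are far larger and are often stored in dedicated exchange formats, but the principle is the same: each record ties one sentence to its translation so that a translation model can learn from the pairs.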

In the Parallel Text Corpora Inventory project, launched in December 2024, EU4Digital identified, documented and summarised sources of parallel text data, focusing on the legal and administrative domains, and engaged with a wide range of stakeholders across candidate countries. These stakeholders included academic institutions, government bodies, technology associations, and private companies working with large language model (LLM) technologies.

The research highlighted the importance of understanding linguistic nuances when assessing language resource needs in candidate countries. In Moldova, for example, the official language is Romanian, which is already supported by the eTranslation tool. However, slight regional variations, particularly in legal terminology, may require targeted adaptation. In this case, introducing a Moldova-specific legal terminology dictionary could be a sufficient solution, rather than developing a separate language model.

The EU4Digital Initiative’s ‘Parallel Text Corpora Inventory in EU Candidate Countries’ project builds upon and complements other efforts, including the EU4Digital Facility’s ICT Innovation and Start-up Ecosystem stream of activities.

