EU4Digital’s parallel text project lays foundations for improved translation tools
August 4, 2025




The EU4Digital Initiative has laid the foundations for improved machine translation of the languages of selected EU candidate countries, to support their digital and legal integration into the broader European ecosystem. The ‘Parallel Text Corpora Inventory in EU Candidate Countries’ project concluded in July 2025, delivering an inventory, a report and recommendations to strengthen multilingual digital tools that bridge language and data gaps for Romanian (as used in Moldova) and Ukrainian, as well as for Albanian, Bosnian, Montenegrin, Macedonian and Serbian.

The project aims to improve eTranslation, a tool developed by the European Commission and managed by its Directorate-General for Translation (DGT). EU translators use this tool to translate legal and regulatory texts between the languages of EU Member States. Accurate translation of these texts is essential for ensuring legal certainty and the effective implementation of EU law across Member States, and for avoiding confusion, disputes or obstacles to the functioning of the Single Market.

“Expanding the EU eTranslation tool to include the languages of candidate countries would be especially valuable in supporting their legislative alignment with the EU,” EU4Digital says in a press release. 

The press release also explains that developing and training high-quality machine translation systems, especially ones tailored to legal and administrative content, requires ‘parallel text corpora’: large collections of texts and their translations into one or more other languages, aligned sentence by sentence in a machine-readable format.
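To make the idea of a sentence-aligned corpus more concrete, here is a minimal sketch in Python. The article does not say which file format the inventoried corpora use; this example assumes one common plain-text convention, where the source and target texts sit in two files and line *i* of one file is the translation of line *i* of the other. The names `SentencePair` and `load_line_aligned` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """One aligned segment: a source sentence and its translation (hypothetical type)."""
    source: str
    target: str


def load_line_aligned(source_text: str, target_text: str) -> list[SentencePair]:
    """Pair up two line-aligned texts (assumed convention: line i of one
    text is the translation of line i of the other)."""
    source_lines = source_text.splitlines()
    target_lines = target_text.splitlines()
    # A differing line count means the alignment is broken somewhere.
    if len(source_lines) != len(target_lines):
        raise ValueError(
            f"misaligned corpus: {len(source_lines)} source lines "
            f"vs {len(target_lines)} target lines"
        )
    pairs = []
    for src, tgt in zip(source_lines, target_lines):
        src, tgt = src.strip(), tgt.strip()
        # Skip pairs where either side is empty; they carry no training signal.
        if src and tgt:
            pairs.append(SentencePair(src, tgt))
    return pairs


english = "Article 1\nThis law enters into force on the date of its publication.\n"
romanian = "Articolul 1\nPrezenta lege intră în vigoare la data publicării.\n"
for pair in load_line_aligned(english, romanian):
    print(pair.source, "=>", pair.target)
```

Real corpora often use richer formats, such as TMX (Translation Memory eXchange), that store each aligned segment together with metadata. The underlying idea is the same: every source sentence is explicitly linked to its translation.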

Through the Parallel Text Corpora Inventory project, launched in December 2024, EU4Digital identified, documented and summarised sources of parallel text data, with a focus on the legal and administrative domains, and engaged with a wide range of stakeholders across candidate countries. These included academic institutions, government bodies, technology associations and private companies working with large language model (LLM) technologies.

The research highlighted the importance of understanding linguistic nuances when assessing language resource needs in candidate countries. In Moldova, for example, the official language is Romanian, which is already supported by the eTranslation tool. However, slight regional variations, particularly in legal terminology, may require targeted adaptation. In this case, introducing a Moldova-specific legal terminology dictionary could be a sufficient solution, rather than developing a separate language model.
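The article does not describe how such a terminology dictionary would be used. One simple possibility is a post-processing step that swaps standard terms in machine-translated output for their preferred regional variants. The Python sketch below illustrates this assumption. The function name `apply_terminology` and the glossary entries are hypothetical placeholders, not actual Moldovan or Romanian legal terms.

```python
import re


def apply_terminology(text: str, glossary: dict[str, str]) -> str:
    """Replace whole-word occurrences of glossary terms with preferred variants.

    Hypothetical post-editing step: `glossary` maps a term that appears in the
    machine-translated output to the preferred term for the target region.
    """
    if not glossary:
        return text
    # Try longer terms first so multi-word entries win over their sub-phrases.
    terms = sorted(glossary, key=len, reverse=True)
    # \b word boundaries are Unicode-aware for str patterns, so diacritics
    # count as word characters; terms should start and end with a word character.
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    return pattern.sub(lambda m: glossary[m.group(1)], text)


# Placeholder glossary for illustration only; not real legal terminology.
hypothetical_glossary = {"standard term": "regional term", "term": "word"}
print(apply_terminology("standard term, term, terminal", hypothetical_glossary))
```

A dictionary like this only adjusts individual terms and leaves the underlying translation model unchanged. That is why the research treats it as a lighter alternative to building a separate language model when the variations are small.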

The EU4Digital Initiative’s ‘Parallel Text Corpora Inventory in EU Candidate Countries’ project builds upon and complements other efforts, including the EU4Digital Facility’s ICT Innovation and Start-up Ecosystem stream of activities.






This website is managed by the EU-funded Regional Communication Programme for the Eastern Neighbourhood (‘EU NEIGHBOURS east’), which complements and supports the communication of the Delegations of the European Union in the Eastern partner countries, and works under the guidance of the European Commission’s Directorate-General for Enlargement and Eastern Neighbourhood and the European External Action Service. EU NEIGHBOURS east is implemented by a GOPA PACE-led consortium.


The information on this site is subject to a Disclaimer and Protection of personal data. © European Union,