Elhuyar Language and Technology Unit in Macedonia, at the PARSEME/ENeL workshop

2016 | April 11

Antton Gurrutxaga, a member of Elhuyar's Language and Technology Unit, attended a workshop on multi-word units in Macedonia on 5 and 6 April (PARSEME/ENeL workshop on MWE e-lexicons). Uxoa Iñurrieta, of the Ixa Group of the UPV/EHU, also took part. The workshop brought together two European projects:

- ENeL - European Network of e-lexicography (http://www.elexicography.eu/)

- PARSEME - Parsing and Multi-word Expressions (http://typo.uni-konstanz.de/parseme/)

Multi-word units form a very broad class that covers quite varied linguistic units: locutions or idiomatic expressions (in literal translation, "taking the hair" or "removing the beans from the pot"), collocations (drawing attention, establishing a close friendship), multi-word terms (solar wind, e-book, necessary condition, high-speed train) and expressions (what to see, learn that).

Such units are of great importance in areas such as language teaching and translation: they cannot be produced by freely combining their components, nor, in most cases, translated component by component. It is therefore necessary to include them in dictionaries and to integrate them into automatic language processing tools.

The aim of the workshop was to foster knowledge exchange and collaboration between researchers in lexicography and language processing, and to examine how the resources and tools of each field can address the needs and problems of the other.

Elhuyar works in both fields: on the one hand, it publishes dictionaries and creates lexicographic and terminological resources; on the other, it has been doing research in language processing since 2002 and has developed tools for automatically extracting multi-word units from texts. For example:

- The Itzulterm service extracts the bilingual term pairs used in a collection of bilingual texts, that is, in a parallel corpus.

- The technology developed in the Konastigarraga project makes it possible to obtain the typical or significant combinations of a word. One application is the Word Combinations section of the Elhuyar Web corpus portal. This research is described in most detail in the following article: Gurrutxaga, A., Alegria, I. & Artola, X. (2016) "Automatic characterization of idiomaticity: noun+verb combinations" in EKAIA - Scientific and technological journal of the University of the Basque Country. 10 episodes of Basque Theses. UPV/EHU.

Contact

Irune Bengoetxea Lanberri

Communications Manager

688676151

943363040 (Ext. 301)