UniMorph 4.0: Universal Morphology
Published in Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022
The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation, and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements made on several fronts over the last couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 66 new languages, including 24 endangered languages. We have implemented several improvements to the extraction pipeline to address issues such as missing gender and macron information. We have amended the schema to use a hierarchical structure that is needed for morphological phenomena like multiple-argument agreement and case stacking, while adding some missing morphological features to make the schema more inclusive. In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages. Lastly, this new release makes a push towards inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet.
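To make "type-level resource of annotated data" concrete, here is a minimal sketch of reading UniMorph-style entries into inflection tables. It assumes the commonly distributed layout of one entry per line with three tab-separated fields (lemma, inflected form, semicolon-separated feature tags). The sample lines and the `parse_unimorph` helper are illustrative and are not taken from the paper or from any official tooling; the hierarchical features introduced in this release are not handled here.

```python
from collections import defaultdict

# Illustrative sample only (assumption): lemma <TAB> form <TAB> features
SAMPLE = (
    "walk\twalked\tV;PST\n"
    "walk\twalking\tV;V.PTCP;PRS\n"
    "walk\twalks\tV;PRS;3;SG\n"
)

def parse_unimorph(text):
    """Group entries by lemma, keyed by the (unordered) set of feature tags."""
    tables = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        lemma, form, feats = line.split("\t")
        tables[lemma][frozenset(feats.split(";"))] = form
    return dict(tables)

tables = parse_unimorph(SAMPLE)
print(tables["walk"][frozenset({"V", "PST"})])
```

Keying each cell by a set of tags rather than by the raw string makes lookups independent of tag order, which is convenient when different sources order the tags differently.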
@inproceedings{batsuren-etal-2022-unimorph,
author = {
Khuyagbaatar Batsuren and
Omer Goldman and
Salam Khalifa and
Nizar Habash and
Witold Kieraś and
Gábor Bella and
Brian Leonard and
Garrett Nicolai and
Kyle Gorman and
Yustinus Ghanggo Ate and
Maria Ryskina and
Sabrina Mielke and
Elena Budianskaya and
Charbel El-Khaissi and
Tiago Pimentel and
Michael Gasser and
William Abbott Lane and
Mohit Raj and
Matt Coler and
Jaime Rafael Montoya Samame and
Delio Siticonatzi Camaiteri and
Esaú Zumaeta Rojas and
Didier López Francis and
Arturo Oncevay and
Juan López Bautista and
Gema Celeste Silva Villegas and
Lucas Torroba Hennigen and
Adam Ek and
David Guriel and
Peter Dirix and
Jean-Philippe Bernardy and
Andrey Scherbakov and
Aziyana Bayyr-ool and
Antonios Anastasopoulos and
Roberto Zariquiey and
Karina Sheifer and
Sofya Ganieva and
Hilaria Cruz and
Ritván Karahóǧa and
Stella Markantonatou and
George Pavlidis and
Matvey Plugaryov and
Elena Klyachko and
Ali Salehi and
Candy Angulo and
Jatayu Baxi and
Andrew Krizhanovsky and
Natalia Krizhanovskaya and
Elizabeth Salesky and
Clara Vania and
Sardana Ivanova and
Jennifer White and
Rowan Hall Maudslay and
Josef Valvoda and
Ran Zmigrod and
Paula Czarnowska and
Irene Nikkarinen and
Aelita Salchak and
Brijesh Bhatt and
Christopher Straughn and
Zoey Liu and
Jonathan North Washington and
Yuval Pinter and
Duygu Ataman and
Marcin Wolinski and
Totok Suhardijanto and
Anna Yablonskaya and
Niklas Stoehr and
Hossep Dolatian and
Zahroh Nuriah and
Shyam Ratan and
Francis M. Tyers and
Edoardo M. Ponti and
Grant Aiton and
Aryaman Arora and
Richard J. Hatcher and
Ritesh Kumar and
Jeremiah Young and
Daria Rodionova and
Anastasia Yemelina and
Taras Andrushko and
Igor Marchenko and
Polina Mashkovtseva and
Alexandra Serova and
Emily Prud’hommeaux and
Maria Nepomniashchaya and
Fausto Giunchiglia and
Eleanor Chodroff and
Mans Hulden and
Miikka Silfverberg and
Arya D. McCarthy and
David Yarowsky and
Ryan Cotterell and
Reut Tsarfaty and
Ekaterina Vylomova
},
booktitle = {Proceedings of the Thirteenth Language Resources and Evaluation Conference},
title = {UniMorph 4.0: Universal Morphology},
year = {2022},
url = {https://aclanthology.org/2022.lrec-1.89/},
pages = {840--855},
}