UniMorph 4.0: Universal Morphology

Published in Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022


The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation, and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements on several fronts that were made in the last couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 66 new languages, including 24 endangered languages. We have implemented several improvements to the extraction pipeline to tackle some issues, e.g., missing gender and macron information. We have amended the schema to use a hierarchical structure that is needed for morphological phenomena like multiple-argument agreement and case stacking, while adding some missing morphological features to make the schema more inclusive. In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages. Lastly, this new release makes a push towards inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet.

@inproceedings{batsuren-etal-2022-unimorph,
    author = {
        Khuyagbaatar Batsuren and
        Omer Goldman and
        Salam Khalifa and
        Nizar Habash and
        Witold Kieraś and
        Gábor Bella and
        Brian Leonard and
        Garrett Nicolai and
        Kyle Gorman and
        Yustinus Ghanggo Ate and
        Maria Ryskina and
        Sabrina Mielke and
        Elena Budianskaya and
        Charbel El-Khaissi and
        Tiago Pimentel and
        Michael Gasser and
        William Abbott Lane and
        Mohit Raj and
        Matt Coler and
        Jaime Rafael Montoya Samame and
        Delio Siticonatzi Camaiteri and
        Esaú Zumaeta Rojas and
        Didier López Francis and
        Arturo Oncevay and
        Juan López Bautista and
        Gema Celeste Silva Villegas and
        Lucas Torroba Hennigen and
        Adam Ek and
        David Guriel and
        Peter Dirix and
        Jean-Philippe Bernardy and
        Andrey Scherbakov and
        Aziyana Bayyr-ool and
        Antonios Anastasopoulos and
        Roberto Zariquiey and
        Karina Sheifer and
        Sofya Ganieva and
        Hilaria Cruz and
        Ritván Karahóǧa and
        Stella Markantonatou and
        George Pavlidis and
        Matvey Plugaryov and
        Elena Klyachko and
        Ali Salehi and
        Candy Angulo and
        Jatayu Baxi and
        Andrew Krizhanovsky and
        Natalia Krizhanovskaya and
        Elizabeth Salesky and
        Clara Vania and
        Sardana Ivanova and
        Jennifer White and
        Rowan Hall Maudslay and
        Josef Valvoda and
        Ran Zmigrod and
        Paula Czarnowska and
        Irene Nikkarinen and
        Aelita Salchak and
        Brijesh Bhatt and
        Christopher Straughn and
        Zoey Liu and
        Jonathan North Washington and
        Yuval Pinter and
        Duygu Ataman and
        Marcin Wolinski and
        Totok Suhardijanto and
        Anna Yablonskaya and
        Niklas Stoehr and
        Hossep Dolatian and
        Zahroh Nuriah and
        Shyam Ratan and
        Francis M. Tyers and
        Edoardo M. Ponti and
        Grant Aiton and
        Aryaman Arora and
        Richard J. Hatcher and
        Ritesh Kumar and
        Jeremiah Young and
        Daria Rodionova and
        Anastasia Yemelina and
        Taras Andrushko and
        Igor Marchenko and
        Polina Mashkovtseva and
        Alexandra Serova and
        Emily Prud’hommeaux and
        Maria Nepomniashchaya and
        Fausto Giunchiglia and
        Eleanor Chodroff and
        Mans Hulden and
        Miikka Silfverberg and
        Arya D. McCarthy and
        David Yarowsky and
        Ryan Cotterell and
        Reut Tsarfaty and
        Ekaterina Vylomova
    },
    booktitle = {Proceedings of the Thirteenth Language Resources and Evaluation Conference},
    title = {UniMorph 4.0: Universal Morphology},
    year = {2022},
    url = {https://aclanthology.org/2022.lrec-1.89/},
    pages = {840--855},
}