This Monday, November 4th, we have Matthew Darling presenting his research on “Automatic morphological segmentation of indigenous languages”:
Abstract: Over the summer, I worked with a research group at the University of Alberta developing software that identifies cognates. Cognates are words in related languages that descend from the same word in a shared ancestor language, and knowing which cognates a group of languages shares can tell us how those languages are related. However, suffixes and prefixes add a lot of noise to the results: the English words “underpay” and “undersell” have nothing to do with each other, but share a common prefix. Using a set of indigenous languages from Mexico, I tested a number of unsupervised morphological segmentation programs, that is, programs that try to identify the morphemes of a language using only corpora. I also experimented with mapping an existing gold standard segmentation for one language onto related languages in order to segment their words.
We will meet at the same time and place as usual, i.e., at 2:30pm in VSSM5220. We look forward to seeing everyone there, with game faces on!