Tweeting te reo Māori

Sarah Putt, Contributor. 14 February 2019, 8:06 am

While most of us have been enjoying the long hot summer, researchers at Waikato University have been analysing millions of tweets to determine how te reo Māori is being used on the social media site.

Computing and Mathematical Sciences student David Trye and his supervisors, Dr Andreea Calude and Dr Felipe Bravo Márquez, are assessing 1.2 million tweets. That seems like a huge number until you learn that they started with an original group of 8 million, before weeding out the mentions that didn't apply. The team are focused on 77 Māori loanwords, which are described as te reo Māori words used in an English context. Some popular examples are 'whanau' and 'kia ora'.

"The initial 8 million tweets contained a fair bit of distracting data 'noise'," says Trye. "The irrelevant tweets were those not used in a New Zealand English context, or were otherwise unrelated. For example, Kiwi is the name of a song by Harry Styles, so people will tweet things like 'listening to Kiwi'. And Moana can turn up as a Disney princess rather than the sea."

First they manually coded about four thousand tweets, then they trained a machine learning model to weed out the irrelevant ones. The next step was to use a machine learning technique invented by Google called Word2Vec to automatically extract the meaning of words according to their context, Márquez says.

"The technique was invented a few years ago. But it was a huge revolution in the area of computational linguistics, or the use of computers to process human language. It was the first algorithm to do it in an efficient way."

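For readers curious about what the filtering step might look like, here is a minimal sketch of a relevance classifier trained on hand-labelled tweets. It is not the researchers' code: the article does not say which model they used, so a simple Naive Bayes classifier stands in for it, and the example tweets, labels and function names are all invented for illustration.

```python
import math
from collections import Counter

# Hypothetical hand-labelled tweets, invented for this sketch.
# True = relevant (New Zealand English context), False = irrelevant.
LABELLED = [
    ("kia ora whanau heading to the marae today", True),
    ("saw a kiwi in the bush near the moana", True),
    ("aroha to all my whanau in aotearoa", True),
    ("listening to kiwi by harry styles on repeat", False),
    ("moana is my favourite disney princess movie", False),
    ("new harry styles song kiwi is so good", False),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {True: Counter(), False: Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, label_counts, vocab


def is_relevant(text, model):
    """Naive Bayes with add-one smoothing; unseen words are ignored."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in (True, False):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return scores[True] > scores[False]


model = train(LABELLED)
print(is_relevant("kia ora whanau", model))
print(is_relevant("harry styles kiwi song", model))
```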
With the technical mahi done, now comes the exciting part - interrogating the data set. Top of the list of questions is finding out who is tweeting Māori words and why - is it fluent te reo speakers, is it people learning te reo and/or is it that these 77 'loanwords' have become a core part of New Zealand English?

What's especially interesting about the study is that it is looking at how language is used in context, and this may be different to a face-to-face conversation or other written forums.

As Calude explains: "In a dictionary you tend to have what the word means, abstractly out of context, or with a synonym or two. But here you have more of a network of related words, which may not have the same meaning but seem to occur in the same contexts. With whakapapa you have obvious words but also words like maunga, so it's not the meaning of the word, but it's like the baggage of the word that comes with it."
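
The idea of words being related because they turn up in the same contexts can be sketched in a few lines. The example below is not Word2Vec, which learns dense vectors with a neural network. It uses a much simpler count-based stand-in: each word is represented by counts of its neighbouring words, and two words are compared with cosine similarity. The tweets, the window size and the function names are assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical tweets, invented for this sketch.
TWEETS = [
    "my whakapapa ties me to this maunga and this awa",
    "learning my whakapapa from my iwi and my maunga",
    "our iwi gathered by the awa near the maunga",
    "listening to music on the bus to work",
    "the bus to work was late again today",
]


def context_vectors(tweets, window=2):
    """Map each word to counts of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for tweet in tweets:
        tokens = tweet.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def neighbours(word, vectors, top=3):
    """Return the `top` words whose contexts look most like those of `word`."""
    target = vectors.get(word, Counter())
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


vectors = context_vectors(TWEETS)
print(neighbours("whakapapa", vectors))
```

On this toy data, whakapapa shares some context with maunga and none with bus. With only five invented tweets, common words such as "my" drive much of the overlap, so this shows the mechanism and not a finding.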

Calude has previously worked on research looking at Māori loanwords in newspapers, and she has already noticed a difference with Twitter use. "By comparison the words are more integrated. There's more language mixing: full sections of Māori and full phrases in English together. So, it's more like code switching, which is what bilinguals do. We might be looking at almost a different phenomenon of how Māori is used in Twitter."

The researchers are sharing their knowledge on the open-source platform GitHub. They're adding to the website as they go along, so it is a growing resource. You can check it out here.


