Hybrid Hashtag Sub-Corpus

The Hybrid Hashtag Sub-Corpus (HH Corpus) is a subset of tweets in the MLT corpus that contain hashtags made up of both Māori and English words (so-called "hybrid hashtags"). The dataset covers 81 hybrid hashtags, used in 5,684 tweets posted by 3,771 distinct users.
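As a rough way to check these figures after downloading, the following Python sketch counts distinct hashtags, tweets and users. The file layout is not described on this page, so the CSV format and the column names `tweet_id`, `user_id` and `hashtag` are assumptions that will likely need adjusting to match the actual file. The sample rows are made up for illustration and are not taken from the corpus.

```python
import csv
import io


def summarise(rows):
    """Count distinct hashtags, tweets and users in an iterable of row dicts.

    Assumes one row per (tweet, hashtag) pair with hypothetical column
    names 'tweet_id', 'user_id' and 'hashtag'.
    """
    hashtags, tweets, users = set(), set(), set()
    for row in rows:
        # Twitter treats hashtags case-insensitively, so fold case here.
        # Whether the published counts were computed this way is an assumption.
        hashtags.add(row["hashtag"].casefold())
        tweets.add(row["tweet_id"])
        users.add(row["user_id"])
    return {"hashtags": len(hashtags), "tweets": len(tweets), "users": len(users)}


# Made-up sample standing in for the downloaded file.
sample = io.StringIO(
    "tweet_id,user_id,hashtag\n"
    "1,u1,#TagOne\n"
    "2,u1,#tagone\n"
    "3,u2,#TagTwo\n"
)

counts = summarise(csv.DictReader(sample))
print(counts)
```

To run this against the real download, replace `sample` with a file handle opened with `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the file was saved. If the assumptions about the layout hold, the result should be close to the 81 hashtags, 5,684 tweets and 3,771 users reported above.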

Download the HH Corpus

Click to download the HH Sub-Corpus.

Citing the HH Corpus

If you use the HH Corpus, please cite the following paper:

Trye, D., Calude, A. S., Bravo-Marquez, F., & Keegan, T. T. (2020). Hybrid Hashtags: #YouKnowYoureAKiwiWhen Your Tweet Contains Māori and English. Front. Artif. Intell. 3:15. doi:10.3389/frai.2020.00015