Hybrid Hashtag Sub-Corpus
The Hybrid Hashtag Sub-Corpus (HH Corpus) is the subset of tweets in the MLT corpus that contain "hybrid hashtags", that is, hashtags made up of both Māori and English words. The dataset covers 81 hybrid hashtags, which appear in 5,684 tweets posted to Twitter by 3,771 distinct users.
Download the HH Corpus
Click to download the HH Corpus.
Citing the HH Corpus
If you use the HH Corpus, please cite the following paper:
Trye, D., Calude, A. S., Bravo-Marquez, F., & Keegan, T. T. (2020). Hybrid Hashtags: #YouKnowYoureAKiwiWhen Your Tweet Contains Māori and English. Front. Artif. Intell. 3:15. doi:10.3389/frai.2020.00015.