Abstract
Music genre classification has been studied using various auditory, linguistic, and metadata features. Classifiers built on linguistic features typically achieve lower accuracy than those built on auditory features. In this paper, we hand-craft features not used in previous lyrical classifiers, such as rhyme density, readability, and the occurrence of profanity. We use these features to train traditional machine learning models for lyrical classification across nine popular music genres and compare their performance. We identify the features that contribute most to this classification problem and the genres that are easiest to predict. The experiments are conducted on a set of over 20,000 lyrics. A final accuracy of 56.14% is achieved when predicting across the nine genres, improving upon accuracies obtained in previous studies.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2021 Curtis Thompson