Musicmetric’s Sentiment Analysis v1.0 Beta

JAN. 31
2010

Today we would like to introduce another piece of technology we have developed at Musicmetric. As you may know, parts of our product are driven by semantic analysis: we don’t just tell you how many people are talking about your artists, but also what their opinions are, and the sentiment and common topics surrounding them. How do we do this? Sentiment analysis is a challenging problem that has still not been completely solved. Many so-called sentiment analysis systems use a very naive method to detect sentiment in a context, such as keyword matching or very basic sentence decomposition. However, human language is not that simple, so these approaches fail to capture irony, sarcasm, slang and other idiomatic expressions.
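To make the limitation concrete, here is a minimal sketch of the kind of naive keyword approach described above. The word lists and the function name keyword_polarity are made up for illustration; this is not our engine, just a toy showing how word counting gets fooled by sarcasm.

```python
import re

# Hypothetical, tiny word lists chosen only for this illustration.
POSITIVE = {"great", "love", "brilliant", "amazing"}
NEGATIVE = {"awful", "hate", "boring", "terrible"}


def keyword_polarity(text):
    """Naive polarity: count positive words minus negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# A sarcastic sentence: a human reads it as negative,
# but the only keyword it contains is "great".
sarcastic = "Oh great, another album that sounds exactly like the last one."
print(keyword_polarity(sarcastic))
```

The sarcastic sentence is labelled positive because the counter only sees the word "great" and has no notion of tone or context.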

Our methods are much more advanced than simple word detection. We have implemented a set of machine learning models that can be trained on different corpora (contexts), so they work well for general language but are also much more accurate for the predefined contexts. For example, professionally written articles, fan comments and tweets are all different contexts, so a separate sentiment analysis model is trained for each one. This approach lets our models keep improving as we collect more data and retrain them frequently. The accuracy of our method is shown in the confusion matrices below:

Musicmetric polarity confusion matrix



So what does this matrix mean? A confusion matrix shows, as percentages, how often the system confuses two classes (i.e. mislabels one as another). Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class (the cells highlighted in yellow are correct predictions). For example, we can see that 16% of neutral reviews are predicted as negative, but only 2% of negative reviews are predicted as neutral or positive. Notably, 96% of negative reviews are predicted correctly.
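For readers who want to build this kind of matrix themselves, here is a minimal sketch in Python. The function name confusion_matrix_percent and the toy labels are made up for illustration (they are not Musicmetric data or code); the layout follows the description above, with rows as actual classes, columns as predicted classes, and each row expressed as percentages.

```python
from collections import Counter


def confusion_matrix_percent(actual, predicted, labels):
    """Row-normalised confusion matrix.

    matrix[a][p] is the percentage of items whose actual class is `a`
    that were predicted as class `p`.
    """
    counts = Counter(zip(actual, predicted))
    matrix = {}
    for a in labels:
        row_total = sum(counts[(a, p)] for p in labels)
        matrix[a] = {
            p: (100.0 * counts[(a, p)] / row_total if row_total else 0.0)
            for p in labels
        }
    return matrix


# Toy, invented labels purely for illustration.
labels = ["negative", "neutral", "positive"]
actual = ["negative"] * 4 + ["neutral"] * 4 + ["positive"] * 2
predicted = [
    "negative", "negative", "negative", "neutral",  # actual: negative
    "negative", "neutral", "neutral", "neutral",    # actual: neutral
    "positive", "positive",                         # actual: positive
]

matrix = confusion_matrix_percent(actual, predicted, labels)
for a in labels:
    print(a, [round(matrix[a][p]) for p in labels])
```

The diagonal entries (actual equals predicted) correspond to the highlighted cells in the figures, and each row sums to 100%.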


The second confusion matrix is a breakdown for scores from 1 to 5 (where 3 is neutral):

Confusion matrix for score from 1 to 5

Similarly, we can see that 90% of reviews with a score of 2 are predicted correctly, while 5% of them are predicted as 3 and none of them are predicted as 5. The table below shows the number of reviews we evaluated:

Number of evaluated samples



The third confusion matrix is a breakdown for scores from 1 to 10 (note: this test data is different from the data used above and does not include any reviews with scores lower than 5):

Musicmetric sentiment confusion matrix for score from 1-10



We think that is enough talking from us. It is now your turn to see how it works by playing around with the interface to our general-purpose music sentiment analysis engine below:

Note: Our sentiment analysis usually works better for longer reviews or paragraphs than for single short sentences, and definitely works better for music-related topics. Try pasting in an album review.


