Twitter Filtering
In this blog we’re going to show you an important feature that helps distinguish the quality of data supplied by musicmetric: The ability to disambiguate whether mentions of an artist with a common word as their name are in fact referring to the artist. Likewise, distinguishing between two artists that have the same name.
These methods are applicable to any text based data, but for this example we’ll take a look at Twitter.
Musicmetric collects all mentions of an artist on Twitter. Taking an example of the rock band Oasis, we collects tweets in the following 3 categories:
- name mentions: “Oasis”
- replies: “@Oasis”
- retweets: “RT @Oasis”
If the artist does not have a twitter ID, we still track their name mentions – and we are currently tracking over 500,000 artists.
It is obvious that all replies and retweets are definitely relevant to the band but some name mentions are probably not. When people post a tweet which includes the word “Oasis”, they might mean Oasis rock band, an isolated area of vegetation and water in a desert or just a name of a random bar or restaurant. Therefore it would be naive to collect tweets without filtering them because this trend data would not reflect the real popularity of the band Oasis on Twitter.
These name mentions are important since a lot of the time people will not cite the @username of the artist when referring to them on twitter (as can be seen in the examples below) and of course, not all bands even have a twitter ID.
At musicmetric, we have developed proprietary algorithms to deal with irrelevant tweets effectively. We analyse all tweets and successfully filter out irrelevant messages by assigning a probability that the tweet is relevant to that particular artist.
The table below shows a good example of our algorithm’s efficiency:

Even though there are still few irrelevant tweets (highlighted red) and some vague tweets which we can not tell whether they are relevant or not (highlighted blue), the accuracy has been improved a lot in comparison to the raw data. Currently for bands or artists who have very common names like Oasis, our model can filter up to 70%-80% of irrelevant tweets. For bands or artists who have distinct names like Lady Gaga or Robbie Williams, the model can filter up to 95%-100% of irrelevant tweets.
The chart below shows the number of tweets mentioning Oasis per hour before and after being filtered. You can see a big difference and that is why the filter is very important.
We are still collecting more data and adding more valuable information to our model. Therefore it is expected to work more and more accurately – it learns as it goes, and it can read 96 Million tweets per day, so it learns very quickly.
Why not check some live stats for your bands by registering for a musicmetric Essentials trial?
Trung
