In this blog post we’re going to look at an example of the data mining and large-scale analysis we do at musicmetric: detecting patterns and similarities in time series data.
One use of this analysis is that, given an artist, we can find another artist with the closest trend in some variable over time, for example MySpace plays per hour. Alternatively, we could generate a list of artists who are increasing in popularity in a certain way, or show which artists have had a brief surge in activity, perhaps caused by an album release or gig.
Because we store all the data indefinitely and in such a way that we can access it very rapidly, we can run regular batch analysis on the contents of our data warehouse to unlock interesting information.
In this example, we will compare the play count time series data for the top 20,000 artists by total plays on MySpace. Some trends may follow each other with a time lag, so we compare the 20,000 time series at multiple time lags from 0 to 30 days in the past, in 1-day increments. This means our analysis servers must do approximately 6 billion time series comparisons for this particular problem (roughly 200 million artist pairs at each of 31 lags), each one comparing hourly-resolution data over a period of 4 months.
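To make the lagged comparison concrete, here is a minimal Python sketch. The post does not say which similarity measure we use, so the Pearson correlation below is an assumption chosen for illustration, and the names `pearson`, `best_lag` and `comparison_count` are hypothetical. The last function shows one reading of the arithmetic that reproduces the "approximately 6 billion" figure: every unordered pair of artists, compared at each of the 31 daily lags.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lag(a, b, max_lag_days=30, samples_per_day=24):
    """Return (lag_days, correlation) for the lag at which b, shifted
    into the past, correlates most strongly with a."""
    best = (0, -1.0)
    for lag_days in range(max_lag_days + 1):
        shift = lag_days * samples_per_day
        # Pair a[t] with b[t - shift]: drop the first `shift` samples
        # of a and the last `shift` samples of b.
        x = a[shift:]
        y = b[:len(b) - shift]
        if len(x) < 2:
            break  # not enough overlap left to compare
        r = pearson(x, y)
        if r > best[1]:
            best = (lag_days, r)
    return best


def comparison_count(n_series=20_000, n_lags=31):
    """Unordered pairs of series, each compared at every lag."""
    pairs = n_series * (n_series - 1) // 2
    return pairs * n_lags
```

With hourly data, `best_lag(a, b)` tries lags of 0, 24, 48, ... hours, and `comparison_count()` gives 6,199,690,000 for 20,000 series and 31 lags.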
Let’s take a look at which artist has a similar trend to Kings of Leon:

Kings of Leon and The Fray - MySpace Plays Per Hour
We can see the plays per hour for The Fray seem to be following a similar long-term trend to that of Kings of Leon, but offset by the difference in their popularity on MySpace, although they are converging as time goes on. The peaks and troughs also line up, which suggests that the fine-resolution hourly variation in the data has something to do with the overall use of MySpace at any period in time, not just the popularity of the artist. This is something that can be seen over most MySpace data.
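One way to see how much of the hourly variation comes from overall site usage is to express each artist's plays as a share of all plays in the same hour. The sketch below is only an illustration of that idea, not a description of our pipeline: `share_of_site` is a hypothetical helper, and `site_plays` (total plays across the site per hour) is an assumed input that this post does not otherwise discuss.

```python
def share_of_site(artist_plays, site_plays):
    """Express each hourly play count as a fraction of site-wide plays
    in that hour (0.0 where the site-wide count is zero)."""
    return [a / s if s else 0.0 for a, s in zip(artist_plays, site_plays)]


# Toy data: site usage alternates between quiet and busy hours, and two
# artists each hold a fixed share of it.
site_plays = [1000, 2000] * 12
artist_a = [s // 10 for s in site_plays]  # always 10% of site plays
artist_b = [s // 20 for s in site_plays]  # always 5% of site plays

share_a = share_of_site(artist_a, site_plays)
share_b = share_of_site(artist_b, site_plays)
```

In this toy data the raw series for the two artists rise and fall together, yet both shares are constant, so all of the apparent similarity comes from the site-wide cycle rather than from the artists themselves.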
Now let’s look at two artists who have even more similar plays per hour to each other:

Dido and The Clash - MySpace Plays Per Hour
The Clash and Dido show very high similarity for plays per hour on MySpace over the time frame shown in the chart above. A lot of this will have to do with the overall use of MySpace at any period of time, and the fact that the two artists have not had a lot of activity during that period to make their play counts diverge from each other.
Finally, we’ll search for artists that show similar short-term peaks to one another. In this case Muse was flagged as a high match for 50 Cent in September 2009, as is clear in the chart below:

Muse and 50 Cent - MySpace Plays Per Hour
If we look at their discographies, we discover that both Muse and 50 Cent made a release on the same day in September.
We’ll investigate the different reasons why two artists might have similar trends to each other in another blog post, so check back soon!