<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Musicmetric - professional music analytics &#187; Labs</title>
	<atom:link href="http://www.musicmetric.com/category/labs/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.musicmetric.com</link>
	<description>Musicmetric tracks what is happening to music online. We do this by data mining the web: we crawl and analyse tens of thousands of pages per day and monitor thousands of live data sources and p2p networks to deliver a fully featured music analytics platform.</description>
	<lastBuildDate>Thu, 29 Jul 2010 14:09:55 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Musicmetric&#8217;s Sentiment Analysis v1.0 Beta</title>
		<link>http://www.musicmetric.com/2010/01/musicmetrics-sentiment-analysis-v1-0-beta/</link>
		<comments>http://www.musicmetric.com/2010/01/musicmetrics-sentiment-analysis-v1-0-beta/#comments</comments>
		<pubDate>Sun, 31 Jan 2010 01:31:22 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Labs]]></category>
		<category><![CDATA[regular]]></category>
		<category><![CDATA[app]]></category>
		<category><![CDATA[language processing]]></category>
		<category><![CDATA[sentiment analysis]]></category>
		<category><![CDATA[text mining]]></category>

		<guid isPermaLink="false">http://www.musicmetric.com/?p=269281820</guid>
		<description><![CDATA[Today we are introducing another piece of technology we have developed at Musicmetric. As you may know, parts of our product are driven by semantic analysis; [...]]]></description>
			<content:encoded><![CDATA[<p>Today we are introducing another piece of technology we have developed at Musicmetric. As you may know, parts of our product are driven by semantic analysis: we don’t just tell you how many people are talking about your artists, but also what they think of them, the overall sentiment and the common topics surrounding them. How do we do this? Sentiment analysis is a challenging problem that has not yet been fully solved. Many so-called sentiment analysis systems use naive methods to detect sentiment, such as keyword matching or very basic sentence decomposition. However, human language is not that simple, and these approaches fail to capture irony, sarcasm, slang and other idiomatic expressions.</p>
<p>Our methods are <strong>much more advanced</strong> than simple word detection. We have implemented a set of machine learning models that can be trained on different corpora (contexts), so they work well for general language but are much more accurate within predefined contexts. For example, professionally written articles, fan comments and tweets are all different contexts, so we train a separate sentiment analysis model for each one. This approach also lets our models <strong>get more and more intelligent</strong> over time, because we keep collecting new data and retrain them frequently. The accuracy of our method is shown in the confusion matrices below:</p>
<div id="attachment_269281807" class="wp-caption aligncenter" style="width: 501px"><a href="http://www.musicmetric.com/products/269281169-revision-110/" rel="attachment wp-att-269281807"><img src="http://wptest.musicmetric.com/wp-content/uploads/2010/01/polarityConfusionMatrix.jpg" alt="Musicmetric polarity confusion matrix" title="Musicmetric polarity confusion matrix" width="491" height="116" class="size-full wp-image-269281807" /></a><p class="wp-caption-text">Musicmetric polarity confusion matrix</p></div>
<p><br/>So what does this matrix mean? A confusion matrix shows how often, in percent, the system confuses two classes (i.e. mislabels one as the other). Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class (cells highlighted in yellow are correct predictions). For example, 16% of neutral reviews are predicted as negative, but only 2% of negative reviews are predicted as neutral, and likewise 2% as positive. Notably, <strong>96% of negative reviews are predicted correctly</strong>.</p>
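<p>To make the reading rule concrete, here is a minimal JavaScript sketch of how a row-normalised confusion matrix can be computed from labelled test data. The function name and the input format (two parallel arrays of labels plus a list of class names) are illustrative assumptions, not part of our system.</p>

```javascript
// Illustrative sketch (hypothetical helper): rows are actual classes,
// columns are predicted classes, and each row is scaled to percentages
// so that it sums to 100. The diagonal holds the correct predictions.
function confusionMatrixPercent(actual, predicted, classes) {
  const index = new Map(classes.map((c, i) => [c, i]));
  // counts[i][j] = number of items of actual class i predicted as class j
  const counts = classes.map(() => classes.map(() => 0));
  for (let k = 0; k < actual.length; k++) {
    const i = index.get(actual[k]);
    const j = index.get(predicted[k]);
    if (i === undefined || j === undefined) {
      throw new Error("Unknown label at position " + k);
    }
    counts[i][j] += 1;
  }
  // Normalise each row by the number of items in that actual class.
  return counts.map((row) => {
    const total = row.reduce((sum, n) => sum + n, 0);
    return row.map((n) => (total === 0 ? 0 : (100 * n) / total));
  });
}

// Usage with toy labels:
const classes = ["negative", "neutral", "positive"];
const matrix = confusionMatrixPercent(
  ["negative", "negative", "neutral", "neutral", "positive"],
  ["negative", "neutral", "negative", "neutral", "positive"],
  classes
);
// matrix is [[50, 50, 0], [50, 50, 0], [0, 0, 100]]
```

<p>Normalising by row answers the question "of all the reviews that really were negative, what share went where?", which is how the percentages in the matrices above are read.</p>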
<p><br/>The second confusion matrix breaks the results down by score from 1 to 5 (3 is neutral):<br />
<div id="attachment_269281792" class="wp-caption center" style="width: 389px"><a href="http://www.musicmetric.com/2010/01/lady-gaga-vs-susan-boyle/269281761-revision-26/" rel="attachment wp-att-269281792"><img src="http://wptest.musicmetric.com/wp-content/uploads/2010/01/5starConfusionMatrix.jpg" alt="Confusion matrix for score from 1 to 5" title="5starConfusionMatrix" width="379" height="172" class="size-full wp-image-269281792" /></a><p class="wp-caption-text">Confusion matrix for score from 1 to 5</p></div></p>
<p>Similarly, we can see that 90% of reviews with a score of 2 are predicted correctly, while 5% of them are predicted as 3 and none are predicted as 5. The table below shows the number of reviews we evaluated:<br />
<div id="attachment_269281906" class="wp-caption aligncenter" style="width: 497px"><a href="http://www.musicmetric.com/2010/03/nme-awards-2010/269281839-revision-56/" rel="attachment wp-att-269281906"><img src="http://wptest.musicmetric.com/wp-content/uploads/2010/01/numberOfSamples.jpg" alt="Number of evaluated samples" title="Number of evaluated samples" width="487" height="172" class="size-full wp-image-269281906" /></a><p class="wp-caption-text">Number of evaluated samples</p></div></p>
<p><br/>The third confusion matrix breaks the results down by score from 1 to 10 (<em>Note: this test set is different from the ones above and does not include any reviews with scores below 5</em>):<br />
<div id="attachment_269281802" class="wp-caption aligncenter" style="width: 510px"><a href="http://www.musicmetric.com/2010/01/musicmetrics-sentiment-analysis-v1-0-beta/susan_boyle_social_networks/" rel="attachment wp-att-269281802"><img src="http://wptest.musicmetric.com/wp-content/uploads/2010/01/10StarConfusionMatrix1.jpg" alt="Musicmetric sentiment confusion matrix for score from 1-10" title="Musicmetric sentiment confusion matrix for score from 1-10" width="500" height="227" class="size-full wp-image-269281802" /></a><p class="wp-caption-text">Musicmetric sentiment confusion matrix for score from 1-10</p></div></p>
<p><br/>That is enough talking from us. Now it is your turn: see how it works by trying out the interface to our general-purpose music sentiment analysis engine below:</p>
<p><em>Note: our sentiment analysis usually works better on longer reviews or paragraphs than on single short sentences, and it works best on music-related topics. Try pasting in an album review.</em><br />
<br/></p>
<div style="padding: 15px 10px 15px 15px; border: 2px solid #DDDDDD; background-color: #F0F0F0">
<form id="reviewForm" style="text-align: left;">
<p><label for="review" style="font-size: 18px;">Test our sentiment analysis. Please write an opinion or a review below:</label></p>
<p>   <textarea id="review" name="review" style="width: 465px;" rows="8" ></textarea><br />
   <label style="font-size: 10px; line-height: 1.5em;">After you click &#8220;submit&#8221;, our robot will analyse your text and return a sentiment index from 1 to 5 according to how positive or negative it is (3 is neutral).</label></p>
<div id="recaptcha_div"></div>
<input type="submit" style="font-size: 18px; padding: 5px 5px;" id="submit" value="submit"/>
</form>
<div style="margin-top: 10px;">
<p><label for="sentimentIndex" style="font-size: 18px;">Sentiment index shown below:</label></p>
<div id="sentimentIndex"></div>
</div>
</div>
<script type="text/javascript" src="http://api.recaptcha.net/js/recaptcha_ajax.js"></script>
<script type="text/javascript">
jQuery(document).ready(function() {
  // Render the reCAPTCHA widget into the placeholder div inside the form.
  Recaptcha.create("6Le51woAAAAAAI-mHl6qE6dCp7IEXpqjw-xH1Y9b", "recaptcha_div", {theme: "red"});
  jQuery("#submit").click(function() {
    var result = jQuery("#sentimentIndex");
    // Clear any star image from a previous request and show progress text.
    result.attr("style", "");
    result.html("analysing...");
    // Send the review text and the captcha fields to the server-side analyser.
    jQuery.post("/wp-content/themes/default/includes/SA.php",
      jQuery("#reviewForm").serialize(),
      function(data) {
        var rate;
        if (data === "invalid_captcha") {
          result.html("<span style='color: red'>Invalid captcha!</span>");
        } else if (data === "no_review") {
          result.html("<span style='color: red'>You must type in a review or an opinion for sentiment analysis!</span>");
        } else if (data === "server_error") {
          result.html("<span style='color: red'>Can't connect to the sentiment analysis server!</span>");
        } else {
          // Any other response should be the sentiment index, an integer from 1 to 5.
          rate = parseInt(data, 10);
          if (isNaN(rate) || rate < 1 || rate > 5) {
            result.html("<span style='color: red'>Unexpected response from the sentiment analysis server!</span>");
          } else {
            // Shift the star sprite vertically to show the row for this index.
            result.html("");
            result.attr("style", "background: url(/wp-content/uploads/2010/01/stars_map.png) 0 " +
              (-19 * (rate - 1) * 2 - 19) + "px no-repeat; height: 17px; width: 83px; display: block");
          }
        }
        // Load a fresh challenge so the next submission can be verified.
        Recaptcha.reload();
      },
      "text");
    // Stop the browser from submitting the form itself.
    return false;
  });
});
</script>
]]></content:encoded>
			<wfw:commentRss>http://www.musicmetric.com/2010/01/musicmetrics-sentiment-analysis-v1-0-beta/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analysing trends over time with musicmetric</title>
		<link>http://www.musicmetric.com/2009/12/analysing-trends-over-time-with-musicmetric/</link>
		<comments>http://www.musicmetric.com/2009/12/analysing-trends-over-time-with-musicmetric/#comments</comments>
		<pubDate>Sun, 13 Dec 2009 15:13:49 +0000</pubDate>
		<dc:creator>greg</dc:creator>
				<category><![CDATA[Labs]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[music analytics]]></category>
		<category><![CDATA[time series]]></category>
		<category><![CDATA[trend analysis]]></category>

		<guid isPermaLink="false">http://wptest.musicmetric.com/?p=269281521</guid>
		<description><![CDATA[In this blog post we look at an example of the data mining and large-scale analysis we do at musicmetric: detecting patterns and similarities in time series data.]]></description>
			<content:encoded><![CDATA[<p>In this blog post we look at an example of the data mining and large-scale analysis we do at musicmetric: detecting patterns and similarities in time series data.</p>
<p>One use of this analysis is finding, for a given artist, the artist whose trend over time in some variable, for example MySpace plays per hour, is closest to theirs. We could also generate a list of artists whose popularity is increasing in a particular way, or show which artists have had a brief surge in activity, perhaps caused by an album release or a gig.</p>
<p>Because we store all the data indefinitely and in such a way that we can access it very rapidly, we can run regular batch analysis on the contents of our data warehouse to unlock interesting information.</p>
<p>In this example, we compare the play count time series of the top 20,000 artists by total plays on MySpace. Some trends may follow others with a time lag, so we compare the 20,000 time series at multiple lags, from 0 to 30 days in the past in 1-day increments. With roughly 200 million artist pairs (20,000 × 19,999 / 2) and 31 lags per pair, our analysis servers must perform approximately 6 billion time series comparisons for this problem, each one comparing hourly data over a period of 4 months.</p>
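<p>The lagged comparison described above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the implementation behind this post: the similarity measure (Pearson correlation), the function names and the convention that lag d pairs hour t of the first artist with hour t + 24d of the second are all assumptions, and the series are taken to be equal-length arrays of hourly play counts.</p>

```javascript
// Illustrative sketch: Pearson correlation as the similarity measure is an
// assumption; the post does not say which measure is used.
const HOURS_PER_DAY = 24;

// Pearson correlation of two equal-length arrays (NaN if either is constant).
function pearson(x, y) {
  const n = x.length;
  let meanX = 0;
  let meanY = 0;
  for (let i = 0; i < n; i++) {
    meanX += x[i];
    meanY += y[i];
  }
  meanX /= n;
  meanY /= n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / Math.sqrt(varX * varY);
}

// Compare hourly series a and b at lags of 0..maxLagDays whole days.
// At lag d, a[t] is paired with b[t + 24d]: "does b repeat a's pattern d days later?"
function bestLaggedCorrelation(a, b, maxLagDays) {
  let best = { lagDays: 0, r: -Infinity };
  for (let d = 0; d <= maxLagDays; d++) {
    const shift = d * HOURS_PER_DAY;
    if (shift >= a.length - 1) break; // too little overlap left to correlate
    const r = pearson(a.slice(0, a.length - shift), b.slice(shift));
    if (r > best.r) best = { lagDays: d, r: r };
  }
  return best;
}

// Find the artist whose series best matches the target's at any tested lag.
// seriesByArtist is a Map from artist name to an hourly play-count array.
function mostSimilarArtist(target, seriesByArtist, maxLagDays) {
  const a = seriesByArtist.get(target);
  let best = null;
  for (const [name, b] of seriesByArtist) {
    if (name === target) continue;
    const match = bestLaggedCorrelation(a, b, maxLagDays);
    if (best === null || match.r > best.r) {
      best = { artist: name, lagDays: match.lagDays, r: match.r };
    }
  }
  return best;
}

// Unordered artist pairs times lags: 20,000 artists and 31 lags give about 6.2 billion.
function comparisonCount(artists, lags) {
  return (artists * (artists - 1) / 2) * lags;
}
```

<p>Because Pearson correlation ignores differences in scale and offset, a comparison like this can match two artists whose curves have the same shape even when one is far more popular than the other.</p>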
<p>Let’s take a look at which artist has a similar trend to Kings of Leon:</p>
<div class="wp-caption alignnone" style="width: 571px"><a href="/wp-content/uploads/2009/12/kol_thefray3.jpg" rel="lightbox[269281521]"><img title="Kings of Leon and The Fray - Plays per Hour" src="/wp-content/uploads/2009/12/kol_thefray3.jpg" alt="Kings of Leon and The Fray - MySpace Plays Per Hour" width="561" height="421" /></a><p class="wp-caption-text">Kings of Leon and The Fray - MySpace Plays Per Hour</p></div>
<p>The plays per hour for The Fray follow a similar long-term trend to those of Kings of Leon, offset by the difference in the two bands&#8217; popularity on MySpace, although the gap narrows over time. The peaks and troughs also line up, which indicates that the fine-grained hourly variation reflects overall MySpace usage at any given time, not just the popularity of the artist. The same pattern can be seen in most MySpace data.</p>
<p>Now let’s look at two artists who have even more similar plays per hour to each other:</p>
<div class="wp-caption alignnone" style="width: 570px"><a href="/wp-content/uploads/2009/12/dido_theclash.jpg" rel="lightbox[269281521]"><img title="Dido and The Clash - MySpace Plays Per Hour" src="/wp-content/uploads/2009/12/dido_theclash.jpg" alt="Dido and The Clash - MySpace Plays Per Hour" width="560" height="420" /></a><p class="wp-caption-text">Dido and The Clash - MySpace Plays Per Hour</p></div>
<p>The Clash and Dido show very high similarity in plays per hour on MySpace over the period shown in the chart above. Much of this is down to overall MySpace usage at any given time, and to the fact that neither artist had much activity during that period that would make their play counts diverge.</p>
<p>Finally, we&#8217;ll search for artists that show similar short-term peaks to one another. In this case, Muse was flagged as a strong match for 50 Cent in September 2009, as the chart below shows:</p>
<div class="wp-caption alignnone" style="width: 570px"><a href="/wp-content/uploads/2009/12/50cent_muse.jpg" rel="lightbox[269281521]"><img title="Muse and 50 Cent - MySpace Plays Per Hour" src="/wp-content/uploads/2009/12/50cent_muse.jpg" alt="Muse and 50 Cent - MySpace Plays Per Hour" width="560" height="420" /></a><p class="wp-caption-text">Muse and 50 Cent - MySpace Plays Per Hour</p></div>
<p>Looking at their discographies, we find that Muse and 50 Cent both had a release on the same day in September.</p>
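<p>One simple way to flag short-term peaks like these is a trailing z-score: an hour counts as a surge when it sits far above the recent average. The sketch below is an illustrative assumption rather than the detector used for this chart; the window length, the threshold <code>k</code> and the function names are hypothetical. It flags surge hours for each artist and then lists the days on which both artists surged.</p>

```javascript
// Illustrative sketch: the trailing-window z-score detector, the window
// length and the threshold k are assumptions, not the method used in the post.

// Hours t at which series[t] exceeds the mean of the previous `window` hours
// by more than k standard deviations.
function findSurges(series, window, k) {
  const surges = [];
  for (let t = window; t < series.length; t++) {
    const past = series.slice(t - window, t);
    const mean = past.reduce((sum, v) => sum + v, 0) / window;
    const variance = past.reduce((sum, v) => sum + (v - mean) * (v - mean), 0) / window;
    if (series[t] > mean + k * Math.sqrt(variance)) surges.push(t);
  }
  return surges;
}

// Days (hour index divided by 24) on which both artists surged.
function sharedSurgeDays(seriesA, seriesB, window, k) {
  const toDays = (hours) => new Set(hours.map((h) => Math.floor(h / 24)));
  const daysA = toDays(findSurges(seriesA, window, k));
  const daysB = toDays(findSurges(seriesB, window, k));
  return [...daysA].filter((day) => daysB.has(day)).sort((x, y) => x - y);
}
```

<p>A shared surge day is only a hint; as with Muse and 50 Cent, checking release dates or gig listings is what explains the match.</p>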
<p>We’ll investigate the different reasons why two artists might have similar trends to each other in another blog post, so check back soon!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.musicmetric.com/2009/12/analysing-trends-over-time-with-musicmetric/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Twitter Filtering</title>
		<link>http://www.musicmetric.com/2009/12/twitter-filtering/</link>
		<comments>http://www.musicmetric.com/2009/12/twitter-filtering/#comments</comments>
		<pubDate>Fri, 04 Dec 2009 18:38:00 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Labs]]></category>
		<category><![CDATA[regular]]></category>
		<category><![CDATA[music analytics]]></category>
		<category><![CDATA[musicmetric]]></category>
		<category><![CDATA[twitter analysis]]></category>
		<category><![CDATA[twitter analytics]]></category>

		<guid isPermaLink="false">http://musicmetric.tumblr.com/post/269281166</guid>
		<description><![CDATA[In this post we show you an important feature that sets the quality of musicmetric’s data apart: the ability to determine whether mentions of an artist [...]]]></description>
			<content:encoded><![CDATA[<p>In this post we show you an important feature that sets the quality of musicmetric’s data apart: the ability to determine whether mentions of an artist whose name is a common word actually refer to that artist, and likewise to distinguish between two artists who share the same name.</p>
<p>These methods apply to any text-based data, but for this example we’ll look at Twitter.</p>
<p>Musicmetric collects all mentions of an artist on Twitter. Taking the rock band Oasis as an example, we collect tweets in the following three categories:</p>
<ul class="indent_ul">
<li class="indent_ul"><em>name mentions</em>: “Oasis”</li>
<li class="indent_ul"><em>replies</em>: “@Oasis”</li>
<li class="indent_ul"><em>retweets</em>: “RT @Oasis”</li>
</ul>
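<p>As a rough illustration, here is how a tweet might be sorted into these three categories. The matching rules (case-insensitive, whole-word name matching, and treating &#8220;RT @handle&#8221; as a retweet and any other &#8220;@handle&#8221; as a reply) are assumptions made for this sketch, and the function names are hypothetical.</p>

```javascript
// Illustrative sketch: the matching rules below are assumptions.

// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns "retweet", "reply", "name mention" or null (no match).
// handle may be null for artists without a Twitter ID.
function categoriseTweet(tweet, artistName, handle) {
  if (handle) {
    const h = escapeRegExp(handle);
    // Check retweets first, because "RT @Oasis" also contains "@Oasis".
    if (new RegExp("\\bRT @" + h + "\\b", "i").test(tweet)) return "retweet";
    if (new RegExp("@" + h + "\\b", "i").test(tweet)) return "reply";
  }
  // Whole-word match on the name, so "Oasis" matches but "Oasisland" does not.
  if (new RegExp("\\b" + escapeRegExp(artistName) + "\\b", "i").test(tweet)) {
    return "name mention";
  }
  return null;
}
```

<p>Note that the whole-word test relies on the name starting and ending with a letter or digit; names made of punctuation would need a different rule.</p>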
<p>If an artist does not have a Twitter ID, we still track their name mentions; we are currently tracking over 500,000 artists.</p>
<p>All <em>replies</em> and <em>retweets</em> are clearly relevant to the band, but some <em>name mentions</em> are probably not. When people post a tweet that includes the word “Oasis”, they might mean the rock band Oasis, an isolated area of vegetation and water in a desert, or just the name of a bar or restaurant. Collecting these tweets without filtering would therefore be naive, because the resulting trend data would not reflect the band’s real popularity on Twitter.</p>
<p>Name mentions still matter, because people often do not cite the @username of the artist when referring to them on Twitter (as can be seen in the examples below), and of course not all bands even have a Twitter ID.</p>
<p>At musicmetric, we have developed proprietary algorithms to deal with irrelevant tweets effectively. We analyse every tweet and assign it a probability of being relevant to that particular artist, which lets us filter out irrelevant messages.</p>
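<p>Our algorithms are proprietary, so the following is only a stand-in that shows one standard way to turn a tweet into a relevance probability: a multinomial naive Bayes classifier with add-one smoothing, trained on hand-labelled examples. The function names and the training format are assumptions made for this sketch.</p>

```javascript
// Illustrative stand-in only: multinomial naive Bayes, not musicmetric's
// proprietary algorithm.

// Lower-case the text and split it into word tokens (drops "@", "#", punctuation).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// examples: array of { text, relevant } objects; both classes must be present.
function trainRelevanceModel(examples) {
  const model = {
    wordCounts: { yes: new Map(), no: new Map() }, // per-class word frequencies
    wordTotals: { yes: 0, no: 0 },                 // per-class token counts
    docCounts: { yes: 0, no: 0 },                  // per-class tweet counts
    vocabSize: 0,
  };
  const vocab = new Set();
  for (const { text, relevant } of examples) {
    const label = relevant ? "yes" : "no";
    model.docCounts[label] += 1;
    for (const word of tokenize(text)) {
      const counts = model.wordCounts[label];
      counts.set(word, (counts.get(word) || 0) + 1);
      model.wordTotals[label] += 1;
      vocab.add(word);
    }
  }
  model.vocabSize = vocab.size;
  return model;
}

// P(relevant | text), with add-one (Laplace) smoothing for unseen words.
function relevanceProbability(model, text) {
  const totalDocs = model.docCounts.yes + model.docCounts.no;
  const logScore = (label) => {
    let score = Math.log(model.docCounts[label] / totalDocs); // log prior
    for (const word of tokenize(text)) {
      const count = model.wordCounts[label].get(word) || 0;
      score += Math.log((count + 1) / (model.wordTotals[label] + model.vocabSize));
    }
    return score;
  };
  // The logistic function turns the log-odds into a probability in (0, 1).
  const logOdds = logScore("yes") - logScore("no");
  return 1 / (1 + Math.exp(-logOdds));
}
```

<p>Words such as “album” or “gig” push the probability up, while “desert” or “palm” push it down; a tweet can then be kept or discarded by comparing its probability with a threshold.</p>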
<p>The table below shows a good example of our algorithm’s effectiveness:</p>
<p><img src="/wp-content/uploads/2009/12/twitter_filtering_table.png" alt="Filtering tweets about the band &quot;Oasis&quot;" /></p>
<p>Even though a few irrelevant tweets remain (highlighted in red), along with some ambiguous tweets whose relevance we cannot determine (highlighted in blue), the accuracy is much better than that of the raw data. Currently, for bands or artists with very common names like Oasis, our model can filter out up to 70-80% of irrelevant tweets. For bands or artists with distinctive names like Lady Gaga or Robbie Williams, it can filter out up to 95-100% of irrelevant tweets.</p>
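<p>A figure such as &#8220;70-80% of irrelevant tweets filtered&#8221; can be measured on a hand-labelled evaluation set. The sketch below assumes each tweet already has a relevance probability and that tweets below a threshold are removed; the 0.5 default threshold and the function name are illustrative assumptions. It reports both the share of irrelevant tweets removed and the share of relevant tweets lost along the way.</p>

```javascript
// Illustrative sketch: scored is an array of { probability, relevant } objects
// for hand-labelled tweets; tweets with probability below threshold are removed.
function filterRates(scored, threshold = 0.5) {
  let irrelevant = 0;
  let irrelevantRemoved = 0;
  let relevant = 0;
  let relevantRemoved = 0;
  for (const { probability, relevant: isRelevant } of scored) {
    const removed = probability < threshold;
    if (isRelevant) {
      relevant += 1;
      if (removed) relevantRemoved += 1;
    } else {
      irrelevant += 1;
      if (removed) irrelevantRemoved += 1;
    }
  }
  return {
    irrelevantFiltered: irrelevant ? irrelevantRemoved / irrelevant : 0, // share of noise removed
    relevantLost: relevant ? relevantRemoved / relevant : 0,             // share of real mentions lost
  };
}
```

<p>Raising the threshold removes more irrelevant tweets but also discards more relevant ones, so the two rates are worth reading together.</p>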
<p>The chart below shows the number of tweets per hour mentioning Oasis before and after filtering. The large gap between the two curves shows why filtering is so important.</p>
<p><a title="Filtered and unfiltered tweets mentioning &quot;Oasis&quot;" href="http://www.musicmetric.com/images/blog_images/twitter_filtering_graph.jpg" target="_blank" rel="lightbox[269281166]"><img src="http://media.tumblr.com/tumblr_ku54q95FzH1qa4xm1.jpg" alt="Filtered and unfiltered tweets mentioning &quot;Oasis&quot;" /></a></p>
<p>We are still collecting more data and adding more information to our model, so we expect it to become steadily more accurate: it learns as it goes, and because it can read 96 million tweets per day, it learns very quickly.</p>
<p>Why not check some live stats for your bands by registering for a musicmetric Essentials <a title="musicmetric Applications" href="/products/">trial</a>?</p>
<p>Trung</p>
]]></content:encoded>
			<wfw:commentRss>http://www.musicmetric.com/2009/12/twitter-filtering/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
