<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Musicmetric &#187; language processing</title>
	<atom:link href="http://www.musicmetric.com/tag/language-processing/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.musicmetric.com</link>
	<description>Sexy Data</description>
	<lastBuildDate>Thu, 02 Feb 2012 14:21:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Musicmetric&#8217;s Sentiment Analysis  v1.0 Beta</title>
		<link>http://www.musicmetric.com/2010/01/musicmetrics-sentiment-analysis-v1-0-beta/</link>
		<comments>http://www.musicmetric.com/2010/01/musicmetrics-sentiment-analysis-v1-0-beta/#comments</comments>
		<pubDate>Sun, 31 Jan 2010 01:31:22 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Labs]]></category>
		<category><![CDATA[Regular]]></category>
		<category><![CDATA[app]]></category>
		<category><![CDATA[language processing]]></category>
		<category><![CDATA[sentiment analysis]]></category>
		<category><![CDATA[text mining]]></category>

		<guid isPermaLink="false">http://www.musicmetric.com/?p=269281820</guid>
		<description><![CDATA[Today we are going to introduce to you another piece of technology we have developed at Musicmetric. As you may know, parts of our product are driven by semantic analysis; we don’t just tell you how many people are talking about your artists, but also their opinions, the sentiment and common topics surrounding them. How [...]]]></description>
			<content:encoded><![CDATA[<p>Today we are introducing another piece of technology we have developed at Musicmetric. As you may know, parts of our product are driven by semantic analysis: we don’t just tell you how many people are talking about your artists, but also their opinions, the sentiment and the common topics surrounding them. How do we do this? Sentiment analysis is a challenging problem that has not yet been completely solved. Many so-called sentiment analysis systems use very naive methods to detect sentiment, such as keyword matching or very basic sentence decomposition. Human language is not that simple, however, so these approaches fail to capture irony, sarcasm, slang and other idiomatic expressions.<span id="more-269281820"></span></p>
<p>Our methods are <strong>much more advanced</strong> than simple word detection. We have implemented a set of machine learning models that can be trained on different corpora (contexts), so they work well for general language but are much more accurate for the predefined contexts. For example, professionally written articles, fan comments and tweets are all different contexts, so each has its own sentiment analysis model. This approach allows our models to <strong>get more and more intelligent</strong> as we keep downloading data and retraining them frequently. The accuracy of our method is shown in the confusion matrices below:</p>
<div id="attachment_269281807" class="wp-caption aligncenter" style="width: 501px"><a href="http://www.musicmetric.com/products/269281169-revision-110/" rel="attachment wp-att-269281807"><img src="http://www.musicmetric.com/wp-content/uploads/2010/01/polarityConfusionMatrix.jpg" alt="Musicmetric polarity confusion matrix" title="Musicmetric polarity confusion matrix" width="491" height="116" class="size-full wp-image-269281807" /></a><p class="wp-caption-text">Musicmetric polarity confusion matrix</p></div>
<p><br/>So what does this matrix mean? A confusion matrix shows the percentage of cases in which the system confuses two classes (i.e. mislabels one as another). Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class (the cells highlighted in yellow are correct predictions). For example, we can see that 16% of neutral reviews are predicted as negative but only 2% of negative reviews are predicted as neutral or positive. Notably, <strong>96% of negative reviews are predicted correctly</strong>. </p>
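<p>As a rough illustration of how a matrix like this can be computed, here is a minimal sketch. It assumes rows are actual classes and columns are predicted classes, as described above. The function name <code>confusionMatrixPercent</code> and the toy labels are hypothetical and are not taken from our evaluation data or production code.</p>

```javascript
// Hypothetical helper: build a row-normalised confusion matrix in percent.
// Rows are actual classes, columns are predicted classes.
function confusionMatrixPercent(labels, actual, predicted) {
  // Map each class label to its row/column position.
  const index = new Map(labels.map((label, i) => [label, i]));
  // counts[r][c] = number of items of actual class r predicted as class c.
  const counts = labels.map(() => labels.map(() => 0));
  actual.forEach((label, i) => {
    counts[index.get(label)][index.get(predicted[i])] += 1;
  });
  // Divide each row by its total so that every non-empty row sums to 100.
  return counts.map((row) => {
    const total = row.reduce((sum, n) => sum + n, 0);
    return row.map((n) => (total === 0 ? 0 : (100 * n) / total));
  });
}

// Made-up toy data for illustration only.
const labels = ["negative", "neutral", "positive"];
const actual = ["negative", "negative", "neutral", "neutral",
                "neutral", "neutral", "positive", "positive"];
const predicted = ["negative", "negative", "negative", "neutral",
                   "neutral", "neutral", "positive", "neutral"];

const matrix = confusionMatrixPercent(labels, actual, predicted);
console.log(matrix);
```

<p>With this toy data, the "neutral" row reads 25, 75, 0: a quarter of the neutral items were mislabelled as negative, and the diagonal entry (75) is the share predicted correctly.</p>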
<p><br/>The second confusion matrix is a breakdown for scores from 1 to 5 (3 is neutral):<br />
<div id="attachment_269281792" class="wp-caption center" style="width: 389px"><a href="http://www.musicmetric.com/2010/01/lady-gaga-vs-susan-boyle/269281761-revision-26/" rel="attachment wp-att-269281792"><img src="http://www.musicmetric.com/wp-content/uploads/2010/01/5starConfusionMatrix.jpg" alt="" title="5starConfusionMatrix" width="379" height="172" class="size-full wp-image-269281792" /></a><p class="wp-caption-text">Confusion matrix for score from 1 to 5</p></div></p>
<p>Similarly, we can see that 90% of reviews with a score of 2 are predicted correctly, while 5% of them are predicted as 3 and none are predicted as 5. The table below shows the number of reviews we evaluated:<br />
<div id="attachment_269281906" class="wp-caption aligncenter" style="width: 497px"><a href="http://www.musicmetric.com/2010/03/nme-awards-2010/269281839-revision-56/" rel="attachment wp-att-269281906"><img src="http://www.musicmetric.com/wp-content/uploads/2010/01/numberOfSamples.jpg" alt="Number of evaluated samples" title="Number of evaluated samples" width="487" height="172" class="size-full wp-image-269281906" /></a><p class="wp-caption-text">Number of evaluated samples</p></div></p>
<p><br/>The third confusion matrix is a breakdown for scores from 1 to 10 (<em>Note: this test data is different from the data above and does not include any reviews with scores lower than 5</em>):<br />
<div id="attachment_269281802" class="wp-caption aligncenter" style="width: 510px"><a href="http://www.musicmetric.com/2010/01/musicmetrics-sentiment-analysis-v1-0-beta/susan_boyle_social_networks/" rel="attachment wp-att-269281802"><img src="http://www.musicmetric.com/wp-content/uploads/2010/01/10StarConfusionMatrix1.jpg" alt="Musicmetric sentiment confusion matrix for score from 1-10" title="Musicmetric sentiment confusion matrix for score from 1-10" width="500" height="227" class="size-full wp-image-269281802" /></a><p class="wp-caption-text">Musicmetric sentiment confusion matrix for score from 1-10</p></div></p>
<p><br/>That is enough talking from us. Now it is your turn: see how it works by playing with the interface to our general-purpose music sentiment analysis engine below:</p>
<p><em>Note: Our sentiment analysis usually works better for longer reviews or paragraphs than for single short sentences, and definitely works better for music-related topics. Try pasting in an album review.</em><br />
<br/></p>
<div style="padding: 15px 10px 15px 15px; border: 2px solid #DDDDDD; background-color: #F0F0F0">
<form id="reviewForm" style="text-align: left;">
<p><label for="review" style="font-size: 18px;">Test our sentiment analysis. Please write an opinion or a review below:</label></p>
<p>   <textarea id="review" name="review" style="width: 465px;" rows="8" ></textarea><br />
   <label style="font-size: 10px; line-height: 1.5em;">After you click &#8220;submit&#8221;, our robot will analyse the text and return a sentiment index from 1 to 5 according to how positive or negative it is (3 is neutral).</label></p>
<div id="recaptcha_div"></div>
<input type="submit" style="font-size: 18px; padding: 5px 5px;" id="submit" value="submit"/>
</form>
<div style="margin-top: 10px;">
<p><label for="sentimentIndex" style="font-size: 18px;">Sentiment index shown below:</label></p>
<div id="sentimentIndex"></div>
</div>
</div>
<p><script type="text/javascript" src="http://api.recaptcha.net/js/recaptcha_ajax.js"></script><br />
<script type="text/javascript">
jQuery(document).ready(function(){  
  Recaptcha.create("6Le51woAAAAAAI-mHl6qE6dCp7IEXpqjw-xH1Y9b",  "recaptcha_div", {theme: "red"});     
  jQuery("#submit").click(function() {
       jQuery("#sentimentIndex").attr("style", "");
       jQuery("#sentimentIndex").html("analysing...");
       jQuery.post("/wp-content/themes/mateo/includes/SA.php", 
                           jQuery("#reviewForm").serialize(), 
                           function(data) {
                              if (data === "invalid_captcha") {jQuery("#sentimentIndex").html("<span style='color: red'>Invalid captcha!</span>");}
                              else if (data === "no_review") {jQuery("#sentimentIndex").html("<span style='color: red'>You must type in a review or an opinion for sentiment analysis!</span>");}
                              else if (data === "server_error") {jQuery("#sentimentIndex").html("<span style='color: red'>Can't connect to the sentiment analysis server!</span>");}
                              else {
                                // Parse the returned score as a base-10 integer and keep it out of the global scope.
                                var rate = parseInt(data, 10);
                                // Vertical offset into the stars sprite: 19px for a score of 1, plus 38px for each extra point.
                                var newStyle = "background: url(/wp-content/uploads/2010/01/stars_map.png) 0 "+ (-19*(rate - 1)*2 - 19) + "px no-repeat; height: 17px; width: 83px; display: block";
                                jQuery("#sentimentIndex").html("");
                                jQuery("#sentimentIndex").attr("style", newStyle);
                              }
                              Recaptcha.reload();
                           }, 
                           "text");                   
      return false;
    });
});
</script></p>
]]></content:encoded>
			<wfw:commentRss>http://www.musicmetric.com/2010/01/musicmetrics-sentiment-analysis-v1-0-beta/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

