<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Sanjay Kairam &#187; /Matter</title>
	<atom:link href="http://www.sanjaykairam.com/blog/category/matter/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.sanjaykairam.com/blog</link>
	<description>Graduate Student &#38; Armchair Philosopher</description>
	<lastBuildDate>Thu, 19 Jan 2012 23:09:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>On Grad School, Creativity, and &#8220;Honoring Your Vomit&#8221;</title>
		<link>http://www.sanjaykairam.com/blog/2011/04/grad-school-creativity-and-honoring-your-vomit/</link>
		<comments>http://www.sanjaykairam.com/blog/2011/04/grad-school-creativity-and-honoring-your-vomit/#comments</comments>
		<pubDate>Thu, 28 Apr 2011 18:29:02 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[/Me]]></category>
		<category><![CDATA[/Meaning]]></category>
		<category><![CDATA[/Meta]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[creativity]]></category>
		<category><![CDATA[expertise]]></category>
		<category><![CDATA[ira glass]]></category>
		<category><![CDATA[keith sawyer]]></category>
		<category><![CDATA[lady gaga]]></category>
		<category><![CDATA[psychology]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[science]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=308</guid>
		<description><![CDATA[Back when I was just starting graduate school, I remember feeling as if I already understood the components needed for great scientific research: knowledge of a domain, the ability to implement a system or execute an experiment, and a creative insight about a phenomenon worth studying. While domain knowledge and the ability to execute seemed like prerequisites for doing science at all, the capacity for creativity seemed to be the element that separated great scientists from good ones. Since I felt like I was good at identifying creative research, I hoped that once I immersed myself in academia and started gaining domain knowledge and engineering skill, the creative ideas would come to me. Now, almost a year into my PhD program, I feel like I have learned a great deal, but I am left with the question: Where are all those good ideas?]]></description>
			<content:encoded><![CDATA[<p>Back when I was just starting graduate school, I remember feeling as if I already understood the components needed for great scientific research: knowledge of a domain, the ability to implement a system or execute an experiment, and a creative insight about a phenomenon worth studying. While domain knowledge and the ability to execute seemed like prerequisites for doing science at all, the capacity for creativity seemed to be the element that separated great scientists from good ones. Since I felt like I was good at identifying creative research, I hoped that once I immersed myself in academia and started gaining domain knowledge and engineering skill, the creative ideas would come to me. Now, almost a year into my PhD program, I feel like I have learned a great deal, but I am left with the question: Where are all those good ideas?</p>
<p>Now, don&#8217;t get me wrong &#8211; I know that I have a long way to go before people start calling me Dr. Kairam. <a href="http://www.psy.fsu.edu/faculty/ericsson/ericsson.exp.perf.html" target="_blank">At least for piano players, Ericsson theorized that 10,000 hours was the required amount of time to gain expertise</a>, and I had always figured that PhD programs were around 5 years long for that very reason (40 hours/week * 50 weeks/year * 5 years = 10,000 hours, though it seems that some of us may become &#8216;double-experts&#8217; by the time we&#8217;re done!). However, we&#8217;re also expected to complete some great research before we&#8217;ve finished the program; while I&#8217;ve done some research so far that I think is pretty good, I don&#8217;t think I&#8217;ve had any insights yet that I would consider &#8216;great&#8217;. As a result, it&#8217;s become difficult to shake the nagging doubt that perhaps I won&#8217;t get there.</p>
<p>Just as I was beginning to hit a low point, however, I came across this great video of radio host <a title="This American Life - Home" href="http://www.thisamericanlife.org/" target="_blank">Ira Glass</a>:</p>
<p><object width="425" height="349"><param name="movie" value="http://www.youtube.com/v/BI23U7U2aUY?fs=1&amp;hl=en_US" /><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><embed type="application/x-shockwave-flash" width="425" height="349" src="http://www.youtube.com/v/BI23U7U2aUY?fs=1&amp;hl=en_US" allowfullscreen="true" allowscriptaccess="always"></embed></object><br />
In case you don&#8217;t want to watch, he starts off by saying:</p>
<blockquote><p>&#8220;Nobody tells this to people who are beginners, and I really wish someone had told me&#8230;All of us who do creative work, we get into it because we have good taste&#8230;But there&#8217;s a gap &#8211; that for the first couple years you&#8217;re making stuff, what you&#8217;re making isn&#8217;t so good&#8230;it&#8217;s trying to be good, it has ambition to be good, but it&#8217;s not quite that good. But your taste, the thing that got you into the game&#8230;is still killer. And your taste is good enough that you can tell what you&#8217;re making is kind of a disappointment to you&#8230;A lot of people never get past this phase&#8230;they quit.&#8221;</p></blockquote>
<p>Inspired by this quote, I&#8217;ve decided to try to implement two policies to help foster my own creativity in research (as well as in some other areas where I&#8217;m often creatively blocked, including songwriting and posting on this blog).</p>
<p><em><strong>1. Repetition, Repetition, Repetition</strong></em></p>
<p>Glass continues later in the video with the advice:</p>
<blockquote><p>&#8220;The most important possible thing you can do is do a lot of work. Do a huge volume of work. Put yourself on a deadline so that every week or every month you know you&#8217;re going to finish one story&#8230;because it&#8217;s only by going through a volume of work that you&#8217;re actually going to catch up and close that gap and the work you&#8217;re making will be as good as your ambitions.&#8221;</p></blockquote>
<p>The link between repetition and creativity also shows up in psychologist <a title="Keith Sawyer - About" href="http://keithsawyer.wordpress.com/about/" target="_blank">Keith Sawyer</a>&#8217;s interviews with winners of the <a title="New Yorker Caption Contest" href="http://www.newyorker.com/humor/caption" target="_blank">New Yorker cartoon caption contest</a>. According to his research, &#8220;the &#8216;sudden flash of insight&#8217; is largely a myth&#8221;; instead, creative ideas &#8216;emerge over time&#8217; through &#8216;hard work and constant revision&#8217;. Specifically, he says:</p>
<blockquote><p>&#8220;Cartoon contest winners usually generate lots of captions. Studies have shown that quantity breeds quality &#8211; what I call the <em>productivity theory</em>, because high productivity corresponds to high creativity. When the famous physicist Freeman Dyson was asked how to generate good ideas, he said, &#8216;Have a lot of ideas, then throw out the bad ones.&#8217;&#8221;</p></blockquote>
<p>An important element in following this advice is reminding myself that I don&#8217;t have to publish everything I produce. If a project fails but spurs new ideas and helps me gain necessary skills, then I should view it as a success. If a song or blog post never quite comes together, it may inspire something better down the line. The important thing is to rehearse the process of crafting an idea, executing it, and committing it to paper so that I get practice with the creative part of the process. Regarding the process itself, this brings me to my second point:</p>
<p><em><strong>2. Honor My Ideas</strong></em></p>
<p>I draw my inspiration for this second policy from Lady Gaga, an artist whom I consider consistently creative. Near the end of GagaVision, episode 43, she describes her creative process:<br />
<object width="560" height="349"><param name="movie" value="http://www.youtube.com/v/O6Gs6d1-Sew?fs=1&amp;hl=en_US" /><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><embed type="application/x-shockwave-flash" width="560" height="349" src="http://www.youtube.com/v/O6Gs6d1-Sew?fs=1&amp;hl=en_US" allowscriptaccess="always" allowfullscreen="true"></embed></object><br />
Transcribed:</p>
<blockquote><p>The creative process is approximately 15 minutes of vomiting my creative ideas&#8230;And then I spend days, weeks, months, years fine-tuning, but the idea is that you honor your vomit. You have to honor your vomit &#8211; you have to honor those 15 minutes.</p></blockquote>
<p>While it sounds silly (and a little gross), I found these thoughts to be very instructive. I think that while I often have ideas that are creative or &#8216;out-there&#8217;, my internal filter shuts them down before I ever get a chance to examine whether or not they are viable. By committing your ideas to paper as soon as you have them, you can circumvent this filtering process so that those ideas don&#8217;t get lost. As Dyson said above, having a lot of ideas is a first step towards having good ideas.</p>
<p>As I&#8217;ve been taking the Caltrain to Stanford more often these days (in no way motivated by my spotting a sign for $4.99/gallon gas last week), I&#8217;ve decided to implement a policy of spending each morning train ride just throwing ideas on paper. Whether it&#8217;s lyrics to a song, thoughts for a blog post, or ideas for research, by forcing myself to just &#8216;vomit up&#8217; whatever&#8217;s in my head, I am hoping that this deliberate practice at creativity will result in more ideas, and thus more good ideas, getting past my filter. In fact, that is actually how I put this blog post together, so let&#8217;s see if it keeps working.</p>
<p>If you try these or discover other methods for fostering your own creativity, share your experience in the comments!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2011/04/grad-school-creativity-and-honoring-your-vomit/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A Brief Overview of TurKit</title>
		<link>http://www.sanjaykairam.com/blog/2011/03/a-brief-overview-of-turkit/</link>
		<comments>http://www.sanjaykairam.com/blog/2011/03/a-brief-overview-of-turkit/#comments</comments>
		<pubDate>Fri, 11 Mar 2011 19:05:26 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[crowds]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[HITs]]></category>
		<category><![CDATA[human computation]]></category>
		<category><![CDATA[human intelligence tasks]]></category>
		<category><![CDATA[mechanical turk]]></category>
		<category><![CDATA[mturk]]></category>
		<category><![CDATA[TurKit]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=292</guid>
		<description><![CDATA[These slides are from a presentation I gave in Sep Kamvar's Computational Methods in Data Mining (old website link here). In it, I presented TurKit, a programming framework created by Greg Little and others at MIT that allows for programmatic iteration over tasks in Mechanical Turk. Essentially, that means that instead of the familiar paradigm of sending out a bunch of HITs and waiting for the responses, TurKit polls AMT for answers, which can then be used in future HITs. This allows for the use of an "improve and vote" loop, where Turkers continually improve on and validate the work of other Turkers. They had some impressive results in the paper, getting fairly high-quality responses to a wide range of tasks (including image labeling, handwriting recognition, and brainstorming) for under $0.50.]]></description>
			<content:encoded><![CDATA[<p>These slides are from a presentation I gave in <a title="Sep Kamvar - Home Page" href="http://kamvar.org/" target="_blank">Sep Kamvar</a>&#8217;s Computational Methods in Data Mining (old website link <a title="Stanford CME 340 - Computation Methods in Data Mining" href="http://kamvar.org/cme340/" target="_blank">here</a>). In it, I presented TurKit, a programming framework created by <a title="MIT CSAIL - Greg Little" href="http://people.csail.mit.edu/glittle/" target="_blank">Greg Little</a> and others at MIT that allows for programmatic iteration over tasks in <a title="Amazon Mechanical Turk - Welcome" href="https://www.mturk.com/mturk/welcome" target="_blank">Mechanical Turk</a>. Essentially, that means that instead of the familiar paradigm of sending out a bunch of HITs and waiting for the responses, TurKit polls AMT for answers, which can then be used in future HITs. This allows for the use of an &#8220;improve and vote&#8221; loop, where Turkers continually improve on and validate the work of other Turkers. They had some impressive results in the paper, getting fairly high-quality responses to a wide range of tasks (including image labeling, handwriting recognition, and brainstorming) for under $0.50.</p>
<p>The presentation ends with a quick intro to the JavaScript code (from the Iterative Text Improvement example on the <a title="TurKit - Home Page" href="http://groups.csail.mit.edu/uid/turkit/" target="_blank">TurKit website</a>) and some hopefully helpful information about using the Java application that you can download to try out TurKit. If I survive the end of the quarter, I hope to get a post up with some TurKit tutorial tips and lessons learned. If you have questions about TurKit, let me know, and I&#8217;ll try to get them answered in the next post!</p>
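<p>To make the &#8220;improve and vote&#8221; idea concrete, here is a minimal sketch in Python. It is not TurKit&#8217;s actual JavaScript API: the <code>improve</code> and <code>vote</code> callbacks are hypothetical stand-ins for posting an improvement HIT and a set of comparison HITs, and the strict-majority rule is an assumption. Each round, one worker edits the current best version, several workers compare old and new, and the new version replaces the old only if most voters prefer it.</p>

```python
from typing import Callable, List


def improve_and_vote(
    text: str,
    improve: Callable[[str], str],
    vote: Callable[[str, str], List[bool]],
    iterations: int,
) -> str:
    """Iteratively improve `text`, keeping a new version only if most voters prefer it.

    `improve` and `vote` are hypothetical stand-ins for posting HITs:
    `improve(current)` returns one worker's edited version, and
    `vote(current, candidate)` returns one ballot per voter (True = prefers candidate).
    """
    best = text
    for _ in range(iterations):
        candidate = improve(best)            # one worker edits the current best version
        ballots = vote(best, candidate)      # several workers compare old vs. new
        if sum(ballots) > len(ballots) / 2:  # strict majority required; ties keep `best`
            best = candidate
    return best


if __name__ == "__main__":
    # Simulated workers: each "improvement" appends "!", and voters accept
    # any version up to 12 characters long.
    result = improve_and_vote(
        "hello",
        improve=lambda t: t + "!",
        vote=lambda old, new: [len(new) <= 12] * 3,
        iterations=10,
    )
    print(result)  # hello!!!!!!!
```

<p>Requiring a strict majority means a tied vote keeps the existing version, which biases the loop toward stability; the framework described in the paper may make different choices.</p>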
<div id="__ss_7235176" style="width: 425px;">
<p><strong><a title="TurKit: Tools for Iterative Tasks on Mechanical Turk [Little, et al. 2010]" href="http://www.slideshare.net/skairam/turkit-tools-for-iterative-tasks-on-mechanical-turk-little-et-al-2010">TurKit: Tools for Iterative Tasks on Mechanical Turk [Little 2010]</a></strong><object id="__sse7235176" width="425" height="355" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=turkitpresowebsite-110311124935-phpapp01&amp;stripped_title=turkit-tools-for-iterative-tasks-on-mechanical-turk-little-et-al-2010&amp;userName=skairam" /><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><embed id="__sse7235176" width="425" height="355" type="application/x-shockwave-flash" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=turkitpresowebsite-110311124935-phpapp01&amp;stripped_title=turkit-tools-for-iterative-tasks-on-mechanical-turk-little-et-al-2010&amp;userName=skairam" allowFullScreen="true" allowScriptAccess="always" allowfullscreen="true" allowscriptaccess="always" /></object></p>
<div style="padding: 5px 0 12px;">View more <a href="http://www.slideshare.net/">presentations</a> from <a href="http://www.slideshare.net/skairam">Sanjay Kairam</a>.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2011/03/a-brief-overview-of-turkit/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Do You Know Jules Verne? What&#8217;s He Like?</title>
		<link>http://www.sanjaykairam.com/blog/2011/02/do-you-know-jules-verne-whats-he-like/</link>
		<comments>http://www.sanjaykairam.com/blog/2011/02/do-you-know-jules-verne-whats-he-like/#comments</comments>
		<pubDate>Thu, 03 Feb 2011 01:10:38 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[aardvark]]></category>
		<category><![CDATA[credibility]]></category>
		<category><![CDATA[David Pogue]]></category>
		<category><![CDATA[MG Siegler]]></category>
		<category><![CDATA[online communities]]></category>
		<category><![CDATA[Pogue's Posts]]></category>
		<category><![CDATA[Q&A Sites]]></category>
		<category><![CDATA[Quora]]></category>
		<category><![CDATA[social]]></category>
		<category><![CDATA[social computing]]></category>
		<category><![CDATA[social internet]]></category>
		<category><![CDATA[social search]]></category>
		<category><![CDATA[TechCrunch]]></category>
		<category><![CDATA[web 2.0]]></category>
		<category><![CDATA[Web Startups]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=280</guid>
		<description><![CDATA[The buzz around Q&#038;A startup Quora has been building steadily over the past couple of months. I measure this not only by the number of Follow messages now flooding my inbox concerning people randomly sampled from my Facebook connections, but also by the heated debate that is developing about the site's usefulness, much of which was chronicled in this TechCrunch article about the "Quora Backlash Backlash".]]></description>
			<content:encoded><![CDATA[<p>The buzz around Q&amp;A startup <a title="Quora - Home Page" href="http://quora.com" target="_blank">Quora</a> has been building steadily over the past couple of months. I measure this not only by the number of Follow messages now flooding my inbox concerning people randomly sampled from my Facebook connections, but also by the heated debate that is developing about the site&#8217;s usefulness, much of which was chronicled in this TechCrunch article about the &#8220;<a title="TechCrunch - Quora Backlash Backlash" href="http://techcrunch.com/2011/01/31/quora-quora-quora-quora-quora-quora-quora/" target="_blank">Quora Backlash Backlash</a>&#8221;.</p>
<p>Based on this post and other comments, it is fairly clear that MG Siegler is on &#8220;Team Quora&#8221;, calling Quora &#8220;a great source of information like Twitter and Facebook and blogs themselves.&#8221; Having been a Quora user for several months, I have found the quality of answers on the site to be extremely high. These have ranged from the much-celebrated cases of high-profile individuals answering questions about topics pertaining to them (e.g. Netflix CEO Reed Hastings answering the question &#8220;<a title="Quora - How Much Does Netflix Spend on Postage Each Year?" href="http://www.quora.com/Netflix/How-much-does-Netflix-spend-on-postage-each-year?" target="_blank">How much does Netflix spend on postage every year?</a>&#8221;) to the opportunities for creative individuals to answer questions in awesome and innovative ways (see Wavii programmer <a title="Erik Frey" href="http://fawx.com/" target="_blank">Erik Frey</a>&#8217;s answer to the question &#8220;<a title="Quora - Which animal has been used most frequently for a band name?" href="http://www.quora.com/Which-animal-has-been-used-most-frequently-for-a-band-name" target="_blank">Which animal has been used most frequently for a band name?</a>&#8221;). I have also personally asked a number of questions and gotten timely and high-quality answers.</p>
<p>However, the key question here is whether the site will continue to be as useful as more and more people join. While many might argue that including more subject matter experts can only improve the site, one must also remember that this increased signal is only useful when it can be separated from the increased noise. Right now, Quora is a bit like &lt;nerd alert&gt;Flynn&#8217;s cave dwelling in the Outlands, where the information contained is only safe as long as the masses can&#8217;t get to it&lt;/nerd alert&gt;; users can trust the answers they find because they come paired with often famous, or at least recognizable, names and faces. Yahoo! Answers is a great example of how a Q&amp;A site can decline in the quality of both its questions and its answers as it opens up (for a quick, possibly NSFWish laugh, check out &#8220;<a title="11 Points - 11 Stupid..." href="http://www.11points.com/Web-Tech/11_Stupid_Questions_From_Yahoo_Answers_That_Have_Changed_My_Life" target="_blank">11 Stupid Questions from Yahoo Answers That Have Changed My Life</a>&#8221;). Even if you develop the most robust social answer-quality-checker imaginable, the presence of thousands of stupid questions tagged with topic tags that direct them to your inbox is going to turn a lot of the quality answer providers away from the site. In some ways, I feel that Aardvark fell into that trap as it grew more popular; I now find myself answering a lot of questions that involve identifying rashes.</p>
<p>I had this question about the possible perils of mainstream adoption in mind when I read a <a title="NY Times: Pogue's Posts - Quora Raises Questions" href="http://pogue.blogs.nytimes.com/2011/02/01/quora-raises-questions/" target="_blank">NY Times blog post</a> this week in which David Pogue describes his first interactions with the site as &#8220;a descent into bafflement.&#8221; Among the parts of the site that he deems confusing are the login process, the task of adding connections and following topics, and the actual task of asking a question. For chronicling his confusion, however, he earned the following response from Siegler (<a title="ParisLemon - Is this Quora or the Laundromat?" href="http://parislemon.com/post/3065294483/is-this-quora-or-the-laundromat" target="_blank">on his personal blog</a>):</p>
<blockquote><p>Is this meant to be written from the perspective of a 95-year-old senile man?</p>
<p>Apparently, every site should be designed in a way so that it’s just like every other site that failed before it on the Internet. Makes perfect sense.</p>
<p>Prediction: he’ll love Quora in 12 months.</p></blockquote>
<p>Now, I would consider David Pogue to be a relatively tech-savvy individual. He&#8217;s been blogging about Web/technology topics for over 10 years now, and I generally find his posts to be pretty interesting and insightful. If he is having this much difficulty using the site, I think that his frustrations really do say something about Quora&#8217;s current potential to reach beyond the geek crowd into the general public. And on some level, I think that the desire to mock him for this reflects an underlying recognition that the Silicon Valley influentials who currently use the site wouldn&#8217;t actually benefit from &#8220;regular people&#8221; using it, since this could potentially spell the end of Quora&#8217;s usefulness. Perhaps a better use of time might be to consider how to make the site more user-friendly and how to maintain the quality as new users arrive.</p>
<p>Studying sites like <a title="Wikipedia - Home" href="http://en.wikipedia.org" target="_blank">Wikipedia</a> is a great way to examine how to maintain quality while scaling out to a broader audience. While it&#8217;s difficult to ascertain what actually makes Wikipedia work (<a title="Edge 2008: Kevin Kelly" href="http://www.edge.org/q2008/q08_6.html#kelly" target="_blank">in practice, if not in theory</a>), many of the qualities inherent in Wikipedia are those identified in a 2005 paper [1] by J.M. Leimeister and colleagues at the <a title="TUM Home" href="http://portal.mytum.de/welcome/" target="_blank">Technische Universität München</a> as factors which promote community success in an environment where trusted information is key. Among these are exposing the identity of content providers, clearly establishing goals for the community, making member profiles available to other members, and providing various levels of anonymity, all of which are built into Wikipedia&#8217;s core. Another factor that appears throughout the literature is recognition and rewards for contributors, something embodied in Wikipedia in the form of &#8220;<a title="Wikipedia - Barnstars" href="http://en.wikipedia.org/wiki/Wikipedia:Barnstars" target="_blank">barnstars</a>&#8221;. Quora does a great job of exposing identity (which will likely make it much better than Yahoo! Answers), but I don&#8217;t believe that it adequately addresses these other elements. I think that the addition of &#8220;moderators&#8221; or other custodian roles for hyper-motivated users could be the kind of thing that keeps the Quora community in check, and I would be eager to see them roll something like that out before opening the site up to the general public (I think it&#8217;s technically still in beta, no?).</p>
<p>Quora is a great source of information for me in some of the same ways that Facebook and Twitter are. The key difference is that in those media, I can control what I see and who it comes from. To picture those sites without such controls, just imagine trying to sift through the real-time Twitter stream &#8211; it&#8217;s mind-numbing, to be kind (unless of course you are a Justin Bieber fan, in which case you should be delighted). The goal of those who currently enjoy Quora&#8217;s usefulness should be to help maintain that usefulness as the site grows. I&#8217;m not saying that this will be an easy task, but if it isn&#8217;t accomplished, I don&#8217;t think that anyone is going to love the site in 12 months.</p>
<p>[1] <a title="ACM Digital Library" href="http://portal.acm.org/citation.cfm?id=1277723" target="_blank">Leimeister, J.M., Ebner, W., &amp; Krcmar, H. 2005: Design, implementation, and evaluation of trust-supporting components in virtual communities for patients. <em>Journal of Management Information Systems 21</em>, 4, 101-135.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2011/02/do-you-know-jules-verne-whats-he-like/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Visualizing Cross-Posting Behavior in Online Medical Communities</title>
		<link>http://www.sanjaykairam.com/blog/2010/11/visualizing-cross-posting-behavior-in-online-medical-communities/</link>
		<comments>http://www.sanjaykairam.com/blog/2010/11/visualizing-cross-posting-behavior-in-online-medical-communities/#comments</comments>
		<pubDate>Mon, 15 Nov 2010 16:30:40 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[class]]></category>
		<category><![CDATA[coursework]]></category>
		<category><![CDATA[CS 448B]]></category>
		<category><![CDATA[data visualization]]></category>
		<category><![CDATA[diana maclean]]></category>
		<category><![CDATA[network visualization]]></category>
		<category><![CDATA[OHC]]></category>
		<category><![CDATA[online communities]]></category>
		<category><![CDATA[online health communities]]></category>
		<category><![CDATA[social network]]></category>
		<category><![CDATA[Stanford]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=251</guid>
		<description><![CDATA[This quarter, I've been taking two classes: Data Visualization, taught by Jeff Heer (my rotation advisor for this quarter), and Social and Information Network Analysis, taught by Jure Leskovec (my rotation advisor for next quarter). If you're interested in either of these two topics, follow those links, as an extensive set of course materials (including class projects and suggested readings) has been posted. For a mid-quarter assignment, I worked with Diana MacLean on a project related to visualizing social network patterns. In this project, we chose to examine methods for visualizing cross-posting behaviors of users of MedHelp, an Online Health Community (OHC). ]]></description>
			<content:encoded><![CDATA[<p>Well, it&#8217;s been a long time since I&#8217;ve posted, which, as you may have surmised, was connected to the process of getting into, starting, and then staying afloat in school. However, I thought it might be interesting to start posting about what I&#8217;ve been learning (since I&#8217;ve been finding it interesting!). This quarter, I&#8217;ve been taking two classes: <a title="Stanford CS 448B - Home Page" href="https://graphics.stanford.edu/wikis/cs448b-10-fall" target="_blank">Data Visualization</a>, taught by <a title="Jeff Heer - Home Page" href="http://hci.stanford.edu/jheer/" target="_blank">Jeff Heer</a> (my rotation advisor for this quarter), and <a title="Stanford CS 224W - Home Page" href="http://www.stanford.edu/class/cs224w/" target="_blank">Social and Information Network Analysis</a>, taught by <a title="Jure Leskovec - Home Page" href="http://cs.stanford.edu/people/jure/" target="_blank">Jure Leskovec</a> (my rotation advisor for next quarter). If you&#8217;re interested in either of these two topics, follow those links, as an extensive set of course materials (including class projects and suggested readings) has been posted.</p>
<p>For a mid-quarter assignment, I worked with <a title="LinkedIn - Diana MacLean" href="http://www.linkedin.com/in/dianlynnmaclean" target="_blank">Diana MacLean</a> on a project related to visualizing social network patterns. The goal of the project was to build an interactive visualization which would help support exploration of the data and help surface interesting patterns. We decided to build the project using <a title="Protovis Home" href="http://vis.stanford.edu/protovis/" target="_blank">Protovis</a>, a JavaScript-based graphical toolkit for data visualization built by Jeff and one of his students, <a title="Michael Bostock - Home Page" href="http://bost.ocks.org/mike/" target="_blank">Michael Bostock</a>. We chose Protovis (rather than another toolkit like Prefuse or Flare) primarily because it seemed like the easiest way to post the visualization on the web when we were done.</p>
<p>In this project, we chose to examine methods for visualizing cross-posting behaviors of users of <a href="https://graphics.stanford.edu/wikis/cs448b-10-fall/MedHelp">MedHelp</a>, an Online Health Community (OHC). Users suffering from various conditions come to these communities to seek advice, brainstorm about symptoms and treatments, or offer each other emotional support. An interesting pattern identified by Diana was the prevalence of users who post primarily in one community (such as &#8220;Asthma&#8221;) posting in a different, secondary community (such as &#8220;Fertility&#8221;). We were particularly interested in such behavior, which we labeled &#8220;cross-posting&#8221;. According to recent research, identifying patterns in cross-posting behavior could potentially help discover previously unknown, or untested, medical links between symptoms. In the example given above, a large degree of cross-posting between &#8220;Asthma&#8221; and &#8220;Fertility&#8221; could suggest a potential link between the two conditions. In our visualization, we represent the cross-posting data as a graph, where nodes correspond to communities on the website and directed edges (A -&gt; B) are weighted by the number of users who post primarily in community A who also post in community B. We utilized an adjacency matrix rather than a network layout because we felt that the edge-centric view would make it easier to spot trends and unusually strong or weak connections between communities.</p>
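<p>As a rough sketch of how such a matrix could be computed, here is some Python (not the code we actually used, which was written in JavaScript for Protovis). It takes hypothetical <code>(user, community)</code> post records, assumes that a user&#8217;s primary community is the one they post in most (ties broken alphabetically), and normalizes each edge count into a percentage of the row community&#8217;s primary members so that large and small communities can be compared on the same scale.</p>

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def cross_posting_matrix(posts: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Return matrix[row][col]: % of users whose primary community is `row`
    who also posted at least once in `col` (diagonal omitted)."""
    # Count posts per user per community.
    per_user: Dict[str, Counter] = defaultdict(Counter)
    for user, community in posts:
        per_user[user][community] += 1

    communities = sorted({c for counts in per_user.values() for c in counts})
    members: Counter = Counter()                      # users per primary community
    cross: Dict[str, Counter] = defaultdict(Counter)  # cross-posters per (row, col)

    for counts in per_user.values():
        # Assumption: primary = most-posted community, ties broken alphabetically.
        primary = min(counts, key=lambda c: (-counts[c], c))
        members[primary] += 1
        for community in counts:
            if community != primary:
                cross[primary][community] += 1

    return {
        row: {
            col: (100.0 * cross[row][col] / members[row]) if members[row] else 0.0
            for col in communities
            if col != row
        }
        for row in communities
    }


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        ("u1", "Asthma"), ("u1", "Asthma"), ("u1", "Fertility"),
        ("u2", "Asthma"),
        ("u3", "Fertility"), ("u3", "Fertility"), ("u3", "Asthma"),
    ]
    print(cross_posting_matrix(sample))
    # {'Asthma': {'Fertility': 50.0}, 'Fertility': {'Asthma': 100.0}}
```

<p>Rows for communities that are nobody&#8217;s primary community come out as all zeros rather than raising a division error.</p>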
<p>The focus of the class was primarily on the design process, so if you are interested in how we came to our final version, you can check out our <a title="CS 448B - A3-KairamMacLean" href="https://graphics.stanford.edu/wikis/cs448b-10-fall/A3-KairamMacLean" target="_blank">project page</a> on the class website. Below, you can see a screenshot of what we produced. To try it out for yourself, click <a title="MedHelp Community Explorer" href="http://sanjaykairam.com/projects/medhelp/project.html" target="_blank">here</a>, and the page will open in a new window.</p>
<div id="attachment_253" class="wp-caption aligncenter" style="width: 510px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/11/screenshot1.png"><img class="size-full wp-image-253" title="MedHelp Community Explorer" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/11/screenshot1.png" alt="MedHelp Community Explorer" width="500" height="338" /></a><p class="wp-caption-text">Screenshot of our Interactive Visualization of Cross-Posting Behavior in the MedHelp Online Health Community Forums. (Note, the information section in the bottom right was added as a *.png and I&#39;m having difficulty locating it right now, which is why it&#39;s missing in the version linked to.)</p></div>
<p>The top-left panel shows an adjacency matrix in which hue encodes the <em>value</em> of each cell (that is, the percentage of members whose primary community is the ROW community who posted in the COLUMN community). Several mouseover and click techniques are enabled on the matrix. The top-right panel contains a tag cloud of the most frequent words in the cross-posts corresponding to the active cell. The words are stemmed for more accurate counting, although this occasionally makes them tricky to decipher. Each tag cloud is generated on the first mouseover of its cell (as a force layout on a set of unconnected nodes, sized according to normalized word frequency); subsequent mouseovers reuse the previously generated layout to avoid too much distracting motion. The bottom-left panel contains the color scale for the matrix hue values. A highlight bar on the scale indicates the value of the active cell. Superimposed on this scale is a histogram of value frequencies, which lets the user judge how common or rare the active value is.</p>
<p>Clicking a cell changes the interaction state so that the row and column of the currently selected cell are highlighted in red. (Note &#8211; you have to click and then move to another cell before the highlighting triggers.) This enables a quick comparison of the selected cell value with all other values for (a) cross-posts originating from the ROW community (to any COLUMN community), and (b) cross-posts from any ROW community to the selected COLUMN community. This way, users can compare the significance of the selected cell value with all similar values (similar in the sense that they share either a source or a destination community).</p>
<p>The project was an interesting foray into alternative techniques for visualizing interaction in a network. The idea of a network is so tied up in our minds with node-link diagrams that it&#8217;s often hard to picture one any other way. Even if data visualization doesn&#8217;t become a large part of my later research, it&#8217;s extremely useful for generating different mental models of the same data. What other methods do you think might be more effective for the MedHelp data?</p>
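<p>The percentages and the histogram behind the color scale could be computed along these lines. This is a hypothetical Python sketch rather than our Protovis code: it assumes the denominator for each row is the number of users whose primary community is that row, ignores the diagonal, and uses ten equal-width bins, none of which is specified above.</p>

```python
def to_percentages(counts, members):
    """counts[a][b]: users with primary community a who posted in b.
    members[a]: users whose primary community is a (assumed denominator)."""
    return [
        [100.0 * c / members[a] if members[a] else 0.0 for c in row]
        for a, row in enumerate(counts)
    ]


def value_histogram(matrix, bins=10, max_value=100.0):
    """Count off-diagonal cells in equal-width bins spanning 0..max_value."""
    hist = [0] * bins
    for a, row in enumerate(matrix):
        for b, value in enumerate(row):
            if a == b:
                continue  # a community never cross-posts to itself
            # Clamp so a value equal to max_value lands in the top bin.
            hist[min(int(value / max_value * bins), bins - 1)] += 1
    return hist


counts = [[0, 1, 1], [1, 0, 0], [0, 0, 0]]  # toy cross-posting counts
members = [2, 1, 0]
pct = to_percentages(counts, members)
print(pct)  # [[0.0, 50.0, 50.0], [100.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(value_histogram(pct))  # [3, 0, 0, 0, 0, 2, 0, 0, 0, 1]
```

<p>Normalizing by row size keeps large communities from looking &#8220;hot&#8221; just because they have more members, which is what makes the hue comparable across rows.</p>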
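<p>As a rough illustration of the row/column comparison, the hypothetical helper below reports what fraction of the other cells in the selected cell&#8217;s row (same source community) and column (same destination community) have a smaller value. The visualization itself only highlights those cells for the viewer to compare by eye; turning that comparison into numbers is my own assumption, sketched in Python.</p>

```python
def compare_to_row_and_column(matrix, row, col):
    """Fraction of other off-diagonal cells in the selected cell's row and in
    its column whose value is smaller than the selected value."""
    value = matrix[row][col]
    # Same source community: other cells in this row, skipping the diagonal.
    row_peers = [v for j, v in enumerate(matrix[row]) if j not in (row, col)]
    # Same destination community: other cells in this column, skipping the diagonal.
    col_peers = [matrix[i][col] for i in range(len(matrix)) if i not in (row, col)]

    def share_below(peers):
        return sum(1 for v in peers if value > v) / len(peers) if peers else 0.0

    return share_below(row_peers), share_below(col_peers)


pct = [  # toy percentages; row = primary community, column = cross-posted community
    [0, 40, 10, 20],
    [5, 0, 30, 15],
    [25, 35, 0, 45],
    [10, 20, 50, 0],
]
print(compare_to_row_and_column(pct, 1, 2))  # (1.0, 0.5)
```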
<p><em>Note: Much of the description text is adapted from our project submission write-up, a large portion of which was done by Diana.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2010/11/visualizing-cross-posting-behavior-in-online-medical-communities/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Analyzing Responses to Likert Items</title>
		<link>http://www.sanjaykairam.com/blog/2010/06/analyzing-responses-to-likert-items/</link>
		<comments>http://www.sanjaykairam.com/blog/2010/06/analyzing-responses-to-likert-items/#comments</comments>
		<pubDate>Wed, 09 Jun 2010 22:43:30 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[credibility]]></category>
		<category><![CDATA[likert]]></category>
		<category><![CDATA[measurement]]></category>
		<category><![CDATA[parc]]></category>
		<category><![CDATA[presentation]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[slideshare]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[wikidashboard]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=225</guid>
		<description><![CDATA[I'm embedding a presentation I gave at a recent "Data Lunch" about how to analyze responses to Likert items. As I am not a stats expert in any respect, I learned a number of things while putting this together - one of the most important is that Likert isn't actually pronounced "Like-ert", it's pronounced "Lick-ert", which is still tough for me to remember to say. Anyways, hope you enjoy, I'll include some summary below as well.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m embedding a presentation I gave at a recent &#8220;Data Lunch&#8221; about how to analyze responses to Likert items. As I am not a stats expert in any respect, I learned a number of things while putting this together &#8211; one of the most important is that Likert isn&#8217;t actually pronounced &#8220;Like-ert&#8221;, <a title="Wikipedia - Likert Scale #Pronounciation" href="http://en.wikipedia.org/wiki/Likert_scale#Pronunciation" target="_blank">it&#8217;s pronounced &#8220;Lick-ert&#8221;</a>, which is still tough for me to remember to say. Anyways, hope you enjoy, I&#8217;ll include some summary below as well.</p>
<div id="__ss_4456985" style="width: 425px;"><strong style="display: block; margin: 12px 0 4px;"><a title="Analyzing Responses to Likert Items" href="http://www.slideshare.net/skairam/likert-analysis-blogpost">Analyzing Responses to Likert Items</a></strong><object id="__sse4456985" width="425" height="355" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=likertanalysis-blogpost-100609172740-phpapp02&amp;stripped_title=likert-analysis-blogpost" /><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><embed id="__sse4456985" width="425" height="355" type="application/x-shockwave-flash" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=likertanalysis-blogpost-100609172740-phpapp02&amp;stripped_title=likert-analysis-blogpost" allowFullScreen="true" allowScriptAccess="always" allowfullscreen="true" allowscriptaccess="always" /></object></p>
</div>
<p>Here are some brief notes on the presentation (to avoid the inevitable TL;DR comments):</p>
<ul>
<li>Data used was from a study I ran on Mechanical Turk looking at whether the tool <a title="WikiDashboard - Home" href="http://wikidashboard.parc.com" target="_blank">WikiDashboard</a> helps people to make different judgments about the credibility of Wikipedia articles.</li>
<li>Participants were placed in one of three conditions: (<strong>WO</strong> = Wiki Only, <strong>WH</strong> = Wiki + the History Page, <strong>WD</strong> = Wiki + WikiDashboard).</li>
<li>Articles varied with respect to presumed quality and presumed controversy.</li>
<li>Using non-parametric tests was fairly straightforward, but none of them were all that powerful for my purposes: they did not help find interaction effects (one main hope of the study was to find an interaction between <strong>group</strong> and <strong>quality</strong>).</li>
</ul>
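<p>For readers who want to try something similar, one common non-parametric choice for comparing ordinal Likert responses across three independent groups is the Kruskal-Wallis H test. The slides don&#8217;t necessarily use this exact procedure, so treat the following pure-Python sketch (midranks for ties plus the standard tie correction, with made-up ratings) as an illustration under that assumption. Note that it tests one factor at a time, which is exactly why rank tests like this can&#8217;t directly test a <strong>group</strong> &#215; <strong>quality</strong> interaction.</p>

```python
import math
from collections import Counter


def midranks(values):
    """Map each distinct value to its average (mid) rank, starting at 1."""
    counts = Counter(values)
    ranks, next_rank = {}, 1
    for v in sorted(counts):
        t = counts[v]
        # Tied values share the mean of ranks next_rank .. next_rank + t - 1.
        ranks[v] = next_rank + (t - 1) / 2
        next_rank += t
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for independent samples."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = midranks(pooled)
    rank_term = sum(sum(ranks[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Likert data is full of ties, so apply the usual correction factor.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    if ties == n ** 3 - n:
        raise ValueError("all responses are identical; H is undefined")
    return h / (1 - ties / (n ** 3 - n))


# Hypothetical 5-point ratings for the three conditions (not real study data).
wo, wh, wd = [1, 2, 2], [3, 3, 4], [4, 5, 5]
h = kruskal_wallis_h(wo, wh, wd)
p = math.exp(-h / 2)  # chi-square survival function with 2 df (three groups)
print(round(h, 3), round(p, 3))  # 7.057 0.029
```

<p>With exactly three groups there are 2 degrees of freedom, and the chi-square survival function reduces to exp(-H/2). With samples this small the chi-square approximation is crude, so an exact or permutation test would be the safer choice for real data.</p>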
<p>Anyways, this presentation is not supposed to be an expert statistics guide &#8211; rather, it represents the results of my research in trying to solve this problem (again, I&#8217;m very much not a statistics expert). There are surely many other ways to address the problem, and I would appreciate hearing from others who have tried attacking Likert items for their studies. I am continuing to analyze the data and may post some results in the near future.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2010/06/analyzing-responses-to-likert-items/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>One Habit of Highly Successful Mathematicians</title>
		<link>http://www.sanjaykairam.com/blog/2010/05/one-habit-of-highly-successful-mathematicians/</link>
		<comments>http://www.sanjaykairam.com/blog/2010/05/one-habit-of-highly-successful-mathematicians/#comments</comments>
		<pubDate>Tue, 25 May 2010 10:00:37 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[/Me]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Barabási]]></category>
		<category><![CDATA[Bursts]]></category>
		<category><![CDATA[mathematicians]]></category>
		<category><![CDATA[Poisson]]></category>
		<category><![CDATA[productivity]]></category>
		<category><![CDATA[reading]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=215</guid>
		<description><![CDATA[I'm currently reading Albert-László Barabási's second book, Bursts. Though the book is primarily about predicting human behavior in the future, it is peppered with interesting anecdotes about historical figures (i.e., from the past). One such figure mentioned prominently is Siméon-Denis Poisson, the 19th-century French mathematician. An element that may seem trivial out of context but is rather crucial in the book is Barabási's description of Poisson's organizational habits (a sort of 19th-century French GTD):]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m currently reading <a title="Barabasi - Home Page" href="http://www.nd.edu/~alb/" target="_blank">Albert-László Barabási</a>&#8217;s second book, <a title="Amazon Books - Bursts" href="http://www.amazon.com/Bursts-Hidden-Pattern-Everything-Hardcover/dp/B003K05XQS/ref=sr_1_8?ie=UTF8&amp;s=books&amp;qid=1274746149&amp;sr=1-8" target="_blank"><em>Bursts</em></a>. Though the book is primarily about predicting human behavior in the future, it is peppered with interesting anecdotes about historical figures (i.e., from the past). One such figure mentioned prominently is <a title="Wikipedia - Simeon-Denis Poisson" href="http://en.wikipedia.org/wiki/Sim%C3%A9on_Denis_Poisson" target="_blank">Siméon-Denis Poisson</a>, the 19th-century French mathematician. An element that may seem trivial out of context but is rather crucial in the book is Barabási&#8217;s description of Poisson&#8217;s organizational habits (a sort of 19th-century French GTD):</p>
<blockquote><p>Poisson distribution. Poisson process. Poisson equation. Poisson kernel. Poisson regression. Poisson summation formula. Poisson&#8217;s spot. Poisson&#8217;s ratio. Poisson bracket. Euler-Poisson-Darboux equation. This is only a partial list, and yet it shows the degree to which Siméon-Denis Poisson&#8217;s work has impacted just about all branches of science. But what is so impressive is not the volume of his contributions but rather their depth, raising a puzzling question: How did Poisson manage to work simultaneously on so many quite different problems and yet stay sufficiently focused to offer deep and lasting contributions?</p>
<p>Well, he had a secret: a notebook and a tiny habit.</p>
<p>Each time Poisson encountered a problem he thought fascinating, he would resist the temptation to savor it. He pulled out his notebook instead and made a note of it and promptly returned to the problem that had absorbed him before the interruption. Once he solved the problem at hand, he mulled over the list of problems scribbled in his notebook, then picking as his next challenge the one he found the most interesting.</p>
<p>Poisson&#8217;s little secret was lifelong, careful prioritizing.</p></blockquote>
<p>So, this essentially describes the polar opposite of my work habits, which currently consist of frenetically switching from task to task to ensure that I complete none of them. I&#8217;m thinking of giving the priority list a try &#8211; has anybody tried a scheme like this and had success with it? Would be curious to hear your story!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2010/05/one-habit-of-highly-successful-mathematicians/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>PARC Forum: How Wikimedia is Scaling Open-Source Innovation</title>
		<link>http://www.sanjaykairam.com/blog/2010/05/parc-forum-how-wikimedia-is-scaling-open-source-innovation/</link>
		<comments>http://www.sanjaykairam.com/blog/2010/05/parc-forum-how-wikimedia-is-scaling-open-source-innovation/#comments</comments>
		<pubDate>Fri, 07 May 2010 21:09:48 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[open-source]]></category>
		<category><![CDATA[PARC Forum]]></category>
		<category><![CDATA[social web]]></category>
		<category><![CDATA[web 2.0]]></category>
		<category><![CDATA[wikimedia]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=207</guid>
		<description><![CDATA[Yesterday, I attended a pretty interesting PARC Forum where the speakers were three members of the Wikimedia Foundation. For those who don't know, Wikipedia is actually part of a larger group of projects (including Wiktionary, Wikiquotes, Wikiversity, etc.) which are all under the umbrella of the Wikimedia Foundation, but the talks primarily focused on Wikipedia and how the foundation leverages the community of editors and developers to help build the content and tools that make the site work. PARC will have the video up in a couple of days if you want to watch, and you can find the presentation here, but I'm presenting a short summary of some of the interesting tidbits and points here, organized by speaker:]]></description>
			<content:encoded><![CDATA[<p>Yesterday, I attended a pretty interesting <a title="PARC Forum - How Wikimedia is Scaling Open-Source Innovation" href="http://www.parc.com/event/1108/how-wikimedia-is-scaling-open-source-innovation.html" target="_blank">PARC Forum</a> where the speakers were three members of the <a title="Wikimedia" href="http://wikimedia.org/" target="_blank">Wikimedia Foundation</a>. For those who don&#8217;t know, Wikipedia is actually part of a larger group of projects (including <a title="Wiktionary" href="http://wiktionary.org" target="_blank">Wiktionary</a>, <a title="Wikiquotes" href="http://wikiquotes.org" target="_blank">Wikiquotes</a>, <a title="Wikiversity" href="http://wikiversity.org" target="_blank">Wikiversity</a>, etc.) which are all under the umbrella of the Wikimedia Foundation, but the talks primarily focused on Wikipedia and how the foundation leverages the community of editors and developers to help build the content and tools that make the site work. PARC will have the video up in a couple of days if you want to watch, and you can find the presentation here, but I&#8217;m presenting a short brain-dump of some of the interesting tidbits and points here, organized by speaker:</p>
<p><strong>Eugene Eric Kim: Strategy Program Manager</strong></p>
<ul>
<li>If you include all of the component sites, Wikimedia is the 5th most accessed web property in the world.</li>
<li>350M regular visitors, $10M in revenue, and only 35 employees.</li>
<li>45K active contributors (a term they use to indicate people who make 5 or more edits per month) on English Wikipedia.</li>
<li>The country with the most visitors is actually Canada (which nobody in the audience guessed).</li>
<li>Defined the Wikimedia Foundation mission with a Jimmy Wales quote: <strong><em>&#8220;Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge.&#8221;</em></strong></li>
</ul>
<p><strong>Trevor Parscal: Lead Front-End, UX Programs</strong></p>
<ul>
<li>Trevor is the guy in charge of &#8220;basically everything you see&#8221; (wow!)</li>
<li>Wikimedia research shows that people don&#8217;t find the software easy to use (duh), so they have launched the <a title="Wikimedia Usability Initiative" href="http://usability.wikimedia.org/" target="_blank">Usability Initiative</a>.</li>
<li>In fact, when they were testing with users, they had one user who took 20 minutes to figure out how to edit a page (and this wasn&#8217;t entirely out of the ordinary).</li>
<li>Asking people what they wanted in the site proved not-so-successful, but having them try out a new Beta version and observing behavior was really fruitful.</li>
<li>As of now, 84% of the people who opted into the Wikipedia Beta have stayed (almost 300K people); there was no mention of how to find the beta, btw.</li>
</ul>
<p><strong>Tomasz Finc: Engineering Program Manager</strong></p>
<ul>
<li>Fundraising is done annually, between November and January.</li>
<li>Amount raised: 2006 &#8211; $1M, 2007 &#8211; $2M, 2008 &#8211; $6M, and 2009 &#8211; $8.1M</li>
<li>Most of their fundraising comes from small donations (contrary to the usual trend of large donations for these types of efforts)</li>
<li>Did a lot of A/B style testing to figure out how to optimize contribution &#8211; a lot of this is actually shared on the <a title="Wikimedia Blog" href="http://blog.wikimedia.org/" target="_blank">Wikimedia Blog.</a></li>
<li>Adding Jimmy Wales&#8217; plea increased the donations a LOT (so much that at first they thought the site was being attacked).</li>
<li>The iPhone application and mobile gateway are both being developed by the community.</li>
<li>The OLPC now has a full copy of the English Wikipedia on it.</li>
</ul>
<p>As you can see, the talks basically focused on three elements: 1) Wikipedia is big and wants to get bigger, 2) Wikipedia is hard to use and wants to get easier, 3) Wikipedia relies a lot on the community. While there wasn&#8217;t much that was earth-shattering, each of these elements was pretty interesting &#8211; the idea that such a HUGE platform and vast amount of content can be supported by just 35 full-time employees and the contributions of the community is incredible, and speaks to the power that effective community management can bring. As Wikipedia is one of the greatest examples of social software and content production, it was great to get the opportunity to peer under the hood a little bit.</p>
<p>For some more information that may not have made it into this brain-dump, check out my live-tweet of the event <a title="Twitter Search - @skairam / #PARCForum" href="http://search.twitter.com/search?q=&amp;ands=%23PARCForum&amp;phrase=&amp;ors=&amp;nots=&amp;tag=&amp;lang=all&amp;from=skairam&amp;to=&amp;ref=&amp;near=&amp;within=15&amp;units=mi&amp;since=&amp;until=&amp;rpp=10" target="_blank">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2010/05/parc-forum-how-wikimedia-is-scaling-open-source-innovation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Can You Spot The Experts? Tagging and Expertise</title>
		<link>http://www.sanjaykairam.com/blog/2010/03/can-you-spot-the-experts-tagging-and-expertise/</link>
		<comments>http://www.sanjaykairam.com/blog/2010/03/can-you-spot-the-experts-tagging-and-expertise/#comments</comments>
		<pubDate>Wed, 24 Mar 2010 16:00:53 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[delicious]]></category>
		<category><![CDATA[domain expertise]]></category>
		<category><![CDATA[expertise]]></category>
		<category><![CDATA[mechanical turk]]></category>
		<category><![CDATA[mturk]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[tag]]></category>
		<category><![CDATA[tagging]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=166</guid>
		<description><![CDATA[I decided to try a little Mechanical Turk study to see if I could spot some differences between tags generated by experts and those generated by novices. I had each Turker read 1 of 5 web pages (on the topic of "enterprise 2.0 mashups") and enter 5 tags which they thought would be useful for bookmarking the page (either for themselves or others). I also asked them to rate how familiar they were with the subject matter ("Not at All", "Slightly Familiar", "Somewhat Familiar", and "I am an Expert")...]]></description>
			<content:encoded><![CDATA[<p>Recently, I&#8217;ve been reading some papers about identifying and harnessing expertise in tagging communities such as <a href="http://www.delicious.com">Delicious</a>. Some of the research I have come across has looked at topics such as:</p>
<ul>
<li>Identifying the features that underlie &#8220;tag quality&#8221; (e.g. <a href="www.grouplens.org/system/files/group07-sen.pdf">Sen, et al. (2007)</a>, <a href="portal.acm.org/ft_gateway.cfm?id=1531676&amp;type=pdf">Zhang, et al. (2009)</a>)</li>
<li>Topic-based approaches for information retrieval from tagged collections (e.g. <a href="www.cse.psu.edu/~dzhou/papers/www08-tags.pdf">Zhou, et al. (2008)</a>)</li>
<li>Graph-based algorithms for ranking based on user tags (e.g. <a href="www.kde.cs.uni-kassel.de/hotho/pub/.../seach2006hotho_eswc.pdf">Hotho, et al. (2006)</a>, <a href="www.michael-noll.com/.../telling-experts-from-spammers-expertise-ranking-in-folksonomies/">Noll, et al. (2009)</a>)</li>
</ul>
<p>I decided to try a little Mechanical Turk study to see if I could spot some differences between tags generated by experts and those generated by novices. I had each Turker read 1 of 5 web pages (on the topic of &#8220;enterprise 2.0 mashups&#8221;) and enter 5 tags which they thought would be useful for bookmarking the page (either for themselves or others). I also asked them to rate how familiar they were with the subject matter (&#8220;Not at All&#8221;, &#8220;Slightly Familiar&#8221;, &#8220;Somewhat Familiar&#8221;, and &#8220;I am an Expert&#8221;).</p>
<p>As a game, I thought it would be interesting to post some of the responses to see how easy it is to identify which tags were generated by people who rated themselves as &#8220;experts&#8221; vs. &#8220;non-experts&#8221;. I took all of the tags generated by each expertise group, cleaned them up for minor spelling mistakes and typos (e.g., correcting &#8220;applciation&#8221; to &#8220;application&#8221;), and generated a tag cloud using <a href="http://www.wordle.net/">Wordle</a>, where the tag size corresponds to how often that word was used (all other factors, such as positioning and color, are purely stylistic).</p>
<p>For the following URL &#8211; <a href="http://www.soamag.com/I18/0508-1.php">http://www.soamag.com/I18/0508-1.php</a> &#8211; can you identify which tag cloud belongs to which of these groups: &#8220;Not at All (Familiar)&#8221;, &#8220;Slightly Familiar&#8221;, and &#8220;Somewhat Familiar&#8221; (there was a 4th category of &#8220;I am an Expert&#8221;, but nobody rating this URL classified themselves this way):</p>
<div id="attachment_167" class="wp-caption aligncenter" style="width: 460px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/03/URL3-Expertise1.jpg"><img class="size-medium wp-image-167" title="Tag Cloud 1" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/03/URL3-Expertise1-300x150.jpg" alt="Tag Cloud 1" width="450" height="224" /></a><p class="wp-caption-text">Tag Cloud 1 (N = 17)</p></div>
<div id="attachment_168" class="wp-caption aligncenter" style="width: 460px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/03/URL3-Expertise2.jpg"><img class="size-medium wp-image-168" title="Tag Cloud 2" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/03/URL3-Expertise2-300x95.jpg" alt="Tag Cloud 2" width="450" height="142" /></a><p class="wp-caption-text">Tag Cloud 2 (N = 16)</p></div>
<div id="attachment_170" class="wp-caption aligncenter" style="width: 460px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/03/URL3-Expertise0.jpg"><img class="size-medium wp-image-170" title="Tag Cloud 3" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/03/URL3-Expertise0-300x139.jpg" alt="Tag Cloud 3" width="450" height="207" /></a><p class="wp-caption-text">Tag Cloud 3 (N = 14)</p></div>
<p>If you have any idea which tag cloud is which, please feel free to post your guess in the comments! I&#8217;d be extremely curious to see why people guessed the way that they did. I am currently having some Turkers do the same thing; if you are curious about the answers, come back for my follow-up post, where I will reveal the correct answers as well as the results of the Mechanical Turk evaluation of the tag clouds.</p>
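<p>The cleanup-and-count step behind these tag clouds is simple enough to sketch. The snippet below is a hypothetical Python version: the <code>CORRECTIONS</code> table, the lowercasing, and the linear mapping from frequency to font size are my assumptions for illustration (Wordle does its own sizing internally), but it captures the idea that a tag&#8217;s size tracks how many people used it.</p>

```python
from collections import Counter

# Hypothetical hand-built spelling fixes, e.g. "applciation" becomes "application".
CORRECTIONS = {"applciation": "application"}


def tag_frequencies(tag_lists, corrections=CORRECTIONS):
    """Count how often each cleaned-up tag was used across all participants."""
    counts = Counter()
    for tags in tag_lists:
        for tag in tags:
            cleaned = tag.strip().lower()
            cleaned = corrections.get(cleaned, cleaned)
            if cleaned:
                counts[cleaned] += 1
    return counts


def font_sizes(counts, min_pt=10, max_pt=48):
    """Map each tag's frequency linearly onto a font-size range."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {tag: min_pt + (c - lo) * (max_pt - min_pt) / span for tag, c in counts.items()}


responses = [["Mashups", "enterprise 2.0", "applciation"], ["mashups", "application"], ["SOA", "mashups"]]
sizes = font_sizes(tag_frequencies(responses))
print(sizes["mashups"], sizes["application"], sizes["soa"])  # 48.0 29.0 10.0
```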
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2010/03/can-you-spot-the-experts-tagging-and-expertise/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Madness of Crowds: From &#8220;Tulipomania&#8221; to the &#8220;Anti-Vax Movement&#8221;</title>
		<link>http://www.sanjaykairam.com/blog/2010/02/the-madness-of-crowds-from-tulipomania-to-the-anti-vax-movement/</link>
		<comments>http://www.sanjaykairam.com/blog/2010/02/the-madness-of-crowds-from-tulipomania-to-the-anti-vax-movement/#comments</comments>
		<pubDate>Wed, 17 Feb 2010 16:15:04 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[autism]]></category>
		<category><![CDATA[communal reinforcement]]></category>
		<category><![CDATA[crowds]]></category>
		<category><![CDATA[james mackay]]></category>
		<category><![CDATA[madness of crowds]]></category>
		<category><![CDATA[mmr]]></category>
		<category><![CDATA[social psychology]]></category>
		<category><![CDATA[vaccines]]></category>
		<category><![CDATA[wisdom of crowds]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=150</guid>
		<description><![CDATA[I've been interested in how to help people find high-quality information on the web - recently, I have been exploring how to help people make better credibility judgments about the information they find. One paper I was reading, "Statement Map: Assisting Information Credibility Analysis by Visualizing Arguments" by Koji Murakami and others at the Nara Institute of Science and Technology in Japan, uses as a motivating example the recent movement against vaccinations for children, specifically the MMR (Measles, Mumps, and Rubella) vaccine, which arose from fears that the vaccine could cause autism.]]></description>
			<content:encoded><![CDATA[<blockquote><p>THE OBJECT OF THE AUTHOR in the following pages has been to collect the most remarkable instances of those moral epidemics which have been excited, sometimes by one cause and sometimes by another, and to show how easily the masses have been led astray, and how imitative and gregarious men are, even in their infatuations and crimes.</p></blockquote>
<p style="text-align: right;">Charles Mackay &#8211; <em>Memoirs of Extraordinary Popular Delusions and the Madness of Crowds</em></p>
<div class="wp-caption aligncenter" style="width: 312px"><em><em><a href="http://www.flickr.com/photos/marcosreis07/3484788024/"><img title="Tulipomania!" src="http://farm4.static.flickr.com/3641/3484788024_d10edf7040_m.jpg" alt="Tulip Image" width="302" height="201" /></a></em></em><p class="wp-caption-text">Image courtesy of Marcos Vasconcelos (click for original)</p></div>
<p>I&#8217;ve been interested in how to help people find high-quality information on the web &#8211; recently, I have been exploring how to help people make better credibility judgments about the information they find. One paper I was reading, &#8220;<a title="Statement Map (pdf)" href="http://www.google.com/url?sa=t&amp;source=web&amp;ct=res&amp;cd=5&amp;ved=0CBgQFjAE&amp;url=http%3A%2F%2Fcl.naist.jp%2F~inui%2Fpapers%2F0904WICOW-Murakami.pdf&amp;ei=lDB7S73MI4vQtgOErszLCA&amp;usg=AFQjCNGhzNeyLyziT-nufIQ9HwSVLueHYg&amp;sig2=l3qvPn13pefY0RgFgnkH_g">Statement Map: Assisting Information Credibility Analysis by Visualizing Arguments</a>&#8221; by Koji Murakami and others at the <a title="NAIST - Home" href="http://www.naist.jp/index_e.html" target="_blank">Nara Institute of Science and Technology</a> in Japan, uses as a motivating example the recent movement against vaccinations for children, specifically the MMR (Measles, Mumps, and Rubella) vaccine, which arose from fears that the vaccine could cause autism.</p>
<p>Back in 2003, <a title="Pew - Internet Health Resources" href="http://www.pewinternet.org/Reports/2003/Internet-Health-Resources.aspx" target="_blank">Pew reported</a> that over 80% of Internet users had searched for health information (such as info about fitness or vaccinations) online, and one can imagine that this number has only grown since then, so this example illustrates the potential impact of such health memes. I&#8217;ve heard multiple parents mention the supposed vaccination-autism link when making decisions about whether or not to vaccinate their children, so it&#8217;s extremely important to figure out how to make sure these parents get intelligent, credible information when searching on the web. The Murakami paper provides some interesting background information on how this particular meme first started:</p>
<blockquote><p>In 1997, a group of researchers in the UK lead (sic) by Dr. Andrew Wakefield published a <a title="Lancet - Wakefield, et al. (1998)" href="http://briandeer.com/mmr/lancet-paper.htm" target="_blank">study implying a causal connection between Measles, Mumps, and Rubella (MMR) vaccinations and the development of autism in children</a>. Though further scrutiny of these initial results  disproved the autism-vaccination link &#8211; culminating in the withdrawal of endorsements by 10 of the study&#8217;s 12 authors &#8211; the damage had already been done.</p></blockquote>
<p>The consequences of this single, spurious, study have already been far-reaching. The resulting backlash precipitated a drop in vaccination rates in the UK (where the study was first published), which has led to an <a title="The Independent - MMR Row" href="http://www.independent.co.uk/life-style/health-and-families/health-news/mmr-row-blamed-for-measles-outbreak-1547651.html" target="_blank">increase in outbreaks of measles over the past decade</a> to the point where <a title="Eurosurveillance - Measles once again endemic in the UK" href="http://www.eurosurveillance.org/ViewArticle.aspx?ArticleId=18919" target="_blank">measles are once again (after transmission was halted 14 years ago) being considered endemic</a>. Even vaccination rates here in the Bay Area have dropped, with the <a title="Examiner" href="http://www.examiner.com/x-4079-SF-Sexual-Health-Examiner~y2009m11d3-High-Rates-of-MMR-Vaccine-Refusal-in-Bay-Area-Increases-the-Risk-of-BIrth-Defects" target="_blank">Examiner reporting that vaccination rates are as low as 50% for some Bay Area schools</a>.</p>
<p>I suppose a lot of this can be chalked up to mainstream media coverage of the original study (as well as coverage of well-meaning, but misguided celebrity activists like <a title="Time Magazine - Jenny McCarthy on Autism and Vaccines" href="http://www.time.com/time/health/article/0,8599,1888718,00.html" target="_blank">Jenny McCarthy</a>). However, a large part of why this meme has continued even after the original study was shown to be dubious is due to social phenomena such as <a title="Wikipedia - Communal Reinforcement" href="http://en.wikipedia.org/wiki/Communal_reinforcement" target="_blank">communal reinforcement</a> (or the &#8220;millions of people can&#8217;t be wrong&#8221; phenomenon) that can occur so easily on the web. Because it is so easy to publish information on the web, and because information published tends to persist, it is easy to find a wealth of documents supporting any viewpoint, no matter how much evidence there actually is to support that claim. In this case, one can read <a title="Vaccination Liberation - Home" href="http://www.vaclib.org/links/vaxlinks.htm" target="_blank">100 different news articles, blog posts, and other online resources</a> based on the Wakefield, et al. study without knowing first that these stories do not corroborate each other (as they are drawn from the same small, <a title="Times Oniine - MMR doctor given legal aid thousands" href="http://www.timesonline.co.uk/tol/news/uk/article1265373.ece" target="_blank">possibly falsified</a> study) and second that the <a title="Lancet - Retraction" href="http://www.thelancet.com/journals/lancet/article/PIIS0140-6736%2810%2960175-7/fulltext" target="_blank">original study was actually recently retracted by the journal</a> in the first place.</p>
<p>The way that these stories snowball and take on a life of their own is something that Charles Mackay documented in his book &#8220;<a title="Charles Mackay - Madness of the Crowds - Fulltext" href="http://www.econlib.org/library/Mackay/macExCover.html" target="_blank">Extraordinary Popular Delusions and the Madness of the Crowds</a>&#8221; (the first lines of which are quoted above). He tackles a variety of subjects, ranging from the <a title="Wikipedia - Tulip Mania" href="http://en.wikipedia.org/wiki/Tulip_mania" target="_blank">Dutch tulip craze</a> of the 17th century (cf. the &#8220;<a title="Wikipedia - Subprime Mortgage Crisis" href="http://en.wikipedia.org/wiki/Subprime_mortgage_crisis" target="_blank">21st century housing bubble</a>&#8221;) to alchemy to witch hunts. This book, written in 1841, shows that the often reasonable shortcuts people take when processing new information can sometimes produce these self-propagating effects.</p>
<p>The unfortunate fact is that having access to more information on the web doesn&#8217;t guarantee that we will choose and use it wisely. This is why building tools that help people make better credibility judgments online is so important, and it raises two questions:</p>
<ul>
<li>How do we extract data from within a single web page to help people make better judgments about the information it contains? I know that there is a good deal of work on this topic for Wikipedia, with tools like <a title="Wikiscanner - Home" href="http://wikiscanner.virgil.gr/">Wikipedia Scanner</a> and PARC&#8217;s <a title="PARC - WikiDashboard" href="http://wikidashboard.parc.com/" target="_blank">WikiDashboard</a> helping to expose author and change information, but how can we bring tools like these to the web as a whole?</li>
<li>How do we connect data across web pages to help propagate changes in information across the web? For example, if information about the study&#8217;s retraction could be propagated to pages reporting on the study, parents reading those pages would be less likely to be led astray, possibly saving lives.</li>
</ul>
<p>For those who are specifically interested in the MMR vaccine controversy, the <a title="Wikipedia - MMR Vaccine Controversy" href="http://en.wikipedia.org/wiki/MMR_vaccine_controversy" target="_blank">Wikipedia page</a> links to a lot of good resources, including a long list of studies conducted in the last decade which show no link between autism and the vaccine.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2010/02/the-madness-of-crowds-from-tulipomania-to-the-anti-vax-movement/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Rise of GoogVark</title>
		<link>http://www.sanjaykairam.com/blog/2010/02/the-rise-of-googvark/</link>
		<comments>http://www.sanjaykairam.com/blog/2010/02/the-rise-of-googvark/#comments</comments>
		<pubDate>Thu, 11 Feb 2010 23:33:21 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[aardvark]]></category>
		<category><![CDATA[business]]></category>
		<category><![CDATA[buzz]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[social search]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=143</guid>
		<description><![CDATA[So, in a seemingly inevitable but nonetheless surprising move, Google has purchased Aardvark for $50 million. My last blog post was about Aardvark's recent paper describing their social search engine, which alluded to the research paper that was responsible for the creation of Google, so the announcement seems timely.]]></description>
			<content:encoded><![CDATA[<p>So, in a seemingly inevitable but nonetheless surprising move, <a title="TechCrunch" href="http://techcrunch.com/2010/02/11/google-acquires-aardvark-for-50-million/" target="_blank">Google has purchased Aardvark for $50 million</a>. My <a title="Sanjay Kairam - Commons Sense" href="http://www.sanjaykairam.com/blog/2010/02/anatomy-of-a-paper-about-a-large-scale-social-search-engine/">last blog post</a> was about Aardvark&#8217;s recent <a title="Aardvark Blog" href="http://blog.vark.com/?p=352" target="_blank">paper</a> describing their social search engine, which alluded to the research paper that was <a title="Stanford InfoLab - PageRank" href="http://infolab.stanford.edu/~backrub/google.html">responsible for the creation of Google</a>, so the announcement seems timely.</p>
<p>Given Google&#8217;s recent social efforts (Twitter Search, Social Search, Google Buzz, etc.), I am curious to see what they will do with the Aardvark product &#8211; will it remain a standalone product, or will it find its way into existing or new Google tools? I, for one, would love to see it integrated into Google&#8217;s main search. One consequence of <a title="Google Blog - Introducing Google Buzz" href="http://googleblog.blogspot.com/2010/02/introducing-google-buzz.html" target="_blank">Google&#8217;s recent launch of Buzz</a> is that it reminded people that Google has been collecting data on their social networks for a while now. If Aardvark were integrated into your Google network, we&#8217;d have an out-of-the-box solution for social search (no messy profile-connecting or friend-inviting needed!). To me, one of the biggest hurdles most people face with social search or networking tools is the cost of building up their networks, so this would provide a quick and easy way around that.</p>
<p>What will GoogVark look like? I can&#8217;t say I&#8217;ve ever used the &#8220;I&#8217;m Feeling Lucky&#8221; button, but I do know that there are times when I can&#8217;t quite find the best answers through Google search, and I&#8217;d love to be able to shift seamlessly over to social search. Personally, I&#8217;d like to see something like this (with an example supplied by Google itself!):</p>
<a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/02/googvark.jpg"><img class="size-full wp-image-144" title="GoogVark" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/02/googvark.jpg" alt="GoogVark" width="449" height="247" /></a>
<p>Would this make social search more inviting to you?</p>
<p><em>P.S. Congratulations to Aardvark&#8217;s founders over at The Mechanical Zoo &#8211; you guys deserve it!</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2010/02/the-rise-of-googvark/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

