<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Sanjay Kairam &#187; research</title>
	<atom:link href="http://www.sanjaykairam.com/blog/tag/research/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.sanjaykairam.com/blog</link>
	<description>Graduate Student &#38; Armchair Philosopher</description>
	<lastBuildDate>Thu, 19 Jan 2012 23:09:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>On Grad School, Creativity, and &#8220;Honoring Your Vomit&#8221;</title>
		<link>http://www.sanjaykairam.com/blog/2011/04/grad-school-creativity-and-honoring-your-vomit/</link>
		<comments>http://www.sanjaykairam.com/blog/2011/04/grad-school-creativity-and-honoring-your-vomit/#comments</comments>
		<pubDate>Thu, 28 Apr 2011 18:29:02 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[/Me]]></category>
		<category><![CDATA[/Meaning]]></category>
		<category><![CDATA[/Meta]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[creativity]]></category>
		<category><![CDATA[expertise]]></category>
		<category><![CDATA[ira glass]]></category>
		<category><![CDATA[keith sawyer]]></category>
		<category><![CDATA[lady gaga]]></category>
		<category><![CDATA[psychology]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[science]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=308</guid>
		<description><![CDATA[Back when I was just starting graduate school, I remember feeling as if I already understood the components needed for great scientific research: knowledge of a domain, the ability to implement a system or execute an experiment, and a creative insight about a phenomenon worth studying. While the domain knowledge and ability to execute seemed like prerequisites for doing science at all, the capacity for creativity seemed to be the element that separated great scientists from good ones. Since I felt like I was good at identifying creative research, I hoped that once I immersed myself in academia and started gaining domain knowledge and engineering skill, the creative ideas would come to me. Now, almost a year into my PhD program, I feel like I have learned a great deal, but I am left with the question: Where are all those good ideas?]]></description>
			<content:encoded><![CDATA[<p>Back when I was just starting graduate school, I remember feeling as if I already understood the components needed for great scientific research: knowledge of a domain, the ability to implement a system or execute an experiment, and a creative insight about a phenomenon worth studying. While the domain knowledge and ability to execute seemed like prerequisites for doing science at all, the capacity for creativity seemed to be the element that separated great scientists from good ones. Since I felt like I was good at identifying creative research, I hoped that once I immersed myself in academia and started gaining domain knowledge and engineering skill, the creative ideas would come to me. Now, almost a year into my PhD program, I feel like I have learned a great deal, but I am left with the question: Where are all those good ideas?</p>
<p>Now, don&#8217;t get me wrong &#8211; I know that I have a long way to go before people start calling me Dr. Kairam. <a href="http://www.psy.fsu.edu/faculty/ericsson/ericsson.exp.perf.html" target="_blank">At least for piano players, Ericsson theorized that 10,000 hours was the amount of time required to gain expertise</a>, and I had always figured that PhD programs were around 5 years long for that very reason (40 hours/week * 50 weeks/year * 5 years = 10,000 hours, though it seems that some of us may become &#8216;double-experts&#8217; by the time we&#8217;re done!). However, we&#8217;re also expected to complete some great research before we&#8217;ve finished the program; while I&#8217;ve done some research so far that I think is pretty good, I don&#8217;t think I&#8217;ve had any insights yet that I would consider &#8216;great&#8217;. As a result, it&#8217;s become difficult to shake the nagging doubt that perhaps I won&#8217;t get there.</p>
<p>Just as I was beginning to hit a low point, however, I came across this great video of radio host <a title="This American Life - Home" href="http://www.thisamericanlife.org/" target="_blank">Ira Glass</a>:</p>
<p><object width="425" height="349"><param name="movie" value="http://www.youtube.com/v/BI23U7U2aUY?fs=1&amp;hl=en_US" /><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><embed type="application/x-shockwave-flash" width="425" height="349" src="http://www.youtube.com/v/BI23U7U2aUY?fs=1&amp;hl=en_US" allowfullscreen="true" allowscriptaccess="always"></embed></object><br />
In case you don&#8217;t want to watch, he starts off by saying:</p>
<blockquote><p>&#8220;Nobody tells this to people who are beginners, and I really wish someone had told me&#8230;All of us who do creative work, we get into it because we have good taste&#8230;But there&#8217;s a gap &#8211; that for the first couple years you&#8217;re making stuff, what you&#8217;re making isn&#8217;t so good&#8230;it&#8217;s trying to be good, it has ambition to be good, but it&#8217;s not quite that good. But your taste, the thing that got you into the game&#8230;is still killer. And your taste is good enough that you can tell what you&#8217;re making is kind of a disappointment to you&#8230;A lot of people never get past this phase&#8230;they quit.&#8221;</p></blockquote>
<p>Inspired by this quote, I&#8217;ve decided to try to implement two policies to help foster my own creativity in research (as well as in some other areas where I&#8217;m often creatively blocked, including songwriting and posting on this blog).</p>
<p><em><strong>1. Repetition, Repetition, Repetition</strong></em></p>
<p>Glass continues later in the video with the advice:</p>
<blockquote><p>&#8220;The most important possible thing you can do is do a lot of work. Do a huge volume of work. Put yourself on a deadline so that every week or every month you know you&#8217;re going to finish one story&#8230;because it&#8217;s only by going through a volume of work that you&#8217;re actually going to catch up and close that gap and the work you&#8217;re making will be as good as your ambitions.&#8221;</p></blockquote>
<p>The value of repetition shows up in psychologist <a title="Keith Sawyer - About" href="http://keithsawyer.wordpress.com/about/" target="_blank">Keith Sawyer</a>&#8217;s interviews with winners of the <a title="New Yorker Caption Contest" href="http://www.newyorker.com/humor/caption" target="_blank">New Yorker cartoon caption contest</a>. According to his research, &#8220;the &#8216;sudden flash of insight&#8217; is largely a myth&#8221;; instead, creative ideas &#8216;emerge over time&#8217; through &#8216;hard work and constant revision&#8217;. Specifically, he says:</p>
<blockquote><p>&#8220;Cartoon contest winners usually generate lots of captions. Studies have shown that quantity breeds quality &#8211; what I call the <em>productivity theory</em>, because high productivity corresponds to high creativity. When the famous physicist Freeman Dyson was asked how to generate good ideas, he said, &#8216;Have a lot of ideas, then throw out the bad ones.&#8217;&#8221;</p></blockquote>
<p>An important element in following this advice is reminding myself that I don&#8217;t have to publish everything I produce. If a project fails but spurs new ideas and helps me gain necessary skills, then I should view it as a success. If a song or blog post never quite comes together, it may inspire something better down the line. The important thing is to rehearse the process of crafting an idea, executing it, and committing it to paper so that I get practice with the creative part of the process. Regarding the process itself, this brings me to my second point:</p>
<p><em><strong>2. Honor My Ideas</strong></em></p>
<p>I draw my inspiration for this second policy from Lady Gaga, an artist whom I consider consistently creative. Near the end of GagaVision, episode 43, she describes her creative process:<br />
<object width="560" height="349"><param name="movie" value="http://www.youtube.com/v/O6Gs6d1-Sew?fs=1&amp;hl=en_US" /><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><embed type="application/x-shockwave-flash" width="560" height="349" src="http://www.youtube.com/v/O6Gs6d1-Sew?fs=1&amp;hl=en_US" allowscriptaccess="always" allowfullscreen="true"></embed></object><br />
Transcribed:</p>
<blockquote><p>The creative process is approximately 15 minutes of vomiting my creative ideas&#8230;And then I spend days, weeks, months, years fine-tuning, but the idea is that you honor your vomit. You have to honor your vomit &#8211; you have to honor those 15 minutes.</p></blockquote>
<p>While it sounds silly (and a little gross), I found these thoughts to be very instructive. I think that while I often have ideas that are creative or &#8216;out-there&#8217;, my internal filter shuts them down before I ever get a chance to examine whether or not they are viable. By committing your ideas to paper as soon as you have them, you can circumvent this filtering process so that those ideas don&#8217;t get lost. As Dyson said above, having a lot of ideas is a first step towards having good ideas.</p>
<p>As I&#8217;ve been taking the Caltrain to Stanford more often these days (in no way motivated by my spotting a sign for $4.99/gallon gas last week), I&#8217;ve decided to implement a policy of spending each morning train ride just throwing ideas on paper. Whether it&#8217;s lyrics to a song, thoughts for a blog post, or ideas for research, I am hoping that forcing myself to just &#8216;vomit up&#8217; whatever&#8217;s in my head will act as deliberate practice at creativity and get more ideas, and thus more good ideas, past my filter. In fact, that is how I put this blog post together, so let&#8217;s see if it keeps working.</p>
<p>If you try these or discover other methods for fostering your own creativity, share your experience in the comments!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2011/04/grad-school-creativity-and-honoring-your-vomit/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Analyzing Responses to Likert Items</title>
		<link>http://www.sanjaykairam.com/blog/2010/06/analyzing-responses-to-likert-items/</link>
		<comments>http://www.sanjaykairam.com/blog/2010/06/analyzing-responses-to-likert-items/#comments</comments>
		<pubDate>Wed, 09 Jun 2010 22:43:30 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[credibility]]></category>
		<category><![CDATA[likert]]></category>
		<category><![CDATA[measurement]]></category>
		<category><![CDATA[parc]]></category>
		<category><![CDATA[presentation]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[slideshare]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[wikidashboard]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=225</guid>
		<description><![CDATA[I'm embedding a presentation I gave at a recent "Data Lunch" about how to analyze responses to Likert items. As I am not a stats expert in any respect, I learned a number of things while putting this together - one of the most important is that Likert isn't actually pronounced "Like-ert"; it's pronounced "Lick-ert", which is still tough for me to remember to say. Anyway, I hope you enjoy it; I'll include a short summary below as well.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m embedding a presentation I gave at a recent &#8220;Data Lunch&#8221; about how to analyze responses to Likert items. As I am not a stats expert in any respect, I learned a number of things while putting this together &#8211; one of the most important is that Likert isn&#8217;t actually pronounced &#8220;Like-ert&#8221;; <a title="Wikipedia - Likert Scale #Pronunciation" href="http://en.wikipedia.org/wiki/Likert_scale#Pronunciation" target="_blank">it&#8217;s pronounced &#8220;Lick-ert&#8221;</a>, which is still tough for me to remember to say. Anyway, I hope you enjoy it; I&#8217;ll include a short summary below as well.</p>
<div id="__ss_4456985" style="width: 425px;"><strong style="display: block; margin: 12px 0 4px;"><a title="Analyzing Responses to Likert Items" href="http://www.slideshare.net/skairam/likert-analysis-blogpost">Analyzing Responses to Likert Items</a></strong><object id="__sse4456985" width="425" height="355" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=likertanalysis-blogpost-100609172740-phpapp02&amp;stripped_title=likert-analysis-blogpost" /><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><embed id="__sse4456985" width="425" height="355" type="application/x-shockwave-flash" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=likertanalysis-blogpost-100609172740-phpapp02&amp;stripped_title=likert-analysis-blogpost" allowFullScreen="true" allowScriptAccess="always" allowfullscreen="true" allowscriptaccess="always" /></object></p>
<div style="padding: 5px 0 12px;">View more <a href="http://www.slideshare.net/">presentations</a> from <a href="http://www.slideshare.net/skairam">Sanjay Kairam</a>.</div>
</div>
<p>Here are some brief notes on the presentation (to avoid the inevitable TL;DR comments):</p>
<ul>
<li>Data used was from a study I ran on Mechanical Turk looking at whether the tool <a title="WikiDashboard - Home" href="http://wikidashboard.parc.com" target="_blank">WikiDashboard</a> helps people to make different judgments about the credibility of Wikipedia articles.</li>
<li>Participants were placed in one of three conditions: <strong>WO</strong> (Wiki Only), <strong>WH</strong> (Wiki + the History Page), or <strong>WD</strong> (Wiki + WikiDashboard).</li>
<li>Articles varied with respect to presumed quality and presumed controversy.</li>
<li>Running non-parametric tests was fairly straightforward, but none of them were well suited to finding interaction effects, and one main hope of the study was to find an interaction between <strong>group</strong> and <strong>quality</strong>.</li>
</ul>
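<p>The post doesn&#8217;t say which non-parametric tests the slides use, so the following is only a sketch under that caveat. One common choice for comparing Likert responses across three independent groups such as <strong>WO</strong>, <strong>WH</strong>, and <strong>WD</strong> is the Kruskal-Wallis H test, which compares groups by their mean ranks rather than their means. The Python below does several things:</p>
<ul>
<li>It avoids SciPy and assumes responses are coded as integers (e.g. 1 to 5).</li>
<li>It applies the standard correction for tied ranks.</li>
<li>It converts H to a p-value using the chi-square approximation with 2 degrees of freedom. That approximation is usually trusted only when each group has at least five or so responses.</li>
</ul>
<p>The ratings are invented for illustration. A test like this compares groups on one factor at a time, so it cannot test a <strong>group</strong> by <strong>quality</strong> interaction.</p>
<pre>
```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Kruskal-Wallis H statistic, corrected for tied ranks."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    # Likert data has many ties; H is divided by 1 - sum(t^3 - t) / (n^3 - n).
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    if tie_term == n ** 3 - n:
        return 0.0  # every response is identical, so there is nothing to compare
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))


def p_value_three_groups(h):
    """Upper tail of a chi-square with 2 df (k = 3 groups), which is exactly exp(-h / 2)."""
    return math.exp(-h / 2)


# Hypothetical 1-5 ratings for the WO, WH, and WD groups (invented, not study data).
wo = [3, 4, 4, 2, 5, 3, 4]
wh = [3, 3, 4, 2, 3, 4, 3]
wd = [2, 2, 3, 1, 3, 2, 2]
h = kruskal_wallis([wo, wh, wd])
print(f"H = {h:.3f}, p = {p_value_three_groups(h):.4f}")
```
</pre>
<p>A large H (small p) only says that at least one group&#8217;s ratings tend to rank higher or lower than the others. Identifying which groups differ would take pairwise follow-up comparisons.</p>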
<p>Anyway, this presentation is not meant to be an expert statistics guide &#8211; rather, it summarizes what I found while trying to solve this problem (again, I&#8217;m very much not a statistics expert). There are surely many other ways to address the problem, and I would appreciate hearing from others who have tried attacking Likert items for their studies. I am continuing to analyze the data and may post some results in the near future.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2010/06/analyzing-responses-to-likert-items/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Social Networks, Health, and Youth</title>
		<link>http://www.sanjaykairam.com/blog/2010/05/social-networks-health-and-youth/</link>
		<comments>http://www.sanjaykairam.com/blog/2010/05/social-networks-health-and-youth/#comments</comments>
		<pubDate>Fri, 21 May 2010 22:23:09 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Metareview]]></category>
		<category><![CDATA[adolescents]]></category>
		<category><![CDATA[alcohol]]></category>
		<category><![CDATA[behavior]]></category>
		<category><![CDATA[drug use]]></category>
		<category><![CDATA[health]]></category>
		<category><![CDATA[health behaviors]]></category>
		<category><![CDATA[information]]></category>
		<category><![CDATA[meta-review]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[smoking]]></category>
		<category><![CDATA[SNA]]></category>
		<category><![CDATA[social]]></category>
		<category><![CDATA[social network]]></category>
		<category><![CDATA[teens]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=211</guid>
		<description><![CDATA[I've been interested for a while now in how information and behavior can spread through social networks; an important sub-topic in this field is the spread of health behaviors. This area of study is especially important in understanding the behaviors of adolescents, as there are a number of unhealthy behaviors (ranging from drug use to unhealthy eating to unsafe sex practices) which start in adolescence, persist into adulthood, and contribute to some of the leading causes of death and disability.

As any parent or educator will likely tell you, the behavior of teens closely linked in a social network will often display many similarities: teens who smoke or drink, for instance, are often friends with other teens who smoke or drink. By establishing and tracking the spread of these behaviors scientifically, we can gain a greater understanding of the mechanisms at work and perhaps harness them to help spread healthy behaviors instead of unhealthy ones.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been interested for a while now in how information and behavior can spread through social networks; an important and timely sub-topic in this field is the spread of health behaviors. This area of study is especially important in understanding the behaviors of adolescents, as there are a number of unhealthy behaviors (ranging from drug use to unhealthy eating to unsafe sex practices) which start in adolescence, persist into adulthood, and contribute to some of the leading causes of death and disability. (See this <a title="CDC - Healthy Youth" href="http://www.cdc.gov/HealthyYouth/healthtopics/index.htm" target="_blank">CDC page on adolescent health behavior</a>.)</p>
<p>As any parent or educator will likely tell you, the behavior of teens closely linked in a social network will often display many similarities: teens who smoke or drink, for instance, are often friends with other teens who smoke or drink. By establishing and tracking the spread of these behaviors scientifically, we can gain a greater understanding of the mechanisms at work and perhaps harness them to help spread healthy behaviors instead of unhealthy ones.</p>
<p>When we look at two teens who share a common behavior pattern (healthy or unhealthy), we must ask ourselves one of three questions:</p>
<ul>
<li>Did they become friends because of their similar behavior (<em><strong>selection</strong></em>)?</li>
<li>Did their behavior become similar as a result of being friends (<em><strong>influence</strong></em>)?</li>
<li>Was there some third factor at work which influenced them both separately (<em><strong>confounding factors</strong></em>)?</li>
</ul>
<p>One way to approach this question is through a longitudinal study, in which data are collected from the same group of subjects at multiple points in time. By looking at how the social network and the behavior of its members evolve together, we can begin to parse out the role that each of these factors plays. Here, I wanted to briefly discuss a few studies which have employed social network analysis and longitudinal data collection to gain a better understanding of how unhealthy behaviors can spread amongst teens.</p>
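<p>To make the selection-versus-influence logic concrete, here is a toy sketch in Python over two waves of invented data. It is not the method used by any of the studies discussed here; the function name and labels are hypothetical. For every friendship at wave 2 whose members share a behavior, it checks two things: whether the pair were already friends at wave 1, and whether they already behaved alike.</p>
<ul>
<li>Friends who converged afterwards are merely <em>consistent with</em> influence.</li>
<li>Pairs who matched first and became friends later are <em>consistent with</em> selection.</li>
</ul>
<p>Nothing here can rule out a confounding factor.</p>
<pre>
```python
def classify_similar_pairs(friends_t1, friends_t2, behavior_t1, behavior_t2):
    """Label wave-2 friendships whose two members share a behavior.

    friends_t1 / friends_t2: sets of frozenset({a, b}) friendship pairs.
    behavior_t1 / behavior_t2: dicts mapping each teen to a behavior value.
    """
    labels = {}
    for pair in friends_t2:
        a, b = tuple(pair)
        if behavior_t2[a] != behavior_t2[b]:
            continue  # only pairs who are alike at wave 2 need explaining
        were_friends = pair in friends_t1
        were_alike = behavior_t1[a] == behavior_t1[b]
        if were_friends and were_alike:
            labels[pair] = "alike at both waves"
        elif were_friends:
            labels[pair] = "consistent with influence"
        elif were_alike:
            labels[pair] = "consistent with selection"
        else:
            labels[pair] = "ambiguous"  # both the tie and the behavior changed
    return labels


# Hypothetical two-wave data: True means the teen smokes at that wave.
friends_t1 = {frozenset({"Ana", "Ben"}), frozenset({"Cai", "Dev"})}
friends_t2 = friends_t1 | {frozenset({"Ana", "Cai"})}
smokes_t1 = {"Ana": True, "Ben": False, "Cai": True, "Dev": True}
smokes_t2 = {"Ana": True, "Ben": True, "Cai": True, "Dev": True}

labels = classify_similar_pairs(friends_t1, friends_t2, smokes_t1, smokes_t2)
for pair, label in labels.items():
    print(sorted(pair), label)
```
</pre>
<p>The toy data produces three labels:</p>
<ul>
<li>Ana and Ben come out as consistent with influence, since Ben started smoking after they were already friends.</li>
<li>Ana and Cai come out as consistent with selection, since both smoked before becoming friends.</li>
<li>Cai and Dev come out as alike at both waves.</li>
</ul>
<p>A real analysis would also have to account for friendships that dissolve and ties that never form, which this sketch ignores.</p>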
<p>The first is a study by Ennett and Bauman (<a title="PDF - Ennett and Bauman: Adolescent Social Networks: Friendship Cliques, Social Isolates, and Drug Use Risk" href="http://www.tanglewood.net/projects/teachertraining/Book_of_Readings/Ennett.pdf" target="_blank">PDF</a>) that examines smoking as a function of position in the social network (1). Specifically, they name three classes of network position:</p>
<ul>
<li><strong>cliques</strong>: a small group of at least three adolescents whose primary friendships are with each other;</li>
<li><strong>liaisons</strong>: adolescents who maintain multiple friendships without belonging to a particular friendship clique;</li>
<li><strong>isolates</strong>: adolescents who have relatively few friendships with others.</li>
</ul>
<p>Past research showing that clique members tend to share smoking behaviors has led many people to assume that smoking is primarily a peer-group phenomenon. However, the authors looked at data from 1,092 students collected across 5 schools over 1 year (from the start of 9th grade to the start of 10th grade). They found that smoking was far more common among isolates (17&#8211;40% across schools) than among clique members (4&#8211;16%). Additionally, within the 87 cliques identified, they found that smokers tended to cluster in the same cliques, with the majority of cliques composed entirely or almost entirely of non-smokers. As for the roles played by influence and selection, they indicate that both processes contributed equally to similarity in smoking behavior among clique members, though they do not describe how they performed this analysis.</p>
<p>The next paper describes a 2009 study by Mercken, et al. (<a title="PDF - Mercken, et al." href="http://stat.gamma.rug.nl/MerckenSnijdersSteglichVartiainenDeVries2009.pdf" target="_blank">PDF</a>) that uses a &#8220;stochastic actor-based model&#8221; to help separate the roles of influence and selection &#8220;by simultaneously representing changes in friendship network structure and changes in smoking behavior among adolescents&#8221; (2). In this study, they interviewed 1,326 subjects from 11 Finnish schools 4 times over the course of 30 months (starting at the beginning of 7th grade). Each time, they asked about the subjects&#8217; friendship ties, their smoking behavior, that of their families, and their alcohol consumption. They found that adolescents who smoked more tended to choose friends who likewise scored high on smoking behavior. Adolescents who smoked less than one cigarette per week were most likely to make friends with classmates who didn&#8217;t smoke at all. For those who smoked one or more cigarettes per week, the most attractive potential friends were those who smoked at the highest rate. The authors did not report findings on the alcohol consumption data, which may hint that patterns of spread differ across behaviors.</p>
<p>A natural question at this point is: are all friends created equal? Most of us growing up had a &#8220;best&#8221; friend in addition to our peer group. A 1997 study by Urberg, et al. (<a title="PDF - Urberg, et al." href="http://www.pitzer.edu/academics/faculty/banerjee/psyc109/readings/w10-CloseFriedGroupInfluence.PDF" target="_blank">PDF</a>) attempts to separate the influence of close friends from that of one&#8217;s peer group on both cigarette smoking and alcohol use (3). In this study, they collected data from 1,028 Midwestern schoolchildren in the 6th, 8th, and 10th grades in two waves, once in the fall and once in the spring, including assessments of friendship ties as well as cigarette and alcohol use. Interestingly, the two behaviors followed different patterns:</p>
<ul>
<li>For smoking, the behavior of the peer group, and not of the close friend, predicted a transition into cigarette use.</li>
<li>For drinking, the behavior of the close friend, and not of the peer group, predicted a transition into alcohol use.</li>
</ul>
<p>They also found that those who had tried cigarettes or alcohol were more likely to know current users than those who had not (echoing the Mercken findings above).</p>
<p>Finally, because I couldn&#8217;t close a post on social networks and health without a Christakis/Fowler study, I wanted to mention a study from March of this year (2010) by Mednick, Christakis, and Fowler (<a title="PDF - Mednick, Christakis, and Fowler" href="http://christakis.med.harvard.edu/pdf/publications/articles/107.pdf" target="_blank">PDF</a>). It examined the interaction of two separate behaviors&#8211;low sleep and drug use&#8211;within a social network (4). Looking at a sample of 8,349 adolescents from the <a title="ADD Health - Home" href="http://www.cpc.unc.edu/projects/addhealth" target="_blank">Add Health</a> data corpus, they presented a number of interesting findings.</p>
<p>First, they found that an individual&#8217;s behavior is correlated with the behavior of others in their network up to 4 degrees away:</p>
<ul>
<li>In the case of sleep, an individual was 29% more likely to sleep 7 hours or less if they had a friend who slept 7 hours or less. A friend of a friend correlated with a 17% increase, down to a 5% increase for a friend of a friend of a friend of a friend.</li>
<li>In the case of marijuana use, a direct connection to a user corresponded to a 190% increase in the likelihood of use, while a 4th-degree connection still correlated with an 11% increase.</li>
</ul>
<p>Another interesting finding was that individuals central in the network were more likely to sleep less: a two-standard-deviation increase in centrality increased the probability of sleeping 7 hours or less by 13% (controlling for other factors). Finally, they report on the interrelation between these behaviors, finding that having a friend who slept 7 hours or less was associated with a 19% increase in the likelihood of smoking marijuana.</p>
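<p>The &#8220;degrees away&#8221; in findings like these are shortest-path distances in the friendship network. The following is a minimal sketch using a hypothetical six-person network, not the study&#8217;s data or code. A breadth-first search groups everyone within four steps of a person by distance. From there it is easy to compute what share of contacts at each distance show a behavior such as short sleep. The published percentages come from statistical models that this sketch does not attempt to reproduce.</p>
<pre>
```python
from collections import deque


def contacts_by_degree(friends, start, max_degree=4):
    """Breadth-first search: map each degree 1..max_degree to the people at
    exactly that shortest-path distance from `start`."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_degree:
            continue  # don't expand past the farthest degree we care about
        for friend in friends.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    layers = {d: [] for d in range(1, max_degree + 1)}
    for person, d in distance.items():
        if d > 0:
            layers[d].append(person)
    return layers


def share_with_behavior(layers, has_behavior):
    """Fraction of contacts at each degree who show the behavior."""
    return {
        d: sum(has_behavior[p] for p in people) / len(people)
        for d, people in layers.items()
        if people
    }


# Hypothetical undirected friendship network (a chain with one branch).
friends = {
    "A": ["B"], "B": ["A", "C", "F"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D"], "F": ["B"],
}
short_sleep = {"A": True, "B": True, "C": False, "D": True, "E": False, "F": True}

layers = contacts_by_degree(friends, "A")
print(layers)
print(share_with_behavior(layers, short_sleep))
```
</pre>
<p>Distances from A in the toy network are:</p>
<ul>
<li>degree 1: B, A&#8217;s only friend;</li>
<li>degree 2: C and F;</li>
<li>degree 3: D;</li>
<li>degree 4: E.</li>
</ul>
<p>Anyone farther than <code>max_degree</code> is simply never reached.</p>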
<p>These four papers served as a useful introduction to both the methods of social network analysis and some of the interesting findings as they pertain to health behaviors and teens (across a number of behaviors &#8211; sleep, smoking, drinking, drugs). It is important for educators and health professionals to understand these social mechanisms, as they will likely be a critical factor in preventing unhealthy behaviors from spreading among teens and persisting in their lives. Perhaps the same mechanisms can be used to spread positive behaviors such as exercise and civic-mindedness. It will also be interesting to see how methods like these can be applied on a larger scale, to Twitter- or Facebook-sized social network corpora, to track the spread of behaviors, ideas, diseases, and more across entire states or countries.</p>
<p>References:</p>
<ol>
<li>Ennett, S.T. and Bauman, K.E. (2000). Adolescent social networks: Friendship cliques, social isolates, and drug use risk. In Hansen, W.B., et al. (Eds.), <em>Improving prevention effectiveness</em>. Greensboro, NC: Tanglewood Research, Inc.</li>
<li>Mercken, L., et al. (2009). Dynamics of adolescent friendship networks and smoking behavior. <em>Social Networks</em>.</li>
<li>Urberg, K.A., Değirmencioğlu, S.M., and Pilgrim, C. (1997). Close friend and group influence on adolescent cigarette smoking and alcohol use. <em>Developmental Psychology</em>, 33(5), 834&#8211;844.</li>
<li>Mednick, S.C., Christakis, N.A., and Fowler, J.H. (2010). The spread of sleep loss influences drug use in adolescent social networks. <em>PLoS ONE</em>, 5(3), e9775.</li>
</ol>
<p><em>Also, I want to thank Sarita Yardi and Vladimir Barash for directing me towards some of these papers.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2010/05/social-networks-health-and-youth/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Can You Spot The Experts? Tagging and Expertise</title>
		<link>http://www.sanjaykairam.com/blog/2010/03/can-you-spot-the-experts-tagging-and-expertise/</link>
		<comments>http://www.sanjaykairam.com/blog/2010/03/can-you-spot-the-experts-tagging-and-expertise/#comments</comments>
		<pubDate>Wed, 24 Mar 2010 16:00:53 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[delicious]]></category>
		<category><![CDATA[domain expertise]]></category>
		<category><![CDATA[expertise]]></category>
		<category><![CDATA[mechanical turk]]></category>
		<category><![CDATA[mturk]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[tag]]></category>
		<category><![CDATA[tagging]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=166</guid>
		<description><![CDATA[I decided to try a little Mechanical Turk study to see if I could spot some differences between tags generated by experts and those generated by novices. I had each Turker read 1 of 5 web pages (on the topic of "enterprise 2.0 mashups") and enter 5 tags which they thought would be useful for bookmarking the page (either for themselves or others). I also asked them to rate how familiar they were with the subject matter ("Not at All", "Slightly Familiar", "Somewhat Familiar", and "I am an Expert")...]]></description>
			<content:encoded><![CDATA[<p>Recently, I&#8217;ve been reading some papers about identifying and harnessing expertise in tagging communities such as <a href="http://www.delicious.com">Delicious</a>. Some of the research I have come across has looked at topics such as:</p>
<ul>
<li>Identifying the features that underlie &#8220;tag quality&#8221; (e.g. <a href="http://www.grouplens.org/system/files/group07-sen.pdf">Sen, et al. (2007)</a>, <a href="http://portal.acm.org/ft_gateway.cfm?id=1531676&amp;type=pdf">Zhang, et al. (2009)</a>)</li>
<li>Topic-based approaches for information retrieval from tagged collections (e.g. <a href="http://www.cse.psu.edu/~dzhou/papers/www08-tags.pdf">Zhou, et al. (2008)</a>)</li>
<li>Graph-based algorithms for ranking based on user tags (e.g. <a href="http://www.kde.cs.uni-kassel.de/hotho/pub/.../seach2006hotho_eswc.pdf">Hotho, et al. (2006)</a>, <a href="http://www.michael-noll.com/.../telling-experts-from-spammers-expertise-ranking-in-folksonomies/">Noll, et al. (2009)</a>)</li>
</ul>
<p>I decided to try a little Mechanical Turk study to see if I could spot some differences between tags generated by experts and those generated by novices. I had each Turker read 1 of 5 web pages (on the topic of &#8220;enterprise 2.0 mashups&#8221;) and enter 5 tags which they thought would be useful for bookmarking the page (either for themselves or others). I also asked them to rate how familiar they were with the subject matter (&#8220;Not at All&#8221;, &#8220;Slightly Familiar&#8221;, &#8220;Somewhat Familiar&#8221;, and &#8220;I am an Expert&#8221;).</p>
<p>As a game, I thought it would be interesting to post some of the responses to see how easy it is to identify which tags were generated by people who rated themselves as &#8220;experts&#8221; vs. &#8220;non-experts&#8221;. I took all of the tags generated by each expertise group and corrected minor spelling mistakes and typos (e.g., &#8220;applciation&#8221; to &#8220;application&#8221;). I then generated a tag cloud using <a href="http://www.wordle.net/">Wordle</a>, where the size of each tag corresponds to how frequently it was used (all other factors, such as positioning and color, are purely stylistic).</p>
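<p>The cleanup-and-count step behind each cloud can be sketched in a few lines of Python. This is a hypothetical illustration: the correction map and tag lists are invented, and Wordle&#8217;s own sizing and layout algorithm isn&#8217;t reproduced. The sketch works in four steps:</p>
<ol>
<li>It lowercases each tag.</li>
<li>It applies a small hand-made typo map like the &#8220;applciation&#8221; fix above.</li>
<li>It counts how often each tag appears across a group.</li>
<li>It scales the counts linearly onto a range of font sizes.</li>
</ol>
<pre>
```python
from collections import Counter

# Hand-made typo fixes; this map is invented for illustration.
CORRECTIONS = {"applciation": "application"}


def tag_frequencies(responses):
    """Count cleaned tags across all responses (each response is a list of tags)."""
    counts = Counter()
    for tags in responses:
        for tag in tags:
            cleaned = tag.strip().lower()
            counts[CORRECTIONS.get(cleaned, cleaned)] += 1
    return counts


def font_sizes(counts, min_size=10, max_size=48):
    """Scale each tag's count linearly onto [min_size, max_size] points."""
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts match
    return {
        tag: round(min_size + (count - low) / span * (max_size - min_size))
        for tag, count in counts.items()
    }


# Hypothetical responses: each Turker entered five tags.
responses = [
    ["Mashup", "enterprise", "web2.0", "SOA", "applciation"],
    ["mashup", "Enterprise", "business", "web", "application"],
]
counts = tag_frequencies(responses)
print(counts.most_common(3))
print(font_sizes(counts))
```
</pre>
<p>In this example, &#8220;mashup&#8221;, &#8220;enterprise&#8221;, and &#8220;application&#8221; (after the typo fix) each appear twice and get the largest size, while every tag used once gets the smallest.</p>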
<p>For the following URL &#8211; <a href="http://www.soamag.com/I18/0508-1.php">http://www.soamag.com/I18/0508-1.php</a> &#8211; can you identify which tag cloud belongs to which of these groups: &#8220;Not at All (Familiar)&#8221;, &#8220;Slightly Familiar&#8221;, and &#8220;Somewhat Familiar&#8221; (there was a 4th category of &#8220;I am an Expert&#8221;, but nobody rating this URL classified themselves this way):</p>
<div id="attachment_167" class="wp-caption aligncenter" style="width: 460px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/03/URL3-Expertise1.jpg"><img class="size-medium wp-image-167" title="Tag Cloud 1" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/03/URL3-Expertise1-300x150.jpg" alt="Tag Cloud 1" width="450" height="224" /></a><p class="wp-caption-text">Tag Cloud 1 (N = 17)</p></div>
<div id="attachment_168" class="wp-caption aligncenter" style="width: 460px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/03/URL3-Expertise2.jpg"><img class="size-medium wp-image-168" title="Tag Cloud 2" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/03/URL3-Expertise2-300x95.jpg" alt="Tag Cloud 2" width="450" height="142" /></a><p class="wp-caption-text">Tag Cloud 2 (N = 16)</p></div>
<div id="attachment_170" class="wp-caption aligncenter" style="width: 460px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/03/URL3-Expertise0.jpg"><img class="size-medium wp-image-170" title="Tag Cloud 3" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/03/URL3-Expertise0-300x139.jpg" alt="Tag Cloud 3" width="450" height="207" /></a><p class="wp-caption-text">Tag Cloud 3 (N = 14)</p></div>
<p>If you have any idea which tag cloud is which, please post your guess in the comments! I&#8217;d be very curious to know why people guessed the way they did. I am currently having some Turkers do the same task; if you are curious about the answers, come back for my follow-up post, where I will reveal the correct answers along with the results of the Mechanical Turk evaluation of the tag clouds.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2010/03/can-you-spot-the-experts-tagging-and-expertise/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Anatomy of a Paper about a Large-Scale Social Search Engine</title>
		<link>http://www.sanjaykairam.com/blog/2010/02/anatomy-of-a-paper-about-a-large-scale-social-search-engine/</link>
		<comments>http://www.sanjaykairam.com/blog/2010/02/anatomy-of-a-paper-about-a-large-scale-social-search-engine/#comments</comments>
		<pubDate>Fri, 05 Feb 2010 21:43:22 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[aardvark]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[PageRank]]></category>
		<category><![CDATA[papers]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[social]]></category>
		<category><![CDATA[social search]]></category>
		<category><![CDATA[the mechanical zoo]]></category>
		<category><![CDATA[WWW]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=135</guid>
		<description><![CDATA[Earlier this week, the team at Aardvark unveiled a new paper, "The Anatomy of a Large-Scale Social Search Engine", which will be presented in April at WWW 2010. It is inspired by and patterned after "The Anatomy of a Large-Scale Hypertextual Web Search Engine", which describes the PageRank algorithm that drives Google's search ranking system (and which, as Aardvark's blog points out, was also presented at WWW 12 years ago). The paper, by Aardvark's Damon Horowitz and Stanford's Sep Kamvar, focuses mostly on the architecture of the Aardvark system, from the external representations with which users interact to the internal ranking algorithms on which the system runs. Below, I present a short summary of what they report, focusing on the elements I found most interesting.]]></description>
			<content:encoded><![CDATA[<p>Earlier this week, the team at Aardvark unveiled a new paper, &#8220;<a title="Aardvark Blog - Anatomy of a Large-Scale Social Search Engine" href="http://blog.vark.com/?p=352" target="_blank">The Anatomy of a Large-Scale Social Search Engine</a>&#8221;, which will be presented in April at <a title="WWW2010 - Home" href="http://www2010.org/www/" target="_blank">WWW 2010</a>. It is inspired by and patterned after &#8220;<a title="Stanford InfoLab - Google" href="http://infolab.stanford.edu/~backrub/google.html">The Anatomy of a Large-Scale Hypertextual Web Search Engine</a>&#8221;, which describes the <a title="Wikipedia - PageRank" href="http://en.wikipedia.org/wiki/PageRank" target="_blank">PageRank</a> algorithm that drives Google&#8217;s search ranking system (and which, as Aardvark&#8217;s blog points out, was also presented at WWW 12 years ago).</p>
<p>The paper, by Aardvark&#8217;s Damon Horowitz and Stanford&#8217;s Sep Kamvar, focuses mostly on the architecture of the Aardvark system, from the external representations with which users interact to the internal ranking algorithms on which the system runs. Below, I present a short summary of what they report, focusing on the elements I found most interesting:</p>
<p><strong>The Basic Model</strong>: Aardvark&#8217;s scoring function is similar to Google&#8217;s PageRank-based ranking in that both combine two primary, largely independent components: <em>relevance</em> and <em>quality</em>.</p>
<ul>
<li><em>Relevance</em> in the Aardvark model is the probability that a particular user <em>i</em> can answer the given question <em>q</em>, based on the topics <em>t</em> identified in the question.</li>
<li><em>Quality</em> in the Aardvark model is the overall probability that a user <em>i</em> can return a satisfactory answer to another user <em>j</em>, regardless of the question.</li>
</ul>
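<p>As a rough illustration (not the paper&#8217;s code), here is one way the two components could be combined. It assumes that relevance is computed by summing over topics, p(i|q) = &#931;<sub>t</sub> p(i|t)&#183;p(t|q), and that the final score multiplies quality by relevance. The user names and probabilities are made up.</p>

```python
def relevance(user_topic_prob, question_topic_prob):
    """Approximate p(i | q) as the sum over topics t of p(i | t) * p(t | q)."""
    return sum(p_t_given_q * user_topic_prob.get(topic, 0.0)
               for topic, p_t_given_q in question_topic_prob.items())


def score(quality, user_topic_prob, question_topic_prob):
    """Assumed combination: quality p(i | j) times relevance p(i | q)."""
    return quality * relevance(user_topic_prob, question_topic_prob)


def rank_candidates(candidates, asker, question_topic_prob):
    """Return candidate answerer ids sorted from highest to lowest score."""
    scored = {
        user_id: score(info["quality"][asker], info["topics"], question_topic_prob)
        for user_id, info in candidates.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Made-up users and probabilities, for illustration only.
candidates = {
    "alice": {"quality": {"asker": 0.9}, "topics": {"restaurants": 0.6, "travel": 0.2}},
    "bob": {"quality": {"asker": 0.5}, "topics": {"restaurants": 0.9}},
    "carol": {"quality": {"asker": 0.8}, "topics": {"python": 0.7}},
}
question = {"restaurants": 0.8, "travel": 0.2}  # hypothetical p(t | q)
print(rank_candidates(candidates, "asker", question))  # ['alice', 'bob', 'carol']
```

<p>Bob knows more about restaurants than Alice does, but this sketch still ranks Alice first because her connection to the asker is stronger. The product form makes that trade-off explicit.</p>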
<p><strong>Indexing Topics:</strong> To compute the relevance score, Aardvark estimates how each user&#8217;s knowledge is distributed over topics, drawing on the following sources (the keyword-style italicized labels are mine, for convenience; they are not used in the paper):</p>
<ul>
<li><em>Explicit Prompting</em> at sign-up for three &#8220;starter&#8221; topics about which the user has expertise.</li>
<li><em>Social Prompting</em> of a user&#8217;s friends to provide topics about which they trust the user&#8217;s opinion.</li>
<li><em>Structured Parsing</em> of the online profile pages connected to Aardvark by the user (e.g. &#8220;Interests&#8221; on a Facebook profile).</li>
<li><em>Unstructured Parsing</em> of the user&#8217;s online homepage, blog, or status updates, using a linear SVM to extract the overall subject area and a named-entity extractor to extract more specific topics.</li>
</ul>
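<p>Here is a rough sketch of how evidence from these sources might be merged into a single topic distribution per user. The source weights, the strength values, and the simple weighted-sum-then-normalize approach are all my assumptions, not details from the paper.</p>

```python
from collections import defaultdict

# Hypothetical per-source weights; the post does not say how sources are weighted.
SOURCE_WEIGHTS = {
    "explicit": 1.0,      # starter topics chosen at sign-up
    "social": 0.8,        # topics suggested by the user's friends
    "structured": 0.6,    # e.g. "Interests" on a linked Facebook profile
    "unstructured": 0.4,  # topics extracted from a homepage, blog, or status updates
}


def topic_distribution(evidence):
    """Turn (source, topic, strength) tuples into a normalized {topic: probability} map."""
    scores = defaultdict(float)
    for source, topic, strength in evidence:
        scores[topic] += SOURCE_WEIGHTS[source] * strength
    total = sum(scores.values())
    if total == 0:
        return {}
    return {topic: s / total for topic, s in scores.items()}


evidence = [
    ("explicit", "cooking", 1.0),
    ("social", "cooking", 1.0),
    ("structured", "jazz", 1.0),
    ("unstructured", "travel", 0.5),  # e.g. a classifier's confidence
]
dist = topic_distribution(evidence)
print(max(dist, key=dist.get))  # cooking
```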
<p><strong>Indexing Connections:</strong> Aardvark computes the quality score by building a set of weighted connections between users using characteristics ranging from social proximity to similarities in demographics or behavior, such as:</p>
<ul>
<li><em>Social Connections</em> either in the form of explicitly defined &#8220;friend&#8221; connections or implicit &#8220;network&#8221; connections, such as both being part of the Stanford network.</li>
<li><em>Demographic Similarity</em>, which likely includes age, gender, and location based on profile information collected by Aardvark.</li>
<li><em>Profile Similarity</em>, which seems to include similar movies and other items which might be listed on other profiles, such as Facebook.</li>
<li><em>Vocabulary Match</em>, which they illustrate with the example of &#8220;IM Shortcuts&#8221; (I assume this is based on the language you use when interacting with Aardvark, but I am unsure).</li>
<li><em>Chattiness and Verbosity Match</em>, which relate to frequency and length of messages used when interacting with Aardvark.</li>
<li><em>Politeness Match</em>, which seems to mean whether or not users say &#8220;Thanks!&#8221;.</li>
<li><em>Speed Match</em>, which is a measure of responsiveness to other users.</li>
</ul>
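<p>Here is a similarly hypothetical sketch that turns these match features into a single connection weight between two users. The feature names mirror the list above, but the weights and the weighted-average formula are illustrative assumptions.</p>

```python
# Hypothetical feature weights; the paper's actual model is not given in this post.
FEATURE_WEIGHTS = {
    "social": 2.0,       # 1.0 for an explicit friend or shared network, else 0.0
    "demographic": 0.5,
    "profile": 0.5,
    "vocabulary": 0.3,
    "chattiness": 0.2,
    "politeness": 0.2,
    "speed": 0.3,
}


def connection_strength(features):
    """Weighted average of the similarity features (each in [0, 1]) that are present.

    Missing features are skipped rather than treated as zero, so a sparse profile
    is not penalized for information the system simply does not have.
    """
    known = {name: value for name, value in features.items() if name in FEATURE_WEIGHTS}
    total_weight = sum(FEATURE_WEIGHTS[name] for name in known)
    if total_weight == 0:
        return 0.0
    return sum(FEATURE_WEIGHTS[name] * value for name, value in known.items()) / total_weight


friend = connection_strength({"social": 1.0, "politeness": 1.0, "speed": 0.5})
stranger = connection_strength({"social": 0.0, "politeness": 1.0, "speed": 0.5})
print(round(friend, 2), round(stranger, 2))
```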
<p><strong>Analyzing Questions:</strong> While all of the other components are pre-computed, this part is (obviously) computed at question time. The system uses a number of classifiers to categorize the question and then a set of mappers to map it to a set of topics; the authors note that &#8220;the role of the Question Analyzer&#8230;is simply to learn enough about the question that it may be sent to appropriately interested and knowledgeable human answerers&#8221;. Here are the classifiers and mappers they list (with the names used in the paper):</p>
<ul>
<li><em>NonQuestionClassifier:</em> Determines if input is a valid question.</li>
<li><em>InappropriateQuestionClassifier:</em> Determines if input is obscene, spam, or otherwise unsuitable for asking.</li>
<li><em>TrivialQuestionClassifier:</em> Determines if input is a simple factual question (examples given: &#8220;What time is it now?&#8221;, &#8220;What is the weather?&#8221;). If so, the user gets an automatically generated answer via traditional web search.</li>
<li><em>LocationSensitiveClassifier:</em> Determines if the question contains location information; if it does, that information is passed along to the Routing Engine.</li>
</ul>
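<p>The classifiers act as a series of gates before a question is routed. Below is a toy sketch of that flow, with simple keyword rules standing in for the trained classifiers the paper describes. The word lists, regular expressions, and return values are all hypothetical, and the order of the checks is my assumption.</p>

```python
import re

# Simple rule-based stand-ins for the paper's trained classifiers (hypothetical).
QUESTION_WORDS = {"what", "what's", "where", "who", "when", "why", "how",
                  "which", "can", "does", "is", "are", "should", "any"}
BLOCKLIST = {"spam", "xxx"}  # placeholder list of unacceptable words
TRIVIAL_PATTERNS = [r"\bwhat time is it\b", r"\bwhat is the weather\b"]
LOCATION_PATTERN = re.compile(r"\bin ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")


def analyze_question(text):
    """Run the checks in order and stop at the first one that decides the outcome."""
    stripped = text.strip()
    lowered = stripped.lower()
    words = [w.strip("?.,!") for w in lowered.split()]

    # NonQuestionClassifier stand-in: needs a question mark or a leading question word.
    if not stripped.endswith("?") and (not words or words[0] not in QUESTION_WORDS):
        return {"action": "reject", "reason": "not a question"}
    # InappropriateQuestionClassifier stand-in.
    if any(w in BLOCKLIST for w in words):
        return {"action": "reject", "reason": "inappropriate"}
    # TrivialQuestionClassifier stand-in: answer automatically via web search.
    if any(re.search(p, lowered) for p in TRIVIAL_PATTERNS):
        return {"action": "web_search"}
    # LocationSensitiveClassifier stand-in: pass any capitalized place name on to routing.
    match = LOCATION_PATTERN.search(stripped)
    return {"action": "route", "location": match.group(1) if match else None}


print(analyze_question("What's a great restaurant in San Francisco?"))
```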
<ul>
<li><em>KeywordMatchTopicMapper:</em> Checks for string matches against user profile topics (the mapper attempts to classify meaningful vs. spurious matches).</li>
<li><em>TaxonomyTopicMapper:</em> Classifies question text using an SVM trained on an &#8220;annotated corpus of several million questions&#8221; (<strong>where did they find that?</strong>)</li>
<li><em>SalientTermTopicMapper:</em> Extracts salient phrases using a noun-phrase chunker and tf-idf and finds &#8220;semantically similar user topics&#8221;.</li>
<li><em>UserTagTopicMapper:</em> Uses tags explicitly provided by the asker or by other answerers and maps them to user topics.</li>
</ul>
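<p>To make the <em>SalientTermTopicMapper</em> idea concrete, here is a simplified sketch. It scores question words by tf-idf against a background corpus of past questions, then keeps the user topics that share a word with the top-scoring terms. It uses single words rather than a noun-phrase chunker, and plain word overlap rather than semantic similarity. The corpus and topics are made up.</p>

```python
import math
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "is", "are", "what", "where", "how", "in", "for", "of"}


def tokenize(text):
    """Lowercase word tokens with a tiny stopword list removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def salient_terms(question, corpus, k=3):
    """Rank question words by tf-idf against a background corpus of past questions."""
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))
    term_freq = Counter(tokenize(question))
    scores = {
        term: count * math.log((1 + len(corpus)) / (1 + doc_freq[term]))
        for term, count in term_freq.items()
    }
    # Sort by descending score, breaking ties alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def map_to_topics(terms, user_topics):
    """Keep the user topics that share at least one word with a salient term."""
    wanted = set(terms)
    return sorted(topic for topic in user_topics if set(tokenize(topic)) & wanted)


corpus = ["what is a good restaurant", "what is a good book",
          "where is a good hotel", "what is the best laptop"]
terms = salient_terms("what is a good vegetarian restaurant", corpus, k=2)
print(terms)  # ['vegetarian', 'restaurant']
print(map_to_topics(terms, ["restaurant reviews", "jazz", "vegetarian cooking", "laptops"]))
```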
<p>The description of the routing algorithm makes up the core of the paper. After describing how users interact with the system, the authors present some interesting data collected over the past several months of use (from the beta launch in March 2009 until October 2009). Here&#8217;s a quick run-down of the more interesting facts they presented:</p>
<ul>
<li><em>Strong User Growth: </em>As of October 2009, they reported 90,361 user accounts, and users appear to be remaining active (in the study period, over 1/2 the users actively generated content and over 2/3 of the users passively participated).</li>
</ul>
<div id="attachment_139" class="wp-caption aligncenter" style="width: 402px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/02/aardvarkusers.png"><img class="size-full wp-image-139" title="Aardvark User Growth" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/02/aardvarkusers.png" alt="Aardvark User Growth" width="392" height="331" /></a><p class="wp-caption-text">User Growth on Aardvark (graph taken from the paper).</p></div>
<ul>
<li><em>Higher Query Contextualization:</em> Aardvark queries average 18.6 words in length, while the average query length reported for web search is between 2.2 and 2.9 words (citing previous comparison and characterization studies). They further state that &#8220;98.1% of questions are unique&#8221;, though I am unsure how strictly they define a match (I am sure the question &#8220;What&#8217;s a great restaurant in SF?&#8221; has been asked 1000 times in different forms). In addition, based on manual scoring of 1000 randomly selected questions, they report that 64.7% of questions have a subjective element, with advice about travel, restaurants, and products being especially popular.</li>
<li><em>Fast, High-Quality Answers:</em> They report that 87.7% of questions receive answers and 57.2% receive an answer within 10 minutes. Of the answers that received feedback, 70.4% were rated as &#8220;good&#8221; and only 15.5% as &#8220;bad&#8221;. Interestingly, they observe a notable difference in feedback between answers from users within the asker&#8217;s social network (76% rated as good) and answers from users outside it (68% rated as good).</li>
</ul>
<div id="attachment_138" class="wp-caption aligncenter" style="width: 503px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/02/aardvarkquestions.png"><img class="size-full wp-image-138" title="Aardvark Questions" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/02/aardvarkquestions.png" alt="Aardvark Questions" width="493" height="229" /></a><p class="wp-caption-text">Questions on Aardvark (chart taken from the paper).</p></div>
<p>Overall, I really enjoyed reading this paper. Having used Aardvark for over a year now, I found it really interesting to peer inside and see how the system works, and the paper provides a lot of great detail about the ranking engine.</p>
<p>One place where I feel the authors missed the mark was the cursory side-by-side evaluation that pitted Aardvark against Google on a set of 200 questions randomly selected from the Aardvark system. They report that 71.5% of the questions were answered successfully on Aardvark, compared with 70.5% on Google. This comparison seems mostly uninformative: because the questions were pulled from Aardvark in the first place, they are ones that users specifically chose to ask there, and so they are likely better suited to what is being called &#8216;social search&#8217;. The comparison left me wanting more investigation into two main questions.</p>
<p><em>&#8220;What makes a search engine &#8216;social&#8217; in the first place?&#8221;</em></p>
<p>The distinction between social and non-social is extremely murky, something Brynn and I discovered when working on our <a title="Sanjay Kairam - Cognitive Consequences of Social Search (PDF)" href="http://sanjaykairam.com/papers/evans-kairam-pirolli-inSubmission.pdf" target="_blank">Social Search paper</a>. It has been argued before (one small example <a title="Brynn Evans' Blog - Comment by Manas Tungare" href="http://brynnevans.com/blog/2009/01/30/why-social-search-wont-topple-google-anytime-soon/#comment-1933">here</a>) that Google&#8217;s PageRank algorithm is inherently social, as it aggregates information provided by people (the links they create between pages) to rank results. However, something clearly seems categorically different between Google and what people perceive to be &#8216;social search&#8217;. When it comes down to it, even though everyone is excited about <a title="Google Blog - Search is getting more social" href="http://googleblog.blogspot.com/2010/01/search-is-getting-more-social.html" target="_blank">Google&#8217;s forays into &#8220;Social Search&#8221;</a>, there&#8217;s nothing all that fundamentally different between Google indexing your blog and your tweets and Google indexing any other documents on the web.</p>
<p>To me, it seems that the key difference is really the change in the <strong>direction of interaction</strong>. While Google takes a query (question) and compares it against traces of past discussion about that question (web documents), systems perceived as &#8216;social&#8217; take a question and attempt to generate new answers in the future. This change in direction is what allows for the additional context that makes &#8216;social&#8217; search answers so much richer (at least for some questions). Perhaps we need a different word for this phenomenon: &#8216;real-time search&#8217; gets closer, but has its own problems. Perhaps something like &#8216;generative search&#8217;? I really don&#8217;t know.</p>
<p><em>&#8220;Why do we need a social search engine at all?&#8221;</em></p>
<p>This one seems like the best fodder for a follow-up study by Aardvark. While they do provide a rough breakdown of the types of questions asked on Aardvark (see pie chart above), I think the comparison would have been much more interesting if they had looked at a variety of classes of user needs and compared the relative efficacy of searching on Aardvark versus a traditional search engine such as Google. It is clear that &#8216;social&#8217; will work much better for some needs and much worse for others, but up to this point, people who talk about social search always seem to use the same types of examples (travel, restaurants, and products, for instance). It would be great to get a clear picture, across a wide range of needs and use cases, of where systems such as Aardvark can provide benefits over existing tools.</p>
<p>Anyways, for those of you interested in &#8216;social search&#8217; and search systems, I encourage you to read this paper and tell me your thoughts!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2010/02/anatomy-of-a-paper-about-a-large-scale-social-search-engine/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Journal of Serendipitous and Unexpected Results</title>
		<link>http://www.sanjaykairam.com/blog/2010/02/journal-of-serendipitous-and-unexpected-results/</link>
		<comments>http://www.sanjaykairam.com/blog/2010/02/journal-of-serendipitous-and-unexpected-results/#comments</comments>
		<pubDate>Tue, 02 Feb 2010 18:10:17 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[announcement]]></category>
		<category><![CDATA[journal]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[serendiptiy]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=133</guid>
		<description><![CDATA[I recently received an email message announcing the creation of a new journal, entitled The Journal of Serendipitous and Unexpected Results (JSUR), which focuses on reporting research efforts that differ from "what is traditionally thought of as a publishable result."  They are looking for papers in both Computer Science and Life Science.]]></description>
			<content:encoded><![CDATA[<p>I recently received an email announcing the creation of a new journal, entitled <a title="JSUR - Home Page" href="http://jsur.org/" target="_blank"><em>The Journal of Serendipitous and Unexpected Results</em></a> (<em>JSUR</em>), which focuses on reporting research efforts that differ from &#8220;what is traditionally thought of as a publishable result.&#8221; They are looking for papers in both Computer Science and Life Science. (Here is a page describing <a title="JSUR - Contribution Types" href="http://jsur.org/node/contribution" target="_blank">contribution types</a> in more detail.) Below are the announcement and call for papers:</p>
<blockquote><p>Most research effort does not produce what is thought of as a traditionally publishable result.  That doesn&#8217;t mean, however, that nothing was gained by conducting the research.  These results, whether they are failures or merely perplexing, can provide valuable insights into open problems and prevent other researchers from duplicating work.</p>
<p>We have started a journal that focuses on serendipitous (I have no idea why this worked) and unexpected (it seems like this technique should work on this problem but it doesn&#8217;t) results.  The goal is to provide a venue for the dissemination and discussion of ideas and to enable more efficient research.</p>
<p>The Journal of Serendipitous and Unexpected Results (JSUR) is an open-access forum for researchers seeking to further scientific discovery by sharing surprising or unexpected results. These results should provide guidance toward the verification (or negation) of extant hypotheses.  JSUR has two branches, one focusing on computational sciences and the other on the life sciences.  JSUR submissions include, but are not limited to, short communications of recent research results, full-length papers, review articles, and opinion pieces.</p>
<p>Recently, we launched the beta version of the journal site at http://jsur.org .  We would love to get your feedback and even better, a submission for the first issue.</p>
<p>To get the journal started, we&#8217;re looking to collect a large number of short (2-4 page) reports. I know you have something to publish.</p>
<p>Please help us spread the word and forward this information to interested colleagues.</p></blockquote>
<p>Based on the <a title="JSUR - Author Guidelines" href="http://jsur.org/node/authorguidelines" target="_blank">author guidelines</a>, it appears that submissions will be reviewed both by an editorial board and through a peer review process. Interestingly, all articles will be open-access and licensed under Creative Commons. I&#8217;ll definitely be happy to read some of the articles that come through here once they start publishing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2010/02/journal-of-serendipitous-and-unexpected-results/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What Makes Web Sites Credible? 10 Years Later</title>
		<link>http://www.sanjaykairam.com/blog/2010/01/what-makes-web-sites-credible-10-years-later/</link>
		<comments>http://www.sanjaykairam.com/blog/2010/01/what-makes-web-sites-credible-10-years-later/#comments</comments>
		<pubDate>Fri, 22 Jan 2010 22:04:54 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[credibility]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[mechanical turk]]></category>
		<category><![CDATA[mturk]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[survey]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=118</guid>
		<description><![CDATA[Last month, I read a study by B. J. Fogg and others from the Persuasive Technology Lab at Stanford, entitled "What Makes Web Sites Credible? A Report on a Large Quantitative Study". The paper described an early effort to systematically determine how different elements of web sites affect people's perceptions of credibility (defined roughly as the intersection of trustworthiness and expertise). The original study design had 1400 participants completing a survey which presented them with 51 web site elements and asked them to rate how much more or less each element would affect the believability of a web site. The two questions I hope to answer, roughly, are "What has changed in the past 10 years about how people assess web site credibility?" and "Is there a cheaper, yet effective, way to do a study like this?". The results have implications for website design.]]></description>
			<content:encoded><![CDATA[<p>Last month, I read a study by <a title="B.J. Fogg - Home Page" href="http://www.bjfogg.com/" target="_blank">B. J. Fogg</a> and others from the <a title="Stanford University - Persuasive Technology Lab" href="http://captology.stanford.edu/" target="_blank">Persuasive Technology Lab at Stanford</a>, entitled &#8220;<a title="Paper Link (PDF)" href="http://www.google.com/url?sa=t&amp;source=web&amp;ct=res&amp;cd=1&amp;ved=0CAsQFjAA&amp;url=http%3A%2F%2Fcaptology.stanford.edu%2Fpdf%2Fp61-fogg.pdf&amp;ei=ASBaS5W-G4TYsgPIu7HNBA&amp;usg=AFQjCNFfC0AVUNm3lt3gOVxE-DCbs5pI-A&amp;sig2=mkyXRpQBujCx6-j0MMFJ3Q" target="_blank">What Makes Web Sites Credible? A Report on a Large Quantitative Study</a>&#8220;. The paper described an early effort to systematically determine how different elements of web sites affect people&#8217;s perceptions of <a title="Wikipedia - Credibility" href="http://en.wikipedia.org/wiki/Credibility" target="_blank">credibility</a> (defined roughly as the intersection of trustworthiness and expertise). The original study design had 1400 participants completing a survey which presented them with 51 web site elements and asked them to rate how much more or less each element would affect the believability of a web site.</p>
<p>Upon reading this study, I noticed two things:</p>
<p>1) The original experiment recruited participants by offering to donate $10 to charity for their time, resulting in a net cost of at least $14K (1400 participants &#215; $10). This raised the question: <em>Given the growth of sites like <a title="Amazon Mechanical Turk" href="http://www.mturk.com" target="_blank">Mechanical Turk</a>, can we get comparable results for less money?</em></p>
<p>2) The data was originally collected in December 1999, almost exactly 10 years ago. Content on the web and our interactions with it have changed a great deal since then &#8211; I mean, in 1999, <a title="Google - Corporate Milestones" href="http://www.google.com/corporate/history.html" target="_blank">Google had only existed for a year</a>, blogs were still relatively uncommon (<a title="LiveJournal - Wikipedia" href="http://en.wikipedia.org/wiki/LiveJournal" target="_blank">LiveJournal had just started</a>), and <a title="Wikipedia - Historical Overview by Year" href="http://en.wikipedia.org/wiki/History_of_Wikipedia#Historical_overview_by_year" target="_blank">Wikipedia didn&#8217;t even exist yet</a>! This raised the question: <em>Given that the average Internet user now has far more experience navigating the Web, should we expect the responses to be different 10 years later?</em></p>
<p>I decided to explore both of these questions through a survey on Mechanical Turk; in December 2009, I posted a HIT to Mechanical Turk replicating the original study as closely as possible. The 51 statements about web site elements were available from Fogg&#8217;s original paper, the same 7-point Likert scale was used, and the survey items were randomized.  I paid $0.05 per HIT, and I ended up getting 327 responses, with none thrown out due to quality.</p>
<p>An initial, high-level examination showed that the responses actually matched the 1999 data fairly well. Average Likert ratings for each item correlated highly with the average ratings for the same items in the 1999 data (R<sup>2</sup> = 0.96). One difference was that the 2009 answers were compressed overall (closer to 0), so for the purposes of comparison, I rescaled the 2009 data using the linear transformation 1.1677X &#8211; 0.0003 to match the 1999 data (note that a linear transformation like this has no effect on the correlation coefficient).</p>
<p>The first analysis in Fogg et al. was to separate the various elements into 7 &#8220;scales&#8221; using factor analysis (<em>Real-World Feel, Ease of Use, Expertise, Trustworthiness, Tailoring, Commercial Implications,</em> and <em>Amateurism</em>). Below, I present comparisons for the items in each scale. I highlighted items whose ratings deviated from the 1999 values by more than 0.25 (without the original data, I couldn&#8217;t do a more in-depth comparison); this gives a rough idea of which elements are <strong>more</strong> important now than they were in 1999 (red) and which are <strong>less</strong> important (purple):</p>
<div id="attachment_120" class="wp-caption aligncenter" style="width: 500px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide1.jpg"><img class="size-full wp-image-120" title="Real-World Feel Scale" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide1.jpg" alt="Real-World Feel Scale" width="490" height="211" /></a><p class="wp-caption-text">Elements related to the &quot;Real-World Feel&quot; scale are rated similarly in 1999 and 2009.</p></div>
<div id="attachment_121" class="wp-caption aligncenter" style="width: 498px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide2.jpg"><img class="size-full wp-image-121" title="&quot;Ease of Use&quot; Scale" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide2.jpg" alt="&quot;Ease of Use&quot; Scale" width="488" height="209" /></a><p class="wp-caption-text">One difference is that people seem more critical of long download times than in the past.</p></div>
<div id="attachment_122" class="wp-caption aligncenter" style="width: 500px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide3.jpg"><img class="size-full wp-image-122" title="Expertise Scale" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide3.jpg" alt="Expertise Scale" width="490" height="304" /></a><p class="wp-caption-text">For some reason, displaying an award seems to help a site&#39;s credibility more than it did 10 years ago, while providing a lot of news stories matters less.</p></div>
<div id="attachment_123" class="wp-caption aligncenter" style="width: 501px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide4.jpg"><img class="size-full wp-image-123" title="Trustworthiness Scale" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide4.jpg" alt="Trustworthiness Scale" width="491" height="285" /></a><p class="wp-caption-text">Stating your policy on content and having a domain ending in &quot;.org&quot; matter more to people now; could this be a cultural shift in response to sites like Wikipedia? Overall, links out to other sites seem to matter less for credibility now.</p></div>
<div id="attachment_124" class="wp-caption aligncenter" style="width: 503px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide5.jpg"><img class="size-full wp-image-124" title="Tailoring Scale" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide5.jpg" alt="Tailoring Scale" width="493" height="195" /></a><p class="wp-caption-text">Email confirmations are more important now than in 1999.</p></div>
<div id="attachment_125" class="wp-caption aligncenter" style="width: 501px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide6.jpg"><img class="size-full wp-image-125" title="Commercial Implications" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide6.jpg" alt="Commercial Implications" width="491" height="356" /></a><p class="wp-caption-text">Outside advertisements and an e-commerce focus matter more for credibility now than in the past. People are paying less attention to the commercial purpose of sites, as well as to the number of ads and their integration with content (Google is a favorite site for many, after all).</p></div>
<div id="attachment_126" class="wp-caption aligncenter" style="width: 502px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide7.jpg"><img class="size-full wp-image-126" title="Amateurism Scale" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide7.jpg" alt="Amateurism Scale" width="492" height="367" /></a><p class="wp-caption-text">People seem to notice domain name mismatches more now (perhaps due to greater public awareness of phishing and identity theft?), but pay less attention to whether a site is multilingual.</p></div>
<div id="attachment_127" class="wp-caption aligncenter" style="width: 501px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide8.jpg"><img class="size-full wp-image-127" title="Other Elements" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide8.jpg" alt="Other Elements" width="491" height="181" /></a><p class="wp-caption-text">People seem more willing to trust free financial sites now than in the past.</p></div>
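<p>The highlighting rule used in the figures above can be sketched as follows. The 0.25 cutoff comes from the post. Judging &#8220;more important&#8221; by a larger absolute rating (further from the neutral midpoint of 0) is my assumption, and the item names and numbers are made up.</p>

```python
THRESHOLD = 0.25  # cutoff used for the highlighting in this post


def flag_changes(means_1999, means_2009_rescaled, threshold=THRESHOLD):
    """Label each item 'more', 'less', or None (unchanged) relative to 1999.

    An item is flagged only if its rescaled 2009 mean differs from the 1999 mean
    by more than the threshold; the direction is judged by absolute size, i.e.
    distance from the neutral midpoint of 0 (an assumption).
    """
    flags = {}
    for item, old in means_1999.items():
        new = means_2009_rescaled[item]
        if abs(new - old) <= threshold:
            flags[item] = None
        elif abs(new) > abs(old):
            flags[item] = "more"  # drawn in red in the figures
        else:
            flags[item] = "less"  # drawn in purple in the figures
    return flags


# Made-up ratings on an assumed -3..+3 scale, for illustration only.
items_1999 = {"item A": -1.0, "item B": 0.8, "item C": 0.9}
items_2009 = {"item A": -1.4, "item B": 0.4, "item C": 1.0}
print(flag_changes(items_1999, items_2009))
# {'item A': 'more', 'item B': 'less', 'item C': None}
```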
<p>Anyways, as we might have guessed, the answers from 2009 seem to match the answers from 1999 pretty well.  The elements that made things highly credible or highly &#8216;un-credible&#8217; in the past seem to have remained constant, and those which didn&#8217;t matter then seem not to matter too much now.  Some interesting elements are noted in the captions with some rough conjectures as to why some of them might be trending the way they are.</p>
<p>The rest of the Fogg paper focused on characterizing differences in scale responses due to demographic differences between participants, but I found that part of the study less convincing as they averaged over the positively and negatively phrased elements on each scale (which I think makes the interpretation somewhat confusing).</p>
<p>Now, back to our two questions:</p>
<p>1) Using Mechanical Turk, we were able to get reasonable answers that closely matched those from the original study. Total price tag: 327 responses &#215; $0.05 = $16.35 paid to participants (plus Amazon&#8217;s cut), which made this study about $13,980 cheaper than the original one.</p>
<p>2) We see above a few small changes in what makes web sites credible, but overall, people are looking at the same things, so we should keep taking the same factors into account when designing web sites. I&#8217;ve made some guesses as to why certain elements may have changed over time, but I&#8217;m curious to hear what you think, so leave a comment!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2010/01/what-makes-web-sites-credible-10-years-later/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>A MTurk Exploration of Activity Stream Usage</title>
		<link>http://www.sanjaykairam.com/blog/2009/06/a-mturk-exploration-of-activity-stream-usage/</link>
		<comments>http://www.sanjaykairam.com/blog/2009/06/a-mturk-exploration-of-activity-stream-usage/#comments</comments>
		<pubDate>Mon, 15 Jun 2009 23:02:02 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[activity streams]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[friendfeed]]></category>
		<category><![CDATA[mechanical turk]]></category>
		<category><![CDATA[mturk]]></category>
		<category><![CDATA[news feeds]]></category>
		<category><![CDATA[presentation]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://sanjaykairam.com/blog/?p=51</guid>
		<description><![CDATA[These are some slides from a presentation I gave on some Mechanical Turk data I collected about how people are using Activity Streams.  Specifically, I was interested in what tools people were using, what they were using them for, how these tools might be improved, and how people had been using these tools to collaborate/coordinate.  Here's what I found...]]></description>
			<content:encoded><![CDATA[<p>These are some slides from a presentation I gave on some Mechanical Turk data I collected about how people are using Activity Streams (also called News Feeds).  Specifically, I was interested in what tools people were using, what they were using them for, how these tools might be improved, and how people had been using these tools to collaborate/coordinate.  Here&#8217;s what I found:</p>
<div id="__ss_1588047" style="width: 425px; text-align: left;"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" title="An Exploration of Activity Stream Usage via Mechanical Turk" href="http://www.slideshare.net/skairam/an-exploration-of-activity-stream-usage-via-mechanical-turk?type=presentation">An Exploration of Activity Stream Usage via Mechanical Turk</a><object width="425" height="355" data="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=parcactivitystreamsmturksurveypresentation-090615165608-phpapp01&amp;stripped_title=an-exploration-of-activity-stream-usage-via-mechanical-turk" type="application/x-shockwave-flash"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=parcactivitystreamsmturksurveypresentation-090615165608-phpapp01&amp;stripped_title=an-exploration-of-activity-stream-usage-via-mechanical-turk" /><param name="allowfullscreen" value="true" /></object>
</div>
<p>The data collected and the major points were fairly straightforward:</p>
<p><strong>Participant Demographics:</strong></p>
<ul>
<li><strong>Age:</strong> Mean = 25.6, SD = 8.0</li>
<li><strong>Education:</strong> Almost all were either in college or had graduated (about 1/6 had done post-graduate study).</li>
<li><strong>Usage:</strong> Most (56/78) reported purely personal usage, only 2 reported purely professional usage, and 14 indicated both.</li>
</ul>
<p><strong>Tools Used:</strong></p>
<ul>
<li>The vast majority listed <strong>Facebook</strong> (61/78), which was unsurprising (the Facebook Stream was also the first example of an &#8220;activity stream&#8221; given in the survey instructions).</li>
<li>The wide <strong>Twitter</strong> usage (41/78) was surprising, however. In my past experience, polling for Twitter-related topics on MTurk had resulted in low yield. Perhaps this is due to the crazy upswing in Twitter sign-ups over the past few months?</li>
<li>Other than <strong>MySpace</strong> (16/78), tools such as LinkedIn, Yammer, and FriendFeed were barely listed, suggesting either that these tools are not widely used or that people do not consider some of them to be activity streams.</li>
</ul>
<p><strong>Functions Served:</strong></p>
<ul>
<li><strong>Note:</strong> I categorized these responses loosely myself. This was not intended to be a rigorous academic study, but rather a glimpse into how these tools are used.</li>
<li><strong>Status</strong> (33/78), <strong>Communication</strong> (32/78), and <strong>Information</strong> (19/78) were the most commonly listed functions.</li>
<li>Responses also showed a wide variety of uses, however, including some less expected ones such as <strong>Journaling</strong>. This may speak to the flexibility of these tools and to users&#8217; ability to adapt them to their own needs.</li>
</ul>
<p><strong>Feature Requests / Improvements:</strong></p>
<ul>
<li>The slides list exact quotes from participants (again, loosely grouped into categories by me, with no cross-coding).</li>
</ul>
<p>As you can see, the &#8220;Summary&#8221; slide was mostly a recap of what people said about potential improvements, but I thought this was the most interesting part. Most of the things people asked for were either already available or could easily be provided by new activity stream client applications, so there may be a lot of low-hanging fruit out there for application developers.</p>
<p>I&#8217;d be curious to know whether any of you have done (or seen) similar research on Twitter, Facebook, or other activity streams (whether on MTurk or otherwise), and whether you found similar or different trends. If you&#8217;d like clarification, more details, or discussion about any of the points raised here, the comments section awaits.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2009/06/a-mturk-exploration-of-activity-stream-usage/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

