<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Sanjay Kairam &#187; Uncategorized</title>
	<atom:link href="http://www.sanjaykairam.com/blog/category/uncategorized/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.sanjaykairam.com/blog</link>
	<description>Home Page and Blog (Commons Sense)</description>
	<lastBuildDate>Mon, 06 Sep 2010 23:00:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>One Habit of Highly Successful Mathematicians</title>
		<link>http://www.sanjaykairam.com/blog/2010/05/one-habit-of-highly-successful-mathematicians/</link>
		<comments>http://www.sanjaykairam.com/blog/2010/05/one-habit-of-highly-successful-mathematicians/#comments</comments>
		<pubDate>Tue, 25 May 2010 10:00:37 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[/Me]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Barabási]]></category>
		<category><![CDATA[Bursts]]></category>
		<category><![CDATA[mathematicians]]></category>
		<category><![CDATA[Poisson]]></category>
		<category><![CDATA[productivity]]></category>
		<category><![CDATA[reading]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=215</guid>
		<description><![CDATA[I'm currently reading Albert-László Barabási's second book, Bursts. Though the book is primarily about predicting future human behavior, it is peppered with interesting anecdotes about historical figures (i.e., from the past). One such figure mentioned prominently is Siméon-Denis Poisson, the 19th-century French mathematician. An element that may seem trivial out of context but is rather crucial in the book is Barabási's description of Poisson's organizational habits (a sort of 19th-century French GTD):]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m currently reading <a title="Barabasi - Home Page" href="http://www.nd.edu/~alb/" target="_blank">Albert-László Barabási</a>&#8217;s second book, <a title="Amazon Books - Bursts" href="http://www.amazon.com/Bursts-Hidden-Pattern-Everything-Hardcover/dp/B003K05XQS/ref=sr_1_8?ie=UTF8&amp;s=books&amp;qid=1274746149&amp;sr=1-8" target="_blank"><em>Bursts</em></a>. Though the book is primarily about predicting future human behavior, it is peppered with interesting anecdotes about historical figures (i.e., from the past). One such figure mentioned prominently is <a title="Wikipedia - Simeon-Denis Poisson" href="http://en.wikipedia.org/wiki/Sim%C3%A9on_Denis_Poisson" target="_blank">Siméon-Denis Poisson</a>, the 19th-century French mathematician. An element that may seem trivial out of context but is rather crucial in the book is Barabási&#8217;s description of Poisson&#8217;s organizational habits (a sort of 19th-century French GTD):</p>
<blockquote><p>Poisson distribution. Poisson process. Poisson equation. Poisson kernel. Poisson regression. Poisson summation formula. Poisson&#8217;s spot. Poisson&#8217;s ratio. Poisson bracket. Euler-Poisson-Darboux equation. This is only a partial list, and yet it shows the degree to which Siméon-Denis Poisson&#8217;s work has impacted just about all branches of science. But what is so impressive is not the volume of his contributions but rather their depth, raising a puzzling question: How did Poisson manage to work simultaneously on so many quite different problems and yet stay sufficiently focused to offer deep and lasting contributions?</p>
<p>Well, he had a secret: a notebook and a tiny habit.</p>
<p>Each time Poisson encountered a problem he thought fascinating, he would resist the temptation to savor it. He pulled out his notebook instead and made a note of it and promptly returned to the problem that had absorbed him before the interruption. Once he solved the problem at hand, he mulled over the list of problems scribbled in his notebook, then picking as his next challenge the one he found the most interesting.</p>
<p>Poisson&#8217;s little secret was lifelong, careful prioritizing.</p></blockquote>
<p>So, this essentially describes the polar opposite of my work habits, which currently consist of frenetically switching from task to task to ensure that I complete none of them. I&#8217;m thinking of giving the priority list a try &#8211; has anybody tried a scheme like this and had success with it? I would be curious to hear your story!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2010/05/one-habit-of-highly-successful-mathematicians/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Applying for an NSF Graduate Research Fellowship</title>
		<link>http://www.sanjaykairam.com/blog/2010/04/applying-for-a-nsf-graduate-research-fellowship/</link>
		<comments>http://www.sanjaykairam.com/blog/2010/04/applying-for-a-nsf-graduate-research-fellowship/#comments</comments>
		<pubDate>Mon, 26 Apr 2010 17:00:32 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Me]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[application]]></category>
		<category><![CDATA[fellowship]]></category>
		<category><![CDATA[grad school]]></category>
		<category><![CDATA[graduate school]]></category>
		<category><![CDATA[grfp]]></category>
		<category><![CDATA[nsf]]></category>
		<category><![CDATA[phd]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=194</guid>
		<description><![CDATA[For a comprehensive look at applying for the NSF fellowship (and other similar fellowships), you should check out Philip Guo's Fellowship Tips page, which is excellent and thorough. I personally learn best by example, so in this post, I'd like to provide a personal perspective on the application and review process; hopefully, this will prove helpful to some of you applying in the coming fall. While I did not win this year, I think it's helpful to see the essays of others along with their reviews to get a real sense of what the reviewers are looking for.]]></description>
			<content:encoded><![CDATA[<p>This year, as I was applying to graduate schools, I also decided to apply for an <a title="NSF GRFP - Home Page" href="http://nsfgrfp.org" target="_blank">NSF Graduate Research Fellowship</a>. For those unfamiliar with the fellowship, here is the description from the website:</p>
<blockquote><p>The National Science Foundation&#8217;s Graduate Research Fellowship Program  (GRFP) helps ensure the vitality of the human resource base of science  and engineering in the United States and reinforces its diversity.  The  program recognizes and supports outstanding graduate students in  NSF-supported science, technology, engineering, and mathematics  disciplines who are pursuing research-based master&#8217;s and doctoral  degrees in the U.S. and abroad.  The NSF welcomes applications from all  qualified students and strongly encourages under-represented  populations, including women, under-represented racial and ethnic  minorities, and persons with disabilities, to apply for this fellowship.</p></blockquote>
<p>In addition to the prestige accompanying the receipt of this fellowship, winners also receive the following:</p>
<ul>
<li>Three Years of Support</li>
<li>$30K Annual Stipend</li>
<li>$10.5K Cost-of-Education Allowance</li>
<li>$1K One-Time International Travel Allowance</li>
<li><a title="TeraGrid - About" href="http://www.teragrid.org/about/" target="_blank">TeraGrid</a> Supercomputer Access</li>
</ul>
<p>The application first becomes available in August and is due in early November (at least, this was the schedule in 2009-2010). Results were scheduled to be announced in mid-March, though this year they were actually announced in mid-April (I heard that decisions were delayed by the weather complications in D.C. this winter). According to the site, it looks as if they announced 2,000 awardees and 2,025 honorable mentions, which seems to be a large increase over past years.</p>
<p><strong>Why am I writing this post?</strong></p>
<p>For a comprehensive look at applying for the NSF fellowship (and other similar fellowships), you should check out <a title="Philip Guo - Fellowship Tips" href="http://stanford.edu/~pgbovine/fellowship-tips.htm" target="_blank">Philip Guo&#8217;s Fellowship Tips page</a>, which is excellent and thorough. I personally learn best by example, so my goal in this post is to provide a personal example of the application and review process; hopefully, this will prove helpful to some of you applying in the coming fall. While I did not win this year, I think it&#8217;s helpful to see the essays of others along with their reviews to get a real sense of what the reviewers are looking for.</p>
<p><strong>If you take one thing away from this post: BE SPECIFIC!</strong></p>
<p>The number one lesson that I gleaned from various sources while applying was &#8220;Whatever you write about, BE SPECIFIC&#8221;. From my understanding, what you write about, and whether or not you actually follow the proposal if you win, are both secondary to how specific you can be in your proposal. I wish I could find the link again, but when I was applying, I remember reading the materials of another student who had posted their application and reviews online. This student wrote an incredibly detailed proposal for studying land use in Africa; they included specific information about the plots of land they were going to study, the local contacts that they had assembled, and even the satellites from which they were going to pull aerial photos. Yet one of the reviews had a comment along the lines of &#8220;Would have appreciated more detail about the study &#8211; for instance, what type of analysis are you planning to do on the satellite imagery?&#8221; It seems that it is impossible to cram too much detail into the space provided.</p>
<p><strong>Brief Overview of Written Application Materials:</strong></p>
<p>Essentially, aside from other (important) components such as recommendations, test scores, and the like, the application consists of three major written components (see the full list <a title="NSF GRFP - Application Materials" href="http://www.nsfgrfp.org/how_to_apply/application_materials" target="_blank">here</a>). The <strong>Personal Statement</strong> is where you get to talk about your background, your strengths, why you are interested in your research areas, and how winning a fellowship will contribute to your long-term career goals. In the <strong>Research Experience Statement</strong>, your goal is essentially to discuss why you are qualified to do the work that you are proposing to do. Finally, in the <strong>Proposed Plan of Research Statement</strong>, you lay out the research question you intend to address, how you are going to answer it, and how that answer will contribute to science and society as a whole.</p>
<p>Throughout these three essays, there are basically two major principles you want to keep in mind:</p>
<ul>
<li><strong>Intellectual Merit: </strong>How important and original is this research, and how qualified is the applicant to conduct it?</li>
<li><strong>Broader Impacts:</strong> How will this research contribute to science,  society, education, underprivileged groups, etc.?</li>
</ul>
<p>Anyways, with that in mind, here are links to PDFs of my three statements: <a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/04/NSF-PersonalStatement-Final.pdf">Personal Statement</a>, <a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/04/NSF-ResearchExperience-Final.pdf">Research Experience</a>, and <a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/04/NSF-ResearchPlan-Final.pdf">Proposed Research Plan</a>. Below are the reviews that I received from my two reviewers:</p>
<p><strong>Reviewer 1:</strong></p>
<blockquote><p><em>Overall Assessment of Intellectual Merit: <strong>Good</strong></em></p></blockquote>
<blockquote><p>The application has a good personal statement and the applicant has a convincing motivation and past research experience background. Nevertheless, I found this application not as competitive because the proposed research application was not specific enough about what exactly it is all about. By the same token it was not stated where the research is. A stronger motivational statement for the proposed research could also strengthen the application.</p></blockquote>
<blockquote><p><em>Overall Assessment of Broader Impacts: <strong>Good</strong></em></p></blockquote>
<blockquote><p>The proposed research has an obvious broader impact and benefit to society. Nevertheless, the broader impact statement of this application could be stronger by addressing it specifically and in more detail . Broader impact criteria can be impact on society, integration of research on education, the potential to reach diverse audiences and outreach.</p></blockquote>
<p><strong>Reviewer 2:</strong></p>
<blockquote><p><em>Overall Assessment of Intellectual Merit: <strong>Good</strong></em></p></blockquote>
<blockquote><p>Sanjay Kairam has a strong academic record and good research experience. To his credit, he has published several articles in some top-rated conferences. His description of the proposed research is reasonable but I expect a better and more specific explanations of the research plan and methodology.</p></blockquote>
<blockquote><p><em>Overall Assessment of Broader Impacts: <strong>Very Good</strong></em></p></blockquote>
<blockquote><p>His interest in social computing stem from his personal experience working in hospital in low income neighborhood. He participated in several activities to promote social computing. The proposed research will have a positive impact on better understanding of social issues.</p></blockquote>
<p><strong>Conclusions:</strong></p>
<p>In summary, my major takeaway was that while the personal statement and research experience were good, the reviewers wanted more detail on several aspects of my proposed research plan (motivation, methodology, and impact). I was a little puzzled by the phrase &#8220;it was not stated where the research is&#8221; because, like many students, I was applying to graduate schools at the same time, so I did not yet know where I would be.</p>
<p>If you are applying to graduate schools, I would advise applying for the NSF or a similar fellowship for a couple of reasons. First, you might get it. Second, even if you don&#8217;t, putting together the application is a very useful exercise: for me, it was incredibly helpful in getting my school applications together &#8211; a large portion of my personal and research experience statements found their way into my school application materials.</p>
<p>I hope that getting to see my application statements and the reviews that they earned will help you when you are applying.  If you&#8217;d like to know more about my experience applying, talk about your experience, or even have thoughts on what I wrote (hey, I am applying again this year), hit up the comments section. If you are reading this while putting together your application &#8211; good luck!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2010/04/applying-for-a-nsf-graduate-research-fellowship/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Anatomy of a Paper about a Large-Scale Social Search Engine</title>
		<link>http://www.sanjaykairam.com/blog/2010/02/anatomy-of-a-paper-about-a-large-scale-social-search-engine/</link>
		<comments>http://www.sanjaykairam.com/blog/2010/02/anatomy-of-a-paper-about-a-large-scale-social-search-engine/#comments</comments>
		<pubDate>Fri, 05 Feb 2010 21:43:22 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[aardvark]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[PageRank]]></category>
		<category><![CDATA[papers]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[social]]></category>
		<category><![CDATA[social search]]></category>
		<category><![CDATA[the mechanical zoo]]></category>
		<category><![CDATA[WWW]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=135</guid>
		<description><![CDATA[Earlier this week, the team at Aardvark unveiled a new paper, "The Anatomy of a Large-Scale Social Search Engine", which will be presented in April at WWW 2010. The paper is inspired by and patterned after "The Anatomy of a Large-Scale Hypertextual Web Search Engine", which describes the PageRank algorithm that drives Google's search ranking system (and which, as Aardvark's blog points out, was also presented at WWW 12 years ago). The paper, by Aardvark's Damon Horowitz and Stanford's Sep Kamvar, focuses mostly on the architecture of the Aardvark system, from the external representations with which users interact to the internal ranking algorithms on which the system runs. Below, I present a short summary of what they report, focusing on the elements I found most interesting.]]></description>
			<content:encoded><![CDATA[<p>Earlier this week, the team at Aardvark unveiled a new paper, &#8220;<a title="Aardvark Blog - Anatomy of a Large-Scale Social Search Engine" href="http://blog.vark.com/?p=352" target="_blank">The Anatomy of a Large-Scale Social Search Engine</a>&#8221;, which will be presented in April at <a title="WWW2010 - Home" href="http://www2010.org/www/" target="_blank">WWW 2010</a>. The paper is inspired by and patterned after &#8220;<a title="Stanford InfoLab - Google" href="http://infolab.stanford.edu/~backrub/google.html">The Anatomy of a Large-Scale Hypertextual Web Search Engine</a>&#8221;, which describes the <a title="Wikipedia - PageRank" href="http://en.wikipedia.org/wiki/PageRank" target="_blank">PageRank</a> algorithm that drives Google&#8217;s search ranking system (and which, as Aardvark&#8217;s blog points out, was also presented at WWW 12 years ago).</p>
<p>The paper, by Aardvark&#8217;s Damon Horowitz and Stanford&#8217;s Sep Kamvar, focuses mostly on the architecture of the Aardvark system, from the external representations with which users interact to the internal ranking algorithms on which the system runs. Below, I present a short summary of what they report, focusing on the elements I found most interesting:</p>
<p><strong>The Basic Model</strong>: Aardvark&#8217;s scoring function is similar to PageRank in that both use two primary, but somewhat independently considered, components: <em>relevance</em> and <em>quality</em>.</p>
<ul>
<li><em>Relevance</em> in the Aardvark model pertains to the probability that a particular user <em>i</em> can answer the given question <em>q</em>, based on the topics <em>t</em> identified in the question.</li>
<li><em>Quality</em> in the Aardvark model pertains to the overall probability that a user <em>i</em> can return a satisfactory answer to another user <em>j</em>, regardless of the question.</li>
</ul>
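<p>To make the two components concrete, here is a minimal Python sketch of how they might be combined. It assumes (my assumption, though I believe it roughly matches the paper&#8217;s formulation) that the final score is quality multiplied by relevance, and that relevance sums, over the question&#8217;s topics <em>t</em>, the probability p(i | t) that user <em>i</em> can answer questions on <em>t</em>, weighted by how strongly the question is about <em>t</em>. The function names, users, and numbers are hypothetical toy values, not taken from the paper.</p>

```python
# Hypothetical sketch of an Aardvark-style answerer score; not the paper's code.
# Assumption: score = quality(i, j) * relevance(i, q), with
# relevance(i, q) = sum over topics t of p(i | t) * p(t | q).


def relevance(user_given_topic, topic_given_question):
    """Sum p(i | t) * p(t | q) over the topics the question touches.

    user_given_topic: dict topic -> p(i | t) for one candidate answerer i
    topic_given_question: dict topic -> p(t | q) from question analysis
    """
    return sum(
        p_t_q * user_given_topic.get(topic, 0.0)
        for topic, p_t_q in topic_given_question.items()
    )


def score(quality_ij, user_given_topic, topic_given_question):
    """Combine the question-independent quality term with the relevance term."""
    return quality_ij * relevance(user_given_topic, topic_given_question)


def rank_candidates(candidates, topic_given_question):
    """candidates: dict name -> (quality_ij, user_given_topic); returns names, best first."""
    return sorted(
        candidates,
        key=lambda name: score(candidates[name][0], candidates[name][1], topic_given_question),
        reverse=True,
    )


question = {"restaurants": 0.7, "san francisco": 0.3}  # toy p(t | q)
candidates = {
    "alice": (0.9, {"restaurants": 0.5, "hiking": 0.2}),      # high quality, modest relevance
    "bob": (0.5, {"restaurants": 0.8, "san francisco": 0.6}),  # lower quality, very relevant
}
print(rank_candidates(candidates, question))  # ['bob', 'alice']
```

<p>In this toy example, alice scores 0.9 &#215; 0.35 = 0.315 and bob scores 0.5 &#215; 0.74 = 0.37, so the better-connected user loses to the more topically relevant one; neither component decides the ranking on its own.</p>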
<p><strong>Indexing Topics:</strong> Aardvark computes the relevance score from a distribution of knowledge over the topics each user knows, built from the following sources (the italicized labels are my own shorthand, not terms used in the paper):</p>
<ul>
<li><em>Explicit Prompting</em> at sign-up for three &#8220;starter&#8221; topics about which the user has expertise.</li>
<li><em>Social Prompting</em> of a user&#8217;s friends to provide topics about which they trust the user&#8217;s opinion.</li>
<li><em>Structured Parsing</em> of the online profile pages connected to Aardvark by the user (e.g. &#8220;Interests&#8221; on a Facebook profile).</li>
<li><em>Unstructured Parsing</em> of the user&#8217;s online homepage, blog, or status updates, using a linear SVM to extract the overall subject area and a named entity extractor to extract more specific topics.</li>
</ul>
<p><strong>Indexing Connections:</strong> Aardvark computes the quality score by building a set of weighted connections between users using characteristics ranging from social proximity to similarities in demographics or behavior, such as:</p>
<ul>
<li><em>Social Connections</em> either in the form of explicitly defined &#8220;friend&#8221; connections or implicit &#8220;network&#8221; connections, such as both being part of the Stanford network.</li>
<li><em>Demographic Similarity</em>, which likely includes age, gender, and location based on profile information collected by Aardvark.</li>
<li><em>Profile Similarity</em>, which seems to include similar movies and other items which might be listed on other profiles, such as Facebook.</li>
<li><em>Vocabulary Match</em>, which they explain with the example of &#8220;IM Shortcuts&#8221; (I assume this means it is based on the language you use to interact with Aardvark, but I am unsure).</li>
<li><em>Chattiness and Verbosity Match</em>, which relate to frequency and length of messages used when interacting with Aardvark.</li>
<li><em>Politeness Match</em>, which basically seems to mean whether or not users say &#8220;Thanks!&#8221;.</li>
<li><em>Speed Match</em>, which is a measure of responsiveness to other users.</li>
</ul>
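<p>One simple way to turn features like these into a single connection weight is a weighted average of per-feature similarities. The Python sketch below assumes every feature has already been scored between 0 and 1; the weights, the linear <code>behavior_match</code> helper, and the example values are all hypothetical, since the paper&#8217;s exact way of combining these signals is not covered here.</p>

```python
# Hypothetical sketch of a weighted user-to-user connection score.
# Feature names follow the list above; the weights are invented for illustration.

FEATURE_WEIGHTS = {
    "social": 0.30,       # explicit friend or shared-network connection
    "demographic": 0.10,  # age, gender, location similarity
    "profile": 0.15,      # overlapping movies, interests, etc.
    "vocabulary": 0.10,
    "chattiness": 0.10,
    "verbosity": 0.05,
    "politeness": 0.10,
    "speed": 0.10,
}


def behavior_match(a, b, scale):
    """Similarity in [0, 1] between two numeric behaviors (e.g. mean message length).

    Returns 1.0 when a == b and falls linearly to 0.0 once they differ by `scale`.
    """
    return max(0.0, 1.0 - abs(a - b) / scale)


def connection_weight(features, weights=FEATURE_WEIGHTS):
    """Weighted average of per-feature similarities in [0, 1]; missing features count as 0."""
    total = sum(weights.values())
    return sum(w * features.get(name, 0.0) for name, w in weights.items()) / total


features = {
    "social": 1.0,                             # the two users are friends
    "politeness": 1.0,                         # both habitually say "Thanks!"
    "verbosity": behavior_match(12, 18, 20),   # average words per message
    "speed": behavior_match(5, 45, 60),        # average minutes to reply
}
print(round(connection_weight(features), 3))  # 0.468
```

<p>Normalizing by the sum of the weights keeps the result in [0, 1] even if the weights are later retuned, which makes it easier to treat as the quality-like term in a ranking.</p>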
<p><strong>Analyzing Questions:</strong> While all of the other components are pre-computed, this part is (necessarily) computed at question time. They use a number of classifiers to classify the question and then a set of mappers to map the question to a set of topics, noting that &#8220;the role of the Question Analyzer&#8230;is simply to learn enough about the question that it may be sent to appropriately interested and knowledgeable human answerers&#8221;. Here are the classifiers they list (with the names used in the paper):</p>
<ul>
<li><em>NonQuestionClassifier:</em> Determines if input is a valid question.</li>
<li><em>InappropriateQuestionClassifier:</em> Determines if input is obscene, spam, or otherwise unsuitable for asking.</li>
<li><em>TrivialQuestionClassifier:</em> Determines if input is a simple factual question (examples given: &#8220;What time is it now?&#8221;, &#8220;What is the weather?&#8221;). If so, the user gets an automatically generated answer via traditional web search.</li>
<li><em>LocationSensitiveClassifier:</em> Determines if the question contains location information; if it does, it passes that information along to the Routing Engine.</li>
</ul>
<p>And here are the topic mappers:</p>
<ul>
<li><em>KeywordMatchTopicMapper:</em> Checks for string matches against user profile topics (the mapper attempts to classify meaningful vs. spurious matches).</li>
<li><em>TaxonomyTopicMapper:</em> Classifies question text using an SVM trained on an &#8220;annotated corpus of several million questions&#8221; (<strong>where did they find that?</strong>)</li>
<li><em>SalientTermTopicMapper:</em> Extracts salient phrases using a noun-phrase chunker and tf-idf and finds &#8220;semantically similar user topics&#8221;.</li>
<li><em>UserTagTopicMapper:</em> Uses tags explicitly provided by the asker or other answerers and maps them to user topics.</li>
</ul>
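<p>The overall flow (filter with the classifiers first, then map the surviving questions to topics) can be sketched in Python as follows. The class names in the comments come from the paper, but every rule here is a hypothetical keyword heuristic standing in for the trained models the paper describes; the SVM-based TaxonomyTopicMapper and the SalientTermTopicMapper are omitted, and the routing labels and word lists are my own invention.</p>

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-ins for the paper's trained classifiers: simple keyword rules only.
QUESTION_WORDS = {"what", "where", "who", "how", "why", "when", "which", "is", "can", "does", "should"}
BLOCKLIST = {"spam"}  # placeholder for an inappropriate-content model
TRIVIAL_PATTERNS = [r"\bwhat time is it\b", r"\bwhat is the weather\b"]
KNOWN_LOCATIONS = {"sf": "San Francisco", "san francisco": "San Francisco", "boston": "Boston"}


@dataclass
class Analysis:
    route: str                           # "reject", "web_search", or "route_to_users"
    topics: set = field(default_factory=set)
    location: Optional[str] = None


def is_question(text):                   # ~ NonQuestionClassifier
    words = text.lower().split()
    return text.strip().endswith("?") or (bool(words) and words[0] in QUESTION_WORDS)


def is_inappropriate(text):              # ~ InappropriateQuestionClassifier
    return any(word in BLOCKLIST for word in re.findall(r"[a-z']+", text.lower()))


def is_trivial(text):                    # ~ TrivialQuestionClassifier
    return any(re.search(p, text.lower()) for p in TRIVIAL_PATTERNS)


def find_location(text):                 # ~ LocationSensitiveClassifier
    lowered = text.lower()
    for key, name in KNOWN_LOCATIONS.items():
        if re.search(r"\b" + re.escape(key) + r"\b", lowered):
            return name
    return None


def keyword_topics(text, known_topics):  # ~ KeywordMatchTopicMapper
    lowered = text.lower()
    return {t for t in known_topics if re.search(r"\b" + re.escape(t) + r"\b", lowered)}


def analyze(text, known_topics, tags=()):
    """Filter with the classifiers, then map to topics (tags ~ UserTagTopicMapper)."""
    if not is_question(text) or is_inappropriate(text):
        return Analysis(route="reject")
    if is_trivial(text):
        return Analysis(route="web_search")
    topics = keyword_topics(text, known_topics) | {t for t in tags if t in known_topics}
    return Analysis(route="route_to_users", topics=topics, location=find_location(text))


topics = {"restaurant", "hiking", "python"}
print(analyze("What's a great restaurant in SF?", topics))
print(analyze("What time is it now?", topics).route)  # web_search
```

<p>Running the cheap filters before the mappers means rejected or trivial inputs never reach human answerers, which mirrors the paper&#8217;s description of trivial questions being answered through traditional web search instead.</p>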
<p>This description of the routing algorithm makes up the core of the paper. After some more description of how users interact with the system, the authors provide some interesting data collected over the past several months of use (from the beta launch in March 2009 until October 2009). Here&#8217;s a quick run-down of the more interesting facts that they presented:</p>
<ul>
<li><em>Strong User Growth: </em>As of October 2009, they reported 90,361 user accounts, and users appear to be remaining active (in the study period, over 1/2 the users actively generated content and over 2/3 of the users passively participated).</li>
</ul>
<div id="attachment_139" class="wp-caption aligncenter" style="width: 402px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/02/aardvarkusers.png"><img class="size-full wp-image-139" title="Aardvark User Growth" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/02/aardvarkusers.png" alt="Aardvark User Growth" width="392" height="331" /></a><p class="wp-caption-text">User Growth on Aardvark (graph taken from the paper).</p></div>
<ul>
<li><em>Higher Query Contextualization:</em> Aardvark queries average 18.6 words in length, while the average query length reported for web search is between 2.2 and 2.9 words (citing previous comparison and characterization studies). They further state that &#8220;98.1% of questions are unique&#8221;, though I am unsure how strictly they define a match (I am sure the question &#8220;What&#8217;s a great restaurant in SF&#8221; has been asked 1000 times in different forms). In addition, based on manual scoring of 1000 randomly selected questions, they report that 64.7% of questions asked have a subjective element, with requests for advice about travel, restaurants, and products being especially popular.</li>
<li><em>Fast, High-Quality Answers:</em> They report that 87.7% of questions received answers and 57.2% received an answer within 10 minutes. Of the answers that received feedback, 70.4% were rated as &#8220;good&#8221; and only 15.5% as &#8220;bad&#8221;. Interestingly, they observe a notable difference in feedback between answers from users within the asker&#8217;s social network (76% rated as good) and answers from users outside it (68% rated as good).</li>
</ul>
<div id="attachment_138" class="wp-caption aligncenter" style="width: 503px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/02/aardvarkquestions.png"><img class="size-full wp-image-138" title="Aardvark Questions" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/02/aardvarkquestions.png" alt="Aardvark Questions" width="493" height="229" /></a><p class="wp-caption-text">Questions on Aardvark (chart taken from the paper).</p></div>
<p>Overall, I really enjoyed reading this paper. After using Aardvark for over a year now, it was really interesting to get to peer inside and see how the system works, and a lot of great details were provided about the ranking engine.</p>
<p>One place where I feel that the authors missed the mark was the cursory side-by-side evaluation, which pitted Aardvark against Google on a set of 200 questions randomly selected from the Aardvark system. They report that 71.5% of the questions studied were answered successfully on Aardvark, while 70.5% were answered successfully on Google. This comparison seems mostly useless, because the questions, having been pulled from the Aardvark system in the first place, are ones that were specifically chosen to be asked there because they are better suited to what is being called &#8216;social search&#8217;. This comparison left me wanting more investigation into two main questions.</p>
<p><em>&#8220;What makes a search engine &#8216;social&#8217; in the first place?&#8221;</em></p>
<p>The distinction between social and non-social is extremely murky, something Brynn and I discovered when working on our <a title="Sanjay Kairam - Cognitive Consequences of Social Search (PDF)" href="http://sanjaykairam.com/papers/evans-kairam-pirolli-inSubmission.pdf" target="_blank">Social Search paper</a>. It has been argued before (one small example <a title="Brynn Evans' Blog - Comment by Manas Tungare" href="http://brynnevans.com/blog/2009/01/30/why-social-search-wont-topple-google-anytime-soon/#comment-1933">here</a>) that Google&#8217;s PageRank algorithm is inherently social, as it aggregates information provided by people (links to one another) to rank results. However, something seems categorically different between Google and what people perceive to be &#8216;social search&#8217;. When it comes down to it, even though everyone is excited about <a title="Google Blog - Search is getting more social" href="http://googleblog.blogspot.com/2010/01/search-is-getting-more-social.html" target="_blank">Google&#8217;s forays into &#8220;Social Search&#8221;</a>, there&#8217;s nothing all that fundamentally different between Google indexing your blog and your tweets and Google indexing any other document on the web.</p>
<p>To me, it seems that the key difference is really the change in the <strong>direction of interaction</strong>. While Google takes a query (question) and compares it against traces of discussion about that question from the past (web documents), systems perceived as &#8216;social&#8217; take a question and attempt to generate new answers in the future. This change in direction is what allows for the greater context that makes &#8216;social&#8217; search answers so much richer (at least for some questions). Perhaps we need a different word to define this phenomenon &#8211; &#8216;real-time search&#8217; seems to get at it more, but has its own problems. Perhaps something like &#8216;generative search&#8217;? I really don&#8217;t know.</p>
<p><em>&#8220;Why do we need a social search engine at all?&#8221;</em></p>
<p>This one seems like the best fodder for a follow-up study by Aardvark. While they do provide a rough breakdown of the types of questions asked on Aardvark (see pie chart above), I think the comparison would have been much more interesting if they had looked at a variety of classes of user needs and compared the relative efficacy of searching on Aardvark and on a traditional search engine such as Google. It is clear that &#8216;social&#8217; will work much better for some needs and much worse for others, but up to this point, people who talk about social search always seem to use the same types of examples (travel, restaurants, and products, for instance). It would be great to get a clear picture, across a wide range of needs and use cases, of where systems such as Aardvark can provide benefits over existing tools.</p>
<p>Anyways, for those of you interested in &#8216;social search&#8217; and search systems, I encourage you to read this paper and tell me your thoughts!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2010/02/anatomy-of-a-paper-about-a-large-scale-social-search-engine/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>What Makes Web Sites Credible? 10 Years Later</title>
		<link>http://www.sanjaykairam.com/blog/2010/01/what-makes-web-sites-credible-10-years-later/</link>
		<comments>http://www.sanjaykairam.com/blog/2010/01/what-makes-web-sites-credible-10-years-later/#comments</comments>
		<pubDate>Fri, 22 Jan 2010 22:04:54 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[credibility]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[mechanical turk]]></category>
		<category><![CDATA[mturk]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[survey]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=118</guid>
		<description><![CDATA[Last month, I read a study by B. J. Fogg and others from the Persuasive Technology Lab at Stanford, entitled "What Makes Web Sites Credible? A Report on a Large Quantitative Study". The paper described an early effort to systematically determine how different elements of web sites affect people's perceptions of credibility (defined roughly as the intersection of trustworthiness and expertise). The original study design had 1400 participants completing a survey which presented them with 51 web site elements and asked them to rate how much more or less each element would affect the believability of a web site. The two questions I hope to answer, roughly, are "What has changed in the past 10 years about how people assess web site credibility?" and "Is there a cheaper, yet effective, way to do a study like this?". The results have implications for website design.]]></description>
			<content:encoded><![CDATA[<p>Last month, I read a study by <a title="B.J. Fogg - Home Page" href="http://www.bjfogg.com/" target="_blank">B. J. Fogg</a> and others from the <a title="Stanford University - Persuasive Technology Lab" href="http://captology.stanford.edu/" target="_blank">Persuasive Technology Lab at Stanford</a>, entitled &#8220;<a title="Paper Link (PDF)" href="http://www.google.com/url?sa=t&amp;source=web&amp;ct=res&amp;cd=1&amp;ved=0CAsQFjAA&amp;url=http%3A%2F%2Fcaptology.stanford.edu%2Fpdf%2Fp61-fogg.pdf&amp;ei=ASBaS5W-G4TYsgPIu7HNBA&amp;usg=AFQjCNFfC0AVUNm3lt3gOVxE-DCbs5pI-A&amp;sig2=mkyXRpQBujCx6-j0MMFJ3Q" target="_blank">What Makes Web Sites Credible? A Report on a Large Quantitative Study</a>&#8220;. The paper described an early effort to systematically determine how different elements of web sites affect people&#8217;s perceptions of <a title="Wikipedia - Credibility" href="http://en.wikipedia.org/wiki/Credibility" target="_blank">credibility</a> (defined roughly as the intersection of trustworthiness and expertise). The original study design had 1400 participants completing a survey which presented them with 51 web site elements and asked them to rate how much more or less each element would affect the believability of a web site.</p>
<p>Upon reading this study, I noticed two things:</p>
<p>1) The original experiment recruited participants by offering to donate $10 to charity on behalf of each one, for a total cost of at least $14K (1,400 participants &#215; $10). This raised a question: <em>Given the growth of sites like <a title="Amazon Mechanical Turk" href="http://www.mturk.com" target="_blank">Mechanical Turk</a>, can we get comparable results for less money?</em></p>
<p>2) The data was originally collected in December <span style="text-decoration: line-through;">2009</span> 1999, which is almost exactly 10 years ago. Content on the web and our interactions with it have changed a great deal since then &#8211; I mean, in 1999, <a title="Google - Corporate Milestones" href="http://www.google.com/corporate/history.html" target="_blank">Google had only existed for a year</a>, blogs were still relatively uncommon (<a title="LiveJournal - Wikipedia" href="http://en.wikipedia.org/wiki/LiveJournal" target="_blank">LiveJournal had just started</a>), and <a title="Wikipedia - Historical Overview by Year" href="http://en.wikipedia.org/wiki/History_of_Wikipedia#Historical_overview_by_year" target="_blank">Wikipedia didn&#8217;t even exist yet</a>! This raised the question: <em>Given that the average Internet user now has a far greater amount of experience navigating the Web, should we expect the responses to be different 10 years later?</em></p>
<p>I decided to explore both of these questions through a survey. In December 2009, I posted a HIT (Human Intelligence Task) to Mechanical Turk that replicated the original study as closely as possible: the 51 statements about web site elements were taken from Fogg&#8217;s original paper, responses used the same 7-point Likert scale, and the order of the survey items was randomized. I paid $0.05 per HIT and received 327 responses, none of which had to be discarded for quality reasons.</p>
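<p>The post doesn&#8217;t say how the item order was randomized. As a rough sketch only, here is one way to give each respondent an independently shuffled but reproducible ordering of the 51 statements; the function name, the placeholder statement labels, and seeding the shuffle with a respondent ID are all assumptions made for illustration:</p>

```python
import random

# Placeholder labels standing in for the 51 statements (hypothetical).
STATEMENTS = [f"statement {i}" for i in range(1, 52)]


def randomized_order(items, respondent_id):
    """Return a shuffled copy of `items`, seeded by a (hypothetical) respondent ID.

    Seeding per respondent gives each person their own order while keeping it
    reproducible, so answers can be mapped back to the canonical item list.
    """
    rng = random.Random(respondent_id)  # independent generator, no global state
    order = list(items)  # copy so the canonical list is never mutated
    rng.shuffle(order)
    return order


order = randomized_order(STATEMENTS, respondent_id="worker-001")
print(order[:3])
```

<p>Using a dedicated <code>random.Random</code> instance rather than the module-level functions keeps one respondent&#8217;s shuffle from affecting anyone else&#8217;s.</p>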
<p>An initial, high-level examination showed that the responses actually matched the 1999 data fairly well: the average Likert rating for each item correlated highly with the average rating for the same item in the 1999 data (R<sup>2</sup> = 0.96). One difference was that the 2009 answers were compressed overall (closer to 0), so for the purposes of comparison I rescaled the 2009 averages with the linear transformation 1.1677X &#8722; 0.0003, where X is an item&#8217;s 2009 average, to match the 1999 data. Because this transformation is linear with a positive slope, it has no effect on the correlation coefficient.</p>
<p>The first analysis in Fogg et al. separated the various elements into 7 &#8220;scales&#8221; using factor analysis (<em>Real-World Feel, Ease of Use, Expertise, Trustworthiness, Tailoring, Commercial Implications,</em> and <em>Amateurism</em>). Below, I present comparisons for the items in each scale. I highlighted items whose ratings deviated from the 1999 values by more than 0.25. Without the original raw data I couldn&#8217;t do a more in-depth comparison, but this gives a rough idea of which elements are <strong>more</strong> important now than they were in 1999 (red) and which are <strong>less</strong> important (purple):</p>
<div id="attachment_120" class="wp-caption aligncenter" style="width: 500px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide1.jpg"><img class="size-full wp-image-120" title="Real-World Feel Scale" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide1.jpg" alt="Real-World Feel Scale" width="490" height="211" /></a><p class="wp-caption-text">Elements in the &quot;Real-World Feel&quot; scale are rated similarly in 1999 and 2009.</p></div>
<div id="attachment_121" class="wp-caption aligncenter" style="width: 498px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide2.jpg"><img class="size-full wp-image-121" title="&quot;Ease of Use&quot; Scale" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide2.jpg" alt="&quot;Ease of Use&quot; Scale" width="488" height="209" /></a><p class="wp-caption-text">One difference is that people seem more critical of long download times than in the past.</p></div>
<div id="attachment_122" class="wp-caption aligncenter" style="width: 500px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide3.jpg"><img class="size-full wp-image-122" title="Expertise Scale" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide3.jpg" alt="Expertise Scale" width="490" height="304" /></a><p class="wp-caption-text">For some reason, displaying an award seems to help a site&#39;s credibility more than it did 10 years ago, while providing a lot of news stories matters less.</p></div>
<div id="attachment_123" class="wp-caption aligncenter" style="width: 501px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide4.jpg"><img class="size-full wp-image-123" title="Trustworthiness Scale" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide4.jpg" alt="Trustworthiness Scale" width="491" height="285" /></a><p class="wp-caption-text">Stating your policy on content and ending in &quot;.org&quot; are more important to people now - could this be a cultural shift in response to sites like Wikipedia? Overall, it seems as if links out to other sites matter less for credibility now.</p></div>
<div id="attachment_124" class="wp-caption aligncenter" style="width: 503px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide5.jpg"><img class="size-full wp-image-124" title="Tailoring Scale" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide5.jpg" alt="Tailoring Scale" width="493" height="195" /></a><p class="wp-caption-text">Email confirmations are more important now than in 1999.</p></div>
<div id="attachment_125" class="wp-caption aligncenter" style="width: 501px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide6.jpg"><img class="size-full wp-image-125" title="Commercial Implications" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide6.jpg" alt="Commercial Implications" width="491" height="356" /></a><p class="wp-caption-text">Outside advertisements and an e-commerce focus matter more for credibility now than in the past. People are paying less attention to the commercial purpose of sites, as well as to the number of ads and how they are integrated with content (Google is a favorite site for many, after all).</p></div>
<div id="attachment_126" class="wp-caption aligncenter" style="width: 502px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide7.jpg"><img class="size-full wp-image-126" title="Amateurism Scale" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide7.jpg" alt="Amateurism Scale" width="492" height="367" /></a><p class="wp-caption-text">People seem to notice domain name mismatches more now (perhaps due to greater public awareness of phishing and identity theft?), but pay less attention to multilingual sites.</p></div>
<div id="attachment_127" class="wp-caption aligncenter" style="width: 501px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide8.jpg"><img class="size-full wp-image-127" title="Other Elements" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/01/Slide8.jpg" alt="Other Elements" width="491" height="181" /></a><p class="wp-caption-text">People seem more willing to trust free financial sites now than in the past.</p></div>
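<p>The 0.25 highlighting rule described above can be sketched in a few lines of Python. Two things here are assumptions, not details from the post: the item labels and ratings are invented, and &#8220;more important&#8221; is interpreted as a rescaled 2009 average that sits further from the neutral midpoint (0) than the 1999 average, whether the element raises or lowers credibility:</p>

```python
# Threshold from the post: items deviating by more than this were highlighted.
THRESHOLD = 0.25


def classify_change(old, new, threshold=THRESHOLD):
    """Compare an item's 1999 mean (`old`) with its rescaled 2009 mean (`new`).

    Assumption: a larger absolute rating means the element matters more for
    credibility, in either the positive or the negative direction.
    """
    if abs(new - old) > threshold:
        return "more important" if abs(new) > abs(old) else "less important"
    return "unchanged"


# Hypothetical (1999 mean, rescaled 2009 mean) pairs, NOT the study's data.
items = {
    "item A": (-1.0, -1.5),
    "item B": (1.2, 0.8),
    "item C": (1.4, 1.5),
}

flags = {name: classify_change(old, new) for name, (old, new) in items.items()}
for name, flag in flags.items():
    print(f"{name}: {flag}")
```

<p>In the figures, &#8220;more important&#8221; corresponds to red and &#8220;less important&#8221; to purple. Comparing absolute values lets a negatively rated element that becomes even more damaging count as &#8220;more important&#8221;, just like a positive one that becomes more helpful.</p>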
<p>Anyway, as we might have guessed, the 2009 answers match the 1999 answers pretty well. The elements that made sites highly credible or highly &#8216;un-credible&#8217; in the past seem to have remained constant, and those that didn&#8217;t matter then still don&#8217;t seem to matter much now. I noted some interesting elements in the captions, along with rough conjectures about why they might be trending the way they are.</p>
<p>The rest of the Fogg paper focused on characterizing differences in scale responses across participant demographics, but I found that part of the study less convincing because the authors averaged over both the positively and negatively phrased elements on each scale, which I think makes the results harder to interpret.</p>
<p>Now, back to our two questions:</p>
<p>1) Using Mechanical Turk, we were able to get reasonable answers that closely matched those from the original study. Total price tag: 327 responses &#215; $0.05 = $16.35 paid to participants (plus Amazon&#8217;s cut), which made this study about $13,980 cheaper than the original one.</p>
<p>2) We see above a few small changes in what makes web sites credible, but overall, people are looking at the same things, meaning that we should continue taking the same factors into account when designing websites. I&#8217;ve made some guesses as to why these elements may have changed over time, but I&#8217;m curious to hear what you think, so leave a comment!</p>
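<p>For anyone who wants to check the cost comparison, here is the arithmetic as a short Python sketch. It uses <code>Decimal</code> to avoid floating-point rounding on dollar amounts. Amazon&#8217;s commission is left out, and the $14K figure for the original study is itself a lower bound, so treat the difference as approximate:</p>

```python
from decimal import Decimal

# Figures from the post.
ORIGINAL_PARTICIPANTS = 1400
DONATION_PER_PARTICIPANT = Decimal("10.00")  # donated to charity in the 1999 study
MTURK_RESPONSES = 327
REWARD_PER_HIT = Decimal("0.05")  # paid per HIT on Mechanical Turk (before Amazon's cut)

original_cost = ORIGINAL_PARTICIPANTS * DONATION_PER_PARTICIPANT  # lower bound
mturk_cost = MTURK_RESPONSES * REWARD_PER_HIT
savings = original_cost - mturk_cost

print(f"Original (at least): ${original_cost:,.2f}")
print(f"Mechanical Turk:     ${mturk_cost:,.2f}")
print(f"Difference:          ${savings:,.2f}")
```

<p>The exact difference, $13,983.65, rounds to the $13,980 figure given above.</p>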
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2010/01/what-makes-web-sites-credible-10-years-later/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
