<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Sanjay Kairam &#187; google</title>
	<atom:link href="http://www.sanjaykairam.com/blog/tag/google/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.sanjaykairam.com/blog</link>
	<description>Graduate Student &#38; Armchair Philosopher</description>
	<lastBuildDate>Thu, 19 Jan 2012 23:09:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>The Rise of GoogVark</title>
		<link>http://www.sanjaykairam.com/blog/2010/02/the-rise-of-googvark/</link>
		<comments>http://www.sanjaykairam.com/blog/2010/02/the-rise-of-googvark/#comments</comments>
		<pubDate>Thu, 11 Feb 2010 23:33:21 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[aardvark]]></category>
		<category><![CDATA[business]]></category>
		<category><![CDATA[buzz]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[social search]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=143</guid>
		<description><![CDATA[So, in a seemingly inevitable, but nonetheless surprising move, Google has purchased Aardvark for $50 million. My last blog post was about Aardvark's recent paper describing their social search engine, which included allusions to the research paper which was responsible for the creation of Google, so the announcement seems timely.]]></description>
			<content:encoded><![CDATA[<p>So, in a seemingly inevitable, but nonetheless surprising move, <a title="TechCrunch" href="http://techcrunch.com/2010/02/11/google-acquires-aardvark-for-50-million/" target="_blank">Google has purchased Aardvark for $50 million</a>. My <a title="Sanjay Kairam - Commons Sense" href="http://www.sanjaykairam.com/blog/2010/02/anatomy-of-a-paper-about-a-large-scale-social-search-engine/">last blog post</a> was about Aardvark&#8217;s recent <a title="Aardvark Blog" href="http://blog.vark.com/?p=352" target="_blank">paper</a> describing their social search engine, which included allusions to the research paper which was <a title="Stanford InfoLab - PageRank" href="http://infolab.stanford.edu/~backrub/google.html">responsible for the creation of Google</a>, so the announcement seems timely.</p>
<p>Given Google&#8217;s recent social efforts (Twitter Search, Social Search, Google Buzz, etc.), I am curious to see what they will do with the Aardvark product &#8211; will it stand alone as it has or will it find its way into existing or new Google tools? I, for one, would love to see it integrated into Google&#8217;s main search. One consequence of <a title="Google Blog - Introducing Google Buzz" href="http://googleblog.blogspot.com/2010/02/introducing-google-buzz.html" target="_blank">Google&#8217;s recent launch of Buzz</a> is reminding people that Google has been collecting data on your social network for a while now. If Aardvark were integrated into your Google network, we&#8217;d have an out-of-the-box solution for social search (no messy profile-connecting or friend-inviting needed!). To me, it seems like one of the biggest hurdles for most people in terms of social search or networking tools is the cost of building up their networks, so this would provide a quick and easy way around that.</p>
<p>What will GoogVark look like? I can&#8217;t say I&#8217;ve ever used the &#8220;I&#8217;m Feeling Lucky&#8221; button, but I do know that there are times when I can&#8217;t quite find the best answers through Google search, and I&#8217;d love to be able to seamlessly shift over to social search. I personally would love to see something like this (with an example supplied by Google itself!):</p>
<a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/02/googvark.jpg"><img class="size-full wp-image-144" title="GoogVark" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/02/googvark.jpg" alt="GoogVark" width="449" height="247" /></a>
<p>Would this make social search more inviting to you?</p>
<p><em>P.S. Congratulations to Aardvark&#8217;s founders over at The Mechanical Zoo &#8211; you guys deserve it!</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2010/02/the-rise-of-googvark/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Anatomy of a Paper about a Large-Scale Social Search Engine</title>
		<link>http://www.sanjaykairam.com/blog/2010/02/anatomy-of-a-paper-about-a-large-scale-social-search-engine/</link>
		<comments>http://www.sanjaykairam.com/blog/2010/02/anatomy-of-a-paper-about-a-large-scale-social-search-engine/#comments</comments>
		<pubDate>Fri, 05 Feb 2010 21:43:22 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[aardvark]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[PageRank]]></category>
		<category><![CDATA[papers]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[social]]></category>
		<category><![CDATA[social search]]></category>
		<category><![CDATA[the mechanical zoo]]></category>
		<category><![CDATA[WWW]]></category>

		<guid isPermaLink="false">http://www.sanjaykairam.com/blog/?p=135</guid>
		<description><![CDATA[Earlier this week, the team at Aardvark unveiled a new paper "The Anatomy of a Large-Scale Social Search Engine" which will be presented in April at WWW 2010. The paper is inspired by and patterned after "The Anatomy of a Large-Scale Hypertextual Web Search Engine", which describes the PageRank algorithm that drives Google's search ranking system (and which, as Aardvark's blog points out, was also presented at WWW 12 years ago). The paper, by Aardvark's Damon Horowitz and Stanford's Sep Kamvar, focuses mostly on the architecture of the Aardvark system, from the external representations with which users interact to the internal ranking algorithms on which the system runs. Below, I present a short summary of what they report, focusing on the elements I found most interesting.]]></description>
			<content:encoded><![CDATA[<p>Earlier this week, the team at Aardvark unveiled a new paper &#8220;<a title="Aardvark Blog - Anatomy of a Large-Scale Social Search Engine" href="http://blog.vark.com/?p=352" target="_blank">The Anatomy of a Large-Scale Social Search Engine</a>&#8221; which will be presented in April at <a title="WWW2010 - Home" href="http://www2010.org/www/" target="_blank">WWW 2010</a>. The paper is inspired by and patterned after &#8220;<a title="Stanford InfoLab - Google" href="http://infolab.stanford.edu/~backrub/google.html">The Anatomy of a Large-Scale Hypertextual Web Search Engine</a>&#8221;, which describes the <a title="Wikipedia - PageRank" href="http://en.wikipedia.org/wiki/PageRank" target="_blank">PageRank</a> algorithm that drives Google&#8217;s search ranking system (and which, as Aardvark&#8217;s blog points out, was also presented at WWW 12 years ago).</p>
<p>The paper, by Aardvark&#8217;s Damon Horowitz and Stanford&#8217;s Sep Kamvar, focuses mostly on the architecture of the Aardvark system, from the external representations with which users interact to the internal ranking algorithms on which the system runs. Below, I present a short summary of what they report, focusing on the elements I found most interesting:</p>
<p><strong>The Basic Model</strong>: Aardvark&#8217;s scoring function is similar to PageRank in that both utilize two primary, but somewhat independently considered components: <em>relevance</em> and <em>quality</em>.</p>
<ul>
<li><em>Relevance</em> in the Aardvark model pertains to the probability that a particular user <em>i</em> can answer the given question <em>q</em> based on the identified topics contained in <em>t</em>.</li>
<li><em>Quality</em> in the Aardvark model pertains to the overall probability that a user <em>i</em> can return a satisfactory answer to another user <em>j</em>, regardless of the question.</li>
</ul>
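<p>To make the two components concrete, here is a minimal sketch of how they might combine into a single score. It assumes the score is the product of quality and relevance, and that relevance sums over topics; the function names, dictionaries, and numbers are all hypothetical and are not taken from the paper.</p>

```python
def relevance(p_user_given_topic, p_topic_given_question):
    """Sum over topics of p(user | topic) * p(topic | question)."""
    return sum(
        p_user_given_topic.get(topic, 0.0) * weight
        for topic, weight in p_topic_given_question.items()
    )


def score(quality, p_user_given_topic, p_topic_given_question):
    """Combine the two components by multiplying them (an assumption of this sketch)."""
    return quality * relevance(p_user_given_topic, p_topic_given_question)


# Hypothetical numbers for one candidate answerer.
topics_for_question = {"restaurants": 0.7, "san francisco": 0.3}
candidate_expertise = {"restaurants": 0.5, "hiking": 0.9}
candidate_quality = 0.8  # chance this candidate gives this asker a satisfactory answer

print(score(candidate_quality, candidate_expertise, topics_for_question))
```

<p>Keeping the two pieces separate like this is what lets the topic side and the connection side be pre-computed independently, as described below.</p>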
<p><strong>Indexing Topics:</strong> Aardvark computes the relevance score by calculating a distribution of knowledge over topics known by the user using the following sources (keyword-y sounding italicized terms are for convenience only and are not used in the paper):</p>
<ul>
<li><em>Explicit Prompting</em> at sign-up for three &#8220;starter&#8221; topics about which the user has expertise.</li>
<li><em>Social Prompting</em> of a user&#8217;s friends to provide topics about which they trust the user&#8217;s opinion.</li>
<li><em>Structured Parsing</em> of the online profile pages connected to Aardvark by the user (e.g. &#8220;Interests&#8221; on a Facebook profile).</li>
<li><em>Unstructured Parsing</em> of the users&#8217; online homepage, blog, or status updates using a linear SVM to extract overall subject area and a named entity extractor to extract more specific topics.</li>
</ul>
<p><strong>Indexing Connections:</strong> Aardvark computes the quality score by building a set of weighted connections between users using characteristics ranging from social proximity to similarities in demographics or behavior, such as:</p>
<ul>
<li><em>Social Connections</em> either in the form of explicitly defined &#8220;friend&#8221; connections or implicit &#8220;network&#8221; connections, such as both being part of the Stanford network.</li>
<li><em>Demographic Similarity</em>, which likely includes age, gender, and location based on profile information collected by Aardvark.</li>
<li><em>Profile Similarity</em>, which seems to include similar movies and other items which might be listed on other profiles, such as Facebook.</li>
<li><em>Vocabulary Match</em>, which they explain with the example of &#8220;IM Shortcuts&#8221; (i.e. I assume this means it is based on the language you use to interact with Aardvark, but I am unsure.)</li>
<li><em>Chattiness and Verbosity Match</em>, which relate to frequency and length of messages used when interacting with Aardvark.</li>
<li><em>Politeness Match</em>, which basically seems to mean whether or not you say &#8220;Thanks!&#8221;.</li>
<li><em>Speed Match</em>, which is a measure of responsiveness to other users.</li>
</ul>
<p><strong>Analyzing Questions:</strong> While all of the other components are pre-computed, this part is computed at question time (obviously). They utilize a number of classifiers to classify the question and then a set of mappers to map the question to a set of topics, noting that &#8220;the role of the Question Analyzer&#8230;is simply to learn enough about the question that it may be sent to appropriately interested and knowledgeable human answerers&#8221;. Here are the classifiers they list (with the names used in the paper):</p>
<ul>
<li><em>NonQuestionClassifier:</em> Determines if input is a valid question.</li>
<li><em>InappropriateQuestionClassifier:</em> Determines if input is obscene, spam, or otherwise unsuitable for asking.</li>
<li><em>TrivialQuestionClassifier:</em> Determines if input is a simple factual question (examples given: &#8220;What time is it now?&#8221;, &#8220;What is the weather?&#8221;). If so, the user gets an automatically generated answer via traditional web search.</li>
<li><em>LocationSensitiveClassifier:</em> Determines if the question contains location information; if it does, it passes that information along to the Routing Engine.</li>
</ul>
<ul>
<li><em>KeywordMatchTopicMapper:</em> Checks for string matches against user profile topics (the mapper attempts to classify meaningful vs. spurious matches).</li>
<li><em>TaxonomyTopicMapper:</em> Classifies question text using an SVM trained on an &#8220;annotated corpus of several million questions&#8221; (<strong>where did they find that?</strong>)</li>
<li><em>SalientTermTopicMapper:</em> Extracts salient phrases using a noun-phrase chunker and tf-idf and finds &#8220;semantically similar user topics&#8221;.</li>
<li><em>UserTagTopicMapper:</em> Utilizes tags explicitly provided by the asker or other answerers and maps them to user topics.</li>
</ul>
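<p>As a rough illustration of the tf-idf part of the SalientTermTopicMapper, here is a small sketch that scores each word in a question against a toy corpus. It deliberately leaves out the noun-phrase chunker and the matching to user topics; the smoothed idf formula, the function name, and the example questions are my own assumptions rather than anything specified in the paper.</p>

```python
import math
from collections import Counter


def tf_idf(question, corpus):
    """Score each word in `question` by term frequency times inverse document frequency."""
    docs = [set(doc.lower().split()) for doc in corpus]
    words = question.lower().split()
    counts = Counter(words)
    scores = {}
    for word, count in counts.items():
        doc_freq = sum(1 for doc in docs if word in doc)
        idf = math.log((1 + len(docs)) / (1 + doc_freq)) + 1  # smoothed idf
        scores[word] = (count / len(words)) * idf
    return scores


# Hypothetical corpus of previously asked questions.
corpus = [
    "what is a great restaurant in sf",
    "what is a good hotel in boston",
    "what is a fun hike near sf",
]
scores = tf_idf("what is a great sushi restaurant in sf", corpus)
top = sorted(scores, key=scores.get, reverse=True)[:2]
print(top)
```

<p>Words that show up in every past question (&#8220;what&#8221;, &#8220;is&#8221;, &#8220;a&#8221;) end up with the lowest scores, while a word the corpus has never seen scores highest, which is the sense in which a term is &#8220;salient&#8221;.</p>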
<p>This description of the routing algorithm makes up the main substance of the paper. After some more description of how users interact with the system, the authors provide some interesting data collected over the past several months of use (from the beta launch in March 2009 until October 2009).  Here&#8217;s a quick run-down of the more interesting facts that they presented:</p>
<ul>
<li><em>Strong User Growth: </em>As of October 2009, they reported 90,361 user accounts, and users appear to be remaining active (in the study period, over 1/2 the users actively generated content and over 2/3 of the users passively participated).</li>
</ul>
<div id="attachment_139" class="wp-caption aligncenter" style="width: 402px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/02/aardvarkusers.png"><img class="size-full wp-image-139" title="Aardvark User Growth" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/02/aardvarkusers.png" alt="Aardvark User Growth" width="392" height="331" /></a><p class="wp-caption-text">User Growth on Aardvark (graph taken from the paper).</p></div>
<ul>
<li><em>Higher Query Contextualization:</em> Aardvark queries average 18.6 words in length while the average query length reported for web search is between 2.2 and 2.9 words (citing previous comparison and characterization studies).  They further state that &#8220;98.1% of questions are unique&#8221;, though I am unsure as to how exact they are being about matching (I am sure the question &#8220;What&#8217;s a great restaurant in SF&#8221; has been asked 1000 times in different forms). In addition, they report from manual scoring of 1000 randomly selected questions that 64.7% of questions asked have a subjective element, with advice about travel, restaurants, and products being specifically popular.</li>
<li><em>Fast, High-Quality Answers:</em> They report that 87.7% of questions get answers and 57.2% receive an answer within 10 minutes. They report that 70.4% of answers receiving feedback are rated as &#8220;good&#8221; and only 15.5% are rated as &#8220;bad&#8221;. Interestingly, they observe a notable difference in feedback on answers from users within the asker&#8217;s social network (76% rated as good) and outside the asker&#8217;s network (68% rated as good).</li>
</ul>
<div id="attachment_138" class="wp-caption aligncenter" style="width: 503px"><a href="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/02/aardvarkquestions.png"><img class="size-full wp-image-138" title="Aardvark Questions" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2010/02/aardvarkquestions.png" alt="Aardvark Questions" width="493" height="229" /></a><p class="wp-caption-text">Questions on Aardvark (chart taken from the paper).</p></div>
<p>Overall, I really enjoyed reading this paper. After using Aardvark for over a year now, it was really interesting to get to peer inside and see how the system works, and a lot of great details were provided about the ranking engine.</p>
<p>One place where I feel that the authors missed the mark was in the cursory side-by-side evaluation which pitted Aardvark against Google for a set of 200 questions randomly selected from the Aardvark system. They report that 71.5% of the questions studied were answered successfully on Aardvark, while 70.5% of the questions were answered successfully on Google. This comparison seems mostly useless as the questions, having been pulled from the Aardvark system in the first place, are ones that were specifically chosen because they are better adapted to what is being called &#8216;social search&#8217;. This comparison left me desirous of more investigation into two main questions.</p>
<p><em>&#8220;What makes a search engine &#8216;social&#8217; in the first place?&#8221;</em></p>
<p>The distinction between social and non-social is extremely murky, something Brynn and I discovered when working on our <a title="Sanjay Kairam - Cognitive Consequences of Social Search (PDF)" href="http://sanjaykairam.com/papers/evans-kairam-pirolli-inSubmission.pdf" target="_blank">Social Search paper</a>. It has been argued before (one small example <a title="Brynn Evans' Blog - Comment by Manas Tungare" href="http://brynnevans.com/blog/2009/01/30/why-social-search-wont-topple-google-anytime-soon/#comment-1933">here</a>) that Google&#8217;s PageRank algorithm is inherently social, as it aggregates information provided by people (links to one another) to rank results. However, it is clear that something seems categorically different between Google and what people perceive to be &#8216;social search&#8217;. When it comes down to it, even though everyone is excited about <a title="Google Blog - Search is getting more social" href="http://googleblog.blogspot.com/2010/01/search-is-getting-more-social.html" target="_blank">Google&#8217;s forays into &#8220;Social Search&#8221;</a>, there&#8217;s nothing all that fundamentally different about Google indexing your blog and your tweets than any other documents extant on the web.</p>
<p>To me, it seems that the key difference is really the change in the <strong>direction of interaction</strong>. While Google takes a query (question) and compares it against traces of discussion about that question from the past (web documents), systems perceived as &#8216;social&#8217; take a question and attempt to generate new answers in the future. This change in direction is what allows for the higher context that makes &#8216;social&#8217; search answers so much more rich (at least for some questions.)  Perhaps we need a different word to define this phenomenon &#8211; &#8216;real-time search&#8217; seems to get at it more, but has its own problems.  Perhaps something like &#8216;generative search&#8217;? I really don&#8217;t know.</p>
<p><em>&#8220;Why do we need a social search engine at all?&#8221;</em></p>
<p>This one seems like the best fodder for a follow-up study by Aardvark. While they do provide a rough breakdown of the types of questions asked on Aardvark (see pie chart above), I think that a comparison might have been much more interesting if they had looked at a variety of classes of user needs and had compared the relative efficacy of searching on Aardvark and a traditional search engine such as Google. It is clear that &#8216;social&#8217; will work much better for some needs and much worse for others, but up to this point, people who talk about social search always seem to use the same types of examples (travel, restaurants, and products, for instance). It would be great to get a clear idea over a wide range of needs and use cases where systems such as Aardvark can provide benefits over existing tools.</p>
<p>Anyways, for those of you interested in &#8216;social search&#8217; and search systems, I encourage you to read this paper and tell me your thoughts!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2010/02/anatomy-of-a-paper-about-a-large-scale-social-search-engine/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Vint Cerf: Information on the Go</title>
		<link>http://www.sanjaykairam.com/blog/2009/11/vint-cerf-information-on-the-go/</link>
		<comments>http://www.sanjaykairam.com/blog/2009/11/vint-cerf-information-on-the-go/#comments</comments>
		<pubDate>Fri, 06 Nov 2009 02:29:24 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[/Mobile]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[ICANN]]></category>
		<category><![CDATA[internet of things]]></category>
		<category><![CDATA[PARC Forum]]></category>
		<category><![CDATA[space]]></category>
		<category><![CDATA[Vint Cerf]]></category>

		<guid isPermaLink="false">http://sanjaykairam.com/blog/?p=79</guid>
		<description><![CDATA[I just got back from this week&#8217;s very enjoyable PARC Forum talk by Vint Cerf, entitled &#8220;Information on the Go&#8221;.  Given his stats (VP of Google, Presidential Medal of Freedom, Turing Award, etc. etc.) it already looked to be an interesting talk, but I was surprised by how entertaining and engaging he was as...]]></description>
			<content:encoded><![CDATA[<p>I just got back from this week&#8217;s very enjoyable PARC Forum talk by Vint Cerf, entitled &#8220;Information on the Go&#8221;.  Given his stats (VP of Google, Presidential Medal of Freedom, Turing Award, etc. etc.) it already looked to be an interesting talk, but I was surprised by how entertaining and engaging he was as a speaker (BTW, before he mentioned it, I actually never had noticed that &#8220;PARC&#8221; backwards is &#8220;CRAP&#8221;).  His talk covered basically everything internet-related under the sun (and orbiting it), and I wanted to share some highlights here.</p>
<p><strong>STATS:</strong> He started off with some data about internet usage (many pulled from <a title="Internet World Stats" href="http://www.internetworldstats.com/">here</a>), most notably mentioning that there are currently approximately 1.7B Internet users in the world (also adding that as Google&#8217;s &#8216;Internet Evangelist&#8217;, he &#8220;still has 75% of the world left to go.&#8221;).  What was interesting was his focus on what the numbers really meant &#8211; while Asia only has an 18.5% Internet penetration rate, that still works out to about 704M people, which is still more than 2x the entire US population.  In addition, he mentioned that there are currently 4 billion mobile phones, a fact which was new to me, and which implied that most people were using mobile phones as their primary conduit to the Internet.</p>
<p><strong>Major Changes:</strong> He also shared some of the major changes that were happening soon.  One of these was the Internet&#8217;s ongoing switch to <a title="Wikipedia - IPv6" href="http://en.wikipedia.org/wiki/IPv6" target="_blank">IPv6</a>; he mentioned that we were <a title="IPv4 Address Countdown" href="http://www.potaroo.net/tools/ipv4/index.html" target="_blank">on track to exhaust the stock of IPv4 addresses by 2011</a> (perhaps sooner if there is a rush for addresses at the end).  He mentioned that the new 128-bit addresses would allow for 3.4 x 10^38 addresses &#8211; &#8220;a number only Congress can appreciate&#8221;.  In addition, he mentioned upcoming changes like internationalization of domain names (<a title="ICANN Announcement - Internationalized Domain Names" href="http://www.icann.org/en/announcements/announcement-30oct09-en.htm" target="_blank">allowing non-Latin characters in top-level domain names</a>).</p>
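<p>For anyone who wants to check the address-space arithmetic, here is a quick sketch. It only uses the bit widths (32 bits for IPv4, 128 bits for IPv6); the variable names are my own.</p>

```python
# Address counts follow directly from the address width in bits.
ipv4_addresses = 2 ** 32
ipv6_addresses = 2 ** 128

print(f"IPv4: {ipv4_addresses:,}")
print(f"IPv6: {ipv6_addresses:.3e}")
print(f"IPv6 addresses per IPv4 address: {ipv6_addresses // ipv4_addresses:.3e}")
```

<p>In other words, 2^128 works out to roughly 3.4 x 10^38, which means every single IPv4 address could be replaced by about 7.9 x 10^28 IPv6 addresses.</p>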
<p><strong>Applications:</strong> He then did a whirlwind tour of the kinds of applications that are supported by the Internet (mostly seen through Google&#8217;s eyes, of course).  Email (GMAIL), Video-Sharing (YOUTUBE), Maps (GMAPS), you get the picture&#8230;It was interesting to hear him talk about Google Wave, because even though he spoke about it with conviction, I noticed that it was still difficult for him to really express the use cases for the service, something that I (<a title="Twitter Search" href="http://search.twitter.com/search?q=what%27s+the+point+of+google+wave" target="_blank">and many other people</a>) have had a bit of trouble with.</p>
<p><strong>New Types of Devices:</strong> He talked about the grand proliferation of internet-enabled things.  This is a topic that I&#8217;ve long had some interest in and have been desirous to get involved with (beyond playing around with my <a title="Nabaztag - Home" href="http://www.nabaztag.com/" target="_blank">Nabaztag</a>, of course).  He actually got into an interesting anecdote about installing temperature sensors in his wine cellar that text messaged him when the temperature rose above 60 degrees.  He talked about possible additions to this project, including actuators to turn on the cooling system remotely, and RFID-indexing all of the bottles to keep inventory.  This transitioned into discussion of putting sensors in the corks themselves to deliver information about the wine, which reminded me a great deal of Bruce Sterling&#8217;s discussion of &#8220;spimes&#8221; in his book, &#8220;<a title="MIT Press - Shaping Things" href="http://mitpress.mit.edu/catalog/item/default.asp?tid=10603&amp;ttype=2" target="_blank">Shaping Things</a>&#8221; (a quick, interesting read, btw &#8211; definitely recommended).</p>
<p><strong>Challenges of the Digital Age:</strong> Finally (skipping over some things), he talked about 2 big challenges that he saw for information handling in the future.  The first was the phenomenon of &#8216;Bit Rot&#8217; &#8211; he applied the term not to the <a title="Wikipedia - Bit Rot" href="http://en.wikipedia.org/wiki/Bit_rot" target="_blank">decay of physical storage media</a>, but rather to the idea that we might simply decide to stop updating programs, and thus 1000 years from now, we&#8217;ll never be able to see that PowerPoint presentation from 2004.  The other grand challenge he brought up (something he has been heavily involved with) was the creation of the &#8220;<a title="Wired.com Article" href="http://www.wired.com/wired/archive/8.01/solar.html" target="_blank">Interplanetary Internet</a>&#8221;.  The idea (linking things in space to the Internet) is relatively simple, but the execution (as one might expect) is somewhat hard.  Challenges such as vast distances (<a title="Register - Vint Cerf mods Android for Interplanetary Interwebs" href="http://www.theregister.co.uk/2009/11/05/vint_cerf_on_mobile/" target="_blank">speed of light actually becomes a factor when Mars gets as far as 235 million miles from Earth</a>) and planetary rotation (now you see it, now you don&#8217;t), have led them to develop a <a title="Wikipedia - DTN" href="http://en.wikipedia.org/wiki/Delay-tolerant_networking">Delay-Tolerant Network</a> protocol that uses a &#8220;store and forward&#8221; approach instead of trying to achieve end-to-end communication.  Their plan is to use as nodes satellites which have been re-purposed and re-programmed after completing their original mission.</p>
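<p>To get a feel for why the speed of light matters at that distance, here is a back-of-the-envelope calculation. It uses the 235 million mile figure quoted above and an approximate speed of light of 186,282 miles per second; the constant names are my own.</p>

```python
SPEED_OF_LIGHT_MILES_PER_SEC = 186_282  # approximate speed of light in a vacuum
MARS_MAX_DISTANCE_MILES = 235_000_000   # figure quoted in the post

one_way_seconds = MARS_MAX_DISTANCE_MILES / SPEED_OF_LIGHT_MILES_PER_SEC
round_trip_minutes = 2 * one_way_seconds / 60

print(f"One-way delay: {one_way_seconds / 60:.1f} minutes")
print(f"Round trip:    {round_trip_minutes:.1f} minutes")
```

<p>That comes to roughly 21 minutes each way, so any protocol that waits on a round-trip acknowledgement before doing anything useful would sit idle for over 40 minutes, which is presumably why a store-and-forward design is attractive here.</p>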
<p>Anyways, the talk, which covered everything from mobile phones to cloud-computing to internet-enabled surfboards to Googling in space, was interesting and inspiring.  In case you want to watch it, PARC has started putting up videos of the PARC Forum talks, and you can find the Vint Cerf talk <a title="PARC - PARC Forum (Vint Cerf)" href="http://www.parc.com/event/955/information-on-the-go.html" target="_blank">here</a> a few days hence.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2009/11/vint-cerf-information-on-the-go/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Meta-Review: The Role of Domain Expertise in Web Search</title>
		<link>http://www.sanjaykairam.com/blog/2009/10/meta-review-the-role-of-domain-expertise-in-web-search/</link>
		<comments>http://www.sanjaykairam.com/blog/2009/10/meta-review-the-role-of-domain-expertise-in-web-search/#comments</comments>
		<pubDate>Wed, 28 Oct 2009 06:51:56 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Metareview]]></category>
		<category><![CDATA[domain expertise]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[meta-review]]></category>
		<category><![CDATA[mrtaggy]]></category>
		<category><![CDATA[review]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[search expertise]]></category>
		<category><![CDATA[web search]]></category>

		<guid isPermaLink="false">http://sanjaykairam.com/blog/?p=67</guid>
		<description><![CDATA[This is a first post in a new format that I'm trying out: the "Meta-Review".  Besides the fact that it starts with an "M" (thus fitting with my category naming format), I'm calling it a "Meta-Review" because it's composed of notes and thoughts about a handful of papers all mashed together.  This isn't intended to be a carefully thought-out treatise on the papers discussed, but instead is really just a more public version of my immediate thoughts and notes (if I'm going to write them down anyways, why not share?)  Comments, discussion, and pointers to additional/related papers are encouraged, as they would benefit other readers (and more importantly, me).

In this post, I present a summary and discussion of 4 papers (and a poster abstract) about the role that domain expertise plays in web search behavior and performance.]]></description>
			<content:encoded><![CDATA[<p>This is a first post in a new format that I&#8217;m trying out: the &#8220;Meta-Review&#8221;.  Besides the fact that it starts with an &#8220;M&#8221; (thus fitting with my category naming format), I&#8217;m calling it a &#8220;Meta-Review&#8221; because it&#8217;s composed of notes and thoughts about a handful of papers all mashed together.  This isn&#8217;t intended to be a carefully thought-out treatise on the papers discussed, but instead is really just a more public version of my immediate thoughts and notes (if I&#8217;m going to write them down anyways, why not share?)  Comments, discussion, and pointers to additional/related papers are encouraged, as they would benefit other readers (and more importantly, me).</p>
<p>Here&#8217;s a quick list of the papers mentioned:</p>
<ul>
<li><strong>&#8220;How Medical Expertise Influences Web Search Interaction&#8221;</strong> [1] and <strong>&#8220;Characterizing the Influence of Domain Expertise on Web Search Behavior&#8221;</strong> [2] by Ryen White, Sue Dumais, and Jaime Teevan.  This poster abstract and longer paper present a large-scale, log-based analysis of web searches in 4 domains (Medicine, Finance, Law, and Computer Science), looking specifically at how domain experts differ from non-domain experts in terms of search behavior.  The data for the study were extensive, comprised of a sample of URL visits from users of a browser toolbar over the course of a 3-month period and representing &#8220;more than 10 billion URL visits from more than 500 thousand unique users.&#8221;</li>
<li><strong>&#8220;Knowledge in the Head and on the Web: Using Topic Expertise to Aid Search&#8221;</strong> [3] by Geoffrey Duggan and Stephen Payne.  This paper looks at the role of domain expertise in predicting search performance for people searching within their domain of expertise.  The study involved asking 34 university students trivia questions on two topics &#8211; Football (they meant to write &#8220;Soccer&#8221;) and Pop Music &#8211; and asking them to answer, first using only their own knowledge, and then again with the help of the Internet.</li>
<li><strong>&#8220;Web search behavior of Internet experts and newbies&#8221;</strong> [4] by Christoph Holscher and Gerhard Strube.  This is a somewhat earlier paper focusing on identifying the search strategies of internet (search) experts, and then using that information to help compare the effects of search expertise and domain expertise on search performance.  In the first study, they had 12 internet experts first do a mental walk-through of their search strategies and then carry out real search tasks using a teach-aloud/think-aloud sort of protocol.  In the second study, they had 24 university students conduct web-based search tasks pertaining to economics &#8211; they were divided by domain expertise (half were economics students) and search expertise (assessed by interview and pre-test).</li>
<li><strong>&#8220;Domain knowledge, search behaviour, and search effectiveness of engineering and science students: an exploratory study&#8221;</strong> [5] by Xiangmin Zhang, Hermina G.B. Anghelescu, and Xiaojun Yuan.  This paper examined the relationships connecting domain knowledge, search behavior, and search effectiveness.  The study established the domain knowledge of 22 engineering students through familiarity with terms from an engineering thesaurus, and then had them search on 3 assigned topics.</li>
</ul>
<p>The papers that focused on quantifying search behavior showed that <strong>domain experts tended to do more exploration overall than domain novices</strong>.  [2] found that they issued more queries, they branched more (branching defined as stepping back to a previous page and then moving forward to a new page), they visited a larger number of unique domains, and they spent a longer time overall per search task.  [5] also found that domain experts tended to issue more queries (34.64 queries/subject vs. 20.09) over the course of their search tasks.  In addition, the <strong>domain experts explored this space faster</strong>: [3] found that greater topic expertise led to less time spent per page and to faster decisions about whether or not to stop a line of inquiry, a finding corroborated by [4].</p>
<p><strong>One interesting point of disagreement involved the length of queries</strong>.  [2] pointed to past literature demonstrating that domain experts tended to issue longer queries and more technical query terms, a result replicated in their study.  Longer queries were seen from domain experts in [5], as well (4 terms/query vs. 2.86).  [3], however, found that the domain experts studied used shorter queries than the domain novices, contradicting these other studies (though the scope of consideration was greatly restricted to reach this conclusion &#8211; they looked at just 2 of the football questions).  In the first study from [4], when the search behavior of the internet experts was compared against search logs from the Fireball Search Engine, it was found that the internet experts used longer queries (3.64 vs. 1.66 words), but in the second study, those with domain knowledge were found to make shorter queries than those without (1.97 vs. 2.96 words).</p>
<p>The papers that attempted to highlight specific search strategies also revealed some interesting differences.  [2] examined the domain suffixes of the sites visited, and noticed that <strong>domain experts tend to visit different types of sites than domain novices</strong>.  Computer Science experts, for instance, were more likely to visit *.org or *.edu sites than novices, while novices were more likely to visit *.com sites, suggesting that experts might be more familiar with academic or industry sites, while novices might be more familiar with consumer-oriented commercial sites.  [4] also found that <strong>so-called &#8220;double experts&#8221; (those with Internet AND domain expertise) tended to navigate directly to &#8220;go-to&#8221; sources of information</strong>, while all other groups were more likely to start with search engines.</p>
<p>Regarding overall performance, it is perhaps not surprising that domain experts performed better in all search tasks.  [3] attempted to distinguish between searching within one&#8217;s domain for information already known vs. searching within one&#8217;s domain for information that one doesn&#8217;t already know, and found that domain experts performed better in both scenarios.  Because [2] did not control user tasks, they coded logged searches as successes when the final click was a URL and as failures when the final action was another search.  Given this coding, they found that experts were more successful than novices when searching in-domain, but that these same experts performed the same as novices when searching for information out of their domain of expertise, highlighting the difference between domain expertise and search expertise.</p>
<p>Some of the interesting questions that come out of this field of research relate to how we can transfer the advantage that domain (and search) experts have to novices.  One possible method is to pin down what these experts are doing that helps them perform better and attempt to work these strategies into instruction.  Obviously, Internet search skills are already immensely important, and I would hope that this would trickle down into educational curricula (if it hasn&#8217;t already).</p>
<p>Domain experts find information faster because their expertise in the space allows them to identify relevant information faster and to build off of it.  But, for those of us who are attempting to use the web to learn things on our own, there is a serious <a title="Wikipedia - Bootstrapping" href="http://en.wikipedia.org/wiki/Bootstrapping" target="_blank">boot-strapping</a> problem here.  As someone who is mostly self-taught when it comes to programming, I know how difficult it is to face the problem of searching for information when you are not entirely sure what to search for.  Once you get over the initial learning curve, it becomes much easier.  For those of us interested in building new technologies, here is a challenge: How can we create tools that support domain novices by doing this bootstrapping for them?</p>
<p>If we can find a way to identify domain novices and present them with useful information such as definitions or important &#8220;go-to&#8221; sources, we can significantly speed up their learning so that they can more quickly fend for themselves.  Faceted search tools such as <a title="Mr. Taggy" href="http://mrtaggy.com/" target="_blank">MrTaggy</a> take a solid step towards tackling this problem by <a title="ASC Blog - Announcing Mr. Taggy" href="http://asc-parc.blogspot.com/2009/02/announcing-mrtaggycom-tag-based.html" target="_blank">providing searchers with additional cues</a> that provide context.  More detailed study is needed regarding how to bridge the gaps in which domain novices get lost &#8211; as better tools become available for providing socially or computationally derived contextual data, it will be interesting to see what technologies evolve to support these needs.</p>
<p><strong>Links for Papers Above (Most from ACM Digital Library):</strong></p>
<ul>
<li>[1] &#8220;<a title="ACM Digital Library" href="http://portal.acm.org/citation.cfm?id=1390334.1390506&amp;coll=Portal&amp;dl=ACM&amp;CFID=58634471&amp;CFTOKEN=83510727" target="_blank">How Medical Expertise Influences Web Search Interaction</a>&#8221; by Ryen W. White, Susan Dumais, and Jaime Teevan in <em>Special Interest Group on Information Retrieval (SIGIR) 2008.</em></li>
<li>[2] &#8220;<a title="ACM Digital Library" href="http://portal.acm.org/citation.cfm?id=1498759.1498819&amp;coll=Portal&amp;dl=ACM&amp;CFID=58634471&amp;CFTOKEN=83510727" target="_blank">Characterizing the Influence of Domain Expertise on Web Search Behavior</a>&#8221; by Ryen W. White, Susan Dumais, and Jaime Teevan in <em>Conference on Web Search and Data Mining (WSDM) 2009.</em></li>
<li>[3] &#8220;<a title="ACM Digital Library" href="http://portal.acm.org/citation.cfm?id=1357054.1357062&amp;coll=Portal&amp;dl=ACM&amp;CFID=58634471&amp;CFTOKEN=83510727" target="_blank">Knowledge in the Head and on the Web: Using Topic Expertise to Aid Search</a>&#8221; by Geoffrey B. Duggan and Stephen J. Payne in <em>Conference on Human Factors in Computing Systems (CHI) 2008.</em></li>
<li>[4] &#8220;<a title="ScienceDirect" href="http://www.sciencedirect.com/science?_ob=ArticleURL&amp;_udi=B6VRG-40B2JGR-V&amp;_user=2553175&amp;_rdoc=1&amp;_fmt=&amp;_orig=search&amp;_sort=d&amp;_docanchor=&amp;view=c&amp;_searchStrId=1067009223&amp;_rerunOrigin=scholar.google&amp;_acct=C000057827&amp;_version=1&amp;_urlVersion=0&amp;_userid=2553175&amp;md5=3a2cb9c2fabb8bec9c6ecedb4575df4d" target="_blank">Web search behavior of Internet experts and newbies</a>&#8221; by Christoph Holscher and Gerhard Strube in <em>Computer Networks 33 (2000), pp.337-346</em>.</li>
<li>[5] &#8220;<a title="Information Research" href="http://informationr.net/ir/10-2/paper217.html" target="_blank">Domain knowledge, search behaviour, and search effectiveness of engineering and science students: an exploratory study</a>&#8221; by Xiangmin Zhang, Hermina G.B. Anghelescu, and Xiaojun Yuan in <em>Information Research 10(2), Jan 2005.</em></li>
</ul>
<p><strong>Some Other Reading on this Topic:</strong></p>
<ul>
<li>&#8220;The Effects of Topic Familiarity on Information Search&#8221; by Diane Kelly and Colleen Cool in <em>Joint Conference on Digital Libraries (JCDL) 2002</em>.</li>
<li>&#8220;Domain-Specific Search Strategies for the Effective Retrieval of Healthcare and Shopping Information&#8221; by Suresh K. Bhavnani in <em>Conference on Human Factors in Computing Systems (CHI) 2002.</em></li>
<li>&#8220;<a title="ACM Digital Library" href="http://portal.acm.org/citation.cfm?id=985358" target="_blank">The Effects of Domain Knowledge on Search Tactic Formulation</a>&#8221; by Barbara M. Wildermuth in <em>Journal of the American Society for Information Science and Technology, 2004.</em></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2009/10/meta-review-the-role-of-domain-expertise-in-web-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mozilla Labs Releases &quot;Raindrop&quot;</title>
		<link>http://www.sanjaykairam.com/blog/2009/10/mozilla-labs-releases-raindrop/</link>
		<comments>http://www.sanjaykairam.com/blog/2009/10/mozilla-labs-releases-raindrop/#comments</comments>
		<pubDate>Fri, 23 Oct 2009 17:04:23 +0000</pubDate>
		<dc:creator>skairam</dc:creator>
				<category><![CDATA[/Matter]]></category>
		<category><![CDATA[aggregation]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[raindrop]]></category>
		<category><![CDATA[social]]></category>
		<category><![CDATA[wave]]></category>

		<guid isPermaLink="false">http://sanjaykairam.com/blog/?p=71</guid>
		<description><![CDATA[This week, Mozilla Labs announced a new project entitled &#8220;Raindrop&#8221;.  The blog post introduces the underlying principles behind the system, as well as some of the development details and future plans: Today we’re introducing Raindrop, an exploration in messaging innovation being led by the team responsible for Thunderbird, to explore new ways to use Open...]]></description>
			<content:encoded><![CDATA[<p>This week, Mozilla Labs announced a new project entitled &#8220;Raindrop&#8221;.  The <a title="Mozilla Labs - Raindrop" href="http://labs.mozilla.com/raindrop/2009/10/22/introducing-raindrop/" target="_blank">blog post</a> introduces the underlying principles behind the system, as well as some of the development details and future plans:</p>
<blockquote><p>Today we’re introducing Raindrop, an exploration in messaging innovation being led by the team responsible for Thunderbird, to explore new ways to use Open Web technologies to create useful, compelling messaging experiences.</p>
<p>We hope to lead and spur the development of extensible applications that help users easily and enjoyably manage their conversations, notifications, and messages across a variety of online services. A central principle behind Raindrop is that messaging should be personal — we want Raindrop to be people-centric both in how we process messages, and in how we can help give people control over their personal data and experiences.</p>
<p>When a friend’s link from YouTube or flickr arrives, your messaging client should be able to show the video or photos near or as part of the message, rather than rudely kicking you over to a separate browser tab. Notifications from computers and mailing lists should be organized for you, not clutter your Inbox or require tedious manual filter setup. It should be easy to smoothly integrate new web services into your conversation viewer entirely using open web technologies.</p></blockquote>
<p>The post remains a little too vague to offer a specific vision of what they are talking about.  Essentially, it sounds like Raindrop will be some sort of aggregator for conversation on the web, delivering messages to you in an email-like format.  The &#8220;fundamental ideas&#8221; video sheds a little more light on the idea of intelligently culling &#8220;personal&#8221; messages (as opposed to bulk) from your various streams. (P.S. The video didn&#8217;t play correctly for me, but you can watch it in large format at Vimeo <a title="Vimeo - Mozilla Raindrop Intro Video" href="http://vimeo.com/7197666" target="_blank">here</a>.)</p>
<div id="attachment_72" class="wp-caption aligncenter" style="width: 510px"><span><span><img class="size-full wp-image-72 " title="Hey! &quot;Raindrop&quot; rhymes with &quot;Alltop&quot;!" src="http://www.sanjaykairam.com/blog/wp-content/uploads/2009/10/raindrop.jpg" alt="The &quot;Second Iteration&quot; of the Raindrop Interface" width="500" height="311" /></span></span><p class="wp-caption-text">The &quot;Second Iteration&quot; of the Raindrop Interface</p></div>
<p>As I have obviously not yet had the opportunity to try out Raindrop, I can&#8217;t really give any sort of review of the service.  However, I think that the design principles here are interesting; with the increasing number of conversation platforms appearing on the web, the need for intelligent aggregation is growing quickly.  Even Friendfeed, the leading social web aggregator, felt unmanageable to me at times, and they weren&#8217;t even trying to deal with email!</p>
<p>I also enjoyed the use of &#8220;Raindrop&#8221; as a name, as it conjured up (for me) a very specific comparison with another <a title="Or perhaps not?" href="http://whedonesque.com/comments/20516" target="_blank">possibly water-themed</a> product.  While Google Wave&#8217;s approach to aggregating information is to literally inundate you with it and force you to use the search function to paddle your way out, Raindrop (in theory, at least) seems to focus on keeping messages separate, allowing you to catch a few in your hand when you need them.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sanjaykairam.com/blog/2009/10/mozilla-labs-releases-raindrop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

