Can You Spot The Experts? Tagging and Expertise
Recently, I’ve been reading some papers about identifying and harnessing expertise in tagging communities such as Delicious. Some of the research that I have come across has looked at topics such as:
- Identifying the features that underlie “tag quality” (e.g. Sen, et al. (2007), Zhang, et al. (2009))
- Topic-based approaches for information retrieval from tagged collections (e.g. Zhou, et al. (2008))
- Graph-based algorithms for ranking based on user tags (e.g. Hotho, et al. (2006), Noll, et al. (2009))
I decided to try a little Mechanical Turk study to see if I could spot some differences between tags generated by experts and those generated by novices. I had each Turker read 1 of 5 web pages (on the topic of “enterprise 2.0 mashups”) and enter 5 tags which they thought would be useful for bookmarking the page (either for themselves or others). I also asked them to rate how familiar they were with the subject matter (“Not at All”, “Slightly Familiar”, “Somewhat Familiar”, and “I am an Expert”).
As a game, I thought it would be interesting to post some of the responses to see how easy it was to identify which tags were generated by people who rated themselves as “experts” vs. “non-experts”. I took all of the tags generated by each expertise group, cleaned them up for minor spelling mistakes and typos (e.g., “applciation” > “application”) and generated a tag cloud using Wordle, where the tag size corresponds to the frequency of use of that word (all other factors, such as positioning and color, are purely stylistic).
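For anyone who wants to try the counting step on their own data, here is a minimal Python sketch of how tags could be grouped by self-rated familiarity and tallied before being handed to a tag cloud tool. It is not the procedure used for this post: the sample responses, the `CORRECTIONS` map, and the function names are all hypothetical, and lowercasing the tags is an extra assumption on my part. It also counts whole tags, whereas a word-based tool like Wordle would count individual words.

```python
from collections import Counter

# Hypothetical sample data: (self-rated familiarity, tags entered by one Turker).
responses = [
    ("Not at All", ["mashup", "magazine", "applciation"]),
    ("Slightly Familiar", ["enterprise", "mashup", "soa"]),
    ("Slightly Familiar", ["Mashup", "web 2.0", "magazine"]),
]

# Hand-built typo map, standing in for the manual spelling cleanup.
CORRECTIONS = {"applciation": "application"}


def clean(tag):
    """Trim and lowercase a tag (an assumption), then fix known typos."""
    tag = tag.strip().lower()
    return CORRECTIONS.get(tag, tag)


def tag_frequencies(responses):
    """Return a dict mapping each familiarity level to a Counter of cleaned tags."""
    freqs = {}
    for level, tags in responses:
        freqs.setdefault(level, Counter()).update(clean(t) for t in tags)
    return freqs


freqs = tag_frequencies(responses)
for level, counts in freqs.items():
    # The most frequent tags would be drawn largest in the cloud.
    print(level, counts.most_common(3))
```

The per-group counts are what determine tag size in the cloud; everything else (layout, color) is left to the rendering tool.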
For the following URL – http://www.soamag.com/I18/0508-1.php – can you identify which tag cloud belongs to which of these groups: “Not at All (Familiar)”, “Slightly Familiar”, and “Somewhat Familiar” (there was a 4th category of “I am an Expert”, but nobody rating this URL classified themselves this way):
If you have any idea which tag cloud is which, please feel free to post your guess in the comments! I’d be extremely curious to see why people guessed the way that they did. I am currently in the process of having some Turkers do the same thing; if you are curious about the answers, come back for my follow-up post, where I will post the correct answers as well as the results of the Mechanical Turk evaluation of the tag clouds.



In general, I think that people tend to be more individualistic in terms of what they think is important when they consider themselves vaguely expert at something. In other words, people who know more will probably have very different opinions on what’s most important, because they all have the basic framework/summary sorted out. My guess is that the bottom tag cloud belongs to those users who said that they were “somewhat familiar”, because most of the tags are quite evenly sized. By the same logic, I’d guess that the middle belongs to the “slightly familiar” users, and the top to the “not at all familiar”.
I’m also curious as to why “magazine” got such a sizable tag in all the tag clouds, as it occurred only once in the article. Of course, it was a magazine article, but it’s interesting that the users thought that that was so important.