What Makes Web Sites Credible? 10 Years Later

by skairam on January 22nd, 2010

Last month, I read a study by B. J. Fogg and others from the Persuasive Technology Lab at Stanford, entitled “What Makes Web Sites Credible? A Report on a Large Quantitative Study”. The paper described an early effort to systematically determine how different elements of web sites affect people’s perceptions of credibility (defined roughly as the intersection of trustworthiness and expertise). In the original study design, 1400 participants completed a survey that presented them with 51 web site elements and asked them to rate how much each element would make a web site more or less believable.

Upon reading this study, I noticed two things:

1) The original experiment solicited participants by offering to donate $10 to charity for their time, resulting in a net cost of at least $14K. This raised the first question: Given the growth of sites like Mechanical Turk, can we get comparable results for less money?

2) The data was originally collected in December 1999, which was almost exactly 10 years ago. Content on the web and our interactions with it have changed a great deal since then – I mean, in 1999, Google had only existed for a year, blogs were still relatively uncommon (LiveJournal had just started), and Wikipedia didn’t even exist yet! This raised the second question: Given that the average Internet user now has a far greater amount of experience navigating the Web, should we expect the responses to be different 10 years later?

I decided to explore both of these questions through a survey on Mechanical Turk; in December 2009, I posted a HIT to Mechanical Turk replicating the original study as closely as possible. The 51 statements about web site elements were available from Fogg’s original paper, the same 7-point Likert scale was used, and the survey items were randomized.  I paid $0.05 per HIT, and I ended up getting 327 responses, with none thrown out due to quality.

An initial, high-level examination of the responses showed that they actually matched the 1999 data fairly well.  Average Likert ratings for each item correlated highly with average ratings for the same items in the 1999 data with R^2 = 0.96.  One difference in the data was that answers were compressed (closer to 0) overall, so for the purposes of comparison, I transformed the 2009 data using the transformation (1.1677X – 0.0003) to match the 1999 data (note that this has no effect on the correlation coefficient).
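To see why a rescaling like this leaves the correlation untouched, here is a minimal sketch in Python. The ratings in it are made-up placeholder values, not the 1999 or 2009 survey data, and the helper names (pearson_r, rescale) are hypothetical; only the slope and intercept come from the transformation quoted above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rescale(xs, slope=1.1677, intercept=-0.0003):
    """Apply the linear transformation slope * x + intercept to each value."""
    return [slope * x + intercept for x in xs]


# Hypothetical per-item mean ratings, for illustration only (NOT the study data).
ratings_1999 = [2.1, 1.5, -1.8, 0.4, -0.9, 1.2]
ratings_2009 = [1.8, 1.2, -1.6, 0.3, -0.7, 1.1]

r_raw = pearson_r(ratings_1999, ratings_2009)
r_scaled = pearson_r(ratings_1999, rescale(ratings_2009))

# A linear transformation with a positive slope changes the spread of the
# values but not how they co-vary, so the two coefficients agree.
print(math.isclose(r_raw, r_scaled, rel_tol=1e-9))
```

The rescaling only stretches the 2009 values away from 0 so the two sets of averages can be compared item by item; R^2 is the same whether it is computed before or after.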

The first analysis in Fogg, et al. was to separate the various elements into 7 “scales” using factor analysis (Real-World Feel, Ease of Use, Expertise, Trustworthiness, Tailoring, Commercial Implications, and Amateurism).  Below, I present comparisons for items in each scale.  I highlighted items that deviated from the 1999 values by more than 0.25 (without the original data, I couldn’t do much more in-depth comparison), but this might give some rough idea of which elements are more important now than they were in 1999 (red) and which elements are less important (purple):

Real-World Feel Scale

Elements related to the "Real-World" feel scale are rated similarly from 1999 to 2009.

"Ease of Use" Scale

One difference is that people seem more critical of long download times than in the past.

Expertise Scale

For some reason, it seems that displaying an award helps your site's credibility more than it did 10 years ago, and that providing a lot of news stories matters less.

Trustworthiness Scale

Stating your policy on content and having a URL that ends in ".org" are more important to people now - could this be a cultural shift in response to sites like Wikipedia? Overall, it seems as if links out to other sites matter less for credibility now.

Tailoring Scale

Email confirmations are more important now than in 1999.

Commercial Implications

Outside advertisements and an e-commerce focus matter more for credibility now than in the past. People are paying less attention to the commercial purpose of sites, as well as the number of ads and their integration with content (Google is a favorite site for many, after all.)

Amateurism Scale

People seem to notice domain name mismatches more now (more public knowledge of phishing/identity theft?), but pay less attention to multi-lingual sites.

Other Elements

People seem more willing to trust free financial sites now than in the past.

Anyways, as we might have guessed, the answers from 2009 seem to match the answers from 1999 pretty well.  The elements that made things highly credible or highly ‘un-credible’ in the past seem to have remained constant, and those which didn’t matter then seem not to matter too much now.  Some interesting elements are noted in the captions with some rough conjectures as to why some of them might be trending the way they are.

The rest of the Fogg paper focused on characterizing differences in scale responses due to demographic differences between participants, but I found that part of the study less convincing as they averaged over the positively and negatively phrased elements on each scale (which I think makes the interpretation somewhat confusing).

Now, back to our two questions:

1) It looks as if, using Mechanical Turk, we were able to get reasonable answers that fairly closely matched those from the original study.  Total price tag: 327 responses * $0.05 = $16.35 paid to participants (plus Amazon’s cut), which made this study about $13,980 cheaper than the original one.

2) We see above a few small changes in what makes web sites credible, but overall, people are looking at the same things, meaning that we should continue taking the same factors into account when designing websites. I’ve made some guesses as to why these elements may have changed over time, but I’m curious to hear what you think, so leave a comment!

2 Comments
  1. Michael Bernstein

    Do you get the sense that there are new design elements that signal trust or not? For example, the web design aesthetic has evolved in the past 10 years, so if a site looks really old, I tend not to trust it. Or, if it’s clearly not using javascript etc., indicating that it is either very old or just thrown together, I trust it less.

    Incidentally, this also suggests that sampling bias in MTurk is either less of a big deal than we thought, or that everyone who completed Fogg’s survey is now on MTurk.

  2. Hey Michael,

    I definitely believe that there are likely a number of new factors that affect credibility that probably weren’t addressed in the Fogg study, due to the overall newness of the WWW. Elements specifically pertaining to design might be difficult to parse out – it seems as if the original study may have tried to get at that a little bit (“site has been updated”, “site looks professionally designed” sort of peripherally address this?), but I do agree that given the time that has passed, there are definitely sites that appear dated and less “credible” overall.

    I am thinking about doing a follow-up study where I might address questions about “new” site elements that people take into account that weren’t prevalent in 1999 (“The site uses AJAX”?). I would also be extremely interested in getting at social questions, given how much our navigation of the web is now socially-mediated. It would be interesting to look at how things like “This site was linked to by a friend’s FB status” or “One of my Twitter followers @’d me with a link to this site” or “This is bookmarked by 1000 people on Delicious” might affect perceptions of site credibility.

    I suspect that the MTurk population worked well specifically for a study like this because they tend to be somewhat internet-savvy compared to the current general population – much like the population in Fogg’s study, given that he also recruited online.
