Saturday, May 19, 2012

Crowdsourced information

Crowdsourcing is changing the way we live, even if you don't notice it at first. A person with access to the internet has more information available to them than the sum of all information available throughout history prior to the last few years. But how do you sort through all of the useless information and find what you are looking for? Crowdsourcing is actually one of the biggest factors in this information explosion, as people are putting massive amounts of data into the internet ecosystem. Interestingly enough, user-generated data can both help and hurt our efforts to sift through the haystack and find the needles we are looking for.

I saw a video recently where the founder of ShareThis talked about sharing replacing search, as people are more interested in what other people think than in the results of a computer algorithm. Social network recommendations can help us find trusted sources of information and a variety of other internet resources. For years my family has relied on product reviews found on the sites of a number of internet retailers to help us identify strengths and weaknesses of different options before making purchase decisions. For the most part, we have found the information provided by those reviews to be very reliable. These cases show the value of crowdsourced information, provided by users who are most often not being compensated in any way for their contribution.

On the other hand, the volume of user-entered data on the internet can get in the way when we are trying to find very detailed information. I recently purchased an aquarium for my family and was looking for advice on the care and feeding of my new fish. There were some cases where a single Google search resulted in three or more opinions that contradicted one another. Similarly, when I was researching some unusual health troubles, I found far too many responses that provided no indication of the credentials of the author. Services like Yahoo Answers are very interesting, yet I have found many reasons to question the validity of the information they provide. Wikipedia suffers from similar concerns about the accuracy and validity of certain articles.

In addition, the problems with these generally well-intentioned efforts are further complicated by those who maliciously wish to mislead others by manipulating open systems. Companies have been found to enter fake reviews of their own products in the hopes of increasing sales. Then there are those who wish to promote their personal agendas by intentionally providing incorrect information.

It will be interesting to see how society evolves to deal with this divergence. As someone very interested in data quality, I see a number of ways to approach these issues with technology we already have available to us. Twitter dealt with fake accounts for famous individuals by providing a method of verifying an account. Other sites limit the volume of information by setting tight access controls on content creation or by having a small set of moderators. Many online answer sites allow users to rate responses so that invalid or misleading responses drop in search results and impact fewer searchers. No one answer is perfect, but the various solutions offer us insight into how we can leverage the benefits of crowdsourced data and minimize the dangers.

