Wednesday, March 24, 2010

When the going gets gnarly...cheat! Look to crowdsourcing

At the Open Source Business Conference last week, industry pundit Tim O'Reilly gave a very well-received talk on the importance of data over algorithms and applications. "The open source community will have to become the open data community," O'Reilly suggested. That got me thinking and reading about crowdsourcing, and I am convinced it will become an important mechanism for your business to solve a certain class of problems.

If you are like me, you've heard the term tossed around for some time, but you may not realize the range of applications available today or the ease of access provided by clearinghouse-like services. Companies looking to get better software cheaper through open source ought also to be figuring out how to use the power of crowds to semi-automate repetitive, pain-in-the-neck tasks.

The concept of crowdsourcing is simple. You break a problem down into many simple pieces, hand them off to an array of individuals, and reassemble the results. (The clearinghouses I mention above make it very easy to do that.) There are still plenty of tasks, even simple ones, that humans handle much better than computers. Problems or projects that involve lots of those kinds of tasks lend themselves to crowdsourcing. Companies are applying this approach to areas ranging from QA to Marketing and Sales, and vendors are popping up with all kinds of different angles on this simple idea.

Mechanical Turk from Amazon is one of the first implementations to make an enormous pool of resources readily available to anyone. A couple of hours ago, there were 192,000 HITs (Human Intelligence Tasks) in the MT backlog. I just "Wow"ed out loud when I refreshed my browser a few seconds ago and found that had gone down to 177,000. That kind of backlog movement suggests a lot of activity!

If you want to get something done or earn some money on the side, access is as close as your Amazon account. The rates generally range between a penny and a dollar per HIT, with the bulk being under a dime. I saw one task advertised as being high rate; it was estimated at 20 seconds and earned 12 cents. That works out to a theoretical maximum of about $21.60/hr (180 tasks at 12 cents each), but my guess is that it would be hard to earn even $10/hr. That's consistent with one of the dings I've heard on the whole crowdsourcing idea, that it exploits people for low wages. But I also understand that many people who perform this work do it for fun or live someplace where $10 is big bucks.
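For anyone who wants to check that hourly figure, here is a quick sketch of the arithmetic. The function name is my own invention, and it assumes tasks are done back to back with no time spent finding or loading the next one, which is why the result is only a ceiling.

```python
def max_hourly_rate(seconds_per_task: float, reward_dollars: float) -> float:
    """Earnings per hour if tasks were completed back to back with no gaps."""
    tasks_per_hour = 3600 / seconds_per_task
    return tasks_per_hour * reward_dollars


# The "high rate" task above: 20 seconds of work for 12 cents.
print(round(max_hourly_rate(20, 0.12), 2))  # 21.6
```

So the ceiling lands a little above $20/hr; any time lost between tasks pulls the real number down from there.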

To give you a sense for the kinds of quickie jobs people perform for under a dime, here are some examples:

* Decide if two words are related to each other
* Categorize Web Sites(New URLs) (WARNING: This HIT may contain offensive content. Worker discretion is advised.)
* Verify that a manufacturer's web page on a retailer's site has all of the expected details and features.

The high end (today at least) is $5. Here are some >$1 tasks:

* Help us test the system recovery using CA ARCserve D2D software, on Windows XP operating system
* Submit UFO related stories to website Exantria.com
* iPhone software testing (must have an iPhone or iPod Touch running OS 3.1 and be in the USA)

There is an easy web UI for people seeking work and for those looking to have work performed. But Amazon also provides an API, so a program can actually call "wetware" workers like a subroutine. How could you use Mechanical Turk? I'll toss out an idea; please share others in the comments below.

Like many sites, the Black Duck website is set up such that people fill out a form before downloading something such as a whitepaper. Every time that happens, a lead goes into our CRM system and then out to our salespeople. Unfortunately, our flow of leads gets polluted with guys named "a;dfja pdslfja" whose phone numbers are 123456. You get the idea: some people (I know you'd never do this) fill out our forms with junk. I'm thinking our IT guys could build a little routine that would divert each "lead" before injecting it into our system, and submit a HIT asking someone to vet it as real or junk. We could then separate the wheat from the chaff for probably two cents a lead, judging from similar tasks. Not bad. That seems a lot easier than trying to write an algorithm to figure it out; people know junk when they see it.
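To make the idea a bit more concrete, here is a rough sketch of what such a routine might look like. Everything in it is hypothetical: the function names, the task fields, and the `ask_workers` callable, which stands in for whatever code would actually post the HIT through Amazon's API and collect the answers. It also assumes you might ask more than one worker and take the majority answer, which multiplies the per-lead cost accordingly.

```python
from collections import Counter
from typing import Callable, Dict, List

# Assumed reward per answer, taken from the two-cent estimate above.
REWARD_DOLLARS = "0.02"


def build_vetting_task(lead: Dict[str, str], assignments: int = 3) -> Dict[str, object]:
    """Describe one 'real or junk?' task for a single web-form lead."""
    details = ", ".join(f"{field}: {value}" for field, value in sorted(lead.items()))
    return {
        "title": "Is this contact form entry real or junk?",
        "description": f"Answer 'real' or 'junk' for this entry: {details}",
        "reward": REWARD_DOLLARS,
        "assignments": assignments,  # how many workers to ask
    }


def majority_verdict(answers: List[str]) -> str:
    """Return 'junk' only if junk votes outnumber real votes.

    Ties count as 'real' so a genuine lead is not thrown away.
    """
    counts = Counter(answer.strip().lower() for answer in answers)
    return "junk" if counts["junk"] > counts["real"] else "real"


def vet_lead(lead: Dict[str, str], ask_workers: Callable[[dict], List[str]]) -> bool:
    """Return True if the lead should go on to the CRM system."""
    task = build_vetting_task(lead)
    return majority_verdict(ask_workers(task)) == "real"


# Stand-in for real workers, so the sketch runs without an Amazon account.
def fake_workers(task: dict) -> List[str]:
    return ["junk", "junk", "real"]


lead = {"name": "a;dfja pdslfja", "phone": "123456"}
print(vet_lead(lead, fake_workers))  # False
```

Keeping the human step behind a plain function like `ask_workers` is what makes the workers behave "like a subroutine": the rest of the lead pipeline does not need to know that a person answered.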

Mechanical Turk and a few others are general-purpose clearinghouses for generic tasks. But there are plenty of other angles on this market being pursued by an array of other companies. Don't worry, I'll fill you in on them in one of my next few blogs. In the meantime, how about some ideas...how might your company use Mechanical Turk?