Squiggly-letter test used to tell humans, computers apart getting harder as machines catch on
By Dan Misener, CBC News
March 1, 2011
Tell me if this sounds familiar: you’re online, ready to buy some concert tickets or to sign up for a new email account. But before you’re allowed to
proceed, you have to prove you’re a human being by deciphering a mess of distorted, squiggly letters and numbers, then typing them into a text box.
This is what’s called a Captcha, or completely automated public Turing test to tell computers and humans apart.
For a while now, I’ve had a sneaking suspicion that Captchas are getting harder. Increasingly, I’m left wondering, is that an uppercase “x” or a
lowercase “x”? An “o” or a zero? A “q” or an “o” with a squiggle through it?
Sometimes, even though I’m 100 per cent sure I’ve typed exactly the right thing, the computer disagrees with me.
For months, I thought I was alone in this frustration. I worried that my increasing inability to pass these tests suggested that I’m not entirely
human. Then last week, I opened an email message from a colleague that read: “You know those distorted letters we have to type to pass security tests online?
I notice they’re getting more and more distorted.”
According to Luis von Ahn, one of the computer scientists who coined the term “Captcha,” the tests are getting harder.
[Photo caption: If you can’t make out this dizzying array of letters, you’ve failed the completely automated public Turing test to tell computers and
humans apart, known as a Captcha.]
“The thing about Captchas is that many people do their own implementations,” he told me. “Over time, some of these implementations have gotten a lot
harder, because the really easy ones – essentially, the undistorted ones – can be broken by bots.”
Traditionally, identifying squiggly, distorted letters has been difficult for computers but comparatively easy for humans. But computers are getting
better and better at it, and easy Captchas aren’t as effective as they once were.
Still, von Ahn says his own implementation of Captchas, called reCaptcha, isn’t getting any harder.
“It’s still the case, as it was three or four years ago, that a person who submits a solution [to reCaptcha] is going to be correct 96 per cent of the
time,” von Ahn said. “That number remains the same.”
Using Captchas to digitize books
Digitizing a book involves photographically scanning it and converting the scans into text using optical character recognition (OCR), but OCR can’t
always identify every word. Programs like reCaptcha turn the words OCR can’t identify into Captcha images, which are then given to users to solve as
part of a regular security check on various websites. The mystery word is shown alongside another word for which the answer is already known, and the
user is asked to read both. If they solve the one for which the answer is known, the system assumes their answer is correct for the mystery word, too.
The same word is then given to several other users to verify the initial answer was correct.
Source: ReCaptcha
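The pairing logic described in the sidebar can be sketched in a few lines of Python. This is only an illustration of the idea, not reCaptcha's actual code: the names `MysteryWord`, `submit` and `resolved_text`, the case-insensitive matching, and the three-vote agreement threshold are all assumptions made for the example (the sidebar says only that "several" other users verify each word).

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MysteryWord:
    """A scanned word OCR could not read (hypothetical structure)."""
    image_id: str
    votes: Counter = field(default_factory=Counter)  # transcription -> count


def normalize(text: str) -> str:
    # Assumption: answers are compared ignoring case and outer whitespace.
    return text.strip().lower()


def submit(control_answer: str, typed_control: str,
           mystery: MysteryWord, typed_mystery: str) -> bool:
    """Check the known word; if it matches, record a vote for the mystery word.

    Returns True when the user passes the security check.
    """
    if normalize(typed_control) != normalize(control_answer):
        return False  # failed the known word, so the mystery answer is ignored
    mystery.votes[normalize(typed_mystery)] += 1
    return True


def resolved_text(mystery: MysteryWord, min_votes: int = 3) -> Optional[str]:
    """Return the transcription once enough users agree, otherwise None.

    min_votes=3 is an assumed threshold chosen for this sketch.
    """
    if not mystery.votes:
        return None
    text, count = mystery.votes.most_common(1)[0]
    return text if count >= min_votes else None


if __name__ == "__main__":
    word = MysteryWord("scan-0001")
    submit("morning", "morning", word, "upon")
    submit("morning", "mornin", word, "xxxx")   # control word wrong: no vote
    submit("morning", "Morning", word, "upon")
    submit("morning", "morning", word, "Upon")
    print(resolved_text(word))
```

The control word does the security work, while the mystery word only collects votes. A single user's guess is never trusted on its own, which is why the sketch waits for repeated agreement before treating a transcription as settled.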
ReCaptcha, which was acquired by Google in 2009, generates more than 100 million Captcha images a day for various websites for free. The Captcha
images it provides are also used to help decipher words that can’t be identified during the process of digitizing printed material (see sidebar).
Computers are getting better at solving Captchas because devising automated ways of bypassing the test is potentially lucrative. Imagine that you’re an
email spammer. Wouldn’t it be great if you could automatically sign up for hundreds or thousands of bogus email accounts? Or, imagine you’re a ticket
scalper. Wouldn’t it be terrific if you could write a computer program to automatically buy all the tickets for a concert?
Captchas can help keep spammers and scalpers at bay.
Because there’s a lot of money to be made, software developers are actively writing code that they say can crack Captchas and that, according to
von Ahn, sells for $10,000. Von Ahn said he has even seen ticket scalpers advertise software they say can break reCaptcha for as much as $50,000.
According to von Ahn, it’s only a matter of time before software rivals humans at solving Captchas, though that could take decades.
[Photo caption: A screen grab of a nearly indecipherable Captcha image used to distinguish between humans and computers.]
In the meantime, as easier, ineffective systems are phased out, people will continue to be frustrated by some Captchas. And as frustrating as Captchas
are for the average user, they can be even more frustrating for people who are visually impaired or use screen-reader software. Some Captcha
implementations include an audio alternative, but accessibility will continue to be an issue.
Regardless of how they are implemented, Captchas are all built around the idea of creating a task that’s hard for computers and easy for humans. As
computers get better and better at reading squiggly letters, we may be asked to prove our humanity by performing other types of tasks.
For instance, computers are still very bad at determining the contents of a photograph. It’s difficult for software to tell the difference between a
photo of a cat and a photo of a dog. Microsoft Research built a Captcha system called ASIRRA (Animal Species Image Recognition for Restricting
Access) based on this idea. Companies like Solve Media and NuCaptcha have put their own twists on Captchas that require users to enter words from a
text or video advertisement.
[Photo caption: Some sites that use Captchas are making them harder to decipher because computers are getting better at figuring them out.]
If von Ahn is right and computers will eventually be able to reliably solve text-based Captchas, that’s not necessarily a bad thing. Though
Captcha-busting technology could be used by spammers or ticket scalpers, it could also help decipher hard-to-read parts of digitized books or identify
skewed and distorted text in photographs.
So, the next time you’re confounded by a mess of squiggly, distorted letters, don’t be too hard on yourself. Maybe it’s the Captcha’s fault.
As von Ahn told me, “Sometimes, they’re really bad. Sometimes, they are so hard to read that I can’t read them myself.”
Comforting words from one of the people responsible for some of those squiggly letters.