How reCAPTCHA works and how to mess with it!

The weakness of reCAPTCHA lies in the fact that it serves two purposes: stopping spam and digitizing printed texts (and maybe OCRing the web). What this means is that every captcha contains two words: one that the computer already knows and will check your text against, and one that it hopes to digitize. In other words, reCAPTCHA only needs one word of your answer to be correct for your captcha to be accepted.

You will be given two words: [Real, Fake] or [Fake, Real]. The fake word is unknown to the computer and can be replaced with anything.
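
To make the idea concrete, here is a minimal Python sketch of the check described above. It is only an illustration under my own assumptions, not reCAPTCHA's actual code: the `Challenge` class, the `verify` function, and the case-insensitive comparison are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Challenge:
    """A two-word challenge (hypothetical structure, for illustration only)."""
    control_word: str   # the "real" word, whose answer is already known
    control_index: int  # 0 for [Real, Fake], 1 for [Fake, Real]


def normalize(text: str) -> str:
    # Assumption: surrounding whitespace and letter case are ignored.
    return text.strip().lower()


def verify(challenge: Challenge, typed_words: Sequence[str]) -> bool:
    """Accept the answer if the control word matches.

    The other (unknown) word is never compared against anything,
    because the system has no known answer for it.
    """
    if len(typed_words) != 2:
        return False
    typed_control = typed_words[challenge.control_index]
    return normalize(typed_control) == normalize(challenge.control_word)


# A [Fake, Real] challenge: only the second word is checked.
challenge = Challenge(control_word="morning", control_index=1)
print(verify(challenge, ["anything", "morning"]))  # True
print(verify(challenge, ["upon", "mourning"]))     # False
```

In this sketch the unknown word is simply ignored during verification, which is the whole reason only one of the two words has to be right.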

Let me show you a few examples of what ALWAYS FAKE words look like:

Numbers:
recaptcha-numbers

Symbols:
recaptcha-symbols

Words with accents or punctuation:
recaptcha-accents-punctuation

Indecipherable text:
recaptcha-indecipherable-text


Almost always fake:

Words with the following features are usually fake, with a few exceptions, so expect the occasional surprise.

Inverted colors:
recaptcha-inverted-colors

Odd or non-matching fonts:
recaptcha-non-matching-fonts

Deformation caused by scanning:
recaptcha-deformation


Things to remember:

1. The fake word is usually the one that is blurrier and harder to read, even if only by a little. Sometimes, however, it is the one that is unusually clean and easy to read, because the quality of scanned words varies greatly.

2. Real words usually use the same type of font throughout as they are computer generated, while the appearance and font for fake words can vary greatly as they are scanned from multiple sources. See the “Odd or non-matching fonts” above for examples.

3. The fake word is usually thicker, bolder, and blacker. But sometimes it is also thin and long.

4. The letters of a fake word are more likely to sit on a straighter line or a smoother curve, because the word is scanned from printed material. The letters of a real word are more likely to be wavy and a bit jumbled up, because the word has been distorted by a computer.

5. You’ll sometimes get words with lots of noticeable dots around them. They are obviously scanned from books and are therefore fake.

6. Practice! When you start out, you’ll have difficulty identifying which word is real, but after doing a few dozen captchas you’ll be proficient at picking out fakes, and this will be a great time saver for you.


Update: I’ve been seeing a lot of comments like… “Yeah, how noble! Screwing over a worthy project. What a time saver!”
Guys, I’m not asking you to do this every time you see a reCAPTCHA. If you have the time to read indecipherable text and fill it in, then great, thank you. But suppose you’re late for something and you have to download a file or leave a comment before you go. Will you keep refreshing the reCAPTCHA until you find a word that you can read?
You get my point.

{ 30 comments… read them below or add one }

Victus September 1, 2010 at 3:07 AM

A fake word and a real word…
Thanks, I didn’t know that, but why does recaptcha need two words anyway??

Ted September 1, 2010 at 3:11 AM

@Victus
Like I said, the other word could be a piece of a pic, or a word from a scanned book in a PDF file.
So basically reCAPTCHA (Google) is using us to OCR the web and digitize non-digital texts.

Greg September 1, 2010 at 10:35 PM

Why in the world would you want to sabotage recaptcha? They’re doing a great thing by digitalizing old texts. Telling people to try and fool recaptcha is slowing down this productive process.

DrTrue February 12, 2011 at 1:01 AM

Maybe it’s time to pay guys to do the actual work and not take advantage of unaware people?

Reality January 29, 2012 at 11:04 AM

Ok. Who is going to do that? You? It’s not difficult work, so let’s say you make minimum wage… and then we’ll stuff you in a library for a few years and your job is to transcribe every word of every book in the library. Granted, that’s one library. Have fun.

Dmytry February 14, 2012 at 5:59 PM

When you digitize a book to make it available for free, you can add text boxes for the uncertain words inside the viewer, so that people who are actually reading that book fix it, taking the context into account! That’s the way to do this accurately!

Applying distortion to already hard to read piece of text, seen entirely outside the context, that is NOT the way to do it accurately.

What they are doing, they digitize the books which they don’t make available for free in text format. They’re stealing people’s effort, just to avoid paying some poor chinese or indian.

At same time, they’re clinical psychopaths of some sort. They played the idiots like a fiddle – the idiots think that recaptcha somehow uses the effort that would have been wasted. It does not; it just adds more effort that it steals.

Why does anything exist? May 2, 2012 at 8:39 AM

Sure. Give people a weekly quota that’s reasonable, so that they can adjust how much they get done per day. Most people work, and this would be great supplementary income. And it’s for a decent cause.

I’d do it. And say 10 people for every 10 million did it as well. Per language. That would be a decent startup I think. Or have a group goal for each language. Just ideas, but it could work

Munkatten September 13, 2010 at 11:59 PM

Recaptcha is silly, I already have to enter a stupid word in order to access websites, retain access to the websites, register, log in, check my settings, change my settings, and now I have to enter two? Essentially doubling my frustrations?

If it was used to combat spam, sure, I see the point, but it’s not. Fuck recaptcha.

Anonymous December 1, 2010 at 3:45 AM

To digitize books, did you not get that the first time around? For you, entering 2 words instead of 1 is gonna take 6 seconds instead of 3. But the entire system digitizes more books than anything else out there, so it’s little cost for great benefit.

UpCaptcha August 17, 2012 at 7:59 AM

Of course there’s some noble cause behind it. Nazis, KKK and countless other organizations for their own profit stated that. You crack me up!

Tomasz Cichowski September 21, 2010 at 1:53 PM

They are basically using our work without paying us, so why not fuck them up? One day, we will encounter ebooks about Professor Ni**er raping Mrs. G**tse.

InstantQueue June 22, 2011 at 4:08 AM

LOL
good one…

silver October 27, 2010 at 9:51 AM

yep me likes. fuck them up. better yet, with vulgar versions of the same words, so that its harder for the proofreader to check.

fckomegle January 22, 2011 at 8:58 AM

i’m often on omegle and i have to type that sh*t like every 90 seconds. i’m pretty much following your guide and if anyone wants to blame someone, then blame omegle for abusing that system, not me. it’s a good thing in theory only!

Gregory Scott April 3, 2011 at 4:33 PM

Bad news, children: your churlish attempts to fuck with the glorious and socially beneficial reCaptcha are in vain.

ReCaptcha will only accept an interpretation of an unknown word when it receives a *large* number of identical responses from an equally large number of different people.

So your sad little attempts at rebellion against ‘the man’ are merely being ignored… by a computer no less.

Oh, sweet irony, delicious is thy name!

Ted April 4, 2011 at 3:23 PM

Dear Gregory,

Thanks for your reply, but that doesn’t make any sense! I mean, why would reCaptcha still show the same word after getting a “large number of identical responses”?!!

That would defeat the main idea of reCaptcha, “Stop spam. Read books.”, which is OCR and stopping spam. Think about it!

Joey April 13, 2011 at 8:05 PM

What he’s saying is that reCaptcha doesn’t show the word only ONCE. The word that you ‘fucked up’ was correctly typed by a large number of people, and this number will ALWAYS be larger than the number of people who ‘fuck up’ the word.

So this ‘prank’ is definitely in vain.

Chrisb April 23, 2011 at 7:23 PM

No, it doesn’t make sense to mess with reCaptcha because you are going to have to use a captcha method anyway. The websites that use reCaptcha could have used any captcha method. You would still have had to input words; recaptcha just utilizes your wasted effort. It’s like the new invention being tried out to use power generated by people walking over sidewalks in a city. You would still have used the same amount of effort to walk down the street, just now it’s not wasted effort.

Dmytry October 13, 2011 at 8:35 PM

Did you even read the article? See, ‘captcha’ in recaptcha is an ordinary captcha with computer generated nonsense, that wastes same amount of your effort as other captchas do, and solving of which does not help digitize books. There’s also the second word, which does not protect from the bots, but solving which deprives some poor sod of a job.

aSSAD June 9, 2011 at 10:02 AM

btw the nytimes is charging for all the stuff we had to recaptch while helping them build their “open” database.

http://www.nytimes.com/ref/membercenter/nytarchive.html

the nytimes gets around 30 million visitors per month so even if only 1% of them pays they are still making a shitload of money

http://techgenie.com/latest/new-york-times-to-charge-for-digital-edition/

we have been working like drones so these sick bastards get even richer

Donald L Sykes November 24, 2011 at 11:40 PM

Yes they need to pay people to do this, otherwise all content digitized using this method should have to be provided for free.

klm November 25, 2011 at 1:24 PM

Here’s what I wonder: People have to ask for a new captcha when they can’t read a word. So, what happens to that unreadable one– stays in the system until read, right? But, would that mean that the percentage of unreadable words begins to climb until, one day, maybe 20 years from now, almost everything’s been digitized, and we only have the unreadable crap left?
I don’t usually troll, but this was fun. : )

Mike K March 1, 2012 at 10:42 PM

Can someone please tell me why I, yes the user not the pc, cannot see the words. When I go to buy tix for instance and the recapcha screen comes up, there are no words! Is my laptop too old or running an old browser? Please help…..

Dave March 10, 2012 at 9:17 PM

I freaking hate this whole word thing. Regardless of how many times I enter the stupid words they never work which never allows me to post comments on blogs. Besides they no longer even have a real word. They are all completely random letters now. Total Crap if you ask me.

Boof July 25, 2012 at 6:44 PM

I see it as a win-win.

Websites get a free widget to validate human users.
Google gets the text analysis of the end user.
The end user gets to post his comment (or whatever) without having to register or otherwise enter personal data.

Boof July 25, 2012 at 6:44 PM

(should have added “without waiting for a moderator” to the last sentence)

Johnny July 28, 2012 at 4:45 PM

Fucking recaptchas render automatic downloaders useless. I HATE them !

Martin August 22, 2012 at 1:40 PM

Recaptcha isn’t worth anything. I keep getting these spammers!

MacCentric November 5, 2012 at 9:01 PM

One thing I hate about Captcha is that it says to enter the two words, but invariably one of them isn’t a word at all. Add to that the fucked up manner in which they are displayed, and it ends up that more than once my brilliant insights were denied to a forum because I just gave up trying to figure out what the fuck the fucking fuck was.

Tom January 21, 2013 at 12:28 AM

I keep getting strings of readable letters that are not words. Should I type in the closest actual word or just enter letter for letter?
