
Stop Spam, Read Books February 22, 2008

Posted by Nirmal in IT, fun, technology, work.


(Image: recaptcha.net)

Yesterday I got reCAPTCHA into two applications I’ve been handling. I had always wanted to do that, but as always, the rush to do good things actually prevents you from doing any good. So far we’ve used an imaging library locally to generate CAPTCHAs wherever they were required, but this is great: cool and excellent.

I still remember the first time I saw a reCAPTCHA on Facebook: the hallmark two words, distorted by twisting and then crossed by a curvy line. In those days we had to solve CAPTCHAs for virtually every action there, even for simple things such as poking. It got better after mobile confirmation, but they were a cool idea.

CAPTCHAs are the distorted pieces of text that you see at the bottom of your everyday web form. If I remember right, about 60 million CAPTCHAs are solved by people every day (recaptcha.net). They’re there to stop automated programs, called bots, from abusing services such as free email sign-ups. Bots can fill in forms real fast, but they’re kinda bad at things humans can do with a bit of squinting, like solving CAPTCHAs. The reCAPTCHA website has a nice story about an online poll asking which was the best computer science graduate school: Carnegie Mellon students wrote an automated program to vote for CMU. The next day MIT students wrote their own, turning it into a war between bots, with MIT winning the poll with 21,750 votes or so and CMU following closely behind. You get the idea. It’s serious shit.

Unfortunately, most of today’s CAPTCHAs (even though some seem challenging to humans too :)) have had technology developed to defeat them. That’s when the people at Carnegie Mellon University’s computer science department came up with the idea for the reCAPTCHA service. The CAPTCHAs used in reCAPTCHA are unique in that the images are already proven to be unreadable by automated methods. Here’s what they’re doing. They are part of a project digitizing old books from the Internet Archive. In this process, some words turn up in those texts that computers can’t reliably read; OCR keeps getting them wrong. reCAPTCHA takes these, adds some spice by twisting them a bit and sending a curvy line through (shoot and then, to make sure, stab..), and makes them available through a web service for us simpletons to use on our bot-invaded sites. This is a service to the text digitization project as well. When a good share of those 60 million daily CAPTCHAs are reCAPTCHAs, millions of OCR-unreadable words get transcribed correctly by human ‘volunteers’. Hence the reCAPTCHA slogan, “Stop Spam, Read Books”. Every time you solve a reCAPTCHA, you help digitize and preserve old books.

The question remains how reCAPTCHA decides whether a user’s answer is correct if the words cannot be read by computers. It’s like this: each reCAPTCHA you see contains one word that several human users have already solved, and another word that is completely new. If the known word is solved correctly, the CAPTCHA is deemed solved, and the answer given for the new word counts toward its transcription. And there’s still the assurance that an automated program cannot pass the test, since it can’t read the known word either.
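To make the two-word trick concrete, here is a minimal Java sketch under our own assumptions. The class, the method names and the vote threshold are all hypothetical, and the post doesn’t say how many agreeing answers reCAPTCHA actually needs before it trusts a reading. Only the control word decides pass or fail; the unknown word’s reading is recorded as a vote, and only when the control word was right.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of the two-word scheme: a control word whose answer is already known,
// plus an unknown word whose reading is collected from users who pass the control word.
class TwoWordCaptcha {
    // unknown word id -> (normalized reading -> number of users who typed it)
    private final Map<String, Map<String, Integer>> votes = new HashMap<>();

    // Pass/fail depends only on the control word; the unknown word is recorded only on a pass.
    boolean check(String controlAnswer, String controlGuess, String unknownId, String unknownGuess) {
        boolean passed = normalize(controlAnswer).equals(normalize(controlGuess));
        if (passed) {
            votes.computeIfAbsent(unknownId, id -> new HashMap<>())
                 .merge(normalize(unknownGuess), 1, Integer::sum);
        }
        return passed;
    }

    // Returns the most-voted reading once it has at least minVotes votes, else null.
    // minVotes is an assumed tuning knob; ties are broken arbitrarily.
    String acceptedReading(String unknownId, int minVotes) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : votes.getOrDefault(unknownId, Map.of()).entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return bestCount >= minVotes ? best : null;
    }

    // Case- and whitespace-insensitive comparison (an assumption, not reCAPTCHA's documented rule).
    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }
}
```

For example, two users who get their control words right and both read the unknown word "w42" as "overlooks" make `acceptedReading("w42", 2)` return "overlooks", while a user who fumbles the control word adds nothing to the tally.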

The reCAPTCHA service is continuously being improved to fight the never-ending battle against spammers. This is a great advantage over an onsite solution: you get an assured, world-class solution that is always being improved. And it’s real easy-peasy to implement as well. Sign up for your domain and you get an API key pair, public and private. You include them in your form-processing (or whatever) script, along with some simple code to display and validate the CAPTCHA and the code library that handles the crunching. The steps are on the reCAPTCHA site, and it supports a variety of languages, including the Internet sandwich paste PHP, which I tried.
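The validate step is the only bit that touches the private key. As we understand the 2008-era API (since retired by Google, so treat these names as historical rather than current), the public key goes into the page that shows the widget, and on submit your server POSTs the challenge id and the user’s answer, plus its private key and the user’s IP, to the reCAPTCHA verify service, which replies with plain text whose first line is true or false. The PHP library did this for you; the Java sketch below (class and method names hypothetical) only builds the request body and reads the reply, and makes no network call.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the server-side half of the original reCAPTCHA flow, as we understand it.
// Parameter names follow the retired 2008-era verify API; no HTTP request is sent here.
class RecaptchaVerifySketch {

    // Builds the application/x-www-form-urlencoded body for the verify POST.
    static String buildVerifyBody(String privateKey, String remoteIp,
                                  String challenge, String response) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("privatekey", privateKey);
        params.put("remoteip", remoteIp);
        params.put("challenge", challenge);
        params.put("response", response);
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // The reply is plain text: first line "true" or "false", then an error code on failure.
    static boolean isValid(String verifyReply) {
        String firstLine = verifyReply.split("\n", 2)[0].trim();
        return firstLine.equals("true");
    }
}
```

With a made-up private key "key", `buildVerifyBody("key", "127.0.0.1", "abc", "two words")` gives `privatekey=key&remoteip=127.0.0.1&challenge=abc&response=two+words`, and a reply of "false" followed by "incorrect-captcha-sol" on the next line makes `isValid` return false.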

Enough said: if you need a CAPTCHA solution (or even if you don’t :)), why not try reCAPTCHA? Its brains are courtesy of CMU, servers courtesy of Intel, and Linux courtesy of Novell, among others. Run to their site and sign up for free now; it’s always better to read books than to spam. Try it, you’ll love it!

One Workday from News Feed February 20, 2008

Posted by Nirmal in food, life, work.

clipboard01.jpg

Go Dvorák! February 16, 2008

Posted by Nirmal in IT, life, technology.


Right- and left-handed Dvorák formats (via Wikipedia)

Yesterday we started using the US Dvorák keyboard layout. Sandy is convinced it’d bring our productivity up by 20% or something. He pioneered the somewhat difficult conversion with a small number of other engineers, including myself and Big D.

So after the engineering management meeting, printed keyboard layouts were distributed to everyone who had consented, and we read a bit about it and tried the conversion. Big D came round and helped convert my computer: it seems Microsoft has supported the layout since about Windows 95. But heck, it’s difficult without stickers on the keys. Being one of the first to undergo conversion training, I was about to get a set of stickers first, but then it turned out the stickers were printed the wrong way round. The guy had printed the key faces on the peel-off backing of the sticker! So I had to switch back to the normal QWERTY layout and carry on. So went Tuesday.

The best part happened yesterday. I came in in the morning and could log in as usual. Then I locked the computer for a second and tried to unlock it. Firetruck, the unlock dialog box was taking input in the Dvorák layout. And my keyboard is labeled in QWERTY.

Riz, Ru and D Jr immediately respond to my call for air support and try to log in. None can [with the great commotion around, I wonder whether they’d have managed even if the layout were QWERTY]. The printed layout is somewhat misleading: it shows a single-row Return key. Then Ru suggests typing each character into the user name field first, to see what it comes out as, before typing it into the password field. After much effort he logs in for me. I immediately switch the keyboard layout back to US English for Windows Explorer. But I still have to type the password in QWERTY for login.
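Ru’s trick amounts to rediscovering, key by key, which QWERTY keycap produces which character once Windows is in Dvorák mode. For the record, here is a small Java sketch that does that lookup up front. The table is our own transcription of the standard US Dvorák layout (unshifted keys only) and the class name is hypothetical; shifted characters such as capitals are not handled and simply pass through unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Which QWERTY-labelled keys to press so that a machine set to US Dvorak receives a given text.
class DvorakOnQwertyKeys {
    // Same physical positions, row by row: QWERTY keycap labels vs. what US Dvorak types there.
    private static final String QWERTY = "-=" + "qwertyuiop[]" + "asdfghjkl;'" + "zxcvbnm,./";
    private static final String DVORAK = "[]" + "',.pyfgcrl/=" + "aoeuidhtns-" + ";qjkxbmwvz";

    // Dvorak character -> QWERTY keycap that produces it.
    private static final Map<Character, Character> KEY_FOR_CHAR = new HashMap<>();
    static {
        for (int i = 0; i < QWERTY.length(); i++) {
            KEY_FOR_CHAR.put(DVORAK.charAt(i), QWERTY.charAt(i));
        }
    }

    // Anything not in the table (digits, spaces, capitals, shifted symbols) passes through
    // unchanged, so the result is only trustworthy for lower-case letters, digits and the
    // unshifted symbols listed above.
    static String keysToPress(String text) {
        StringBuilder keys = new StringBuilder();
        for (char c : text.toCharArray()) {
            char key = KEY_FOR_CHAR.getOrDefault(c, c);
            keys.append(key);
        }
        return keys.toString();
    }
}
```

So a password of "hello" typed in Dvorák mode needs the keycaps `jdpps`, and `keysToPress("dvorak")` comes out as `h.soav`.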

Afterwards D Jr gave me a replacement keyboard for a while as he pried almost all the letter keys off mine and put them back in Dvorák order. But the keys are of different heights now, and it looks kinda out of place. The biggest problem, though, is the typing itself. Back to Ko Ko [A ko B ko C ko etc.] typing :-D. Reminded me of a Microsoft Word typing lesson I once took in high school. Ohh.. soo frustrating. IM is the biggest prob. People must think I’ve suddenly gone dyslexic or something.

To crown it all, I jammed the ‘J’ key just before going home. Not to mention that I’m stuck smack in the middle of an engineering project. But we’re not turning back; Courage, Gentlemen! Go Dvorák!

(Although written early last week, this post couldn’t see the light of day till now because the layout switch was so crippling.)

JDK Linux February 1, 2008

Posted by Nirmal in IT, technology.


(Image: sun.com/java)

At last I got the ubiquitous Java Development Kit onto Edubuntu. It was a bit of trouble to start with. I’m getting started on Java, but since first playing with it on a Compaq at the prestigious 15th Floor Lab, I hadn’t met it on Linux. I’ve got NetBeans on Windows, but I rarely go there, and it feels a bit outta place to go there just for a cuppa Java. Previously I tried to get NetBeans on *buntu, but I got busted somehow. So yesterday I walked up to our Java expert Shash and asked what to do. Shash says of course, and recommends that I get the 60MB-something download available at the Stanford University Network’s excellent download center. They have plenty of install instructions as well, and it goes without a hitch. Then comes trouble.

I grab the Test listing from the Java Language Specification, purely for test purposes, and fire the thing up. The javac path hasn’t been set yet, so I compile by calling it with its complete path. But java seems to be on the PATH already. So I compile and try to run, and java dares to spit this out at me:

astdb@localhost:~$ /opt/jdk/jdk1.6.0_04/bin/javac -g Test.java
astdb@localhost:~$ java Test Hello, World!
Exception in thread "main" java.lang.ClassFormatError: Test (unrecognized class file version)
    at java.lang.VMClassLoader.defineClass(libgcj.so.7)
    at java.lang.ClassLoader.defineClass(libgcj.so.7)
    at java.security.SecureClassLoader.defineClass(libgcj.so.7)
    at java.net.URLClassLoader.findClass(libgcj.so.7)
    at java.lang.ClassLoader.loadClass(libgcj.so.7)
    at java.lang.ClassLoader.loadClass(libgcj.so.7)
    at java.lang.Class.forName(libgcj.so.7)
    at gnu.java.lang.MainThread.run(libgcj.so.7)
astdb@localhost:~$

What the?? I didn’t make the class file, you did. So if YOU can’t interpret it, who the heck can??
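The error is less mysterious than it looks. Every class file starts with the magic number 0xCAFEBABE followed by a minor and a major version number, and a runtime refuses versions it doesn’t know; JDK 6’s javac writes major version 50 by default. Our guess, not verified, is that the runtime behind the trace simply predates version 50. Here is a minimal Java sketch (class name hypothetical) that reads those header numbers; `javap -verbose Test` prints the same “major version” line.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Reads the start of a .class file: u4 magic (0xCAFEBABE), u2 minor_version, u2 major_version.
class ClassFileVersion {
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in); // big-endian reads, as the format requires
        int magic = data.readInt();
        if (magic != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        data.readUnsignedShort();        // minor_version, ignored here
        return data.readUnsignedShort(); // major_version: 49 = Java 5, 50 = Java 6
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "Test.class";
        try (InputStream in = new FileInputStream(file)) {
            System.out.println(file + ": major version " + majorVersion(in));
        }
    }
}
```

Run it with the JDK’s own java as `java ClassFileVersion Test.class`; for a class compiled by the JDK 6 javac with default settings it should report 50.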

No amount of fooling around that night helps, and I’m tired. JK consortium is about to kill me. So I meet Shash yesterday and tell him. He digs around the Internet and comes up with the exact thing. What was the glitch? Simply that the handy java command I’d been using wasn’t pointing to /opt/jdk/jdk1.6.0_04/bin/java, which I meant to invoke, but to some muck: the libgcj.so.7 frames in the trace give it away as GNU’s GCJ runtime. So I come home, say PATH="/opt/jdk/jdk1.6.0_04/bin:$PATH", and all seems to be set. All quiet on the Western Front. Let me brew a cup of Java.
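Why does prepending fix it? The shell looks a command up by walking the PATH entries left to right and running the first match, so putting /opt/jdk/jdk1.6.0_04/bin first makes its java win over whichever one was found before. `which java` shows the winner in a real shell; the Java sketch below (class and method names hypothetical) mimics that lookup for a Unix-style PATH.

```java
import java.io.File;
import java.util.function.Predicate;

// Mimics a shell's command lookup on a Unix-style PATH (entries separated by ':').
class PathLookup {
    // Returns the first "<dir>/<command>" that passes isExecutable, or null if none does.
    // Empty entries (which POSIX shells treat as the current directory) are skipped here.
    static String resolve(String command, String path, Predicate<String> isExecutable) {
        for (String dir : path.split(":")) {
            if (dir.isEmpty()) {
                continue;
            }
            String candidate = dir + "/" + command;
            if (isExecutable.test(candidate)) {
                return candidate; // first match wins, so earlier PATH entries take priority
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Real lookup against this process's PATH, checking files on disk.
        String path = System.getenv("PATH");
        String found = resolve("java", path == null ? "" : path, p -> new File(p).canExecute());
        System.out.println(found == null ? "java not found on PATH" : found);
    }
}
```

With /usr/bin/java standing in for the old muck (a hypothetical location), `resolve("java", "/usr/bin:/bin", ...)` picks /usr/bin/java, and after the prepend `resolve("java", "/opt/jdk/jdk1.6.0_04/bin:/usr/bin:/bin", ...)` picks the JDK’s java. Note that the assignment only lasts for the current shell session.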