But it's not exactly turnkey. You need to:
1. download the source
2. compile it for a Mac
3. download a language file
4. copy it to the appropriate directory
5. run it on TIFF files, which must have a .tif extension.
Tesseract won't run unless you copy the language file to /usr/local/share/tessdata, which is strange, because it seems to use it very inconsistently. Most of the misread results are simple English words: you get "iist" instead of "list", "lf" instead of "if". It makes you wonder how exactly it is applying this language file.
If you use a Mac application like TextEdit, Word, or OpenOffice, the spell-checker can find these and help you fix them in a matter of moments. Still, it's irritating when you have a long document. This software needs to be 'productized'.
So, the actual sequence:
1. Go here, and download tesseract-2.03.tar.gz.
2. In a Terminal window (Applications->Utilities), find your download directory, cd there, and:
: gunzip tesseract-2.03.tar.gz
: tar xvf tesseract-2.03.tar
3. cd to the tesseract-2.03 directory, then:
: ./configure
: sudo make
: sudo make install
4. Go back here, and download tesseract-2.00.eng.tar.gz. Then find your download directory, cd there, and:
: gunzip tesseract-2.00.eng.tar.gz
: tar xvf tesseract-2.00.eng.tar
: cd tessdata
: sudo bash
: cp * /usr/local/share/tessdata/
Then hit control-d to exit the sudo bash shell.
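To confirm the copy worked, something like this can check the directory. has_lang_data is a hypothetical helper, and it assumes the English files are named with an "eng." prefix, so compare that against what the archive actually unpacked.

```shell
# Hypothetical helper: succeeds if the directory in $1 contains at least
# one file whose name starts with the language code in $2 plus a dot.
has_lang_data() {
  dir="$1"
  lang="$2"
  for f in "$dir/$lang".*; do
    # If nothing matches, the pattern stays literal and -e fails.
    [ -e "$f" ] && return 0
  done
  return 1
}
```

For example, has_lang_data /usr/local/share/tessdata eng && echo "language files found" should print the message once the files are in place.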
Make a TIFF file, be sure it has a .tif extension, and then issue a command like this:
tesseract document-image.tif document-results
... and then you'll have the text in document-results.txt.
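If you have a whole folder of scans, the renaming and the OCR run can be looped. A rough sketch, with made-up function names (rename_tiffs and ocr_dir), assuming every image in the folder ends in .tiff or .tif and that tesseract is on your PATH:

```shell
# Hypothetical helper: rename every *.tiff in a directory to *.tif,
# skipping any file whose .tif name is already taken.
rename_tiffs() {
  dir="${1:-.}"
  for f in "$dir"/*.tiff; do
    [ -e "$f" ] || continue          # no .tiff files at all
    target="${f%.tiff}.tif"
    [ -e "$target" ] && continue     # don't overwrite an existing file
    mv "$f" "$target"
  done
}

# Hypothetical helper: run tesseract on every *.tif in a directory.
# Each image.tif produces image.txt alongside it.
ocr_dir() {
  dir="${1:-.}"
  rename_tiffs "$dir"
  for f in "$dir"/*.tif; do
    [ -e "$f" ] || continue
    tesseract "$f" "${f%.tif}"
  done
}
```

Calling ocr_dir ~/scans would then leave one .txt file next to each image.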
Works great. It should come standard with a Mac. With a graphical user interface. And some corrections to how it uses the language file.
