PCLinuxOS Magazine
Repo Review: Gscan2pdf

by CgBoy

Gscan2pdf is a document scanning tool that lets you easily post-process your scanned pages and turn them into PDF or DjVu documents. It supports Optical Character Recognition (OCR) and also has a number of different image-enhancing filters.

Gscan2pdf is pretty straightforward to use. Just hit the little scanner button in the toolbar to open up the Scan Document window. From here, you can select your scanner from the dropdown menu (scanning in Gscan2pdf is handled by the SANE library). There are options for changing the color mode, page size and rotation, single- or double-sided page mode, number of pages to be scanned, hue, brightness, white level, and numerous other settings.

The images you scan will appear in the vertical bar on the left, from which you can easily rearrange their order by dragging them into place. You can also load in any normal image files, such as pages you may have already scanned, to process and turn into PDF or DjVu documents.

After you've finished scanning, there are a number of post-processing options you can use to help clean up the scanned images. You can rotate, crop, sharpen, invert, and change the brightness, contrast, and threshold values of each page (the threshold setting turns every pixel darker than a certain value black, and turns the rest white). Gscan2pdf also has a tool to help you better align and center the scanned images, but I did not get great results with it.
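To make the threshold idea a little more concrete, here is a minimal Python sketch of what such a filter does to a grayscale page. This is only an illustration and not Gscan2pdf's own code: the function name apply_threshold, the 0 to 255 brightness scale, and the tiny sample page are all assumptions made for the example.

```python
def apply_threshold(pixels, threshold):
    """Return a black-and-white copy of a grayscale image.

    pixels: list of rows, each a list of brightness values
            from 0 (black) to 255 (white). Assumed layout.
    threshold: cutoff value; anything darker becomes black.
    """
    return [
        [0 if value < threshold else 255 for value in row]
        for row in pixels
    ]


# A tiny hypothetical 2x3 "page" of grayscale values.
page = [
    [12, 130, 250],
    [90, 128, 200],
]

print(apply_threshold(page, 128))
# prints [[0, 255, 255], [0, 255, 255]]
```

A lower cutoff keeps only the darkest ink as black, while a higher one also turns faint marks and smudges black, which is why the setting usually needs some experimenting for each scan.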

One of Gscan2pdf's more interesting features is the ability to generate text from scanned papers using an Optical Character Recognition engine (Tesseract or Cuneiform must be installed for it to work). The generated text lets you search for words and select text in the saved DjVu or PDF documents. You can start the OCR process from the Tools menu and view the results in the OCR Output tab. It gives you a few options, such as which OCR engine to use, which language to recognize, and whether or not to apply a threshold to make the text on the scanned image easier for the computer to read. The whole OCR process is not terribly fast and can give mixed results. However, it is still a useful option to have.

Gscan2pdf can export your scanned pages to PDF and DjVu documents, as well as a variety of different image formats. If you have used the OCR tool, you can also export the document as a plain TXT file containing just the text generated by the OCR engine.


I found Gscan2pdf to work fairly well for stitching together some scanned pages I had from an old telescope manual. I did experience some lag in the user interface at times, but I think that was probably related to the OCR feature, which seemed to slow things down a bit when used. All in all, Gscan2pdf is an excellent tool for anyone who needs to scan a large amount of paperwork.

