PDF OCR

Reads the text out of scanned PDFs and adds a searchable text layer behind the image. Uses the Tesseract OCR engine.

What it does

You cannot use Ctrl+F inside old scanned contracts, phone-photo invoices or archive documents, because they contain no real text, just an image. This tool reads the image and adds a real text layer behind it. The PDF looks identical but is now searchable and copyable.

How to use

  1. Drag the scanned PDFs into the list.
  2. Pick the document language from Primary Language (Turkish, English, German, French or mixed).
  3. Leave Auto-Deskew and De-speckle on (recommended for scans).
  4. Click Run.

You get one searchable PDF per input, looking exactly the same as before.

Language options

  Language             When to pick it
  Turkish (tur)        Turkish-only document
  English (eng)        English-only document
  German (deu)         German-only document
  French (fra)         French-only document
  Turkish + English    Mixed academic papers, documents with quotes
  English + German     Mixed technical documents

Mixed-language options improve accuracy on documents that contain both languages, but the run takes around 30-50% longer.
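
The codes in parentheses (tur, eng, deu, fra) are Tesseract's own language identifiers. Tesseract accepts several languages at once when the codes are joined with "+", for example "tur+eng". The sketch below shows how the options above could map to that form. The dictionary and the function name are illustrative assumptions; how this tool passes the language to Tesseract internally is not documented here.

```python
# Illustrative mapping from the language options to Tesseract codes.
# The option labels and this table are assumptions for the example,
# not the tool's internal names.
LANGUAGE_CODES = {
    "Turkish": ["tur"],
    "English": ["eng"],
    "German": ["deu"],
    "French": ["fra"],
    "Turkish + English": ["tur", "eng"],
    "English + German": ["eng", "deu"],
}


def tesseract_lang_argument(option):
    """Join the codes with '+', the form Tesseract's -l flag accepts."""
    return "+".join(LANGUAGE_CODES[option])
```

For instance, tesseract_lang_argument("Turkish + English") gives "tur+eng". Each code in the result needs its language pack installed in Tesseract.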

Options

  • Auto-Deskew: Straightens scanned pages that came out tilted. Leave on for scans.
  • De-speckle: Removes dust spots and small black marks. Leave on for old or low-quality scans.
  • Force OCR: Wipes any existing text layer and re-runs OCR from scratch. Only turn on if the current text layer is broken or garbled.

Examples

Make a scanned contract archive searchable: Add 50 contracts, set language to Turkish, run with default settings. All of them become Ctrl+F searchable.

Mixed-language academic paper: Add the paper, pick "Turkish + English", run.

Fix a broken text layer: A PDF looks searchable but Ctrl+F returns garbage. Turn Force OCR on, pick the language, run.

German manual: Add the PDF, pick German, run with defaults.

Watch out

  • The tool needs Tesseract OCR installed on your machine. Download: https://github.com/UB-Mannheim/tesseract/wiki
  • Turkish, German and French require extra language packs installed inside Tesseract.
  • Encrypted PDFs are not handled. Unlock them with PDF Encrypt first.
  • Very low resolution scans (under 150 DPI) give poor recognition.
  • Slightly tilted pages are auto-corrected, but pages rotated a full 90 degrees are not.
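
To check the first two points before a long run, you can ask Tesseract which language packs it has. The sketch below calls the real command "tesseract --list-langs" and compares the result with the codes you need. The function names are hypothetical, and it assumes the tesseract executable is on your PATH, which may not be the case after a default Windows install.

```python
import shutil
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by --list-langs."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the 'List of available languages ...' header.
        if not line or line.startswith("List of available languages"):
            continue
        langs.add(line)
    return langs


def installed_languages():
    """Return the installed language codes, or None if Tesseract is not found."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--list-langs"], capture_output=True, text=True, check=False
    )
    # Some older versions print the list to stderr instead of stdout.
    text = result.stdout if result.stdout.strip() else result.stderr
    return parse_list_langs(text)


def missing_languages(installed, needed):
    """Return the needed codes that are not installed, sorted."""
    return sorted(set(needed) - set(installed))
```

A typical check would be: call installed_languages(); if it returns None, install Tesseract first; otherwise pass the result to missing_languages together with ["tur", "deu", "fra"] to see which packs still need to be added.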

License

This tool is Ultimate only. It does not appear in the Free or Office plans.