PDF OCR
Reads the text out of scanned PDFs and adds a searchable text layer behind the image. Uses the Tesseract OCR engine.
What it does
Old scanned contracts, phone-photo invoices and archive documents contain no real text, only an image of the page, so you cannot Ctrl+F inside them. This tool reads the image and adds a real text layer behind it. The PDF looks identical but is now searchable and copyable.
How to use
- Drag the scanned PDFs into the list.
- Pick the document language from Primary Language (Turkish, English, German, French or mixed).
- Leave Auto-Deskew and De-speckle on (recommended for scans).
- Click Run.
You get one searchable PDF per input, looking exactly the same as before.
Language options
| Language | When to pick it |
|---|---|
| Turkish (tur) | Turkish-only document |
| English (eng) | English-only document |
| German (deu) | German-only document |
| French (fra) | French-only document |
| Turkish + English | Academic papers that mix both languages, documents with quotes in the other language |
| English + German | Mixed technical documents |
Mixed-language options improve accuracy on documents that contain both languages, but the run takes around 30-50% longer.
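Tesseract itself accepts several languages joined with `+` (for example `-l tur+eng` on its command line). The sketch below shows how the options in the table could map to that argument. The `LANGUAGE_OPTIONS` dictionary and the `tesseract_lang_arg` function are hypothetical names used for illustration; they are not taken from this tool's internals.

```python
# Hypothetical mapping from the Primary Language choices to Tesseract codes.
# The codes themselves (tur, eng, deu, fra) come from the table above.
LANGUAGE_OPTIONS = {
    "Turkish": ["tur"],
    "English": ["eng"],
    "German": ["deu"],
    "French": ["fra"],
    "Turkish + English": ["tur", "eng"],
    "English + German": ["eng", "deu"],
}


def tesseract_lang_arg(option: str) -> str:
    """Return the value for Tesseract's -l flag, joining codes with '+'."""
    try:
        codes = LANGUAGE_OPTIONS[option]
    except KeyError:
        raise ValueError(f"Unknown language option: {option!r}") from None
    return "+".join(codes)


print(tesseract_lang_arg("Turkish + English"))  # tur+eng
print(tesseract_lang_arg("German"))             # deu
```

With a mixed option Tesseract has to consider more than one language model per page, which is consistent with the longer run time noted above.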
Options
- Auto-Deskew: Straightens scanned pages that came out tilted. Leave on for scans.
- De-speckle: Removes dust spots and small black marks. Leave on for old or low-quality scans.
- Force OCR: Wipes any existing text layer and re-runs OCR from scratch. Only turn on if the current text layer is broken or garbled.
Examples
Make a scanned contract archive searchable: Add 50 contracts, set language to Turkish, run with default settings. All of them become Ctrl+F searchable.
Mixed-language academic paper: Add the paper, pick "Turkish + English", run.
Fix a broken text layer: A PDF looks searchable but Ctrl+F returns garbage. Turn Force OCR on, pick the language, run.
German manual: Add the PDF, pick German, run with defaults.
Watch out
- The tool needs Tesseract OCR installed on your machine. Download: https://github.com/UB-Mannheim/tesseract/wiki
- Turkish, German and French require the matching Tesseract language packs (tur, deu, fra) to be installed.
- Encrypted PDFs are not handled. Unlock them with PDF Encrypt first.
- Very low resolution scans (under 150 DPI) give poor recognition.
- Slightly tilted pages are auto-corrected, but pages rotated a full 90 degrees are not.
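Some of the points above can be checked before a run. The sketch below is a minimal preflight check, not part of the tool: it uses Tesseract's `--list-langs` command to see which language packs are installed, and estimates a scan's resolution from its pixel width and the page width in PDF points (1 point = 1/72 inch). All function names here are hypothetical, and the page-size figures in the example are assumptions chosen for illustration.

```python
import shutil
import subprocess


def parse_list_langs(output: str) -> set[str]:
    """Parse the text printed by `tesseract --list-langs` into a set of codes."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # The first line is a header that starts with "List of available languages".
    return {line for line in lines if not line.lower().startswith("list of")}


def installed_languages() -> set[str]:
    """Ask the local Tesseract install which language packs it has."""
    exe = shutil.which("tesseract")
    if exe is None:
        raise RuntimeError("Tesseract was not found on PATH")
    result = subprocess.run(
        [exe, "--list-langs"], capture_output=True, text=True, check=True
    )
    # Some versions print the list to stderr, so both streams are parsed.
    # Any extra warning lines only add harmless entries to the set.
    return parse_list_langs(result.stdout + "\n" + result.stderr)


def missing_languages(required: list[str], installed: set[str]) -> list[str]:
    """Return the required language codes that are not installed."""
    return [code for code in required if code not in installed]


def effective_dpi(pixel_width: int, page_width_points: float) -> float:
    """Estimate scan resolution: pixels divided by page width in inches."""
    return pixel_width / (page_width_points / 72.0)


# Example with sample output, so it runs without Tesseract installed.
sample = "List of available languages (2):\neng\nosd\n"
print(missing_languages(["tur", "eng"], parse_list_langs(sample)))  # ['tur']

# A 1275-pixel-wide scan on an 8.5 inch (612 point) page is 150 DPI,
# right at the threshold mentioned above.
print(effective_dpi(1275, 612))  # 150.0
```

On a machine with Tesseract installed, `missing_languages(["tur"], installed_languages())` returning an empty list suggests the Turkish pack is in place.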
License
This tool is available only in the Ultimate plan. It does not appear in the Free or Office plans.