OCR Extractor

Pulls text out of PDFs and images. Multiple AI engines, multiple languages. Live preview for a single file, or batch a whole folder.

What it does

Use this to extract data from scanned invoices, digitise old documents, turn image text into editable text, or make a scanned PDF archive searchable.

Note: This tool is universal (PDFs and images). For PDF-only work, pdf_tools/ocr is simpler and faster.

How to use

Three modes:

Single (single-file live preview)

  1. Drag the file.
  2. The preview appears on the left.
  3. Pick language and engine.
  4. Recognised text updates live on the right.

Batch

  1. Add multiple files or a folder.
  2. Pick an Output Format: TXT, JSON, DOCX, or searchable PDF.
  3. Click Run.

Settings

Saves defaults for engine, language, DPI, and confidence threshold.
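
How the tool applies the confidence threshold is not described here. As a rough illustration only, assume per-word confidences shaped like the dict that pytesseract's image_to_data returns (parallel "text" and "conf" lists, with -1 marking boxes that hold no word). A threshold filter could then look like the sketch below. The function name and the default of 60 are hypothetical, not the tool's own values.

```python
def filter_by_confidence(data, threshold=60.0):
    """Keep words whose confidence is at least `threshold` (0-100 scale)."""
    kept = []
    for text, conf in zip(data["text"], data["conf"]):
        # Tesseract reports -1 for layout boxes that carry no word,
        # and those boxes also have empty text.
        if text.strip() and float(conf) >= threshold:
            kept.append(text)
    return " ".join(kept)


# Hand-made sample shaped like
# pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
sample = {
    "text": ["", "Invoice", "No", "l2345", "Total"],
    "conf": [-1, 96.4, 91.0, 41.7, 88.2],
}
print(filter_by_confidence(sample))  # prints: Invoice No Total
```

A higher threshold drops more misreads (here the doubtful "l2345") at the cost of losing some correct words.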

Supported formats

Input: JPG, PNG, WebP, BMP, TIFF, PDF.

Output: TXT, JSON (structured), DOCX, Searchable PDF.
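
To pre-check a folder before handing it to Batch mode, the input formats above can be turned into a simple extension filter. This is a sketch, not the tool's own logic: the .jpeg and .tif aliases and the recursive search are assumptions, and collect_inputs is a hypothetical helper name.

```python
from pathlib import Path

# Extensions for the listed input formats; .jpeg and .tif are assumed aliases.
SUPPORTED = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tif", ".tiff", ".pdf"}


def collect_inputs(folder):
    """Return supported files under `folder`, recursively, in sorted order."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
```

Calling collect_inputs("scans") returns a list of Path objects, with unsupported files such as .docx or .txt skipped.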

OCR engines

Engine               Trait
Tesseract (default)  Fast, broad language support.
EasyOCR              Better on complex text. Speeds up with a GPU.
PaddleOCR            Good for Asian languages. GPU supported.
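
Each engine has to be installed separately, so it can help to check which ones are present before picking one. The sketch below looks for the tesseract program on PATH and for the easyocr and paddleocr Python modules. How the tool itself locates its engines is not documented here, so treat this as an approximation; available_engines is a hypothetical helper name.

```python
import importlib.util
import shutil


def available_engines():
    """Report which OCR back ends appear to be installed on this machine."""
    return {
        # Tesseract is an external program, so look for it on PATH.
        "tesseract": shutil.which("tesseract") is not None,
        # EasyOCR and PaddleOCR are Python packages, so look for their modules.
        "easyocr": importlib.util.find_spec("easyocr") is not None,
        "paddleocr": importlib.util.find_spec("paddleocr") is not None,
    }


print(available_engines())
```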

Language options

Tesseract can use any installed language pack. The default is English + Turkish (eng, tur). Turkish needs tur.traineddata installed in Tesseract.
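
For reference, Tesseract takes multiple languages as codes joined with "+", so the default pair becomes "eng+tur". The sketch below shows the same combination through the pytesseract package, with a check that the language data is installed. It assumes pytesseract, Pillow, and Tesseract are present; tesseract_lang and ocr_image are hypothetical helper names, and the tool's own internals may differ.

```python
def tesseract_lang(codes):
    """Join language codes the way Tesseract expects: ['eng', 'tur'] -> 'eng+tur'."""
    return "+".join(codes)


def ocr_image(path, codes=("eng", "tur")):
    """OCR one image with the given languages (needs pytesseract, Pillow, Tesseract)."""
    import pytesseract
    from PIL import Image

    # Fail early with a clear message if a language pack is missing.
    installed = set(pytesseract.get_languages(config=""))
    missing = [c for c in codes if c not in installed]
    if missing:
        raise RuntimeError(f"Missing Tesseract language data: {', '.join(missing)}")

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=tesseract_lang(codes))
```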

DPI options

For PDF rendering: 150, 200, 300, 400, or 600. The default, 300, balances speed and quality; 600 gives higher quality but is slower.
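
The slowdown at high DPI follows from simple arithmetic: pixel count grows with the square of the DPI. The sketch below computes the rendered size of a page at each setting, assuming an A4 page (210 x 297 mm). The page size is an assumption for illustration.

```python
MM_PER_INCH = 25.4


def page_pixels(dpi, width_mm=210, height_mm=297):
    """Pixel size of a page rendered at `dpi` (defaults to A4)."""
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


for dpi in (150, 200, 300, 400, 600):
    w, h = page_pixels(dpi)
    print(f"{dpi} DPI: {w} x {h} px, {w * h / 1e6:.1f} MP")
```

An A4 page comes out at 2480 x 3508 px at 300 DPI and 4961 x 7016 px at 600 DPI, so doubling the DPI gives the engine roughly four times as many pixels to process.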

Examples

Pull data from an invoice: Single mode, drag the invoice image, pick Turkish, copy the text.

Make a scanned archive searchable: Batch mode, add the folder, output Searchable PDF, run.

Handwritten note photos to text: Single mode, EasyOCR, Turkish. EasyOCR gives better results on cursive.

Extract structured JSON: Batch mode, add the form scans, output JSON, run. Useful for programmatic processing.

Watch out

  • Tesseract must be installed on the system.
  • EasyOCR and PaddleOCR need their Python packages installed.
  • Turkish needs an extra language pack.
  • If the PDF already has a text layer, OCR is overkill; extract the text directly.
  • Very low-resolution images hurt accuracy.
  • Handwriting support is limited; typewritten and digital text is the sweet spot.
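
One way to tell whether a PDF already has a text layer is to try extracting text from it and see whether anything meaningful comes back. The sketch below does this with the third-party pypdf package, which is an assumption here and not part of this tool. The 20-character cutoff and both function names are likewise hypothetical.

```python
def has_text_layer(page_texts, min_chars=20):
    """True if any page already yields at least `min_chars` non-space characters."""
    return any(len("".join(t.split())) >= min_chars for t in page_texts)


def pdf_page_texts(path):
    """Extract embedded text per page with pypdf (third-party, assumed installed)."""
    from pypdf import PdfReader

    # extract_text() can return an empty result for image-only pages.
    return [page.extract_text() or "" for page in PdfReader(path).pages]
```

If has_text_layer(pdf_page_texts("scan.pdf")) returns True, the extracted text can be used directly and OCR skipped.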

License

This tool is Ultimate only. It is disabled in the Free and Office plans.