Optical Character Recognition (OCR)
Extract text from images, PDFs, and document scans using AI-powered OCR for automated document processing in your workflows.
The OCR node extracts readable text from images, scanned documents, PDFs, photographs of text, and other visual media containing written content. It uses vision-capable AI models from OpenAI and Anthropic to interpret and transcribe text from images, making it ideal for processing uploaded documents, receipts, forms, labels, and any image that contains text you need to capture digitally.
Supported Providers and Models
| Provider | Models | Notes |
|---|---|---|
| OpenAI | GPT-4.1, GPT-4.1 Mini, GPT-4o | GPT-4.1 for highest accuracy |
| Anthropic | Claude Sonnet 4.6, Claude Sonnet 4.5, Claude Haiku 4.5 | Claude Sonnet 4.6 for detailed extraction |
OCR in Buildorado is powered by vision-capable models that understand both the visual layout and the textual content of images. This means they can handle not just printed text but also handwriting, stylized fonts, text at angles, and text embedded in complex visual contexts.
How OCR Works
Unlike traditional OCR engines that use pattern matching to recognize individual characters, Buildorado's OCR node leverages large vision models that understand the full context of an image. This approach:
- Handles varied fonts, sizes, and styles without pre-configuration
- Reads text at any angle or orientation
- Interprets handwriting with reasonable accuracy
- Understands document structure (headers, paragraphs, tables, lists)
- Maintains reading order even in complex multi-column layouts
The model processes the entire image and returns all detected text as a single string output.
Configuration
Provider
Select OpenAI or Anthropic as the provider. The model dropdown updates to list that provider's vision-capable models.
Model
Choose the vision-capable model for text extraction:
OpenAI models:
| Model | Speed | Quality | Best For |
|---|---|---|---|
| GPT-4.1 | Moderate | Excellent | Highest accuracy, complex documents |
| GPT-4.1 Mini | Fast | Good | Quick extraction, simple documents |
| GPT-4o | Fast | Good | General-purpose extraction |
Anthropic models:
| Model | Speed | Quality | Best For |
|---|---|---|---|
| Claude Sonnet 4.6 | Moderate | Excellent | Detailed extraction, complex layouts |
| Claude Sonnet 4.5 | Moderate | Good | General-purpose extraction |
| Claude Haiku 4.5 | Fast | Good | Simple documents, high volume |
Credential
Select a saved API key for the chosen provider. See Credential Management for setup instructions.
Image/PDF URL
The URL of the image or PDF to process for text extraction. This field supports template variables, typically referencing a file upload form field.
Supported formats:
| Format | Extension | Notes |
|---|---|---|
| JPEG / JPG | .jpg, .jpeg | Most common for photos and scans |
| PNG | .png | Best for screenshots and digital documents |
| WebP | .webp | Modern web format |
| GIF | .gif | First frame only |
| PDF | .pdf | Document files with text or scanned pages |
The image or PDF must be accessible via a public or signed URL. Buildorado file uploads automatically generate accessible URLs.
Tips for best results:
- Higher resolution images produce more accurate text extraction.
- Ensure adequate contrast between text and background.
- Minimize shadows, glare, and obstructions over text areas.
- For multi-page documents, process each page as a separate image or use a single PDF.
Language Hint
An optional hint that tells the model which language to expect in the image. While the model can auto-detect languages, providing a hint improves accuracy for:
- Languages with similar-looking characters (e.g., distinguishing Chinese, Japanese, and Korean)
- Mixed-language documents where one language dominates
- Low-quality images where character recognition is ambiguous
Available language hints: Auto-detect, English, Spanish, French, German, Japanese, Korean, Chinese, Arabic.
Leave this set to Auto-detect for most common languages and clear images.
Output
The OCR node produces:
| Field | Type | Description |
|---|---|---|
| text | string | All extracted text content from the image |
| blocks | array | Array of text blocks with text, confidence, and optional boundingBox (when available) |
| language | string | Detected language of the text |
| model | string | The model that was used |
| provider | string | The provider that was used |
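To make the output shape concrete, here is a hypothetical result with the fields from the table above (all values are illustrative, not real node output):

```python
# Hypothetical OCR node result, matching the output fields above.
ocr_result = {
    "text": "ACME Hardware\nInvoice #1042\nTotal: $86.40",
    "blocks": [
        {"text": "ACME Hardware", "confidence": 0.98,
         "boundingBox": {"x": 40, "y": 22, "width": 310, "height": 48}},
        {"text": "Invoice #1042", "confidence": 0.95},
        {"text": "Total: $86.40", "confidence": 0.97},
    ],
    "language": "en",
    "model": "gpt-4.1",
    "provider": "openai",
}

# Downstream logic might flag low-confidence blocks for human review:
needs_review = [b["text"] for b in ocr_result["blocks"] if b["confidence"] < 0.9]
```

Note that `boundingBox` is optional per block, so downstream code should not assume it is always present.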
The extracted text is available to downstream nodes via template variables. Common downstream uses include:
- Feeding the text into an Agent node for analysis, classification, or summarization
- Storing the text in a Google Sheet or database
- Searching the text for specific keywords using a Branch node
- Including the text in email notifications
Use Cases
Receipt and Invoice Processing
Automate expense management:
- An employee uploads a receipt photo through an expense form.
- The OCR node extracts all text from the receipt.
- An Agent node (with structured JSON output) parses the extracted text into fields: vendor name, date, total amount, tax, line items.
- The structured data is pushed to a Google Sheet or accounting system.
- Receipts above a threshold are routed to a manager for approval.
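The parsing and threshold steps above would normally live in an Agent node with structured JSON output; as a minimal offline sketch of the same idea, plain regular expressions can pull out a few common fields (the receipt text and threshold are hypothetical):

```python
import re

def parse_receipt(text: str) -> dict:
    """Pull a few common fields out of raw receipt text.
    A real workflow would use an Agent node with structured JSON
    output; this regex version only illustrates the idea."""
    total = re.search(r"(?im)^total[:\s]*\$?([\d,]+\.\d{2})", text)
    date = re.search(r"\b(\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4})\b", text)
    return {
        "vendor": text.strip().splitlines()[0],  # first line is often the vendor
        "date": date.group(1) if date else None,
        "total": float(total.group(1).replace(",", "")) if total else None,
    }

receipt = "ACME Hardware\n2024-05-17\nWidgets  $79.99\nTax  $6.41\nTotal: $86.40\n"
fields = parse_receipt(receipt)
# Route receipts above a (hypothetical) $50 threshold to approval:
needs_approval = fields["total"] is not None and fields["total"] > 50
```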
Document Digitization
Convert paper documents to searchable text:
- A user uploads a scanned document or PDF through a form.
- The OCR node extracts the full text content.
- The text is stored alongside the original image for full-text search.
- An Agent node generates a summary and tags the document by category.
Business Card Processing
Extract contact information from business card photos:
- A sales rep photographs a business card and uploads it via a mobile-friendly form.
- OCR extracts all visible text.
- An Agent node (structured JSON output) parses the text into name, title, company, email, phone, and address fields.
- The contact is automatically created in HubSpot or Salesforce.
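The text-to-fields step can again be handled by an Agent node; for fields with predictable shapes, such as email and phone, a simple pattern match is enough to sketch it (card text below is invented for illustration):

```python
import re

def extract_contact(text: str) -> dict:
    """Pick out email and phone from OCR'd business-card text.
    Illustrative only; an Agent node with structured output would
    also recover name, title, and company reliably."""
    email = re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)
    phone = re.search(r"\+?\d[\d\s().-]{7,}\d", text)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0).strip() if phone else None,
    }

card = "Dana Reyes\nHead of Sales, ExampleCo\ndana.reyes@example.com\n+1 (555) 010-7788"
contact = extract_contact(card)
```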
Form and Survey Scanning
Digitize paper forms and surveys:
- A user uploads a photo of a filled-out paper form.
- OCR extracts all handwritten and printed text.
- An Agent node maps the extracted text to the corresponding form fields.
- The digitized data enters the same workflow as digital form submissions.
ID and Document Verification
Extract data from identification documents:
- A user uploads a photo of their driver's license or passport.
- OCR extracts the name, date of birth, ID number, and expiration date.
- A Branch node checks if the document is expired (comparing the expiration date to today).
- Valid documents proceed through the workflow; expired ones trigger a re-upload request.
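The expiration check in step 3 reduces to a date comparison. A minimal sketch, assuming the OCR step yields the date in ISO format (the dates below are hypothetical):

```python
from datetime import date

def is_expired(expiration: str, today: date) -> bool:
    """Mirror of the Branch-node check: compare an extracted
    expiration date (assumed ISO format here) against today."""
    year, month, day = map(int, expiration.split("-"))
    return date(year, month, day) < today

# Hypothetical values extracted by the OCR step:
expired = is_expired("2023-08-14", today=date(2026, 1, 1))    # past date
valid = not is_expired("2030-08-14", today=date(2026, 1, 1))  # future date
```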
Label and Packaging Reading
Process product labels and packaging:
- A quality control form includes a photo of a product label.
- OCR extracts ingredient lists, nutritional information, batch numbers, and expiration dates.
- An Agent node checks for compliance with labeling regulations.
- Non-compliant products are flagged for review.
Handwritten Note Capture
Digitize handwritten notes and whiteboards:
- A user photographs handwritten meeting notes or a whiteboard.
- OCR extracts the text (accuracy varies with handwriting legibility).
- An Agent node organizes the notes into structured action items.
- The organized notes are sent to the team's Slack channel and stored in Notion.
OCR vs. Vision: When to Use Each
Both OCR and Vision nodes process images, but they are optimized for different tasks:
| Aspect | OCR | Vision |
|---|---|---|
| Primary goal | Extract text content | Understand image content |
| Output | Raw text from the image | Descriptive analysis or answers |
| Best for | Documents, receipts, labels, forms | Photos, scenes, verification, analysis |
| Understands layout | Yes (reading order, structure) | Yes (spatial relationships) |
| Answers questions | No (text extraction only) | Yes (responds to prompts about the image) |
| PDF support | Yes | No |
Use OCR when your primary goal is to get the text out of an image or PDF.
Use Vision when you need to understand what the image shows, verify visual content, or answer specific questions about the image.
Combine both when you need to extract text and also understand the visual context. For example, extract text from a receipt with OCR, then use Vision to verify the receipt is genuine and not a screenshot.
Best Practices
- Maximize image quality. Higher resolution and better contrast directly improve OCR accuracy. If users upload photos, provide guidance on image quality (good lighting, steady camera, full document in frame).
- Process one page at a time. For multi-page documents, use separate file upload fields or a Loop node to process each page individually. Alternatively, upload a PDF file which the model can handle directly.
- Combine OCR with structured extraction. OCR returns raw text. To get structured data (specific fields, values, categories), pipe the OCR output into an Agent node with a prompt that parses the text into JSON.
- Provide a language hint when processing documents in languages that use non-Latin scripts or when the image quality is low.
- Validate extracted data. OCR is not perfect. For critical data (financial amounts, ID numbers), consider adding a human review step or validation logic downstream.
- Handle empty or unreadable images. Add error handling for cases where the image contains no text or the text is too blurry to read. The node may return empty or garbled text rather than failing outright.
- Consider file size. Very large images are resized by the provider. If the text is small relative to the image, ensure the resolution is high enough that text remains legible after any resizing.
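As an example of the downstream validation logic suggested above, a sketch that cross-checks an extracted total against extracted line items and tax, flagging mismatches that often indicate a misread digit (field names and values are hypothetical):

```python
def validate_receipt(line_items: list[float], tax: float, total: float,
                     tolerance: float = 0.01) -> bool:
    """OCR can misread digits; cross-check that the extracted
    line items plus tax add up to the extracted total."""
    return abs(sum(line_items) + tax - total) <= tolerance

ok = validate_receipt([12.50, 8.25], tax=1.66, total=22.41)
# A total off by several dollars is likely a misread digit:
mismatch = not validate_receipt([12.50, 8.25], tax=1.66, total=27.41)
```

Records that fail the check can be routed to a human review step instead of proceeding automatically.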
Limitations
- Handwriting recognition accuracy varies significantly with handwriting legibility. Neat handwriting works well; messy handwriting may produce errors.
- The node processes one image or PDF per execution. For batch OCR, use a Loop node.
- Very small text or text in low-contrast environments may not be extracted accurately.
- The node returns raw text. It does not automatically structure the text into fields or key-value pairs. Use a downstream AI node for structured extraction.
- Complex layouts (overlapping text, extreme angles, heavily decorated backgrounds) may reduce accuracy.
- The image must be accessible via URL. Local file paths are not supported.
- Execution is subject to a 120-second timeout.