Optical Character Recognition (OCR)
Extract text from images, PDFs, and document scans using AI-powered OCR for automated document processing in your workflows.
The OCR node extracts readable text from images, scanned documents, PDFs, photographs of text, and other visual media containing written content. It uses vision-capable AI models from OpenAI and Anthropic to interpret and transcribe text from images, making it ideal for processing uploaded documents, receipts, forms, labels, and any image that contains text you need to capture digitally.
Supported Providers and Models
| Provider | Models | Notes |
|---|---|---|
| OpenAI | GPT-4.1, GPT-4.1 Mini, GPT-4o | GPT-4.1 for highest accuracy |
| Anthropic | Claude Sonnet 4.6, Claude Sonnet 4.5, Claude Haiku 4.5 | Claude Sonnet 4.6 for detailed extraction |
OCR in Buildorado is powered by vision-capable models that understand both the visual layout and the textual content of images. This means they can handle not just printed text but also handwriting, stylized fonts, text at angles, and text embedded in complex visual contexts.
How OCR Works
Unlike traditional OCR engines that use pattern matching to recognize individual characters, Buildorado's OCR node leverages large vision models that understand the full context of an image. This approach:
- Handles varied fonts, sizes, and styles without pre-configuration
- Reads text at any angle or orientation
- Interprets handwriting with reasonable accuracy
- Understands document structure (headers, paragraphs, tables, lists)
- Maintains reading order even in complex multi-column layouts
The model processes the entire image and returns all detected text as a single string output.
Configuration
Provider
Select OpenAI or Anthropic as the provider. The model dropdown updates to list that provider's vision-capable models.
Model
Choose the vision-capable model for text extraction:
OpenAI models:
| Model | Speed | Quality | Best For |
|---|---|---|---|
| GPT-4.1 | Moderate | Excellent | Highest accuracy, complex documents |
| GPT-4.1 Mini | Fast | Good | Quick extraction, simple documents |
| GPT-4o | Fast | Good | General-purpose extraction |
Anthropic models:
| Model | Speed | Quality | Best For |
|---|---|---|---|
| Claude Sonnet 4.6 | Moderate | Excellent | Detailed extraction, complex layouts |
| Claude Sonnet 4.5 | Moderate | Good | General-purpose extraction |
| Claude Haiku 4.5 | Fast | Good | Simple documents, high volume |
Credential
Select a saved API key for the chosen provider. See Credential Management for setup instructions.
Image/PDF URL
The URL of the image or PDF to process for text extraction. This field supports template variables, typically referencing a file upload form field.
Supported formats:
| Format | Extension | Notes |
|---|---|---|
| JPEG / JPG | .jpg, .jpeg | Most common for photos and scans |
| PNG | .png | Best for screenshots and digital documents |
| WebP | .webp | Modern web format |
| GIF | .gif | First frame only |
| PDF | .pdf | Document files with text or scanned pages |
The image or PDF must be accessible via a public or signed URL. Buildorado file uploads automatically generate accessible URLs.
Tips for best results:
- Higher resolution images produce more accurate text extraction.
- Ensure adequate contrast between text and background.
- Minimize shadows, glare, and obstructions over text areas.
- For multi-page documents, process each page as a separate image or use a single PDF.
Language Hint
An optional hint that tells the model which language to expect in the image. While the model can auto-detect languages, providing a hint improves accuracy for:
- Languages with similar-looking characters (e.g., distinguishing Chinese, Japanese, and Korean)
- Mixed-language documents where one language dominates
- Low-quality images where character recognition is ambiguous
Available language hints: Auto-detect, English, Spanish, French, German, Japanese, Korean, Chinese, Arabic.
Leave this set to Auto-detect for most common languages and clear images.
Output
The OCR node produces:
| Field | Type | Description |
|---|---|---|
| text | string | All extracted text content from the image |
| blocks | array | Array of text blocks with text, confidence, and optional boundingBox (when available) |
| language | string | Detected language of the text |
| model | string | The model that was used |
| provider | string | The provider that was used |
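To make the output shape concrete, here is a hypothetical result with the fields from the table above (all values are illustrative, not real node output):

```python
# Hypothetical OCR node result, matching the output fields above.
ocr_result = {
    "text": "ACME Hardware\nInvoice #1042\nTotal: $86.40",
    "blocks": [
        {"text": "ACME Hardware", "confidence": 0.98,
         "boundingBox": {"x": 40, "y": 22, "width": 310, "height": 48}},
        {"text": "Invoice #1042", "confidence": 0.95},
        {"text": "Total: $86.40", "confidence": 0.97},
    ],
    "language": "en",
    "model": "gpt-4.1",
    "provider": "openai",
}

# Downstream logic might flag low-confidence blocks for human review:
needs_review = [b["text"] for b in ocr_result["blocks"] if b["confidence"] < 0.9]
```

Note that `boundingBox` is optional per block, so downstream code should not assume it is always present.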
The extracted text is available to downstream nodes via template variables. Common downstream uses include:
- Feeding the text into an Agent node for analysis, classification, or summarization
- Storing the text in a Google Sheet or database
- Searching the text for specific keywords using a Branch node
- Including the text in email notifications
Use Cases
Receipt and Invoice Processing
Automate expense management:
- An employee uploads a receipt photo through an expense form.
- The OCR node extracts all text from the receipt.
- An Agent node (with structured JSON output) parses the extracted text into fields: vendor name, date, total amount, tax, line items.
- The structured data is pushed to a Google Sheet or accounting system.
- Receipts above a threshold are routed to a manager for approval.
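The parsing and threshold steps above would normally live in an Agent node with structured JSON output; as a minimal offline sketch of the same idea, plain regular expressions can pull out a few common fields (the receipt text and threshold are hypothetical):

```python
import re

def parse_receipt(text: str) -> dict:
    """Pull a few common fields out of raw receipt text.
    A real workflow would use an Agent node with structured JSON
    output; this regex version only illustrates the idea."""
    total = re.search(r"(?im)^total[:\s]*\$?([\d,]+\.\d{2})", text)
    date = re.search(r"\b(\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4})\b", text)
    return {
        "vendor": text.strip().splitlines()[0],  # first line is often the vendor
        "date": date.group(1) if date else None,
        "total": float(total.group(1).replace(",", "")) if total else None,
    }

receipt = "ACME Hardware\n2024-05-17\nWidgets  $79.99\nTax  $6.41\nTotal: $86.40\n"
fields = parse_receipt(receipt)
# Route receipts above a (hypothetical) $50 threshold to approval:
needs_approval = fields["total"] is not None and fields["total"] > 50
```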
Document Digitization
Convert paper documents to searchable text:
- A user uploads a scanned document or PDF through a form.
- The OCR node extracts the full text content.
- The text is stored alongside the original image for full-text search.
- An Agent node generates a summary and tags the document by category.
Business Card Processing
Extract contact information from business card photos:
- A sales rep photographs a business card and uploads it via a mobile-friendly form.
- OCR extracts all visible text.
- An Agent node (structured JSON output) parses the text into name, title, company, email, phone, and address fields.
- The contact is automatically created in HubSpot or Salesforce.
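The text-to-fields step can again be handled by an Agent node; for fields with predictable shapes, such as email and phone, a simple pattern match is enough to sketch it (card text below is invented for illustration):

```python
import re

def extract_contact(text: str) -> dict:
    """Pick out email and phone from OCR'd business-card text.
    Illustrative only; an Agent node with structured output would
    also recover name, title, and company reliably."""
    email = re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)
    phone = re.search(r"\+?\d[\d\s().-]{7,}\d", text)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0).strip() if phone else None,
    }

card = "Dana Reyes\nHead of Sales, ExampleCo\ndana.reyes@example.com\n+1 (555) 010-7788"
contact = extract_contact(card)
```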
Form and Survey Scanning
Digitize paper forms and surveys:
- A user uploads a photo of a filled-out paper form.
- OCR extracts all handwritten and printed text.
- An Agent node maps the extracted text to the corresponding form fields.
- The digitized data enters the same workflow as digital form submissions.
ID and Document Verification
Extract data from identification documents:
- A user uploads a photo of their driver's license or passport.
- OCR extracts the name, date of birth, ID number, and expiration date.
- A Branch node checks if the document is expired (comparing the expiration date to today).
- Valid documents proceed through the workflow; expired ones trigger a re-upload request.
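The expiration check in step 3 reduces to a date comparison. A minimal sketch, assuming the OCR step yields the date in ISO format (the dates below are hypothetical):

```python
from datetime import date

def is_expired(expiration: str, today: date) -> bool:
    """Mirror of the Branch-node check: compare an extracted
    expiration date (assumed ISO format here) against today."""
    year, month, day = map(int, expiration.split("-"))
    return date(year, month, day) < today

# Hypothetical values extracted by the OCR step:
expired = is_expired("2023-08-14", today=date(2026, 1, 1))    # past date
valid = not is_expired("2030-08-14", today=date(2026, 1, 1))  # future date
```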
Label and Packaging Reading
Process product labels and packaging:
- A quality control form includes a photo of a product label.
- OCR extracts ingredient lists, nutritional information, batch numbers, and expiration dates.
- An Agent node checks for compliance with labeling regulations.
- Non-compliant products are flagged for review.
Handwritten Note Capture
Digitize handwritten notes and whiteboards:
- A user photographs handwritten meeting notes or a whiteboard.
- OCR extracts the text (accuracy varies with handwriting legibility).
- An Agent node organizes the notes into structured action items.
- The organized notes are sent to the team's Slack channel and stored in Notion.
OCR vs. Vision: When to Use Each
Both OCR and Vision nodes process images, but they are optimized for different tasks:
| Aspect | OCR | Vision |
|---|---|---|
| Primary goal | Extract text content | Understand image content |
| Output | Raw text from the image | Descriptive analysis or answers |
| Best for | Documents, receipts, labels, forms | Photos, scenes, verification, analysis |
| Understands layout | Yes (reading order, structure) | Yes (spatial relationships) |
| Answers questions | No (text extraction only) | Yes (responds to prompts about the image) |
| PDF support | Yes | No |
Use OCR when your primary goal is to get the text out of an image or PDF.
Use Vision when you need to understand what the image shows, verify visual content, or answer specific questions about the image.
Combine both when you need to extract text and also understand the visual context. For example, extract text from a receipt with OCR, then use Vision to verify the receipt is genuine and not a screenshot.
Best Practices
- Maximize image quality. Higher resolution and better contrast directly improve OCR accuracy. If users upload photos, provide guidance on image quality (good lighting, steady camera, full document in frame).
- Process one page at a time. For multi-page documents, use separate file upload fields or a Loop node to process each page individually. Alternatively, upload a PDF file which the model can handle directly.
- Combine OCR with structured extraction. OCR returns raw text. To get structured data (specific fields, values, categories), pipe the OCR output into an Agent node with a prompt that parses the text into JSON.
- Provide a language hint when processing documents in languages that use non-Latin scripts or when the image quality is low.
- Validate extracted data. OCR is not perfect. For critical data (financial amounts, ID numbers), consider adding a human review step or validation logic downstream.
- Handle empty or unreadable images. Add error handling for cases where the image contains no text or the text is too blurry to read. The node may return empty or garbled text rather than failing outright.
- Consider file size. Very large images are resized by the provider. If the text is small relative to the image, ensure the resolution is high enough that text remains legible after any resizing.
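As an example of the downstream validation logic suggested above, a sketch that cross-checks an extracted total against extracted line items and tax, flagging mismatches that often indicate a misread digit (field names and values are hypothetical):

```python
def validate_receipt(line_items: list[float], tax: float, total: float,
                     tolerance: float = 0.01) -> bool:
    """OCR can misread digits; cross-check that the extracted
    line items plus tax add up to the extracted total."""
    return abs(sum(line_items) + tax - total) <= tolerance

ok = validate_receipt([12.50, 8.25], tax=1.66, total=22.41)
# A total off by several dollars is likely a misread digit:
mismatch = not validate_receipt([12.50, 8.25], tax=1.66, total=27.41)
```

Records that fail the check can be routed to a human review step instead of proceeding automatically.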
Limitations
- Handwriting recognition accuracy varies significantly with handwriting legibility. Neat handwriting works well; messy handwriting may produce errors.
- The node processes one image or PDF per execution. For batch OCR, use a Loop node.
- Very small text or text in low-contrast environments may not be extracted accurately.
- The node returns raw text. It does not automatically structure the text into fields or key-value pairs. Use a downstream AI node for structured extraction.
- Complex layouts (overlapping text, extreme angles, heavily decorated backgrounds) may reduce accuracy.
- The image must be accessible via URL. Local file paths are not supported.
- Execution is subject to a 120-second timeout.