Give your AI agents a reliable way to read documents. PDF4 is the PDF API and MCP server that LLM apps call to convert, OCR, extract, parse and summarize PDFs — returning clean Markdown and JSON built for RAG and tool-use.
LLMs are great at reasoning over text — but most real-world knowledge is locked in PDFs: invoices, contracts, reports, manuals and scanned forms. Raw PDF bytes are not something a model can read, and dumping a whole document into a prompt wastes context and money.
PDF tools for AI agents solve this by turning messy documents into structured, LLM-ready text your agent can actually use. PDF4 handles three jobs that show up in almost every agentic app:
Convert PDFs to Markdown or JSON, chunk them, and embed for retrieval. Headings, lists and tables are preserved so your retrieval stays accurate.
Expose PDF operations as callable tools. The agent decides when to OCR, extract or summarize a document, then acts on the result.
Build invoice, contract and knowledge-base agents that pull fields, route documents and generate outputs end to end.
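The first of these jobs (convert, chunk, embed) can be sketched in a few lines. The example below assumes you already have Markdown returned by a conversion call and splits it into heading-scoped chunks ready for an embedding model. The `Chunk` type, the `chunk_markdown` helper and the 1200-character soft limit are illustrative choices, not part of PDF4.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str  # nearest heading above the chunk text ("" if none)
    text: str


def chunk_markdown(markdown: str, max_chars: int = 1200) -> list[Chunk]:
    """Split Markdown into heading-scoped chunks.

    max_chars is a soft limit: a single line longer than the limit is kept whole.
    """
    chunks: list[Chunk] = []
    heading = ""
    buffer: list[str] = []

    def flush() -> None:
        # Emit the buffered lines as one chunk under the current heading.
        text = "\n".join(buffer).strip()
        if text:
            chunks.append(Chunk(heading, text))
        buffer.clear()

    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            # A new heading closes the previous chunk and becomes the new scope.
            flush()
            heading = match.group(2).strip()
            continue
        # Start a new chunk when adding this line would exceed the soft limit.
        if buffer and sum(len(l) + 1 for l in buffer) + len(line) > max_chars:
            flush()
        buffer.append(line)
    flush()
    return chunks
```

Keeping the heading on each chunk lets you store it as metadata next to the embedding, so retrieved passages carry their section context. A production chunker would likely also avoid splitting in the middle of a table.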
The fastest way to wire PDFs into an LLM app is the PDF MCP server. The Model Context Protocol is the open standard that lets AI clients discover and call external tools, and PDF4 implements it — so Claude, Cursor, Windsurf and any MCP-compatible agent can list and invoke PDF4 tools directly, no custom integration code required.
Connect over the SSE transport and authenticate with an MCP API key as a Bearer token. The agent sees PDF operations as native tools it can call during a task.
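In practice an MCP client or SDK handles the connection for you, but it can help to see what the transport looks like. The sketch below builds the authenticated request for the /mcp/sse endpoint and parses a Server-Sent Events stream into (event, data) pairs. The base URL and the sample event payloads are placeholders invented for illustration.

```python
import urllib.request
from typing import Iterable, Iterator

BASE_URL = "https://pdf4.example"  # hypothetical base URL; use the one from your account


def mcp_sse_request(api_key: str) -> urllib.request.Request:
    """Build the GET request that opens the SSE stream (nothing is sent here)."""
    return urllib.request.Request(
        BASE_URL + "/mcp/sse",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "text/event-stream",
        },
    )


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from Server-Sent Events lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has any data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
```

To read a live stream you would open the request with `urllib.request.urlopen` and feed decoded lines into `parse_sse`.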
Prefer plain HTTP? Every tool is also a REST endpoint — see the PDF API docs for the full list and the MCP section under /mcp/sse.
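For retrieval-augmented generation you want LLM-ready text, not a wall of broken characters. PDF4 produces layout-aware output that keeps document structure intact: headings, lists and tables are preserved in the Markdown or JSON it returns.
Beyond conversion, PDF4 gives agents the AI document operations they reach for most: OCR, data extraction, parsing, summarization and translation.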
Here's a realistic pattern: register PDF4 as a tool your agent can call, then run it. The function-calling schema below is illustrative; the HTTP call uses PDF4's real endpoint shape (Bearer token + multipart file upload).
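A minimal sketch of this pattern, using only the Python standard library. The Bearer token and multipart upload come from the description above; the base URL, the endpoint path, the `file` form field and the `pdf_to_markdown` tool name are placeholders, so check the API docs for the real values.

```python
import os
import urllib.request
import uuid

# Hypothetical values: replace with the base URL and path from the PDF4 API docs.
BASE_URL = "https://pdf4.example"
CONVERT_PATH = "/api/pdf/to-markdown"  # assumed path, not confirmed

# Illustrative function-calling schema in the JSON Schema style most LLM APIs accept.
PDF_TO_MARKDOWN_TOOL = {
    "name": "pdf_to_markdown",
    "description": "Convert a PDF file to Markdown for the model to read.",
    "parameters": {
        "type": "object",
        "properties": {
            "file_path": {"type": "string", "description": "Local path to the PDF."}
        },
        "required": ["file_path"],
    },
}


def build_upload_request(file_name: str, pdf_bytes: bytes, token: str) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the PDF (nothing is sent here)."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # The form field name "file" is an assumption.
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        BASE_URL + CONVERT_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def call_tool(arguments: dict, token: str) -> str:
    """Run a pdf_to_markdown tool call from the model; performs a real HTTP request."""
    path = arguments["file_path"]
    with open(path, "rb") as handle:
        request = build_upload_request(os.path.basename(path), handle.read(), token)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Register `PDF_TO_MARKDOWN_TOOL` with your model provider, and when the model emits a matching tool call, pass its arguments to `call_tool` and return the response body as the tool result.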
Long-running operations follow the standard async flow: the API returns an operation_id; poll /api/pdf/status/{operation_id} and fetch the result from /api/pdf/download/{operation_id}. See the API docs for exact request/response shapes.
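The polling loop for this flow might look like the sketch below. The two paths are the ones named above; the base URL, the `status` field and its `completed` and `failed` values are assumptions about the response shape, so adjust them to match the API docs.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://pdf4.example"  # hypothetical base URL


def http_get(path: str, token: str) -> bytes:
    """GET a path with Bearer authentication and return the raw body."""
    request = urllib.request.Request(
        BASE_URL + path, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def wait_for_result(
    operation_id: str,
    token: str,
    get: Callable[[str, str], bytes] = http_get,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Poll the status endpoint, then download the finished result."""
    deadline = time.monotonic() + timeout
    while True:
        status = json.loads(get(f"/api/pdf/status/{operation_id}", token))
        state = status.get("status")  # assumed field name and values
        if state == "completed":
            return get(f"/api/pdf/download/{operation_id}", token)
        if state == "failed":
            raise RuntimeError(f"operation {operation_id} failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {operation_id} did not finish in {timeout}s")
        sleep(interval)
```

The `get` and `sleep` parameters are injectable so the loop can be exercised in tests without network access or real delays.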
Drop in a PDF invoice, extract line items and totals as JSON, and let the agent reconcile or post to your ledger.
Batch-convert PDFs to Markdown, chunk and embed them, and power a RAG assistant grounded in your real docs.
Summarize long contracts, extract key clauses and dates, and surface answers with citations back to the source.
OCR image-only PDFs first, then extract or summarize — turning paper scans into structured agent input.
PDF tools for AI agents are document operations — convert, OCR, extract, parse, summarize and translate — exposed as an API or MCP server so an LLM agent can call them as functions. Instead of stuffing raw PDF bytes into a prompt, your agent sends a PDF to PDF4 and gets back clean Markdown, JSON or plain text it can reason over, embed for RAG, or feed into the next tool.
A PDF MCP server implements the Model Context Protocol so AI clients like Claude, Cursor and Windsurf can discover and call PDF tools directly. PDF4 ships an MCP server at /mcp/sse: connect with your MCP API key as a Bearer token and the agent can list available PDF tools and invoke them — converting, extracting and summarizing PDFs without any custom glue code.
Send the PDF to PDF4's PDF to Markdown or PDF to JSON endpoint. You get layout-aware, LLM-ready text with headings, lists and tables preserved — ideal for chunking and embedding into a vector store for retrieval-augmented generation. For scanned documents, run OCR first so the text layer is searchable.
The Extract Data endpoint uses AI to pull structured fields (invoice numbers, totals, dates, parties, line items) from PDFs and return them as JSON. Pass an optional schema describing the fields you want, or let it auto-detect common documents like invoices, contracts and forms.
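Once the JSON comes back, an agent will usually sanity-check it before acting. The sketch below assumes a hypothetical invoice schema and response shape (the field names are invented for illustration, not PDF4's documented format) and verifies that the line items add up to the stated total.

```python
from decimal import Decimal

# Hypothetical schema describing the fields to extract; the real schema format may differ.
INVOICE_SCHEMA = {
    "invoice_number": "string",
    "total": "number",
    "line_items": [{"description": "string", "amount": "number"}],
}


def missing_fields(invoice: dict) -> list[str]:
    """Return the top-level schema fields absent from the extracted JSON."""
    return [name for name in INVOICE_SCHEMA if name not in invoice]


def reconcile(invoice: dict) -> Decimal:
    """Return total minus the sum of line items; zero means the invoice reconciles."""
    # Convert through str() so binary float noise does not leak into the arithmetic.
    total = Decimal(str(invoice["total"]))
    items = sum(
        (Decimal(str(item["amount"])) for item in invoice["line_items"]),
        Decimal("0"),
    )
    return total - items
```

An agent can post the invoice to the ledger when `missing_fields` is empty and `reconcile` returns zero, and route anything else to a human for review.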
PDF4's summarize endpoint gives agents a one-call way to condense a long PDF into a concise summary. Because it runs server-side and returns plain text, your agent stays inside its context budget instead of loading the whole document — useful for triage, knowledge-base ingestion and document automation.
REST endpoints use a Bearer token (JWT) in the Authorization header; the MCP server uses an MCP API key as a Bearer token. Create a free account to get credentials, then point your agent or HTTP client at the documented endpoints.
Get a free API key, connect the MCP server, and start parsing PDFs into Markdown and JSON your LLM can use.