Stay up to date on what’s shipping in the Extend platform.
Extraction Performance 4.8.0
Extraction Performance 4.8.0 is now the latest base processor version. It upgrades the base models used for core extraction and large array strategies.
Processor cost preview and charge-level usage breakdown
Run responses now include a finer-grained credit breakdown. Each entry in usage.breakdown can now include a charges array that itemizes the cost drivers behind that run’s credits — for example base processor, review agent, or page-specific add-ons like agentic text correction. Each charge lists the billing product, unit (page or cell), quantity, credits consumed, and applicable page numbers when billing is page-scoped.
For background on billing units, see How credits work. For the run usage shape, see usage on the extract run response.
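As a rough illustration of reading the new charges array, the sketch below totals credits per billing product from a run's usage object. The keys product, unit, quantity, and credits follow the description above; the key name pages, the product identifiers, and all sample values are assumptions made for illustration, not taken from the API reference.

```python
from collections import defaultdict


def credits_by_product(usage: dict) -> dict:
    """Sum credits per billing product across all usage.breakdown entries."""
    totals = defaultdict(float)
    for entry in usage.get("breakdown") or []:
        # charges is optional on each breakdown entry, so tolerate a missing or null value
        for charge in entry.get("charges") or []:
            totals[charge["product"]] += charge["credits"]
    return dict(totals)


# Hypothetical sample payload; product names, "pages", and numbers are made up.
sample_usage = {
    "credits": 7,
    "breakdown": [
        {
            "charges": [
                {"product": "base_processor", "unit": "page",
                 "quantity": 3, "credits": 6, "pages": [1, 2, 3]},
                {"product": "agentic_text_correction", "unit": "page",
                 "quantity": 1, "credits": 1, "pages": [2]},
            ]
        }
    ],
}

print(credits_by_product(sample_usage))
```

Because charges is described as something an entry "can" include, the helper treats it as optional rather than required.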
charges on each usage.breakdown entry with product, unit, quantity, credits, and page-level detail where applicable
Cancel in-flight parse runs
You can now cancel parse runs that are still processing. Call POST /parse_runs/{id}/cancel on the 2026-02-09 API to cancel an in-progress parse run and set its status to CANCELLED. Only runs with status PROCESSING can be cancelled.
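A minimal sketch of calling the cancel endpoint from Python might look like the following. The path, the PROCESSING precondition, and the API version come from the note above; the base URL, the bearer-token Authorization header, and the x-extend-api-version header name are assumptions, so check them against the API reference before relying on them. The code only builds the request and does not send it.

```python
import urllib.request

API_VERSION = "2026-02-09"


def can_cancel(parse_run: dict) -> bool:
    """Only runs that are still PROCESSING are eligible for cancellation."""
    return parse_run.get("status") == "PROCESSING"


def build_cancel_request(base_url: str, run_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) POST /parse_runs/{id}/cancel."""
    url = f"{base_url.rstrip('/')}/parse_runs/{run_id}/cancel"
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",   # assumed auth scheme
            "x-extend-api-version": API_VERSION,    # assumed header name
        },
    )


# Hypothetical base URL, run ID, and key, for illustration only.
req = build_cancel_request("https://api.example.com", "pr_123", "sk_test")
print(req.get_method(), req.full_url)
```

To send the request, pass it to urllib.request.urlopen after confirming can_cancel returns True for the run.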
POST /parse_runs/{id}/cancel: abort an in-progress parse run (2026-02-09 API)
Validation rules: INCLUDES and INCLUDESANY operators
Workflow validation formulas now support INCLUDES and INCLUDESANY for checking whether text appears inside any element of an array field. Unlike CONTAINS and CONTAINSALL, which match whole array elements, these operators perform case-insensitive substring checks—useful when you need to verify that one or more keywords appear in fields such as line item descriptions.
See Formulas for the full list of validation operators.
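To make the difference between whole-element matching and substring matching concrete, here is a small Python emulation of the semantics described above. It is not how Extend evaluates formulas; the function names and sample data are illustrative, and skipping non-string elements is an assumption.

```python
def includes(array, text):
    """True if text appears (case-insensitively) inside any string element of array."""
    needle = text.lower()
    return any(
        needle in element.lower()
        for element in array
        if isinstance(element, str)  # assumption: non-string elements are ignored
    )


def includes_any(array, *values):
    """True if any of the values appears inside any element of array."""
    return any(includes(array, value) for value in values)


descriptions = ["Freight surcharge (ground)", "Widget, blue", "Handling fee"]

# Whole-element matching: "freight" is not itself an element of the list.
print("freight" in descriptions)                    # False
# Substring matching, ignoring case.
print(includes(descriptions, "FREIGHT"))            # True
print(includes_any(descriptions, "rebate", "fee"))  # True
```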
INCLUDES(array, text): returns true if text appears anywhere inside any element of array
INCLUDESANY(array, value1, value2, ...): returns true if any of the values appears anywhere inside any element of array
Parse runs now include minOcrConfidence and avgOcrConfidence on chunk and block metadata: the minimum and average per-word OCR confidence across the words in that region.
Both fields are returned on every chunk and block, and are null when word-level confidence isn’t produced for that region. Values are in the range 0–1.
For word-level confidence scores, set returnOcr.words to true.
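One way to use these fields is to flag regions for review when the weakest word falls below a threshold. The sketch below assumes each chunk is a dict with a metadata object holding minOcrConfidence; the id key, the 0.8 threshold, and the sample values are hypothetical.

```python
def low_confidence_chunks(chunks, threshold=0.8):
    """Return chunks whose minOcrConfidence is present and below threshold."""
    flagged = []
    for chunk in chunks:
        score = (chunk.get("metadata") or {}).get("minOcrConfidence")
        # null means word-level confidence was not produced for this region,
        # so there is nothing to compare against
        if score is not None and score < threshold:
            flagged.append(chunk)
    return flagged


# Hypothetical chunks; values are in the documented 0-1 range or null.
chunks = [
    {"id": "c1", "metadata": {"minOcrConfidence": 0.97, "avgOcrConfidence": 0.99}},
    {"id": "c2", "metadata": {"minOcrConfidence": 0.41, "avgOcrConfidence": 0.88}},
    {"id": "c3", "metadata": {"minOcrConfidence": None, "avgOcrConfidence": None}},
]

print([c["id"] for c in low_confidence_chunks(chunks)])  # ['c2']
```

Treating null as "unknown" rather than "low" is a design choice; you may prefer to route those regions to review as well.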
For API version 2026-02-09, you can optionally pass generate on POST /extractors instead of supplying config. Provide one to five sample inputs as Extend file IDs or file URLs and Extend will generate a JSON extraction schema from those examples and return the extractor with the schema applied. Optionally add generate.instructions (up to 2,500 characters) to provide additional context about the document type or requirements for how values should be extracted. You cannot combine generate with cloneExtractorId or config.
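The constraints above can be checked client-side before calling POST /extractors. This is a sketch only: the generate.files and generate.instructions shapes follow the description here, while the helper name and sample IDs are hypothetical, and any other fields the endpoint requires are left out.

```python
MAX_FILES = 5
MAX_INSTRUCTION_CHARS = 2500


def build_generate_body(files, instructions=None):
    """Build the generate portion of a POST /extractors body.

    Do not add config or cloneExtractorId alongside generate.
    """
    if not 1 <= len(files) <= MAX_FILES:
        raise ValueError("generate.files must contain 1 to 5 entries")
    for entry in files:
        # each entry is either {"id": ...} or {"url": ...}
        if set(entry) not in ({"id"}, {"url"}):
            raise ValueError("each file must have exactly one of 'id' or 'url'")
    generate = {"files": list(files)}
    if instructions is not None:
        if len(instructions) > MAX_INSTRUCTION_CHARS:
            raise ValueError("generate.instructions is limited to 2,500 characters")
        generate["instructions"] = instructions
    return {"generate": generate}


# Hypothetical file ID and URL, for illustration only.
body = build_generate_body(
    [{"id": "file_abc123"}, {"url": "https://example.com/sample-invoice.pdf"}],
    instructions="Invoices from EU suppliers; return dates in ISO 8601.",
)
print(body)
```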
generate.files: 1–5 entries, each a file { id } or { url }.
generate.instructions: optional free-text guidance to steer schema generation.
You can opt in to an advanced extraction setting that adds a fixed “current date” line to the system prompt. The model can use it when a field depends on today's date, on relative phrases like “30 days from now”, or on ambiguous short dates on the page (for example, interpreting 02/03/26). The value is the run's creation time in UTC.
The option is off by default. For API version 2026-02-09, set advancedOptions.currentDateEnabled to true. See the JSON Schema extraction guide for how advanced options fit into your setup.
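The fragment below shows the option as a Python dict, followed by a small illustration of why the UTC anchoring matters when resolving a relative phrase. The placement of advancedOptions inside a larger config is an assumption, and due_date is a hypothetical helper that mimics the reasoning the model is expected to do; it is not part of the API.

```python
from datetime import datetime, timedelta, timezone

# Assumed placement: merge this into your extractor configuration.
config_fragment = {"advancedOptions": {"currentDateEnabled": True}}


def due_date(run_created_at: datetime, days_from_now: int):
    """Resolve 'N days from now' against the run's creation date in UTC."""
    current_date = run_created_at.astimezone(timezone.utc).date()
    return current_date + timedelta(days=days_from_now)


# A run created late on March 1 in UTC-5 is already March 2 in UTC.
created = datetime(2026, 3, 1, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
print(due_date(created, 30))  # 2026-04-01
```

If your documents are dated in a local time zone, a run created near midnight can resolve relative dates one day off from what a local reader would expect.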
Run objects now include richer usage metadata so you can see how credits relate to underlying work alongside the billed amount (credits). The existing usage.credits value is unchanged. For full run payloads and webhook events, responses can also include totalCredits, representing all contributing charges for that logical run (for example extraction plus parsing when parsing was billed for that run), and a breakdown array listing each contributing resource type, id, and credit amount.
On list endpoints, summaries include credits and totalCredits but omit breakdown to keep payloads small. Runs written before totalCredits and breakdown were stored may expose only credits; treat totalCredits and breakdown as optional. For background on billing units, see How credits work. For the full object shape, see usage on the extract run response.
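Since totalCredits and breakdown are optional, client code should fall back gracefully. A minimal sketch, assuming the usage object is a plain dict with the keys named above; the sample payloads and the type, id, and credits keys inside breakdown are hypothetical.

```python
def effective_total(usage: dict):
    """Prefer totalCredits; fall back to credits for older runs."""
    total = usage.get("totalCredits")
    return total if total is not None else usage.get("credits")


def breakdown_entries(usage: dict) -> list:
    """Return the breakdown list, or an empty list when it is omitted."""
    return usage.get("breakdown") or []


# Hypothetical payloads for an older run, a list summary, and a full run.
older_run = {"credits": 4}
list_summary = {"credits": 4, "totalCredits": 6}
full_run = {
    "credits": 4,
    "totalCredits": 6,
    "breakdown": [
        {"type": "extract_run", "id": "run_1", "credits": 4},
        {"type": "parse_run", "id": "run_2", "credits": 2},
    ],
}

for usage in (older_run, list_summary, full_run):
    print(effective_total(usage), len(breakdown_entries(usage)))
```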
Extraction citation mode control (line, word, block)
When bounding box citations are enabled on an extractor, you can set citationMode in advancedOptions to line, word, or block so citation polygons match the granularity you want. If you leave it unset, behavior matches what you have today (line-based citation processing plus block overlap handling across supported parse engines).
Extraction pipelines that use the parse 2.0.0-beta engine can now return bounding box citations for extracted values, so you are not limited to older parse versions when you need spatial references.
See Citations for how citations appear on extracted fields.
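A small helper can keep citationMode restricted to the three documented values while leaving the default behavior untouched when no mode is given. This is a sketch under the assumption that advancedOptions is a plain dict; the helper name is hypothetical, and the setting that enables bounding box citations is not shown because its name is not given here.

```python
ALLOWED_CITATION_MODES = ("line", "word", "block")


def with_citation_mode(advanced_options: dict, mode=None) -> dict:
    """Return a copy of advanced_options with citationMode set or left unset."""
    updated = dict(advanced_options)
    if mode is None:
        # Unset keeps the existing default behavior.
        updated.pop("citationMode", None)
        return updated
    if mode not in ALLOWED_CITATION_MODES:
        raise ValueError(f"citationMode must be one of {ALLOWED_CITATION_MODES}")
    updated["citationMode"] = mode
    return updated


print(with_citation_mode({"currentDateEnabled": True}, "word"))
```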
citationMode: optional; configure in extractor advanced options in Extend Studio or on extractors (line, word, or block)
Two new endpoints make bulk background processing easier:
/extract/batch: queue thousands of files for extraction in the background without running into rate limits.
/parse/batch: bulk background parse operations.
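The request shape and any per-request input limit for the batch endpoints are not stated here, so the following is only a sketch of how you might split a large list of file IDs into several batch submissions if a limit applies. The batch size of 100 and the inputs / file / id body shape are hypothetical placeholders.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most size items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def build_batch_bodies(file_ids, batch_size=100):
    """Build one hypothetical request body per batch of file IDs."""
    return [
        {"inputs": [{"file": {"id": file_id}} for file_id in batch]}  # assumed shape
        for batch in chunked(file_ids, batch_size)
    ]


# Hypothetical file IDs, for illustration only.
bodies = build_batch_bodies([f"file_{n}" for n in range(250)])
print([len(body["inputs"]) for body in bodies])  # [100, 100, 50]
```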