Rebuilt Structured Output System - Launch Week (Day 3)

Sid and Ritvik
December 20, 2025

A lot of structured extraction boils down to asking a model to output JSON and hoping it's right.

That works for simple documents. It falls apart on dense tables, multi-section layouts, and the heterogeneous formats that show up in real production workflows. When extraction is part of a critical pipeline, "mostly works" isn't good enough.

Today we're launching a completely rebuilt structured output system in Pulse.

Schema-First Extraction

The new system is designed around a schema-first approach. You define the target schema, and Pulse runs a two-step pipeline:

  1. Documents are converted into structured markdown, preserving layout and relationships.
  2. That intermediate output is transformed into schema-aligned data using proprietary models trained specifically for this task.

This separation improves accuracy on complex documents because each step has a single job: the markdown conversion handles document understanding, and the schema alignment handles data transformation.
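To make the two-step shape concrete, here is a minimal toy sketch of the idea. It is not Pulse's implementation or API: the `Invoice` schema, the `to_markdown` and `align_to_schema` functions, and the sample values are all hypothetical, and simple string handling stands in for the models that do the real work.

```python
from dataclasses import dataclass


# Hypothetical target schema; the field names are illustrative only.
@dataclass
class Invoice:
    invoice_number: str
    total: float


def to_markdown(raw_fields: dict[str, str]) -> str:
    """Step 1 stand-in: render document content as a markdown table."""
    lines = ["| Field | Value |", "| --- | --- |"]
    lines += [f"| {key} | {value} |" for key, value in raw_fields.items()]
    return "\n".join(lines)


def align_to_schema(markdown: str) -> Invoice:
    """Step 2 stand-in: map markdown rows onto the target schema."""
    rows = {}
    for line in markdown.splitlines()[2:]:  # skip header and separator rows
        key, value = [cell.strip() for cell in line.strip("|").split("|")]
        rows[key] = value
    return Invoice(
        invoice_number=rows["Invoice No."],
        # Normalize the currency string into a number for the schema.
        total=float(rows["Total Due"].replace("$", "").replace(",", "")),
    )


markdown = to_markdown({"Invoice No.": "INV-1042", "Total Due": "$1,250.00"})
invoice = align_to_schema(markdown)
print(invoice)
```

The point of the sketch is the boundary: the second step only ever sees the intermediate markdown, so each half can be improved or tested without touching the other.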

Built-In Traceability

Every extracted field includes citations back to its exact location in the source document.

When a downstream consumer asks where a value came from, you don't have to dig through the original PDF manually. The citation is there. This makes validation faster and gives teams confidence that outputs are grounded in the source material.
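As a rough illustration of what a field-level citation makes possible, the sketch below pairs each value with a pointer to its source and runs a cheap grounding check. The `Citation` and `ExtractedField` shapes, including the page number and bounding box, are assumptions made for this example and may not match the actual response format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    page: int                                # page number (assumed 1-based)
    bbox: tuple[float, float, float, float]  # x0, y0, x1, y1 (assumed layout)
    text: str                                # snippet of source text at that location


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: str
    citation: Citation


def is_grounded(field: ExtractedField) -> bool:
    """Cheap check: the extracted value must appear in the cited snippet."""
    return field.value in field.citation.text


field = ExtractedField(
    name="total",
    value="1,250.00",
    citation=Citation(
        page=3,
        bbox=(72.0, 540.0, 210.0, 556.0),
        text="Total Due: $1,250.00",
    ),
)

print(field.name, "-> page", field.citation.page, "grounded:", is_grounded(field))
```

A check like this can run over every field before results reach a downstream system, flagging any value whose citation does not actually contain it.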

Programmatic Corrections

The update also introduces new endpoints that let teams correct or enhance structured outputs without reprocessing documents end to end.

Found an edge case? Fix the output directly and move on. No need to re-run the entire extraction pipeline for a single field correction.
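Conceptually, a single-field correction is a patch applied to an existing output rather than a new extraction run. The sketch below shows that idea locally; `apply_corrections` and its dotted-path format are hypothetical helpers for illustration and do not represent the new endpoints or their request shape.

```python
import copy


def apply_corrections(output: dict, corrections: dict) -> dict:
    """Return a copy of `output` with dotted-path corrections applied."""
    patched = copy.deepcopy(output)  # leave the original output untouched
    for path, new_value in corrections.items():
        *parents, leaf = path.split(".")
        node = patched
        for key in parents:  # walk down to the dict that holds the field
            node = node[key]
        node[leaf] = new_value
    return patched


original = {"vendor": {"name": "Acme Corp"}, "total": 125.0}
fixed = apply_corrections(original, {"total": 1250.0, "vendor.name": "Acme Corporation"})
print(fixed)
```

Returning a patched copy instead of mutating in place keeps the original extraction available for comparison or audit.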

Full frontend support for reviewing, editing, and validating results is included as well.

Available now in the Pulse platform and API.

platform.runpulse.com

Want to see the new structured output system on your documents? Schedule a demo.