
One Billion Pages Processed: Introducing the Next Chapter of Pulse

Sid and Ritvik
December 18, 2025

Building for failure modes, not milestones

When we started Pulse, we did not think in terms of scale milestones. We thought in terms of failure modes.

The earliest work was driven by breakdowns we kept seeing in real documents. Tables collapsing under minor layout shifts. Reading order drifting across sections. Fields extracting cleanly but with the wrong values. These were not edge cases. They were the default in the environments we were building for.

As Pulse moved from early deployments into long-running production systems, document volume increased steadily. More importantly, the role documents played changed. They stopped being inputs to experiments and became part of core operational workflows.

What one billion pages actually reflects

Processing one billion pages is not interesting because of the number itself. It is interesting because of the constraints it imposes.

At this scale, documents represent financial risk, regulatory exposure, and operational dependencies. Errors are rarely obvious. The most dangerous failures look valid on the surface and only reveal themselves downstream.

The documents we see in production are messy by default. They are scanned, rotated, inconsistently formatted, assembled from multiple sources, and often lack reliable metadata. Many span decades of evolving templates and standards.

Operating reliably under these conditions requires more than high average accuracy. It requires systems that behave predictably across variation and degrade safely when assumptions break.

When document intelligence becomes a systems problem

A common assumption in document AI is that stronger models will eventually smooth over complexity. In practice, this breaks down quickly in production.

Document intelligence is not a single inference step. It is a pipeline. Pre-processing, normalization, layout understanding, structured representations, schema enforcement, and evaluation all play a role in whether outputs are trustworthy.
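To make the pipeline framing concrete, here is a minimal sketch in Python. It is not Pulse's implementation: the `Doc` type, the two stage functions, and `run_pipeline` are hypothetical names chosen for illustration. The point is only that each stage is a small, separately testable step, and that the document carries a trace of what was done to it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Doc:
    """Hypothetical document container: text plus a trace of applied stages."""
    text: str
    trace: List[str] = field(default_factory=list)


Stage = Callable[[Doc], Doc]


def strip_control_chars(doc: Doc) -> Doc:
    # Drop non-printable characters (for example NUL bytes left by scanning
    # tools) while keeping ordinary whitespace for the next stage to handle.
    cleaned = "".join(ch for ch in doc.text if ch.isprintable() or ch.isspace())
    return Doc(cleaned, doc.trace + ["strip_control_chars"])


def normalize_whitespace(doc: Doc) -> Doc:
    # Collapse every run of whitespace into a single space and trim the ends.
    return Doc(" ".join(doc.text.split()), doc.trace + ["normalize_whitespace"])


def run_pipeline(doc: Doc, stages: List[Stage]) -> Doc:
    # Apply the stages in order; each one returns a new Doc.
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline(Doc("Total:\x00  42\n"), [strip_control_chars, normalize_whitespace])
print(result.text)
print(result.trace)
```

Because every stage records itself in `trace`, a surprising output can be traced back to the step that produced it rather than to the pipeline as a whole.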

At scale, the hardest problems are small and compounding. Orientation drift that corrupts tables. Layout shifts that detach values from labels while preserving syntax. Minor structural changes that silently break downstream logic.

These are not modeling errors in isolation. They are system-level failures.
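One way to catch a failure that preserves syntax is to check extracted values against each other instead of one at a time. The sketch below is an illustrative assumption, not a description of Pulse's checks: `totals_agree` is a hypothetical helper that tests whether extracted line items add up to the extracted total.

```python
from decimal import Decimal
from typing import List


def totals_agree(line_items: List[str], stated_total: str, tolerance: str = "0.01") -> bool:
    """Return True when the line items sum to the stated total, within tolerance.

    Every input is a well-formed number, so a per-field syntax check would pass.
    Only the cross-field comparison reveals a value attached to the wrong label.
    """
    computed = sum((Decimal(item) for item in line_items), Decimal("0"))
    return abs(computed - Decimal(stated_total)) <= Decimal(tolerance)


# Consistent extraction: 10.00 + 5.50 equals the stated total.
print(totals_agree(["10.00", "5.50"], "15.50"))

# A value picked up from the wrong cell still parses as a number,
# but it no longer agrees with the line items.
print(totals_agree(["10.00", "5.50"], "155.0"))
```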

Pulse evolved by treating these failure modes as first-class constraints rather than exceptions.

The Pulse platform today

Pulse is built as a platform for running document workflows in production, not as a collection of isolated features.

Documents entering Pulse are normalized and transformed into a structured markdown representation before schema extraction occurs. This intermediate layer is what makes the system inspectable and debuggable. It allows teams to understand how a document was interpreted, not just what fields were produced.
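As a rough illustration of why a markdown intermediate layer is inspectable, the sketch below reads a pipe-delimited markdown table into plain records. The function name and the sample table are hypothetical, and the exact markdown Pulse emits is not specified here. The sketch assumes a simple table with a header row, a separator row, and no escaped pipes inside cells.

```python
from typing import Dict, List


def parse_markdown_table(markdown: str) -> List[Dict[str, str]]:
    """Turn a simple pipe-delimited markdown table into a list of row dicts."""
    rows = [line.strip() for line in markdown.splitlines() if line.strip().startswith("|")]
    if len(rows) < 2:
        return []

    def cells(line: str) -> List[str]:
        # Remove the outer pipes, then split on the inner ones.
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = cells(rows[0])
    # rows[1] is the |---|---| separator row, so data starts at rows[2].
    return [dict(zip(header, cells(line))) for line in rows[2:]]


sample = """
| Item | Amount |
| --- | --- |
| Widget | 10.00 |
| Gadget | 5.50 |
"""

for record in parse_markdown_table(sample):
    print(record)
```

The intermediate text can be read by a person and diffed between runs, so when a field comes out wrong it is possible to see whether the table was misread or the extraction step misused it.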

Schema-first extraction builds on top of this foundation to enforce consistency across large document sets and over time. This approach reduces drift when formats change and makes downstream behavior more predictable.
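A minimal sketch of the schema-first idea follows, assuming a schema expressed as a mapping from field name to converter. `SCHEMA` and `apply_schema` are hypothetical names for illustration and do not reflect Pulse's API. The schema is declared once, and every extracted record is checked against it, so a format change surfaces as an explicit error instead of a silently different output.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical schema: field name -> converter that raises ValueError on bad input.
SCHEMA: Dict[str, Callable[[str], Any]] = {
    "invoice_number": str,
    "total": float,
}


def apply_schema(
    raw: Dict[str, str], schema: Dict[str, Callable[[str], Any]]
) -> Tuple[Dict[str, Any], List[str]]:
    """Convert raw extracted strings to typed values and collect every violation."""
    record: Dict[str, Any] = {}
    errors: List[str] = []
    for name, convert in schema.items():
        if name not in raw:
            errors.append(f"missing field: {name}")
            continue
        try:
            record[name] = convert(raw[name])
        except ValueError:
            errors.append(f"invalid value for {name}: {raw[name]!r}")
    for name in raw:
        if name not in schema:
            errors.append(f"unexpected field: {name}")
    return record, errors


clean = apply_schema({"invoice_number": "INV-1", "total": "12.50"}, SCHEMA)
drifted = apply_schema({"invoice_number": "INV-1", "total": "12,50", "po": "7"}, SCHEMA)
print(clean)
print(drifted)
```

Returning the errors alongside the partial record, rather than raising on the first problem, lets the caller decide whether to reject the document, route it for review, or accept the fields that did validate.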

The platform supports both UI-driven workflows and direct API integration, so teams can work where they already operate while keeping the same underlying guarantees.

Why we updated the Pulse brand and site

For much of Pulse’s history, branding was intentionally secondary. The priority was accuracy, reliability, and production resilience.

As adoption grew, it became clear that the way the product presents itself needed to reflect the rigor of the underlying system. Teams evaluating Pulse wanted to understand how it worked, how it failed, and how it would fit into existing infrastructure.

The updated Pulse brand and site are focused on clarity. The new experience goes deeper into how documents move through the platform, how pipelines are structured, and how Pulse is deployed in practice. The goal is to explain the system honestly without oversimplifying the underlying mechanics.

Opening Pulse to broader use

Pulse is now open for anyone to use through the platform or directly via the API, with the ability to process up to 20,000 pages for free.

This reflects how we believe document systems should be evaluated. With real data, under real conditions, by the teams responsible for operating them.

Over the coming days, we will be rolling out additional platform capabilities. Each will be documented as part of the core system as it ships.

Our focus remains unchanged: build document intelligence that behaves reliably when documents are no longer just files, but infrastructure.

Document intelligence only matters if it works quietly, consistently, and correctly at scale.

Try Pulse for free here.