Footnote Extraction Is Now Generally Available on the Pulse Platform

Sid and Ritvik

March 12, 2026

Footnotes are deceptively simple. They sit quietly at the bottom of a page, easy to overlook, but in the documents that matter most to enterprises, they carry an outsized share of the actual information. A single footnote in a 10-K filing can restate revenue recognition assumptions. A footnote buried in a legal contract can redefine the scope of an indemnification clause. Actuarial tables routinely push their most important qualifications into fine print that lives below the main content. For anyone building automated workflows around these documents, footnotes aren't a nice-to-have; they're essential.

The challenge is that footnotes break the linear reading order of a page, and most extraction tools respond by ignoring them, merging them into surrounding text, or stripping out the reference linkage that makes them meaningful. Teams end up writing fragile post-processing logic to recover information that should have been captured cleanly from the start.

Figure: Side-by-side view of the Pulse UI showing the original PDF on the left with highlighted footnote regions and author affiliations, and the Footnotes panel on the right listing all 5 detected footnotes with their page references and reference counts.

What's new

After working closely with a number of our enterprise customers to refine this capability in production, Pulse now offers footnote extraction as a generally available feature across the platform. Every footnote in a processed document is returned as a distinct, structured object with three key properties: the reference marker that identifies it, the full footnote text, and the positional metadata that links it back to exactly where it was cited in the body of the document.
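As a rough illustration of those three properties, here is a minimal Python sketch of how a client might model a footnote object. The field names (`marker`, `text`, `page`, `references`, `element_id`) and the top-level `footnotes` key are assumptions made for this example, not the actual Pulse output schema; the documentation has the real field names.

```python
from dataclasses import dataclass, field


@dataclass
class FootnoteReference:
    """One place in the body where a footnote is cited (hypothetical shape)."""
    element_id: str  # ID of the paragraph or table cell holding the marker
    page: int        # page on which the citation appears


@dataclass
class Footnote:
    """A footnote as a distinct object (hypothetical field names)."""
    marker: str  # reference marker, e.g. "1" or "*"
    text: str    # full footnote text
    page: int    # page on which the footnote text appears
    references: list[FootnoteReference] = field(default_factory=list)


def parse_footnotes(payload: dict) -> list[Footnote]:
    """Build Footnote objects from a dict with an assumed 'footnotes' key."""
    return [
        Footnote(
            marker=item["marker"],
            text=item["text"],
            page=item["page"],
            references=[
                FootnoteReference(**ref) for ref in item.get("references", [])
            ],
        )
        for item in payload.get("footnotes", [])
    ]


# Illustrative payload only; not real Pulse output.
sample = {
    "footnotes": [
        {
            "marker": "1",
            "text": "Includes deferred revenue.",
            "page": 4,
            "references": [{"element_id": "p-12", "page": 4}],
        }
    ]
}

footnotes = parse_footnotes(sample)
print(footnotes[0].marker, footnotes[0].references[0].element_id)
```

Keeping references as a list reflects that a single footnote can be cited from several places in the body.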

This means the relationship between a superscript marker in a paragraph or table cell and the corresponding footnote content at the bottom of the page is fully preserved in the output, so you can reconstruct the original reading context programmatically without any manual mapping or post-processing.
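One way to use that linkage is to place each footnote's text next to the element that cites it. The sketch below assumes body elements are available as a mapping from element ID to text and that each footnote carries a list of references with an `element_id`; both shapes are hypothetical and chosen only for illustration.

```python
def inline_footnotes(elements: dict[str, str], footnotes: list[dict]) -> dict[str, str]:
    """Return a copy of `elements` with cited footnote text appended.

    `elements` maps element ID to body text; each footnote dict has the
    assumed keys 'marker', 'text', and 'references'.
    """
    result = dict(elements)
    for note in footnotes:
        for ref in note["references"]:
            element_id = ref["element_id"]
            if element_id in result:
                result[element_id] += f" [Footnote {note['marker']}: {note['text']}]"
    return result


# Illustrative data only; not real Pulse output.
elements = {
    "p-12": "Revenue is recognized over the contract term.",
    "p-13": "Operating costs were flat year over year.",
}
footnotes = [
    {
        "marker": "1",
        "text": "Excludes one-time licensing fees.",
        "references": [{"element_id": "p-12"}],
    }
]

enriched = inline_footnotes(elements, footnotes)
print(enriched["p-12"])
```

Elements that cite nothing are returned unchanged, and the input mapping itself is not modified.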

How it works

Footnotes are automatically detected and extracted as part of the standard Pulse output whenever they appear in a document, with nothing to configure on your end. The feature works across PDFs, scanned documents, and digitally native files, and it handles the edge cases that come up most often in enterprise document workflows:

Multi-page footnotes. When a footnote begins on one page and continues on the next, Pulse stitches the content together and returns it as a single, complete object rather than splitting it across page boundaries.

Footnotes inside tables. Financial filings and regulatory documents frequently embed footnotes within table cells. Pulse extracts these alongside the table structure itself, so both the tabular data and its qualifying footnotes are captured in a single pass.

Variable formatting. Documents often use different footnote styles across sections, switching between numeric markers, symbols, or lettered references. Pulse normalizes these into a consistent output format regardless of how they appear in the source.
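To make the variable-formatting problem concrete, the following sketch shows one possible way to map numeric, symbol, and lettered markers onto a common (style, ordinal) form. It is an illustration of the general idea only, not a description of how Pulse normalizes markers internally; the function name and the symbol ordering are assumptions.

```python
import unicodedata

# Conventional symbol order used in many style guides (assumed here).
SYMBOL_MARKERS = ["*", "†", "‡", "§"]


def classify_marker(raw: str) -> tuple[str, int]:
    """Map a raw footnote marker to a (style, ordinal) pair."""
    # NFKC folds superscript digits such as "²" into plain "2".
    marker = unicodedata.normalize("NFKC", raw).strip().strip("()[].")
    if marker.isdecimal():
        return ("numeric", int(marker))
    if marker in SYMBOL_MARKERS:
        return ("symbol", SYMBOL_MARKERS.index(marker) + 1)
    if len(marker) == 1 and marker.isascii() and marker.isalpha():
        return ("letter", ord(marker.lower()) - ord("a") + 1)
    return ("unknown", 0)


for raw in ["²", "(a)", "†", "[3]"]:
    print(raw, classify_marker(raw))
```

Returning an explicit "unknown" style instead of raising lets a caller log unfamiliar markers without dropping the footnote.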

Figure: Close-up of the Footnotes panel with the first footnote expanded, showing the full "Equal contribution" footnote text and all 8 in-text references mapped back to individual authors with their element IDs and page locations.

What this looks like in practice

Before footnote extraction, teams processing documents like SEC filings or insurance contracts would typically run their extraction pipeline, get back structured content that was missing footnotes or had them jumbled into body text, and then layer on custom regex or heuristic logic to find, isolate, and re-link the footnote content after the fact. That post-processing was brittle, hard to maintain across document types, and often the first thing to break when a new format showed up.
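For a sense of why that kind of post-processing is brittle, here is a deliberately naive regex heuristic of the sort a team might write. The pattern and sample text are invented for illustration; the point is that it both misses a symbol-marked footnote and mistakes a body line that starts with a number for a footnote.

```python
import re

# Naive rule: a footnote is any line that starts with a 1-2 digit number.
FOOTNOTE_LINE = re.compile(r"^(\d{1,2})[.)]?\s+(.+)$")


def guess_footnotes(page_text: str) -> dict[str, str]:
    """Guess footnotes from raw page text using the naive rule above."""
    found = {}
    for line in page_text.splitlines():
        match = FOOTNOTE_LINE.match(line.strip())
        if match:
            found[match.group(1)] = match.group(2)
    return found


page_text = "\n".join(
    [
        "Revenue grew in the period.1",
        "3 segments reported a loss.",   # body text, not a footnote
        "1 Includes deferred revenue.",  # the real numeric footnote
        "† Restated for 2024.",          # a real footnote the rule misses
    ]
)

guessed = guess_footnotes(page_text)
print(guessed)
```

Each new document format tends to need another special case in a rule like this, which is the maintenance burden described above.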

Now, footnotes come through as structured data from the start, arriving linked to their references, complete across page boundaries, and ready for LLM context windows, database ingestion, or compliance workflows without any intermediate cleanup step.

Get started

Footnote extraction is live now for all Pulse platform users. Visit the documentation for the updated output schema, and reach out to the team if you have questions about integrating this into your existing pipeline. Try it here.
