After a year of building a proprietary training corpus, a dedicated human annotation pipeline, and an entirely new model architecture, we're releasing Pulse Ultra 2: our highest-accuracy document extraction model to date.
Pulse Ultra 2 is built on a completely rearchitected pipeline:
Reduced network hops. The previous Pulse architecture routed documents through multiple discrete stages, each as a separate service call: OCR, layout detection, table detection, cell extraction, structure reconstruction. Pulse Ultra 2 collapses this into a unified end-to-end architecture. Fewer hops, fewer failure points, fewer places for errors to compound.
End-to-end model architecture. Instead of chaining independent models that each solve a subtask, Pulse Ultra 2 processes the full page in a single forward pass that jointly handles layout understanding, cell detection, text recognition, and structure prediction. This eliminates the error-propagation problem, where an upstream OCR mistake cascades into a downstream structure error.
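To see why chained stages compound errors, here is a minimal sketch. It assumes each of the five stages in the previous pipeline fails independently, and the per-stage accuracy of 0.98 is a hypothetical figure chosen for illustration, not a measured result for any Pulse model.

```python
from math import prod


def chained_accuracy(stage_accuracies):
    """Probability that every stage succeeds, assuming independent errors."""
    return prod(stage_accuracies)


# Hypothetical per-stage accuracies (illustrative only, not measured values).
stages = {
    "ocr": 0.98,
    "layout_detection": 0.98,
    "table_detection": 0.98,
    "cell_extraction": 0.98,
    "structure_reconstruction": 0.98,
}

pipeline_accuracy = chained_accuracy(stages.values())
print(f"{pipeline_accuracy:.3f}")  # prints 0.904
```

Under these assumptions, five stages that are each 98% accurate produce a fully correct result only about 90% of the time, which is the compounding that a single joint pass avoids.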
Merged cell handling. Colspan and rowspan detection was the single biggest accuracy bottleneck in the previous generation. Pulse Ultra 2 models cell spanning as a native output rather than a post-processing heuristic applied after grid detection.
Multilingual OCR. Rebuilt text recognition for CJK (Chinese, Japanese, Korean), Arabic/RTL, and Cyrillic scripts. The previous generation was trained predominantly on Latin-script documents; Pulse Ultra 2 was trained on a multilingual corpus spanning 100+ languages.
Table boundary detection. Improved handling of multi-table pages, borderless tables, and tables embedded in complex layouts. These gains are the culmination of the architecture and data improvements.
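To make the merged cell point above concrete, the sketch below shows one way a table could be represented when spans are part of each cell's output rather than inferred after grid detection. The `Cell` class and `expand_to_grid` helper are hypothetical names introduced for illustration; they are not the Pulse Ultra 2 output schema or API.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical cell record: position plus native rowspan/colspan."""
    row: int
    col: int
    text: str
    rowspan: int = 1
    colspan: int = 1


def expand_to_grid(cells):
    """Expand spanning cells into a dense grid, repeating text across spans."""
    n_rows = max(c.row + c.rowspan for c in cells)
    n_cols = max(c.col + c.colspan for c in cells)
    grid = [[None] * n_cols for _ in range(n_rows)]
    for c in cells:
        for r in range(c.row, c.row + c.rowspan):
            for k in range(c.col, c.col + c.colspan):
                if grid[r][k] is not None:
                    raise ValueError(f"cells overlap at ({r}, {k})")
                grid[r][k] = c.text
    return grid


# A small table with a two-row header: "Region" spans two rows,
# "Revenue" spans two columns.
cells = [
    Cell(0, 0, "Region", rowspan=2),
    Cell(0, 1, "Revenue", colspan=2),
    Cell(1, 1, "Q1"),
    Cell(1, 2, "Q2"),
    Cell(2, 0, "EMEA"),
    Cell(2, 1, "10"),
    Cell(2, 2, "12"),
]

grid = expand_to_grid(cells)
for row in grid:
    print(row)
```

Because each cell carries its own span, a consumer can rebuild the dense grid deterministically and detect overlapping cells, instead of guessing merges from a flat grid.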
Given the model's high computational demand, we're rolling out with specific rate limits for non-enterprise accounts while we scale capacity. Reach out to our team using the link here for higher rate limits or dedicated capacity.