Competitive Benchmarks

OMNI Datastream is independently benchmarked against three peers — sec-api.io and financialdatasets.ai (hosted APIs) and edgartools (open-source Python library) — across core SEC data operations. All benchmarks are reproducible and run from the same network location against production endpoints.

Last updated: 2026-03-18 (vs sec-api.io and financialdatasets.ai); 2026-04-23 preliminary (vs edgartools, pending external methodology review — OMNI-3064). Methodology and reproduction scripts are in the benchmarks/ directory; see benchmarks/METHODOLOGY.md for the peer-reviewable methodology doc.

vs sec-api.io

Operation	OMNI p50	sec-api.io p50	Speedup	OMNI tokens	sec-api.io tokens	Token savings
Entity resolve	62ms	231ms	3.7x	68	103	34%
Filing search	64ms	281ms	4.4x	125	198	37%
Section extract	64ms	348ms	5.4x	450	720	38%
XBRL-to-JSON	61ms	392ms	6.4x	310	485	36%

Overall: 18 wins, 0 losses, 2 ties.

Token estimates are based on payload size ÷ 4 (average characters per token). Smaller payloads mean fewer tokens consumed by AI agents, reducing cost and latency in agent workflows.
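The estimate above can be sketched in a few lines. This is a minimal illustration of the stated heuristic (payload size divided by 4, rounded up as in the Methodology section), not the code from the benchmark scripts; the function names are hypothetical.

```python
import math

CHARS_PER_TOKEN = 4  # heuristic stated in this section, not a real tokenizer


def estimate_tokens(payload_bytes: int) -> int:
    """Approximate token count as ceil(payload_bytes / 4)."""
    return math.ceil(payload_bytes / CHARS_PER_TOKEN)


def token_savings_pct(omni_tokens: int, peer_tokens: int) -> float:
    """Percentage of tokens saved relative to the peer's response."""
    return (1 - omni_tokens / peer_tokens) * 100


# Token counts from the "Entity resolve" row of the table above.
print(f"{token_savings_pct(68, 103):.0f}%")  # prints 34%
```

Because the estimate is a fixed ratio, it tracks payload size rather than the output of any particular model tokenizer.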

vs financialdatasets.ai

Operation	OMNI p50	FD.ai p50	Speedup
Income statement	57ms	414ms	7.3x
Balance sheet	62ms	292ms	4.7x
Cash flow statement	59ms	339ms	5.7x
Financial metrics	57ms	1,476ms	25.9x
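The Speedup column is the peer's p50 divided by OMNI's p50. The sketch below recomputes it from the rows above; the helper name and the dictionary layout are illustrative assumptions, not part of the benchmark scripts.

```python
def speedup(peer_p50_ms: float, omni_p50_ms: float) -> float:
    """Ratio of peer latency to OMNI latency (values above 1 mean OMNI is faster)."""
    if omni_p50_ms <= 0:
        raise ValueError("OMNI p50 must be positive")
    return peer_p50_ms / omni_p50_ms


# (OMNI p50 ms, FD.ai p50 ms) copied from the table above.
ROWS = {
    "Income statement": (57, 414),
    "Balance sheet": (62, 292),
    "Cash flow statement": (59, 339),
    "Financial metrics": (57, 1476),
}

for operation, (omni_ms, peer_ms) in ROWS.items():
    print(f"{operation}: {speedup(peer_ms, omni_ms):.1f}x")
```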

vs edgartools (preliminary)

Preliminary — pending external methodology review (OMNI-3064). Numbers below will be re-verified by an external SRE before the full comparison is linked from the docs sidebar. See benchmarks/METHODOLOGY.md.

edgartools is a well-designed, MIT-licensed Python library that talks directly to SEC EDGAR. The comparison covers the same four workflows with N=5 iterations. Each iteration runs in a fresh subprocess with all caches cleared, so every call is a truly cold SEC fetch:

Operation	OMNI p50	edgartools p50	Speedup	OMNI wire bytes	edgartools wire bytes
Entity resolve	34ms	471ms	13.9x	273	164,686
Filing search	38ms	688ms	18.1x	1,033	362,459
Section extract	34ms	3,081ms	90.6x	2,954	3,053,868
XBRL → JSON	34ms	1,508ms	44.3x	1,522	1,520,790
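A cold iteration of the kind described above (fresh subprocess, caches cleared) might be structured as follows. This is a sketch under stated assumptions, not the actual runner: the function names are hypothetical, the cache directory is passed in rather than hard-coded, and the child process times its own workload so interpreter startup is excluded from the measurement.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def clear_cache(cache_dir: Path) -> None:
    """Delete the on-disk cache so the next run cannot reuse earlier fetches."""
    shutil.rmtree(cache_dir, ignore_errors=True)


def time_cold_run(code: str, cache_dir: Path) -> float:
    """Run top-level Python statements in a fresh subprocess; return elapsed ms."""
    clear_cache(cache_dir)
    # The child measures wall-clock time around the workload and prints it last.
    script = (
        "import time\n"
        "_start = time.perf_counter()\n"
        + code
        + "\nprint((time.perf_counter() - _start) * 1000.0)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", script],
        check=True,
        capture_output=True,
        text=True,
    )
    return float(result.stdout.strip().splitlines()[-1])


def run_iterations(code: str, cache_dir: Path, n: int = 5) -> list[float]:
    """Collect n cold-run latencies, clearing the cache before each one."""
    return [time_cold_run(code, cache_dir) for _ in range(n)]
```

Starting a new interpreter per iteration also discards in-memory caches, which clearing a directory alone would not do.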

Wire bytes are measured at the HTTP transport layer for every comparand — including edgartools, where a monkey-patched httpx.Client.send sums real SEC bytes per logical case. This makes payload numbers directly comparable across hosted APIs and client-side libraries. edgartools fetches raw SEC documents client-side (e.g., ~3 MB of 10-K HTML to extract Item 1A text) — that’s an architectural choice behind its pip install, zero API key, SEC direct posture, not an inefficiency.
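The monkey-patching approach can be sketched generically. The context manager below wraps a client class's send method and sums response body sizes; it is an illustration, not the harness's implementation. It assumes each response exposes its body as a bytes attribute named content (as httpx responses do once read), and it counts decoded body bytes, which can differ from compressed on-wire bytes.

```python
from contextlib import contextmanager


@contextmanager
def count_response_bytes(client_cls):
    """Temporarily wrap client_cls.send and sum the body size of every response."""
    totals = {"bytes": 0, "requests": 0}
    original_send = client_cls.send

    def counting_send(self, request, *args, **kwargs):
        response = original_send(self, request, *args, **kwargs)
        totals["bytes"] += len(response.content)
        totals["requests"] += 1
        return response

    client_cls.send = counting_send
    try:
        yield totals
    finally:
        client_cls.send = original_send  # always restore the original method


# Hypothetical usage with httpx installed:
#
#     import httpx
#     with count_response_bytes(httpx.Client) as totals:
#         run_one_logical_case()  # placeholder for a single benchmark case
#     print(totals["bytes"])
```

Wrapping at the class level captures requests from clients the library creates internally, which is why one logical case can be summed without changing library code.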

See Compare OMNI vs edgartools for the full feature matrix and positioning, including edgartools’ legitimate strengths (it’s free, MIT-licensed, and community-supported).

Why OMNI is faster

Purpose-built for the SEC domain. Dedicated Postgres schema with filing-aware indexes, not a generic data warehouse.
Edge-cached with tiered storage. Hot data served from Postgres + Typesense; historical data in R2 with Cloudflare CDN.
Compact responses by default. Responses are shaped for AI consumption — no bloated wrappers, no redundant fields.
Semantic search reduces round-trips. One hybrid search call returns relevant results that would require multiple keyword queries elsewhere.

Token efficiency for agents

A typical “company briefing” workflow requires:

Approach	API calls	Tokens consumed
sec-api.io (manual assembly)	8-12 calls	~3,500 tokens
OMNI intelligence bundle	1 call	~800 tokens

The intelligence bundle pre-computes what agents would otherwise assemble from multiple API calls, reducing both latency and token cost by 75%+.

FinanceBench canary-25 (datastream-api MCP regression-check)

Preliminary: datastream-api MCP regression-check, NOT the agent-chat-routed benchmark. The canary-25 numbers below were produced by an eval harness that instantiates the Anthropic SDK directly and routes tool calls into datastream-api’s MCP surface. This validates the data layer. The agent-chat-routed full-stack benchmark (which exercises OMNI’s agent-chat WebSocket path end-to-end) is tracked separately under OMNI-3286 and will be dispatched as soon as PR #394 merges. See evals/financebench-canary-25/2026-04-25-iter5/REPORT.md for the full methodology and per-question breakdown.

FinanceBench (Patronus AI) is a 150-question benchmark over 10-K / 10-Q / 8-K filings that tests an AI system’s ability to answer real financial-analyst questions with citations to source documents.

Gate	Target	OMNI canary-25 (post-data-drift)	Methodology
25-q canary (datastream-api MCP, judge-correct)	≥23/25 (92%)	18/25 (72%)	Sonnet-4.5 + extended thinking (10k); reflection disabled; OMNI-3083 citations active
25-q canary (datastream-api MCP, rule-based)	≥23/25 (92%)	19/25 (76%)	Same harness; structural API-correctness scoring
50-q bridge (regression vs 2026-04-02 baseline)	≥86%	(not run; held pending OMNI-3286)	Same harness, expanded suite

What the 72% reflects: OMNI’s datastream-api MCP surface scored 18/25 judge-correct on canary-25 against current production data on 2026-04-25. Compared to the 2026-04-02 baseline (25/25 judge), the gap is dominated by production data drift: new 10-Q/8-K filings ingested in the intervening 23 days now return as “latest” for queries that don’t pin a fiscal period. The agent’s tool-calling strategy and the canonical_calls hint shape don’t fully compensate. Five iterations of prompt + thinking + reflection + canonical_calls fixes plateaued at this number.

What the 72% is NOT: the W6 launch claim. The legitimate W6 anchor is the agent-chat-routed run that OMNI-3286 will execute against the same suite. That run exercises the full agent-chat surface (WebSocket transport, MCP wiring from OMNI-3087, dilution-aware system prompt, OMNI-3083 citations end-to-end), not just the data layer. Expect the agent-chat number to differ; both are valid measurements of different surfaces.

Failure decomposition (5 chronic + 2 reasoning-bound):

Pattern	Questions	Investigation
`section_snippets` granularity gap	fb_055, fb_056 (BBY Q2 FY24)	OMNI-3298
Sonnet-4.5 reasoning ceiling	fb_054, fb_141	OMNI-3299 — re-test on Opus-4.7
Wrong-source (GAAP vs Non-GAAP)	fb_029	Pending OMNI-3286 (agent-chat may resolve via better RAG path)
Production data drift	fb_028	Pending OMNI-3300 URL-pin diagnostic
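The data-drift failure mode can be reproduced with a toy in-memory example. Everything here is hypothetical (the Filing type, the select_filing helper, the ticker and dates); it only illustrates why a query that does not pin a fiscal period changes its answer once a newer filing is ingested.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Filing:
    ticker: str
    form: str
    fiscal_period: str  # e.g. "FY2023" or "Q1 FY2024"
    filed: date


def select_filing(
    filings: list[Filing], ticker: str, fiscal_period: Optional[str] = None
) -> Filing:
    """Return the most recently filed match; unpinned queries match any period."""
    candidates = [
        f
        for f in filings
        if f.ticker == ticker
        and (fiscal_period is None or f.fiscal_period == fiscal_period)
    ]
    if not candidates:
        raise LookupError(f"no filing for {ticker} {fiscal_period or ''}".strip())
    return max(candidates, key=lambda f: f.filed)


# At baseline time only the annual report exists (made-up data).
store = [Filing("EXMP", "10-K", "FY2023", date(2024, 2, 1))]
baseline = select_filing(store, "EXMP")

# A newer quarterly filing is ingested later.
store.append(Filing("EXMP", "10-Q", "Q1 FY2024", date(2024, 5, 1)))
drifted = select_filing(store, "EXMP")            # now returns the 10-Q
pinned = select_filing(store, "EXMP", "FY2023")   # still returns the 10-K
```

Pinning the fiscal period keeps the answer stable across ingestion, which is the property the unpinned canary questions lack.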

Cost / latency (improvements vs original harness):

$0.44 / Q (down 41%), p95 latency 88s (down 8%). Cost guard ($20 / run) held throughout the iteration ladder.

Reproducibility: every run writes a reproducibility block at the top of the artifact JSON (commit_sha, agent_chat_version, datastream_api_version, model_version, run_date, harness_version, fixture_baseline_commit, fixture_baseline_date, and the honesty flag harness_routes_through_agent_chat: false). The flag flips to true when OMNI-3286 lands.
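A reproducibility block with the fields listed above could be assembled as in the sketch below. The field names come from this section; the top-level key names ("reproducibility", "results"), the helper name, and all placeholder values are assumptions for illustration only.

```python
import json

REPRO_FIELDS = (
    "commit_sha",
    "agent_chat_version",
    "datastream_api_version",
    "model_version",
    "run_date",
    "harness_version",
    "fixture_baseline_commit",
    "fixture_baseline_date",
    "harness_routes_through_agent_chat",
)


def build_artifact(repro: dict, results: list) -> str:
    """Serialize an artifact with the reproducibility block as its first key."""
    missing = [name for name in REPRO_FIELDS if name not in repro]
    if missing:
        raise ValueError(f"missing reproducibility fields: {missing}")
    # Python dicts preserve insertion order, so the block is written first.
    artifact = {
        "reproducibility": {name: repro[name] for name in REPRO_FIELDS},
        "results": results,
    }
    return json.dumps(artifact, indent=2)


# Placeholder values; only the two dates are taken from this page.
example = {name: "<placeholder>" for name in REPRO_FIELDS}
example["run_date"] = "2026-04-25"
example["fixture_baseline_date"] = "2026-04-02"
example["harness_routes_through_agent_chat"] = False
print(build_artifact(example, results=[]))
```

Rejecting artifacts with missing fields keeps a run from being published without the honesty flag.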

Methodology

Hosted APIs (OMNI, sec-api.io, financialdatasets.ai) use production endpoints with authenticated API keys
edgartools (MIT-licensed Python library) calls SEC EDGAR directly; no API key
edgartools’ persistent on-disk cache at ~/.edgar/_tcache/ is cleared between iterations so every run is a cold SEC fetch (without this, iteration 2+ would measure disk I/O on the runner rather than SEC latency)
Latency measured as wall-clock time from request start to result returned (client-side parsing included for edgartools)
Each operation run 5+ times; p50, p95, p99 reported
Payload size measured at the HTTP transport layer for every comparand — raw response body bytes for hosted APIs, sum of SEC wire bytes (via monkey-patched httpx.Client.send) for edgartools
Token estimate = ceil(payload_bytes / 4)
Percentiles computed via linear-interpolation (Hyndman-Fan Type 7) for every comparand — all runners share one implementation in scripts/bench/_common.py
Network: same region, same machine, concurrent execution
Scripts: scripts/bench/benchmark_sec_api.py, scripts/bench/benchmark_financialdatasets.py, scripts/bench/benchmark_edgartools.py
Scorecard: scripts/bench/competitive_scorecard.py
Full peer-reviewable methodology: benchmarks/METHODOLOGY.md
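The percentile definition named above (linear interpolation, Hyndman-Fan Type 7) can be written in a few lines. This is an independent sketch of that standard definition, not a copy of scripts/bench/_common.py; the function name is hypothetical.

```python
import math


def percentile_type7(samples: list[float], pct: float) -> float:
    """Hyndman-Fan Type 7 percentile: interpolate at position (n - 1) * pct / 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    ordered = sorted(samples)
    position = (len(ordered) - 1) * pct / 100.0
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


latencies_ms = [5, 1, 3, 2, 4]  # made-up sample of five iterations
print(percentile_type7(latencies_ms, 50))  # prints 3.0
```

With only five samples, p95 and p99 are interpolated between the two slowest runs, so they sit close to the maximum observed latency.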

Reproduce

# Run sec-api.io benchmark (paid API key)
SEC_API_KEY=your_key bun run bench:sec-api

# Run financialdatasets.ai benchmark (paid API key)
FD_API_KEY=your_key bun run bench:fd-ai

# Run edgartools benchmark (no API key required)
bun run bench:edgartools

# Generate 4-way scorecard
bun run bench:scorecard
