Research Paper

Bespoke OLAP: Synthesizing Workload-Specific
One-Size-Fits-One Database Engines

Johannes Wehrstein, Timo Eckmann, Matthias Jasny, Carsten Binnig · Systems Group, TU Darmstadt

11.78×

Speedup vs DuckDB
(TPC-H SF 20)

9.76×

Speedup vs DuckDB
(CEB SF 2)

~$10

Cost to synthesize
one DBMS

∞

Workloads supported
(any SQL template set)

Paper (arXiv) Artifacts Code W&B Runs Live Demo

Abstract

Modern OLAP engines are designed to support arbitrary analytical workloads, but this generality incurs structural overhead, including runtime schema interpretation, indirection layers, and abstraction boundaries, even in highly optimized systems. An engine specialized to a fixed workload can eliminate these costs and exploit workload-specific data structures and execution algorithms for substantially higher performance. Historically, constructing such bespoke engines has been economically impractical due to the high manual engineering effort. Recent advances in LLM-based code synthesis challenge this tradeoff by enabling automated system generation. However, naively prompting an LLM to produce a database engine does not yield a correct or efficient design, as effective synthesis requires systematic performance feedback, structured refinement, and careful management of deep architectural interdependencies. We present Bespoke OLAP, a fully autonomous synthesis pipeline for constructing high-performance database engines tightly tailored to a given workload. Our approach integrates iterative performance evaluation and automated validation to guide synthesis from storage to query execution. We demonstrate that Bespoke OLAP can generate a workload-specific engine from scratch within minutes to hours, achieving order-of-magnitude speedups over modern general-purpose systems such as DuckDB.

Bespoke OLAP in Action

Watch the agent autonomously optimize a generated C++ engine in real time.

The Bespoke agent iteratively patches, compiles, and benchmarks the generated engine.

Interactive Live Demo

Run Bespoke-generated queries against real data directly in your browser.

Live Now

Explore the full BespokeOLAP engine synthesis story.

Step through workload analysis, storage design, generated C++ implementations, and measured speedups query by query in the interactive demo.

22 TPC-H queries Real code versions Measured against DuckDB

Open Demo Jump to Q1

Leaderboard

Single-threaded total runtime across all 22 TPC-H / all CEB queries. All systems pinned to a single core, in-memory. Lower is better. Bespoke engines synthesized with GPT 5.2 Codex.

TPC-H — Scale Factor 10 ~10 GB

#	System	Total Runtime	vs DuckDB
1	Bespoke OLAP Ours Best	2.43 s	10.74×
2	Umbra	9.54 s	2.73×
3	DuckDB	26.1 s	1.00× (baseline)
4	ClickHouse	458 s	0.06×

TPC-H — Scale Factor 20 ~20 GB

#	System	Total Runtime	vs DuckDB
1	Bespoke OLAP Ours Best	4.6 s	11.78×
2	Umbra	20.4 s	2.66×
3	DuckDB	54.4 s	1.00× (baseline)

CEB (IMDB) — Scale Factor 2 21 tables

#	System	Total Runtime	vs DuckDB
1	Bespoke OLAP Ours Best	2.3 s	9.76×
2	DuckDB	22.1 s	1.00× (baseline)

Bar chart showing 11.78x speedup on TPC-H and 9.76x on CEB — Total runtime comparison: Bespoke OLAP vs. DuckDB. Left: TPC-H (SF 20). Right: CEB / IMDB (SF 2).

100%

Every Query Faster

Bespoke OLAP outperforms DuckDB on every single query — per-query speedups range from 5.7× to 104× on TPC-H and 1.5× to 1,466× on CEB. No query left behind.

70×

Scales Better Than DuckDB

On CEB, the speedup grows from 9.2× at SF 2 to over 70× at SF 10 as DuckDB's runtime grows disproportionately. Bespoke storage layouts are more robust to increasing data volumes.

~$10

Minutes, Not Years

Building a high-performance DBMS takes years of engineering. Bespoke OLAP synthesizes one from scratch in minutes to hours for roughly $10 in LLM API costs.

How It Works

A fully automated 3-stage pipeline from workload specification to a validated, optimized C++ engine.

Storage Plan Generation

The LLM agent analyzes the SQL query templates and the schema, then designs a custom storage layout — choosing column encodings, sort orders, and materialized join artifacts — tailored to the specific access patterns of the target workload.

Base Implementation

From the storage plan, the agent generates a complete C++ engine: a loader (Parquet → raw layout), a builder (raw layout → bespoke database), and query executors for each SQL template. All code is compiled and validated for correctness against DuckDB reference results.

Optimization Loop

The agent enters an iterative loop: profile → identify bottleneck → apply patch → compile (hot-reload) → re-run → check speedup. The loop continues until the speedup converges or a turn budget is exhausted. Each state is snapshotted in git for full reproducibility.

System Architecture

End-to-end pipeline from workload specification to a verified, benchmark-ready engine.

Bespoke OLAP pipeline overview: workload spec, code synthesis, live-patch DBMS, verification — The Bespoke OLAP pipeline. (1) The agent receives a workload specification and dataset schema. (2) It synthesizes a storage layout and C++ engine via continuous patching and incremental compilation. (3) The engine is assembled with hot-reload support. (4) Correctness is continuously verified against random query instantiations at multiple scale factors.

Beyond the Paper

Since publication, we have extended Bespoke OLAP with additional LLM backends and database baselines. Results will be added to the leaderboard as they are finalized.

Additional LLM Models

The paper uses GPT 5.2 Codex. We have additionally run Bespoke OLAP with the following models:

Claude Sonnet 4.6

LLM Model

Claude Opus 4.6

LLM Model

GLM-5

LLM Model

Qwen 3.5

LLM Model

MiniMax-M2.5

LLM Model

Additional Database Baselines

Expanding comparisons beyond DuckDB to include more production OLAP systems:

Umbra

Database Baseline

ClickHouse

Database Baseline

Citation

If you use Bespoke OLAP in your research, please cite:

@article{wehrstein2025bespokeolap,
  title     = {Bespoke OLAP: Synthesizing Workload-Specific One-Size-Fits-One Database Engines},
  author    = {Wehrstein, Johannes and Eckmann, Timo and Jasny, Matthias and Binnig, Carsten},
  journal   = {arXiv preprint arXiv:2603.02001},
  year      = {2025},
  url       = {https://arxiv.org/abs/2603.02001}
}

Bespoke OLAP: Synthesizing Workload-SpecificOne-Size-Fits-One Database Engines

Abstract

Bespoke OLAP in Action

Interactive Live Demo

Leaderboard

How It Works

Storage Plan Generation

Base Implementation

Optimization Loop

System Architecture

Beyond the Paper

Additional LLM Models

Additional Database Baselines

Citation

Bespoke OLAP: Synthesizing Workload-Specific
One-Size-Fits-One Database Engines