Designing fabric: a version-controlled HTAP database

May 19, 2026 by Nico Bautista Hobin · LinkedIn · [email protected]


Most teams supporting a real product end up with two databases. The OLTP database (e.g. Postgres, Spanner, MySQL) handles the live application. The OLAP database (e.g. Iceberg, BigQuery, Snowflake, Databricks SQL) handles the analytics. The two are stitched together by a CDC pipeline, and the cost of that stitch shows up in two places: the storage bill (you pay twice) and the on-call rotation (someone gets paged when the pipeline breaks). Furthermore, neither database has any sort of integrated version control system. This means that you can't branch off production to experiment safely (every "let's try X" turns into a day-long environment copy) and you can't see who changed what and easily revert it (when something breaks, debugging and rolling back is manual surgery).

This has been the default architecture for fifteen years. The premise of fabric is that the stack underneath has changed enough that it shouldn't be the default anymore. Object storage is bottomless and cheap. Open columnar formats are mature. Query engines are commoditized. The pieces to build one database that handles both production writes and analytics workloads, with Git-like branching and full row-level history, exist as open-source primitives. fabric is what happens when you compose them correctly.

This document is the technical brief. It walks through fabric's architecture, the design choices behind it, what's working today, and what's deliberately out of scope. It assumes you're a database-literate reader. The five-minute version lives on the homepage. The implementation is closed source today; we plan to open-source fabric at v1.0.0.

The problem with two databases

The OLTP/OLAP split exists because the workloads have different shapes. Your application writes single rows at low latency and reads them by primary key. Your analytics scans millions of rows and aggregates them across many dimensions. Postgres handles the first well and chokes on the second. BigQuery handles the second well and is too slow for the first. So you run both, and you build a pipeline to keep them in sync.

That pipeline is the source of two pain points worth understanding precisely:

You pay for storage twice. Every row your application writes lands in Postgres, then gets copied (via CDC or a periodic dump) into the warehouse. At scale, your warehouse storage bill is your application storage bill plus a markup, for the same data. Snowflake and BigQuery price this transparently ("$23 per TB-month"); they're not hiding it. It's just structurally true that you're storing every row twice and paying a markup on both copies.

The pipeline breaks. CDC connectors are some of the most fragile infrastructure in a typical data stack. Schema migrations break them. High-throughput writes lag them. They run on infrastructure you don't operate (Fivetran, Airbyte, custom Debezium) and fail in ways that page someone on a Sunday. The "we just need a CDC pipeline" engineering ticket is shorter than the "we keep fighting the CDC pipeline" reality.

On top of this, neither database versions its data. Postgres and BigQuery treat their tables as the present moment, with old values overwritten in place. There's no commit graph to branch off and no immutable history to inspect. That single missing primitive costs you in two places.

Experiments cost a sandbox. Without version control, there's no branch to fork off production into. So when you want to test a query change, tune an index, or train a model, you can't touch production — you stand up a sandbox instead. That sandbox needs the data, which means a copy. The copy takes hours or days, depending on size. The sandbox needs to be wired up to a compute environment. The experiment runs. You merge the result back, or discard it. Each experiment is a one-to-three-day project. So your team runs three experiments where they wanted to run thirty.

You can't see what changed. Without version control, old values are overwritten in place. When a row goes wrong in production, "who updated this customer's tier from premium to free on Tuesday?" is not a question your database answers. You search through application logs, audit tables if you built them, and slack messages. You debug for hours. And if you find the bad change, rolling it back is manual surgery: there's no git revert for a database. So compliance teams build custom audit layers on top of Postgres, just to get the row-level history their database should already provide.

Why now

The OLTP/OLAP split has been painful for a long time. What's changed is that the stack underneath finally lets you build past it. Three things matured in parallel:

Object storage became universal. S3, R2, GCS, and any of a dozen S3-compatible stores can hold a petabyte for cents per gigabyte-month and serve reads at sustained gigabit rates. The cost asymptote is determined by the underlying hardware, not by what a vendor will charge you. Storage is no longer a database design constraint; it's a commodity layer the database sits on top of.

Open columnar formats are mature. Parquet has had a decade of production use. Lance is its modern successor with better support for vector workloads and fast random access. Both decode efficiently into Arrow, allowing any analytical tool (DuckDB, Polars, pandas, Spark) to share the resulting buffers via zero-copy without further re-encoding. Data sitting on disk in Parquet or Lance is portable across every analytical tool that matters.

Query engines commoditized. Andy Pavlo's Databases in 2022: A Year in Review makes this case from the database-internals side: the moat used to be the engine, and it isn't anymore. Apache DataFusion is a Rust-native query engine with predicate and projection pushdown, vectorized execution, and full SQL support. It composes cleanly with custom catalog and table providers, which is what fabric does. The interesting work has moved up from the query engine itself to how you compose it with storage and a transactional layer.

The supply side is solved. What's needed is someone to compose these pieces into a database that handles both workloads on one set of tables, with version control native. That's fabric.

This compositional thesis is the through-line of Wes McKinney's Looking back 15 years. fabric is the database that falls out of taking that thesis seriously, applied to unified OLTP+OLAP rather than the analytical tier alone.

The demand side is also accelerating. Humans tolerated weekly refreshes and quarterly experiments. AI agents operate on minute-scale loops. They want fresh data, isolated experimentation environments, and atomic merges, and they want a hundred of them in parallel. Every write happens without a human in the loop to catch the mistake, so the database itself has to remember what changed and let you undo it. The existing stack wasn't designed for that volume of experimentation, and forcing it to scale isn't a path forward; it just makes the existing pain worse. The solution is the same as for human researchers: stop copying data to experiment on it.

fabric's architecture

The architectural premise is simple: your data should live in exactly one place, in open formats, and you should be able to branch and audit it like code. Everything else falls out of that.

At the top level, fabric is one logical database that handles both transactional writes and analytical queries against the same tables. Hot data lives in Postgres for low-latency reads and ACID writes. Cold data lives in object storage as columnar files. The query engine resolves reads across both tiers via a merge-on-read protocol, so analytical queries always see the latest committed state regardless of where the data physically sits. There is no replication step. There is no eventual-consistency window.

The hierarchy under the database is catalog → branch → schema → table. A catalog is the top-level isolation boundary (typically per environment: dev, staging, prod). Underneath each catalog, branches are first-class. Every catalog has a main branch by default, and branching off main creates an isolated view of the entire catalog: every schema, every table. A branch isn't a copy. It's a view of shared underlying data with isolated writes. This is the same model as Git, applied to a database.

Below the branch level, schemas are pure namespaces (the same as Postgres), and tables are the data primitive. Each table has a Postgres-style schema (typed columns) and is internally an auditable store: a composition of three append-only logs — upsert, delete, and commit — that gives you full insert/update/delete semantics with native row-level history.

The system runs as five gRPC microservices: query, write, lifecycle, metadata, and a cache manager. A Flight SQL gateway fronts the query and write services for SQL clients (JDBC, ODBC, ADBC). The whole thing is written in Rust on top of tonic for gRPC and DataFusion for query execution. The architecture is sketched below; you can ignore the boxes if you don't care how the services compose and read the rest of this document without losing the plot.

fabric architecture: a catalog containing three per-branch deployments (Flight SQL server, query service, write service, lifecycle service, metadata service, cache manager service, Postgres, and a per-branch SSD cache), all sharing one object store. SQL and gRPC clients connect into one branch; an S3 client can read the bucket directly (eventually consistent).

Three things deserve highlighting because they're the load-bearing architectural choices:

Object storage is the only permanent layer. Customer data at rest lives only in object storage. Postgres is hot-only and ephemeral. It holds recent writes that haven't yet been flushed to cold, and that's it. If a fabric instance is torn down and re-instantiated, the catalog reconstitutes from the bucket and queries resume against the same data. This is the "bottomless storage" design Neon popularized for Postgres, applied to a unified OLTP+OLAP system.

Branches share data via metadata, not copies. Each branch runs its own isolated set of fabric pods and its own Postgres. The new branch's Postgres starts empty. The query engine merges three things at read time: the new branch's hot tier (its Postgres), the new branch's cold tier (its own log files in object storage, also initially empty), and the source branch's cold tier as of the fork point. No data is copied between branches. We come back to the branch creation flow in detail below. (See Status and roadmap for what's shipped today versus the target.)

Every mutation is logged at the row level. When you UPDATE a row, the old version isn't overwritten. The new version is appended to the upsert log with a transaction ID, author, and timestamp. When you DELETE, a tombstone is appended to the delete log. The current state of a row is computed at read time by walking the log: latest non-deleted version wins. This gives you full row-level history natively, without an external audit table, and makes time-travel queries (as_of any prior transaction) and rollback (restore the state at any prior transaction) first-class operations rather than features bolted on later.

The next three sections go deeper on each pillar.

Bottomless object storage

The storage layer answers two questions: where does your data live, and what does that imply for cost and lock-in.

Where it lives. Object storage. By default, we host the bucket and you get the convenience. The interesting option is the alternative: bring your own bucket. Point fabric at an S3 bucket in your AWS account. We never get read or write access to your AWS account; we use scoped credentials you provision. Your data at rest lives only in your storage. If you decide to stop using fabric, your data is already where you keep your data. The "exit" is aws s3 sync to wherever else you want it, against the same bucket you'd already be syncing from.

What we cache. fabric is not a pure object-storage database. The hot tier is a Postgres instance that holds recent writes and the catalog metadata. A per-branch cache manager populates a local SSD cache with frequently accessed cold segments. Memory caches the working set. All of this is ephemeral: if you tear down a fabric instance, the caches go away and reconstitute from your bucket on the next request. The persistent layer is your bucket alone.

This means the truthful claim is "we don't need to store your data at rest." We do touch your data: we read it from your bucket into our SSD cache and memory while serving queries, and we write recent committed data into our Postgres before flushing to your bucket. But none of those layers is the permanent home of your data. Only the bucket is.

What this implies for cost. When you bring your own bucket, your AWS bill covers your storage. fabric's bill covers compute. The two are unbundled. You can negotiate S3 pricing with AWS directly. You can move to R2 or GCS if their pricing improves. You can use S3 Glacier or Intelligent Tiering on data we don't read hot. The compute side scales with usage; at rest, when no queries are running, compute scales to zero.

What this implies for lock-in. The data on disk in your bucket is in open formats: Lance by default, Parquet supported (Vortex and Nimble are on the roadmap). The same file that fabric reads to serve a query is readable by DuckDB, Polars, pandas, Spark, and any other Arrow-aware analytical tool. The Flight SQL gateway speaks an open wire protocol (Arrow Flight SQL) over JDBC/ODBC/ADBC, so any BI tool that speaks one of those drivers (Looker, Mode, Tableau, Hex, Metabase, basically all of them) connects without modification. The gRPC API uses standard .proto definitions; any gRPC client in any language can talk to it.

The point of all this isn't aesthetic. It's that the cost of leaving fabric is essentially free. You don't rewrite your ETL pipelines. You don't re-translate proprietary SQL dialects. You don't extract data from a proprietary storage format. You point your analytical tools at the bucket they were already implicitly pointing at, and you're done.

Branching, like Git on real data

Branches are the feature that makes the version-controlled framing concrete. The mental model is Git mapped onto a database: every catalog has a main, you can fork a new branch from any committed transaction on main, you write into the new branch in isolation, and you merge it back atomically when you're done. The implementation is what makes that mental model work without copying data, even on multi-terabyte tables.

The branch creation flow. When you call create_branch(name, base, fork_as_of=T), three things happen:

  1. Flush base's hot tier to cold up to T. All data on the base branch committed at or before transaction T is flushed from base's Postgres to base's cold-tier segments in object storage. This is part of the normal storage lifecycle, just bounded to ensure the base's cold tier contains the complete state at T.
  2. Record the new branch's metadata. A row in the catalog records the new branch's name, its parent (base), and the fork point (T). That's it. No data tables are copied; no file in object storage is duplicated.
  3. Spin up isolated infrastructure for the new branch. Each branch runs its own set of fabric pods and its own Postgres instance. The new branch's pods come up, its Postgres initializes empty, and the branch is ready for reads and writes. (Target architecture; see Status and roadmap.)

The clever bit is that the new branch doesn't need a copy of the data, even at petabyte scale, because the query engine handles the read-time merge across branches. We come back to this below.

Reads on the new branch. A query against the new branch is resolved by the query engine in three layers, merged at read time:

  1. The new branch's hot tier (its own Postgres). Initially empty; accumulates as the new branch is written to.
  2. The new branch's cold tier (its own log files in object storage). Also initially empty; accumulates as the lifecycle flushes hot to cold on the new branch.
  3. The base branch's cold tier as of fork_as_of=T. Read directly from base's segments in object storage. The base branch is unaffected by this read; segments are immutable once written.

The merge resolves to the latest non-deleted version per row, with the new branch's writes shadowing the base's state where they overlap. This is the same merge-on-read algorithm the query engine uses for the single-branch case (hot + cold), generalized to also include the base branch's cold tier at the fork point. The new branch sees the world as a union of "what base looked like at T" plus "what I've done since."

Writes on the new branch. Mutations go to the new branch's own logs (upsert, delete, transaction) in its own Postgres, eventually flushed to its own cold-tier segments in its own object-storage bucket. The base branch is never touched. This is the cleanest possible isolation guarantee: a runaway query on a branch can't lock base's Postgres, because branches don't share a Postgres. A bad write on a branch can't corrupt base's cold tier, because the segments live in different buckets.

Cross-schema atomicity. A fabric branch spans every schema in the catalog. This matters because real applications write to multiple schemas in one transaction, and branches need to preserve those semantics. On main, a transaction that inserts into public.users and analytics.events commits both rows atomically. On a branch, the same transaction does the same thing, with the same atomicity. Cross-branch reads are never valid (you read one branch at a time), which keeps the isolation model clean.

Branch creation time. The cost of create_branch is dominated by pod and Postgres spin-up. That's measured in minutes, not seconds. The flush step (step 1 above) is usually small for a healthy system: the lifecycle runs continuously, so the hot tier rarely has a large backlog. The metadata write (step 2) is milliseconds. The infrastructure provisioning (step 3) is where the time goes.

This is an honest trade-off. A "branches in 50ms" claim requires giving up either hot-tier isolation (so a runaway branch can affect production) or write isolation (so branches share a Postgres and can lock each other). We've chosen full infrastructure isolation per branch, which costs spin-up time and buys correctness. For the use case (real experiments on real production data), minutes is the right point on the trade-off curve. Two-day data copies become two-minute branch creations, and the resulting branches are real isolation, not a shared sandbox with asterisks.

Concurrency and merging. Within a branch, writes serialize through the branch's Postgres. Across branches, there's no contention: each branch has its own infrastructure, its own write logs. The only place branches interact is at merge time. merge_branch resolves the source branch's current state via set-based SQL and writes it into the target's logs as one atomic transaction. Consumers of the target either see all the changes or none. Row-level conflicts (the same row updated on both sides since the fork) resolve last-writer-wins by default; configurable strategies are on the roadmap.

When a branch is no longer needed, delete_branch flushes its hot tier to cold, soft-deletes the branch in the catalog (via the same auditable store tombstone used for row deletes), and spins down its pods and Postgres. The cold-tier segments are reclaimed by a later lifecycle pass, so the branch stays recoverable during the grace window.

Row-level versioning, native

The third pillar is the answer to "what changed, when, and how do I get back to before it changed."

The storage shape. Inside the auditable store, an UPDATE is an append to the upsert log; a DELETE is an append to the delete log; and every commit is recorded in the commit log. The current state of a row at any point in time is computed by walking the upsert and delete logs and picking the latest non-deleted version per primary key. Nothing is ever overwritten in place.

Auditability falls out of this. The question "who changed this row, when, and what was it before" is answerable by walking the upsert log filtered to that row's primary key. The result is the full mutation history, tagged with the transaction that produced each version, which itself is tagged with author and timestamp. This works for any row, any table, at any time, without you having configured anything. The same mechanism that gives you correct read semantics also gives you the audit trail.

Time-travel falls out of this. The same walk, parameterized by a transaction ID or timestamp, gives you the state as of any point in history. "What did the user table look like at midnight UTC on March 14" is a single query with an as_of parameter. The cost is proportional to the depth of the version chain for the rows you read, not to the size of the table.

Restore falls out of this. Walking the log gives you the state at a prior commit. Restoring to that state is a merge: write the log entries that would produce the prior state as a new commit. The user-facing operation is the database equivalent of git revert. The granularity is the row: you can restore the whole table to a prior state, or a subset of rows, or even a single row. The mechanism is the same as branch merging, scoped to a different operation.

Retention. Storing every version of every row forever isn't always what you want. Retention is configured at three levels (catalog, schema, table) with two independent fields: retain_max_versions (how many old versions to keep per row) and retention_duration_seconds (how long to keep them). A row version is eligible for removal during compaction only when it exceeds both bounds. Compaction is a periodic lifecycle operation that applies retention and rewrites the cold-tier files; until then, all versions are available for time-travel.

The trade-off is honest: retention erodes the audit trail. If you set retain_max_versions=10, the eleventh prior version of any row is gone forever after the next compaction. Most teams want this for write-heavy tables; some teams, especially in regulated industries, want retain_max_versions=NULL (keep everything) on certain tables. The configuration is per-table, so you can have both.

No lock-in: every interface is a standard

Every interface fabric exposes is an industry standard with multiple independent implementations. The point isn't that we like standards aesthetically; it's that this is what makes "you can try it out and then leave" a real claim instead of a marketing claim.

SQL via Flight SQL. The Flight SQL gateway speaks Arrow Flight SQL, an open wire protocol from the Apache Arrow project. It exposes the database to any JDBC, ODBC, or ADBC client. BI tools, notebooks, data quality checkers, schema visualizers: anything that speaks one of those drivers connects. DML (INSERT, UPDATE, DELETE) translates to gRPC mutations under the hood. Transactions, branching, and audit operations are exposed via Flight SQL action endpoints.

gRPC for programmatic access. Clients use two services: query (reads) and write (mutations). Together they cover catalog and schema management, branching, transactions, streaming reads, and lifecycle operations. The metadata, lifecycle, and cache manager services are internal — query and write call into them, not clients. The gRPC interfaces are defined in standard .proto schemas; any gRPC client in any language can talk to the system. A Python, Go, Rust, Java, or TypeScript client is just a code-gen away.

Open columnar formats at rest. Data in your bucket sits in Parquet or Lance, both of which are open formats with mature ecosystems. The exact same file fabric reads to serve a query is readable by DuckDB, Polars, pandas, Spark, or any other Arrow-aware tool. No exporter, no re-encoding, no proprietary intermediate format. If fabric the company goes away, your data is still there and still readable.

The practical implication: there is no exit migration. The data is where you keep your data. The wire protocols are the wire protocols. Your tools work. You stop paying fabric for compute. That's the entire exit story.

Status and roadmap

The engine is real. Today, fabric covers the full write path (catalog, branch, and schema management, transactions, data mutations), the full read path (merge-on-read across hot and cold tiers), time-travel queries (as_of), row-level audit, the storage lifecycle (persist, compact, snapshot, retention), and standard SQL via JDBC/ODBC/ADBC. End-to-end integration tests exercise the full lifecycle.

A few things on the roadmap we're excited about:

Design partners. fabric is currently onboarding a small number of design partners. The criteria is a team running production on a split system today (OLTP + warehouse + CDC), feeling the pain daily, and willing to give feedback during the design-partner phase. Partners get direct access to the engineering team, dedicated support, and a voice in the roadmap. fabric gets sharp signal from teams who actually live the problem. If that's you, write to [email protected].

Prior art and acknowledgments

fabric stands on the shoulders of open-source giants worth naming explicitly. None of the constituent technologies are fabric's invention. The contribution is the composition.

This document will get more specific as fabric's design hardens through the design-partner phase. Comments, corrections, and disagreements to [email protected].

Shape what's next

Apply to be a design partner.

We want a handful of teams running production on a split system today, and feeling the pain daily. You get direct access to the engineering team and a voice in the roadmap. We get sharp signal from people who live the problem.

[email protected]