Ingestion modes
Snapshot, Delta, Historical, CDC, and Stored Procedure — when to use each.
Every object you sync — every artifact — carries one ingestion mode. It's the most important choice you make per table, because it decides how rows land in the target and how the next run behaves. You set it per artifact, so a single pipeline can mix modes freely.
There are five: Snapshot, Delta, Historical, CDC, and Stored Procedure.
The five modes at a glance
| Mode | How it writes | Best for | Key options |
|---|---|---|---|
| Snapshot | Truncate and reload the whole table every run | Small reference and dimension tables | Snapshot retention |
| Delta | Upsert new and changed rows, matched on merge keys | Operational data that grows over time | Watermark column, Merge keys, Backload, Detect deletes |
| Historical | Append new rows; existing rows are never touched | Event streams, time-series, ledgers | Append-only |
| CDC | Apply inserts, updates, and deletes from the source log | Sources with transaction-log access | Log-based capture |
| Stored Procedure | Run a stored procedure on the source to produce the data | Sources where a proc owns extraction | Source-defined logic |
Snapshot — full refresh every run
Snapshot replaces the target table on every run with a fresh copy of the source. Because it truncates and reloads, it's dup-safe by construction — you can't accumulate duplicate rows no matter how often it runs.
Use it for small, slow-changing tables: lookups, dimensions, reference data. It needs no watermark. The one option it exposes is snapshot retention, which controls how prior snapshots are kept.
It's the simplest mode to reason about and impossible to get duplicates from. The only cost is rereading everything each run, which is fine until the table gets large.
Delta — incremental upsert
Delta is the workhorse for data that grows. Each run reads only the rows newer than the last successful run, using a watermark column (typically something like modified_date or an auto-incrementing id), then upserts them — inserting new rows and updating existing ones matched on your merge keys.
Its options:
- Watermark column — the field used to find new and changed rows.
- Merge keys — the column(s) that identify a row so updates land on the right record.
- Backload N days — re-pull the last N days on each run to catch late-arriving edits.
- Detect deletes — also reflect rows that disappeared from the source.
If the source's modified_date only updates on insert (not on update), Delta will miss your edits. Verify it first: change one row and confirm the next run picks it up.
Historical — append-only
Historical treats the target as an append-only ledger: each run adds new rows, and existing rows are never updated or deleted. That makes it a natural fit for event streams, time-series, and audit logs where history is the point.
The mode picker also describes Historical as a one-time full load that auto-disables after it completes successfully, while the artifact card frames it as an append-only ledger. Keep the append-only mental model — new rows in, nothing rewritten — and confirm the behavior on your first run.
CDC — change data capture
CDC reads inserts, updates, and deletes directly from the source's transaction log rather than re-querying tables. It's the most accurate way to mirror a fast-changing source and the lightest on the source itself, but it requires log access and is only available on supported sources (for example, MSSQL).
Stored Procedure — source-defined extract
Stored Procedure calls a procedure on the source to produce the data, instead of Databasin generating the query. Reach for it when the extraction logic is owned by the source system or a DBA — the in-app picker calls out Epic healthcare systems as the prime example.
Per-artifact options, in one place
Depending on the mode, the ingestion step exposes some of these:
- Watermark column — the change marker for incremental modes.
- Merge keys (Merge Columns) — identify rows for upserts.
- Backload N days — re-pull a recent window each run.
- Detect deletes — propagate source deletions.
- Snapshot retention — how long prior snapshots are kept.
- Custom SQL / Custom WHERE — an Ingestion behavior selector with three settings: Auto (read the object as-is), Custom SQL (replace the source
SELECT), or Custom WHERE Clause (add a filter). This is the only in-flight shaping a pipeline does; heavier transforms belong in an automation or in Lakehouse SQL after the sync.
Coming from Full / Incremental / Append?
If you've used other tools, the names map cleanly:
| You used to call it… | In Databasin it's… |
|---|---|
| Full / full reload | Snapshot |
| Incremental / merge-upsert | Delta |
| Append | Historical |
CDC and Stored Procedure have no equivalent in that older vocabulary — they're additions for log-based sources and source-owned extracts.