DuckDB
DuckDB is an open-source column-oriented Relational Database Management System.
The DuckDB Wrapper allows you to read data from DuckDB within your Postgres database.
Preparation
Before you can query DuckDB, you need to enable the Wrappers extension and store your credentials in Postgres.
Enable Wrappers
Make sure the wrappers extension is installed on your database:
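A minimal sketch; installing into the extensions schema is the Supabase convention and may differ on other setups:

```sql
-- Enable the Wrappers extension (the target schema may vary on your setup)
create extension if not exists wrappers with schema extensions;
```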
Enable the DuckDB Wrapper
Enable the duckdb_wrapper FDW:
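A sketch assuming the handler and validator functions follow the usual Wrappers naming convention (duckdb_fdw_handler / duckdb_fdw_validator); adjust the names if your installation differs:

```sql
-- Create the DuckDB foreign data wrapper
create foreign data wrapper duckdb_wrapper
  handler duckdb_fdw_handler      -- assumed handler name
  validator duckdb_fdw_validator; -- assumed validator name
```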
Store your credentials (optional)
By default, Postgres stores FDW credentials inside pg_catalog.pg_foreign_server in plain text. Anyone with access to this table can view these credentials. Wrappers is designed to work with Vault, which provides an additional level of security for storing credentials. We recommend using Vault to store your credentials.
DuckDB can connect to many data sources; the credentials to save in Vault depend on which data source you're going to use. For example, to store AWS credentials for an S3 connection, you can run the SQL below and note down the secret IDs returned:
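A sketch of saving S3 credentials with vault.create_secret(); the secret names and values are placeholders:

```sql
-- Save the AWS access key ID in Vault and note down the returned secret ID
select vault.create_secret(
  '<access key id>',          -- placeholder: your AWS access key ID
  'duckdb_s3_key_id',         -- placeholder: a unique name for the secret
  'AWS access key ID for DuckDB Wrapper'
);

-- Save the AWS secret access key in Vault and note down the returned secret ID
select vault.create_secret(
  '<secret access key>',      -- placeholder: your AWS secret access key
  'duckdb_s3_secret',
  'AWS secret access key for DuckDB Wrapper'
);
```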
Connecting to DuckDB
We need to provide Postgres with the credentials to connect to DuckDB. We can do this using the create server command. Depending on the data source, different server options need to be specified. Below is the list of supported data sources and their corresponding server options.
For any server option that needs to be stored in Vault, add the prefix vault_ to its name and use the secret ID returned by the select vault.create_secret() statement as the option value.
AWS S3
Server Option | Description | Required | Default |
---|---|---|---|
type | Server type, must be s3 | Y | |
key_id | The ID of the key to use | Y | |
secret | The secret of the key to use | Y | |
region | The region for which to authenticate | | us-east-1 |
endpoint | Specify a custom S3 endpoint | | s3.amazonaws.com |
session_token | A session token passed to use as temporary credentials | | |
url_compatibility_mode | Can help when URLs contain problematic characters | | true |
url_style | Either vhost or path | | vhost |
use_ssl | Whether to use HTTPS or HTTP | | true |
kms_key_id | AWS KMS (Key Management Service) key for S3 Server Side Encryption | | |
A create server statement example:
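A sketch; the server name and credential values are placeholders, and the vault_ variants apply when the secrets are stored in Vault:

```sql
create server duckdb_s3_server
  foreign data wrapper duckdb_wrapper
  options (
    type 's3',
    key_id '<access key id>',      -- or vault_key_id '<secret ID>' when using Vault
    secret '<secret access key>',  -- or vault_secret '<secret ID>' when using Vault
    region 'us-east-1'
  );
```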
This s3 server type can also be used for other S3-compatible storage services, such as Supabase Storage. For example,
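A sketch for an S3-compatible service; the endpoint is a placeholder for your project's S3-compatible Storage endpoint, and url_style 'path' is an assumption that commonly applies to such services:

```sql
create server duckdb_supabase_storage_server
  foreign data wrapper duckdb_wrapper
  options (
    type 's3',
    key_id '<S3 access key id>',
    secret '<S3 secret access key>',
    endpoint '<your S3-compatible storage endpoint>',  -- placeholder
    url_style 'path'                                   -- assumed for path-style endpoints
  );
```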
AWS S3 Tables
Server Option | Description | Required | Default |
---|---|---|---|
type | Server type, must be s3_tables | Y | |
key_id | The ID of the key to use | Y | |
secret | The secret of the key to use | Y | |
s3_tables_arn | S3 Tables ARN (available in the AWS Management Console) | Y | |
region | The region for which to authenticate | | us-east-1 |
A create server statement example:
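A sketch; the server name, credentials, and ARN are placeholders:

```sql
create server duckdb_s3_tables_server
  foreign data wrapper duckdb_wrapper
  options (
    type 's3_tables',
    key_id '<access key id>',      -- or vault_key_id '<secret ID>' when using Vault
    secret '<secret access key>',  -- or vault_secret '<secret ID>' when using Vault
    s3_tables_arn '<S3 Tables ARN from the AWS Management Console>',  -- placeholder
    region 'us-east-1'
  );
```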
Cloudflare R2
This is to access Cloudflare R2 using the S3 Compatibility API.
Server Option | Description | Required | Default |
---|---|---|---|
type | Server type, must be r2 | Y | |
key_id | The ID of the key to use | Y | |
secret | The secret of the key to use | Y | |
account_id | The account ID to use | Y | |
A create server statement example:
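A sketch; the server name, credentials, and account ID are placeholders:

```sql
create server duckdb_r2_server
  foreign data wrapper duckdb_wrapper
  options (
    type 'r2',
    key_id '<access key id>',      -- or vault_key_id '<secret ID>' when using Vault
    secret '<secret access key>',  -- or vault_secret '<secret ID>' when using Vault
    account_id '<cloudflare account id>'
  );
```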
Cloudflare R2 Data Catalog
This is to access Cloudflare R2 Data Catalog.
Server Option | Description | Required | Default |
---|---|---|---|
type | Server type, must be r2_catalog | Y | |
token | The R2 API token to use | Y | |
warehouse | Warehouse name in R2 Data Catalog | Y | |
catalog_uri | R2 Data Catalog URI | Y | |
A create server statement example:
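A sketch; the token, warehouse, and catalog URI are placeholders taken from your R2 Data Catalog setup:

```sql
create server duckdb_r2_catalog_server
  foreign data wrapper duckdb_wrapper
  options (
    type 'r2_catalog',
    token '<R2 API token>',           -- or vault_token '<secret ID>' when using Vault
    warehouse '<warehouse name>',
    catalog_uri '<R2 Data Catalog URI>'
  );
```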
Apache Polaris
This is to access Apache Polaris Iceberg service.
Server Option | Description | Required | Default |
---|---|---|---|
type | Server type, must be polaris | Y | |
client_id | The client ID to use | Y | |
client_secret | The client secret to use | Y | |
warehouse | Warehouse name | Y | |
catalog_uri | Polaris REST Catalog URI | Y | |
A create server statement example:
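A sketch; the client credentials, warehouse, and catalog URI are placeholders:

```sql
create server duckdb_polaris_server
  foreign data wrapper duckdb_wrapper
  options (
    type 'polaris',
    client_id '<client id>',          -- or vault_client_id '<secret ID>' when using Vault
    client_secret '<client secret>',  -- or vault_client_secret '<secret ID>' when using Vault
    warehouse '<warehouse name>',
    catalog_uri '<Polaris REST Catalog URI>'
  );
```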
Lakekeeper
This is to access Lakekeeper Iceberg service.
Server Option | Description | Required | Default |
---|---|---|---|
type | Server type, must be lakekeeper | Y | |
client_id | The client ID to use | Y | |
client_secret | The client secret to use | Y | |
oauth2_scope | OAuth2 authentication scope | Y | |
oauth2_server_uri | Lakekeeper OAuth2 authentication URI | Y | |
warehouse | Warehouse name | Y | |
catalog_uri | Lakekeeper REST Catalog URI | Y | |
A create server statement example:
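A sketch; all credential values, URIs, and names are placeholders from your Lakekeeper deployment:

```sql
create server duckdb_lakekeeper_server
  foreign data wrapper duckdb_wrapper
  options (
    type 'lakekeeper',
    client_id '<client id>',          -- or vault_client_id '<secret ID>' when using Vault
    client_secret '<client secret>',  -- or vault_client_secret '<secret ID>' when using Vault
    oauth2_scope '<oauth2 scope>',
    oauth2_server_uri '<oauth2 token endpoint URI>',
    warehouse '<warehouse name>',
    catalog_uri '<Lakekeeper REST Catalog URI>'
  );
```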
Iceberg
This is to access generic Iceberg services. Check above for other specific Iceberg services like S3 Tables, R2 Data Catalog, and so on. All the S3 options are supported, together with the additional options below.
Note
Reading from Iceberg REST Catalogs backed by remote storage that is not S3 or S3 compatible is not supported yet.
Server Option | Description | Required | Default |
---|---|---|---|
type | Server type, must be iceberg | Y | |
warehouse | Warehouse name | Y | |
catalog_uri | REST Catalog URI | Y | |
token | The API token to use | | |
client_id | The client ID to use | | |
client_secret | The client secret to use | | |
oauth2_scope | OAuth2 authentication scope | | |
oauth2_server_uri | OAuth2 authentication URI | | |
A create server statement example used to access a local Iceberg service:
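A sketch of connecting to a local Iceberg REST catalog backed by local S3-compatible storage (for example a MinIO instance); all endpoints and credentials are placeholders:

```sql
create server duckdb_iceberg_server
  foreign data wrapper duckdb_wrapper
  options (
    type 'iceberg',
    -- Iceberg REST catalog options (placeholders)
    catalog_uri 'http://localhost:8181',
    warehouse 'warehouse',
    -- S3-compatible storage options (placeholders, e.g. a local MinIO instance)
    key_id '<access key id>',
    secret '<secret access key>',
    endpoint 'localhost:9000',
    url_style 'path',
    use_ssl 'false'
  );
```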
Create a schema
We recommend creating a schema to hold all the foreign tables:
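For example, a schema named duckdb, which the later examples assume:

```sql
-- Schema to hold the DuckDB foreign tables
create schema if not exists duckdb;
```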
Options
The full list of foreign table options is below:
- table - Fully qualified source table name in DuckDB, required.
This can also be a subquery enclosed in parentheses, for example,
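A sketch of a subquery used as the table option inside the foreign table's options clause; read_parquet is a DuckDB function and the bucket path is a placeholder:

```sql
-- inside the foreign table's options clause
table '(select * from read_parquet(''s3://my-bucket/products.parquet''))'
```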
or a URI pointing to a remote file, or a function (with the corresponding type of server),
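Sketches of a remote file URI and a DuckDB table function (iceberg_scan) used as the table option; the paths are placeholders:

```sql
-- a remote file URI
table 's3://my-bucket/products.parquet'

-- or a DuckDB table function
table 'iceberg_scan(''s3://my-bucket/tbl'')'
```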
Entities
We can use the SQL statement import foreign schema to import foreign table definitions from DuckDB.
For example, the SQL below automatically creates foreign tables in the duckdb schema.
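A sketch, assuming an S3 Tables server named duckdb_s3_tables_server and a namespace docs_example (both placeholders):

```sql
-- Import all tables from the docs_example namespace into the duckdb schema
import foreign schema "docs_example"
  from server duckdb_s3_tables_server
  into duckdb;
```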
Currently only Iceberg-like servers, such as S3 Tables, R2 Data Catalog, and so on, support import foreign schema without specifying source tables. For other types of servers, source tables must be explicitly specified in options. For example,
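A sketch, assuming the s3 server from earlier; the option name tables and the file path are assumptions to adapt to your setup:

```sql
import foreign schema s3
  from server duckdb_s3_server
  into duckdb
  options (
    -- assumed option name; list the source files or tables explicitly
    tables 's3://my-bucket/orders.parquet'
  );
```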
The imported table name format from an Iceberg-like server is:
<server_type>_<schema_name>_<table_name>
For example, the first statement above will import a table named s3_tables_docs_example_guides.
For other types of servers with explicitly specified source tables, the imported foreign table names have the schema name and a sequence number as prefix, in this format:
<schema_name>_<sequence_number>_<filename_stem>
For example, using the statement below,
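A sketch producing the names listed after it, again assuming the s3 server from earlier; the option name tables and the bucket paths are assumptions:

```sql
import foreign schema s3
  from server duckdb_s3_server
  into duckdb
  options (
    -- assumed option name; two source files, imported in order
    tables 's3://my-bucket/products.parquet,s3://my-bucket/users.parquet'
  );
```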
The imported foreign table names are:
- s3_0_products
- s3_1_users
Note
By default, the import foreign schema statement will silently skip all incompatible columns. Use the option strict to prevent this behavior. For example,
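A sketch; the server and namespace names are placeholders, and only the strict option is the point of the example:

```sql
import foreign schema "docs_example"
  from server duckdb_s3_tables_server
  into duckdb
  options (
    strict 'true'  -- fail instead of silently skipping incompatible columns
  );
```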
DuckDB Tables
This is an object representing a DuckDB table.
Ref: DuckDB Table
Operations
Object | Select | Insert | Update | Delete | Truncate |
---|---|---|---|---|---|
table | ✅ | ❌ | ❌ | ❌ | ❌ |
Usage
You can manually create the foreign table like below if you did not use import foreign schema.
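A sketch; the column list, server name, and file path are placeholders to match your source data:

```sql
create foreign table duckdb.products (
  id bigint,
  name text,
  price numeric
)
  server duckdb_s3_server
  options (
    table 's3://my-bucket/products.parquet'
  );
```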
Query Pushdown Support
This FDW supports where, order by and limit clause pushdown.
Supported Data Types
Postgres Type | DuckDB Type |
---|---|
boolean | BOOLEAN, BOOL, LOGICAL |
"char" | TINYINT, INT1 |
smallint | SMALLINT, INT2, SHORT |
real | FLOAT, FLOAT4, REAL |
integer | INTEGER, INT4, INT, SIGNED |
double precision | DOUBLE, FLOAT8 |
bigint | BIGINT, INT8, LONG |
numeric | DECIMAL, NUMERIC |
text | BIT, VARCHAR, CHAR, BPCHAR, TEXT, STRING |
date | DATE |
time | TIME |
timestamp | TIMESTAMP, DATETIME |
timestamptz | TIMESTAMP WITH TIME ZONE, TIMESTAMPTZ |
jsonb | JSON, ARRAY, LIST, MAP, STRUCT, UNION |
bytea | BLOB, BYTEA, BINARY, VARBINARY |
uuid | UUID |
Limitations
This section describes important limitations and considerations when using this FDW:
- Only supports certain server types, all of which store data remotely
- Only supports specific data type mappings between Postgres and DuckDB
- Only supports read operations (no INSERT, UPDATE, DELETE, or TRUNCATE)
- When using Iceberg REST Catalog, only supports AWS S3 (or compatible) as the storage
- Materialized views using these foreign tables may fail during logical backups
Examples
Basic Example
First, create an s3 server:
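A sketch; the server name and credential values are placeholders:

```sql
create server duckdb_s3_server
  foreign data wrapper duckdb_wrapper
  options (
    type 's3',
    key_id '<access key id>',
    secret '<secret access key>',
    region 'us-east-1'
  );
```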
Then import a foreign table from a Parquet file and query it:
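A sketch; the option name tables and the bucket path are assumptions, and the imported table name follows the <schema_name>_<sequence_number>_<filename_stem> format described above:

```sql
import foreign schema s3
  from server duckdb_s3_server
  into duckdb
  options (
    tables 's3://my-bucket/products.parquet'  -- assumed option name; placeholder path
  );

select * from duckdb.s3_0_products;
```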
This is the same as creating the foreign table manually like below,
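A sketch of the equivalent manual definition; the column list is a placeholder for the Parquet file's actual schema:

```sql
create foreign table duckdb.s3_0_products (
  id bigint,
  name text,
  price numeric
)
  server duckdb_s3_server
  options (
    table 's3://my-bucket/products.parquet'
  );

select * from duckdb.s3_0_products;
```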
Read AWS S3 Tables
First, create an s3_tables server:
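A sketch; the credentials and ARN are placeholders:

```sql
create server duckdb_s3_tables_server
  foreign data wrapper duckdb_wrapper
  options (
    type 's3_tables',
    key_id '<access key id>',
    secret '<secret access key>',
    s3_tables_arn '<S3 Tables ARN from the AWS Management Console>',  -- placeholder
    region 'us-east-1'
  );
```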
Then, import all the tables in the docs_example namespace and query them:
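A sketch; the namespace is a placeholder and the imported table name follows the Iceberg-like naming format described above:

```sql
import foreign schema "docs_example"
  from server duckdb_s3_tables_server
  into duckdb;

select * from duckdb.s3_tables_docs_example_guides;
```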
Read Cloudflare R2 Data Catalog
First, follow the steps in the Getting Started Guide to create an R2 Data Catalog on Cloudflare. Once that is completed, create an r2_catalog server like below:
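A sketch; the token, warehouse, and catalog URI are placeholders from your R2 Data Catalog setup:

```sql
create server duckdb_r2_catalog_server
  foreign data wrapper duckdb_wrapper
  options (
    type 'r2_catalog',
    token '<R2 API token>',
    warehouse '<warehouse name>',
    catalog_uri '<R2 Data Catalog URI>'
  );
```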
Then, import all the tables in the default namespace and query them:
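A sketch; the table name my_table is a placeholder for a table in the default namespace:

```sql
import foreign schema "default"
  from server duckdb_r2_catalog_server
  into duckdb;

select * from duckdb.r2_catalog_default_my_table;
```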
Query Pushdown Examples
Following the above Read Cloudflare R2 Data Catalog example, below are some query pushdown examples:
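Sketches using the placeholder table imported above; the where, order by and limit clauses are pushed down to DuckDB:

```sql
-- where clause pushdown
select * from duckdb.r2_catalog_default_my_table
where id = 42;

-- order by clause pushdown
select * from duckdb.r2_catalog_default_my_table
order by id;

-- limit clause pushdown
select * from duckdb.r2_catalog_default_my_table
limit 10;
```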