Apache Iceberg
Apache Iceberg is a high performance open-source format for large analytic tables.
The Iceberg Wrapper allows you to read data from Apache Iceberg within your Postgres database.
Preparation
Before you can query Iceberg, you need to enable the Wrappers extension and store your credentials in Postgres.
Enable Wrappers
Make sure the wrappers
extension is installed on your database:
1 |
|
Enable the Iceberg Wrapper
Enable the iceberg_wrapper
FDW:
1 2 3 |
|
Store your credentials (optional)
By default, Postgres stores FDW credentials inside pg_catalog.pg_foreign_server
in plain text. Anyone with access to this table will be able to view these credentials. Wrappers is designed to work with Vault, which provides an additional level of security for storing credentials. We recommend using Vault to store your credentials.
1 2 3 4 5 6 7 8 9 10 11 12 |
|
Connecting to Icerberg
We need to provide Postgres with the credentials to connect to Iceberg. We can do this using the create server
command.
For any server options need to be stored in Vault, you can add a prefix vault_
to its name and use the secret ID returned from the select vault.create_secret()
statement as the option value.
Connecting to AWS S3 Tables
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
Connecting to Iceberg REST Catalog + AWS S3 (or compatible) storage
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
|
Info
For other optional S3 options, please refer to PyIceberg S3 Configuration.
Create a schema
We recommend creating a schema to hold all the foreign tables:
1 |
|
Options
The full list of foreign table options are below:
table
- Fully qualified source table name with all namespaces in Iceberg, required.
Entities
We can use SQL import foreign schema to import foreign table definitions from Iceberg.
For example, using below SQL can automatically create foreign tables in the iceberg
schema.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
Note
By default, the import foreign schema
statement will silently skip all the incompatible columns. Use the option strict
to prevent this behavior. For example,
1 2 3 4 5 6 |
|
Iceberg Tables
This is an object representing Iceberg table.
Ref: Iceberg Table Spec
Operations
Object | Select | Insert | Update | Delete | Truncate |
---|---|---|---|---|---|
table | ✅ | ❌ | ❌ | ❌ | ❌ |
Usage
You can manually create the foreign table like below if you did not use import foreign schema
.
1 2 3 4 5 6 7 8 9 10 |
|
Query Pushdown Support
This FDW supports where
clause pushdown with below operators.
Operator | Note |
---|---|
= , > , >= , < , <= , <> , != |
|
is null , is not null |
|
x , not x , x is true , x is not true |
column x data type is boolean |
x between a and b |
column x data type can be datetime or numeric types |
like 'abc%' , not like 'abc%' |
only support starts with pattern |
in (x, y, z) , not in (x, y, z) |
Note
For multiple filters, only logical AND
is supported. For example,
1 2 3 4 5 |
|
Supported Data Types
Postgres Type | Iceberg Type |
---|---|
boolean | boolean |
real | float |
integer | int |
double precision | double |
bigint | long |
numeric | decimal |
text | string |
date | date |
time | time |
timestamp | timestamp, timestamp_ns |
timestamptz | timestamptz, timestamptz_ns |
jsonb | struct, list, map |
bytea | binary |
uuid | uuid |
Limitations
This section describes important limitations and considerations when using this FDW:
- Only supports specific data type mappings between Postgres and Iceberg
- Only supports read operations (no INSERT, UPDATE, DELETE, or TRUNCATE)
- Apache Iceberg schema evolution is not supported
- When using Iceberg REST catalog, only supports AWS S3 (or compatible) as the storage
- Materialized views using these foreign tables may fail during logical backups
Examples
Basic Example
First, create a server for AWS S3 Tables:
1 2 3 4 5 6 7 8 |
|
Import the foreign table:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
Then query the foreign table:
1 |
|
Read Cloudflare R2 Data Catalog
First, follow the steps in Getting Started Guide to create a R2 Catalog on Cloudflare. Once it is completed, create a server like below:
1 2 3 4 5 6 7 8 9 10 |
|
Then, import all the tables in default
namespace and query it:
1 2 3 |
|
Query Pushdown Examples
1 2 3 4 5 6 7 8 9 10 |
|