- Native management: RisingWave manages the table’s lifecycle (creation, schema, writes). You interact with it like any other RisingWave table (querying, inserting, using in materialized views).
- Iceberg format storage: Data is physically stored according to the Iceberg specification, using a configured Iceberg catalog and object storage path. This ensures compatibility with the Iceberg ecosystem.
- Simplified pipelines: If Iceberg is your desired end format, you don’t need a separate CREATE SINK step to export data out of RisingWave. Data ingested or computed can land directly in these Iceberg tables.
- Interoperability: Tables created with the Iceberg engine are standard Iceberg tables and can be read by external Iceberg-compatible query engines (like Spark, Trino, Flink, Dremio) using the same catalog and storage configuration.
Setup and usage
1. Create an Iceberg connection
The Iceberg connection contains information about the catalog and object storage. For syntax and properties, see CREATE CONNECTION.
The following examples show how to create an Iceberg connection using different catalog types.
For enhanced security, you can store credentials like S3 access keys as secrets instead of providing them directly. If you wish to use this feature, see Manage secrets.
JDBC catalog
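A minimal sketch of a JDBC-catalog connection, assuming a PostgreSQL-backed catalog and S3 object storage; the connection name, URIs, and credentials are placeholders:

```sql
CREATE CONNECTION my_iceberg_conn WITH (
    type = 'iceberg',
    catalog.type = 'jdbc',
    -- JDBC catalog backed by a PostgreSQL metadata database (placeholder URI).
    catalog.uri = 'jdbc:postgresql://127.0.0.1:5432/metadata_db',
    catalog.jdbc.user = '<user>',
    catalog.jdbc.password = '<password>',
    catalog.name = 'dev',
    -- Object storage path where Iceberg data and metadata files are written.
    warehouse.path = 's3://my-iceberg-bucket/warehouse',
    s3.access.key = '<access_key>',
    s3.secret.key = '<secret_key>',
    s3.region = 'us-east-1'
);
```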
2. Configure the engine to use the connection
You need to configure the Iceberg table engine to use the connection you just created. All tables created with the Iceberg table engine will then use this connection by default.
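For example, assuming the connection above was created as my_iceberg_conn in the public schema, and that the engine is configured through the iceberg_engine_connection system parameter:

```sql
-- Make all future Iceberg-engine tables use this connection by default.
ALTER SYSTEM SET iceberg_engine_connection = 'public.my_iceberg_conn';
```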
3. Create a table with the Iceberg engine
Now you can create a table using the standard CREATE TABLE syntax, adding the ENGINE = iceberg clause.
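A minimal sketch with hypothetical column names; the optional WITH clause shows where commit_checkpoint_interval (discussed below) is set:

```sql
CREATE TABLE users_iceberg (
    id INT PRIMARY KEY,
    name VARCHAR,
    created_at TIMESTAMPTZ
) WITH (commit_checkpoint_interval = 60)  -- optional; 60 is the default
ENGINE = iceberg;
```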
The commit_checkpoint_interval parameter controls how frequently (every N checkpoints) RisingWave commits changes to the Iceberg table, creating a new Iceberg snapshot. The default value is 60. Since a checkpoint typically takes 1s, this means RisingWave commits changes to the Iceberg table roughly every 60s.
The approximate commit interval can be calculated as time = barrier_interval_ms × checkpoint_frequency × commit_checkpoint_interval. Here, barrier_interval_ms and checkpoint_frequency are system parameters that define the base checkpointing rate, while commit_checkpoint_interval is configurable per table in the Iceberg table engine.
4. Basic operations (insert and select)
For tables created with the Iceberg table engine, you can insert data using standard INSERT statements and query using SELECT. You can use them as base tables to create materialized views or join them with regular tables. However, RisingWave doesn’t support renaming or changing the schemas of natively managed Iceberg tables.
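Continuing with the hypothetical users_iceberg table from above:

```sql
-- Insert and query like any other RisingWave table.
INSERT INTO users_iceberg VALUES (1, 'Alice', NOW());
SELECT * FROM users_iceberg WHERE id = 1;

-- Iceberg-engine tables can also serve as base tables for materialized views.
CREATE MATERIALIZED VIEW user_count AS
SELECT COUNT(*) AS cnt FROM users_iceberg;
```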
5. Streaming ingestion into Iceberg tables
Besides creating a table with a connector, you can also sink data from another RisingWave source, table, or materialized view into a table created with the Iceberg engine, as sketched below.
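A sketch using CREATE SINK ... INTO, assuming a hypothetical upstream materialized view enriched_users whose schema matches the target table:

```sql
-- Continuously stream rows from an existing materialized view
-- into the Iceberg-engine table.
CREATE SINK users_to_iceberg INTO users_iceberg
FROM enriched_users;
```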
Use Amazon S3 Tables with the Iceberg engine
RisingWave’s Iceberg table engine supports using Amazon S3 Tables as an Iceberg catalog. This allows you to manage Iceberg tables directly within AWS S3 Tables via RisingWave. Follow these steps to configure this integration:
1. Create an Iceberg connection using the rest catalog type. You must provide your AWS credentials and specify the necessary S3 Tables REST catalog configurations.
Required REST catalog parameters:

| Parameter name | Description | Value for S3 Tables |
| --- | --- | --- |
| catalog.rest.signing_region | The AWS region for signing REST catalog requests. | e.g., us-east-1 |
| catalog.rest.signing_name | The service name for signing REST catalog requests. | s3tables |
| catalog.rest.sigv4_enabled | Enables SigV4 signing for REST catalog requests. Set to true. | true |
Use CREATE CONNECTION to define the connection, replacing placeholder values <...> with your specific configuration details.
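A sketch of such a connection; the REST endpoint URL and the warehouse ARN follow AWS S3 Tables conventions and are placeholders here:

```sql
CREATE CONNECTION s3_tables_conn WITH (
    type = 'iceberg',
    catalog.type = 'rest',
    -- AWS S3 Tables Iceberg REST endpoint for your region (placeholder).
    catalog.uri = 'https://s3tables.<region>.amazonaws.com/iceberg',
    -- Table bucket ARN used as the warehouse (placeholder).
    warehouse.path = 'arn:aws:s3tables:<region>:<account_id>:bucket/<bucket_name>',
    catalog.rest.signing_region = '<region>',
    catalog.rest.signing_name = 's3tables',
    catalog.rest.sigv4_enabled = 'true',
    s3.access.key = '<access_key>',
    s3.secret.key = '<secret_key>',
    s3.region = '<region>'
);
```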
2. Configure the Iceberg table engine to use this connection.
3. Create a table using the Iceberg table engine, adding the ENGINE = iceberg clause. These tables will be registered in your specified Amazon S3 Tables catalog.
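For example (the column names are illustrative):

```sql
CREATE TABLE my_iceberg_table (
    id INT PRIMARY KEY,
    payload VARCHAR
) ENGINE = iceberg;
```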
RisingWave stores the data for my_iceberg_table using the Iceberg format, with its metadata stored and accessible via Amazon S3 Tables.
Features and considerations
Append-only tables
Added in v2.4.0.
- For normal tables, RisingWave keeps a copy of the table in row format to support streaming updates. In this case, streaming data is immediately available to downstream streaming jobs (materialized views and sinks built on the Iceberg-engine table).
- For append-only tables, data is stored only in Iceberg, and downstream streaming jobs are powered by an Iceberg streaming source. Streaming data becomes available only after it is committed to Iceberg (see the sketch below).
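A sketch of an append-only Iceberg-engine table, assuming RisingWave’s APPEND ONLY clause combines with ENGINE = iceberg as shown; the table and column names are illustrative:

```sql
CREATE TABLE clickstream (
    user_id INT,
    url VARCHAR,
    ts TIMESTAMPTZ
) APPEND ONLY  -- data lives only in Iceberg; no row-format copy is kept
ENGINE = iceberg;
```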
Time travel
Iceberg’s snapshotting mechanism enables time travel queries. You can query the state of the table as of a specific timestamp or snapshot ID using the FOR SYSTEM_TIME AS OF or FOR SYSTEM_VERSION AS OF clauses.
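For example, against the hypothetical users_iceberg table (the timestamp and snapshot ID are placeholders):

```sql
-- Query the table as of a wall-clock time.
SELECT * FROM users_iceberg FOR SYSTEM_TIME AS OF '2025-06-01 00:00:00+00:00';

-- Query the table as of a specific Iceberg snapshot ID.
SELECT * FROM users_iceberg FOR SYSTEM_VERSION AS OF 4981372386218312510;
```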
External access
Since the data is stored in the standard Iceberg format, external systems (like Spark, Trino, Dremio) can directly query the tables created by RisingWave’s Iceberg engine. To do this, configure the external system with the same Iceberg connection details used in RisingWave:
- Catalog type (storage, jdbc, glue, rest)
- Catalog configuration (URI, warehouse path, credentials)
- Object storage configuration (endpoint, credentials)
For example, a table public.users_iceberg in RisingWave would be accessed as table users_iceberg within the public namespace/database of the configured Iceberg catalog by an external tool.
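For instance, from Spark SQL, assuming you have registered a catalog (here hypothetically named rw_catalog) with the same connection details:

```sql
-- Spark SQL: read the table RisingWave created, via the shared catalog.
SELECT * FROM rw_catalog.public.users_iceberg;
```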
Limitations
RisingWave does not currently have a built-in automatic compaction service for tables created with the Iceberg engine. Streaming ingestion, especially with frequent commits (a low commit_checkpoint_interval), can lead to many small files. You may need to run compaction procedures manually using external tools that operate on Iceberg tables to optimize read performance. We are actively working on integrating compaction features.