Data Lakehouse integration

This section illustrates how KAWA can be integrated into your data lakehouse.

1. Overview

KAWA integrates seamlessly into the consumption layer of your data lakehouse, providing an intuitive interface for data exploration and analytics while directly leveraging the Iceberg API to manage KAWA tables and ingest user data—all within a dedicated S3 bucket. It bridges to the processing layer by utilizing existing execution engines like Trino to perform scalable analytical queries using standard SQL. This integration allows users to harness the full suite of KAWA’s features—such as Python ETL, dynamic columns, and rich visualizations—natively within the lakehouse, combining performance, flexibility, and usability in a unified environment.

2. Configuration guide

This configuration guide outlines how to integrate KAWA into a data lakehouse architecture using Trino as the execution engine, S3 as the object storage layer, and Hive Metastore as the metadata layer.

This guide assumes the following prerequisites:

  • A working instance of a Hive Metastore

  • An existing S3 bucket with a read/write account for KAWA

  • A Trino instance with a read-only account for KAWA, plus permission to CREATE and DROP views in a catalog managed by KAWA

2.1 Configuring Trino

If you do not wish to activate the write-back feature, you can skip this section.

2.1.1 Creating an Iceberg catalog

In Trino, configure a new catalog using the Iceberg connector.

Example for the Trino catalog kawa (contents of the file kawa.properties):
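The original property file is not reproduced here; the sketch below shows what such a catalog definition could look like for an Iceberg catalog backed by a Hive Metastore and S3. The metastore URI, endpoint, region, and credentials are placeholders, and the exact S3 property names depend on your Trino version (older releases use the hive.s3.* prefix instead of the native s3.* properties).

  # kawa.properties (illustrative values only)
  connector.name=iceberg
  iceberg.catalog.type=hive_metastore
  hive.metastore.uri=thrift://hive-metastore.internal:9083

  # Native S3 file system support (adjust to your Trino version)
  fs.native-s3.enabled=true
  s3.endpoint=https://s3.internal:9000
  s3.region=us-east-1
  s3.aws-access-key=<kawa-rw-access-key>
  s3.aws-secret-key=<kawa-rw-secret-key>
  s3.path-style-access=true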

2.1.2 Create a new schema via Trino

In the new catalog (here, the kawa catalog), create a new schema via Trino.
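As an illustration, a schema dedicated to KAWA could be created with the statement below; the schema name and the S3 location are placeholders and should point to the bucket provisioned for KAWA.

  CREATE SCHEMA kawa.kawa_writeback
  WITH (location = 's3://<kawa-bucket>/kawa_writeback/');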

2.2 Configuring KAWA

2.2.1 Read-only configuration

For KAWA to operate in read-only mode against Trino, the following four environment variables must be set:

2.2.2 Read+Write configuration

To add the write-back capability, three more variables are required in addition to the four described above.

Environment variables:

Once these variables are set, use the Python SDK to configure the S3 access as well as the Hive Metastore URL. This allows KAWA to initialize the Iceberg API with the correct information and credentials.
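For orientation only, the sketch below shows the general shape of such a script. The KawaClient bootstrap follows the pattern used in KAWA's Python examples, but the configuration call, its name, and all parameter values are hypothetical placeholders; refer to the KAWA SDK reference for the actual command.

  # Illustrative sketch only: the configuration method and its parameters are
  # hypothetical placeholders, not the documented KAWA SDK API.
  from kywy.client.kawa_client import KawaClient

  # Connect to the KAWA server (URL, workspace and API key typically come
  # from environment variables or explicit arguments).
  kawa = KawaClient.load_client_from_environment()

  # Hypothetical call: register the S3 bucket and the Hive Metastore URI so
  # that KAWA can initialize the Iceberg API with the right credentials.
  kawa.commands.configure_lakehouse_storage(                        # hypothetical name
      s3_endpoint='https://s3.internal:9000',                       # placeholder
      s3_bucket='kawa-lakehouse-bucket',                            # placeholder
      s3_access_key='<kawa-rw-access-key>',                         # placeholder
      s3_secret_key='<kawa-rw-secret-key>',                         # placeholder
      hive_metastore_uri='thrift://hive-metastore.internal:9083',   # placeholder
  )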

Server configuration:

For these to be taken into account, the KAWA server must be restarted.

2.3 Recap of the configured components

This diagram shows all the configuration elements that were covered in the previous sections.

Note that KAWA_TRINO_WRITER_CATALOG (in red) must match the name of the catalog configured in Trino ($(KAWA_TRINO_WRITER_CATALOG).properties).
