BigQuery native connection
1. Overview
KAWA's native connection to Google BigQuery allows organizations to leverage BigQuery as KAWA's data warehouse. KAWA's computation engine sits directly on top of a live BigQuery deployment — every query KAWA generates runs in BigQuery, against the customer's actual datasets. There is no extraction, no copy, and no separate analytics store.
This unlocks the full set of KAWA capabilities — Python-based ETL, dynamic schema augmentation, and real-time insights — orchestrated entirely within the customer's BigQuery environment. KAWA inherits BigQuery's elastic compute and storage, IAM model, and data residency.
KAWA connects to BigQuery using Google's official google-cloud-bigquery Java client (the first-party native SDK). The only secret KAWA ever needs is a GCP service account JSON key.
If instead you want to use BigQuery only as an ingestion source (KAWA extracts data from BQ and loads it into a different KAWA warehouse such as ClickHouse), configure it as a regular datasource through the KAWA UI rather than following this guide.
2. Configuration guide
2.1 Configuring BigQuery
KAWA uses two distinct service accounts:
A write-back account, used by KAWA to create and write its internal tables (user uploads, pandas DataFrame outputs, materializations, etc.) inside a dataset dedicated to KAWA.
A read-only account, used by KAWA to query the customer's existing datasets across one or more GCP projects.
The two can be the same account with broader IAM, but separating them is recommended.
a. Creating a project and dataset for KAWA to write in
KAWA needs a dedicated dataset (the BigQuery equivalent of a Snowflake schema) where it will store its own tables. This dataset can live in any GCP project — typically the same project as the rest of the customer's BigQuery footprint.
PROJECT_ID=<your-project-id>
bq --project_id=${PROJECT_ID} mk \
--dataset \
--description="KAWA internal write-back dataset" \
${PROJECT_ID}:kawa_analyticsb. Create a service account and role for KAWA's write-back account
This account writes KAWA-managed tables into the kawa_analytics dataset created above.
The resulting kawa-write-back-key.json is the only secret KAWA needs for write-back.
c. Create a service account and role with read-only access to the customer's data
This account is used by KAWA to query the customer's datasets. Grant it roles/bigquery.dataViewer on each dataset KAWA users should be able to query, and roles/bigquery.jobUser on each project KAWA will run queries in.
DO NOT GRANT ANY OTHER PERMISSIONS BESIDES
dataViewerANDjobUserto this service account. The read-only account should only be able to query — never write or modify data outside the dedicatedkawa_analyticsdataset.
If KAWA should query datasets across multiple GCP projects, repeat the jobUser and dataViewer bindings for each project, and list every project in KAWA_BIGQUERY_READER_PROJECT_LIST (see below).
2.2 Configuring KAWA
KAWA reads its BigQuery configuration from the secret store at startup. The following keys must be set as environment variables on the KAWA backend container, or pulled from a managed secret store — KAWA supports:
Google Secret Manager (recommended when KAWA is deployed on GCP — keeps the BigQuery JSON keys inside the same trust boundary as BigQuery itself)
AWS Secrets Manager (for KAWA on AWS)
Azure Key Vault (for KAWA on Azure)
HashiCorp Vault (cloud-agnostic — recommended for on-prem and multi-cloud deployments)
Plain Kubernetes / environment variables for POC and development setups
In every case the secret names (e.g. KAWA_BIGQUERY_READER_JSON_KEY) and values are the same — only the storage backend changes.
Read-only account:
KAWA_BIGQUERY_READER_JSON_KEY— the full contents ofkawa-readonly-key.jsonas a single-line JSON string.KAWA_BIGQUERY_READER_PROJECT_LIST— optional — comma-separated list of GCP project IDs the reader is allowed to query. Defaults to the project named in the JSON key.
Write-back account:
KAWA_BIGQUERY_WRITER_JSON_KEY— the full contents ofkawa-write-back-key.jsonas a single-line JSON string.KAWA_BIGQUERY_WRITER_DATASET=kawa_analyticsKAWA_BIGQUERY_WRITER_TABLE_PREFIX=KW__
The writer account always operates on the project named in its own JSON key — it does not need a project list.
Example — environment variables
JSON keys are passed as single-line JSON strings. The GCP-issued JSON already escapes newlines in private_key with \n, so no manual escaping is required.
Example — Google Secret Manager
When KAWA runs on GCP, the JSON keys can be stored in Google Secret Manager instead of being passed as plain env vars:
KAWA fetches the secrets by name at startup and uses them exactly as if they had been provided via environment variables.
Other backends
The same pattern applies to AWS Secrets Manager, Azure Key Vault, and HashiCorp Vault — store each KAWA_BIGQUERY_* key as a named secret, grant the KAWA backend's identity access to read it, and KAWA will resolve it transparently. HashiCorp Vault is the recommended option for on-prem and multi-cloud deployments where a single secret store needs to serve KAWA regardless of where it runs.
Workload Identity Federation (using GCP-side identity instead of a long-lived JSON key) is not currently supported — KAWA expects a JSON key.
Last updated
Was this helpful?

