KAWA On-premise Installation (with docker)

📌
This document describes how to install a single node of KAWA on premise with Docker. Please feel free to contact our support team here: support@kawa.ai

This setup is ideal for small to medium deployments. It leverages the vertical scalability of Clickhouse as well as the stability of KAWA. It should be paired with scheduled backups of both the Clickhouse database and the Postgres database (please contact support@kawa.ai if you need assistance setting up those backups, as they will depend on the existing tools and methodology you have in place in your infrastructure).

1. Prerequisites

1.a) General requirements

We currently support Ubuntu Systems 20.04 LTS.

You need an account with the ability to run sudo on the target machine.

You need to access our registry here: registry.gitlab.com/kawa-analytics-dev (⚠️ via docker login, it will not work from your browser)

You need docker + docker compose installed on the target machine.

Lastly, you need to be in possession of a valid KAWA license.
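The Docker-related prerequisites above can be checked before starting the installation. The following is a minimal sketch, not part of the KAWA package: the helper names are hypothetical, and it only assumes that the docker binary is on the PATH and that the Compose plugin answers to `docker compose version`.

```python
import shutil
import subprocess


def missing_tools(tools=("docker",), which=shutil.which):
    """Return the tools from `tools` that are not found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


def compose_plugin_available(run=subprocess.run):
    """Return True if `docker compose version` exits successfully."""
    try:
        result = run(
            ["docker", "compose", "version"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # The docker binary itself is missing or not executable.
        return False
    return result.returncode == 0
```

Calling `missing_tools()` and `compose_plugin_available()` on the target machine gives a quick indication of whether Docker and Docker Compose are installed; it does not check registry access or the license.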

1.b) Hardware requirements

RAM

For small amounts of data (up to ~200 GB compressed), it is best to use as much memory as the volume of data. For large amounts of data and when processing interactive (online) queries, you should use a reasonable amount of RAM (128 GB or more) so the hot data subset will fit in the cache of pages. Even for data volumes of ~50 TB per server, using 128 GB of RAM significantly improves query performance compared to 64 GB.

CPU

KAWA will use all available CPU to maximize performance, so the more CPU cores, the better. For processing up to hundreds of millions or billions of rows, we recommend at least 64 cores. Both AMD64 and ARM64 architectures are supported.

Storage Subsystem

SSD is preferred. HDD is the second-best option; 7200 RPM SATA HDDs will do. The capacity of the storage subsystem directly depends on the target analytics perimeter.

2. Installation procedure

Please follow these steps to install KAWA.

2.a) Install the KAWA Server

  1. Download the installation package for Docker (link below):
kawa20.tar.gz (2.4 KB)
  2. Upload it onto the server on which you wish to install KAWA
  3. Extract its content:
tar xvf kawa20.tar.gz
cd docker-compose-install
  4. Input your credentials

Edit the file

vim assets/kawa-registry.credentials 

and replace the first line with your token name and the second with the token value. These credentials should have been communicated to you by the KAWA support team.

A valid file looks like so:

wayne-enterpises
GbT3zdqLPofY3RTdR56

⚠️ Make sure there is no newline after the token value
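Because many editors silently append a final newline, it can be safer to write the credentials file programmatically. The sketch below is only an illustration of the two-line format described above: the function names are hypothetical, and the token name and value are whatever the KAWA support team gave you.

```python
from pathlib import Path


def format_credentials(token_name: str, token_value: str) -> str:
    """Build the two-line content, with no newline after the token value."""
    name = token_name.strip()
    value = token_value.strip()
    if not name or not value:
        raise ValueError("token name and token value must both be non-empty")
    return f"{name}\n{value}"


def is_valid_credentials(content: str) -> bool:
    """Check for exactly two non-empty lines and no trailing newline."""
    if content.endswith(("\n", "\r")):
        return False
    lines = content.split("\n")
    return len(lines) == 2 and all(line.strip() for line in lines)


def write_credentials(path: str, token_name: str, token_value: str) -> None:
    """Write the file as raw bytes so that no newline is appended."""
    content = format_credentials(token_name, token_value)
    Path(path).write_bytes(content.encode("utf-8"))
```

For example, `write_credentials("assets/kawa-registry.credentials", "<TOKEN NAME>", "<TOKEN VALUE>")` run from the docker-compose-install directory produces a file in the expected shape, and `is_valid_credentials` can be used to check a file edited by hand.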

  5. Run the installation script
sudo ./install.sh

Concerning SMTP and HTTPS:

📌
SMTP: You will be asked whether you wish to configure SMTP. If so, you will be prompted for an SMTP username and password. The configuration of the SMTP server itself (host, port, etc.) will be done later on.

HTTPS: You will be asked whether you wish to use HTTPS to connect to KAWA. If so, you will have to provide your SSL certificate (.crt) and private key (.key) files.

⚠️ The private key file must be in the PKCS8 format with PEM encoding:

-----BEGIN PRIVATE KEY-----
FGtwrtwrRWgwrtkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQC4V27AzO53qFL4
RNnJhLbQ6H32TjNUpRkDwZYjzhjXclSz4tVmfHUhzev2UgEvpdPDu9yM42LTlvKf
(...)
Yjg83FpyQ866uInZQ3zv5IL+kFzJ0UBc0tOYjAOxAoGAHtJYxd54Ckb8bREb3w39
DKWZojHYPXCtm+MjpPgVF95XEfYoW9Fm1Eul8s0bZWe4m5ywD0fei7Lojxp72dD1
+4UeFll8DI+gLlo2N9ka5dT+KZQcv35M+hzrfNoewcaxTCjQvyu9VdeSPdAEPT77
euiPbW7ULBJ6VBbuUdw+viA=
-----END PRIVATE KEY-----

⚠️ The certificate must also be PEM encoded:

-----BEGIN CERTIFICATE-----
DKWZojHYPXCtm+MjpPgVF95XEfYoW9Fm1Eul8s0bZWe4m5ywD0fei7Lojxp72dD1
FGtwrtwrRWgwrtkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQC4V27AzO53qFL4
(...)
FGtwrtwrRWgwrtkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQC4V27AzO53qFL4
eFPv12PFiKw4B4adtTAR6LjVevRMqBvbJ16pVuVcg7SHAIXplOTnay0=
-----END CERTIFICATE-----
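Before running the installer, the two files can be given a quick sanity check. The sketch below looks only at the PEM labels: an unencrypted PKCS8 key uses the label PRIVATE KEY, whereas a PKCS1 key uses RSA PRIVATE KEY. It does not decode or validate the key material, and the function names are hypothetical.

```python
import re
from typing import List

# A PEM block: matching BEGIN/END lines, each framed by exactly five dashes.
_PEM_BLOCK = re.compile(
    r"-----BEGIN ([A-Z0-9 ]+)-----\r?\n(.+?)\r?\n-----END \1-----",
    re.DOTALL,
)


def pem_labels(text: str) -> List[str]:
    """Return the label of every well-formed PEM block found in `text`."""
    return [match.group(1) for match in _PEM_BLOCK.finditer(text)]


def is_pkcs8_private_key(text: str) -> bool:
    """True if `text` holds exactly one block labelled PRIVATE KEY."""
    return pem_labels(text) == ["PRIVATE KEY"]


def has_only_certificates(text: str) -> bool:
    """True if `text` holds one or more blocks, all labelled CERTIFICATE."""
    labels = pem_labels(text)
    return bool(labels) and all(label == "CERTIFICATE" for label in labels)
```

If the key turns out to be in another format, a common way to convert it is `openssl pkcs8 -topk8 -nocrypt -in <KEY FILE> -out <PKCS8 KEY FILE>`; whether KAWA also accepts encrypted PKCS8 keys is not stated here, so this sketch assumes an unencrypted key.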
  6. Start docker compose using root:
sudo docker compose up -d

All set - you can now connect to your new KAWA instance.
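The containers can take a little while to come up after `docker compose up -d`. The following is a minimal sketch of a readiness check that polls the KAWA URL until it answers. The retry counts and function names are assumptions made for illustration, and any HTTP response (even an error status) is treated as "the server is up".

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if `url` answers with any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status (401, 404, ...).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, certificate rejected, ...
        return False


def wait_until_ready(probe, attempts=30, delay_seconds=2.0, sleep=time.sleep):
    """Call `probe()` up to `attempts` times; return True on first success."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

A typical call would be `wait_until_ready(lambda: http_probe('https://<YOUR URL>'))`. Note that a certificate that the local Python installation does not trust (a self-signed one, for instance) makes `http_probe` return False even though the server is running.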

  7. Install the Python client

This is required to complete all the administration tasks:

https://docs.kawa.ai/python-api-for-admin#172c37d3a51a4327b811be67a14dda35

2.b) Change the setup-admin credentials and activate your software

⚠️
Please complete this step before creating accounts for other users.

All of the admin tasks can be performed through KAWA’s Python API.

Please refer to sections 1 and 2 of this document to change the setup-admin credentials:

https://docs.kawa.ai/python-api-for-admin#172c37d3a51a4327b811be67a14dda35

To activate your software with your license, refer to this link:

https://docs.kawa.ai/python-api-for-admin#56fad66390cd4987aa32f9990e29bf88

2.c) Connect a Python script runner

📌
This will let users use the Python capabilities of KAWA.

To connect a Python script runner to KAWA workspaces, please follow this documentation:

https://docs.kawa.ai/python-script-runners#2e3c36f73cc940c89195047690d3a794

Head directly to sections 2b and 2c. The AES key is not needed when running the on-premise installation with Docker: both the server and the runner share the AES key secret.

Below, find the correct parameters for the Docker compose setup:

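from kywy.client.kawa_client import KawaClient

# Connect to the KAWA server and authenticate with an API key file
kawa = KawaClient(kawa_api_url='https://<YOUR URL>')
kawa.set_api_key(api_key_file='<PATH TO THE KEY FILE>')
kawa.set_active_workspace_id(workspace_id=1)
cmd = kawa.commands

# Register the runner with the host and port used by the Docker compose setup
cmd.add_script_runner(
    name='Main runner',
    host='kawa-script-runner',
    port=8815,
    tls=False
)

# Print the health status of the runner
print(kawa.runner_health())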

2.d) Set up SMTP

Setting up SMTP will let users create their own accounts on KAWA (if configured to use KAWA’s own authentication mechanism) and enable automations to send emails.

To configure SMTP, follow this documentation:

https://docs.kawa.ai/python-api-for-admin#acaf2bf0e5a748b4a2092bf50f6232d0

3. About secrets

KAWA can be used with HashiCorp Vault (https://www.vaultproject.io/) to store its secrets.

However, this installation process leverages Docker secrets to pass secrets to the various containers in a safe way.

Please note that the secrets are loaded from various files stored next to the docker-compose.yml file. All those files are owned by the kawa-system user and can only be read by that user.

If you wish to load the secrets from environment variables instead of files:

1) Stop your KAWA server using docker compose down

2) Edit the docker-compose.yml file as follows:

secrets:
  smtp-credentials:
    file: smtp.credentials
  server-certificate:
    file: server.crt
  server-private-key:
    file: server.key
  kawa-master-key:
    file: kawa.master.key
  postgres-password:
    # Here for instance: tell KAWA to read the password from the
    # env variable: DB_PASSWORD
    environment: DB_PASSWORD
  runner-aes-key:
    file: kawa.runner.key

3) Set the environment variables that you defined in the file (here: DB_PASSWORD) to the intended values. For instance: export DB_PASSWORD=abcdef

4) Restart the service with docker compose up. This usually fails: KAWA will complain that the secret file cannot be found. To fix it, stop KAWA and run docker compose rm. After that, when you restart the service, the secrets should be passed correctly to the instance.

4. Overview of the installed services

Docker compose deploys the following services on the host OS:

  • One instance of a Clickhouse server
  • One instance of a Postgres server
  • One instance of an Apache Arrow Flight server (gRPC)
  • One instance of the KAWA application/web server
  • (Optional) One instance of HashiCorp Vault for secret management

4.a) Inbound traffic

All inbound traffic goes through KAWA Server, using the HTTP(S) protocol. This traffic is generally initiated either via the Python client or via the KAWA WebUI (the WebUI assets, i.e. JS/CSS/HTML, are delivered by KAWA Server as well). KAWA Server exposes multiple APIs (REST and RPC-like) in addition to the static content for the KAWA GUI.

4.b) Outbound traffic

KAWA Server will connect to:

  • Data provider systems such as databases and external APIs (Airtable, Google Sheets, etc.) to perform ETLs.
  • (Optional) An SMTP server to manage outbound email communications, both to manage user accounts and to deliver automation messages.
  • (Optional) Any outbound API (like Slack or internal APIs) to publish data through Automations.

4.c) Internal traffic

KAWA Server will connect to Postgres to persist its state (Entity Repository). Postgres is also used as an event store for KAWA.

All the computations will happen on Clickhouse. Even though the Clickhouse ports are exposed outside the Docker container, all user interactions with Clickhouse are ALWAYS made through KAWA to ensure Row Level Security and governance.

An instance of a Python Arrow Flight server (gRPC) is used to execute custom automations and user-defined Python scripts.
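To see which of the services described in this section are reachable from a given machine, a plain TCP connection test is often enough. The sketch below is an illustration only. The port numbers in the usage note are assumptions: 8123 and 5432 are the usual defaults for the Clickhouse HTTP interface and for Postgres, 8815 is the runner port used in section 2.c, and your docker-compose.yml may publish different ones or none at all.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, unresolved host name, or timed out.
        return False


def reachability_report(host, ports, timeout=2.0):
    """Map each service name in `ports` to True/False for the given host."""
    return {
        name: is_port_open(host, port, timeout=timeout)
        for name, port in ports.items()
    }
```

For instance, `reachability_report('<YOUR HOST>', {'clickhouse-http': 8123, 'postgres': 5432, 'script-runner': 8815})` run from outside the host helps confirm that only the ports you intend to expose are reachable.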