- 1. Prerequisites
- 1.a) General requirements
- 1.b) Hardware requirements
- 2. Installation procedure
- 2.a) Install the KAWA Server
- 2.b) Change the setup-admin credentials and activate your software
- 2.c) Connect a Python script runner
- 2.d) Setup SMTP
- 3. About secrets
- 4. Overview of the installed services
- 4.a) Inbound traffic
- 4.b) Outbound traffic
- 4.c) Internal traffic
This setup is ideal for small to medium deployments. It leverages the vertical scalability of Clickhouse as well as the stability of KAWA. It should be paired with scheduled backups of both the Clickhouse database and the Postgres database (please contact support@kawa.ai if you need assistance to set up those backups, as they will depend on the existing tools and methodology you have in place in your infrastructure).
1. Prerequisites
1.a) General requirements
- We currently support Ubuntu 20.04 LTS.
- You need an account with the ability to run sudo on the target machine.
- You need access to our registry: registry.gitlab.com/kawa-analytics-dev (⚠️ via docker login; it will not work from your browser).
- You need docker + docker compose installed on the target machine.
- Lastly, you need to be in possession of a valid KAWA license.
1.b) Hardware requirements
RAM
For small amounts of data (up to ~200 GB compressed), it is best to have as much memory as the volume of data. For large amounts of data and when processing interactive (online) queries, you should use a reasonable amount of RAM (128 GB or more) so the hot data subset will fit in the page cache. Even for data volumes of ~50 TB per server, using 128 GB of RAM significantly improves query performance compared to 64 GB.
CPU
KAWA will use all available CPUs to maximize performance, so the more CPUs, the better. For processing up to hundreds of millions or billions of rows, at least 64 cores are recommended. Both AMD64 and ARM64 architectures are supported.
Storage Subsystem
SSD is preferred. HDD is the second-best option; SATA HDDs at 7200 RPM will do. The capacity of the storage subsystem depends directly on the target analytics perimeter.
2. Installation procedure
Please follow these steps to install KAWA.
2.a) Install the KAWA Server
- Download the installation package for docker - link below:
- Upload it onto the server on which you wish to install KAWA
- Extract its content:
tar xvf kawa20.tar.gz
cd docker-compose-install
- Input your credentials
Edit the file
vim assets/kawa-registry.credentials
and replace the first line with your token name and the second with the token value. Those credentials should have been communicated to you by the KAWA support team.
A valid file looks like so:
wayne-enterpises
GbT3zdqLPofY3RTdR56
⚠️ Make sure there is no newline after the token value
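Because a trailing newline after the token value breaks the install, it can be worth checking the file before running the installer. Here is a small, hypothetical helper (not part of the KAWA tooling) that validates the expected two-line format:

```python
# Hypothetical sanity check, not shipped with KAWA: the credentials file
# must contain exactly two non-empty lines (token name, then token value)
# and must NOT end with a newline after the token value.
def validate_credentials(content: bytes) -> bool:
    if content.endswith(b"\n"):
        return False  # trailing newline after the token value breaks the install
    lines = content.split(b"\n")
    return len(lines) == 2 and all(line.strip() for line in lines)

# A well-formed file, matching the sample above:
good = b"wayne-enterpises\nGbT3zdqLPofY3RTdR56"
# The same content with a trailing newline is rejected:
bad = b"wayne-enterpises\nGbT3zdqLPofY3RTdR56\n"
```

Reading the file in binary mode (bytes) makes the trailing-newline check unambiguous.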
- Run the installation script
sudo ./install.sh
Concerning SMTP and HTTPS:
⚠️ The private key file must be in the PKCS8 format with PEM encoding:
-----BEGIN PRIVATE KEY-----
FGtwrtwrRWgwrtkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQC4V27AzO53qFL4
RNnJhLbQ6H32TjNUpRkDwZYjzhjXclSz4tVmfHUhzev2UgEvpdPDu9yM42LTlvKf
(...)
Yjg83FpyQ866uInZQ3zv5IL+kFzJ0UBc0tOYjAOxAoGAHtJYxd54Ckb8bREb3w39
DKWZojHYPXCtm+MjpPgVF95XEfYoW9Fm1Eul8s0bZWe4m5ywD0fei7Lojxp72dD1
+4UeFll8DI+gLlo2N9ka5dT+KZQcv35M+hzrfNoewcaxTCjQvyu9VdeSPdAEPT77
euiPbW7ULBJ6VBbuUdw+viA=
-----END PRIVATE KEY-----
⚠️ The certificate must also be PEM encoded:
-----BEGIN CERTIFICATE-----
DKWZojHYPXCtm+MjpPgVF95XEfYoW9Fm1Eul8s0bZWe4m5ywD0fei7Lojxp72dD1
FGtwrtwrRWgwrtkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQC4V27AzO53qFL4
(...)
FGtwrtwrRWgwrtkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQC4V27AzO53qFL4
eFPv12PFiKw4B4adtTAR6LjVevRMqBvbJ16pVuVcg7SHAIXplOTnay0=
-----END CERTIFICATE-----
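If you want to verify the encoding before running the installer, a quick check of the PEM markers catches the most common mistakes (DER files, missing or malformed BEGIN/END lines). This is an illustrative sketch only, not part of the installer; PKCS8 private keys use the "PRIVATE KEY" label and certificates use "CERTIFICATE":

```python
import re

# Illustrative helper (not part of the KAWA installer): checks that a blob
# is PEM-encoded with the expected five-dash BEGIN/END markers and a
# base64 body in between.
def looks_like_pem(text: str, label: str) -> bool:
    pattern = (
        rf"-----BEGIN {label}-----\n"
        r"(?:[A-Za-z0-9+/=]+\n)+"   # one or more base64 body lines
        rf"-----END {label}-----\s*$"
    )
    return re.search(pattern, text) is not None

# A minimal (truncated) example of a PEM-encoded PKCS8 private key:
sample_key = (
    "-----BEGIN PRIVATE KEY-----\n"
    "FGtwrtwrRWgwrt\n"
    "-----END PRIVATE KEY-----\n"
)
```

Call it with the content of server.key and the "PRIVATE KEY" label, or with server.crt and "CERTIFICATE".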
- Start docker compose using root:
sudo docker compose up -d
All set - you can now connect to your new KAWA instance.
- Install the python client
This is required to complete all the administration tasks:
https://docs.kawa.ai/python-api-for-admin#172c37d3a51a4327b811be67a14dda35
2.b) Change the setup-admin credentials and activate your software
All of the admin tasks can be performed through KAWA's Python API.
Please refer to sections 1 and 2 of this document to change the setup-admin credentials:
https://docs.kawa.ai/python-api-for-admin#172c37d3a51a4327b811be67a14dda35
To activate your software with your license, refer to this link:
https://docs.kawa.ai/python-api-for-admin#56fad66390cd4987aa32f9990e29bf88
2.c) Connect a Python script runner
In order to connect a Python script runner to KAWA workspaces, please follow this documentation (head directly to sections 2b and 2c):
https://docs.kawa.ai/python-script-runners#2e3c36f73cc940c89195047690d3a794
Note that the AES key does not need to be configured manually when running the on-premise installation with docker: the server and the runner already share the AES key secret.
Below, find the correct parameters for the Docker compose setup:
from kywy.client.kawa_client import KawaClient

kawa = KawaClient(kawa_api_url='https://<YOUR URL>')
kawa.set_api_key(api_key_file='<PATH TO THE KEY FILE>')
kawa.set_active_workspace_id(workspace_id=1)

cmd = kawa.commands
cmd.add_script_runner(
    name='Main runner',
    host='kawa-script-runner',
    port=8815,
    tls=False,
)

print(kawa.runner_health())
2.d) Setup SMTP
Setting up SMTP will let users create their own accounts on KAWA (if KAWA is configured to use its own authentication mechanism) and will enable automations to send emails.
To configure SMTP, follow this documentation:
https://docs.kawa.ai/python-api-for-admin#acaf2bf0e5a748b4a2092bf50f6232d0
3. About secrets
KAWA can be used with HashiCorp Vault (https://www.vaultproject.io/) to store its secrets.
However, this installation process leverages docker secrets to pass secrets to the various containers in a safe way.
Please note that the secrets are loaded from various files stored next to the docker-compose.yml file. All those files are owned by the kawa-system user and can only be read by that user.
If you wish to load the secrets from environment variables instead of files:
1) Stop your KAWA server using docker compose down
2) Edit the docker-compose.yml as follows:
secrets:
  smtp-credentials:
    file: smtp.credentials
  server-certificate:
    file: server.crt
  server-private-key:
    file: server.key
  kawa-master-key:
    file: kawa.master.key
  postgres-password:
    # Here for instance: tell KAWA to read the password from the
    # env variable: DB_PASSWORD
    environment: DB_PASSWORD
  runner-aes-key:
    file: kawa.runner.key
3) Set the environment variables that you defined in the file (here: DB_PASSWORD) to the intended values, for instance: export DB_PASSWORD=abcdef
4) Restart the service with docker compose up. This first restart usually fails: KAWA will complain that the secret file cannot be found. To fix it, stop KAWA and run docker compose rm. When you restart the service after that, the secrets should be correctly passed to the instance.
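To make the two options concrete: with the file source, docker mounts each secret as a file under /run/secrets/<name> inside the container, whereas the environment source reads it from an env variable on the host. The lookup a service performs can be sketched like this (illustrative only, with a hypothetical helper name; this is not KAWA's actual code):

```python
import os

# Illustrative sketch of how a containerized service can resolve a secret:
# first try the docker secret file (mounted under /run/secrets/<name>),
# then fall back to an environment variable, mirroring the file: vs
# environment: sources in docker-compose.yml.
def resolve_secret(name, env_var=None, secrets_dir="/run/secrets"):
    path = os.path.join(secrets_dir, name)
    if os.path.exists(path):
        with open(path) as f:
            return f.read().strip()
    if env_var is not None:
        return os.environ.get(env_var)
    return None
```

For example, resolve_secret("postgres-password", env_var="DB_PASSWORD") would pick up the exported DB_PASSWORD value when no secret file is mounted.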
4. Overview of the installed services
Docker compose deploys the following services on the host OS:
- One instance of a Clickhouse server
- One instance of a Postgres server
- One instance of an Apache Arrow Flight server (gRPC)
- One instance of the KAWA application/web server
- (Optional) One instance of HashiCorp Vault for secret management
4.a) Inbound traffic
All inbound traffic goes through KAWA Server, using the HTTP(S) protocol. This traffic is generally initiated either via the Python Client or via the KAWA WebUI. (The WebUI assets - js/css/html - are delivered by KAWA Server as well.) KAWA Server exposes multiple APIs (REST and RPC-like) in addition to the static content for the KAWA GUI.
4.b) Outbound traffic
KAWA Server will connect to:
- Data provider systems such as databases and external APIs (Airtable, Google Sheets, etc.) to perform ETLs.
- (Optional) An SMTP server to manage outbound email communications, both to manage user accounts and to deliver automation messages.
- (Optional) Any outbound API (like Slack or internal APIs) to publish data through Automations.
4.c) Internal traffic
KAWA Server will connect to Postgres to persist its state (Entity Repository). Postgres is also used as an event store for KAWA.
All the computations happen on Clickhouse. Even though the Clickhouse ports are exposed outside the docker container, all user interactions with Clickhouse are ALWAYS made through KAWA to ensure Row Level Security and governance.
An instance of a Python Arrow Flight server (gRPC) is used to execute custom automations and user-defined Python scripts.