Create a Data Model
🎓

Create a Data Model

đź“Ś
This tutorial will show how to create a simple data model from a few data sources.

Here is the model we will build with KAWA.

image

1. Importing your data

For this scenario, we will import data from four CSV files, each one corresponding to a table of our schema.

đź“Ś
This can also be achieved with tables imported from a database.

In KAWA, create a new workspace for this use case: Client Order Model

image

1.a) Import the orders fact table

From the datasource section, click on “+” > Upload a file.

image

Download this file and upload it into KAWA.

orders - transactions.csv912.5KB

Click on Next:

image

1.b) Define the primary key

To define the primary key, do the following:

image
đź“Ś
You can have many columns participating in the primary key.

1.c) Tag the foreign keys

The next step is to configure the attributes on the dimensions that you will use to perform the joins - the foreign keys.

đź“Ś
Attributes behave like Tags you put on dimensions.

Here, we have two foreign keys: Customer ID and Product ID.

On each of those two columns, do the following: Create Attribute, Type in the name of the new attribute, and click Next.

image
image

The target configuration should be like this:

image
⚠️
 Do not configure the foreign key as part of the primary key in that case.

When your configuration is correct: click on “Save and Run” at the bottom left.

1.d) Import the dimension tables

Repeat the process described above to import the two dimension tables:

product-dimension.csv137.9KB

The configuration for them should be:

image
image

1.e) Import the event table

Import the event csv into KAWA. There is no configuration to apply there, no primary key nor attribute.

events.csv47.3KB

2. Creating your model

This is the list of existing datasources that you need to start this step:

image

2.a) Exploring the Fact table in a sheet

Go in the Sheets section and open the orders - transaction sheet (The name may differ if you named your datasource differently on the import step)\\

image

Click on the “Lineage” menu item at the top:

image

The current model is simplistic: One sheet and one Data Source. We call this Data Source the primary Data Source of the model because it will impose its primary keys. All the JOINS that we will do will be LEFT JOINS on this data source.

đź“Ś
This implies that we will always have exactly AS MANY ROWS as there are primary keys in the primary datasource (When we do not filter the data).

Click on the “Model” menu item at the top:

image

Here you can see all the columns imported from the Primary Data Source - The orders.

2.b) Joining the dimension tables through the Unlock Feature

In the “Views” section, click on “+ Add Fields”. Notice that the two dimension tables are available for joining. click on UNLOCK for both of them, and Save.

image

This is what you obtain:

image

Go ahead and add the Customer Name from the customer dimension. Add as well Category, Sub-Category and Product name from the product dimension.

Click on the “Lineage” menu item at the top:

image

You can now see that three data sources are used to feed our Model. Hover over the edges to obtain details about joins (List of foreign keys that are used to perform the LEFT JOINS)

2.c) Joining another fact table through linked view

Create the Linked view to the Order sheet

Click on the “Customer Name” Field in your grid. This field was added from the “+ Add field” menu, from the customer dimension.

image

From there, pick the “Even” sheet, resulting from the Event import.

image

We are telling KAWA that the “Customer Name” column from the current sheet will be linked to the “Attendee” column of the events sheet.

When you click on “Apply”, you should see the following:

image

The top grid will filter the bottom on all events whose attendee is the “Customer Name”

đź“Ś
If the bottom grid is empty, it means that the selected “Customer Name” at the top does not have any associated events.

Add columns from the bottom Grid to the Top one

Just drag and drop the “Cost” column from the bottom to the top. A new column will be added like so:

image

This new column will be the SUM of the “Cost” of all events associated with each customer.

đź“Ś
This is a 1 to Many join. The aggregation function that is used is the one that is shown at the bottom. It can be changed at any time.
image
đź“Ś
Filters can be applied at the bottom as well, and the will be applied to the values from the top.

Click on the “Lineage” menu item at the top:

image

Here is the final lineage where all the elements appear.