Combining the data

Combining the data

With the data sources and base views in place we are in a position of starting to combine all the data. The first steps are to have a denormalized, standardized view of each data source so we get a set of base business entities that we will use later to build more complex reports.

Cleaning the Marketing names

First let's take a look at the marketing data. The main problem we have, as discussed earlier, is that the country names are not standardized due to a manual input process. To solve this we have the marketing lookup table ready, so let's go ahead and combine these sources:

  • Right-click over the "03-data model" folder and select New > Join.

  • Drag the "marketing" view onto the workspace.

  • Drag the "marketinglookup" view onto the workspace.

  • Connect the "area" column from the marketing table to the "marketingarea_0" column of the marketing lookup table.

  • In the Output tab, remove the "standardizedarea_0" and "marketingarea_0" columns

  • Rename the view to "marketing_full" and click .

With these steps we have created a new view that contains the standardized names for each country, instead of the wrong ones that we originally had in the Excel file. To check that everything is working properly click on :

and then on Execute again:

We see that the data is being returned correctly and now the contry names are following our internal naming conventions.

Denormalizing the star schema

The next step to prepare our data for meaningful reporting is to denormalize the star schema we have in our database, so it's easier for users of this data to create dervied views aggregating or summarizing it. Remember, we have one fact table (sales) and four dimension tables (dates, products, ordermethods and areas).

We will do this transformation by using a five-way join:

  • Right-click over "03-data model" and select New > Join

  • Drag onto the workspace the views sales, dates, products, ordermethods and areas.

  • Connect the following columns by using drag & drop:

    sourcetarget
    sales.datedates.id
    sales.productproducts.id
    sales.ordermethodordermethods.id
    sales.areaareas.id

    After connecting the four fields we should have the following state:

  • Now move to the output tab. First rename the view to "sales_full", then remove the duplicate fields and rename the fields that we are keeping so the schema looks like this:

Click to create the view. Now we have a view that contains the full sales data in a easy to process denormalized format. We are ready to connect the data that is coming from both sources.