Welcome to the Denodo Agile BI Tutorial. In this tutorial, we will review the basic steps to analyze enterprise data and gain operational insight.
These are the lessons of this tutorial:
- Presentation of the Use Case/Business Problem.
- Use Denodo to Implement the Solution (Connect to Data Sources, Combine the Data, Create Final Reports to Publish Data for Clients)
- Consume Data from a BI Tool.
Let's begin by reviewing the business problem at hand.
Let's suppose we work for a company that sells outdoor furniture and equipment; let's call it "PPY International". The company sells its products in several countries in North America and Europe, both to end customers and to distributors.
The situation we face today is that we have received an urgent request from the VP of Marketing. She needs a report on the performance of the different marketing campaigns that have been running in different countries and regions, to plan and budget for next year's campaigns. The campaign data is attached to her email as an Excel file.
The first problem we have is that the information needed to create that report is not readily available in a single place and in a unified format. The marketing data is in an Excel file, while, as is typical in this kind of scenario, the product and sales information lives in a relational database, stored in a star schema.
Once we get the data together, we find another problem: information that comes from different sources usually needs to be normalized before it can be combined in a meaningful way. In our case, the most obvious candidate for normalization is the country names used in the Excel spreadsheet (see image below).
Finally, traditional approaches to business intelligence tend to result in building the same logic over and over again across different BI and reporting tools to generate similar, or even identical, reports in different places or by different departments.
Having the information centralized in Denodo means that the business logic needed to create all the reports lives in a single spot, so it can be easily maintained and modified, and it can be easily shared with a set of external BI tools without having to replicate code across them.
Let's take a look at the data sources of our example.
Marketing promotions
The first data source is the Excel file that we received from Marketing. It has two worksheets:
- The Marketing promotions sheet contains information about the promotions that the marketing department carried out in the past. Each marketing promotion has a start and an end date, the country where it ran and the product it applied to, plus several other fields such as the discount per item, the type of promotion, etc.
- The second sheet contains a Country lookup table. If we take a detailed look at the first sheet, we will see that the country names come in a variety of spellings and formats. It is very common for some of the data to be typed manually, and manually typed data tends to follow formatting rules rather loosely.
To solve this issue and normalize the data, we have added a quick lookup table within the Excel file itself, in a separate worksheet, that maps each incorrect name to the standardized one.
Products and sales
The other main data source we have is our central products and sales database. Let's take a look at the schema:
The database is structured as a star schema, with the main sales fact table and the areas, dates, ordermethods and products dimension tables.
Each table contains the following data:
- Sales: a fact table that contains one entry per individual item transaction. It has an integer ID for the sale; the remaining fields are foreign keys to entries in the dimension tables (dates, products, ordermethods and areas), plus the ID of the marketing promotion this item was sold under, if any (null otherwise).
- Areas: the areas dimension table contains information about the countries where our company operates.
- Dates: a dimension table with IDs for all the dates that are referred to by events in our database.
- Ordermethods: a dimension table describing which methods were used to order the product in a sale.
- Products: the main product reference in our database; it contains some generic product data.
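As a mental model, the fact table at the center of this star schema could be sketched with a minimal DDL like the following (all column names here are assumptions for illustration; the actual schema is the one shown in the image above):

```sql
-- Hypothetical sketch of the central fact table in the star schema.
-- Every descriptive attribute lives in a dimension table; the fact table
-- only stores foreign keys plus the optional promotion reference.
CREATE TABLE sales (
  id            INTEGER PRIMARY KEY,                             -- sale ID
  dateid        INTEGER REFERENCES dates (dateid),               -- when it was sold
  productid     INTEGER REFERENCES products (productid),         -- what was sold
  ordermethodid INTEGER REFERENCES ordermethods (ordermethodid), -- how it was ordered
  areaid        INTEGER REFERENCES areas (areaid),               -- where it was sold
  promotion     INTEGER                                          -- promotion ID, NULL if none
);
```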
Let's organize all this data into a coherent view of the marketing promotion performance. If you have followed the previous tutorials, you already know the process, but let's go through it one more time:
- Connect Denodo to the data sources, to retrieve the data in real time.
- Combine the data performing the needed transformations and normalizations.
- Publish the data for the client application.
- Create our final reports ready for consumption.
To organize our virtual objects, we are going to use the "tutorial" database, and the virtual folders defined in the VDP Naming Conventions. If you have not created the folders yet, please go ahead and create the following folders:
- 1 - connectivity
- 2 - integration
- 3 - business entities
- 4 - report views
- 5 - data services
- 8 - stored procedures
Then, create two subfolders, "1 - data sources" and "2 - base views", under the "1 - connectivity" folder.
The first step is connecting to the data sources. As we saw in the Denodo Basics tutorial, Denodo will store just metadata - no data will be imported from the source (unless we explicitly tell Denodo to do so, through caching). Instead, the data will be retrieved in real time.
The first connection will be to our sales and products database. Here is what we need to do:
Right-click over the "1 - data sources" subfolder and select + New > Data source.
This will open the wizard to select a data source. Select "Generic database". That opens the JDBC data source dialog.
Here we do the following:
- Type the name of the data source, in our case ds_sales.
- Set the database URI. In our example the database server is running on the host mariadb and the default port, so our URI is jdbc:mariadb://mariadb:3306/sales. Modify these settings as needed for your environment.
- Set the username as root and the password as admin for your database.
Now move to the Advanced tab:
- Select mariadb-2.7 as the Driver classpath and set the Driver class as org.mariadb.jdbc.Driver.
- Then, click on "Connection Pool configuration" and make sure "Test connections" is selected.
- Click Save.
After these steps we have created our data source for the sales database. When we click on Create base view, Denodo will show us the introspection screen where we will select which tables we are going to import.
Go ahead and select all the tables under the "sales" database, and choose "/1 - connectivity/2 - base views" as the destination folder. Now, add "bv_salesdb_" as a prefix and then click Create selected.
Now you can simply double click on any view to see the schema of that base view. For example:
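You can also sanity-check a base view from the VQL shell; for example (a quick sketch, assuming the bv_salesdb_ prefix chosen above):

```sql
-- Peek at the first rows of the sales base view; Denodo fetches the
-- data from MariaDB in real time when this query runs.
SELECT *
FROM bv_salesdb_sales
FETCH FIRST 10 ROWS ONLY;
```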
The second connection will be to the Excel file. This file contains two worksheets as discussed earlier; we have to consider them separately so we will create two different connections to the source, one for each worksheet.
To create the first one, we right-click over the "1 - data sources" folder and select New > Data source.
In the opening wizard, select the Excel data source type. In the following dialog:
- Type the name of the datasource (for example, ds_marketing).
- Select the type of file, in this case Excel 2007 or later.
- Choose the file location: select "HTTP Client", and then under Configuration set the HTTP method to GET and the Base URL to http://webserver:1080/excel/Marketing_promotions.xlsx.
- Worksheet: the name of the worksheet is "Marketing promotions", without the quotes.
- The start cell is A1, and the end cell is I43.
- Finally, make sure you have "Has headers" and "Stream tuples" selected.
Click Save; that will create the data source.
Now, let's create a base view from it by clicking on Create base view.
You will see the schema of the new view. Here you can change the data types as seen in the following screenshot:
Then click Save to create the final base view.
Now it's time to create the base view for the second worksheet. The process is almost identical to the first one. Start by right-clicking over the "1 - data sources" subfolder, selecting New > Data source, and configuring the next dialog in the same way as before, except:
- The worksheet is named "Country lookup".
- The start and end cells are A1 and B36.
After clicking Save, we have our data source ready. Now, by clicking on Create base view, we access the base view editing dialog, where we will edit the final view as follows:
Click Save. Now we have our two Excel views in addition to the views coming from the database.
We are almost ready, but let's first move the base views to the "2 - base views" sub folder so they are well organized. Just select all of them and drag them to the destination folder. After that is done we will start doing data combinations.
With the data sources and base views in place, we are ready to start combining all the data. The first step is to build a denormalized, standardized view of each data source, so we get a set of base business entities that we will use later to build more complex reports.
First let's take a look at the marketing data. The main problem we have, as discussed earlier, is that the country names are not standardized due to a manual input process. To solve this we have the marketing lookup table ready, so let's go ahead and combine these sources:
- Right-click over the "2 - integration" folder and select New > Join.
- Drag the "bv_marketing_promotions" view onto the workspace.
- Drag the "bv_marketing_country_lookup" view onto the workspace.
- Connect the "area" column of the "bv_marketing_promotions" table to the "marketingarea_0" column of the "bv_marketing_country_lookup" table.
- In the Output tab, remove the "area" and "marketingarea_0" columns.
- Rename the view to iv_marketing_promotions_normalized and click Save (the SQL sketch below shows the equivalent logic).
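Conceptually, the join we just defined boils down to the following sketch (the name of the standardized-country column in the lookup view is an assumption; the promotion columns follow the names used later in this tutorial):

```sql
-- Equivalent logic of iv_marketing_promotions_normalized: replace the
-- free-text country name with the standardized one from the lookup.
SELECT p.promotionid, p.description,   -- plus the rest of the promotion fields
       l.country AS area               -- standardized name; 'country' is assumed
FROM bv_marketing_promotions p
  INNER JOIN bv_marketing_country_lookup l
    ON p.area = l.marketingarea_0;
```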
With these steps we have created a new view that contains the standardized name for each country, instead of the inconsistent ones we originally had in the Excel file. To check that everything is working properly, click on Query and then on Execute.
We see that the data is returned correctly and that the country names now follow our internal naming conventions.
The next step to prepare our data for meaningful reporting is to denormalize the star schema we have in our database, so it's easier for users of this data to create derived views aggregating or summarizing it. Remember, we have one fact table (sales) and four dimension tables (dates, products, ordermethods and areas).
We will do this transformation by using a five-way join:
- Right-click over "2 - integration" and select New > Join
- Drag onto the workspace the views bv_salesdb_sales, bv_salesdb_dates, bv_salesdb_products, bv_salesdb_ordermethods and bv_salesdb_areas.
- Connect the following columns by using drag & drop:
After connecting the four fields we should have the following state:
- Now move to the Output tab. First rename the view to iv_sales, then remove the duplicate fields and rename the fields that we are keeping so the schema looks like this:
Click Save to create the view.
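The iv_sales view we just built corresponds, roughly, to this five-way join (all join-key and descriptive column names below are assumptions, as the exact names come from your introspected schema):

```sql
-- Equivalent logic of iv_sales: denormalize the star schema by joining
-- the fact table to its four dimension tables (key names assumed).
SELECT s.id AS sale_id,
       d.fulldate,               -- used later by getyear()/getmonth()
       p.productname,
       m.ordermethod,
       a.country,
       a.area,                   -- geographical area (assumed to live in areas)
       s.promotion               -- promotion ID, NULL for organic sales
FROM bv_salesdb_sales s
  JOIN bv_salesdb_dates d        ON s.dateid = d.dateid
  JOIN bv_salesdb_products p     ON s.productid = p.productid
  JOIN bv_salesdb_ordermethods m ON s.ordermethodid = m.ordermethodid
  JOIN bv_salesdb_areas a        ON s.areaid = a.areaid;
```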
Now we have a view that contains the full sales data in an easy-to-process, denormalized format. We are ready to connect the data coming from both sources.
With the base business entities in place we can begin to answer business questions with the data.
Combining Sales and Marketing Data
The first step is to combine the two main business entities that we have created so far, marketing and sales data, so we can analyze the correlation between the sales information and specific marketing promotions.
This is again a join, so follow these steps:
- Right-click over the "2 - integration" folder and select New > Join.
- Drag the views "iv_sales" and "iv_marketing_promotions_normalized" onto the workspace.
- Connect the "iv_sales.promotion" column to the "iv_marketing_promotions_normalized.promotionid_0" column, using drag & drop.
Now move to the Join conditions tab, click on "Inner join" and then on "Change join type", and select "Left outer". This will ensure we receive all of the sales rows, whether or not they ran under a promotion.
Last, in the Output tab rename the view to iv_sales_with_promotions and remove the duplicate fields so we are left with the following schema (sketched in SQL below):
Clicking Save will create the view.
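In SQL terms, the new view is roughly the following (the promotion description column name matches the output expression used later in this tutorial):

```sql
-- Equivalent logic of iv_sales_with_promotions: keep every sale and
-- attach the promotion details only when the sale ran under one.
SELECT s.*,
       m.description AS promotion_description   -- NULL for organic sales
FROM iv_sales s
  LEFT OUTER JOIN iv_marketing_promotions_normalized m
    ON s.promotion = m.promotionid_0;
```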
This view contains the full sales and marketing data; when we execute it, Denodo retrieves the information from all the sources in real time and combines it to provide the full picture of the data.
Creating the Final Business Reports
Once the full sales and marketing data is combined and we have a comprehensive, fully denormalized view of all our data, we can create the final report that we will use for answering the questions asked by our business requirements.
We can do this by using an aggregation:
- Right-click on the "4 - report views" folder and select New > Selection.
- Drag the "iv_sales_with_promotions" view onto the workspace.
- Now, move to the Group by tab and select "Use group by". With the group by enabled, we will add the following group by fields:
- iv_sales_with_promotions.country (selected from the list on the left side).
- A new group by expression with the following specification: case when (description is null) then false else true end
- Another two group by expressions: getyear(fulldate) and getmonth(fulldate)
With these "group by" fields we are aggregating all the monthly sales per country, divided in two groups: one for the promotion sales and one for the organic sales.
- Let's move to the Output tab. Here, rename the view to "monthly_sales_by_country" and add the following aggregation expressions:
- totalsales ⇒ count(*)
- month ⇒ getmonth(first(fulldate))
- year ⇒ getyear(first(fulldate))
- area ⇒ first(area)
- in_promotion ⇒ case WHEN (promotion_description is null) THEN false ELSE true END
- And finally click Save to create the view (see the SQL sketch below for the full logic).
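Put together, the report is roughly equivalent to this query (a sketch; we use promotion_description consistently for the promotion's description field, which is what both the group by expression and the output expression refer to):

```sql
-- Equivalent logic of monthly_sales_by_country: monthly sales per
-- country, split into promotion-driven and organic sales.
SELECT first(area)               AS area,
       getyear(first(fulldate))  AS year,
       getmonth(first(fulldate)) AS month,
       CASE WHEN promotion_description IS NULL THEN false ELSE true END AS in_promotion,
       count(*)                  AS totalsales
FROM iv_sales_with_promotions
GROUP BY country,
         CASE WHEN promotion_description IS NULL THEN false ELSE true END,
         getyear(fulldate),
         getmonth(fulldate);
```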
If we execute it we will get these results:
You can see that for each country and each month we have two rows, one with value true and one with value false in the "in_promotion" column, together with the total number of sales for each of them.
Now we will create a summarized version of this report that aggregates the information per geographical area (North America and Europe). We could easily do this in any of the usual BI tools, but doing it in the virtualization layer means that it will be available to all client applications, so we don't have to replicate this logic across different tools.
- Start by creating another selection view in the "4 - report views" folder.
- Drag the "monthly_sales_by_country" view onto the workspace.
- Move to the Group by tab, check "Use group by" and then select all the fields except country and totalsales as the group by fields:
- In the Output tab, rename the view to monthly_sales_by_area and add one aggregation field: totalsales ⇒ sum(totalsales)
- Finally, click Save (the equivalent SQL is sketched below).
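This summary view is equivalent to a simple roll-up of the previous report:

```sql
-- Equivalent logic of monthly_sales_by_area: aggregate the per-country
-- report by geographical area, month and promotion flag.
SELECT area, year, month, in_promotion,
       sum(totalsales) AS totalsales
FROM monthly_sales_by_country
GROUP BY area, year, month, in_promotion;
```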
Executing this view shows a result similar to the country sales, but now we have far fewer rows, as we are aggregating together all the countries of each area.
In this example we are going to access our report from Tableau. This BI tool uses JDBC to access Denodo and send standard SQL queries. By default, Denodo offers all the views (base or derived) through JDBC and ODBC automatically; manual publishing steps are only needed when creating web services.
So in this case, the publication step is trivial: we do not need to do anything at all; we only need to consume the report from Tableau.
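For instance, any JDBC client could already run an ad-hoc standard SQL query like the following against the published views (the year filter is a hypothetical example value):

```sql
-- Example of a standard SQL query a BI tool could send over JDBC:
-- promotion-driven sales per area and month for one (hypothetical) year.
SELECT area, month, totalsales
FROM monthly_sales_by_area
WHERE in_promotion = true
  AND year = 2016
ORDER BY area, month;
```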
Connecting to Denodo
Tableau provides a Denodo Connector with which views from Virtual DataPort can be accessed and reports generated. To connect to Denodo, just follow these steps:
- Create a new data source using the Denodo Connector.
- Provide the server details and user credentials to connect to the database.
- Click Sign in.
- Select the views and drag and drop them into the workspace. We have two: monthly_sales_by_country and monthly_sales_by_area.
Creating the Report
Once the data connection is live, we can use the standard features from Tableau to create a dashboard that highlights the performance of the marketing promotions over time, split by country, split by area and total across all regions:
With this report at hand, which highlights the promotion-driven sales in orange versus the organic sales in blue, we can measure the performance of each marketing campaign per country and per month. At a glance we can see that the campaigns in Germany underperform compared to the rest of the countries, that the end of the year was a great period in Canada, and that the first marketing promotion of the year in the UK performed outstandingly.
In this tutorial we have seen how a business question that has traditionally been difficult to answer can be solved with Denodo in a fast and easy manner. Not only that, but implementing the solution using Data Virtualization gives us a live report: as we have not imported any data but retrieve it in real time, the report will always be up to date as our sales database grows and new marketing data becomes available.
On top of that, as we have created semantically relevant business entities within our virtualization layer, deriving new reports or entities is trivial, so we can continue building views and adding value based on disparate data. The fact that the data sources are physically distributed and come in a variety of formats does not impact the ease of building new analysis infrastructure on top of them; we get a centralized, intelligent framework that prevents duplication of business logic and brings out the real value of the data within your organization.
This concludes the Use Case: Agile BI tutorial - if you want to continue exploring what the Denodo Platform can do for you, take a look at the official documentation and play with your own use cases.