Data Fabric & Denodo

Organizations today face the challenge of working with a variety of data scattered across multiple locations, together with a growing demand for high-quality data analytics.

This is where the Logical Data Fabric concept emerges as an efficient data management strategy. It addresses organizations' current data access challenges by offering seamless access to data through a semantic layer.

At a time when data-driven decision making is a key factor for success, implementing a Logical Data Fabric helps an organization obtain the data analytics it needs to support such decisions quickly and efficiently.

In modern business environments, organizations have different analytical needs. These needs arise from the diverse data sources they work with, which serve different purposes such as financial reporting, market predictions and strategic decision making. This calls for a data architecture capable of effectively managing and integrating different data sources, applying the required transformations and providing governed and secure data access.

A data fabric must embrace the ideas of distributed data and logical access. What does that mean?

  • Distributed implies that the modern data ecosystem is composed of multiple elements. There is no one-size-fits-all system in data management. A modern data ecosystem requires data warehouses, data lakes, operational stores, NoSQL sources, real-time feeds, and more. In addition, hybrid and multi-cloud environments are becoming the norm, increasing the distribution of data.
  • Logical means that access to data is done through a logical abstraction layer. This hides the complexity of the backend and provides a single access point for consumption, security, and governance. The logical layer must also enable multiple integration strategies. Its metadata should enable direct access to the sources, but also real-time federation, selective materialization of specific datasets (e.g. caching, aggregate aware tables), extract, load, and transform (ELT) curation in a data lake, and full dataset replication.
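To make the idea of a logical access layer more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how the Denodo Platform is implemented: the LogicalLayer class, its methods and the sample rows are all hypothetical. It shows a single access point that stores only how to reach each source, federates two sources at query time, and selectively materializes one of them.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class LogicalLayer:
    """Toy logical access layer: one entry point over several sources (hypothetical)."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], List[Row]]] = {}
        self._cache: Dict[str, List[Row]] = {}

    def register(self, view: str, fetch: Callable[[], List[Row]]) -> None:
        # Only the way to reach the data is stored; no data is copied.
        self._sources[view] = fetch

    def materialize(self, view: str) -> None:
        # Selective materialization: keep a cached copy of one view.
        self._cache[view] = self._sources[view]()

    def query(self, view: str) -> List[Row]:
        # Serve from the cache if the view was materialized,
        # otherwise fetch from the source at query time.
        if view in self._cache:
            return self._cache[view]
        return self._sources[view]()

    def join(self, left: str, right: str, key: str) -> List[Row]:
        # Real-time federation: combine rows from two sources on a key.
        index = {row[key]: row for row in self.query(right)}
        return [{**row, **index[row[key]]}
                for row in self.query(left) if row[key] in index]


# Hypothetical sources standing in for a warehouse and an operational store.
layer = LogicalLayer()
layer.register("client", lambda: [{"client_id": 1, "name": "Ann"},
                                  {"client_id": 2, "name": "Bob"}])
layer.register("address", lambda: [{"client_id": 1, "city": "Madrid"}])
layer.materialize("address")

clients_with_city = layer.join("client", "address", "client_id")
print(clients_with_city)
```

Consumers only talk to the layer object, so a source can be moved or cached without changing the queries that use it.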

Distributed and logical architectures offer several advantages over monolithic architectures. They support the diverse data demands of different data consumers simultaneously. These architectures provide the following benefits:

  1. Data Reuse: The data can be reused for different analytical demands and by different parts of the organization.
  2. Minimal Data Replication: Data replication is easier to manage since the data is fetched in real time.
  3. Fast Data Provisioning and Cost Reduction: Data can be accessed and reused easily. Therefore, the time and costs associated with integrations, migrations and infrastructure can be further minimized.
  4. Adoption and Evolution: Organizations in the modern business environment are evolving constantly, and their data needs change and expand.

The Denodo architecture supports several key Data Fabric capabilities, such as:

  1. Data Virtualization Engine
  2. Augmented Data Catalog
  3. Active Metadata
  4. AI-Based Recommendations
  5. Semantic Layer with Extended Metadata
  6. DataOps and Multi-cloud Provisioning

In this tutorial, we will learn how to leverage the benefits of a Logical Data Fabric with the Denodo Platform by walking through some of its key concepts.

Data virtualization is at the heart of any data fabric, as it is the middle layer that abstracts sources from consumption. It is a layer that integrates, manages and delivers data.

Denodo Platform's core competency lies in providing a highly capable Data Virtualization engine which allows you to Connect, Combine and Publish. Enjoy Data Freedom!

To learn about Denodo's Data Virtualization engine, we encourage you to follow the Data Virtualization Basics tutorial. Once you have completed it, you will understand the concepts of connect, combine and publish.

Let us move on to the next capability of a data fabric.

A key aspect of any self-service strategy is the ability for business users to find what data sets are available in the data delivery layer and work out which ones are relevant to them. The data catalog is aimed at providing this ability to all users through intuitive user interfaces, often available as a web portal or marketplace.

The Denodo Data Catalog provides this self-service ability, so that business users can find the datasets available in the data delivery layer and work with the relevant ones.

The Denodo Data Catalog is a web application with a business-friendly user interface that provides context around the data. Users can navigate through the data assets by tags and categories, use the advanced search option to identify a dataset, and query it directly, which gives them self-service capability and ease of access.

Data Catalog capabilities are already covered in the Data Discovery tutorial, where you can learn more about them.

Click Next to move on to the next capability of a Logical Data Fabric.

In a logical architecture, access to any data source is channeled through a single layer, normally enabled by a data virtualization layer, as described in previous sections of this tutorial. This approach gives that logical layer a privileged position to capture data access activity, such as who accessed data, when they accessed it, how it was accessed, from what tool it was accessed, and how long it took.
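As an illustration of why a single access layer is well placed to capture this activity, the following Python sketch wraps every query in one function that records who ran it, on which view, from which tool, when, and for how long. The AccessRecord and AuditedLayer names are hypothetical and are not part of the Denodo Platform; this is a sketch of the idea only.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class AccessRecord:
    user: str             # who accessed the data
    view: str             # what was accessed
    client_tool: str      # from which tool it was accessed
    started_at: datetime  # when it was accessed
    duration_ms: float    # how long it took


class AuditedLayer:
    """Hypothetical single access point that logs every query it runs."""

    def __init__(self) -> None:
        self.log: List[AccessRecord] = []

    def run(self, user: str, view: str, client_tool: str,
            query: Callable[[], list]) -> list:
        started = datetime.now(timezone.utc)
        t0 = time.perf_counter()
        try:
            return query()
        finally:
            # Record the access even if the query raised an error.
            elapsed_ms = (time.perf_counter() - t0) * 1000.0
            self.log.append(
                AccessRecord(user, view, client_tool, started, elapsed_ms))


audited = AuditedLayer()
rows = audited.run("jane", "personal_data_crm", "Data Catalog",
                   lambda: [{"client_id": 1}])
print(audited.log[0].user, audited.log[0].view)
```

Because every consumer goes through run, the log is complete without any cooperation from the individual data sources.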

Active Metadata management is a key feature of any such architecture. It implies the ability to monitor user access to the metadata and to monitor the usage of resources.

Denodo Platform provides two monitoring tools.

  • The Denodo Monitor. This tool can be started as an individual component to monitor and log the different Denodo Platform servers in an environment.
  • The Diagnostic & Monitoring Tool. This tool helps monitor resource usage and user sessions in real time.

These monitoring tools are already covered in section 5 of the Automated Lifecycle Management tutorial, where you can learn more about them.

However, the most interesting aspect of active metadata is that it is also used as the main input for AI algorithms that learn from usage. This active metadata management is the foundation of AI-based automation, which is described in more detail in the next section of the tutorial.

A truly effective data fabric is intelligent and provides automatic recommendations tailored to the specific usage and workloads it serves. It must be able to analyze past activities to predict the future in order to simplify and reduce the cost of using and operating a data fabric. This is at the forefront of research and development and some advanced vendors are currently incorporating AI-based recommendations into data management solutions.

The Denodo Platform offers two key AI-based recommendation features:

  • Data discovery recommendations
  • Performance recommendations for Smart Query Acceleration

Now let us explore these options.

Data Discovery Recommendations

For Data Discovery Recommendations, let's log in to the Data Catalog web application.

On the landing page we can see information such as Most used views, Most Recent views and Most Recommended by you.

We can see that these are the views that we recently executed or modified in the previous section. Furthermore, we can also see the information regarding Catalog Management. The Data Catalog home page displays the total number of categories, tags, views and web services synchronized and created in the application.

Let's now execute a new view and see how the recommendations are influenced. We will execute the client_with_bills view and navigate back to the landing page. We can see that the view names listed in the Elements section of the landing page have changed.
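To picture how usage can drive such a ranking, here is a small Python sketch that orders views by how often they were executed. It is a simplified assumption for illustration only; the actual ranking logic of the Data Catalog is not described in this tutorial, and the execution history below is made up.

```python
from collections import Counter
from typing import List


def most_used(executions: List[str], top_n: int = 3) -> List[str]:
    # Rank views by the number of times they were executed.
    return [view for view, _ in Counter(executions).most_common(top_n)]


# Hypothetical execution history before running client_with_bills again.
history = ["client", "address", "client", "client_with_bills"]
before = most_used(history, top_n=1)

# Two more executions of client_with_bills move it to the top.
history += ["client_with_bills", "client_with_bills"]
after = most_used(history, top_n=1)

print(before, after)
```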

Performance recommendations for Smart Query Acceleration

Now let's see the other AI-based feature, which is Smart Query Acceleration.

Denodo offers a feature called Summaries. This feature includes an AI-driven assistant that analyzes past queries (saved by the Denodo Monitor) and recommends the creation of new summaries (using Denodo Virtual DataPort and the Design Studio) to accelerate the queries sent to Denodo.

For more information on summaries you can check the Automatic Summary Recommendations user guide.
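The general idea of recommending summaries from past queries can be sketched in a few lines of Python. This is a simple frequency-and-cost heuristic chosen for illustration, not Denodo's recommendation algorithm; the LoggedQuery structure, the thresholds and the sample log are all assumptions.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class LoggedQuery(NamedTuple):
    view: str                  # view the query ran against
    group_by: Tuple[str, ...]  # columns the query aggregated by
    duration_ms: float         # how long the query took


def recommend_summaries(history: List[LoggedQuery],
                        min_runs: int = 3,
                        min_avg_ms: float = 500.0) -> List[tuple]:
    """Suggest (view, group-by columns) pairs that are both frequent and slow."""
    runs: Counter = Counter()
    total_ms: Counter = Counter()
    for q in history:
        key = (q.view, tuple(sorted(q.group_by)))
        runs[key] += 1
        total_ms[key] += q.duration_ms
    candidates = [key for key in runs
                  if runs[key] >= min_runs
                  and total_ms[key] / runs[key] >= min_avg_ms]
    # Most expensive patterns first.
    return sorted(candidates, key=lambda key: total_ms[key], reverse=True)


# Hypothetical query log; column names are made up for the example.
query_log = [
    LoggedQuery("client_with_bills", ("client_id",), 800.0),
    LoggedQuery("client_with_bills", ("client_id",), 750.0),
    LoggedQuery("client_with_bills", ("client_id",), 900.0),
    LoggedQuery("address", ("city",), 900.0),
]
recommendations = recommend_summaries(query_log)
print(recommendations)
```

Only the aggregation that is both repeated and slow is suggested; the single slow query on address does not meet the min_runs threshold.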

Click Next to move on to the next capability of a Logical Data Fabric.

A universal semantic layer has long been understood to be the key to democratizing data. Creating this universal semantic layer is essential to enable business users to align their understanding of the underlying data. Semantic metadata extends traditional metadata (column names, data types, etc.) with additional meaning, such as:

  • Dependencies between elements, which enables users to explore data lineage and enables developers to perform change impact analysis
  • Relationships between related datasets, even across data sources, to simplify exploration and querying
  • Descriptions and other documentation elements that enable a better understanding of what's what
  • Status messages, deprecation notices, or warnings, which enable communication between IT and end users
  • Updates from owners, stewards, approvers, and other governing stakeholders who have performed specific tasks on that data asset
  • Tags and business terms that enable the definition of a standardized data dictionary
  • Identifiers for sensitive data elements
  • Technical definitions of data metrics (e.g. profit, benefit, margin, etc.), which enable the centralized, platform-agnostic definition of enterprise-wide measurements

Denodo Platform achieves this with the following features:

  • Relationships & Data Lineage
  • Collaboration Features in Data Catalog
  • Global Security Policies

Relationships & Data Lineage

Now let's see how Relationships and Data Lineage are defined and displayed in Design Studio and Data Catalog.

First, let's see the Associations! Associations are helpful in all these areas: Modeling, Browsing the views and Improving performance of queries. Let us check the association in Design Studio.

Now let us double-click on the view client and click on Associations.

Once you click on Associations, a small dialog opens below the view where you can see the association client_tutorial_address. Now click on the related view tutorial.address and you will notice that the views are associated with each other.

Let us now log in to the Denodo Data Catalog to view the Relationships and Data Lineage from a business user perspective!

Launch the Denodo Data Catalog from the Denodo Control Center, log in to the Data Catalog with the default user and password (admin/admin) and click on Sign in. Once you are logged in to the server, synchronize the elements by going to Administration > Sync with VDP.

Now we are all set! Let us explore the Relationships and Data Lineage. From the Data Catalog home page, search for the view address and go to the Relationship tab.

Here, we can see the relationship defined between the tutorial.client and tutorial.address views from the Data Catalog.

Now go to the Query tab, select all the columns, drag and drop them under Output Column and click Execute.

From here, you can traverse between the views, which makes it much easier for users to understand the relationship between the two views.

So far so good, we have seen the Relationships. Now, let us see how Data Lineage works. Go to the Data Catalog home page and search for the view personal_data_crm.

Open the view, navigate to the Data Lineage option and click on the column primary_phone.

You can see that the Data Lineage shows, in a tree structure, all the views that are used to build the current view. When you click on a specific column, it shows the path of that column back to the source it comes from and the operations in which the column is involved.
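The tree that the Data Lineage option displays can be thought of as a walk over column dependencies. The Python sketch below illustrates that walk. Only personal_data_crm and primary_phone come from this tutorial; the intermediate view and column names (crm_client, bv_crm_client, phone) are hypothetical, as the real lineage depends on how the view was built.

```python
from typing import Dict, List, Tuple

Column = Tuple[str, str]  # (view, column)

# Hypothetical lineage: each derived column maps to the columns it is built from.
LINEAGE: Dict[Column, List[Column]] = {
    ("personal_data_crm", "primary_phone"): [("crm_client", "phone")],
    ("crm_client", "phone"): [("bv_crm_client", "phone")],
}


def trace(column: Column, depth: int = 0) -> List[str]:
    # Depth-first walk from a column down to its source columns,
    # indenting each level to mirror a tree view.
    lines = ["  " * depth + f"{column[0]}.{column[1]}"]
    for parent in LINEAGE.get(column, []):
        lines.extend(trace(parent, depth + 1))
    return lines


lineage_tree = trace(("personal_data_crm", "primary_phone"))
print("\n".join(lineage_tree))
```

The same dependency map, read in the opposite direction, is what makes change impact analysis possible: it tells a developer which derived columns are affected when a source column changes.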

Awesome! You just learnt about the Semantic Layer and some of the functionalities which can be performed in the Layer. Let's learn more capabilities in the next section.

Collaboration Features in Data Catalog

The Denodo Data Catalog is able to show status messages, deprecation notices, or warnings in Denodo views which enable communication between IT and end users.

These capabilities are already covered in the Data Discovery tutorial. You can learn more about them in the "Collaboration in Data Catalog" section of that tutorial.

Global Security Policies

Global Security policies are often used together with "Tags", which are labels that you can assign to views and their columns in Denodo. Later, you can assign Global security policies to views that belong to certain tags for better management.

Now let us see this in action! Let's log in to the Design Studio as administrator to see how Global Security Policies help in providing additional security features to mask sensitive data.

  • Let us create a new user from the Design Studio using the Administration > User Management option. Click on New.
  • Create the user with the username jane and the password Denodo@1.

Now let us create a role from Administration > Role Management. Create the role, named business_user in this tutorial, with the description as follows:

Once the role is created, select the role and click on Edit privileges.

Give connect privilege over database tutorial and then click on Advanced and for the view personal_data_crm provide Execute and Metadata privileges:

Save the changes. Now go back to Administration > User Management, click on the user jane > Edit Roles, assign the business_user role to the user and save it.

Great! You have now successfully created a user and an associated role. Let us now define tags and security policies.

You can check the Global Security Policies and Tags tutorial to learn how to create tags and security policies. For this tutorial, we will create them directly:

  • Create a new tag called Confidential and do the following:
  • Tagged views > Drag and drop the "personal_data_crm" view.
  • Tagged Columns > Drag and drop the "personal_data_crm" view and select client_id, primary_phone and value. Save the tag.
  • Go to Administration > Semantics and Governance > Global Security Policies, create a new policy named customer_information and enter the values shown in the image below.

  • Log out from Design Studio and log in as the user jane/Denodo@1. You will see that the user only has access to one view, personal_data_crm (because of the role privileges).
  • Double-click on the view and execute it. You can see that masking is applied to the columns tagged as Confidential (because of the Global Security Policy).
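The interplay of tags, roles and a masking policy in the steps above can be sketched in Python as follows. This is an illustration only: the exact settings of the customer_information policy are not reproduced in this text, so the sketch assumes that it masks columns tagged Confidential for the business_user role, and that masking replaces each character with an asterisk. The name column and the sample row are made up.

```python
from typing import Dict, List, Set, Tuple

# Columns tagged as Confidential in the steps above: (view, column).
TAGGED_COLUMNS: Dict[str, Set[Tuple[str, str]]] = {
    "Confidential": {
        ("personal_data_crm", "client_id"),
        ("personal_data_crm", "primary_phone"),
        ("personal_data_crm", "value"),
    }
}

# Assumed policy settings: mask Confidential columns for business_user.
POLICIES = [
    {"name": "customer_information", "tag": "Confidential",
     "roles": {"business_user"}},
]


def mask(value: object) -> str:
    # Assumed masking rule: replace every character with an asterisk.
    return "*" * len(str(value))


def apply_policies(view: str, rows: List[dict], user_roles: Set[str]) -> List[dict]:
    masked_cols: Set[str] = set()
    for policy in POLICIES:
        if policy["roles"] & user_roles:
            masked_cols |= {col for v, col in TAGGED_COLUMNS[policy["tag"]]
                            if v == view}
    return [{col: (mask(val) if col in masked_cols else val)
             for col, val in row.items()} for row in rows]


# Hypothetical row; "name" is not tagged, so it stays readable.
sample = [{"client_id": 7, "name": "Jane Roe",
           "primary_phone": "555-0100", "value": 42}]
as_jane = apply_policies("personal_data_crm", sample, {"business_user"})
as_admin = apply_policies("personal_data_crm", sample, {"admin"})
print(as_jane)
```

Because the policy is attached to the tag rather than to a single view, tagging another view as Confidential would bring it under the same rule without editing the policy.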

With this you have successfully learnt the basics of the Semantic Layer with Extended Metadata capability in Denodo Platform.

Click Next to move on to the next capability of a Data Fabric.

Finally, once a logical data fabric is built, it needs to be successfully operated. This involves the management of new developments (version control, deployment management), monitoring and auditing, scheduling recurrent jobs (including notifications, error handling, retries, and other management aspects of batch execution), and many more.

In addition, a data fabric must be agnostic to the location of the deployment, including any cloud provider, and be able to manage provisioning at minimal cost taking advantage of the automated provisioning options that those cloud platforms provide. In many large corporations, architectures go beyond one cloud, and data fabrics often span across different cloud providers as well as different geographies. We can see from the below diagram a real-life example of the application of a multi-cloud, multi-location data fabric.

For example, due to GDPR compliance rules, access to some of the data in the EMEA zone needs to be restricted so that it is only accessible from within Europe. By implementing a multi-cloud logical data fabric with data virtualization, with nodes in each cloud, we can achieve this by making sure that the data governance rules are enforced within EMEA and that the sensitive data is all held within the EMEA availability zone.
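A residency rule of this kind can be expressed as a simple check at the access layer. The Python sketch below is a minimal illustration; the view names, the region labels and the can_access function are hypothetical and do not represent a Denodo API.

```python
from typing import Dict, Optional

# Hypothetical mapping of views to the region their data must stay in.
RESIDENCY: Dict[str, str] = {"emea_customers": "EMEA"}


def can_access(view: str, request_region: str) -> bool:
    # Unrestricted views are accessible from anywhere; restricted views
    # only from the region they are bound to.
    required: Optional[str] = RESIDENCY.get(view)
    return required is None or required == request_region


print(can_access("emea_customers", "EMEA"))
print(can_access("emea_customers", "AMER"))
print(can_access("global_products", "AMER"))
```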

Denodo Platform can be installed in any cloud environment and provides various DataOps features that make the management of new development much easier. The following are the available features:

  • Version Control System and Promoting elements using Solution Manager: you can refer to the Handling Metadata Promotion between Environments section of the Automated Lifecycle Management tutorial and Version Control Systems Integration documentation.
  • Scheduler Server for planned tasks: Denodo Platform includes the Denodo Scheduler component to define automated tasks that will run at the configured time.

As the last point of this tutorial, let's discuss some best practices to keep in mind when building your Data Fabric architecture:

  • Data Integration: the architecture has to be able to integrate data from disparate sources irrespective of its location and format.
  • Data Preparation: the architecture has to be able to transform the data into a format ready for consumption.
  • Governance: the architecture has to include a catalog tool to document all the enterprise data for data discovery, lineage and governance.
  • Performance: the architecture has to support AI capabilities that improve query performance and simplify the usage and operation of the platform, freeing the data team to focus on more strategic initiatives.
  • Security: the architecture has to provide a global data access layer with which to enforce security across the organization, regardless of the capabilities of each data source, enabling consistent enforcement of security policies.

Congratulations, you have successfully learnt the main capabilities of a data fabric.

Well done!