USE CASE: DATA FABRIC

Nowadays, organizations face the challenge of working with data scattered across multiple locations, along with a growing demand to deliver high-quality data analytics.

This is where the Logical Data Fabric concept emerges as an efficient data management strategy. It is an effective answer to organizations' current data access challenges, since it offers seamless access to data through a common semantic layer.

At a time when data-driven decision making is a key indicator of success, implementing a Logical Data Fabric is indispensable for obtaining the analytics needed to support such decisions quickly and efficiently.

In modern business environments, organizations have diverse analytical needs. These needs arise from the variety of data sources they work with, serving different purposes such as financial reporting, market prediction, and strategic decision making. This calls for a data architecture capable of effectively managing and integrating different data sources, applying the required transformations, and providing governed, secure data access.

A data fabric must embrace the ideas of distributed data and logical access. What does that mean?

  • Distributed implies that the modern data ecosystem is composed of multiple elements. There is no one-size-fits-all system in data management. A modern data ecosystem requires data warehouses, data lakes, operational stores, NoSQL sources, real-time feeds, and more. In addition, hybrid and multi-cloud environments are becoming the norm, increasing the distribution of data.
  • Logical means that access to data is done through a logical abstraction layer. This hides the complexity of the backend and provides a single access point for consumption, security, and governance. The logical layer must also enable multiple integration strategies. Its metadata should enable direct access to the sources, but also real-time federation, selective materialization of specific datasets (e.g., caching, aggregate-aware tables), extract, load, and transform (ELT) curation in a data lake, and full dataset replication. A minimal sketch of these per-dataset integration strategies follows this list.

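To make the idea of multiple integration strategies concrete, here is a minimal, purely illustrative sketch in Python. It is not Denodo syntax or configuration; the source names, dataset names, and strategy labels are hypothetical, and it only shows how a logical layer could record, per dataset, whether it federates, caches, curates, or replicates data.

```python
# Illustrative only: a hypothetical description of how a logical layer might
# register heterogeneous sources and choose an integration strategy per dataset.
# None of these names correspond to actual Denodo configuration objects.

sources = {
    "finance_dwh":  {"type": "data_warehouse", "location": "on_prem"},
    "clickstream":  {"type": "data_lake",      "location": "aws"},
    "crm":          {"type": "operational_db", "location": "azure"},
    "product_docs": {"type": "nosql",          "location": "gcp"},
}

# Each logical dataset points at one or more sources and declares how it is
# served: real-time federation, selective materialization (cache), ELT
# curation, or full replication.
datasets = {
    "sales_by_region": {"sources": ["finance_dwh", "crm"], "strategy": "federate"},
    "web_sessions":    {"sources": ["clickstream"],        "strategy": "cache"},
    "customer_360":    {"sources": ["crm", "clickstream"], "strategy": "elt_curation"},
}

def plan_access(dataset_name: str) -> str:
    """Return a human-readable access plan for a logical dataset."""
    ds = datasets[dataset_name]
    return f"{dataset_name}: {ds['strategy']} over {', '.join(ds['sources'])}"

if __name__ == "__main__":
    for name in datasets:
        print(plan_access(name))
```
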
Distributed and logical architectures offer several advantages over monolithic architectures, supporting the diverse data demands of different data consumers simultaneously. They provide the following benefits:

  1. Data Reuse: The data can be reused for different analytical demands and by different parts of the organization.
  2. Minimal Data Replication: Replication is better controlled, since data is fetched in real time.
  3. Fast Data Provisioning and Cost Reduction: Data can be accessed and reused easily, so the costs associated with integrations, migrations, and infrastructure can be minimized, on top of significant time savings.
  4. Adoption and Evolution: Organizations evolve constantly, and a distributed, logical architecture can adapt as their data needs change and expand.

Denodo Platform's architecture supports several key Data Fabric capabilities, such as:

  1. Data Virtualization Engine
  2. Augmented Data Catalog
  3. Active Metadata
  4. AI-Based Recommendations
  5. Semantic Layer with Extended Metadata
  6. DataOps and Multi-cloud Provisioning

In this tutorial, we will learn how to leverage the benefits of a Logical Data Fabric with the Denodo Platform, walking through some of these key capabilities.

Data virtualization is at the heart of any data fabric, as it is the middle layer that abstracts sources from consumption. It is a layer that integrates, manages and delivers data.

Denodo Platform's core competency lies in providing a highly capable Data Virtualization engine which allows you to Connect, Combine and Publish. Enjoy Data Freedom!
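
As a quick illustration of that single access point, here is a minimal sketch of a consumer querying a combined view through the virtualization layer's SQL endpoint from Python. It assumes the Virtual DataPort server accepts PostgreSQL-protocol clients (commonly used for ODBC access; check the Denodo documentation for your version); the host, port, credentials, and virtual database name are placeholders, and the view name reuses one that appears later in this tutorial.

```python
# A minimal sketch: querying a combined (derived) view through the single SQL
# endpoint of the virtualization layer, as if it were one database.
# Host, port, credentials, and database name below are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="denodo-server.example.com",  # placeholder host
    port=9996,                         # placeholder: your VDP ODBC port
    dbname="tutorial",                 # placeholder virtual database
    user="demo_user",
    password="demo_password",
)

with conn, conn.cursor() as cur:
    # The consumer sees one logical view; behind it, the fabric may federate
    # a warehouse, a CRM database, and a data lake in real time.
    cur.execute("SELECT * FROM iv_client_with_bills")
    for row in cur.fetchmany(10):
        print(row)
```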

Let us move on to the next capability of a data fabric.

A key aspect of any self-service strategy is the ability for business users to find what data sets are available in the data delivery layer and work out which ones are relevant to them. The data catalog is aimed at providing this ability to all users through intuitive user interfaces, often available as a web portal or marketplace.

The Denodo Data Catalog provides this self-service ability, enabling business users to find the datasets available in the data delivery layer and to work with the ones relevant to them.

The Denodo Data Catalog is a web application with a business-friendly user interface that provides context around the data, geared toward business users. It lets these users navigate data assets by tags and categories, identify datasets through advanced search, and query them directly, providing self-service access with ease.

Click Next to move on to the next capability of a Logical Data Fabric.

In a logical architecture, access to any data source is channeled through a single layer, usually enabled by a data virtualization layer, as described in previous sections of this tutorial. This approach gives that logical layer a privileged position to capture data access activity, such as who accessed data, when they accessed it, how it was accessed, from what tool it was accessed, and how long it took.

Active metadata management is a key capability of any such architecture model: it implies the ability to monitor user access to the metadata and to monitor the usage of resources.

Denodo Platform provides two monitoring tools.

  • The Denodo Monitor. This tool can be started as an individual component to monitor and log the different Denodo Platform servers in an environment.
  • The Diagnostic & Monitoring Tool. This tool helps monitor resource usage and user sessions in real time.

However, the most interesting aspect of active metadata is that it is also used as the main input for AI algorithms that learn from usage. This active metadata management is the foundation of AI-based automation, which is described more in detail in the next section of the tutorial.
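
As a rough illustration of how such usage records can be turned into active metadata, here is a minimal sketch that aggregates a query log into per-view statistics. The log format assumed here (a CSV with columns user, view, and duration_ms) is hypothetical; the actual Denodo Monitor log layout is different, so the path and column names would need to be adapted to your environment.

```python
# A minimal sketch of active metadata as a usage signal: aggregate a query log
# into per-view statistics. The CSV layout assumed here is hypothetical.
import csv
from collections import defaultdict

def summarize_usage(log_path: str) -> None:
    counts = defaultdict(int)
    total_ms = defaultdict(int)
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            view = row["view"]
            counts[view] += 1
            total_ms[view] += int(row["duration_ms"])
    # Rank views by how often they were queried; this is the kind of signal
    # that recommendation and acceleration features can learn from.
    for view, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{view}: {n} queries, avg {total_ms[view] / n:.0f} ms")

# summarize_usage("vdp-queries.csv")  # path is a placeholder
```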

A truly effective data fabric is intelligent and provides automatic recommendations tailored to the specific usage and workloads it serves. It must be able to analyze past activities to predict the future in order to simplify and reduce the cost of using and operating a data fabric. This is at the forefront of research and development and some advanced vendors are currently incorporating AI-based recommendations into data management solutions.

Denodo Platform offers two key AI-based recommendation features:

  • Data discovery recommendations
  • Performance recommendations for Smart Query Acceleration

Now let us explore these options.

Data Discovery Recommendations

For Data Discovery Recommendations, let's log in to the Data Catalog web application.

On the landing page we can see information such as Most used views, Most Recent views, and Most Recommended by you.

These are the views that we recently executed or modified in the previous section. Furthermore, we can also see information regarding Catalog Management: the Data Catalog home page displays the total number of categories, tags, views, and web services synchronized and created in the application.

Let's now try executing a new view and see how the Recommendation results are influenced. We will execute the iv_client_with_bills view and navigate back to the landing page. We can see the change in the view name in the Elements section of the landing page.

Performance recommendations for Smart Query Acceleration

Now let's look at the other AI-based feature, Smart Query Acceleration.

Denodo offers a feature called Summaries. This feature includes an AI-driven assistant that analyzes past queries (saved by the Denodo Monitor) and recommends the creation of new summaries (using Denodo Virtual DataPort and the Design Studio) to accelerate the queries sent to Denodo.
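
The following sketch illustrates the underlying idea only, not Denodo's optimizer logic: a summary is a pre-aggregated dataset, and a query can be answered from it, rather than from the much larger detail data, whenever the summary's grouping columns and measures cover what the query needs. The query and summary descriptions below are hypothetical structures used for illustration.

```python
# Conceptual sketch only: when does a pre-aggregated "summary" cover a query?
# The dictionaries below are illustrative, not Denodo metadata objects.

def summary_covers(query: dict, summary: dict) -> bool:
    """A summary can answer a query if it was built on the same view and its
    grouping columns and measures include everything the query needs."""
    return (
        query["view"] == summary["view"]
        and set(query["group_by"]) <= set(summary["group_by"])
        and set(query["measures"]) <= set(summary["measures"])
    )

summary = {"view": "bills", "group_by": ["client_id", "year"], "measures": ["sum_amount"]}
query   = {"view": "bills", "group_by": ["client_id"],         "measures": ["sum_amount"]}

target = "summary" if summary_covers(query, summary) else "detail data"
print(f"Answer the query from the {target}")  # summary: far fewer rows to scan
```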

Click Next to move on to the next capability of a Logical Data Fabric.

A universal semantic layer has long been understood to be the key to democratizing data. Creating this universal semantic layer is essential to enable business users to align their understandings of the underlying data. Semantic metadata extends traditional metadata (column names, data types, etc.) with additional meaning.

  • Dependencies between elements, which enables users to explore data lineage and enables developers to perform change impact analysis
  • Relationships between related datasets, even across data sources, to simplify exploration and querying
  • Descriptions and other documentation elements that enable a better understanding of what's what
  • Status messages, deprecation notices, or warnings, which enable communication between IT and end users
  • Updates from owners, stewards, approvers, and other governing stakeholders who have performed specific tasks on that data asset
  • Tags and business terms that enable the definition of a standardized data dictionary
  • Identifiers for sensitive data elements
  • Technical definitions of data metrics (e.g. profit, benefit, margin, etc.), which enable the centralized, platform-agnostic definition of enterprise-wide measurements (see the sketch after this list)
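
Here is a minimal sketch of that last point: business metrics defined once, in a platform-agnostic registry, and reused wherever they are queried. The registry structure, the metric formulas, and the iv_sales view name are illustrative assumptions, not a Denodo API.

```python
# Illustrative only: a central registry of metric definitions that every
# consumer reuses, so "margin" is computed the same way everywhere.
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    name: str
    expression: str   # expressed over columns exposed by the semantic layer
    description: str

METRICS = {
    "profit": Metric("profit", "revenue - cost", "Absolute profit per row"),
    "margin": Metric("margin", "(revenue - cost) / revenue", "Profit as a fraction of revenue"),
}

def build_select(view: str, metric_names: list[str]) -> str:
    """Compose a query that reuses the centrally defined metric expressions."""
    cols = ", ".join(f"{METRICS[m].expression} AS {m}" for m in metric_names)
    return f"SELECT {cols} FROM {view}"

print(build_select("iv_sales", ["profit", "margin"]))  # iv_sales is a hypothetical view
```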

Denodo Platform achieves this with the following features:

  • Relationships & Data Lineage
  • Collaboration Features in Data Catalog
  • Global Security Policies

Relationships & Data Lineage

Now let's see how Relationships and Data Lineage are defined and displayed in Design Studio and the Data Catalog.

First, let's look at Associations! Associations are helpful in all of these areas: modeling, browsing the views, and improving query performance. Let us check the associations in Design Studio.

Now let us double-click on the view client and click on Associations.

Once you click on Associations, a small dialog opens below the view where you can see the associated views bv_crm_address and bv_crm_client_type. Now click on the related view bv_crm_address and you will notice that the views are associated with each other.

Let us now log in to the Denodo Data Catalog to view the Relationships and Data Lineage from a business user's perspective!

From the Data Catalog home page, search for the view bv_crm_address and go to the Relationship tab.

Here, we can see the relationship defined between the bv_crm_client and bv_crm_address views from the Data Catalog.

Now go to the Query tab, select all the columns, drag and drop them under Output Column, and then click Execute.

From here, you can traverse between the views, which makes it much easier for users to understand the relationship between the two views.

So far, so good: we have seen Relationships. Now, let us see how Data Lineage works. Go to the Data Catalog home page and search for the view iv_crm_personal_data.

Open the view, navigate to the Data Lineage option, and click on the column primary_phone.

The Data Lineage view shows, in a tree structure, all the views that are used to build the current view. When you click on a specific column, it shows the path back to the source from which that column comes and the operations in which the column is involved.
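
Conceptually, lineage is just such a tree: each view records the views or source tables it is derived from, and tracing a column means walking that tree back to its sources. The following minimal sketch illustrates the idea; the view names reuse examples from this tutorial, but the mapping itself is hypothetical.

```python
# Illustrative only: a toy lineage graph mapping each view to what it is built
# from, and a traversal that prints the derivation tree.
LINEAGE = {
    "iv_crm_personal_data": ["bv_crm_client", "bv_crm_address"],
    "bv_crm_client": ["crm.client"],     # base views map to source tables
    "bv_crm_address": ["crm.address"],
}

def trace(view: str, depth: int = 0) -> None:
    """Print the tree of views and source tables the given view is derived from."""
    print("  " * depth + view)
    for parent in LINEAGE.get(view, []):
        trace(parent, depth + 1)

trace("iv_crm_personal_data")
```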

Awesome! You just learned about the Semantic Layer and some of the functionality it provides. Let's look at more capabilities in the next section.

Collaboration Features in Data Catalog

The Denodo Data Catalog can show status messages, deprecation notices, or warnings on Denodo views, which enables communication between IT and end users.

Global Security Policies

Global security policies are often used together with tags, which are labels that you can assign to views and their columns in Denodo. You can then apply global security policies to views that carry certain tags, for easier management.
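
The following minimal sketch illustrates the tag-driven idea: a policy is attached to a tag rather than to individual views, so any view carrying that tag inherits the restriction. The tag names, roles, and the check itself are illustrative assumptions, not Denodo's policy engine.

```python
# Illustrative only: tag-based access control in a toy model.
# Untagged views are unrestricted here; real policies would be richer.
VIEW_TAGS = {
    "bv_crm_client": {"pii"},
    "iv_client_with_bills": {"pii", "finance"},
    "iv_sales_by_region": {"finance"},
}

POLICIES = {
    "pii":     {"allowed_roles": {"data_steward", "emea_analyst"}},
    "finance": {"allowed_roles": {"finance_analyst", "data_steward"}},
}

def can_query(user_roles: set[str], view: str) -> bool:
    """Allow a query only if, for every tag on the view, the user holds at
    least one role permitted by that tag's policy."""
    return all(user_roles & POLICIES[tag]["allowed_roles"]
               for tag in VIEW_TAGS.get(view, set()))

print(can_query({"finance_analyst"}, "iv_client_with_bills"))  # False: no pii-allowed role
print(can_query({"data_steward"}, "iv_client_with_bills"))     # True
```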

Click Next to move on to the next capability of a Data Fabric.

Finally, once a logical data fabric is built, it needs to be successfully operated. This involves the management of new developments (version control, deployment management), monitoring and auditing, scheduling recurrent jobs (including notifications, error handling, retries, and other management aspects of batch execution), and many more.

In addition, a data fabric must be agnostic to the location of the deployment, including any cloud provider, and be able to manage provisioning at minimal cost, taking advantage of the automated provisioning options that those cloud platforms provide. In many large corporations, architectures go beyond one cloud, and data fabrics often span different cloud providers as well as different geographies. The diagram below shows a real-life example of a multi-cloud, multi-location data fabric.

For example, due to GDPR compliance rules, access to some of the data in the EMEA zone needs to be restricted so that it is accessible only from within Europe. By implementing a multi-cloud logical data fabric with data virtualization, with nodes in each cloud, we can achieve this by making sure that the EMEA data governance rules are enforced and that the sensitive data is held entirely within the EMEA availability zone.

Denodo Platform can be installed in any cloud and provides various DataOps features that make the management of new developments much easier. The following features are available:

  • Version Control System and Promoting elements using Solution Manager: you can refer to the Promoting Metadata tutorial and Version Control Systems Integration documentation.
  • Scheduler Server for planned tasks: Denodo Platform includes the Denodo Scheduler component to define automated tasks that will run at the configured time.

As a last point of this tutorial, let's discuss some best practices to keep in mind when building your Data Fabric architecture:

  • Data Integration: the architecture has to be able to integrate data from disparate sources irrespective of its location and format.
  • Data Preparation: the architecture has to be able to transform the data into a format ready for consumption.
  • Governance: the architecture has to include a catalog tool to document all the enterprise data for data discovery, lineage and governance.
  • Performance: the architecture has to support AI capabilities to improve the query performance to simplify the usage and operation of the platform, freeing the data team to focus on more strategic initiatives.
  • Security: the architecture has to provide a global data access layer with which to enforce security across the organization, regardless of the capabilities of each data source, enabling consistent enforcement of security policies.

Congratulations, you have successfully learned about the main capabilities of a data fabric.

Well done!