Denodo Governance Bridge for OpenLineage - User Manual
Overview
OpenLineage is an Open Standard for data lineage metadata collection and analysis. Denodo Governance Bridge for OpenLineage retrieves metadata from Denodo Platform to create JSON files with the data lineage following the OpenLineage specification.
The Denodo Governance Bridge for OpenLineage provides a service endpoint that generates a folder with the data lineage for Denodo according to the OpenLineage specification. It also provides a script to load this data lineage into Marquez, the reference implementation of the OpenLineage standard.
Considerations
OpenLineage defines multiple types of events to support both runtime and design lineage:
- Job Run State Updates (RunEvent): describes the execution of a job, emitted at runtime.
- Job Metadata Updates (also known as static lineage) (JobEvent): describes metadata about a job, such as its location in source code or declared inputs/outputs. Emitted at design-time and not associated with a Run.
- Dataset Metadata Updates (DatasetEvent): describes metadata changes related to a dataset, such as schema, ownership, or documentation. Emitted at design-time and not associated with a Run.
The Denodo Governance Bridge for OpenLineage focuses only on design lineage events. Base views are modeled as DatasetEvents and derived views as JobEvents.
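To make the mapping concrete, the following Python sketch builds a minimal DatasetEvent for a base view and a minimal JobEvent for a derived view, using the top-level fields defined by the OpenLineage specification. It is an illustration only and not the Bridge's actual output: the namespace value, the helper names, and the choice to list the derived view as the job's output are assumptions, and the `<version>` segment of the schema URL is left as a placeholder.

```python
from datetime import datetime, timezone

# Default producer value documented for the producer.URL property.
PRODUCER = "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client"
# Placeholder: substitute the OpenLineage spec version you target.
SCHEMA_BASE = "https://openlineage.io/spec/<version>/OpenLineage.json#/$defs/"


def _common(event_type):
    """Fields shared by every OpenLineage event."""
    return {
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": PRODUCER,
        "schemaURL": SCHEMA_BASE + event_type,
    }


def dataset_event(namespace, view_name):
    """Design-time event for a base view: a dataset, with no job and no run."""
    event = _common("DatasetEvent")
    event["dataset"] = {"namespace": namespace, "name": view_name}
    return event


def job_event(namespace, view_name, input_views):
    """Design-time event for a derived view (assumed shape): the views it is
    built from are the inputs and the derived view itself is the output."""
    event = _common("JobEvent")
    event["job"] = {"namespace": namespace, "name": view_name}
    event["inputs"] = [{"namespace": namespace, "name": v} for v in input_views]
    event["outputs"] = [{"namespace": namespace, "name": view_name}]
    return event
```

For example, `dataset_event("denodo-example", "customer")` describes a base view, and `job_event("denodo-example", "book_details", ["book"])` describes a derived view built from a hypothetical `book` view. Neither event carries a `run` entry, which is what distinguishes design lineage from runtime lineage.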
Installation
The distribution of the Denodo Governance Bridge for OpenLineage consists of:
- Command-line executable scripts for Windows and Linux (/bin folder)
- Configuration files: application.properties and log4j2.xml (/conf folder)
- A sample JSON file, input-open-lineage.json, required for the execution using the script to generate data lineage (/conf folder). See the Data lineage generation script subsection under the Generate data lineage section.
- Java libraries (/lib folder)
  - Denodo Governance application jar: denodo-open-lineage-governance-<version>.jar
  - Denodo driver jar: denodo-vdp-jdbcdriver-<version>-full.jar
If you need a Denodo driver version different from the one distributed, replace this jar with the Denodo driver of the required version.
In order to install the Denodo Governance Bridge for OpenLineage, just download the .zip file and extract the tool into the desired folder.
Before running the Denodo Governance Bridge for OpenLineage, review and complete the application.properties file. It contains some properties with default values that you can change and some connection properties that you must set. See the section Configuration of the Denodo Governance Bridge for OpenLineage for more detailed information.
After completing the configuration, start the application by running the script denodo-open-lineage-governance.sh|bat, available in the /bin folder. Once the application is running, you can trigger the data lineage generation.
Configuration of the Denodo Governance Bridge for OpenLineage
The application.properties file, available in the /conf folder, allows the user to set the properties required to run the application.
Denodo connection properties
- denodo.driver-class-name: the class name of the Denodo JDBC driver.
- denodo.host: Denodo host name.
- denodo.username: username used to connect to Denodo Platform.
- denodo.password: password used to connect to Denodo Platform. It can be encrypted or in clear text. See the How to encrypt passwords section for a detailed explanation.
- denodo.url: Denodo JDBC connection URL.
OpenLineage properties
- producer.URL: identifies the producer of the metadata, that is, how the metadata was generated. The default value is: https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client
- open.lineage.destination.folder: the destination folder name. This folder will be available in the denodo-open-lineage-governance-<version> directory, after triggering the data lineage generation, and will contain the JSON files with the data lineage which can be loaded in Marquez. The default value is “OpenLineage”.
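Before starting the Bridge, it can help to confirm that every connection property listed above has a value. The following Python sketch does that for a simple key=value file. The parser is deliberately minimal (it does not handle line continuations or escapes), and treating the five denodo.* properties as mandatory is an assumption based on this section rather than documented validation behavior.

```python
import re

# Connection property names listed in this manual.
REQUIRED_KEYS = (
    "denodo.driver-class-name",
    "denodo.host",
    "denodo.username",
    "denodo.password",
    "denodo.url",
)

# key, then '=' or ':', then value; lines starting with '#' or '!' are comments.
_LINE = re.compile(r"^\s*([^=:\s#!][^=:\s]*)\s*[=:]\s*(.*)$")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            props[match.group(1)] = match.group(2).strip()
    return props


def missing_properties(text):
    """Return the required keys that are absent or have an empty value."""
    props = parse_properties(text)
    return [key for key in REQUIRED_KEYS if not props.get(key)]
```

Calling `missing_properties(open("conf/application.properties").read())` returns an empty list when all five connection properties are set.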
Usage
Denodo data lineage
The Denodo Governance Bridge for OpenLineage generates JSON files describing the relations between the Denodo elements from which Denodo derived views are built. It also includes a script to load those files into Marquez so that you can view the data lineage graphically.
Generate data lineage
Once the Denodo Governance Bridge for OpenLineage is up and running (script denodo-open-lineage-governance.sh|bat), it offers two methods to generate a folder with the Denodo technical lineage:
- Service endpoint
- Data lineage generation script
Service endpoint
The Denodo Governance Bridge for OpenLineage exposes two service endpoints, both of which must be invoked with an HTTPS POST request.
The first one generates a folder with the Denodo data lineage locally on the Bridge server:
https://<server-host>:<server.port>/api/generateLocalOpenLineage
The second endpoint lets users download a compressed .zip file containing all the generated Denodo data lineage:
https://<server-host>:<server.port>/api/generateOpenLineage
The server.port is 8444 by default, but the user can configure it in the application.properties file.
The request body should be a JSON object defining:
- A JSON array with the Denodo database names the data lineage is to be generated for.
Example:
{ "denodoDatabases":["db01", "db02", "db03"] }
Optionally, to restrict the elements for which the tool generates data lineage, add a filter to the request body. The filter specifies the views to include in the data lineage and accepts ‘*’ as a wildcard character.
Example:
{ "denodoDatabases":["db01", "db02", "db03"], "filters":["view*", "*test"] }
In this example, the data lineage will be generated only for views whose names start with ‘view’ or end with ‘test’.
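The filter semantics described here can be illustrated with a short Python sketch that treats ‘*’ as "any run of characters" and everything else literally. It only mirrors the behavior stated in this section. The function names are hypothetical, and case-sensitive matching is an assumption, since the manual does not say how the Bridge handles case.

```python
import re


def filter_to_regex(pattern):
    """Translate a filter such as 'view*' into a regular expression,
    treating '*' as a wildcard and every other character literally."""
    parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile(".*".join(parts))


def view_selected(view_name, filters):
    """True when the whole view name matches at least one filter."""
    return any(filter_to_regex(f).fullmatch(view_name) for f in filters)
```

With the filters from the example, `view_selected("view_customer", ["view*", "*test"])` and `view_selected("orders_test", ["view*", "*test"])` are both true, while a view named `customer` is not selected.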
Because the request body is JSON, the request must include the Content-Type header:
Content-Type: application/json
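Putting the URL, body, and header together, a request to either endpoint could be assembled as in the Python sketch below. It uses only the standard library and builds the request without sending it. The function name and the `localhost` host in the usage note are hypothetical, while the paths, the default port 8444, and the body keys come from this section.

```python
import json
import urllib.request


def build_lineage_request(host, databases, filters=None, port=8444, download=False):
    """Build (but do not send) the POST request for one of the two endpoints.

    download=False targets generateLocalOpenLineage (folder on the Bridge server);
    download=True targets generateOpenLineage (response is the .zip file).
    """
    path = "generateOpenLineage" if download else "generateLocalOpenLineage"
    body = {"denodoDatabases": list(databases)}
    if filters:
        body["filters"] = list(filters)
    return urllib.request.Request(
        url=f"https://{host}:{port}/api/{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A call such as `build_lineage_request("localhost", ["db01", "db02"], filters=["view*"])` returns a request that can be sent with `urllib.request.urlopen`. If the Bridge presents a certificate that your client does not trust, you will probably need to pass an appropriate SSL context as well.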
With the first endpoint, the output shows information about the Denodo databases processed and included in the data lineage generated. Example:
{ "Open Lineage generated for Denodo databases : ": "[db01, db02, db03]" }
With the second endpoint, the response is the generated .zip file.
Data lineage generation script
In the bin directory of the distribution you will find the script denodo-generate-open-lineage-governance.sh|bat. You can execute it in order to generate the Denodo data lineage for OpenLineage:
$ cd denodo-governance-openlineage-<VERSION>
$ bin/denodo-generate-open-lineage-governance.sh conf/input-open-lineage.json
Or you can execute it in order to download a .zip file with the Denodo data lineage:
$ cd denodo-governance-openlineage-<VERSION>
$ bin/denodo-generate-open-lineage-governance.sh conf/input-open-lineage.json /download-directory
In this case, the script takes a second parameter: the path to the directory where you want the .zip file to be downloaded.
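If you prefer to generate the input file instead of editing it by hand, a small Python sketch such as the following could write it. This assumes that input-open-lineage.json carries the same JSON object as the service endpoint request body (denodoDatabases plus optional filters), which the manual implies but does not state, so compare the result with the sample file shipped in the /conf folder. The function name is hypothetical.

```python
import json
from pathlib import Path


def write_input_file(path, databases, filters=None):
    """Write the JSON input for the generation script and return its content.

    Assumed layout: the same object the service endpoint accepts as its body.
    """
    body = {"denodoDatabases": list(databases)}
    if filters:
        body["filters"] = list(filters)
    Path(path).write_text(json.dumps(body, indent=2), encoding="utf-8")
    return body
```

For instance, `write_input_file("conf/input-open-lineage.json", ["db01", "db02"])` overwrites the sample file with a request for two databases and no filters.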
This script uses curl, a standard tool available in most systems, for invoking the service endpoint of the Denodo Governance Bridge for OpenLineage that generates the Denodo data lineage. You can check if you have curl installed in your system using the command:
$ curl --version
The output shows the names of the Denodo databases for which the data lineage has been created. Example:
$ bin/denodo-generate-open-lineage-governance.sh conf/input-open-lineage.json
HTTP/1.1 200
Content-Type: application/json
Transfer-Encoding: chunked
Date: Tue, 18 Jun 2024 10:53:14 GMT
{"Open Lineage generated for Denodo databases : ":"[db01, db02]"}
Note that the output of the script also includes the HTTP response headers. You can check the HTTP status code to confirm that the process succeeded.
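When the script is run from an automated job, the status check can be scripted. The Python sketch below extracts the status code from captured script output and treats any 2xx code as success. The function names are hypothetical, and it assumes the output contains an HTTP status line in the form shown in the example.

```python
import re

# Matches status lines such as "HTTP/1.1 200" or "HTTP/2 500".
_STATUS_LINE = re.compile(r"^HTTP/\d(?:\.\d)?\s+(\d{3})\b", re.MULTILINE)


def http_status(script_output):
    """Return the code of the last HTTP status line, or None if there is none."""
    codes = _STATUS_LINE.findall(script_output)
    return int(codes[-1]) if codes else None


def generation_succeeded(script_output):
    """True when the final HTTP status code is in the 2xx range."""
    status = http_status(script_output)
    return status is not None and 200 <= status < 300
```

The last status line is used so that an interim response, if the server sends one, does not mask the final result.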
If the user chooses the download zip option of the script, the output shows the progress of the .zip file download:
$ bin/denodo-generate-open-lineage-governance.sh conf/input-open-lineage.json C:/Downloads
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   527  100    22  100   505     21    488  0:00:01  0:00:01 --:--:--   510
Example of an OpenLineage JSON file created for the base view customer:
Example of an OpenLineage JSON file created for the derived view book_details:
Create the Denodo Data Lineage in Marquez
Marquez is an LF AI & DATA incubation project to collect, aggregate, and visualize a data ecosystem’s metadata and is the reference implementation of the OpenLineage standard.
You can install and run Marquez by following the installation directions provided by the Marquez project.
Data Lineage load to Marquez script
In the bin directory of the distribution you will find the script denodo-add-lineage-to-marquez.sh|bat. You can execute it in order to create the Denodo data lineage in Marquez:
$ cd denodo-governance-openlineage-<VERSION>
$ bin/denodo-add-lineage-to-marquez.sh http://localhost:3000 C:/OpenLineage_20260310/book09_book_sales_details_lineage.json
"Lineage metadata stored in Marquez"
This script needs two parameters:
- The URI where your Marquez instance is running, usually http://localhost:3000.
- The path to the JSON file with the OpenLineage event previously generated.
This script also uses curl for invoking the request to Marquez.
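Because the script loads one file per invocation, loading a whole lineage folder means calling it once per JSON file. The Python sketch below prepares those invocations as argument lists. The function name is hypothetical, and it assumes the generated JSON files sit directly inside the folder you pass. Loading in file name order is an arbitrary choice, not a documented requirement.

```python
from pathlib import Path


def marquez_load_commands(lineage_folder,
                          marquez_uri="http://localhost:3000",
                          script="bin/denodo-add-lineage-to-marquez.sh"):
    """Return one [script, marquez_uri, json_file] argument list per .json
    file found directly in lineage_folder, sorted by file name."""
    files = sorted(Path(lineage_folder).glob("*.json"))
    return [[script, marquez_uri, str(f)] for f in files]
```

Each returned list can be passed to `subprocess.run(command, check=True)` from the distribution directory. On Windows, point the `script` argument at the .bat variant instead.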
Once all the OpenLineage files are loaded into Marquez, you can view the data lineage.
Denodo views are modeled as Datasets in Marquez.
Data lineage of the views of the Denodo book09 database in Marquez.
How to encrypt passwords
The Denodo Governance Bridge for OpenLineage expects encrypted passwords in the application.properties to appear surrounded by ENC(...). You can compute these values using the Jasypt CLI tools, and use the DENODO_EXPORT_ENCRYPTION_PASSWORD environment variable, or Java system property, to communicate the encryption password to the Denodo Governance Bridge.
This way, you can use encrypted passwords in the application.properties file:
...
password=ENC(s2FdirMK4QORq1HZ6tcTTQ==)
...
These are the steps for encrypting passwords:
1. Download the Jasypt CLI tools.
2. Choose an encryption password, e.g., mypassword.
3. Go to jasypt/bin.
4. Run encrypt.bat with the input parameter and the password parameter:
   - input parameter: the string you want to encrypt.
   - password parameter: the password that Jasypt uses to encrypt and decrypt the input parameter.
   Your command should look like this:
   encrypt.bat input=admin password=mypassword
   Take note of the output. Example output: zrass64ls4LIx5hdFoXXyA==.
5. Open your application.properties file and replace the password you want to encrypt with the output from Step 4, wrapped in ENC(...): ENC(zrass64ls4LIx5hdFoXXyA==).
   Example in the application.properties file:
   Before Jasypt: password=admin
   After Jasypt: password=ENC(zrass64ls4LIx5hdFoXXyA==)
6. Add an environment variable, or a Java system property, named DENODO_EXPORT_ENCRYPTION_PASSWORD to the Denodo Governance Bridge start script. Set its value to your real encryption password (mypassword in this example).
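A common mistake is to store an ENC(...) value without supplying the encryption password to the Bridge. The Python sketch below flags that situation for the environment variable case. The function names are hypothetical, and the check cannot see a Java system property set in the start script, so treat a positive result as a hint rather than a definitive error.

```python
import os
import re

# An encrypted value is written as ENC(<ciphertext>).
_ENC = re.compile(r"^ENC\((.+)\)$")
ENV_NAME = "DENODO_EXPORT_ENCRYPTION_PASSWORD"


def is_encrypted(value):
    """True when the property value has the ENC(...) form."""
    return _ENC.match(value.strip()) is not None


def encryption_password_missing(password_value, environ=None):
    """True when the password is encrypted but the environment variable that
    carries the encryption password is unset or empty."""
    environ = os.environ if environ is None else environ
    return is_encrypted(password_value) and not environ.get(ENV_NAME)
```

For example, `encryption_password_missing("ENC(zrass64ls4LIx5hdFoXXyA==)")` returns True in a shell where DENODO_EXPORT_ENCRYPTION_PASSWORD has not been exported.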
