Purpose of tagging
Let's start with this: what are tags and why do we tag data?
Tags are labels (metadata) attached to data elements to associate and categorize them, so that the data is easier to browse, search, and manage. Tagging is useful for a wide variety of purposes:
Data Governance
- Classify content based on some grouping: #corporate, #sales, #industry, #partner, etc.
- Data quality labels: #gold, #silver, #bronze, etc.
- Data dictionary to relate a field or view with a specific business definition: #name, #ssn, #address, #phone, #customer, etc.
Security, Data Privacy, and Data Protection
- Security is sometimes an extension of the above (since security can depend on classifications or business terms) but often has its own system. Think, for example, of classified information: #unclassified, #secret, #topsecret, etc.
- Regulatory requirements drive organizations to adopt additional data protection policies that safeguard sensitive data and ensure that only authorized users can access it: #PII, #PHI, #GDPR, etc.
Version Control
- Version or release management, similar to tags in a version control system (VCS). Imagine things labeled as #version3.4 or #june2023.
And many others, for example:
- Crowdsourcing by users (ratings, warnings, deprecation, etc.)
- Master Data Management domains (customer data, product data, reference data, etc.)
Each organization will have its own policies and, in some cases, may use tags for more than one purpose. Many organizations will use some combination of the three approaches above, while others will focus on one aspect of tagging.
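To make the idea of tags as retrieval labels concrete, here is a minimal sketch in Python. It is not Denodo code: the element names, the tag assignments, and the helper functions are all hypothetical and serve only to show how tags from different purposes (classification, quality, data dictionary, privacy) can be combined in a single lookup.

```python
from collections import defaultdict

# Hypothetical catalog: element name -> set of tags (illustrative names only).
ELEMENT_TAGS = {
    "customer_view": {"#sales", "#gold", "#customer"},
    "customer_view.ssn": {"#ssn", "#PII"},
    "partner_view": {"#partner", "#silver"},
}


def build_tag_index(element_tags):
    """Invert element -> tags into tag -> elements for lookup by tag."""
    index = defaultdict(set)
    for element, tags in element_tags.items():
        for tag in tags:
            index[tag].add(element)
    return index


def find_elements(index, *tags):
    """Return the elements that carry every one of the given tags."""
    if not tags:
        return set()
    result = set(index.get(tags[0], set()))
    for tag in tags[1:]:
        result &= index.get(tag, set())
    return result


TAG_INDEX = build_tag_index(ELEMENT_TAGS)
print(sorted(find_elements(TAG_INDEX, "#sales", "#gold")))
```

Because each element simply carries a set of labels, the same mechanism serves governance, security, and versioning tags alike; what differs between organizations is which tags exist and who is allowed to assign them.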
Who tags?
This is another important question. We may have different personas responsible for tagging, especially if tagging is used for multiple purposes.
- Sometimes, tagging may come from the developers.
- More often, though, tagging comes from a specialized profile, like a data steward or security manager.
- In some cases, tagging may be crowdsourced to the BI community.
With what tool?
Managing tags can be done from Design Studio or Data Catalog. Different users will use different tools depending on their role.
- Denodo developers will be most familiar with Design Studio.
- Security teams will be more familiar with Design Studio since it is the tool where security controls are created and managed.
- Data stewards tend to use data governance tools, like Collibra. In the absence of third-party governance tools, data stewards with an IT focus will use Design Studio and Data Catalog, while data stewards with a business focus will primarily use Data Catalog.
- BI users will use Data Catalog, as it is the interface they are already using.
What is the lifecycle of tags?
Tags may be created directly in production (e.g. for crowdsourced tagging). They may also go through a validation process that requires QA (e.g. for security tags), which means they will need to be created in a lower environment, validated, and migrated as part of a revision, just like other metadata.
Product Considerations
Virtual DataPort (VDP) Tags
- VDP tags are supported at both the view and column levels.
- There are specific roles, both for Data Catalog and Virtual DataPort, that control which users can create and assign tags.
- Global security policies define security restrictions and are created together with VDP tags (e.g. PII, PHI, or confidential data). Global security policies are easier to manage than view restrictions (row restrictions and column privileges) because a single global policy can be assigned to multiple views or columns at the same time. Global security policies are only available with Denodo Enterprise Plus licenses.
- External tags can be imported from Collibra or other third-party data catalogs.
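The point about global security policies being easier to manage than per-view restrictions can be illustrated with a small conceptual sketch in Python. It does not use any Denodo API and does not reproduce Denodo's policy semantics: the `TagPolicy` class, the role names, the tag assignments, and the masking rule are all assumptions made for illustration. The idea shown is only that one policy keyed to a tag covers every column carrying that tag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagPolicy:
    """Hypothetical policy: mask columns carrying `tag` unless the user holds an exempt role."""
    tag: str
    exempt_roles: frozenset


# Hypothetical column-level tag assignments: (view, column) -> tags.
COLUMN_TAGS = {
    ("customer", "name"): {"#name"},
    ("customer", "ssn"): {"#ssn", "#PII"},
    ("patient", "diagnosis"): {"#PHI"},
    ("patient", "ssn"): {"#ssn", "#PII"},
}

# Two policies cover every tagged column in every view.
POLICIES = [
    TagPolicy("#PII", frozenset({"privacy_officer"})),
    TagPolicy("#PHI", frozenset({"clinician"})),
]


def is_masked(view, column, user_roles):
    """Return True if any tag-based policy applies to the column for this user."""
    tags = COLUMN_TAGS.get((view, column), set())
    for policy in POLICIES:
        if policy.tag in tags and not (policy.exempt_roles & set(user_roles)):
            return True
    return False


print(is_masked("customer", "ssn", {"analyst"}))
print(is_masked("patient", "ssn", {"privacy_officer"}))
```

In this sketch, tagging a new column with #PII is enough for the existing policy to cover it; with per-view restrictions, the equivalent rule would have to be repeated on every affected view.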
Data Catalog Tags
- Tags created in Data Catalog apply only to views and web services. They are meant for data classification, searching, filtering, and gathering crowdsourced input from the user community.
- Data Catalog tags are meant to be production-only. They are not available for migrations in Solution Manager, although basic export/import is possible.
- VDP tags can be imported and synchronized selectively with Data Catalog in order to expose them to business users. For instance, you may not want to expose security tags like #classified and #top_secret, but others like #name, #ssn, or #july2022 would make sense in Data Catalog.
- VDP tags appear as read-only in Data Catalog. Because they can be used for security, changing them from Data Catalog could have dramatic consequences.
- Synchronized VDP tags appear in Data Catalog with the same capabilities as the native Data Catalog tags (e.g. they can be used for filtering). They also appear in the schema tab of a view, and you can use them to search for columns that have a specific tag. For admin users in Data Catalog, they look different (with a VDP logo) so that they can be distinguished. For end users, they behave in exactly the same way.
- Synchronizing VDP tags into Data Catalog is only available with Denodo Enterprise Plus licenses.
- Using the same approach described above to import from third-party data catalogs, tags in Data Catalog can also be imported into Virtual DataPort. This can be useful, for example, to migrate existing tags with the purpose of using them for security or if you prefer to manage tags in Virtual DataPort instead.
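The selective synchronization described in the list above can be sketched as a simple filter. This is a conceptual illustration in Python, not the Data Catalog synchronization mechanism: the `CatalogTag` class, the `synchronize` function, and the deny-list are hypothetical, and the tag names are taken from the example in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogTag:
    """Hypothetical representation of a tag as shown in a catalog."""
    name: str
    origin: str      # "VDP" for synchronized tags
    read_only: bool  # synchronized VDP tags are not editable in the catalog


# Tag names taken from the example in the text.
VDP_TAGS = ["#classified", "#top_secret", "#name", "#ssn", "#july2022"]

# Assumption: security tags that should stay hidden from business users.
HIDDEN_FROM_CATALOG = {"#classified", "#top_secret"}


def synchronize(vdp_tags, hidden):
    """Expose only non-hidden VDP tags, marked as read-only and VDP-originated."""
    return [
        CatalogTag(name=tag, origin="VDP", read_only=True)
        for tag in vdp_tags
        if tag not in hidden
    ]


for tag in synchronize(VDP_TAGS, HIDDEN_FROM_CATALOG):
    print(tag.name, tag.origin, tag.read_only)
```

A deny-list is used here only for brevity; an allow-list of tags to expose would be the more conservative choice when security tags are involved, since a newly created tag would then stay hidden until someone explicitly publishes it.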
Putting it all together
Tags are a concept with broad uses across different personas for many scenarios like security, classifications, data dictionaries, and crowdsourcing metadata from users. The key to successfully implementing tags is understanding your objectives and how Denodo's various tags best support your goals. With Denodo 8.0 update 20220728 and beyond, the vast majority of scenarios can be managed with VDP tags, particularly when column-level tags are required (e.g. security or data dictionary). VDP tags can then be synchronized with Data Catalog when necessary.
As for Data Catalog, use it to crowdsource metadata enrichment. Remember that Data Catalog also lets you create categories, so using categories for business classification may be less confusing than using tags for two different purposes. Denodo's Data Catalog may be the tool of choice for data stewards, especially if they do not have a specialized tool for data governance.
Denodo continues to invest in capabilities that will improve tag management and usability along with Data Catalog, so be sure to check the Denodo Platform New Features Guide to keep up with the latest enhancements.
References
Global Security Policies and Tag Management
Global Security Policies and Tags Tutorial
Data Catalog Categories Management
Denodo Platform - Subscription Bundles
Denodo Platform New Features Guide
The information provided in the Denodo Knowledge Base is intended to assist our users in advanced uses of Denodo. Please note that the results from the application of processes and configurations detailed in these documents may vary depending on your specific environment. Use them at your own discretion.
For an official guide to supported features, please refer to the User Manuals. For questions on critical systems or complex environments, we recommend that you contact your Denodo Customer Success Manager.