Purpose of tagging
Let's start with this: what are tags and why do we tag data?
Tags are labels, or metadata, that are associated with data elements to categorize them so they can easily be retrieved when browsing, searching and managing data. Tagging can be useful for a wide variety of reasons:
- Classify content based on some grouping: #corporate, #sales, #industry, #partner, etc.
- Data quality labels: #gold, #silver, #bronze, etc.
- Data dictionary, to relate a field or view to a specific business definition: #name, #ssn, #address, #phone, #customer, etc.
Security, Data Privacy and Data Protection
- Security: sometimes it is an extension of the above (security can depend on classifications or business terms), but it often has its own system. Think, for example, of classified information: #unclassified, #secret, #topsecret, etc.
- Regulatory requirements drive organizations to adopt additional data protection policies that safeguard sensitive data and ensure that only authorized users can access it: #PII, #PHI, #GDPR, etc.
- Version or release management, similar to tags in a version control system (VCS). Imagine assets labeled #version3.4 or #june2022.
And many others, for example:
- Crowdsourcing by users (ratings, warnings, deprecation, etc.).
- Master Data Management domains (customer data, product data, reference data, etc.).
Each organization will have its own policies, and in some cases it may use tags for more than one purpose. Many organizations will use the three approaches above, while others will focus on one aspect of tagging.
Who is in charge of tagging is another important question. We may have different personas in charge of tagging, especially if tagging is used for multiple purposes.
- Sometimes tagging may come from the developers.
- More often, though, tagging comes from a specialized profile, such as a data steward or a security manager.
- In some cases, tagging may be crowdsourced to the BI community.
With what tool?
Tags can be managed from the Design Studio, the VDP Admin Tool or the Data Catalog. Different users will use different tools depending on their role.
- Denodo developers will be most familiar with the Design Studio and the VDP Admin Tool.
- Security teams will be most familiar with the Design Studio, as it is the tool where security controls are created and managed.
- Data Stewards will tend to use data governance tools, like Collibra. In the absence of third-party governance tools, Data Stewards with an IT focus will use the Design Studio and the Data Catalog, while Data Stewards with a business focus will primarily use the Data Catalog.
- BI users will use the Data Catalog, since it is the interface they already work with.
What is the lifecycle of tags?
Tags may be created directly in production (as would be the case, for example, with crowdsourced tagging), but they may also go through a validation process that requires QA. This would be the expectation for security tags. In that case, they need to be created in a lower environment, validated, and migrated as part of a revision, just like other metadata.
Virtual Data Port (VDP) Tags
- VDP tags are supported at both the view and column levels.
- There are specific roles, both for the Data Catalog and for VDP, that control which users can create and assign tags.
- Global Security Policies let you define security restrictions and work together with VDP tags (e.g., PII, PHI, Confidential data). Global Security Policies are easier to manage than view-level restrictions (Row Restrictions and Column Privileges) because a single global policy can be assigned to multiple views and columns at the same time. Global Security Policies are only available with Denodo Enterprise Plus licenses.
- External tags can be imported from Collibra or other third-party data catalogs.
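To make the management advantage of tag-driven policies concrete, here is a minimal Python sketch under stated assumptions. It is not Denodo's API or VQL: `View`, `Column` and `GlobalPolicy` are hypothetical names, and masking is just one example of a restriction a policy might enforce. The point it illustrates is that one rule keyed on a tag covers every tagged column in every view, whereas view restrictions have to be defined view by view.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    tags: set[str] = field(default_factory=set)


@dataclass
class View:
    name: str
    columns: list[Column]


# Hypothetical tag-driven policy: a single rule keyed on a tag applies to
# every column carrying that tag, in any view, with no per-view setup.
@dataclass
class GlobalPolicy:
    tag: str
    mask: str = "****"

    def apply(self, view: View, row: dict[str, str]) -> dict[str, str]:
        """Return a copy of row with values of columns tagged self.tag masked."""
        protected = {c.name for c in view.columns if self.tag in c.tags}
        return {k: (self.mask if k in protected else v) for k, v in row.items()}


customers = View("customer", [Column("name", {"name"}),
                              Column("ssn", {"ssn", "PII"}),
                              Column("city")])
employees = View("employee", [Column("emp_id"), Column("ssn", {"PII"})])

pii_policy = GlobalPolicy("PII")  # one policy, reused across both views
print(pii_policy.apply(customers, {"name": "Ana", "ssn": "123-45-6789", "city": "Madrid"}))
# prints {'name': 'Ana', 'ssn': '****', 'city': 'Madrid'}
```

Tagging a new column with `PII` is enough for the same policy to protect it; no policy definition has to change.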
Data Catalog Tags
- Tags created in the Data Catalog apply only to views and web services. They are meant for data classification, searching, filtering and for gathering crowdsourced input from the user community.
- Data Catalog tags are meant to be managed in production only. They are not available for migration with Solution Manager (although basic export/import is possible).
- VDP tags can be imported and synchronized selectively with the Data Catalog in order to expose them to business users. For instance, you may not want to expose security tags like #classified or #top_secret, while others like #name, #ssn or #july2022 would make sense in the Data Catalog.
- VDP tags appear as read-only in the Data Catalog. Bear in mind that, since they can be used for security, changing them from the Data Catalog could have dramatic consequences.
- Synchronized VDP tags appear in the Data Catalog with the same capabilities as native Data Catalog tags (e.g., they can be used for filtering). In addition, they appear in the schema tab of a view, including at the column level, and you can use them to search for columns that have specific tags. Admin users in the Data Catalog see them with a different look (a VDP logo) so they can be distinguished, but for end users they behave in exactly the same way.
- Synchronizing VDP tags into the Data Catalog is only available with Denodo Enterprise Plus licenses.
- Using the same approach described above for importing from third-party data catalogs, Data Catalog tags can also be imported into VDP. This can be useful, for example, to migrate existing tags so they can be used for security, or if you prefer to manage tags in VDP from then on.
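The selective synchronization and read-only behavior described above can be modeled roughly as follows. This is a hypothetical Python sketch, not how Denodo implements or configures synchronization: the exclusion set `SECURITY_TAGS` and the function `catalog_tags` are illustrative names, and the rule that a tag name present on both sides is treated as the synchronized VDP tag is an assumption of the sketch.

```python
from __future__ import annotations

# Hypothetical exclusion list: security tags that stay in VDP only.
SECURITY_TAGS = {"classified", "top_secret"}


def catalog_tags(vdp_tags: set[str], native_tags: set[str],
                 excluded: set[str] = SECURITY_TAGS) -> dict[str, bool]:
    """Map each tag visible in the Data Catalog to whether it is editable there.

    Synchronized VDP tags are read-only (False); native catalog tags are
    editable (True). A name on both sides is treated as the synced VDP tag
    (an assumption of this sketch, not documented product behavior).
    """
    visible = {tag: False for tag in vdp_tags - excluded}
    for tag in native_tags:
        visible.setdefault(tag, True)  # only added if not already synced
    return visible


vdp = {"name", "ssn", "july2022", "classified", "top_secret"}
native = {"deprecated"}  # e.g., a crowdsourced warning
result = catalog_tags(vdp, native)
print(sorted(result))
# prints ['deprecated', 'july2022', 'name', 'ssn']
```

Keeping the exclusion list explicit makes it easy to review which tags business users can see, which matters most for tags that also drive security.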
Putting it all together
Tags are a concept with broad uses across different personas and scenarios, such as security, classification, data dictionaries and crowdsourcing metadata from users. The key to successfully implementing tags is understanding your objectives and how Denodo's various tags best support your goals. Starting with Denodo 8.0 update 20220728, the vast majority of scenarios can be managed with VDP tags, particularly when column-level tags are required (e.g., security, data dictionary). VDP tags can then be synchronized with the Data Catalog when necessary.
As for the Data Catalog, use it to crowdsource metadata enrichment. Remember that the Data Catalog also lets you create Categories, so using them for business classification may be less confusing than using tags for two different purposes. The Data Catalog may be the tool of choice for data stewards, especially if they do not have a specialized data governance tool.
Denodo continues to invest in capabilities that will improve tag management and usability along with the Data Catalog, so be sure to check the Denodo Platform New Features Guide to keep current with new enhancements.