Anonymization in Denodo

Applies to: Denodo 8.0
Last modified on: 08 Feb 2022
Tags: Anonymization Best practices Encryption Pseudonymization Security Tokenization

What is anonymization? 

Data anonymization is the process of protecting private or sensitive information by erasing or encrypting the identifiers that connect an individual to stored data. Several techniques can be used to achieve this, such as encryption, tokenization and masking.

Encryption vs Tokenization

Tokenization and encryption are two ways of securing information, both in transit and at rest. However, they are not the same thing and are not interchangeable.

Encryption 

  • Uses ‘secret keys’ to protect data.
  • Uses an algorithm to transform plain text information into a non-readable form called ciphertext.
  • Needs an encryption key to decrypt the information and return it to its original plain text form.
  • If a key is intercepted, it can be used to decrypt all of the data it was used to secure.

Tokenization

Converts an input string into a meaningless string that has no relation to the original, while keeping the same format.

  • Uses ‘tokens’ to protect data.
  • Does not use a key or mathematical process.
  • Tokens serve as references or placeholders for the original data.
  • Uses a database, called a ‘token vault’, which stores the real data together with its token.
  • If a token is intercepted, it cannot be used to guess the real values.
  • Primary benefit is ease of use, because you don’t have to manage encryption keys.
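The tokenization properties listed above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the `TokenVault` class and its methods are hypothetical names, the vault is an in-memory dictionary rather than a real database, and no specific tokenization product is being modeled.

```python
import secrets
import string


class TokenVault:
    """Hypothetical in-memory stand-in for a token vault."""

    def __init__(self):
        self._token_by_value = {}  # real value -> token
        self._value_by_token = {}  # token -> real value

    def _random_token(self, value):
        # Replace each digit or letter with a random one of the same kind
        # and keep every other character, so the format is preserved.
        out = []
        for ch in value:
            if ch.isdigit():
                out.append(secrets.choice(string.digits))
            elif ch.isalpha():
                out.append(secrets.choice(string.ascii_letters))
            else:
                out.append(ch)
        return "".join(out)

    def tokenize(self, value):
        # Reuse the stored token so one value always maps to one token.
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = self._random_token(value)
        # Retry if the random token is already assigned to another value.
        while token in self._value_by_token:
            token = self._random_token(value)
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token):
        # The original value is recovered by lookup, not by computation.
        return self._value_by_token[token]


vault = TokenVault()
card = "4111-1111-1111-1111"
token = vault.tokenize(card)
```

Because the token is random, nothing about `card` can be derived from `token` alone; only the vault lookup in `detokenize` recovers it.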

Difference from data masking

Data masking is a simple technique that overwrites the original content of a piece of information. It is fast to process and impossible to “un-mask”, as the original values are removed and replaced with hard-coded masked information.

Masking can be dynamic or static. Dynamic (or in-flight) masking changes the data logically in the view but does not physically change the source data. For example, if the last 4 digits of a phone number are masked in the view, they are still visible in the source. Denodo typically applies dynamic masking to production data.

Static masking changes the actual source values. This is often used in non-production environments where developers and testers should not see PII data. Values such as the SSN (Social Security Number) or national ID and the CCN (credit card number) are physically changed to something else, but they still look like SSNs and CCNs for testing purposes. Test environments often lack production-level security and test data is frequently copied, so keeping real SSNs and CCNs in non-production environments would be a significant security exposure.

The main objective of masking is to make the original value unrecognizable, so that the user cannot obtain it. An example is masking the last digits of a phone number with asterisks, as in +1(209)9988-****. This is also called redaction.

Similarly, a call center representative only sees the last 4 digits of your credit card number to verify the account and does not have access to the entire number: everything but the last 4 digits is redacted.

Masking does not keep values unique. For example, consider these two phone numbers:

+1(209)9988-2331

+1(209)9988-8896

Both will have the same masked or redacted value +1(209)9988-****.

Because of this, masked fields can produce wrong results when they are used in operations such as distinct counts, GROUP BY and joins between two tables: masking does not keep the values unique or even recognizable.
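The effect on distinct counts can be reproduced with a short Python sketch using the two phone numbers above. The helper name `mask_last_four` is an assumption made for this illustration and is not a Denodo function.

```python
def mask_last_four(phone):
    # Redact the last four characters, as in the example above.
    return phone[:-4] + "****"


phones = ["+1(209)9988-2331", "+1(209)9988-8896"]
masked = [mask_last_four(p) for p in phones]

# Count distinct values before and after masking.
distinct_original = len(set(phones))
distinct_masked = len(set(masked))
```

Here `distinct_original` is 2 while `distinct_masked` is 1, so a distinct count, grouping or join on the masked column would treat the two customers as one.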

When the data must be hidden but the original uniqueness has to be preserved, anonymization is the technique to apply. The original values remain unidentifiable, but a unique value stays unique, so it can be used as a join key and in calculated fields, and count distinct and GROUP BY operations also work.


Techniques to Anonymize data in Denodo

Denodo provides different techniques for data anonymization. Here you can find examples of those techniques:

  • Hash function: the Denodo Hash function generates a character string derived from the original value, and the same input always produces the same output (see the Denodo documentation for more information).

The function can be applied in a selection view, or directly in a row restriction that is then assigned to the users who should see the value anonymized.

As a result, those users see the email values hashed instead of the original addresses.

Another example of when to use the Hash function is when you have sensitive data that is also a join key.  If a bank account number is present in multiple tables and you do not want it exposed to users, use the hash function to de-identify or obfuscate the actual account number. Since the hash function always generates the same output for the same input, the hashed bank account value can be used to join tables and get correct results.
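The join-key property can be sketched in Python as follows. This is a conceptual illustration only: SHA-256 from the standard library is a stand-in deterministic hash (no claim is made about the algorithm the Denodo Hash function uses), and the table contents and the `anonymize` helper are hypothetical.

```python
import hashlib


def anonymize(value):
    # Stand-in deterministic hash: same input, same output.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Two hypothetical tables that share the sensitive account number.
accounts = [("ACC-001", "retail"), ("ACC-002", "corporate")]
payments = [("ACC-001", 120.0), ("ACC-002", 75.5), ("ACC-001", 30.0)]

# Expose only the hashed account number in both tables.
hashed_accounts = {anonymize(acc): segment for acc, segment in accounts}
hashed_payments = [(anonymize(acc), amount) for acc, amount in payments]

# Join on the hashed key and aggregate per segment.
totals = {}
for key, amount in hashed_payments:
    segment = hashed_accounts[key]
    totals[segment] = totals.get(segment, 0.0) + amount
```

The join still matches every payment to its account, yet the clear account numbers never appear in the joined data.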

  • Encrypt function: this function gives you more control over the algorithm used to encrypt the information. It is part of the Denodo Xtrafuncs, so you need to download and install them into your Denodo Platform (see the Denodo documentation for more information).

With the Encrypt function you can choose the algorithm, the provider and so on, but the simplest way to call it is encrypt(password, input).

The result contains the same values in encrypted form. The main advantage of the Encrypt function is that the value can be decrypted using the Decrypt function and the proper password. In addition, it supports multiple algorithms and providers.

  • Integration with third-party systems: Denodo supports the creation of custom functions that can be used to leverage third-party systems. For anonymization, it is possible to create custom functions that integrate with systems such as Protegrity, Voltage or Cosmian. The key value of these systems is that they provide mechanisms to manage and back up the keys.

As an example for Protegrity, it is possible to use the Protegrity Java Application Protector SDK (AP Java) in a Denodo custom function to decrypt a value, e.g. unprotect(value, value_type). The custom function determines the Denodo user ID that is accessing the view and passes that user ID and the tokenized value to a Protegrity function. Protegrity looks up that user ID and value and returns the de-tokenized value.

A view using the custom function can be created as follows:

CREATE VIEW CUSTOMER AS
    SELECT
        unprotect(NAME, 'de_Name') AS NAME,
        unprotect(SSN, 'de_Number') AS SSN
    FROM CUSTOMER_TABLE;

When an authorized user executes the view, the SSN field shows its de-tokenized value.

If the user has security restrictions, Protegrity returns the secured values to the custom function, and these secured values appear in the output of the view.
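The per-user behavior described above can be simulated with a small Python sketch. Everything in it is hypothetical: `TOKEN_STORE`, `AUTHORIZED` and this `unprotect` signature are invented for illustration and do not represent the Protegrity SDK or the Denodo custom function API, where the real implementation would be written in Java.

```python
# Hypothetical state held by the external protection system.
TOKEN_STORE = {"tok_8f2a": "123-45-6789"}   # token -> clear value
AUTHORIZED = {("alice", "de_Number")}       # (user ID, data element) pairs


def unprotect(value, data_element, user_id):
    # Return the clear value only when the policy authorizes this user
    # for this data element; otherwise return the protected value as is.
    if (user_id, data_element) in AUTHORIZED:
        return TOKEN_STORE.get(value, value)
    return value


seen_by_alice = unprotect("tok_8f2a", "de_Number", "alice")
seen_by_bob = unprotect("tok_8f2a", "de_Number", "bob")
```

In this sketch the authorized user sees the de-tokenized value, while the restricted user still receives the token, which mirrors the two outcomes described for the view.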

 

The example above implements a third-party feature through a custom function. Although it uses Protegrity, other suppliers such as Voltage or Cosmian provide similar functionality that can be leveraged in the same way.

Things to consider when using anonymization

There are some things to consider when using anonymization in Denodo:

  • Performance
      • Both the Hash and the Encrypt functions anonymize the information in memory, so they require a lot of resources, especially if the result set has many rows.
      • The Hash function performs better, as it uses a standard algorithm.
      • The Encrypt function is slower, as it has to apply a custom password, algorithm and provider.
  • Caching
      • Consider caching content that is already encrypted or hashed. This saves a lot of processing time and improves performance.
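The benefit of caching already-anonymized content can be shown with a minimal Python sketch. It is a conceptual analogy, not the Denodo cache engine: the `HashedColumnCache` class is hypothetical, SHA-256 is a stand-in hash, and cache invalidation when the source changes is deliberately left out.

```python
import hashlib


class HashedColumnCache:
    """Hypothetical cache that stores a column after it has been hashed."""

    def __init__(self, source_rows):
        self._source_rows = source_rows
        self._cached = None
        self.hash_calls = 0  # counts how often the hash is computed

    def _hash(self, value):
        self.hash_calls += 1
        return hashlib.sha256(value.encode("utf-8")).hexdigest()

    def rows(self):
        # Hash every row on the first request only; later requests
        # are served from the stored, already-hashed result.
        if self._cached is None:
            self._cached = [self._hash(v) for v in self._source_rows]
        return self._cached


cache = HashedColumnCache(["a@example.com", "b@example.com"])
first = cache.rows()
second = cache.rows()
```

After both requests `cache.hash_calls` is still 2, one per source row, because the second request reuses the stored hashed values.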
