Data anonymization is the process of protecting private or sensitive information by erasing or encrypting the identifiers that connect an individual to stored data. Several techniques can be used to achieve this, such as encryption, tokenization, and masking.
Tokenization and encryption are two ways of securing information, both in transit and at rest. However, they are not the same thing and are not interchangeable.
Tokenization converts an input string into a meaningless string that has no relation to the original, while keeping the same format.
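A minimal sketch of the idea, assuming a simple in-memory lookup table: each digit or letter is swapped for a random one of the same kind, separators are kept so the format is preserved, and the original can only be recovered through the table. The `TokenVault` class and its method names are hypothetical and do not correspond to any Denodo or vendor API.

```python
import secrets
import string


class TokenVault:
    """Toy in-memory token vault (hypothetical, for illustration only)."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def _random_like(self, value):
        # Swap digits for random digits and letters for random letters of the
        # same case; keep every other character so the format is preserved.
        out = []
        for ch in value:
            if ch.isdigit():
                out.append(secrets.choice(string.digits))
            elif ch.isalpha():
                pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
                out.append(secrets.choice(pool))
            else:
                out.append(ch)
        return "".join(out)

    def tokenize(self, value):
        # The same input always maps to the same token.
        if value in self._token_by_value:
            return self._token_by_value[value]
        # Retry a bounded number of times if a token is already in use.
        for _ in range(100):
            token = self._random_like(value)
            if token not in self._value_by_token:
                break
        else:
            raise RuntimeError("could not find an unused token")
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token):
        # Only the vault can map a token back to the original value.
        return self._value_by_token[token]


vault = TokenVault()
token = vault.tokenize("123-45-6789")  # made-up sample value
print(token)                           # random digits in the same NNN-NN-NNNN layout
print(vault.detokenize(token))
```

The token carries no information about the original value; whoever controls the lookup table controls who can reverse it.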
Data masking is a simple technique that overwrites the original content of a piece of information. It is simple, fast to process, and impossible to “un-mask”, because the original values are removed and replaced with the masked values.
Masking can be dynamic or static. Dynamic, or in-flight, masking changes the data logically in the view but does not physically change the source data. For example, the last 4 digits of a masked phone number are still visible in the source. Denodo typically applies dynamic masking against production data.
Static masking changes the actual source values. It is often used in non-production environments where you do not want developers and testers to see PII. An SSN (Social Security number) or national ID and a CCN (credit card number) are physically changed to something else, but they still look like SSNs and CCNs for testing purposes. Test environments often lack production-level security and test data is frequently copied, so having real SSNs and CCNs there would be a major security exposure.
The main objective of masking is to make the original value unrecognizable, so that the user cannot obtain it. For example, the last digits of a phone number can be masked with asterisks, as in +1(209)9988-****. This is also called redaction.
For example, a call center representative sees only the last 4 digits of your credit card number to verify the account but does not have access to the entire number, so everything but the last 4 digits is redacted.
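The difference between the two modes can be sketched in a few lines, assuming rows held as plain dictionaries. The function names and the sample row are made up for illustration. For brevity the static variant writes asterisks, whereas the static masking described above would typically substitute realistic-looking values.

```python
def mask_last4(value):
    """Replace the last four characters with asterisks."""
    if len(value) < 4:
        return "*" * len(value)
    return value[:-4] + "****"


def dynamic_mask(rows, column):
    # Dynamic masking: build masked copies at read time.
    # The source rows are left untouched.
    return [{**row, column: mask_last4(row[column])} for row in rows]


def static_mask(rows, column):
    # Static masking: overwrite the stored values in place.
    # The originals are no longer available afterwards.
    for row in rows:
        row[column] = mask_last4(row[column])
    return rows


source = [{"id": 1, "phone": "+1(209)9988-1234"}]  # made-up sample row

print(dynamic_mask(source, "phone"))  # masked view
print(source)                         # source still holds the full number

static_mask(source, "phone")
print(source)                         # source itself is now masked
```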
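Both redaction patterns mentioned above, hiding the tail of a phone number and showing only the tail of a card number, could look roughly like this. The helper names and the sample values are assumptions made for the example.

```python
def mask_tail(value, hidden=4, mask_char="*"):
    """Hide the last `hidden` characters, e.g. the end of a phone number."""
    hidden = min(hidden, len(value))
    return value[: len(value) - hidden] + mask_char * hidden


def keep_tail(value, visible=4, mask_char="*"):
    """Hide every digit except the last `visible` ones, keeping separators."""
    total_digits = sum(ch.isdigit() for ch in value)
    to_hide = max(total_digits - visible, 0)
    out = []
    for ch in value:
        if ch.isdigit() and to_hide > 0:
            out.append(mask_char)
            to_hide -= 1
        else:
            out.append(ch)
    return "".join(out)


print(mask_tail("+1(209)9988-1234"))     # +1(209)9988-****
print(keep_tail("4111-1111-1111-1234"))  # ****-****-****-1234
```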
Masking does not preserve uniqueness: two phone numbers that differ only in their last four digits will both have the same masked or redacted value, +1(209)9988-****.
Because of this characteristic, masked fields used in operations such as distinct counts, GROUP BY, and joins between two tables can produce wrong results, since masking does not keep values unique or even recognizable.
When you need data to be masked while preserving the original uniqueness, anonymization is the technique to apply. It still guarantees that the original values are not identifiable, but a unique value remains unique, so it can be used as a key in joins and calculated fields, and count distinct and GROUP BY will also work.
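The effect on distinct counts and grouping is easy to reproduce. The following sketch uses made-up phone numbers and a hypothetical `mask_last4` helper:

```python
from collections import Counter


def mask_last4(value):
    """Replace the last four characters with asterisks."""
    return value[:-4] + "****"


# Three calls from two different (made-up) phone numbers.
calls = ["+1(209)9988-1234", "+1(209)9988-5678", "+1(209)9988-1234"]
masked_calls = [mask_last4(number) for number in calls]

print(len(set(calls)))         # 2 distinct callers
print(len(set(masked_calls)))  # 1: both numbers collapse to the same value

# Grouping has the same problem: two groups become one.
print(Counter(calls))
print(Counter(masked_calls))
```

After masking, the two callers can no longer be told apart, so any count, grouping, or join on that column merges them.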
Denodo provides different techniques for data anonymization. Here you can find examples of those techniques:
This can be done in a selection view or directly in a row restriction, which is then assigned to the users who need to see the anonymized value:
As a result, the users will see a set of hashed emails, as shown in the next screenshot.
Another example of when to use the Hash function is when you have sensitive data that is also a join key. If a bank account number is present in multiple tables and you do not want it exposed to users, use the hash function to de-identify or obfuscate the actual account number. Since the hash function always generates the same output for the same input, the hashed bank account value can be used to join tables and get correct results.
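The join-key property can be illustrated outside Denodo with a short sketch. SHA-256 from Python's standard `hashlib` is used here as an assumption; the algorithm applied by the Hash function is not specified in this text. The tables and column names are made up.

```python
import hashlib
from collections import defaultdict


def hash_key(value):
    """Deterministic one-way hash of a join key (SHA-256 chosen for illustration)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Made-up sample tables that share a sensitive account number.
customers = [
    {"account": "ACC-1001", "segment": "retail"},
    {"account": "ACC-2002", "segment": "business"},
]
payments = [
    {"account": "ACC-1001", "amount": 50},
    {"account": "ACC-2002", "amount": 200},
    {"account": "ACC-1001", "amount": 25},
]

# Each table is anonymized independently; the raw account number is dropped.
anon_customers = [
    {"account_hash": hash_key(c["account"]), "segment": c["segment"]} for c in customers
]
anon_payments = [
    {"account_hash": hash_key(p["account"]), "amount": p["amount"]} for p in payments
]


def total_by_segment(customer_rows, payment_rows):
    # Join on the hashed key, then sum the payments per segment.
    segment_by_hash = {c["account_hash"]: c["segment"] for c in customer_rows}
    totals = defaultdict(int)
    for p in payment_rows:
        totals[segment_by_hash[p["account_hash"]]] += p["amount"]
    return dict(totals)


print(total_by_segment(anon_customers, anon_payments))
# {'retail': 75, 'business': 200}
```

Because equal inputs produce equal hashes, the join and the aggregation give the same result as they would on the raw account numbers, without exposing them.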
The encrypt function lets you choose algorithms, providers, and so on, but the simplest way to call it is encrypt(password, input):
We can see the same values, but encrypted. The great advantage of the encrypt function is that the value can be decrypted using the Decrypt function and the proper password. In addition, it supports multiple algorithms and providers.
As an example for Protegrity, it is possible to use the Protegrity Java Application Protector SDK (AP Java) in a Denodo custom function to decrypt a value, e.g. unprotect(value, value_type). The custom function determines the Denodo user ID that is accessing the view and passes that user ID and the tokenized value to a Protegrity function. Protegrity looks up that user ID and value and returns the de-tokenized value.
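A view using the custom function can be created as follows, where <source_view> stands for the underlying view that holds the protected columns:
CREATE VIEW CUSTOMER AS
SELECT
unprotect(NAME, 'de_Name') AS NAME,
unprotect(SSN, 'de_Number') AS SSN
FROM <source_view>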
When the view is executed by an authorized user, the SSN field shows its de-tokenized value.
If the user has security restrictions, Protegrity will return the secured values to the custom function and output of the view.
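The decision flow described above (user ID plus protected value in, clear or still-protected value out) can be sketched with plain Python stand-ins. Nothing below is the Protegrity or Denodo API: `TOKEN_VAULT`, `POLICY`, and this `unprotect` signature are hypothetical, and a real custom function would obtain the user ID from the session and delegate the lookup to the vendor SDK.

```python
# Hypothetical stand-ins for the vendor's token store and access policy.
TOKEN_VAULT = {"tok_8f3a": "123-45-6789"}  # protected value -> clear value
POLICY = {"alice": {"de_Number"}}          # user ID -> data elements they may unprotect


def unprotect(user_id, value, data_element):
    """Return the clear value for authorized users, else the protected value."""
    allowed = data_element in POLICY.get(user_id, set())
    if allowed and value in TOKEN_VAULT:
        return TOKEN_VAULT[value]
    # Users without permission keep seeing the secured value.
    return value


print(unprotect("alice", "tok_8f3a", "de_Number"))  # 123-45-6789
print(unprotect("bob", "tok_8f3a", "de_Number"))    # tok_8f3a
```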
As we saw above, we implemented a third-party feature using a custom function. Although this example uses Protegrity, other suppliers like Voltage or Cosmian provide similar functionality that can be leveraged in the same way.
There are some things to consider when using anonymization in Denodo: