
Data Cleansing & Scoring

Hi,

Can Denodo do data cleansing, for example identifying whether records are duplicates or identical, and provide a "Similarity Score" based on rules that you design? (Please see the table below.)

| Customer Number | Full Name | Customer Number (Checking) | Similarity Score |
|---|---|---|---|
| xxxxxxx1 | John Doe | xxxxxxx1 | 100% |
| xxxxxxx2 | John Does | xxxxxxx1 | 98% |

Thank you.
user
13-06-2019 02:38:48 -0400

3 Answers

Hi, In Denodo Platform, you can obtain a similarity score for the records in a view by using the SIMILARITY text processing function. This function calculates the textual similarity between two text strings based on a given textual similarity algorithm. For more details, refer to the [SIMILARITY](https://community.denodo.com/docs/html/browse/7.0/vdp/vql/appendix/syntax_of_condition_functions/text_processing_functions#similarity) function documentation, which explains it in detail with an example. Hope this helps!
Denodo Team
13-06-2019 08:02:38 -0400
Hi, Thank you for introducing the SIMILARITY function. From the example in the documentation, it appears to compare a hard-coded text value against all the values in a column. A quick follow-up: will this also work for comparing each of the records in a base view, as in the example in my first question? In other words, can it be dynamic? Thank you.
user
14-06-2019 03:55:04 -0400
Hi, You can pass field names to the SIMILARITY function instead of hard-coding the text values. The function then compares the fields dynamically for each row and generates the score in the output. For example: `select similarity(<field_name1>,<field_name2>) from <view_name>` Hope this helps!
Denodo Team
25-06-2019 02:53:18 -0400
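
The query in the answer above compares two fields within the same row. To compare each record against other records of the same view, as in the table from the original question, one possible approach is a self-join. The following is only a sketch under stated assumptions: the view name `customer` and the fields `customer_number` and `full_name` are hypothetical, and it assumes SIMILARITY returns a score between 0 and 1 that you would scale yourself if you want a percentage.

```sql
-- Hypothetical names: view "customer", fields "customer_number" and "full_name".
-- Each row is paired with itself and with every row that has a lower
-- customer number, so each pair of records is scored only once.
SELECT a.customer_number,
       a.full_name,
       b.customer_number AS customer_number_checking,
       SIMILARITY(a.full_name, b.full_name) AS similarity_score
FROM customer AS a
     INNER JOIN customer AS b
     ON a.customer_number >= b.customer_number
```

Because every record is paired with many others, the number of comparisons grows roughly with the square of the row count, so on a large view it may be worth narrowing the join condition first (for example, to records that share a postal code or the first letter of the name).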