USER MANUALS

Text Functions

Text processing functions manipulate and transform values of type text.

Usually input values of types that are not text, are automatically converted to text values before applying the function.

The text functions are:

ASCII

Description

The ASCII function returns the Unicode code point of the first character of the input value.

Syntax

ASCII( <value:text> ):int
  • value. Required.

Example

SELECT ASCII('Denodo') AS value1, ASCII('興味深い例') AS value2

value1

value2

68

33288

BASE64_TO_HEX

Description

The BASE64_TO_HEX function converts a String from base64 to hexadecimal(hex) or base16.

Syntax

BASE64_TO_HEX( <value:string> ):string
  • value. Required. String which you want to convert.

Example

SELECT BASE64_TO_HEX(base64_string) as
hex_string
FROM my_table;

base64_string

hex_string

VGhlIHJhaW4gaW4gc3BhaW4gaXMgYWx3YXlzIGluIHRoZSBwbGFpbg==

546865207261696e20696e20737061696e20697320616c7761797320696e2074686520706c61696e

CHAR

Description

The CHAR function returns the character associated to a Unicode code point.

When the input code point is higher than 255 and the function is pushed down to Impala or Hive, the function returns NULL. That is because they do not support code points higher than 255.

Syntax

CHAR( <code:int> ):text
  • code. Required. Unicode code point of the character you want to obtain.

Example

SELECT CHAR(65) AS char_1, CHAR(945) AS char_alpha;

char_1

char_alpha

A

α

CONCAT

Description

The CONCAT function concatenates two or more text values.

Syntax

CONCAT( <value 1:text>, <value 2:text> [, <value N:text> ]* ):text
  • value 1. Required. The first text item to be concatenated.

  • value 2. Required. The second text item to be concatenated.

  • value N. Optional. One or more arguments to be concatenated.

Example

SELECT original_text, CONCAT('I like to fly to ', originalText, ' every
month') as concat_text
FROM my_table;

original_text

concat_text

San Francisco, CA

I like to fly to San Francisco, CA every month

San Jose, CA

I like to fly to San Jose, CA every month

Birmingham, AL

I like to fly to Birmingham, AL every month

NY, NY

I like to fly to NY, NY every month

CONCATLIST

Description

The CONCATLIST function returns one value with all the values of an array. Each value is separated by a character.

Syntax

CONCATLIST( [ <separator:text> ,] <value:array> ):text
  • separator (optional): character that separates each value. If not present, the separator is an space character.

  • value (required): array whose values will be concatenated.

Examples

Example 1

SELECT CONCATLIST( { ROW ('1', '2') } )
FROM my_table;

concatlist

1 2

Example 2

SELECT CONCATLIST(',', { ROW ('1', '2') })
FROM my_table;

concatlist

1,2

DELETESPACES

Description

The DELETESPACES function removes all the spaces of a string.

Syntax

DELETESPACES( <value:text>):text
  • value. Required. The text to remove all spaces.

Example

SELECT DELETESPACES(originalText) as deletespaces_text
FROM my_table;

original_text

deletespaces_text

I like reading

Ilikereading

HEX_TO_BASE64

Description

The HEX_TO_BASE64 function converts a String from hexadecimal(hex) or base16 to base64.

Syntax

HEX_TO_BASE64( <value:text> ):text
  • value. Required. String which you want to convert.

Example

SELECT HEX_TO_BASE64(hex_string) as base64_string FROM my_table;

hex_string

base64_string

546865207261696e20696e20737061696e20697320616c7761797320696e2074686520706c61696e

VGhlIHJhaW4gaW4gc3BhaW4gaXMgYWx3YXlzIGluIHRoZSBwbGFpbg==

INSTR

Description

The INSTR function returns the index of a string within another string.

Syntax

INSTR( <str1:text>, <str2:text> ):int

Returns the index of the first character of the first occurrence of str2 within str1.

The index of the first character is 0.

If str1 is NULL, the function returns NULL.

If str2 is not present within str1, it returns -1.

This function is case-sensitive when executed by Virtual DataPort. However, when it is pushed-down to the database, the behavior may be different. There are databases where this function is case-sensitive and others where is case-insensitive.

Example

SELECT original_text, INSTR(originalText, 'i') as result
FROM myTable;

original_text

result

San Francisco, CA

9

San Jose, CA

-1

Birmingham, AL

1

NY, NY

-1

LEN

Description

The LEN function returns the number of characters in a text string

Syntax

LEN( <value:text> ):int
  • value. Required. The text whose length you want to find. Spaces count as characters.

Example

SELECT original_text, LEN(originalText) as len_text
FROM myTable;

original_text

len_text

San Francisco, CA

18

San Jose, CA

13

Birmingham, AL

15

NY, NY

7

LOWER

Description

The LOWER function converts text to lowercase.

Syntax

LOWER( <value:text> ):text
  • value. Required. Text to convert to lowercase.

Example

SELECT original_text, LOWER(originalText) as lower_text
FROM Mytable;

original_text

lower_text

San Francisco, CA

san francisco, ca

San Jose, CA

san jose, ca

Birmingham, AL

birmingham, al

NY, NY

ny, ny

LTRIM

Description

The LTRIM function returns the input value, without its leading white spaces and carriage returns.

Syntax

LTRIM( <value:text> ):text
  • value. Required.

MAX

Description

The MAX function returns the lexicographically greatest argument of the list. The function compares the Unicode value of each character of the input values.

This function is case sensitive.

Syntax

MAX( <value 1:text>, <value 2:text> [, <value N:text> ]* ):text
  • value 1. Required.

  • value 2. Required.

  • value N. Optional. One or more arguments.

Examples

Example 1

SELECT MAX('DENODO', 'Virtual DataPort')
FROM Dual();

max

Virtual DataPort

Example 2

SELECT MAX('denodo', 'Virtual DataPort')
FROM Dual();

max

denodo

In this example, the result is “denodo” because the first letter has the highest Unicode value: d = 100 and V = 86.

MIN

Description

The MIN function returns the lexicographically lowest argument of the list. The function compares the Unicode value of each character of the input values.

This function is case sensitive.

Syntax

MIN( <value 1:text>, <value 2:text> [, <value N:text> ]*): text
  • value 1. Required.

  • value 2. Required.

  • value N. Optional.

Examples

Example 1

SELECT MIN('Data Catalog', 'Virtual DataPort', 'Solution Manager')
FROM Dual();

min

Data Catalog

Example 2

SELECT MIN('i', 'v', 'a')
FROM Dual();

min

a

In this example, the result is “a” because the first letter has the lowest Unicode value: a = 97, i = 105 and v = 118.

POSITION

Description

The POSITION function returns the first position, if any, at which one string (value1) occurs within another (value2).

Syntax

POSITION( <value 1:text> IN <value 2:text> ) : int
  • value 1. Text you want to search in value2.

  • value 2.

If value 1 or value 2 are NULL, the function returns NULL.

If the length of value 1 is zero, the function returns one.

If value 1 does not occur in value 2, the function returns zero.

Example

SELECT POSITION('no' IN 'Denodo') AS pos_1, POSITION('z' IN 'Denodo') AS
pos_2

pos_1

pos_2

3

0

PRINTF

Description

The PRINTF function performs a String.format(…) formatting operation just for values of type double.

Syntax

PRINTF( [ <locale:text>, ] <value 2:text>, <value 3:double> ): text
  • locale (optional): Use this to get locale-specific formatting of numbers (e.g. en_US, es_ES, …)

  • value 2. Required. A valid Java format expression. Find more information in the Java documentation.

  • value 3. Required. Double value.

Examples

Example 1

SELECT PRINTF('Item price is %f', 0.1)
FROM Dual();

printf

Item price is 0.100000

Example 2

SELECT PRINTF('es_ES', 'Item price is %f', 0.1)
FROM Dual();

printf

Item price is 0,100000

PROPERCASE

Description

The PROPERCASE function returns the first letter capitalized and the rest in lower case.

Syntax

PROPERCASE( <value:text> ):text
  • value. Required.

Examples

Example 1

SELECT PROPERCASE('CALIFORNIA')
FROM Dual();

propercase

California

Example 2

SELECT PROPERCASE('cALIFORNIA')
FROM Dual();

propercase

California

REGEXP

Description

The REGEXP function replaces each substring of the input string that matches the given regular expression, with the given replacement.

If you simply want to replace a string with another string, we suggest you use the function REPLACE instead. That is because REPLACE interprets the input parameters as literals and in REGEXP, the second and third parameters are regular expressions. If you are unfamiliar with regular expressions, you may end up with an unexpected result because some characters have special meaning. For example, in a regular expression . matches any character, not just the dot character.

Syntax

REGEXP( <original text:text>, <regex:text>, <replacement:text> ):text
  • original text. Required. Input string.

  • regex. Required. Regular expression to which original text is matched.

  • replacement. Required. Each match of regular expression will be replaced by this. This value is also a regular expression so you can specify capturing groups.

If any of these parameters is null, the function returns null.

regex is a regular expression so you can pass any text value but take into account that some characters will have special meaning. E.g. . represents any character not just the dot, \d represents a digit, etc.

This function follows the behavior of the Java regular expressions. Find more information in the Java documentation about regular expressions. This function is equivalent to the following Java code:

originalText.replaceAll(regularExpression, replacement);

The characters ^ and $ only match at the beginning and the end, respectively, of the input value. To detect line terminators, use \n.

Examples

Example 1

Replacing the character “#” with “*”.

SELECT REGEXP('########## DATABASE ##########', '#', '*') AS result
FROM Dual();

result

****** DATABASE ******

Example 2

SELECT REGEXP('#.#.#.#.#. DATABASE #.#.#.#.#.', '.', '#') AS result
FROM Dual();

result

##############################

In this example, all the characters are replaced with “#” because in a regular expression, the character . matches any character.

Example 3

Regular expression with the character class “\d”, which represents digits, and a capturing group:

SELECT REGEXP('NUMBER: 3829022', '(?i)number: (\d+)', 'Value: $1') AS result
FROM Dual();

result

Value: 3829022

Note the following things:

  • The second parameter starts with (?i). This turns on the case-insensitive matching. That is why number... matches NUMBER: .... Without (?i), the expression would not match.

  • In the third parameter, the $1 represents the first capturing group. That is, the (\d+) of the second parameter.

REGEXP_COUNT

Description

The REGEXP_COUNT function returns the number of times a pattern (a regular expression) occurs in a string.

Syntax

REGEXP_COUNT(String originalText, String regex): int
  • original text. Required. Input string.

  • regex. Required. Regular expression to which original text is matched in order to count the number of matches with the same.

If any of these parameters is null, the function returns null. By default, the matching with the regular expression is case sensitive.

regex is a regular expression so you can pass any text value but take into account that some characters will have special meaning. E.g. . represents any character not just the dot, \d represents a digit, etc.

This function follows the behavior of the Java regular expressions. Find more information in the Java documentation about regular expressions. This function is equivalent to the following Java code:

Examples

Example 1

If you want to obtain results that take the caseInsensitive into account, you must define it in the pattern

SELECT REGEXP_COUNT('SSSs', '(?i)s') AS result
FROM Dual();

result

Value: 4

Example 2

We count how many ‘a’ are in the originalText field

SELECT REGEXP_COUNT('DAtabase', 'a') AS result
FROM Dual();

result

Value: 2

REMOVEACCENTS

Description

The REMOVEACCENTS function replaces all characters with an accent with the same characters without accent.

Syntax

REMOVEACCENTS( <value:text> ):text
  • value. Required. Text you want to remove accents from.

Example

SELECT REMOVEACCENTS('bё áéíóú àèìòù') as text_without_accent
FROM Dual();

text_without_accent

Bё aeiou aeiou

REPEAT

Description

The REPEAT function repeats a text a given number of times.

Syntax

REPEAT ( <value:text>, <count:int> ):text
  • value. Required. The text you want to repeat

  • count. Required. Number of times value will be repeated. If 0 or less than 0, the function returns an empty string.

Example

SELECT REPEAT('Denodo ', 3), REPEAT('Platform', 0)

repeat

repeat_

Denodo Denodo Denodo

REPLACE

Description

The REPLACE function replaces all the occurrences of a specified string with another string.

Syntax

REPLACE( <value:text>, <from:text>, <to:text> ):text
  • value. Required. Text which you want to replace some/all of it.

  • from. Required. All occurrences to be replaced.

  • to. Required. Text which will replace all the occurrences of from.

This function returns null if value is null or from is an empty string. If from or to is null, the function returns value.

Example

SELECT original_text, REPLACE(originalText, 'CA', 'California') as
replace_text
FROM my_table;

original_text

replace_text

San Francisco, CA

San Francisco, California

San Jose, CA

San Jose, California

Birmingham, AL

Birmingham, AL

NY, NY

NY, NY

REPLACEMAP

Description

The REPLACEMAP function, whose input parameters are a text value and a list of key-value pairs, replaces all the occurrences of each key in the text, with its value.

The list of key-value pairs can be obtained from a view or a map (see section Defining a Map):

  • If the list is obtained from a view, the keys are obtained from one of the fields of the view and the values, from another.

  • If the list is obtained from a map, consider this:

    • Duplicated entries in the map are discarded when the map is created.

    • When the values of the map does not contain any key, the order of the keys does not matter. Otherwise, the replacements are performed iteratively, following the natural order of the keys.

Syntax 1

REPLACEMAP( <search text:text>, <map_name:text> ):text
  • search_text. Required. Text which you want to replace some/all of it.

  • map_name. Required. Name of the map that contains the key/value pairs.

If map_name does not exist, the function returns NULL.

Syntax 2

REPLACEMAP( <search_text:text>, <database name.view name:text>, <keyField:text>, <valueField:text> ):text
  • searchText. Required. Text which you want to replace some/all of it.

  • database name.view name. Required. View that contains the key/value pairs. For example, customer360.Report Timeline (“customer360” is the database of the view “Report Timeline”; note the . between the database name and the view name).

  • keyField. Required. The field from view_name which contains the keys.

  • valueField. Required. The field from view_name which contains the values.

If viewName does not exist, the function returns NULL.

If keyField or valueField are not fields of the view, the function returns NULL.

Examples

Example 1

Consider the following map:

Now consider the following query:
CREATE MAP simple "daysOfTheWeek" (
    'Sun' = 'Sunday'
    'Mon' = 'Monday'
    'Tus' = 'Tuesday'
    'Wed' = 'Wednesday'
    'Thur'= 'Thursday'
    'Fri' = 'Friday'
    'Sat' = 'Saturday'
);
SELECT text_block
    , replacemap (textblock, 'daysOfTheWeek') as text_block_with_full_name
FROM V;

text_block

text_block_with_full_name

I like to travel on Sun

I like to travel on Sunday

I am available to travel on Mon

I am available to travel on Monday

My best day of vacation is Sat because I see my relatives on Wed

My best day of vacation is Saturday because I see my relatives on Wednesday

The third row contains two keys of the map (“Sat” and “Wed”) and they are both replaced by the value of these keys in the map.

In the CREATE MAP statement, we have surrounded the name of the map with the double quotes. Otherwise, the map, or any other element, is created in lowercase. I.e. CREATE MAP daysOfTheWeek... creates the map daysoftheweek.

The second parameter of the function has to be the name of the map with the same case it was created. I.e., if you have created the map with the statement CREATE MAP daysOfTheWeek..., the second parameter of REPLACEMAP has to be daysoftheweek (in lowercase)

The section Element Names (Identifiers) gives more details about the identifiers of elements in Virtual DataPort.

Example 2

Consider the view days_of_the_week:

full_day_name

abbreviated_format

Sunday

Sun

Monday

Mon

Tuestday

Tus

Wednesday

Wed

Thursday

Thur

Friday

Fri

Saturday

Sat

Now consider the following query:

-- In this example, "customer360" is the database that has the view "days_of_the_week".

SELECT text_block,
replacemap (text_block, 'customer360.days_of_the_week', 'abbreviated_format',
'full_day_name') AS text_block_with_full_name
FROM V;

text_block

text_block_with_full_name

I like to travel on Sun

I like to travel on Sunday

I am available to travel on Mon

I am available to travel on Monday

My best day of vacation is Sat because I see my relatives on Wed

My best day of vacation is Saturday because I see my relatives on Wednesday

RTRIM

Description

The RTRIM function returns the input value, without its trailing white spaces and carriage returns.

Syntax

RTRIM( <value:text> ):text
  • value. Required.

SIMILARITY

Description

The SIMILARITY function calculates the textual similarity between two text strings based on a given textual similarity algorithm.

Syntax

SIMILARITY( <value 1:text>, <value 2:text> [ , <algorithm:text> ]):double
  • value 1. Required. Text to be compared.

  • value 2. Required. Text to be compared with value1.

  • algorithm. Optional. Algorithm to use. Virtual DataPort provides the following textual similarity algorithms (“JaroWinklerTFIDF” by default):

Algorithms Based on Distance Between Text Strings

Algorithms Based on the Appearance of Common Terms in the Texts

Combinations of Both

ScaledLevenshtein

TFIDF

JaroWinklerTFIDF

JaroWinkler

Jaccard

Jaro

UnsmoothedJS

Level2 Jaro

MongeElkan

Level2MongeElkan

Example

SELECT city, SIMILARITY(city , 'San') as similarity
FROM V
ORDER BY similarity DESC

city

similarity

San Jose

0.71

San Francisco

0.71

NY

0.00

Birmingham

0.00

SPLIT

Description

The SPLIT function splits strings around matches of a given regular expression and returns an array containing these substrings.

The results do not contain the regular expression.

Syntax

SPLIT( <regexp:text>, <value:text> ):array
  • regexp. Required. A regular expression. The substrings that match this regular expression are not included in the result.

  • value. Required. Field name or text to split.

Examples

Consider the following view V:

a

b

10.10

I am some text

-80.10

Text is $% needed always

20.50

Text for a living

NULL

NULL

Example 1

SELECT SPLIT('0', a), SPLIT('Text\s+\w+', b)
FROM V

split

split_1

Array { { 1 } { .1 } }

Array { I am some text }

Array { { -8 } { .1 } }

Array { , $% needed always}

Array { { 2 } { .5 } }

Array { , a living }

NULL

NULL

The regular expression Text\s+\w+ captures the word “Text” and the word next to it.

Example 2

SELECT split(' ', B)
FROM V

SPLIT

Array { { I } { am} { some} { text} }

Array { { Text } { is } { $% } { needed } { always } }

Array { { Text } { for } { a } { living } }

NULL

Example 3

SELECT split('\.', A)
FROM V

SPLIT

Array { { 10 } { 10 } }

Array { { -80 } { 10 } }

Array { { 20 } { 50 } }

NULL

The regular expression “\\.” captures the character dot. That is because it has been escaped with the character \. Otherwise, . matches any character.

SUBSTRING / SUBSTR

Description

The SUBSTRING and SUBSTR functions return a substring of an input string.

Note that the results of Syntax 1 and Syntax 2 are not equivalent. When this function is used with Syntax 2, it behaves as specified in the standard SQL-92.

Syntax 1

SUBSTRING( <value:text>, <start index:int> [, <end index:int> ]):text
  • value. Required. Text string containing the characters to extract.

  • start index. Required. Index of the first character of the new substring. The index of the first character of the input string is 0.

  • end index. Optional. Index of the last character.

With this syntax, the function returns a substring that begins at start index of the input string.

If start index is a negative value, the function returns a substring that begins at the index 0.

If end index is not present, the result goes from start index to the end of the input value.

If end index is present, the result goes from start index and extends to the character at index endIndex-1. Thus the length of the result is endIndex-startIndex.

The function returns an empty string if one of the following conditions are met:

  • start index and end index are equal.

  • start index is greater than the length of value.

The function returns NULL if at least one of the following conditions are met:

  • Any of the parameters are NULL.

  • start index is greater than end index.

Syntax 2

SUBSTRING( <value:text> FROM <start index:int> [ FOR <length:int> ] ):text

SUBSTR( <value:text> FROM <start index:int> [ FOR <length:int> ] ):text

SUBSTR( <value:text>, <start index:int> [, <length:int> ] ):text

These three ways of invoking the function behave exactly in the same way.

The behavior of this syntax is the defined for the function SUBSTRING in the standard SQL-92:

  • The function returns a substring that begins at start index of the input string (with this syntax, the index of the first character is 1 while in Syntax 1, is 0)

  • If length is not present, the substring extends to the end of the input value.

  • If length is present, the substring has the length indicated by this value or shorter, if the length of input string is lower than start_index + length.

  • If at least one of the parameters is NULL, the function returns NULL.

  • If length is present and, start index or length are negative values, the following formula specifies the result: the function will return the L characters of value starting at the character C being:

    • L = Minimum (start index + length, length of value + 1) - Maximum (start_index, 1)

    • C = the larger of start index and 1.

  • If length is not present and start index is a negative value, the function will return value.

Example of Syntax 1

SELECT city, SUBSTRING(city, 1), SUBSTRING(city, 1, 5)
FROM locations

city

substring

substring_1

San Jose

an Jose

an J

San Francisco

an Francisco

an F

Birmingham

irmingham

irmi

NY

Y

Y

Example of Syntax 2

SELECT city, SUBSTRING(city FROM 2), SUBSTRING(city FROM 3 FOR 5)
FROM locations

city

substring

substring_1

San Jose

an Jose

n Jos

San Francisco

an Francisco

n Fra

Birmingham

irmingham

rming

NY

Y

<empty string>

Example of Syntax 2

SELECT SUBSTRING ('Denodo' from -2 for 4) AS f

f

D

TRIM

Description

The TRIM function returns the input string without its leading and/or trailing pad characters. By default, the pad characters to remove are the whitespace and the carriage return. But you can indicate a different character.

LTRIM (see section LTRIM) only removes the leading white spaces and carriage returns.

RTRIM (see section RTRIM) only removes the trailing white spaces and carriage returns.

Syntax 1

With the first one, the characters removed are the spaces and carriage returns.

With the second one, by default the function removes the spaces but not the carriage returns.

Syntax 2

TRIM ( <value:text> )
  • value. Required. Text from which you want to remove the spaces and carriage returns.

Syntax

TRIM ( [ <trim specification> [ <trim character:text> ] FROM ] <value:text> )

<trim specification> ::=
    LEADING
  | TRAILING
  | BOTH
  • value. Required. Text from which you want to remove the pad character. By default, the space character

    If this parameter is NULL, the function returns NULL.

  • LEADING/TRAILING/BOTH. Optional.

    • LEADING: removes the pad character from the beginning of the input value.

    • TRAILING: removes the pad character from the end of the input value.

    • BOTH: removes the pad character from the beginning and end of the input value.

    Not adding this token is equivalent to adding the token BOTH.

  • trim character. Optional. The function will remove this character from the leading/trailing value. If this parameter contains more than one character, only the first one is taken into account, the rest are ignored.

Examples

Example 1

Query that removes the white spaces and carriage returns from the beginning and end of the input value.

SELECT original_text, TRIM(originaltext) as trim_text
FROM mytable;

original_text

trim_text

San Francisco , CA

San Francisco , CA

San Jose , CA

San Jose , CA

Birmingham , AL

Birmingham , AL

NY, NY

NY, NY

Example 2

Query that removes the white spaces - but not the carriage returns - from the beginning of the value.

SELECT original_text, TRIM( LEADING FROM originaltext) as trim_text
FROM mytable;

original_text

trim_text

San Francisco , CA

San Francisco , CA

San Jose , CA

San Jose , CA

Birmingham , AL

Birmingham , AL

NY, NY

NY, NY

Example 3

Query that removes the character “c” from the end of the value.

SELECT original_text, TRIM( TRAILING 'c' FROM original_text ) as trim_text
FROM mytable;

original_text

trim_text

aaabbcbccc

aaabbcb

aaabbb

aaabbb

<null>

<null>

UPPER

Description

The UPPER function converts text to uppercase.

Syntax

UPPER( <value:text> ):text
  • value. Required. Text to convert to uppercase.

Example

SELECT original_text, UPPER(originalText) as upper_text
FROM Mytable;

original_text

upper_text

San Francisco , CA

SAN FRANCISCO , CA

San Jose , CA

SAN JOSE , CA

Birmingham , AL

BIRMINGHAM , AL

NY, NY

NY , NY

Add feedback