Aggregation Functions¶
Aggregate functions return a single result row for each group of rows of the result. Usually, you use aggregation function in the SELECT
clause of queries that also have the GROUP BY
clause. In these queries, the Execution Engine splits the rows in groups and then, applies the aggregation function over each group. The function returns one row for each group. If the query does not have the GROUP BY
clause, the Execution Engine applies the aggregation functions over all the rows and returns one row.
In many aggregation functions, the input parameters can
be preceded by one of these modifiers: ALL
or DISTINCT
. These
modifiers affect the semantics of certain aggregation functions,
applying them to all tuples in a group or only to those with a different
value for the field in question.
The aggregation functions are:
AVG¶
Description
The AVG
function returns the average of the non-null values of a
field of the table.
If all the values of the field are NULL
, the function returns
NULL
.
Syntax
AVG( <expression> ) : double
expression
. Required. The type of the expression has to beint
,long
,float
,double
ordecimal
.
Examples
Example 1
Consider the following view ITEMS
:
item |
price |
---|---|
A |
3.45 |
B |
9.99 |
C |
4.99 |
SELECT AVG(price) AS average_price
FROM ITEMS
average_price |
---|
6.1433333333333335 |
Example 2
Consider the following view ITEMS_2
:
item |
price |
---|---|
A |
3.45 |
B |
9.99 |
C |
4.99 |
D |
NULL |
SELECT AVG(price) AS average_price
FROM ITEMS_2
average_price |
---|
6.1433333333333335 |
COUNT¶
Description
The COUNT
function returns the number of tuples of the result of a
selection operation.
If the parameter is *
, it returns the number of tuples.
If the parameter is a field, it returns the number of non-null values
It is applicable to any type of field.
This function can be used in queries without a GROUP BY
clause, but
in that case, it can only be used with *
.
Syntax
COUNT( <count parameter> )
<count parameter> ::=
<field name:identifier>
| *
count parameter
. Required. It can be either a field name or*
.
Examples
Consider the following view ITEMS
:
item |
price |
---|---|
A |
3.45 |
B |
9.99 |
C |
4.99 |
D |
NULL |
Example 1
SELECT COUNT(*)
FROM items
count |
---|
4 |
Example 2
SELECT COUNT(price)
FROM items
count |
---|
3 |
FIRST¶
Description
The FIRST
function returns the first value of a field of each
group of values.
This function ignores the ALL
/DISTINCT
modifier.
Syntax
FIRST ( <field name:identifier> )
field name
. Required. A field of the view.
Examples
Consider the following view V
:
A |
B |
---|---|
group1 |
one |
group1 |
two |
group1 |
NULL |
group2 |
four |
Example 1
SELECT FIRST(b)
FROM v
first |
---|
one |
Example 2
SELECT a, FIRST(b)
FROM v
GROUP BY a
a |
first |
---|---|
group1 |
one |
group2 |
four |
GROUP_CONCAT¶
Description
The GROUP_CONCAT
function returns, for each group, a string with the
concatenation of all the field/fields values of each group.
Syntax
GROUP_CONCAT( [ DISTINCT | ALL ] <field name:identifier>)
GROUP_CONCAT( [ <row separator:text> [, <field separator:text> ] ],
<field name:identifier> [, <field name:identifier>]* )
GROUP_CONCAT( <ignore null:boolean> , <row separator:text>, <field separator:text>,
<field name:identifier> [, <field name:identifier>]* )
ignore null
. Optional. Iftrue
and any of the fieldsfield name
of a row areNULL
,GROUP_CONCAT
ignores all the fields of that row. Iffalse
, no rows are ignored andNULL
values are treated as empty characters. The default value istrue
.row separator
. Optional. Literal used to separate the values of each row. Default value:,
.field separator
. Optional. Literal used to separate the values of the fields of the same row. The default value is a whitespace.field name
. Required. Field which contains the values to concatenate.
Examples
Consider the following view V
:
a |
b |
c |
---|---|---|
group1 |
1 |
one |
group1 |
2 |
two |
group1 |
NULL |
three |
group2 |
4 |
four |
Example 1
SELECT A, GROUP_CONCAT(':', c)
FROM v
GROUP BY a
a |
group_concat |
---|---|
group1 |
one:two:three |
group2 |
four |
Example 2
SELECT A, GROUP_CONCAT(':', ';', b, c)
FROM v
GROUP BY a
a |
group_concat |
---|---|
group1 |
1;one:2;two |
group2 |
4:four |
As the field B
is NULL
in the third row, GROUP_CONCAT
ignores all the fields of that row. That is, it ignores the value
three
.
Example 3
SELECT a, GROUP_CONCAT(false, ':', ';', b, c)
FROM v
GROUP BY a
a |
group_concat |
---|---|
group1 |
1;one:2;two:;three |
group2 |
four |
As the parameter ignoreNulls
is false
, GROUP_CONCAT
does not
ignore the value of the field C
of the third row (three
),
even if the value of the field B
of that row is NULL
. In this
case, NULL
values are treated like empty characters.
LAST¶
Description
The LAST
function returns the last value of a field of each
group of values.
This function ignores the ALL
/DISTINCT
modifier.
Syntax
LAST ( <field name:identifier> )
field name
. Required. Field name of the view.
Examples
Consider the following view V
:
a |
b |
---|---|
group1 |
one |
group1 |
two |
group1 |
NULL |
group2 |
four |
Example 1
SELECT LAST(b)
FROM v
last |
---|
four |
Example 2
SELECT a, LAST(b)
FROM v
GROUP BY a
a |
last |
---|---|
group1 |
NULL |
group2 |
four |
LIST¶
Description
The LIST
function returns an array with all the values of a
specified field.
It has the same behavior as the function NEST
(section NEST) when
invoked with a single field as argument.
Syntax
LIST ( <field name:identifier> )
field name
. Required. Field of the view.
Examples
Consider the following view V
:
a |
b |
---|---|
group1 |
one |
group1 |
two |
group1 |
NULL |
group2 |
four |
Example 1
SELECT LIST(b)
FROM v
list |
---|
Array { one, two, NULL, four } |
Example 2
SELECT a, LIST(b)
FROM v
GROUP BY a
a |
list |
---|---|
group1 |
Array { one, two, NULL } |
group2 |
Array { four } |
MAX¶
Description
The MAX
function returns the highest value of a field for each
group of values.
This function ignores the ALL
/DISTINCT
modifier.
Syntax
MAX ( <expression> )
expression
. Required. Expression of typedate
(deprecated type),intervaldaysecond
,intervalyearmonth
,text
,time
,timestamp
,timestamptz
or any of the numeric data types.
The return type is the type of the input expression.
When the field is of text type, the function compares the Unicode value of each character of the value.
Examples
Consider the following view V
:
a |
b |
c |
---|---|---|
group1 |
1 |
Virtual DataPort |
group1 |
2 |
Data Catalog |
group2 |
4 |
denodo |
Example 1
SELECT MAX(b)
FROM v
max |
---|
4 |
Example 2
SELECT a, MAX(b)
FROM v
GROUP BY a
a |
max |
---|---|
group1 |
2 |
group2 |
4 |
Example 3
SELECT MAX(c)
FROM v
max |
---|
denodo |
In this example, the result is “denodo” because the first letter has the highest Unicode value of all the values: d = 100, V = 86, etc.
MEDIAN¶
Description
The MEDIAN
function returns the middle number of a field for each
group of values, taken as the average of the two middle numbers when the group has an even number of values.
This function does not take into account NULL
values to calculate
the result.
This function ignores the modifiers ALL
and DISTINCT
.
Syntax
MEDIAN ( <expression> )
expression
. Required. Expression of typedate
(deprecated type) ,time
,timestamp
,timestamptz
or any of the numeric data types.
Examples
Consider the following view employee
:
id |
first_name |
start_date |
salary |
---|---|---|---|
01 |
Jason |
25-JUL-96 |
1234.56 |
02 |
Alison |
21-MAR-76 |
6661.78 |
03 |
James |
12-DEC-78 |
6544.78 |
04 |
Celia |
24-OCT-82 |
2344.78 |
05 |
Robert |
24-OCT-82 |
2334.78 |
05 |
Jason |
30-JUL-87 |
2224.50 |
Example 1
SELECT MEDIAN(salary), MEDIAN(start_date)
FROM employee
median(salary) |
median(start_date) |
---|---|
2339.78 |
1983-06-05 |
Example 2
SELECT first_name, MEDIAN(salary), MEDIAN(start_date)
FROM employee
GROUP BY first_name
first_name |
median(salary) |
median(start_date) |
---|---|---|
Alison |
6661.78 |
1976-03-21 |
Celia |
2344.78 |
1982-10-24 |
James |
6544.78 |
1978-12-12 |
Jason |
1729.53 |
1992-01-26 |
Robert |
2334.78 |
1984-01-15 |
MIN¶
Description
The MIN
function returns the lowest value of a field for each
group of values.
This function ignores the ALL
/DISTINCT
modifier.
Syntax
MIN ( <expression> )
expression
. Required. Expression of typedate
(deprecated type),intervaldaysecond
,intervalyearmonth
,text
,time
,timestamp
,timestamptz
or any of the numeric data types.
When the field is of text type, the function compares the Unicode value of each character of the value.
Examples
Consider the following view V
:
a |
b |
c |
---|---|---|
group1 |
1 |
Virtual DataPort |
group1 |
2 |
Data Catalog |
group2 |
4 |
denodo |
Example 1
SELECT MIN(b)
FROM V
min |
---|
1 |
Example 2
SELECT a, MIN(b)
FROM v
GROUP BY a
a |
min |
---|---|
group1 |
1 |
group2 |
4 |
Example 3
SELECT MIN(c)
FROM v
min |
---|
Data Catalog |
In this example, the result is “Data Catalog” because the first letter has the lowest Unicode value of all the values: D = 68, d = 100, V = 86, etc.
NEST¶
Description
The NEST
function returns an array with the values of the selected
fields. Its result is inverse to the result of the FLATTEN views (see
section FLATTEN View (Flattening Data Structures) for more
information about FLATTEN views).
Syntax
NEST( <field name:identifier> [, <field name:identifier> ]*):array
NEST(*)
field name
. The name of a field. Using(*)
is equivalent to pass all the fields of the view to the function.
Example
Consider the view V
:
int_sample |
text_sample |
register_sample |
---|---|---|
1 |
A |
Register { hello , how’re you } |
1 |
B |
Register { hello, good bye } |
2 |
C |
Register { another string, last string } |
SELECT int_sample, NEST(text_sample, register_sample) AS nest_sample
FROM V
GROUP BY int_sample;
int_sample |
nest_sample |
---|---|
1 |
Array [ A, Register { hello , how’re you } B, Register { hello, good bye } ] |
2 |
Array [ C, Register { another string, last string } ] |
STDEV¶
Description
The STDEV
function returns the sample standard deviation of each
group of values.
This function does not take into account NULL
values to calculate
the result.
Syntax
STDEV( <expression> ) : decimal
expression
. Required. Any numeric data type.
Example
Consider the view sales_by_region
:
region |
state |
revenue |
---|---|---|
South |
Alabama |
53168 |
South |
Mississippi |
5681 |
South |
Tennessee |
80166 |
Northeast |
New York |
12945 |
Northeast |
Pennsylvania |
69284 |
Northeast |
New Hampshire |
53168 |
SELECT region, STDEV(revenue)
FROM revenue
GROUP BY region
region |
stdev |
---|---|
South |
37709.24377832752 |
Northeast |
29016.36924794922 |
STDEVP¶
Description
The STDEVP
function returns the population standard deviation of
each group of values.
This function does not take into account NULL
values to calculate
the result.
Syntax
STDEVP( <expression> ) : decimal
expression
. Required. Any numeric data type.
Example
Consider the view sales_by_region
:
region |
state |
revenue |
---|---|---|
South |
Alabama |
53168 |
South |
Mississippi |
5681 |
South |
Tennessee |
80166 |
Northeast |
New York |
12945 |
Northeast |
Pennsylvania |
69284 |
Northeast |
New Hampshire |
53168 |
SELECT region, STDEVP(revenue)
FROM revenue
GROUP BY region
region |
stdevp |
---|---|
South |
30789.46861437455 |
Northeast |
23691.76628188695 |
SUM¶
Description
The SUM
function returns the sum of all non-null values of a
field for each group of values.
Syntax
SUM ( <expression> )
expression
. Required. Any numeric data type.
The return type is the type of the input expression.
Examples
Consider the following view V
:
a |
b |
---|---|
group1 |
1 |
group1 |
2 |
group1 |
NULL |
group2 |
4 |
Example 1
SELECT SUM(b)
FROM v
sum |
---|
7 |
Example 2
SELECT a, SUM(b)
FROM v
GROUP BY a
a |
sum |
---|---|
group1 |
3 |
group2 |
4 |
VAR¶
Description
The VAR
function returns the sample variance of each group of
values.
This function does not take into account NULL
values to calculate
the result.
Syntax
VAR( <expression> ) : decimal
expression
. Required. Any numeric data type.
Example
Consider the view sales_by_region
:
region |
state |
revenue |
---|---|---|
South |
Alabama |
53168 |
South |
Mississippi |
5681 |
South |
Tennessee |
80166 |
Northeast |
New York |
12945 |
Northeast |
Pennsylvania |
69284 |
Northeast |
New Hampshire |
53168 |
SELECT region, VAR(revenue)
FROM revenue
GROUP BY region
region |
var |
---|---|
South |
1.4219870663333333E9 |
Northeast |
8.419496843333333E8 |
VARP¶
Description
The VARP
function returns the population variance of each group of
values.
This function does not take into account NULL
values to calculate
the result.
Syntax
VARP( <expression> ) : double
expression
. Required. Any numeric data type.
Example
Consider the view sales_by_region
:
region |
state |
revenue |
---|---|---|
South |
Alabama |
53168 |
South |
Mississippi |
5681 |
South |
Tennessee |
80166 |
Northeast |
New York |
12945 |
Northeast |
Pennsylvania |
69284 |
Northeast |
New Hampshire |
53168 |
SELECT region, VARP(revenue)
FROM revenue
GROUP BY region
region |
varp |
---|---|
South |
9.479913775555555E8 |
Northeast |
5.612997895555555E8 |