Subqueries¶
A subquery is a query that is embedded in another query acting as an expression or a view reference.
Subqueries can be present either in the SELECT
, FROM
or WHERE
clauses of the query.
Subqueries in the WHERE Clause of the Query¶
For subqueries in the WHERE clause of the query, the Syntax of the SELECT statement (definition of
<subselect condition>
) contains the definition of the operators you
can use to compare the output of a subquery.
The comparison conditions ALL
, ANY
and IN
a value to a list
or subquery. They must be preceded by <
, <=
, =
, <>
,
>=
, >
and followed by a subquery.
Example 1
Obtain all the rows of the incident
view whose taxid
is any of
the taxid
of the view flat_revenue
that have a revenue greater
than 2,500.
The following two queries are equivalent. In the first one we use the
operator IN
and in the other, we show how to use = ANY
.
SELECT * FROM incident
WHERE taxid IN
(SELECT taxid
FROM flat_revenue
WHERE revenue > 2500)
SELECT * FROM incident
WHERE taxid = ANY
(SELECT taxid
FROM flat_revenue
WHERE revenue > 2500)
As in other databases, ANY
can be used with other operators as well.
E.g, taxid > ANY
…, taxid < ANY
…, etc.
Example 2
Obtain the rows of internet_inc
whose id matches the id of a row of
phone_inc
.
SELECT *
FROM internet_inc AS a
WHERE EXISTS
(SELECT b.PINC_ID
FROM PHONE_INC AS b
WHERE a.iinc_id = b.pinc_id)
In “Example 2” we are using the alias of the main query
(internet_inc AS a
) in the WHERE
clause of the subquery.
At runtime, the queries that contain subqueries in the WHERE
clause are converted into
semijoins. A semijoin is a relational operation that returns all the
rows of the left-side query that have a matching row in the right-side
query. As with regular joins, the Virtual DataPort server selects a semijoin method
depending on the query.
The available methods for executing semijoins of subqueries are the following:
Merge semijoin: processes rows that are already sorted by the join attributes. Whenever possible, Virtual DataPort selects this algorithm because is almost always the most efficient and with the lowest memory footprint.
This method can only be selected when both sides of the join are sorted by the join attributes. If the data from both sides is obtained from JDBC or ODBC data sources, Virtual DataPort will retrieve the data sorted by the fields of the join attributes. To do this, it adds the clause ORDER BY to the query sent to the database.
Hash semijoin: the subquery is executed and its results are stored in a hash table. Then, the Virtual DataPort server begins processing the results of the main query and looks for matches in the hash table. After merge semijoin, this is the most efficient algorithm, although not always can be used.
Nested semijoin: the subquery is executed once for each row of the main query’s results.
Virtual DataPort does not use statistics to choose the execution method of the semijoins.
As with regular joins, the query can override the algorithm selected by
the Virtual DataPort server to execute the subquery’s semijoin. To do this, add the
SUBQUERYPLAN
modifier to the CONTEXT
clause of the subquery you
want to modify its query plan. The Syntax of the CONTEXT clause
contains the syntax of this modifier.
Note that the SUBQUERYPLAN
modifier has to be indicated in the
CONTEXT
of the subquery you want to modify and not in the
CONTEXT
of the main query.
For example,
SELECT * FROM incidents
WHERE taxid IN
(SELECT taxid
FROM flat_revenue
WHERE revenue > 2500 CONTEXT (SUBQUERYPLAN = NESTED ORDERED))
In this example, the query forces the Virtual DataPort server to use the algorithm nested semijoin.
Subqueries in the SELECT Clause of the Query¶
Subqueries in the SELECT
clause act as any other expression in such clause, even being part of the expressions used
within a function. For each tuple in the main query, their return value, if any, must be unique. So their SELECT
clause can only contain one column. For example:
SELECT i.incident_id
,(
SELECT sum(r.revenue)
FROM flat_revenue r
WHERE r.taxid = i.taxid
)
FROM incident i
At runtime, the queries that contain subqueries in the SELECT
clause are converted into left joins. If a row of the
left-side query has more than a matching row in the right-side, the query will fail.
As explained in Optimizing Join Operations, a join method is selected for each join by the Virtual DataPort server to execute the query. If the subquery is correlated (it contains a reference to a field from outside the subquery, like in the example above), the join method will always be Nested. This way the subquery will be executed once for each row returned by the main query.