USER MANUALS

Subqueries

A subquery is a query that is embedded in another query acting as an expression or a view reference.

Subqueries can be present either in the SELECT, FROM or WHERE clauses of the query.

Subqueries in the WHERE Clause of the Query

For subqueries in the WHERE clause of the query, the Syntax of the SELECT statement (definition of <subselect condition>) contains the definition of the operators you can use to compare the output of a subquery.

The comparison conditions ALL, ANY and IN a value to a list or subquery. They must be preceded by <, <=, =, <>, >=, > and followed by a subquery.

Example 1

Obtain all the rows of the incident view whose taxid is any of the taxid of the view flat_revenue that have a revenue greater than 2,500.

The following two queries are equivalent. In the first one we use the operator IN and in the other, we show how to use = ANY.

SELECT * FROM incident
WHERE taxid IN
    (SELECT taxid
     FROM flat_revenue
     WHERE revenue > 2500)

SELECT * FROM incident
WHERE taxid = ANY
    (SELECT taxid
     FROM flat_revenue
     WHERE revenue > 2500)

As in other databases, ANY can be used with other operators as well. E.g, taxid > ANY…, taxid < ANY…, etc.

Example 2

Obtain the rows of internet_inc whose id matches the id of a row of phone_inc.

SELECT *
FROM internet_inc AS a
WHERE EXISTS
    (SELECT b.PINC_ID
     FROM PHONE_INC AS b
     WHERE a.iinc_id = b.pinc_id)

In “Example 2” we are using the alias of the main query (internet_inc AS a) in the WHERE clause of the subquery.

At runtime, the queries that contain subqueries in the WHERE clause are converted into semijoins. A semijoin is a relational operation that returns all the rows of the left-side query that have a matching row in the right-side query. As with regular joins, the Virtual DataPort server selects a semijoin method depending on the query.

The available methods for executing semijoins of subqueries are the following:

  1. Merge semijoin: processes rows that are already sorted by the join attributes. Whenever possible, Virtual DataPort selects this algorithm because is almost always the most efficient and with the lowest memory footprint.

    This method can only be selected when both sides of the join are sorted by the join attributes. If the data from both sides is obtained from JDBC or ODBC data sources, Virtual DataPort will retrieve the data sorted by the fields of the join attributes. To do this, it adds the clause ORDER BY to the query sent to the database.

  2. Hash semijoin: the subquery is executed and its results are stored in a hash table. Then, the Virtual DataPort server begins processing the results of the main query and looks for matches in the hash table. After merge semijoin, this is the most efficient algorithm, although not always can be used.

  3. Nested semijoin: the subquery is executed once for each row of the main query’s results.

Virtual DataPort does not use statistics to choose the execution method of the semijoins.

As with regular joins, the query can override the algorithm selected by the Virtual DataPort server to execute the subquery’s semijoin. To do this, add the SUBQUERYPLAN modifier to the CONTEXT clause of the subquery you want to modify its query plan. The Syntax of the CONTEXT clause contains the syntax of this modifier.

Note that the SUBQUERYPLAN modifier has to be indicated in the CONTEXT of the subquery you want to modify and not in the CONTEXT of the main query.

For example,

SELECT * FROM incidents
WHERE taxid IN
    (SELECT taxid
     FROM flat_revenue
     WHERE revenue > 2500 CONTEXT (SUBQUERYPLAN = NESTED ORDERED))

In this example, the query forces the Virtual DataPort server to use the algorithm nested semijoin.

Subqueries in the SELECT Clause of the Query

Subqueries in the SELECT clause act as any other expression in such clause, even being part of the expressions used within a function. For each tuple in the main query, their return value, if any, must be unique. So their SELECT clause can only contain one column. For example:

SELECT i.incident_id
    ,(
        SELECT sum(r.revenue)
        FROM flat_revenue r
        WHERE r.taxid = i.taxid
        )
FROM incident i

At runtime, the queries that contain subqueries in the SELECT clause are converted into left joins. If a row of the left-side query has more than a matching row in the right-side, the query will fail.

As explained in Optimizing Join Operations, a join method is selected for each join by the Virtual DataPort server to execute the query. If the subquery is correlated (it contains a reference to a field from outside the subquery, like in the example above), the join method will always be Nested. This way the subquery will be executed once for each row returned by the main query.

Add feedback