Embedding Functions

These functions operate on values of type VECTOR. They fall into two groups: vector generation functions, which transform input data into vector representations, and distance functions, which measure the distance or dissimilarity between two vectors.

The embedding functions are:

EMBED_AI

Description

Calculates a vector embedding for a specific text string using an external embedding model.

Embedding models translate text into abstract numerical sequences (vectors). This representation allows the system to measure the similarity between different text values, which is the foundation for semantic search, clustering, and classification.

Virtual DataPort does not generate vectors internally. Instead, it relies on external providers configured in the Server configuration.

Important

  • Subscription: The Enterprise Plus subscription bundle is required. To verify the active bundle, check the About dialog in the Design Studio. Refer to the section Denodo Platform - Subscription Bundles for more information.

  • Configuration: You must configure the Embedding Model in the Server configuration before you use this function.

Syntax

EMBED_AI( <input:text> ):VECTOR
EMBED_AI( <input:text>, <model:text> ):VECTOR
EMBED_AI( <input:text>, <field:VECTOR> ):VECTOR

Parameters

  • input: The text to analyze.

  • model: The name of the specific model to use for generating the embedding. This parameter influences whether the execution occurs in the source database or in Virtual DataPort (see Processing Logic below).

  • field: A field of type vector. The function inspects the Source Type Properties of this field to determine which embedding model to use.

Processing logic

The execution location and the model used depend on the parameters provided:

  1. Input only: EMBED_AI( <input> ). Virtual DataPort executes the function locally. It generates the vector using the default model defined in the global Server configuration for Vector Database. If no default model is configured, the function returns NULL.

  2. Input and Model: EMBED_AI( <input>, <model> ). Virtual DataPort attempts to delegate the function to the underlying data source.

    • If delegation is possible: The function is pushed down to the data source and executed using the specific model name provided.

    • If delegation is not possible: Virtual DataPort verifies if the specified model matches the model configured in the Embedding Model Configuration.

      • If the models match, Virtual DataPort executes the function locally.

      • If the models do not match, the function returns NULL.

  3. Input and Field: EMBED_AI( <input>, <field> ). Virtual DataPort attempts to extract the embedding model name from the Embedding model source type property of the specified field.

    • If the field has the “Embedding model” source property: The function behaves identically to EMBED_AI( <input>, <model> ). This allows you to generate compatible vectors without manually typing the model name.

      For example, consider a base view with a field named embedding_column. This field has the model all-minilm:l6-v2 configured in its Source Type Properties.

      In this scenario, the following two functions are equivalent:

      • EMBED_AI( 'hello world!', embedding_column )

      • EMBED_AI( 'hello world!', 'all-minilm:l6-v2' )

    • If the field does not have the “Embedding model” source property: The function behaves like the Input only scenario (see item 1). It uses the default global configuration.
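The decision rules above can be modeled outside Virtual DataPort as a small Python sketch. This is only an illustration of the documented rules, not the server's implementation: the function name `resolve_embed_ai`, its parameters, and the location labels are hypothetical, and `None` stands in for NULL. For the field form, pass the value of the field's Embedding model property as `model`, or `None` when the property is absent.

```python
from typing import Optional, Tuple


def resolve_embed_ai(
    configured_model: Optional[str],
    model: Optional[str] = None,
    can_delegate: bool = False,
) -> Optional[Tuple[str, str]]:
    """Return (execution location, model name), or None where the manual says NULL.

    configured_model: model set in the Server configuration (None if missing).
    model: second argument of EMBED_AI, or the field's "Embedding model"
           property value; None for the input-only form or a field without
           that property.
    can_delegate: whether the data source can execute the function.
    """
    if model is None:
        # Case 1 (and case 3 without the property): local execution with
        # the default model, or NULL when no default model is configured.
        if configured_model is None:
            return None
        return ("virtual_dataport", configured_model)
    if can_delegate:
        # Case 2, delegation possible: pushed down with the given model name.
        return ("data_source", model)
    if model == configured_model:
        # Case 2, no delegation, models match: local execution.
        return ("virtual_dataport", model)
    # Case 2, no delegation, models differ: NULL.
    return None


print(resolve_embed_ai("all-minilm:l6-v2"))
print(resolve_embed_ai("all-minilm:l6-v2", model="other-model"))
```

The second call returns `None` because the requested model cannot be delegated and differs from the configured one, which mirrors the NULL result described in item 2.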

Returns

  • Vector: A numeric vector representing the text embedding.

  • NULL: The function returns NULL if an error occurs during generation, if the configuration is missing, or if the specified model does not match the configured model. Check the logs for more details.

Important

Model Consistency

Different embedding models generate completely different vectors for the same input. You cannot compare vectors generated by different models. To perform meaningful distance or similarity calculations, ensure you generate all comparison vectors using the exact same embedding model.

Example

SELECT EMBED_AI('Hello World!')

embed_ai

[0.345345398559845, 0.834273423429537, 0.834746523453934, …]

VECTOR_DISTANCE

Description

Calculates the distance between a vector column and a text input.

This function acts as a wrapper that simplifies semantic search queries. It automatically detects the embedding model associated with the vector column, generates an embedding for the input text, and calculates the distance using the specified metric.

Using VECTOR_DISTANCE is equivalent to combining a specific distance function (such as VECTOR_COSINE_DISTANCE) with the EMBED_AI function.

Syntax

VECTOR_DISTANCE( <field:vector>, <input:text> [, <metric:text> ] ):double

Parameters

  • field: The column (field) containing the vector data.

  • input: The text string to compare against the vector field.

  • metric: (Optional) The distance metric to use. If you omit this parameter, the system uses cosine by default.

Processing logic

When you execute this function, Virtual DataPort performs the following steps:

  1. Identify the model: The system inspects the Source Type Properties of the field to find the embedding model property. This property contains the name of the AI model used to generate the stored vectors.

  2. Generate the embedding: The system uses the identified model (or the server default if the property is missing) to convert the input text into a vector.

  3. Calculate distance: The system calculates the distance between the stored vector and the generated vector using the specified metric.
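The three steps can be sketched in Python with toy stand-ins for the external embedding models. Everything named here (`TOY_EMBEDDERS`, `toy-model-a`, `toy-model-b`, `vector_distance`) is hypothetical and exists only to make the flow concrete. Real embedding models return high-dimensional vectors from an external provider.

```python
import math

# Hypothetical stand-ins for external embedding models: each maps text to a
# tiny vector in its own, mutually incompatible way.
TOY_EMBEDDERS = {
    "toy-model-a": lambda text: [float(len(text)), 1.0],
    "toy-model-b": lambda text: [1.0, float(len(text))],
}


def cosine_distance(p, q):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norms = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norms


def vector_distance(stored_vector, field_model, text, default_model,
                    metric=cosine_distance):
    # Step 1: identify the model from the field property, falling back to
    # the server default when the property is missing (None here).
    model = field_model if field_model is not None else default_model
    # Step 2: embed the input text with that model.
    query_vector = TOY_EMBEDDERS[model](text)
    # Step 3: apply the chosen metric to the stored and generated vectors.
    return metric(stored_vector, query_vector)


stored = TOY_EMBEDDERS["toy-model-a"]("hello")  # vector stored in the table

# Property present: same model on both sides, so the distance is about 0.
print(vector_distance(stored, "toy-model-a", "hello", "toy-model-b"))
# Property missing: the fallback model differs, so the result is not meaningful.
print(vector_distance(stored, None, "hello", "toy-model-b"))
```

In this toy setup the same text is at distance of roughly zero from itself when the models match, but far from itself when the fallback model differs from the one that produced the stored vector.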

Important

For this function to work correctly, the field must have the Embedding model source type property defined in its base view, as shown in Configure the source properties of a vector field in a base view.

If this property is missing, Virtual DataPort cannot determine which model created the stored vectors. In this scenario, the system generates the embedding for the input text using the Embedding Model Configuration set up in the Server configuration. If the default server model differs from the model used to populate the table, the resulting distance is not meaningful, because vectors generated by different models cannot be compared.

Supported metrics

The metric parameter determines which underlying distance function Virtual DataPort uses.

Metric Parameter  | Internal Rewrite               | Description
cosine (Default)  | VECTOR_COSINE_DISTANCE         | Measures the cosine of the angle between two vectors.
euclidean or l2   | VECTOR_L2_DISTANCE             | Measures the straight-line distance between two vectors.
manhattan or l1   | VECTOR_L1_DISTANCE             | Measures the sum of the absolute differences of their coordinates.
inner_product     | VECTOR_NEGATIVE_INNER_PRODUCT  | Calculates the negative inner product.
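The mapping from metric names to distance functions can be written down as a lookup table. The Python sketch below builds the rewritten expression as a string purely for illustration. The helper name `rewrite_vector_distance` is hypothetical, the quote-doubling follows standard SQL string escaping (assumed here), and raising `KeyError` for an unlisted metric is a choice of this sketch, since the manual does not state how Virtual DataPort handles that case.

```python
# Metric names and target functions as listed in the supported metrics.
METRIC_TO_FUNCTION = {
    "cosine": "VECTOR_COSINE_DISTANCE",
    "euclidean": "VECTOR_L2_DISTANCE",
    "l2": "VECTOR_L2_DISTANCE",
    "manhattan": "VECTOR_L1_DISTANCE",
    "l1": "VECTOR_L1_DISTANCE",
    "inner_product": "VECTOR_NEGATIVE_INNER_PRODUCT",
}


def rewrite_vector_distance(field: str, text: str, metric: str = "cosine") -> str:
    """Build the expression that VECTOR_DISTANCE is described as equivalent to."""
    function = METRIC_TO_FUNCTION[metric]  # KeyError for an unlisted metric
    escaped = text.replace("'", "''")      # standard SQL quote doubling (assumed)
    return f"{function}({field}, EMBED_AI('{escaped}', {field}))"


print(rewrite_vector_distance("embedding_column", "hello world!"))
print(rewrite_vector_distance("embedding_column", "hello world!", "euclidean"))
```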

Examples

Example 1: Default behavior (Cosine)

When you do not specify a metric, the function defaults to cosine distance.

SELECT VECTOR_DISTANCE(embedding_column, 'hello world!')

Virtual DataPort internally rewrites this query to:

SELECT VECTOR_COSINE_DISTANCE(embedding_column, EMBED_AI('hello world!', embedding_column))

Example 2: Specifying a metric (Euclidean)

You can specify a metric such as euclidean (or l2).

SELECT VECTOR_DISTANCE(embedding_column, 'hello world!', 'euclidean')

Virtual DataPort internally rewrites this query to:

SELECT VECTOR_L2_DISTANCE(embedding_column, EMBED_AI('hello world!', embedding_column))

VECTOR_COSINE_DISTANCE

Description

The cosine distance between two vectors measures their angular dissimilarity and is calculated by subtracting their cosine similarity from 1. Cosine similarity measures directional alignment and is calculated by dividing the inner product of the vectors by the product of their magnitudes. A larger distance value indicates greater dissimilarity between the vectors. A distance of zero means the vectors point in the same direction (identical vectors are one such case), a distance of one means they are orthogonal, and a distance of two means they point in opposite directions.

Note

Use the cosine distance metric as the default function unless you have a specific preference for another metric. This function generally yields consistent results. Additionally, most databases support cosine distance, which increases the probability of query delegation.

Given two n-dimensional vectors p and q:

\[ d(p, q) = 1 - \frac{\sum_{i=1}^{n} p_i q_i}{\sqrt{\sum_{i=1}^{n} p_i^2}\,\sqrt{\sum_{i=1}^{n} q_i^2}} \]

Syntax

VECTOR_COSINE_DISTANCE( <v1:VECTOR>, <v2:VECTOR> ):double

Returns

  • The cosine distance between two input vectors as a double value.

  • The function returns NULL if the vector dimensions do not match or if any input vector is NULL.

Example

SELECT VECTOR_COSINE_DISTANCE(vector<float,3>[1.0,4.0,3.0], vector<float,3>[1.0,1.0,6.0])

vector_cosine_distance

0.2682725225778554
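As a cross-check of the example, the definition can be reproduced in plain Python. This is a reference sketch of the formula, not Virtual DataPort's implementation. The function name is reused only for readability, and `None` stands in for NULL. Zero-magnitude vectors are not handled, because the manual does not specify that case.

```python
import math
from typing import Optional, Sequence


def vector_cosine_distance(p: Optional[Sequence[float]],
                           q: Optional[Sequence[float]]) -> Optional[float]:
    # Mirror the documented NULL cases: a NULL input or mismatched dimensions.
    if p is None or q is None or len(p) != len(q):
        return None
    dot = sum(a * b for a, b in zip(p, q))        # inner product
    norm_p = math.sqrt(sum(a * a for a in p))     # magnitude of p
    norm_q = math.sqrt(sum(b * b for b in q))     # magnitude of q
    return 1.0 - dot / (norm_p * norm_q)          # 1 - cosine similarity


print(vector_cosine_distance([1.0, 4.0, 3.0], [1.0, 1.0, 6.0]))
```

For these vectors the inner product is 23 and the magnitudes are the square roots of 26 and 38, so the result is approximately 0.26827, in agreement with the example.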

VECTOR_L1_DISTANCE

Description

The Manhattan distance between two vectors is calculated by summing the absolute differences of their corresponding coordinates. To calculate this distance, both vectors must have the same dimension. A larger distance value indicates a greater difference or dissimilarity between the vectors. A distance of zero means the vectors are identical.

Given two n-dimensional vectors p and q:

\[ d(p, q) = \sum_{i=1}^{n} \lvert p_i - q_i \rvert \]

Syntax

VECTOR_L1_DISTANCE( <v1:VECTOR>, <v2:VECTOR> ):double

Returns

  • The Manhattan distance between two input vectors as a double value.

  • The function returns NULL if the vector dimensions do not match or if any input vector is NULL.

Example

SELECT VECTOR_L1_DISTANCE(vector<float,3>[1.0,4.0,3.0], vector<float,3>[1.0,1.0,6.0])

vector_l1_distance

6
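The same example can be checked with a short Python sketch of the definition. It is a reference calculation only, with `None` standing in for NULL.

```python
from typing import Optional, Sequence


def vector_l1_distance(p: Optional[Sequence[float]],
                       q: Optional[Sequence[float]]) -> Optional[float]:
    # Mirror the documented NULL cases: a NULL input or mismatched dimensions.
    if p is None or q is None or len(p) != len(q):
        return None
    # Sum of the absolute coordinate-wise differences.
    return float(sum(abs(a - b) for a, b in zip(p, q)))


print(vector_l1_distance([1.0, 4.0, 3.0], [1.0, 1.0, 6.0]))  # prints 6.0
```

The coordinate differences are 0, 3, and 3, which sum to 6.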

VECTOR_L2_DISTANCE

Description

The Euclidean distance between two vectors is calculated as the square root of the sum of the squared differences of their corresponding coordinates. A larger distance value indicates greater dissimilarity between the vectors, whereas a distance of zero indicates that the vectors are identical.

Given two n-dimensional vectors p and q:

\[ d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \]

Syntax

VECTOR_L2_DISTANCE( <v1:VECTOR>, <v2:VECTOR> ):double

Returns

  • The Euclidean distance between two input vectors as a double value.

  • The function returns NULL if the vector dimensions do not match or if any input vector is NULL.

Example

SELECT VECTOR_L2_DISTANCE(vector<float,3>[1.0,4.0,3.0], vector<float,3>[1.0,1.0,6.0])

vector_l2_distance

4.242640687119285
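A plain Python sketch of the definition gives the same value for the example vectors. It is a reference calculation only, with `None` standing in for NULL.

```python
import math
from typing import Optional, Sequence


def vector_l2_distance(p: Optional[Sequence[float]],
                       q: Optional[Sequence[float]]) -> Optional[float]:
    # Mirror the documented NULL cases: a NULL input or mismatched dimensions.
    if p is None or q is None or len(p) != len(q):
        return None
    # Square root of the sum of squared coordinate-wise differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(vector_l2_distance([1.0, 4.0, 3.0], [1.0, 1.0, 6.0]))
```

The squared differences are 0, 9, and 9, so the result is the square root of 18, approximately 4.2426.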

VECTOR_NEGATIVE_INNER_PRODUCT

Description

The VECTOR_NEGATIVE_INNER_PRODUCT function returns the negated dot product of two vectors.

While the standard inner product measures alignment—where larger values indicate greater similarity—the negated version is used primarily as a distance metric. By inverting the sign, the measure follows the standard distance convention: a smaller (more negative) value indicates that the vectors are more closely aligned, whereas a larger value indicates they are more divergent.

Given two n-dimensional vectors p and q:

\[ d(p, q) = -\sum_{i=1}^{n} p_i q_i \]

Syntax

VECTOR_NEGATIVE_INNER_PRODUCT( <v1:VECTOR>, <v2:VECTOR> ):double

Returns

  • The negative inner product of the two input vectors as a double value. Smaller values indicate higher similarity.

  • The function returns NULL if the vector dimensions do not match or if any input vector is NULL.

Example

SELECT VECTOR_NEGATIVE_INNER_PRODUCT(vector<float,3>[1.0,4.0,3.0], vector<float,3>[1.0,1.0,6.0])

vector_negative_inner_product

-23
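Under the definition given in the Description (the negated dot product), the value for these vectors can be worked out with a short Python sketch. It is a reference calculation only, with `None` standing in for NULL.

```python
from typing import Optional, Sequence


def vector_negative_inner_product(p: Optional[Sequence[float]],
                                  q: Optional[Sequence[float]]) -> Optional[float]:
    # Mirror the documented NULL cases: a NULL input or mismatched dimensions.
    if p is None or q is None or len(p) != len(q):
        return None
    # Dot product of the two vectors, with the sign inverted.
    return float(-sum(a * b for a, b in zip(p, q)))


print(vector_negative_inner_product([1.0, 4.0, 3.0], [1.0, 1.0, 6.0]))  # prints -23.0
```

The dot product is 1·1 + 4·1 + 3·6 = 23, so the negated value is -23. More closely aligned vectors produce a more negative result.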
