Denodo Couchbase Custom Wrapper
You can translate the document:
Introduction
denodo-couchbase-customwrapper is a Virtual DataPort custom wrapper for querying Couchbase collections.
It bridges the gap between NoSQL and relational databases by establishing a predefined schema for output.
What is Couchbase?
Couchbase is an open source distributed, JSON document database, with a powerful search engine and in-built operational and analytical capabilities. It brings the power of NoSQL to the edge and provides fast, efficient bidirectional synchronization of data between the edge and the cloud.
It exposes a scale-out, key-value store with managed cache for sub-millisecond data operations, purpose-built indexers for efficient queries, and a powerful query engine for executing SQL-like queries.
Couchbase stores data as items. Each item consists of a key, by which the item is referenced, and an associated value, which must be either binary or a JSON document. Each document consists of one or more attributes, each of which has its own value. An attribute’s value can be a basic type, such as a number, string or boolean, or a complex type, such as an embedded document or an array.
Items are stored in named buckets and each bucket contains a hierarchy of scopes and collections to group keys and values logically. Collections allow documents to be grouped by purpose or theme, according to a specified scope.
Couchbase uses a document-oriented query language called SQL++. The most important difference between SQL++ and SQL is the data model. In a relational database, data is constrained to tables with uniform structure. By contrast, SQL++ handles data as free-form documents, gathered into large collections called keyscapes. There is no uniformity, nor is there any logical proximity shared by objects of the same data shape within the keyspace. A document might be considered equivalent to a row in a relational table, with each of the document’s attributes being equivalent to a column. Couchbase, however, can store JSON documents with varied schemas.
Requirements
This Custom wrapper requires a Couchbase Server 7.0 or newer.
Architecture and Features
denodo-couchbase-customwrapper allows us to create base views on Couchbase collections and execute SQL queries on those collections.
This wrapper uses the Couchbase Java client 3.7.7 version, and it can create Denodo base views to read Couchbase collections and documents as if they were tables and rows. Each base view will be created on a specific collection.
Once the base views are created, VDP users will be able to perform queries that get delegated to Couchbase, create Denodo derived views on top of the base views, etc.
Couchbase Custom Wrapper Architecture
This is a brief summary of the wrapper’s current features:
- User and password must be specified as input values when building the data source.
- By default, the schema is defined using the Couchbase INFER statement. It enables us to infer the metadata of documents in a keyspace. You can specify the sample size that must be used to analyze and identify the structure of documents.
- Alternatively the schema could be defined as a comma-separated list of field names.
- Lacking fields in returned documents are returned as null.
- Types can be specified from constants in java.sql.Types, which correspond to SQL standard types.
- Field projections are delegated to Couchbase.
- Some query conditions are delegated to Couchbase.
- AND and OR conditions.
- Operators: =, <>, <, >, <=, >=, LIKE, IN, IS NULL, IS NOT NULL.
- ORDER BY clause is delegated to Couchbase.
Usage
Importing the custom wrapper into VDP
In order to use the Couchbase Custom Wrapper in VDP, we must import it into VDP using Design Studio.
From the denodo-couchbase-customwrapper distribution, we will select the denodo-couchbase-customwrapper-${version}-jar-with-dependencies.jar file and upload it from File → Extension management → Import option from the Design Studio menu.
No other jars are required as this one will already contain all the required dependencies, including the Couchbase Java client classes.
Import option in Design Studio
Creating a Couchbase Data Source
Once the custom wrapper jar file has been uploaded to VDP using the Design Studio, we can create new data sources for this custom wrapper –and their corresponding base views– as usual.
Go to New → Data Source → Custom, enable Select jars and select the jar file of the Couchbase custom wrapper.
Couchbase Data Source
Click to refresh the input parameters of the data source.
Data source parameters
Data sources created from the denodo-couchbase-wrapper need to fill in the following parameters.
- Connection string: Connection string to your Couchbase Server instance. The Java SDK includes Capella’s standard Certificate Authority (CA) certificates by default, so any additional configuration isn`t needed. Capella requires TLS, which you can enable by using a connection string that starts with couchbases:// (note the final 's').
[couchbases://]host1[:port1]
- Timeout: The timeout value to configure the cluster environment.
- User: User to connect to Couchbase Server.
- Password: User’s password to connect to Couchbase Server.
- Bucket name: The name of the bucket where your data is stored in your Couchbase Server.
Creating Base Views
Basics
Once the custom wrapper has been registered, click on Create a base view.
Base view parameters
- Scope: Mandatory. The scope of the collection. All the queries will be executed at the scope level
- Collection name: Mandatory. The name of the collection of documents you want to retrieve.
- Fields: The fields we would like to import as columns in VDP. We must keep the syntax:
field1:type1[,field2:type2,...]
Type should be one of the constants in java.sql.Types (note these are SQL standard types). If type is a date/time type you must add the pattern that matches the values of the property between the [ and ] characters. For example:
'date : timestamp_with_timezone [yyyy-MM-dd HH:mm:ss Z]'
- Infer query options: If the fields parameter is empty, the Couchbase INFER statement (without options) will be used to infer the base view schema. Fill in this parameter if you want to specify INFER statement options following the syntax:
“option1”:value1[,”option2”:value2[,”option3”:value3]]
Example
Schemas
Specifying Fields
In the following example we want to import a route collection of a travel-sample bucket using the fields parameter:
Couchbase Route Base View using fields
The resulting schema for this route collection can be seen in the image below.
Couchbase Route Base View schema
Couchbase Route Base View execution
Infer query options
If you don’t want to explicitly define the schema, it will be inferred using the Couchbase INFER statement (by default, without parameters). If you want to specify some optional parameters you can fill in the infer query options param. In the following example we are going to import the hotel collection adding the sample_size and the num_sample_values options to the INFER statement:
"sample_size": 3000, "num_sample_values": 2
Couchbase Base Hotel View with infer query options
Couchbase Hotel Base View schema
Couchbase Hotel Base View execution
Array and record details of Couchbase Base View execution
Querying Couchbase Collections
Once base views have been created, queries can be issued through the VQL Shell tool or any of the other Virtual Data Port query interfaces.
Querying a Couchbase View
Limitations
Write operations
Insert/update/delete operations are not supported through this custom wrapper at its current version.
Date/time data types
Date/time data types are not directly supported by Couchbase. Instead these values are represented as strings, so some kind of conversion needs to be done at the VDP side to parse them as date/time data types: Date, Time or Timestamp. This is done by trying to apply one of the following patterns, and it is expected that all the values in the Couchbase property match the same pattern:
- yyyy-MM-dd
- yyyy-MM-dd HH:mm:ss
- yyyy-MM-dd HH:mm:ss z
- yyyy-MM-dd HH:mm:ss Z
- yyyy-MM-dd’T’hh:mm:ss
- yyyy-MM-dd’T’hh:mm:ssz
- yyyy-MM-dd’T’hh:mm:ssZ
- HH:mm:ss
- HH:mm:ss z
- HH:mm:ss Z
Array names
When generating the schema of a Couchbase view, the array items are named like the array property name followed by the suffix _item. For the Denodo Couchbase Custom Wrapper to work properly, this name must not be changed.
Limitations in select sentences
Note that group by operations are not delegated to the Couchbase server, and therefore aggregations are performed in-memory by Virtual DataPort.
Also note that functions are not delegated to the Couchbase server, and therefore any WHERE clauses containing functions will be executed by Virtual DataPort after retrieving the target data.