• User Manuals /
  • Denodo MongoDB Custom Wrapper - User Manual

Denodo MongoDB Custom Wrapper - User Manual

Download original document


You can translate the document:

Introduction

mongodb-customwrapper is a Virtual DataPort custom wrapper for querying MongoDB collections.

It bridges (to some extent) the gap between NoSQL and relational databases by establishing a predefined schema for output, thus enabling SQL queries on MongoDB.

What is MongoDB?

MongoDB is one of the most prominent so-called NoSQL databases, although it would be more accurate to say that is a non-relational database, or as defined on their website: "A document-oriented, scalable and high performance database", developed by 10gen.

In MongoDB we can forget about the table: we have collections which are groups of documents that do not have to share any common schema. So documents are to MongoDB something very similar to what tuples represent to relational databases, only without the schema restrictions associated with the latter.

Another important feature is the way in which MongoDB stores documents: BSON (Binary JSON). The binary part is hidden from us when we work with MongoDB, and thus we see and treat these data in JSON format, so our documents are in fact JSON objects, and its fields are the equivalent to the columns of the relational world. But as we do not need the documents in a collection to share a common schema, each JSON document can have just the fields it needs, and all documents in a collection do not have to represent entities of the same nature.

And you can also forget about both the SQL language and the JDBC API. MongoDB does not use SQL but its own query API, and consequently offers a specialized driver for Java that acts as a client and interacts with the database.

A database consists of one or more collections, the documents in those collections, and an optional set of security credentials for controlling access.

Requirements

This Custom wrapper requires:

  • When used with Denodo 5.5: At least the Denodo update 20170529
  • When used with Denodo 6.0: At least the Denodo update 20170515
  • When used with Denodo 7.0: No particular update is needed.
  • MongoDB version 2.6 or newer.

Architecture and Features

mongodb-customwrapper allows us to create base views on MongoDB collections and execute SQL queries on those collections.

This wrapper uses the official MongoDB Driver for Java.

It gets a MongoClient object that represents a pool of connections to a MongoDB database. Optionally it can authenticate by user and password.

After making the connection, the custom wrapper can create base views to read collections and documents as if they were tables and rows. Each base view will be created on a specific collection.

Mongoclient objects are stored in a cache: when a query is executed,a search is performed on this cache, to see if there is a mongoclient with the same connection parameters. If this is the case, the connection pool will be reused. When creating or updating a base view, all pool connections are closed and the cache is invalidated. You can disable the pool connections in the parameters of the datasource. For each datasource VDP, if the optionis enabled , a pool connection will be opened, and this pool opens at least three connections in MongoDB, and two of them will remain open until the cache is cleared or VDP is stopped.

Once the base views are created, VDP will be able to perform queries on MongoDB using the VQL Shell, create derived views, etc.

MongoDB Custom Wrapper Architecture

This is a brief summary of the wrapper’s current features:

  • User/password authentication is supported. User and password can be specified as input values when building the base view.

  • Base view result schema is defined as a comma-separated list of field names.

  • Extra fields in returned documents are ignored.

  • Lacking fields in returned documents are returned as null.

  • Types can be specified from constants in java.sql.Types, which correspond to SQL standard types.                

  • Alternatively the schema could be defined using an introspection query. All documents (remember, “schemaless”) retrieved by this query are analyzed to reveal their fields and build the base view schema. For this reason the query should retrieve a significant sample of the collection we are interested in.

  • Since common fields may hold different types of data the resulting structure is the highest common denominator between all the fields with the same name. In case of incompatible fields --such as integer and subdocuments-- VDP interprets them as of type text. But notice that MongoDB is strict about types and you must query for data using the correct type, so this fields would not be searchable.

  • Create/update operations using simple fields are supported.

  • Delete operations are supported.

  • Field projections are delegated to MongoDB.

  • Some query conditions are delegated to MongoDB.

  • AND and OR conditions.
  • Operators: =, <>, <, >, <=, >=, LIKE, IN, IS NULL, IS NOT NULL.
  • Note the IS NULL operator gets the documents where the specified field has the null value or does not exist.

  • ORDER BY clause is delegated to MongoDB.

Usage

Importing the custom wrapper into VDP

In order to use the MongoDB Custom Wrapper in VDP, we must configure the Admin Tool to import the extension.

From the denodo-mongodb-customwrapper distribution, we will select the denodo-mongodb-customwrapper-${version}-jar-with-dependencies.jar file and upload it to VDP.

No other jars are required as this one will already contain all the required dependencies, including the MongoDB Java driver classes.

     

MongoDB Extension in VDP

Creating a MongoDB Data Source

Once the custom wrapper jar file has been uploaded to VDP using the Admin Tool, we can create new data sources for this custom wrapper --and their corresponding base views-- as usual.

Go to New → Data Source → Custom and specify the wrapper’s class name com.denodo.connect.mongodb.wrapper.MongoDBWrapper. Also check ‘Select Jars’ and select the jar file of the custom wrapper.

MongoDB Data Source

Click to refresh the input parameters of the data source.

Data source parameters

Base views created from the MongoDBWrapper need to fill in the Database parameter or the Connection string parameter, both of them are mutually exclusive.

  • Database: Database name.

  • Connection String: this parameter allows more options in the connection between the driver and mongodb. This is the pattern:

[mongodb://]host1[:port1][,host2[:port2],...[,hostN[:portN]]]/database[?options]

The prefix mongodb:// is optional, the database is mandatory and user and password are not written in this field, but in the User and Password parameters.

You can see all the options in the documentation of mongodb, in the following link about Connection String URI Format.

If you introduce this parameter, Host, Port and Database parameters should be empty.

Seven optional parameters:

  • Host: Name of the computer or IP address where MongoDB is running, default is 127.0.0.1. Only with Database parameter.

  • Port: Port number to connect to MongoDB, default is 27017. Only with Database parameter.

  • User/Password: Username and password to connect to MongoDB, if authentication is enabled and Kerberos is not being used..

  • Pass-through session credentials: Enabling this checkbox allows VDP to execute the queries using the credentials of the user logged into VDP, so User and Password fields would be only used for introspection.

  • Use SSL/TLS: Enabling this checkbox will make the wrapper use SSL/TLS. When using a Connection String, this can also be specified by means of the “ssl=true” option.
  • Pool Connections: Use a cache of database connections to reuse them, with this the base views of the same datsource would use the same pool connection.

  • Kerberos Principal Name: Name of the Kerberos principal to be used, in format user@REALM. Note the “@” symbol needs to be escaped (e.g. “myname\@MYCORPORATE.DOMAIN”). When using Kerberos authentication, this parameter is required.

  • Kerberos KeyTab File: Location of the keytab file to be used, which should contain valid authentication credentials for the principal specified above. When using Kerberos authentication, it is required to set a value to either this parameter or to Kerberos JAAS Configuration.

  • Kerberos JAAS Configuration: Location of the JAAS configuration file to be used, which should contain the complete JAAS configuration using Krb5LoginModule as detailed here. Please note the principal option should always be added to this file even if it is also required at the Kerberos Principal Name parameter of the VDP data source. When using Kerberos authentication, it is required to set a value to either this parameter or to Kerberos KeyTab File. Example JAAS configuration:

com.sun.security.jgss.krb5.initiate {
         com.sun.security.auth.module.Krb5LoginModule required
         doNotPrompt=true
         debug=true
         refreshKrb5Config=true
         useTicketCache=true
         ticketCache="file:///home/someuser/krb5cc_someuser"
         principal="someuser@REALM";
};

Creating Base Views

Basics

Once the custom wrapper has been registered, click on Create a base view.

Base view parameters

 Collection is mandatory

  • Collection: Collection name that we will import as a table in VDP.

There are also two parameters that are mutually exclusive:

  • Fields: The fields we would like to import as columns in VDP. We must keep the syntax

field1:type1[,field2:type2,...].

Type should be one of the constants in java.sql.Types (note these are SQL standard types). See a specific section below to learn more about the allowed syntax.

  • Introspection query: Documents retrieved by this query will be analyzed to reveal their fields and build the view schema. An empty query selects all documents in the collection, so, if the collection has a lot of documents, the creation of the base view can take a long time. In addition, it is recommendable to check the types of the new view inferred  from the  introspection.
    This query requires
    MongoDB syntax.

Example

In the following example we want to import a product catalog collection of an E-Commerce site database.

At the beginning of the document, the schema must contain general product information, to facilitate searches of the entire catalog. Then, a details subdocument that contains fields that vary between product types.

As we are only interested in albums and films products we use the following introspection query:

{ type: { $in: [ 'Audio Album', 'Film' ] } }

MongoDB Base View edition using Host, Port and Database parameters

MongoDB Base View edition using Connection String parameter

The resulting schema for this product catalog collection can be seen in the image below. It contains common product information like title, type, pricing and the details subdocument.

MongoDB Base View

MongoDB Base View execution

Array and record details of MongoDB Base View execution

Schemas

Specifying Fields

The use of the Fields parameter for specifying the view’s schema explicitly  is recommended for base view creation when there are complex fields involved coming from mongoDB and there is a possibility that these data are not always returned in the same order or with the same type when using an introspection query.

We will need to specify the fields we would like to import as columns in VDP, using this syntax:

field1:type1[,field2:type2,...].

Type should be one of the constants in java.sql.Types (note these are SQL standard types).

  • Default is VARCHAR.
  • If documents in the same collection specify different types for fields with the same name, using VARCHAR (text in VDP) for that column will automatically perform the required conversions to show data from those documents.
  • Note that the SQL equivalent to MongoDB’s “Date” type is SQL’s TIMESTAMP, as MongoDB Date values contain both date and time.
  • MongoBD’s internal “BSON Timestamp” data type is supported as SQL TIMESTAMP too, but with some limitations (see section at the end).
  • BinData type from BSON is supported and it will be defined as BLOB in VDP.
  • Nested arrays syntax is array(array(\{"field":VARCHAR\})).

See an example of explicit schema specification (note braces for complex objects should be escaped):

_id : VARCHAR,

firstName : VARCHAR,

lastName : VARCHAR,

age : VARCHAR,

address: \{

    city : VARCHAR,

    state : VARCHAR,

    streetAddress : VARCHAR,

    state : VARCHAR,

    postalCode : VARCHAR

    \},

phoneNumbers: array(\{

    aaa : VARCHAR,

    type : VARCHAR,

    number : VARCHAR

    \}),

cc:array(VARCHAR)

See how complex structures are specified using braces ({...}) and arrays using the array keyword and specifying the contents of the array between parentheses.

Also field names can be surrounded by single or double quotes (e.g. "phone number") if they contain white spaces or non-alphanumeric chars.

Mongo DB base View using Fields parameter

Fields Parameter

Special data types:

JSON

Although the allowed data types are defined in java.sql.Types, there is the possibility to be able to obtain a raw JSON (text) for a field at a given level, rather than introspecting and obtaining an array/register. To do this, you just only need to specify the desired field as JSON. For example, for an example collection such as:

We could create a base view obtaining the JSON representation for the awards field:

So the result of executing a sample query against this view would be:

If we extract the value of the awards field we can see that it’s a JSON formatted text block:

NOTE: Although a field is defined in the base view creation as JSON, VDP will display it as text type.

ObjectID

This BSON type isn’t defined under java.sql.Types either, but ObjectID fields can be defined manually for a base view. For example:

This way, you’d be able to filter over those fields:

Introspection Queries

If you don’t want to explicitly define the schema, you should use the Introspection query parameter. For this you should use a query over the collection, following the syntax for the mongodb method db.collection.find(). You should escape the braces.

 

An example of introspection query is:

\{ "properties": \{"LINEARID": "1101121687137", "observed": "", "COUNTYFP": "051", "RTTYP": "M", "FULLNAME": "Airport Rd", "MTFCC": "S1400", "STATEFP": "35"\}\}

MongoDB Base View edition using Introspection query parameter

TLS/SSL

The MongoDB API for Java Driver allows the connection to a MongoDB Server that has TLS/SSL enabled. To create a Base view using SSL, you have to add the option ssl=true in the Connection String parameter. ssl is one of the Connection Options of the Connection String URI. Alternatively (or additionally), you can also check the Use SSL/TLS checkbox at view creation time.

In addition, if you are using a self-signed certificate, you have to import it into the TrustStore of the JVM of the Denodo Platform. The section “Importing the Certificates of Data Sources (SSL/TLS Connections)” of the Denodo Platform Installation Guide explains how to do this.

An example of a Base view with TLS/SSL:

MongoDB Base View using SSL

Also note that in order to use TLSv1.2 in Denodo 6 (with Java 7), you will need to add the property “com.denodo.parser.connection.http.tlsProtocol=TLSv1.2” to the VDBConfiguration.properties file.

AuthSource

In MongoDB it is possible to authenticate against a database but read the data from a different one. When the authentication is done in a different database, you have to add the option authsource in the Connection String parameter. authsource is one of the Connection Options of the Connection String URI. If you use this option  Host, Port and Database parameters should be empty.

An example of a Base View with authsource:

Querying MongoDB Collections

From now on we can query using the VQL Shell tool of the Admin Tool, create derived views, etc.

Querying a MongoDB View

Insert

This wrapper allows to make insertion. No complex fields are allowed.

Update

This wrapper allows you to make updates in a document. No complex fields are allowed.

Delete

It is allowed to delete documents of a collection.

Connecting to Azure Cosmos DB

If your Cosmos DB account was created before mid-October 2019 and/or your MongoDB API version at the Cosmos DB side is 3.2 or earlier, you need to enable the Aggregation Pipeline functionality in the Azure portal, inside the "Preview Features" section. Otherwise the Denodo MongoDB Custom Wrapper won’t work.


Now, you can use this wrapper to access Azure Cosmos DB, taking advantage of their MongoDB protocol API. So let’s see the steps for creating an example base view over an example collection called cosmosdb_collection in a database called databasename.

  1. Create a Custom datasource selecting the denodo-mongodb-customwrapper-7.0 (*.6.0). You can find the needed parameters under your Azure Cosmos DB directory:



    We have two different ways for defining the datasource parameters:
  1. Via input parameters
  1. Host: The HOST shown in the screenshot.
  2. Port: The PORT shown in the screenshot.
  3. User: The USERNAME shown in the screenshot.
  4. Password: The PRIMARY PASSWORD shown in the screenshot.
  5. Database: The name of the database that contains the collection we want to create the base view over.


  1. Via Connection String
  1. The PRIMARY CONNECTION STRING shown in the screenshot  with the format mongodb://[username:password@]host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database][?options]]

  1. Create the base view. Here we set the collection name (in our example, cosmosdb_collection) as well as Fields or Introspection query the way it was explained in the “Specifying Fields” and “Introspection Queries” sections:

Once you click in the Accept button, you’ll finally see the base view:

NOTE: If you are using Java 7 (Denodo 6), please note you need to enable TLS 1.2 as Cosmos DB requires at least this version.

Connecting to Alibaba Cloud ApsaraDB for MongoDB

You can use this wrapper to access Alibaba Cloud ApsaraDB for MongoDB. So let’s see the steps for creating an example base view over an example collection called apsaradb_collection in a database called admin.

  • Create a Custom datasource selecting the denodo-mongodb-customwrapper-8.0 (*.7.0). You can find the needed parameters under the Database connection option of your Alibaba Cloud ApsaraDB for MongoDB instance:

You must click on Apply for Public Connection String and use this connection string, with the format mongodb://[username:password@]host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database][?options]]
 in the Denodo MongoDB Custom Wrapper configuration:

  • Create the base view. Here we set the collection name (in our example, apsaradb_collection) as well as Fields or Introspection query the way it was explained in the “Specifying Fields” and “Introspection Queries” sections:
            

Once you click in the Accept button, you’ll finally see the base view:

Limitations

Write operations

Create/update operations with complex fields, arrays and registers, on the MongoDB server are not supported through this custom wrapper at its current version.

If you try this operation, you can see the following exception, in the output of the VQL shell:

MongoDB wrapper error. Can't find a codec for class com.denodo.vdb.engine.customwrapper.value.CustomWrapperStruct.

MongoDB BSON Timestamp data type

The MongoDB “BSON Timestamp” data type is considered by MongoDB to be for internal use only (see https://docs.mongodb.com/manual/reference/bson-types/#timestamps), but is nevertheless supported by this MongoDB custom wrapper by considering values of this type equivalent to MongoDB’s recommended “BSON Date” type.

Note however that “BSON Timestamp” data types are not supported in WHERE clauses in updates/deletes, nor they will be used in INSERT or UPDATE sentences to set new values into MongoDB (new values will be set as BSON Date, not BSON Timestamp). You can use them in select queries, but notice that filtering will be done in memory instead of directly in the data source.

Limitations in select sentences

Note that group by operations are not delegated to the MongoDB server, and therefore aggregations are performed in-memory by Virtual DataPort.

Also note that functions are not delegated to the mongoDB server, and therefore any WHERE clauses containing functions will be executed by Virtual DataPort after retrieving the target data, and not at the mongoDB server.

User privileges on older versions of MongoDB (below 4.0)

The Denodo MongoDB Custom Wrapper needs to check, in some operations, if a MongoDB collection exists. In MongoDB versions prior to 4.0, these operations could fail if the user does not have read access to the database where the collection is stored or if the user does not have permissions to execute the listCollections operation.