mongodb-customwrapper is a Virtual DataPort custom wrapper for querying MongoDB collections.
It bridges (to some extent) the gap between NoSQL and relational databases by establishing a predefined schema for output, thus enabling SQL queries on MongoDB.
MongoDB is one of the most prominent so-called NoSQL databases, although it would be more accurate to say that is a non-relational database, or as defined on their website: "A document-oriented, scalable and high performance database", developed by 10gen.
In MongoDB we can forget about the table: we have collections which are groups of documents that do not have to share any common schema. So documents are to MongoDB something very similar to what tuples represent to relational databases, only without the schema restrictions associated with the latter.
Another important feature is the way in which MongoDB stores documents: BSON (Binary JSON). The binary part is hidden from us when we work with MongoDB, and thus we see and treat these data in JSON format, so our documents are in fact JSON objects, and its fields are the equivalent to the columns of the relational world. But as we do not need the documents in a collection to share a common schema, each JSON document can have just the fields it needs, and all documents in a collection do not have to represent entities of the same nature.
And you can also forget about both the SQL language and the JDBC API. MongoDB does not use SQL but its own query API, and consequently offers a specialized driver for Java that acts as a client and interacts with the database.
A database consists of one or more collections, the documents in those collections, and an optional set of security credentials for controlling access.
This Custom wrapper requires:
mongodb-customwrapper allows us to create base views on MongoDB collections and execute SQL queries on those collections.
This wrapper uses the official MongoDB Driver for Java.
It gets a MongoClient object that represents a pool of connections to a MongoDB database. Optionally it can authenticate by user and password.
After making the connection, the custom wrapper can create base views to read collections and documents as if they were tables and rows. Each base view will be created on a specific collection.
Once the base views are created, VDP will be able to perform queries on MongoDB using the VQL Shell, create derived views, etc.
MongoDB Custom Wrapper Architecture
This is a brief summary of the wrapper’s current features:
In order to use the MongoDB Custom Wrapper in VDP, we must configure the Admin Tool to import the extension.
From the denodo-mongodb-customwrapper distribution, we will select the denodo-mongodb-customwrapper-${version}-jar-with-dependencies.jar file and upload it to VDP.
No other jars are required as this one will already contain all the required dependencies, including the MongoDB Java driver classes.
MongoDB Extension in VDP
Once the custom wrapper jar file has been uploaded to VDP using the Admin Tool, we can create new data sources for this custom wrapper --and their corresponding base views-- as usual.
Go to New → Data Source → Custom and specify the wrapper’s class name com.denodo.connect.mongodb.wrapper.MongoDBWrapper. Also check ‘Select Jars’ and select the jar file of the custom wrapper.
MongoDB Data Source
Click to refresh the input parameters of the data source.
Base views created from the MongoDBWrapper need to fill in the Database parameter or the Connection string parameter, both of them are mutually exclusive.
[mongodb://]host1[:port1][,host2[:port2],...[,hostN[:portN]]]/database[?options]
The prefix mongodb:// is optional, the database is mandatory and user and password are not written in this field, but in the User and Password parameters.
You can see all the options in the documentation of mongodb, in the following link about Connection String URI Format.
If you introduce this parameter, Host, Port and Database parameters should be empty.
Five optional parameters:
Once the custom wrapper has been registered, click on Create a base view.
Base view parameters
Collection is mandatory
There are also two parameters that are mutually exclusive:
field1:type1[,field2:type2,...].
Type should be one of the constants in java.sql.Types (note these are SQL standard types). See a specific section below to learn more about the allowed syntax.
In the following example we want to import a product catalog collection of an E-Commerce site database.
At the beginning of the document, the schema must contain general product information, to facilitate searches of the entire catalog. Then, a details subdocument that contains fields that vary between product types.
As we are only interested in albums and films products we use the following introspection query:
{ type: { $in: [ 'Audio Album', 'Film' ] } }
MongoDB Base View edition using Host, Port and Database parameters
MongoDB Base View edition using Connection String parameter
The resulting schema for this product catalog collection can be seen in the image below. It contains common product information like title, type, pricing and the details subdocument.
MongoDB Base View
MongoDB Base View execution
Array and record details of MongoDB Base View execution
The use of the Fields parameter for specifying the view’s schema explicitly is recommended for base view creation when there are complex fields involved coming from mongoDB and there is a possibility that these data are not always returned in the same order or with the same type when using an introspection query.
We will need to specify the fields we would like to import as columns in VDP, using this syntax:
field1:type1[,field2:type2,...].
Type should be one of the constants in java.sql.Types (note these are SQL standard types).
See an example of explicit schema specification (note braces for complex objects should be escaped):
_id : VARCHAR, firstName : VARCHAR, lastName : VARCHAR, age : VARCHAR, address: \{ city : VARCHAR, state : VARCHAR, streetAddress : VARCHAR, state : VARCHAR, postalCode : VARCHAR \}, phoneNumbers: array(\{ aaa : VARCHAR, type : VARCHAR, number : VARCHAR \}), cc:array(VARCHAR) |
See how complex structures are specified using braces ({...}) and arrays using the array keyword and specifying the contents of the array between parentheses.
Also field names can be surrounded by single or double quotes (e.g. "phone number") if they contain white spaces or non-alphanumeric chars.
Mongo DB base View using Fields parameter
Fields Parameter
If you don’t want to explicitly define the schema, you should use the Introspection query parameter. For this you should use a query over the collection, following the syntax for the mongodb method db.collection.find(). You should escape the braces.
An example of introspection query is:
\{ "properties": \{"LINEARID": "1101121687137", "observed": "", "COUNTYFP": "051", "RTTYP": "M", "FULLNAME": "Airport Rd", "MTFCC": "S1400", "STATEFP": "35"\}\} |
MongoDB Base View edition using Introspection query parameter
The MongoDB API for Java Driver allows the connection to a MongoDB Server that has TLS/SSL enabled. To create a Base view using SSL, you have to add the option ssl=true in the Connection String parameter. ssl is one of the Connection Options of the Connection String URI. Alternatively (or additionally), you can also check the Use SSL/TLS checkbox at view creation time.
In addition, if you are using a self-signed certificate, you have to import it into the TrustStore of the JVM of the Denodo Platform. The section “Importing the Certificates of Data Sources (SSL/TLS Connections)” of the Denodo Platform Installation Guide explains how to do this.
An example of a Base view with TLS/SSL:
MongoDB Base View using SSL
Also note that in order to use TLSv1.2 in Denodo 6 (with Java 7), you will need to add the property “com.denodo.parser.connection.http.tlsProtocol=TLSv1.2” to the VDBConfiguration.properties file.
In MongoDB it is possible to authenticate against a database but read the data from a different one. When the authentication is done in a different database, you have to add the option authsource in the Connection String parameter. authsource is one of the Connection Options of the Connection String URI. If you use this option Host, Port and Database parameters should be empty.
An example of a Base View with authsource:
From now on we can query using the VQL Shell tool of the Admin Tool, create derived views, etc.
Querying a MongoDB View
This wrapper allows to make insertion. No complex fields are allowed.
This wrapper allows to make updates in a document. No complex fields are allowed.
It is allowed to delete documents of a collection.
If your Cosmos DB account was created before mid-October 2019 and/or your MongoDB API version at the Cosmos DB side is 3.2 or earlier, you need to enable the Aggregation Pipeline functionality in the Azure portal, inside the "Preview Features" section. Otherwise the Denodo MongoDB Custom Wrapper won’t work.
Now, you can use this wrapper to access Azure Cosmos DB, taking advantage of their MongoDB protocol API. So let’s see the steps for creating an example base view over an example collection called cosmosdb_collection in a database called databasename.
Once you click in the Accept button, you’ll finally see the base view:
NOTE: If you are using Java 7 (Denodo 6), please note you need to enable TLS 1.2 as Cosmos DB requires at least this version.
Create/update operations with complex fields, arrays and registers, on the MongoDB server are not supported through this custom wrapper at its current version.
If you try this operation, you can see the following exception, in the output of the VQL shell:
MongoDB wrapper error. Can't find a codec for class com.denodo.vdb.engine.customwrapper.value.CustomWrapperStruct.
The MongoDB “BSON Timestamp” data type is considered by MongoDB to be for internal use only (see https://docs.mongodb.com/manual/reference/bson-types/#timestamps), but is nevertheless supported by this MongoDB custom wrapper by considering values of this type equivalent to MongoDB’s recommended “BSON Date” type.
Note however that “BSON Timestamp” data types are not supported in WHERE clauses in updates/deletes, nor they will be used in INSERT or UPDATE sentences to set new values into MongoDB (new values will be set as BSON Date, not BSON Timestamp). You can use them in select queries, but notice that filtering will be done in memory instead of directly in the data source.
Note that group by operations are not delegated to the MongoDB server, and therefore aggregations are performed in-memory by Virtual DataPort.
Also note that functions are not delegated to the mongoDB server, and therefore any WHERE clauses containing functions will be executed by Virtual DataPort after retrieving the target data, and not at the mongoDB server.