The filesystem-customwrapper is a Virtual DataPort custom wrapper that enables VDP to retrieve information from a filesystem and display it in a relational way. This allows to inspect local and network folders and also folders which are accessible by a FTP server, retrieve lists of files (in a single folder or recursively) and filter files using any part of its metadata (file name, file size, last modification date, etc.). The custom wrapper can also read their contents --in text or binary form-- and allows creating, updating and deleting text files (except when using FTP).
The custom wrapper allows to retrieve the meta-information of the files such as filename, size or date modified. Three custom wrapper classes are provided in the JAR file:
One of the possible uses of this wrapper is selecting a set of files in a folder (or folders, recursively), then filter them by date, size and/or file name and use that list of files in the left branch of a nested join operation to pass them to an XML or Delimited File base view that uses a parameterized file name. This way we can extend the native capabilities of the XML and Delimited File data sources; they allow to only read one file or all the files in a folder, but by using a nested join with the file system custom wrapper we can select a subset of all the files, and/or files recursively in subfolders.
The entry point for the queries to the wrapper is the folder where the search for files will be performed, specified through the parameter parentfolder. The VDP customwrapper will list the files are located in that directory; it is possible to do this in a recursive way, retrieving also the contents of the subfolders (also recursively). The parameter recursive is mandatory also as parentfolder. You always should specify this field: false in the case that you only want search the files in the specified directories and true if you want to do a search also in subdirectories recursively.
This wrapper is able to carry out read, create, update and delete operations. It can be delegated the following query artifacts and operators from VDP:
Installing the custom wrapper
In order to use the denodofile-customwrapper first it is necessary to install its JAR file into the Denodo VDP server. For this, go to the VDP Administration tool, open the “File > Extensions” menu option and install the jar-with-dependencies file that is included in the distribution of the FileSystem custom wrapper.
Creating a new datasource
After this, create a new data source using “File > New > DataSource > Custom” menu option and fill the appropriate parameters:
Once the datasource has been created, we can create base views on it. If the ReadFileSystemConnector has been chosen, this custom wrapper does not need input parameters to create a base view. So after accessing view to create base view we only can set the primary key or change the datatype of the parameters.
If ReadWriteFileSystemConnector has been chosen, you can fill the optional field fileMaxSize that represents, in bytes, the maximum size of the files whose its contents will be read. If you do not fill this field, no limit will be established. The configuration of this parameter can avoid memory problems.
Note: Besides setting fileMaxSize, it is very important to configure the chunksize parameter in Admin Tool Preferences -> Connection; this way the amount of used memory can be reduced thanks to the specification of a limit in the number of rows that are being sent to the client in each block, thus we are reducing the size of used memory. To avoid memory problems when it is necessary to access large files or a big number of files, it is recommended to reduce this parameter to a more suitable value.
If ReadFileSystemFTPConnector has been chosen. Click to refresh the input parameters of the datasource and fill in the fields Host, Port, and User. All of them are necessary to establish the FTP/FTPS/SFTP connection. Besides, you can fill the field Password which may be mandatory for the specified user in order to access his/her home directory on the remote machine. There is also a check box, Establish the connection using the active mode, that allows to change the mode of the FTP connection between the client and the server. By default connections are established in passive mode, but if you check this option connection mode will change to active. You also should choose the Protocol among FTP, FTPS and SFTP. And if you use SFTP you can add the Key File and Passphrase associated with the key file.
The schema of the three custom wrappers contains these columns:
In Addition, in the ReadWriteFileSystemConnector there are also these columns:
In the following image we can see the view schema:
This is an example of a select operation that retrieves all the files in a folder. In this type of query the parameter parentfolder is mandatory and represents the starting point of the search.
In this example we search recursively for all the files in c:/temp2 folder:
SELECT *
FROM filesystem
WHERE (parentfolder = 'c:/temp2' AND recursive is true)
In the following image we can see the results of this query with the ReadFileSystemConnector custom wrapper (that only retrieves the file metadata):
In the following image we can see the results of this query with the ReadWriteFileSystemConnector custom wrapper (that retrieves both the metadata and the contents of the files):
If you use the ReadFileSystemFTPConnector custom wrapper the parameter parentfolder is required too and a slash (‘/’) or an empty string (‘’) represent the home directory.
In this example we search recursively for all the files in the home directory:
SELECT *
FROM ftpfilesystem
WHERE (parentfolder = '' and recursive is true)
If you use the custom wrapper ReadFileSystemFTPConnector custom wrapper (that only retrieves the file metadata), you have to fill in the parameters Host, Port, User, Password, Active Mode and Protocol. If you choose SFTP you can fill in Key File and Passphrase.
This is an example with a more complex condition, where we search for files located in 'c:/temp2' recursively, with a size smaller than 600 and last modification date after '11-nov-2013 13h 46m 2s'
SELECT * FROM filesystem
WHERE parentfolder = 'C:\\Temp1'
and recursive = true
and size < 3000
and datemodified < 'Dec 5, 2014 9:48:49 AM'
CONTEXT ('i18n'='us_pst', 'cache_wait_for_load'='true')
This is an example of an UPDATE query. This type of query is used for modifying the contents of a single file. It is mandatory to specify values for either the column fullpath or both the parentfolder + filename columns. It is only possible to update files one by one, and only using text contents. If the directories of the path do not exist, they will be created. And if the file does not exist, it will be created too, in this case UPDATE operation would be equivalent to INSERT operation.
In this example we set 'This text file has been modified file' as the contents of the file 'c:/textfile.txt':
UPDATE filesystem_write
SET (textcontents, append) =
('This text file has been modified file',true)
WHERE parentfolder='c:/temp1' AND filename='textfile.txt'
This is an example of INSERT operation, used to create new files in the file system. In this type of query, it is mandatory to specify values for either the column fullpath or both the parentfolder + filename columns. It is only possible to insert files one by one, and only using text contents. If the directory where we want to insert the new file and its ancestors, they do not exist, they will be created.
In this example we create a new file in the folder ‘c:/’ with name ‘proba20.txt’, setting the contents to ‘If there is a file with this path, it will be overwritten’ and overwriting any previous file with that same name:
INSERT
INTO filesystem_write
(parentfolder, filename, textcontents, overwrite)
VALUES
('c:/temp1', 'textfile.txt', 'This is a text file', true)
This is an example of DELETE operation, used to delete files. In this type of query it is mandatory to specify values for either the column fullpath or both the parentfolder + filename columns. It is only possible to delete files one by one.
In this example we delete the file ‘c:/temp1/testfile.txt’:
DELETE
FROM filesystem_write
WHERE parentfolder = 'c:/temp1' AND filename = 'testfile.txt'
First, we have to create a delimited file data source by clicking on File -> New -> Data Source -> Delimited File. Then we need to configure the Data Route parameter: for this click on Configure. We introduce a variable in Local path, using the reserved character ‘@’, as in this case “@path” .
Click on Ok and after we have to choose a path from one of the files that we want to access.
Now, we have to set the Column Delimiter, and the other parameters, in this case we use the comma (‘,’)
Click on OK and finally, we create a base view and after we can configure the name of the columns, and its types.
After creating the Delimited File, we create a Join nested File -> New -> Join. First we add the base view of the file system custom wrapper, in this case we add, one of the first examples of this manual. Second we add the delimited file base view. After we link the fullpath parameter of the first view with the path parameter of the second view. We configure the join strategy in the Join Conditions tab changing the Join Method to Nested Join.
In the Where Conditions tab, we have to set the parentfolder because is a mandatory field, and select with the other conditions the files, that we want to access, in this example we select the files that their filename start with ‘test’
In the following image we can see the results of the execution of the Nested Join, the file system custom wrapper returns 1 rows, and on each delimited file, there were two rows too, so the VDP has received 2 rows.