HDFS Path
Use this type of path to obtain the data from a file or a set of files located in an HDFS file system.
For information about the Filters tab, see Compressed or Encrypted Data Sources. Filters work the same way for any type of path (local, HTTP, FTP…).
Configuration
In URI, enter the path you want to obtain the data from. It can point to a file or a directory, and you can use interpolation variables (see the section Paths and Other Values with Interpolation Variables).
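To illustrate what the URI usually contains, the following Python sketch splits an HDFS URI into host, port and path. It assumes the URI takes the common hdfs://host:port/path form; the host namenode.example.com, the port 8020 and the path are placeholders, and the function describe_hdfs_uri is hypothetical, not part of Virtual DataPort.

```python
from urllib.parse import urlparse


def describe_hdfs_uri(uri: str) -> dict:
    """Split an hdfs:// URI into its host, port and path (hypothetical helper)."""
    parts = urlparse(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got scheme {parts.scheme!r}")
    return {
        "host": parts.hostname,
        "port": parts.port,  # None when the URI omits the port
        "path": parts.path,
        # Heuristic only: a trailing slash suggests a directory, but it is
        # HDFS that decides whether the path is a file or a directory.
        "looks_like_directory": parts.path.endswith("/"),
    }


# Placeholder URI for illustration.
print(describe_hdfs_uri("hdfs://namenode.example.com:8020/user/denodo/sales/"))
```

The trailing-slash check is deliberately labeled a heuristic: the same path without the slash can still refer to a directory.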
In Hadoop properties, you can set the same Hadoop properties that you would put in Hadoop configuration files such as core-site.xml.
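If your cluster already has a core-site.xml, the name/value pairs in it are the same ones you would enter in Hadoop properties. The following Python sketch lists them using only the standard library. The sample file is a placeholder: fs.defaultFS and hadoop.security.authentication are real Hadoop property names, but the values, and whether your cluster needs these particular properties, are assumptions to check with the cluster administrators. The function read_hadoop_properties is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text: str) -> dict[str, str]:
    """Return the name/value pairs declared in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    # Each <property> element holds one <name> and one <value>.
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:
            props[name.strip()] = value.strip()
    return props


# Placeholder core-site.xml content for illustration.
CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>"""

for name, value in read_hadoop_properties(CORE_SITE).items():
    print(f"{name} = {value}")
# fs.defaultFS = hdfs://namenode.example.com:8020
# hadoop.security.authentication = kerberos
```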
Paths Pointing to a Directory
When you create a base view over a data source that points to a directory, Virtual DataPort infers the schema of the new view from the first file in the directory and it assumes that all the other files have the same schema.
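Because the schema comes from one file and is assumed for the rest, it can be worth confirming beforehand that all the files share the same header. The following Python sketch does this for delimited files on a local copy of the directory (for example, one fetched with hdfs dfs -get). It assumes each file starts with a header row. Which file Virtual DataPort treats as "first" is not specified here; the sketch simply uses alphabetical order, so treat any reported mismatch as a warning regardless of which side differs. The function names are hypothetical.

```python
import csv
from pathlib import Path


def headers_by_file(directory: str, delimiter: str = ",") -> dict[str, list[str]]:
    """Map each file in `directory`, in name order, to its first row (the header)."""
    headers = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            with path.open(newline="", encoding="utf-8") as f:
                # An empty file yields an empty header.
                headers[path.name] = next(csv.reader(f, delimiter=delimiter), [])
    return headers


def mismatched_files(directory: str, delimiter: str = ",") -> list[str]:
    """Names of the files whose header differs from the first file's header."""
    headers = headers_by_file(directory, delimiter)
    if not headers:
        return []
    reference = next(iter(headers.values()))
    return [name for name, header in headers.items() if header != reference]
```

For example, in a directory holding a.csv and b.csv with the header id,name and c.csv with the header id,amount, mismatched_files returns ["c.csv"].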
Only for delimited-file data sources: if the path points to a directory and you enter a value in File name pattern, the data source will only process the files whose name matches the regular expression entered in this box. For example, if you only want to process the files with the extension log, enter (.*)\.log.
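The following Python sketch shows which file names the example pattern accepts. It assumes the pattern must match the whole file name, as Java's String.matches() does; with a substring search, a name such as app.log.gz would also be accepted. The function files_to_process and the sample names are hypothetical.

```python
import re

# Assumption: full-name matching, so the name must end in ".log".
FILE_NAME_PATTERN = re.compile(r"(.*)\.log")


def files_to_process(names: list[str]) -> list[str]:
    """Keep only the names that the whole pattern matches."""
    return [n for n in names if FILE_NAME_PATTERN.fullmatch(n)]


print(files_to_process(["app.log", "app.log.gz", "errors.txt", "2024-01-01.log"]))
# ['app.log', '2024-01-01.log']
```

The backslash before the dot matters: without it, (.*).log would also accept a name such as catalog, because an unescaped dot matches any character.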
Note
For XML data sources, if a Validation file has been provided, all the files in the directory must conform to that schema or DTD.
Authentication
The following authentication modes are available:
None: use this option if the HDFS server does not require authentication.
Simple: you have to configure the user name. This authentication mode is equivalent to setting the HADOOP_USER_NAME environment variable when you run Hadoop commands in a terminal.
Kerberos with user and password: you have to configure the user name and the password.
Kerberos with keytab: you have to configure the user name and upload the keytab.
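Before entering credentials in Virtual DataPort, it can help to check them in a terminal. The following Python sketch builds the environment for Simple authentication (HADOOP_USER_NAME, as described above) and a kinit command for the Kerberos modes. It assumes the MIT Kerberos kinit tool is installed; the principal analyst@EXAMPLE.COM, the keytab path and both function names are hypothetical. The sketch only builds the environment and commands; the commented lines show how you might run them.

```python
import os
from typing import Optional


def simple_auth_env(user: str) -> dict[str, str]:
    """Copy of the current environment with HADOOP_USER_NAME set to `user`."""
    env = dict(os.environ)
    env["HADOOP_USER_NAME"] = user
    return env


def kinit_command(principal: str, keytab: Optional[str] = None) -> list[str]:
    """kinit invocation that obtains a Kerberos ticket for `principal`.

    With a keytab, kinit reads the key from the file (-k -t); without one,
    kinit prompts for the password in the terminal.
    """
    if keytab is None:
        return ["kinit", principal]
    return ["kinit", "-k", "-t", keytab, principal]


# Hypothetical values, for illustration only:
# subprocess.run(["hdfs", "dfs", "-ls", "/"], env=simple_auth_env("analyst"), check=True)
# subprocess.run(kinit_command("analyst@EXAMPLE.COM", "/etc/security/analyst.keytab"), check=True)
```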