USER MANUALS


Hive Tables Syntax and Properties

To access Parquet datasets from Denodo Embedded MPP, you need to map the files from AWS S3, Azure Data Lake, etc.. to Hive tables in the Metastore. The mapping is done through the CREATE TABLE sentence, where you have to provide the data schema, file format, and data location.

CREATE TABLE orders (
   orderkey bigint,
   custkey bigint,
   orderstatus varchar,
   totalprice double,
   orderpriority varchar,
   clerk varchar,
   shippriority integer,
   comment varchar
) WITH (
   external_location = 's3a://my_bucket/path/to/folder/',
   format = 'PARQUET'
);
  • external_location: The data location must be a bucket name or a bucket name and one or more folders, not a specific file. Do not forget to use a slash at the end, otherwise an error will occur: Can not create a Path from an empty string.

  • format: Supported file formats are: - ORC - PARQUET - AVRO - RCBINARY - RCTEXT - SEQUENCEFILE - JSON - TEXTFILE - CSV

The WITH clause of CREATE TABLE can also be used to set other properties on the table. The available table properties are:

  • avro_schema_url (varchar): URI pointing to Avro schema for the table.

  • bucket_count (integer): Number of buckets.

  • bucketed_by (array(varchar)): Bucketing columns.

  • csv_escape (varchar): CSV escape character.

  • csv_quote (varchar): CSV quote character.

  • csv_separator (varchar): CSV separator character.

  • dwrf_encryption_algorithm (varchar): Algorithm used for encryption data in DWRF.

  • dwrf_encryption_provider (varchar): Provider for encryption keys in provider.

  • encrypt_columns (array(varchar)): List of key references and columns being encrypted. Example: ARRAY['key1:col1,col2', 'key2:col3,col4'].

  • encrypt_table (varchar): Key reference for encrypting the whole table.

  • external_location (varchar): File system location URI for external table.

  • format (varchar): Hive storage format for the table. Possible values: [ORC, PARQUET, AVRO, RCBINARY, RCTEXT, SEQUENCEFILE, JSON, TEXTFILE, CSV].

  • orc_bloom_filter_columns (array(varchar)): ORC Bloom filter index columns.

  • orc_bloom_filter_fpp (double): ORC Bloom filter false positive probability.

  • partitioned_by (array(varchar)): Partition columns.

  • preferred_ordering_columns (array(varchar)): Preferred ordering columns for unbucketed table.

  • sorted_by (array(varchar)): Bucket sorting columns.

Add feedback