
Caches

Alluxio SDK cache

Alluxio provides a distributed cache layer that can be used between Embedded MPP and Object Storage to improve I/O performance. By caching data closer to the Embedded MPP workers, Alluxio reduces the latency of data access and relieves pressure on the underlying storage system.

The speed of the local cache storage is crucial to the performance of the cache. The recommended approach is to attach NVMe SSDs (or other high-performance storage) to the workers of the cluster.

There are two main reasons to use the Alluxio cache:

  • Reduce data transfer costs from the Object Storage to the Embedded MPP. By reducing the number of remote table scans, caching reduces query latency and saves on egress and cloud storage API costs.

  • Improve performance where data reading is a bottleneck. This is the case when the storage is slow, for example an on-premises HDFS cluster, or when latency is high, for instance when the Object Storage and the Embedded MPP are located in different cloud provider regions. Keep in mind that if the Object Storage already performs very well and your local cache storage has a similar speed, the performance benefit may be minimal.

The Alluxio SDK cache is configured as follows:

  • Add the following properties to the additionalConfig property of the desired catalog (hive, iceberg, or delta) in values.yaml. Replace xxxGB with the maximum size of the local cache.

Alluxio configuration for the Hive catalog in values.yaml
 hive:
   additionalConfig: [
     cache.enabled=true,
     cache.type=ALLUXIO,
     cache.alluxio.max-cache-size=xxxGB,
     cache.base-directory=file:///opt/data/alluxio,
     hive.node-selection-strategy=SOFT_AFFINITY
   ]
  • Add the following volumeMount to templates/presto-template.yaml.

    volumeMounts:
      - name: cache-volume
        mountPath: /opt/data/alluxio
    
  • Add the following volume to templates/presto-template.yaml.

    volumes:
      - name: cache-volume
        hostPath:
          path: /opt/data/
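The cache is only backed by the host storage when cache.base-directory points inside the directory given as the volumeMount's mountPath (/opt/data/alluxio in the steps above), so it can be worth checking the catalog properties before deploying. The following Python sketch is a hypothetical helper, not part of Presto, Alluxio, or the Helm chart: it assumes the additionalConfig entries are available as a list of key=value strings and that you know the mountPath. The function names parse_properties and check_cache_config, and the 100GB size in the usage example, are illustrative only.

```python
"""Hypothetical pre-deployment check for Alluxio SDK cache properties.

Assumptions (not part of Presto or the Helm chart): the catalog's
additionalConfig entries are a list of "key=value" strings, and the
mountPath of the cache volume is known.
"""
from pathlib import PurePosixPath
from urllib.parse import urlparse


def parse_properties(entries):
    """Turn ["key=value", ...] into a dict, splitting on the first '='."""
    props = {}
    for entry in entries:
        key, sep, value = entry.partition("=")
        if not sep:
            raise ValueError(f"not a key=value entry: {entry!r}")
        props[key.strip()] = value.strip()
    return props


def check_cache_config(entries, mount_path):
    """Return a list of problems; an empty list means every check passed."""
    props = parse_properties(entries)
    problems = []
    if props.get("cache.enabled", "").lower() != "true":
        problems.append("cache.enabled is not true")
    if props.get("cache.type", "").upper() != "ALLUXIO":
        problems.append("cache.type is not ALLUXIO")
    if "cache.alluxio.max-cache-size" not in props:
        problems.append("cache.alluxio.max-cache-size is missing")

    base_dir = props.get("cache.base-directory")
    if base_dir is None:
        problems.append("cache.base-directory is missing")
        return problems

    uri = urlparse(base_dir)
    cache_path = PurePosixPath(uri.path)
    mount = PurePosixPath(mount_path)
    if uri.scheme != "file":
        problems.append("cache.base-directory should be a file:// URI")
    elif cache_path != mount and mount not in cache_path.parents:
        # The cache would be written to container-local storage instead
        # of the hostPath volume.
        problems.append(f"{cache_path} is not on the cache volume mounted at {mount}")
    return problems


if __name__ == "__main__":
    example = [
        "cache.enabled=true",
        "cache.type=ALLUXIO",
        "cache.alluxio.max-cache-size=100GB",  # hypothetical size
        "cache.base-directory=file:///opt/data/alluxio",
        "hive.node-selection-strategy=SOFT_AFFINITY",
    ]
    print(check_cache_config(example, "/opt/data/alluxio"))  # prints []
```

With cache.base-directory=file:///mnt/flash/data and a mountPath of /opt/data/alluxio, the helper reports that the cache directory is not on the mounted volume.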
    

The Alluxio SDK cache is transparent to users. To verify that the cache is working, check the directory set by cache.base-directory and confirm that files are being created there. Additionally, Alluxio exports various JMX metrics while performing caching-related operations. Refer to “Monitoring Alluxio SDK” <https://prestodb.io/docs/current/cache/local.html#monitoring> for more information.
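The directory check described above can be scripted. The following Python sketch is a hypothetical helper (cache_usage is an illustrative name, not an Alluxio or Presto tool): it assumes it runs on, or inside, a worker where the cache directory is visible, and the default path file:///opt/data/alluxio is only a placeholder matching the mountPath used earlier. It counts the files under the directory and sums their sizes; a count that grows after running queries suggests the cache is being populated.

```python
"""Minimal sketch: check whether the Alluxio SDK cache is writing files.

Assumption: the directory set by cache.base-directory is visible from
where this runs; the default path below is a placeholder.
"""
import os
import sys
from urllib.parse import urlparse


def cache_usage(base_directory):
    """Return (file_count, total_bytes) for all files under base_directory.

    Accepts a plain path or a file:// URI such as the value of
    cache.base-directory. A missing directory yields (0, 0).
    """
    if base_directory.startswith("file:"):
        path = urlparse(base_directory).path
    else:
        path = base_directory
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total_bytes += os.path.getsize(os.path.join(root, name))
            except OSError:
                continue  # file was removed between listing and stat
            file_count += 1
    return file_count, total_bytes


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "file:///opt/data/alluxio"
    count, size = cache_usage(target)
    print(f"{count} cached files, {size} bytes under {target}")
```

This only shows that files exist; the JMX metrics are the better signal for hit rates.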
