Hive LLAP Caching

The cache is used mostly for BI queries rather than ETL queries. The I/O thread pool size is set at the node level and defines the number of low-level I/O threads. The daemon offloads I/O and the transformation from compressed formats to these I/O threads. The data is then passed on to execution, where executor threads in the JVM perform the actual vectorized processing. The I/O thread pool size and the number of executors per daemon are both recommended to be set to the number of cores.

Cache storage:

LLAP’s cache is columnar, automatic and decentralised. When a query uses a new column or partition, LLAP adds it to the cache automatically, and the cache does not hold any dead columns. The daemon caches metadata for input files as well as the data. Metadata and index information can be cached even for data that is not currently cached. Metadata is stored in-process as Java objects.

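● Eviction policy. LRFU (Least Recently/Frequently Used) is currently used, but the policy is pluggable. Because LRFU weighs both how recently and how often data is accessed, a single large scan is less likely to push frequently used data out of the cache.
● Caching granularity. The unit of data in the cache is the column chunk.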
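
To make the eviction idea concrete, here is a minimal sketch of the general LRFU technique, not LLAP's actual implementation. The class name LrfuCache, the parameter lam and the string keys standing in for column chunks are all hypothetical. Each entry keeps a combined recency and frequency (CRF) score that decays over time: with lam close to 0 the policy behaves like LFU, and with lam close to 1 it behaves like LRU.

```python
class LrfuCache:
    """Toy LRFU cache. Keys stand in for column chunks (illustrative only)."""

    def __init__(self, capacity, lam=0.01):
        self.capacity = capacity
        self.lam = lam          # decay rate: ~0 acts like LFU, ~1 acts like LRU
        self.clock = 0          # logical time, advanced once per access
        self.entries = {}       # key -> (crf_score, last_access_time)

    def _decay(self, elapsed):
        # Weighing function F(x) = (1/2) ** (lam * x)
        return 0.5 ** (self.lam * elapsed)

    def _priority(self, key):
        # CRF score of an entry as seen at the current logical time
        crf, last = self.entries[key]
        return self._decay(self.clock - last) * crf

    def access(self, key):
        """Record an access; return the evicted key, or None."""
        self.clock += 1
        evicted = None
        if key in self.entries:
            crf, last = self.entries[key]
            # Each hit adds 1 on top of the decayed previous score
            crf = 1.0 + self._decay(self.clock - last) * crf
        else:
            if len(self.entries) >= self.capacity:
                # Evict the entry with the lowest current CRF score
                evicted = min(self.entries, key=self._priority)
                del self.entries[evicted]
            crf = 1.0
        self.entries[key] = (crf, self.clock)
        return evicted


cache = LrfuCache(capacity=2, lam=0.01)
for _ in range(5):
    cache.access("hot_column")        # frequently used chunk
for chunk in ("scan_1", "scan_2", "scan_3"):
    cache.access(chunk)               # one-off scan chunks
print(sorted(cache.entries))          # the hot chunk is still cached
```

In this toy run the scan chunks evict each other while the frequently accessed chunk stays in the cache, which is the scan-resistance property described above.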

To disable the LLAP cache from the command line:
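
One way this might be done is to pass the Hive property hive.llap.io.enabled=false when starting a session. This assumes that turning off the LLAP I/O layer is what is meant by disabling the cache. The sketch below only builds a Beeline argument list using Beeline's -u, --hiveconf and -e options. The helper name beeline_args, the JDBC URL and the query are hypothetical placeholders.

```python
import shlex


def beeline_args(jdbc_url, query, llap_io_enabled=False):
    """Build a Beeline command line that sets hive.llap.io.enabled.

    Assumption: hive.llap.io.enabled=false turns off the LLAP I/O layer,
    and with it the cache, for this session.
    """
    flag = "true" if llap_io_enabled else "false"
    return [
        "beeline",
        "-u", jdbc_url,                              # placeholder JDBC URL
        "--hiveconf", f"hive.llap.io.enabled={flag}",
        "-e", query,                                 # placeholder query
    ]


args = beeline_args("jdbc:hive2://example-host:10500/default", "SELECT 1")
print(shlex.join(args))  # shell-quoted form of the command
```

Inside an already open Hive session, the same property can be changed with a set statement (set hive.llap.io.enabled=false;).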


