Databases have locks, and Hive is no different. Locks in a database can be either read locks or write locks. Locks are used when concurrent applications try to access the same table; they prevent data from being corrupted or invalidated when multiple users try to read while others write to the database.
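To make the read/write distinction concrete, here is a minimal Python sketch of a generic shared/exclusive compatibility check. It is a simplified model, not Hive's lock manager: `LockMode`, `compatible` and `can_acquire` are hypothetical names, and Hive's own lock manager may distinguish more cases than these two modes.

```python
from enum import Enum
from typing import Iterable


class LockMode(Enum):
    READ = "read"    # shared: many holders may read the table at once
    WRITE = "write"  # exclusive: one holder, no concurrent readers or writers


def compatible(held: LockMode, requested: LockMode) -> bool:
    """Only READ + READ can coexist; any pairing that involves WRITE conflicts."""
    return held is LockMode.READ and requested is LockMode.READ


def can_acquire(held_locks: Iterable[LockMode], requested: LockMode) -> bool:
    """Grant a request only if it is compatible with every lock already held."""
    return all(compatible(h, requested) for h in held_locks)
```

For example, a second reader is admitted while a reader holds the table, but a writer has to wait until all readers release their locks. That waiting is what keeps a reader from seeing a half-written table.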
…
The cache is used mostly for BI queries rather than ETL queries.
hive.llap.io.threadpool.size is set at the node level and defines the number of low-level I/O threads. Basically, the daemon offloads I/O and transformation from compressed formats to these I/O threads. Then, the data will be passed on to…
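The offloading pattern described above can be illustrated with a toy Python model: a fixed-size pool of I/O threads decodes compressed chunks, and the decoded data is handed to a downstream consumer. This is an analogy, not LLAP's implementation. `IO_THREADPOOL_SIZE` is a hypothetical stand-in for `hive.llap.io.threadpool.size`, zlib stands in for the real compressed formats, and because the sentence above is truncated, `consume` is a hypothetical placeholder for wherever the data actually goes next.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

# Hypothetical stand-in for hive.llap.io.threadpool.size (a per-node setting).
IO_THREADPOOL_SIZE = 4


def decode(compressed: bytes) -> bytes:
    """Work done on an I/O thread: turn compressed bytes into usable data.
    zlib is only a stand-in for the real compressed file formats."""
    return zlib.decompress(compressed)


def offload_io(chunks: Iterable[bytes], consume: Callable[[bytes], int],
               io_threads: int = IO_THREADPOOL_SIZE) -> List[int]:
    """Decode chunks on a fixed-size I/O pool, then pass each decoded chunk
    to a hypothetical downstream `consume` callback on the calling thread."""
    with ThreadPoolExecutor(max_workers=io_threads) as io_pool:
        # map() yields results in input order even though decoding runs concurrently.
        return [consume(data) for data in io_pool.map(decode, chunks)]
```

For instance, `offload_io([zlib.compress(b"x" * n) for n in (10, 20, 30)], len)` returns the decoded sizes in input order. The pool size caps how many chunks are decoded concurrently, which is the knob the setting above controls on each node.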
Hive 3 brought many architectural changes, such as making ACID the default table type and deprecating the Hive CLI (thick client) in favor of supporting only the thin JDBC client (Beeline), among others.
Below is the high-level architecture (I tried to make some changes to the existing Hive architecture diagram which…
For YARN applications, we can fetch application data using the REST APIs on the ATS (Application Timeline Server). Below are some reference links:
We can use the RM (ResourceManager) REST APIs to get application-related data. Some examples are below:
a) Failed apps for a specific time period
GET “http://Resource-Manager-Address:8088/ws/v1/cluster/apps?limit=10&startedTimeBegin={time in…
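A minimal Python sketch of building and consuming such a request, assuming an RM web address of `rm-host:8088` (a hypothetical placeholder). It uses the Cluster Applications API query parameters `states`, `limit`, `startedTimeBegin` and `startedTimeEnd`; the two time parameters take milliseconds since the Unix epoch. The helper names are hypothetical, and the parser assumes the usual JSON shape `{"apps": {"app": [...]}}`, treating `"apps": null` as "no matches".

```python
import json
from datetime import datetime, timezone
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen


def to_epoch_millis(dt: datetime) -> int:
    """The RM REST API expects time filters in milliseconds since the epoch."""
    return int(dt.timestamp() * 1000)


def failed_apps_url(rm_address: str, begin: datetime, end: datetime,
                    limit: int = 10) -> str:
    """Build a Cluster Applications API URL for FAILED apps started in [begin, end]."""
    params = {
        "states": "FAILED",
        "limit": limit,
        "startedTimeBegin": to_epoch_millis(begin),
        "startedTimeEnd": to_epoch_millis(end),
    }
    return f"http://{rm_address}/ws/v1/cluster/apps?{urlencode(params)}"


def failed_app_ids(payload: dict) -> List[str]:
    """Extract application IDs; tolerate "apps": null when nothing matched."""
    apps = payload.get("apps") or {}
    return [app["id"] for app in apps.get("app", [])]


def fetch_json(url: str) -> dict:
    """Issue the GET request (needs network access to the RM)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

Usage, against a reachable RM: `failed_app_ids(fetch_json(failed_apps_url("rm-host:8088", begin, end)))`, where `begin` and `end` are timezone-aware datetimes. Making them timezone-aware avoids silently interpreting the window in the local timezone.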
This setting specifies which strategy ORC should use to create splits for execution. The available options are “BI”, “ETL” and “HYBRID”.
The HYBRID mode reads the footers for all files if there are fewer files than the expected mapper count, switching over to generating one split per file if the average file sizes are…
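The HYBRID decision above (this appears to describe Hive's `hive.exec.orc.split.strategy` setting) can be sketched in Python as follows. Here BI means one split per file without reading footers, and ETL means reading file footers to plan splits. Because the sentence above is cut off, the size condition is an assumption: the sketch compares the average file size against a hypothetical `size_threshold_bytes` parameter. `SplitStrategy` and `hybrid_choice` are illustrative names, not Hive APIs.

```python
from enum import Enum
from typing import Sequence


class SplitStrategy(Enum):
    BI = "BI"    # one split per file; file footers are not read
    ETL = "ETL"  # file footers are read to plan splits


def hybrid_choice(file_sizes: Sequence[int], expected_mappers: int,
                  size_threshold_bytes: int) -> SplitStrategy:
    """Sketch of the HYBRID heuristic; the size comparison is an ASSUMPTION."""
    if not file_sizes:
        raise ValueError("need at least one file")
    # Fewer files than expected mappers: reading every footer is cheap.
    if len(file_sizes) < expected_mappers:
        return SplitStrategy.ETL
    # Many files: if they are small on average (assumed comparison against a
    # hypothetical threshold), skip footers and emit one split per file.
    avg_size = sum(file_sizes) / len(file_sizes)
    if avg_size < size_threshold_bytes:
        return SplitStrategy.BI
    return SplitStrategy.ETL
```

The trade-off is split-generation time versus split quality. Reading footers costs time up front but lets large files be split along stripe boundaries, while one split per file is fast when there are many small files.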