Hive 3 (Architecture)

Oct 26, 2020

Hive 3 brings many architectural changes: the default table type is ACID, the Hive CLI (a thick client) is deprecated, and only the thin JDBC client (Beeline) is supported.

Below is the high-level architecture (I adapted the existing Hive architecture diagram, which still had the JobTracker in it).

Design: components

Server
  - Thrift Server API (server implementations, serialization, network I/O)
  - Processor: application logic (session, operation, driver, etc.)
Client
  - JDBC/ODBC/Beeline
  - Thrift Client API (discouraged)
ZooKeeper
  - Service discovery
Authentication
  - Kerberos/LDAP/pluggable
Metastore & RDBMS
  - Remote: TCP connection to the Metastore Thrift Server to access RDBMS data
  - Embedded: access RDBMS data via the Metastore Thrift Client API, with no TCP connection
Execution & Persistence
  - MR/Tez/Spark
  - HDFS/local disk
Plugins
  - Atlas
  - Ranger
  - Others can be plugged in via pre/post hooks
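
To make the Client and ZooKeeper entries concrete, here is a minimal Python sketch of how a HiveServer2 JDBC URL might be assembled for a direct connection versus ZooKeeper service discovery. The helper name `hs2_jdbc_url`, the `example.com` host names, and the `hiveserver2` namespace are assumptions for illustration only; `serviceDiscoveryMode` and `zooKeeperNamespace` are the URL parameters used by the Hive JDBC driver.

```python
def hs2_jdbc_url(hosts, database="default", zookeeper_namespace=None):
    """Build a HiveServer2 JDBC URL (hypothetical helper, for illustration).

    hosts: list of "host:port" strings. When zookeeper_namespace is set they
    are treated as the ZooKeeper quorum; otherwise the first entry is taken
    to be the HiveServer2 endpoint itself.
    """
    if not hosts:
        raise ValueError("at least one host:port is required")
    if zookeeper_namespace is None:
        # Direct connection to a single HiveServer2 instance.
        return f"jdbc:hive2://{hosts[0]}/{database}"
    # Service discovery: the client asks ZooKeeper for a registered HiveServer2.
    quorum = ",".join(hosts)
    return (
        f"jdbc:hive2://{quorum}/{database}"
        ";serviceDiscoveryMode=zooKeeper"
        f";zooKeeperNamespace={zookeeper_namespace}"
    )


# Host names below are placeholders, not real endpoints.
direct_url = hs2_jdbc_url(["hs2.example.com:10000"])
zk_url = hs2_jdbc_url(
    ["zk1.example.com:2181", "zk2.example.com:2181", "zk3.example.com:2181"],
    zookeeper_namespace="hiveserver2",
)
print(direct_url)
print(zk_url)
```

Either URL can then be handed to Beeline with its `-u` option. With service discovery, the client does not need to know which HiveServer2 host is currently alive.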

Query Flow (Analysis from the Logs)

1. Session is opened:

service.CompositeService (:()) - Session opened, SessionHandle [f8afd0d8-ec65-4bb8-a4c6-f5537781fcfb], current sessions:2

2. The Driver starts waiting for the compile lock (the waitCompile timer begins):

<PERFLOG method=waitCompile from=org.apache.hadoop.hive.ql.Driver>

3. Query comes in and the compile lock is acquired:

ql.Driver (:()) - Acquired the compile lock.

4. Query compilation:

ql.Driver (:()) - Compiling command(queryId=hive_20200408113511_15f380d1-2ad4-4f7f-bcbb-cfd97e38f2f6): select id,name from user_data_managed where age < 50 order by id limit 50

5. Parsing:

parse.ParseDriver (:()) - Parsing command: select id,name from user_data_managed where age < 50 order by id limit 50

6. Semantic analysis:

parse.CalcitePlanner (:()) - Starting Semantic Analysis

7. Get metadata operation:

parse.CalcitePlanner (:()) - Get metadata for source tables

8. Plan Generation:

org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction Calcite: Plan generation>

In this last step the plan is generated, and the cost-based optimizer (CBO) and the other optimizers are applied.
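
The same flow can be traced in a HiveServer2 log with a small script. The Python sketch below tags each log line with the step it belongs to and pulls the query ID and SQL text out of the "Compiling command" line. The marker strings are copied from the log excerpts in this post; the `STAGES` table, the stage labels, and the function names are assumptions made for illustration, and other Hive versions may word these messages differently.

```python
import re

# (stage label, marker substring) pairs in flow order. The markers come from
# the log excerpts in this post; the labels are illustrative.
STAGES = [
    ("session opened", "Session opened"),
    ("compile wait", "method=waitCompile"),
    ("compile lock acquired", "Acquired the compile lock"),
    ("compilation", "Compiling command("),
    ("parsing", "Parsing command:"),
    ("semantic analysis", "Starting Semantic Analysis"),
    ("get metadata", "Get metadata for source tables"),
    ("plan generation", "Calcite: Plan generation"),
]

# Captures the query ID and the SQL text from a "Compiling command" line.
QUERY_ID = re.compile(r"Compiling command\(queryId=([^)]+)\):\s*(.*)")


def classify(line):
    """Return the stage label for a log line, or None if no marker matches."""
    for stage, marker in STAGES:
        if marker in line:
            return stage
    return None


def extract_query(line):
    """Return (query_id, sql) from a 'Compiling command' line, else None."""
    match = QUERY_ID.search(line)
    return (match.group(1), match.group(2)) if match else None


sample = [
    "service.CompositeService (:()) - Session opened, SessionHandle "
    "[f8afd0d8-ec65-4bb8-a4c6-f5537781fcfb], current sessions:2",
    "<PERFLOG method=waitCompile from=org.apache.hadoop.hive.ql.Driver>",
    "ql.Driver (:()) - Acquired the compile lock.",
    "ql.Driver (:()) - Compiling command(queryId=hive_20200408113511_"
    "15f380d1-2ad4-4f7f-bcbb-cfd97e38f2f6): select id,name from "
    "user_data_managed where age < 50 order by id limit 50",
    "parse.ParseDriver (:()) - Parsing command: select id,name from "
    "user_data_managed where age < 50 order by id limit 50",
    "parse.CalcitePlanner (:()) - Starting Semantic Analysis",
    "parse.CalcitePlanner (:()) - Get metadata for source tables",
]

stages = [classify(line) for line in sample]
for stage in stages:
    print(stage)
print(extract_query(sample[3]))
```

Matching on substrings rather than on the full log layout keeps the sketch independent of the logger pattern, which varies between clusters.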

Refer: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/hive-overview/content/hive-apache-hive-3-architectural-overview.html

Written by Tamil Selvan K