Hive 3 (Architecture)

Hive 3 brings many architectural changes. Examples include ACID as the default table type for managed tables, the deprecation of the Hive CLI (a thick client), and support for only the thin JDBC client (Beeline).
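Here is a small Python sketch that makes the ACID default concrete. It encodes the table-type rules as I understand them from the HDP 3 Hive documentation:

- Managed ORC tables are full (CRUD) transactional tables.
- Managed tables in other formats are insert-only transactional tables.
- External tables are not transactional.

The class and function names are hypothetical. The sketch ignores temporary tables and any configuration overrides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableCapabilities:
    """Hypothetical summary of what a Hive 3 table supports."""
    acid: bool
    insert: bool
    update_delete: bool


def hive3_table_capabilities(managed, file_format):
    """Hypothetical helper following the HDP 3 table-type rules.

    Temporary tables and configuration overrides are ignored.
    """
    if not managed:
        # External tables are not transactional.
        return TableCapabilities(acid=False, insert=True, update_delete=False)
    if file_format.upper() == "ORC":
        # Managed ORC tables are full (CRUD) transactional tables.
        return TableCapabilities(acid=True, insert=True, update_delete=True)
    # Managed tables in any other format are insert-only transactional.
    return TableCapabilities(acid=True, insert=True, update_delete=False)
```

For example, `hive3_table_capabilities(managed=True, file_format="parquet")` reports an ACID table that accepts INSERT but not UPDATE or DELETE.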

Below is the high-level architecture. I adapted the existing Hive architecture diagram, which still showed the JobTracker.

Design: components

Server
  - Thrift Server API (server implementations, serialization, network I/O)
  - Processor: application logic (session, operation, driver, etc.)
Client
  - JDBC/ODBC/Beeline
  - Thrift Client API (discouraged)
ZooKeeper
  - Service discovery
Authentication
  - Kerberos/LDAP/pluggable
Metastore & RDBMS
  - Remote: TCP connection to the Metastore Thrift server to access RDBMS data
  - Embedded: access RDBMS data via the Metastore Thrift client API, but without a TCP connection
Execution & Persistence
  - MR/Tez/Spark
  - HDFS/local disk
Plugins
  - Atlas
  - Ranger
  - Plug in others via pre/post hooks
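Clients reach HiveServer2 through a JDBC URL, and Beeline accepts the same URL via `beeline -u`. The sketch below shows how ZooKeeper service discovery and Kerberos authentication appear in that URL. It assumes the standard `jdbc:hive2://` URL format. The helper name and host names are hypothetical. `serviceDiscoveryMode`, `zooKeeperNamespace` and `principal` are URL parameters read by the Hive JDBC driver.

```python
def hive2_jdbc_url(hosts, database="default", zookeeper_namespace=None,
                   kerberos_principal=None):
    """Hypothetical helper that builds a HiveServer2 JDBC URL.

    hosts: list of "host:port" strings. If zookeeper_namespace is set, the
    hosts are treated as the ZooKeeper quorum. The JDBC driver then looks up
    a live HiveServer2 instance registered under that namespace. Otherwise,
    exactly one HiveServer2 host:port is expected.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    if zookeeper_namespace is None and len(hosts) != 1:
        raise ValueError("a direct connection takes exactly one HiveServer2 host")
    url = f"jdbc:hive2://{','.join(hosts)}/{database}"
    if zookeeper_namespace is not None:
        # Service discovery: the client asks ZooKeeper which HiveServer2 to use.
        url += f";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={zookeeper_namespace}"
    if kerberos_principal is not None:
        # Kerberos authentication: the HiveServer2 service principal.
        url += f";principal={kerberos_principal}"
    return url
```

For example, `hive2_jdbc_url(["zk1:2181", "zk2:2181", "zk3:2181"], zookeeper_namespace="hiveserver2")` returns `jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2`. Pointing clients at the ZooKeeper quorum instead of a single HiveServer2 host lets them connect to any registered instance.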

Query Flow (Analysis from the Logs)

1. The session is opened.
2. The compiler waits for a query to arrive.
3. A query comes in and the compile lock is acquired.
4. Query compilation
5. Parsing
6. Semantic analyzer
7. Get metadata operation
8. Plan generation

In the plan generation step, the query plan is generated, cost-based optimization (CBO) is applied, and then the remaining optimizers run.
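The compile path above can be sketched as a toy Python pipeline. This is not Hive's Driver code, and all names are hypothetical. The simplifications are:

- The parser handles only `SELECT <columns> FROM <table>`.
- A dict stands in for the metastore.
- The "optimizer" only limits the table scan to the selected columns.
- A single global lock models the compile lock. As I understand it, this roughly matches HiveServer2's default of compiling one query at a time.

Steps 1 and 2 (session handling) are not modeled.

```python
import threading

# Hypothetical stand-in for the metastore: table name -> column names.
METASTORE = {"sales": ["id", "amount", "region"]}

# One global lock models the compile lock: only one query compiles at a time.
COMPILE_LOCK = threading.Lock()


def parse(sql):
    """Step 5: toy parser for 'SELECT <cols> FROM <table>' only."""
    tokens = sql.strip().rstrip(";").split()
    if len(tokens) < 4 or tokens[0].upper() != "SELECT" or tokens[-2].upper() != "FROM":
        raise ValueError(f"unsupported query: {sql!r}")
    columns = [c.strip() for c in " ".join(tokens[1:-2]).split(",")]
    return {"columns": columns, "table": tokens[-1]}


def get_metadata(table):
    """Step 7: fetch table metadata (here just the column list)."""
    if table not in METASTORE:
        raise LookupError(f"table not found: {table}")
    return METASTORE[table]


def semantic_analyze(ast):
    """Step 6: resolve the table via its metadata and validate the columns."""
    schema = get_metadata(ast["table"])
    columns = schema if ast["columns"] == ["*"] else ast["columns"]
    unknown = [c for c in columns if c not in schema]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    return {"table": ast["table"], "columns": columns}


def generate_plan(resolved):
    """Step 8: build an operator list; toy pruning scans only the selected columns."""
    cols = list(resolved["columns"])
    return [("TableScan", resolved["table"], cols), ("Select", list(cols)), ("FileSink",)]


def compile_query(sql):
    """Steps 3-8: hold the compile lock while parsing, analyzing and planning."""
    with COMPILE_LOCK:
        ast = parse(sql)
        resolved = semantic_analyze(ast)
        return generate_plan(resolved)


if __name__ == "__main__":
    for operator in compile_query("SELECT id, amount FROM sales"):
        print(operator)
```

The `with` block releases the lock even when parsing or analysis fails. A bad query therefore cannot block later compilations.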

Reference: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/hive-overview/content/hive-apache-hive-3-architectural-overview.html
