Hive 3 (Architecture)
Hive 3 brought a number of architectural changes, among them: managed tables are ACID by default, the Hive CLI (a thick client) is deprecated, and only the thin JDBC client (Beeline) is supported.
Below is the high-level architecture (I adapted the existing Hive architecture diagram, which still had the Job Tracker in it).
Design: components
Server
  Thrift Server API (server implementations, serialization, network I/O)
  Processor: application logic (session, operation, driver, etc.)
Client
  JDBC/ODBC/Beeline
  Thrift Client API (discouraged)
ZooKeeper
  Service discovery
Authentication
  Kerberos/LDAP/Pluggable
Metastore & RDBMS
  Remote: TCP connection to the Metastore Thrift Server to access RDBMS data
  Embedded: accesses RDBMS data via the Metastore Thrift Client API, with no TCP connection
Execution & Persistence
  MR/Tez/Spark
  HDFS/Local Disk
Plugins
  Atlas
  Ranger
  Plug in others via pre/post hooks
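To make the Client and ZooKeeper pieces a bit more concrete, here is a minimal Python sketch that assembles the two common shapes of a HiveServer2 JDBC URL: a direct connection, and one that goes through ZooKeeper service discovery. The function name `hs2_jdbc_url` and the host names are made up for illustration, and `hiveserver2` is only the commonly used default namespace, so check the values on your own cluster.

```python
def hs2_jdbc_url(hosts, database="", zk_namespace=None):
    """Build a HiveServer2 JDBC URL (illustrative helper, not a Hive API).

    hosts        -- list of (host, port) tuples; HiveServer2 itself for a
                    direct connection, or the ZooKeeper quorum for discovery
    database     -- optional database name placed after the slash
    zk_namespace -- if given, append the ZooKeeper service discovery settings
    """
    host_part = ",".join(f"{host}:{port}" for host, port in hosts)
    url = f"jdbc:hive2://{host_part}/{database}"
    if zk_namespace is not None:
        # The client asks ZooKeeper which HiveServer2 instance to use.
        url += f";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={zk_namespace}"
    return url


# Hypothetical hosts, for illustration only.
direct_url = hs2_jdbc_url([("hs2.example.com", 10000)], database="default")
zk_url = hs2_jdbc_url(
    [("zk1.example.com", 2181), ("zk2.example.com", 2181)],
    zk_namespace="hiveserver2",
)
print(direct_url)
print(zk_url)
```

Either URL can then be passed to Beeline with `beeline -u "<url>"`. With the ZooKeeper form the client never hard-codes a HiveServer2 host, which is the point of the service discovery component listed above.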
Query Flow (Analysis from the Logs)
1. Session is opened:
service.CompositeService (:()) - Session opened, SessionHandle [f8afd0d8-ec65-4bb8-a4c6-f5537781fcfb], current sessions:2
2. The driver starts waiting for the compile lock (the query has arrived but compilation has not started yet):
<PERFLOG method=waitCompile from=org.apache.hadoop.hive.ql.Driver>
3. The compile lock is acquired:
ql.Driver (:()) - Acquired the compile lock.
4. Query compilation:
ql.Driver (:()) - Compiling command(queryId=hive_20200408113511_15f380d1-2ad4-4f7f-bcbb-cfd97e38f2f6): select id,name from user_data_managed where age < 50 order by id limit 50
5. Parsing:
parse.ParseDriver (:()) - Parsing command: select id,name from user_data_managed where age < 50 order by id limit 50
6. Semantic analysis:
parse.CalcitePlanner (:()) - Starting Semantic Analysis
7. Metadata lookup for the source tables:
parse.CalcitePlanner (:()) - Get metadata for source tables
8. Plan Generation:
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction Calcite: Plan generation>
In the above step the plan is generated, and the cost-based optimizer (CBO) and the other optimizers are applied.
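The steps traced so far can be pulled out of a HiveServer2 log with plain substring matching. The sketch below is one way to do that in Python; the marker strings are taken from the log lines quoted above, while the stage labels and the `trace_stages` function are my own naming, and the exact log layout on your cluster will depend on its logging configuration.

```python
import re

# (stage label, substring to look for); labels are illustrative only.
STAGES = [
    ("session opened", "Session opened"),
    ("waiting for compile lock", "method=waitCompile"),
    ("compile lock acquired", "Acquired the compile lock"),
    ("compilation", "Compiling command"),
    ("parsing", "Parsing command"),
    ("semantic analysis", "Starting Semantic Analysis"),
    ("metadata lookup", "Get metadata for source tables"),
    ("plan generation", "Calcite: Plan generation"),
]

QUERY_ID = re.compile(r"queryId=([\w-]+)")


def trace_stages(lines):
    """Return (first query id seen or None, stage labels in log order)."""
    query_id = None
    stages = []
    for line in lines:
        for label, marker in STAGES:
            if marker in line:
                stages.append(label)
                break  # at most one stage per log line
        match = QUERY_ID.search(line)
        if match and query_id is None:
            query_id = match.group(1)
    return query_id, stages


# Log lines copied from the walkthrough above.
sample_log = [
    "service.CompositeService (:()) - Session opened, SessionHandle [f8afd0d8-ec65-4bb8-a4c6-f5537781fcfb], current sessions:2",
    "<PERFLOG method=waitCompile from=org.apache.hadoop.hive.ql.Driver>",
    "ql.Driver (:()) - Acquired the compile lock.",
    "ql.Driver (:()) - Compiling command(queryId=hive_20200408113511_15f380d1-2ad4-4f7f-bcbb-cfd97e38f2f6): select id,name from user_data_managed where age < 50 order by id limit 50",
    "parse.ParseDriver (:()) - Parsing command: select id,name from user_data_managed where age < 50 order by id limit 50",
    "parse.CalcitePlanner (:()) - Starting Semantic Analysis",
    "parse.CalcitePlanner (:()) - Get metadata for source tables",
    "org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction Calcite: Plan generation>",
]

query_id, stages = trace_stages(sample_log)
print(query_id)
for number, stage in enumerate(stages, start=1):
    print(number, stage)
```

Grepping a real log for the extracted query ID is usually the quickest way to isolate one query's lines when several sessions are interleaved.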