hadoop hive mapreduce apache-tez

Why are mappers and reducers allocated in Hive on Tez?


Can anyone please explain why mappers and reducers are still allocated even when a Hive query runs on Tez?

Does running on the Tez engine still use the MR engine?


Solution

  • Yes, Hive on Tez still uses the same Map-Reduce primitives, which is why mappers and reducers are allocated. In addition, Tez represents all the tasks of a query as a single DAG, which allows it to optimize the plan and eliminate unnecessary steps.

    For example, take this query:

    SELECT DeptName, COUNT(*) as c FROM EmployeeTable
    GROUP BY DeptName ORDER BY c;
    

    On MR, this executes as two MR jobs, with the intermediate results saved to HDFS.

    Map-Reduce mandates a local sort of the output from each map task. When the sort is not required, it is unnecessary overhead. Tez makes task behavior more flexible by eliminating steps that are mandatory in Map-Reduce.


    The more complex the query, the more it benefits from Tez. Tez represents the query as a DAG (directed acyclic graph) within a single job and eliminates unnecessary steps such as reads and writes to durable storage and the sort of the output from each map task. It also enables container reuse. Tez is generally the better choice: simple queries should run no worse than on MR, and complex queries run much better.
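
    One way to see this for yourself is to compare the plans Hive produces for the example query under each engine. The sketch below assumes a Hive session where both engines are available and where `EmployeeTable` exists as in the query above. The exact plan text depends on the Hive version, so the comments describe only what you would typically expect to see.

    ```sql
    -- Plan under classic MapReduce: typically two MR stages,
    -- one for the GROUP BY and one for the ORDER BY.
    SET hive.execution.engine=mr;
    EXPLAIN
    SELECT DeptName, COUNT(*) AS c FROM EmployeeTable
    GROUP BY DeptName ORDER BY c;

    -- Plan under Tez: typically a single Tez stage whose vertices
    -- are still named like "Map 1", "Reducer 2", "Reducer 3".
    SET hive.execution.engine=tez;
    EXPLAIN
    SELECT DeptName, COUNT(*) AS c FROM EmployeeTable
    GROUP BY DeptName ORDER BY c;
    ```

    The map and reduce names in the Tez plan are vertices of one DAG, not separate MR jobs, which is why mappers and reducers still show up in the logs and counters.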