I get the following in my log file, but don't know what is the implication:
query:
create table mydb.new as
select a.col1,b.col4,a.col3 from
sampledb.table1 a
left join sampledb.table2 b
on a.col3=b.col3
;
No Stats for sampledb@table1, Columns: col_1
No Stats for sampledb@table2, Columns: col_4
Can someone help with this or guide me where to look?
Related question with no answers:Hive No Stats for 'database'@'table', Columns
No stats can result in suboptimal plan (for join columns) or table scan instead of using statistics only for simple queries:
CBO uses statistics when creating query plan. Absence of statistics or stale statistics may result in sub-optimal query plan. If there is no stats, CBO makes some estimations based on files size/avg row size (see hive.stats.avg.row.size
configuration and other statistics related settings).
Also if hive.compute.query.using.stats=true
and statistics exists, then optimizer is using statistics for simple query (for example select count(col1) ...) calculation instead of querying table data (this may lead to wrong query results if the stats is stale), see this answer.
Also read this design document for more details about stats: https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-StatisticsinHive