apache-spark, hive, apache-spark-sql, hiveql

SparkSQL/Hive: equivalent of MySQL's `information_schema.tables.{data_length, table_rows}`?


In MySQL, we can query the table `information_schema.tables` to obtain useful information such as `data_length` or `table_rows`:

select
  data_length
  , table_rows
from
  information_schema.tables
where  
  table_schema='some_db'
  and table_name='some_table';

+-------------+------------+
| data_length | table_rows |
+-------------+------------+
|        8368 |        198 |
+-------------+------------+
1 row in set (0.01 sec)

Is there an equivalent mechanism for SparkSQL/Hive?

I am okay with using SparkSQL or a programmatic API like HiveMetaStoreClient (the Java API `org.apache.hadoop.hive.metastore.HiveMetaStoreClient`). For the latter I read the API doc (here) and could not find any method related to table row counts or sizes.


Solution

  • There is no single command for this meta-information. Rather, there is a set of commands you may use (Scala sketches for reading the results programmatically follow after the list):

    Describe Table/View/Column

    DESC [FORMATTED|EXTENDED] schema_name.table_name;

    SHOW TABLE EXTENDED LIKE part_table;
    SHOW TBLPROPERTIES tblname("foo");
    

    Display Column Statistics (Hive 0.14.0 and later)

    DESCRIBE FORMATTED [db_name.]table_name column_name;
    DESCRIBE FORMATTED [db_name.]table_name column_name PARTITION (partition_spec);
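
    In Spark these commands return ordinary DataFrames, so the statistics can be pulled out programmatically. A minimal sketch in Scala, assuming Spark 2.x+ with an active `SparkSession` named `spark` and the placeholder table `some_db.some_table`; note that a row count only shows up after statistics have been computed:

    import org.apache.spark.sql.functions.col

    // Populate table-level statistics in the metastore first; without
    // this step no row count is reported.
    spark.sql("ANALYZE TABLE some_db.some_table COMPUTE STATISTICS")

    // DESCRIBE EXTENDED returns rows of (col_name, data_type, comment);
    // the statistics arrive as a single row, e.g. "8368 bytes, 198 rows".
    spark.sql("DESCRIBE EXTENDED some_db.some_table")
      .filter(col("col_name") === "Statistics")
      .show(truncate = false)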
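
    As for the HiveMetaStoreClient route from the question: there is indeed no dedicated getter for row counts or sizes. When statistics exist, Hive keeps them as plain entries in the table's parameter map, under keys such as `numRows`, `totalSize`, and `rawDataSize`. A minimal sketch, assuming `hive-site.xml` is on the classpath so the client can locate the metastore:

    import org.apache.hadoop.hive.conf.HiveConf
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient

    val client = new HiveMetaStoreClient(new HiveConf())
    try {
      val params = client.getTable("some_db", "some_table")
        .getParameters  // java.util.Map[String, String]

      // The statistics are ordinary key/value parameters, not API methods.
      println(s"numRows   = ${params.get("numRows")}")    // counterpart of table_rows
      println(s"totalSize = ${params.get("totalSize")}")  // counterpart of data_length (bytes)
    } finally {
      client.close()
    }

    One caveat versus MySQL's `information_schema`: these parameters are only as fresh as the last `ANALYZE TABLE ... COMPUTE STATISTICS` (or automatic gathering via `hive.stats.autogather`), so for a never-analyzed table they may be missing or stale.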