hive derby metastore

Hadoop Metastore Will Not Initialize


preamble: i'm new to hadoop / hive. i have installed standalone hadoop and am now trying to get hive to work. i keep getting an error about initializing the metastore and cannot figure out how to resolve it. (hadoop 2.7.2 and hive 2.0)

HADOOP_HOME AND HIVE_HOME ARE SET

ubuntu15-laptop: ~ $>echo $HADOOP_HOME
/usr/hadoop/hadoop-2.7.2

ubuntu15-laptop: ~ $>echo $HIVE_HOME
/usr/hive

hdfs is working

ubuntu15-laptop: ~ $>hadoop fs -ls /
Found 2 items
drwxrwxr-x   - testuser supergroup          0 2016-04-13 21:37 /tmp
drwxrwxr-x   - testuser supergroup          0 2016-04-13 21:38 /user

ubuntu15-laptop: ~ $>hadoop fs -ls /user
Found 1 items
drwxrwxr-x   - testuser supergroup          0 2016-04-13 21:38 /user/hive

ubuntu15-laptop: ~ $>hadoop fs -ls /user/hive
Found 1 items
drwxrwxr-x   - testuser supergroup          0 2016-04-13 21:38 /user/hive/warehouse

ubuntu15-laptop: ~ $>groups
testuser adm cdrom sudo dip plugdev lpadmin sambashare

hive is not working. it says i need to initialize my metastore

ubuntu15-laptop: ~ $>hive

Logging initialized using configuration in
jar:file:/usr/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
Exception in thread "main" java.lang.RuntimeException: Hive metastore database
is not initialized. Please use schematool (e.g. ./schematool -initSchema
-dbType ...) to create the schema. If needed, don't forget to include the 
option to auto-create the underlying database in your JDBC connection string
(e.g. ?createDatabaseIfNotExist=true for mysql)

so i try to initialize it using postgres - but schematool tries to use derby

ubuntu15-laptop: ~ $>schematool -initSchema -dbType postgres
Metastore connection URL:  jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :  org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:   APP
Starting metastore schema initialization to 2.0.0
Initialization script hive-schema-2.0.0.postgres.sql
Error: Syntax error: Encountered "statement_timeout" at line 1, column 5.
(state=42X01,code=30000)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization
FAILED! Metastore state would be inconsistent !!
*** schemaTool failed ***

so i change hive-site.xml to use postgres drivers etc. but because i don't have the drivers installed, it fails

ubuntu15-laptop: ~ $>cp /usr/hive/conf/hive-site.xml.templ /usr/hive/conf/hive-site.xml
ubuntu15-laptop: ~ $>schematool -initSchema -dbType postgres
Metastore connection URL:  jdbc:postgresql://localhost:5432/hivedb
Metastore Connection Driver :  org.postgresql.Driver
Metastore connection User:   123456
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to load driver
*** schemaTool failed ***

so then i try to use derby. first i move hive-site.xml out of the way again so the default is derby

ubuntu15-laptop: ~ $>mv /usr/hive/conf/hive-site.xml /usr/hive/conf/hive-site.xml.templ

then i try initializing again with derby, but it appears to already be initialized, per the error "Error: FUNCTION 'NUCLEUS_ASCII' already exists"

ubuntu15-laptop: ~ $>schematool -initSchema -dbType derby
Metastore connection URL:  jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :  org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:   APP
Starting metastore schema initialization to 2.0.0
Initialization script hive-schema-2.0.0.derby.sql
Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization
FAILED! Metastore state would be inconsistent !!
*** schemaTool failed ***

I've been at this for two days. Any help would be very much appreciated.


Solution

  • So..

    Here's what happened.

    After installing hive, the first thing I did was run hive, which attempted to create/initialize the metastore_db, but apparently didn't get it right. On that initial run, I got this error:

    Exception in thread "main" java.lang.RuntimeException: Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql)
    

    Running hive, even though it failed, created a metastore_db directory in the directory from which I ran hive:

    ubuntu15-laptop: ~ $>ls -l |grep meta
    drwxrwxr-x 5 testuser testuser 4096 Apr 14 12:44 metastore_db
    

    So when I then tried running

    ubuntu15-laptop: ~ $>schematool -initSchema -dbType derby
    

    The metastore already existed, but not in complete form.

    Soooooo the answer is:

    1. Before you run hive for the first time, run

      schematool -initSchema -dbType derby

    2. If you already ran hive and then tried to initSchema and it's failing:

      mv metastore_db metastore_db.tmp

    3. Rerun

      schematool -initSchema -dbType derby

    4. Run hive again
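
The recovery steps above can be sketched as a small bash function. This is only a sketch under a few assumptions: `reset_derby_metastore` is a made-up helper name, the `SCHEMATOOL` variable is a hypothetical override that defaults to the `schematool` on your PATH, and the backup gets a timestamp suffix instead of the `.tmp` name used above. Nothing is deleted; the half-built metastore_db is only moved aside.

```shell
#!/usr/bin/env bash
# Sketch: move a half-initialized embedded Derby metastore aside,
# then re-run schematool in the same directory.
# reset_derby_metastore and SCHEMATOOL are hypothetical names, not part of Hive.
reset_derby_metastore() {
    local dir="${1:-.}"            # directory from which hive was first run
    local db="$dir/metastore_db"

    if [ -d "$db" ]; then
        # Keep a timestamped backup rather than removing anything.
        local backup
        backup="$db.bak.$(date +%Y%m%d%H%M%S)"
        mv "$db" "$backup" || return 1
        echo "moved $db to $backup"
    fi

    # With the default Derby URL, metastore_db is created in the current
    # directory, so run schematool from inside $dir (in a subshell).
    ( cd "$dir" && "${SCHEMATOOL:-schematool}" -initSchema -dbType derby )
}
```

For the situation in the question, where hive was first run from the home directory, the call would be `reset_derby_metastore ~`, followed by running hive from that same directory.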

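    Also of note: if you change directories, the metastore_db created above won't be found! The default connection URL (jdbc:derby:;databaseName=metastore_db;create=true) uses a relative database name, so embedded Derby looks for metastore_db, and creates it if missing, in whatever directory you launch hive from. I'm literally trying to use hive for the first time today, so there is more on this in the related question "metastore_db created wherever I run Hive".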
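
One common way around the directory problem is to give Derby an absolute database path in hive-site.xml. The function below is a minimal sketch of that idea, not something taken from the steps above: `write_derby_site` is a hypothetical helper, and the output file and database directory are whatever you pass in. The property name `javax.jdo.option.ConnectionURL` is the standard metastore connection setting, and the URL has the same form as the one schematool printed, with only the database name made absolute.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal hive-site.xml that pins the embedded Derby
# metastore to an absolute location.
# write_derby_site is a hypothetical helper; both arguments are assumptions.
#   $1 = file to write (for example a hive-site.xml under $HIVE_HOME/conf)
#   $2 = absolute directory that should contain metastore_db
write_derby_site() {
    local out="$1" db_dir="$2"
    # Unquoted heredoc delimiter so that ${db_dir} is expanded.
    cat > "$out" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=${db_dir}/metastore_db;create=true</value>
  </property>
</configuration>
EOF
}
```

A call such as `write_derby_site "$HIVE_HOME/conf/hive-site.xml" "$HOME/hive"` (the `$HOME/hive` directory is an example, and this overwrites any existing hive-site.xml) would make hive and schematool use the same metastore_db regardless of the current directory. Since this points at a new location, the schema would presumably need to be initialized there with schematool, or the existing metastore_db moved into it.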