
Spark DataFrame and HBase design: one table with multiple column families vs. multiple tables with one column family


I have multiple tables in an Oracle database that I would like to copy to HBase. What is the best design: one HBase table with multiple column families, where each column family represents an Oracle table; multiple HBase tables, each with one column family containing all fields; or multiple HBase tables, each with multiple column families (each column family containing one column qualifier)?

After that, I would like to use Spark DataFrames to run some jobs and query the data as I would in Oracle.

Which strategy would you use?

cordially


Solution

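  • Using many column families in one table (more than 3) is discouraged. Flushes and compactions in HBase are done per region, so a large family that triggers a flush also flushes the small families next to it, which causes needless I/O.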

    Please see the HBase reference manual (the schema design notes on the number of column families).

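    That rules out one table with a column family per Oracle table, and leaves the other options you mentioned. Of those, multiple HBase tables (one per Oracle table) with a single column family holding all fields keeps the family count lowest and fits your design best.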
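
As a rough illustration of the "one HBase table per Oracle table, one column family" layout, here is a minimal Python sketch of how a single Oracle row could be mapped to an HBase row key and cells. The family name `d`, the `|` key separator, the lower-cased qualifiers and the helper `to_hbase_cells` are all assumptions made for this example, not anything HBase or Spark prescribes. Values are stored as UTF-8 strings for simplicity.

```python
from typing import Dict, List, Tuple

# Hypothetical name for the single column family used by every migrated table.
COLUMN_FAMILY = "d"


def to_hbase_cells(
    row: Dict[str, object], key_columns: List[str]
) -> Tuple[bytes, Dict[bytes, bytes]]:
    """Map one Oracle row to (row_key, {b"family:qualifier": value})."""
    # Build the row key from the Oracle primary-key column(s).
    row_key = "|".join(str(row[c]) for c in key_columns).encode("utf-8")

    cells: Dict[bytes, bytes] = {}
    for column, value in row.items():
        # Key columns already live in the row key; NULLs get no cell at all.
        if column in key_columns or value is None:
            continue
        qualifier = f"{COLUMN_FAMILY}:{column.lower()}".encode("utf-8")
        cells[qualifier] = str(value).encode("utf-8")
    return row_key, cells


# One row of a made-up EMPLOYEES table with primary key EMP_ID.
employee = {"EMP_ID": 7, "NAME": "Ada", "DEPT_ID": 10, "BONUS": None}
row_key, cells = to_hbase_cells(employee, ["EMP_ID"])
print(row_key, cells)
```

Each Oracle table would get its own HBase table and go through the same mapping. The resulting dictionary has the `family:qualifier` shape that clients such as HappyBase accept in `Table.put`, but how you actually write the data (Spark job, bulk load, etc.) is a separate choice.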