
Difference between SAP HANA Smart Data Access & SAP HANA Vora


Can someone explain the difference between the Smart Data Access of SAP HANA and SAP HANA Vora?

As I understood it, SDA just creates virtual tables that give access to the data of an external system (such as Hadoop and many other databases via ODBC) as if it were part of the SAP HANA system, so you can use the HANA IDE. It then uses the "default database engine" to calculate the sub-result in that external system and return it to SAP HANA.
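A rough way to picture the virtual-table idea is the toy Python sketch below. It is an analogy under stated assumptions, not SAP's API: two in-memory SQLite databases stand in for SAP HANA (hot data) and the external system (cold data), and the hypothetical `VirtualTable` class holds no rows itself but forwards each query, filter and aggregate included, to the remote side so only the sub-result travels back.

```python
import sqlite3

# Toy stand-in for the external system (e.g. Hive reached via ODBC); data is made up.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE sales (year INTEGER, amount REAL)")
remote.executemany("INSERT INTO sales VALUES (?, ?)",
                   [(2012, 100.0), (2013, 150.0), (2014, 200.0)])

class VirtualTable:
    """Hypothetical proxy: stores no data, forwards every query to the remote source."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table

    def select(self, columns, where="1=1", params=()):
        # The WHERE clause and any aggregate run remotely (pushdown),
        # so only the matching sub-result is returned to the caller.
        sql = f"SELECT {columns} FROM {self.table} WHERE {where}"
        return self.conn.execute(sql, params).fetchall()

# Toy stand-in for the local (HANA) side holding hot data; data is made up.
local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE sales (year INTEGER, amount REAL)")
local.executemany("INSERT INTO sales VALUES (?, ?)", [(2015, 300.0), (2016, 400.0)])

cold_sales = VirtualTable(remote, "sales")

def total_since(year):
    """Combine a local aggregate with a pushed-down remote aggregate."""
    hot = local.execute("SELECT SUM(amount) FROM sales WHERE year >= ?",
                        (year,)).fetchone()[0] or 0.0
    cold = cold_sales.select("SUM(amount)", "year >= ?", (year,))[0][0] or 0.0
    return hot + cold

print(total_since(2013))  # 700.0 local + 350.0 remote = 1050.0
```

The point of the design is that the caller writes one query against what looks like a local table, while the actual scan happens where the data lives.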

The concept of Vora is also to give an SAP user access to a Hadoop system, but the sub-results are calculated using the in-memory execution engine of Apache Spark.

I read that Hadoop is ideal storage for cold data (data coming from SAP HANA that is older than a certain time period and not needed for every analysis). But these two data access solutions confuse me, as I can't find an important difference between them.
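To make the hot/cold distinction concrete, here is a minimal sketch of age-based splitting. The 365-day threshold, the `posted` field, and the record layout are illustrative assumptions; in practice the rule would be defined by whatever tiering tool moves the data.

```python
from datetime import date

def split_hot_cold(records, today, max_age_days):
    """Partition records into hot (age <= max_age_days) and cold (older) lists."""
    hot, cold = [], []
    for rec in records:
        age_days = (today - rec["posted"]).days
        # Anything past the cut-off is a candidate for cheaper storage such as Hadoop.
        (cold if age_days > max_age_days else hot).append(rec)
    return hot, cold

# Made-up records with a hypothetical "posted" date field.
records = [
    {"id": 1, "posted": date(2016, 5, 1)},
    {"id": 2, "posted": date(2014, 1, 15)},
    {"id": 3, "posted": date(2015, 7, 1)},
]
hot, cold = split_hot_cold(records, today=date(2016, 6, 1), max_age_days=365)
print([r["id"] for r in hot], [r["id"] for r in cold])  # [1, 3] [2]
```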

Which one would be better if you plan to access the hot (SAP HANA) and cold (Hadoop) data in one analysis built with SAP HANA tools?

What would you recommend for combining SAP HANA with a Hadoop cluster that stores its data in Hive tables?


Solution

  • HANA Vora and SDA are related but are actually two different things that cannot be compared directly.

    Smart Data Access is a feature/component in HANA that is used to connect to external data sources (e.g. MySQL/Oracle databases, Vora, etc.).

    The word HANA in "HANA Vora" is misleading, because Vora is actually a stand-alone product that does not need HANA to run. Vora is an extension of Apache Spark that lets you process data from HDFS in memory. One of the key features of Vora is also that it integrates well with HANA: it can join its local tables with tables from HANA, or vice versa.

    Currently, Vora does not support INSERT/UPDATE commands, so you cannot directly move data from HANA to Vora for cold storage. You can, however, achieve this with HANA's Data Lifecycle Manager (DLM), which is discussed in this blog post: https://blogs.sap.com/2016/02/12/seamless-big-data-tiering-with-hana-hadoop-and-vora-with-a-little-help-from-dlm/
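The answer does not say how Vora executes a join between its local tables and HANA tables internally. As a generic illustration only, here is a plain Python hash join of rows that might come from HDFS (Vora side) with rows fetched from a HANA table; the table names, columns, and values are all made up, and the hash-join strategy is an assumption chosen because it is the textbook way to join two in-memory inputs on a key.

```python
def hash_join(left, right, key):
    """Inner equi-join of two lists of dicts on `key` using a hash table."""
    # Build the hash table on the smaller input and probe it with the larger one.
    left_is_build = len(left) <= len(right)
    build, probe = (left, right) if left_is_build else (right, left)
    index = {}
    for row in build:
        index.setdefault(row[key], []).append(row)
    joined = []
    for p in probe:
        for b in index.get(p[key], []):
            lrow, rrow = (b, p) if left_is_build else (p, b)
            joined.append({**lrow, **rrow})  # left columns first, then right
    return joined

# Hypothetical cold orders Vora might have loaded from HDFS.
vora_orders = [
    {"cust": 1, "year": 2012, "amount": 100.0},
    {"cust": 2, "year": 2013, "amount": 50.0},
    {"cust": 1, "year": 2013, "amount": 70.0},
]
# Hypothetical customer master data fetched from a HANA table.
hana_customers = [
    {"cust": 1, "name": "ACME"},
    {"cust": 3, "name": "Globex"},
]

for row in hash_join(vora_orders, hana_customers, "cust"):
    print(row)
```

Building on the smaller side keeps the in-memory hash table small, which matters when one side is a large HDFS dataset and the other a compact HANA dimension table.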