We are trying to share one common data set across more than one Outlook account. Let's say the data is stored in a container that belongs to data@outlook.com, and I want to read it as datasc1@outlook.com while my friend wants to read it as datasc2@outlook.com.
I have the common account's storage account name and container name (it is a public container), but it fails when I try to read the data using Hive with the command below:
CREATE EXTERNAL TABLE deneme (t1 string, t2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'wasb://container@storageaccount.blob.core.windows.net/OUR_DATA.txt';
I also tried the command below:
CREATE EXTERNAL TABLE deneme (t1 string, t2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'wasb://container@storageaccount.blob.core.windows.net/OUR_DATA.txt?sig=ACCESS_KEY_OF_CONTAINER';
I get the error below:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.fs.azure.AzureException Uploads to to public accounts using anonymous access is prohibited.)
We've tried several things. We set the container's access type to "Public Blob", but that didn't work. We added our accounts to the storage account's default directory, and that didn't work either. I also tried loading the data with Pig; the load seemed to work, but the dump failed as well.
What seems weird to me is that the command below works perfectly on the Hadoop command line:
hadoop fs -lsr wasb://container@storageaccount.blob.core.windows.net/
The output is:
lsr: DEPRECATED: Please use 'ls -R' instead.
-rwxrwxrwx 1 145391417 2015-05-18 10:58 wasb://container@storageaccount.blob.core.windows.net/OUR_DATA.txt
-rwxrwxrwx 1 25634418 2015-05-18 10:44 wasb://container@storageaccount.blob.core.windows.net/OUR_OTHER_DATA.txt
To sum up, our problem is reading data that belongs to another Azure account from our own Azure accounts, using HDInsight (Hive/Pig/Hadoop).
Does it work if you just point to the folder instead of a specific file? Hive expects locations to be folder paths, not specific files.
CREATE EXTERNAL TABLE deneme (t1 string, t2 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'wasb://container@storageaccount.blob.core.windows.net/';
I was able to create a similar external table against a container configured as a "Public Container".
If you don't want to use a public container, you can include the storage key in a configuration variable directly in a Hive script like:
set fs.azure.account.key.storageaccount.blob.core.windows.net=ACCESS_KEY_OF_CONTAINER;
Or you can configure the cluster at provisioning time with access permissions to the storage account using the Additional Storage Accounts section of the custom create wizard, or by using the Add-AzureHDInsightStorage cmdlet to modify the cluster configuration prior to creating the cluster.
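Putting the first two suggestions together, a minimal sketch of a Hive script might look like the following. It reuses the placeholder names from the question (storageaccount, container, deneme), and ACCESS_KEY_OF_CONTAINER stands in for the storage account's access key. I have not run it against the account in question, so treat it as a starting point rather than a verified fix.

```sql
-- Assumption: replace the placeholder with the storage account's access key.
-- This line is only needed when the container is not public.
set fs.azure.account.key.storageaccount.blob.core.windows.net=ACCESS_KEY_OF_CONTAINER;

-- LOCATION points at a folder (here the container root), not at a single file.
CREATE EXTERNAL TABLE deneme (t1 string, t2 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'wasb://container@storageaccount.blob.core.windows.net/';

-- Quick check that rows can be read through the table.
SELECT * FROM deneme LIMIT 10;
```

Because the location is a folder, Hive treats every file under it as table data. With the listing from the question, both OUR_DATA.txt and OUR_OTHER_DATA.txt would be read into deneme. If that is not what you want, one option is to move each file into its own folder and point LOCATION at that folder.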
This article has a bunch of related information on the interactions between HDInsight and Azure Blob Storage: http://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-use-blob-storage/