csvhivehive-serdehiveddlregexserde

insert data into table using csv file in HIVE


CREATE TABLE `rk_test22`(
`index` int, 
`country` string, 
`description` string, 
`designation` string, 
`points` int, 
`price` int, 
`province` string, 
`region_1` string, 
`region_2` string, 
`taster_name` string, 
`taster_twitter_handle` string, 
`title` string, 
`variety` string, 
`winery` string)
ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES ( 
'input.regex'=',(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)') 
STORED AS INPUTFORMAT 
'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://namever/user/hive/warehouse/robert.db/rk_test22'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true', 
'numFiles'='1', 
'skip.header.line.count'='1', 
'totalSize'='52796693', 
'transient_lastDdlTime'='1516088117');

I created the hive table using above command. Now I want to load the following line (in CSV file) into table using load data command. The load data command shows status OK but i cannot see data into that table.

0,Italy,"Aromas include tropical fruit, broom, brimstone and dried herb. The palate isn't overly expressive, offering unripened apple, citrus and dried sage alongside brisk acidity.",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco  (Etna),White Blend,Nicosia

Solution

  • If you are loading one line CSV file then that line is skipped because of this property: 'skip.header.line.count'='1'

    Also Regex should contain one capturing group for each column. Like in this answer: https://stackoverflow.com/a/47944328/2700344

    And why do you provide these settings in table DDL:

    'COLUMN_STATS_ACCURATE'='true'
    'numFiles'='1', 
    'totalSize'='52796693', 
    'transient_lastDdlTime'='1516088117'
    

    All these should be set automatically after DDL and ANALYZE.