hadoop, hive, teradata, hcatalog

Exception using TDCH for hcat job type. DATE column not supported


The command that I was running is:

    hadoop com.teradata.hadoop.tool.TeradataExportTool \
      -url jdbc:teradata://tdx/TMODE=ANSI,CHARSET=UTF8,database=db \
      -username xxx -password xxx -jobtype hcat \
      -sourcetable customers -sourcedatabase xxx \
      -nummappers 1 -targettable customers

While running the job, I get this exception:

    com.teradata.connector.common.exception.ConnectorException: DATE Field data type is not supported
        at com.teradata.connector.hive.utils.HiveSchemaUtils.lookupHiveDataTypeByName(HiveSchemaUtils.java:475)
        at com.teradata.connector.hcat.utils.HCatSchemaUtils.getRecordSchema(HCatSchemaUtils.java:396)
        at com.teradata.connector.hcat.processor.HCatInputProcessor.inputPreProcessor(HCatInputProcessor.java:89)
        at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:116)
        at com.teradata.connector.common.tool.ConnectorExportTool.run(ConnectorExportTool.java:62)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at com.teradata.hadoop.tool.TeradataExportTool.main(TeradataExportTool.java:29)

Is there any way I can get around this issue? The create table statement of the Hive table is:

CREATE EXTERNAL TABLE `customers`(
  `row_id` string,
  `source_start_date` date,
  `source_start_timestamp` timestamp,
  `target_start_timestamp` timestamp,
  `target_start_date` date,
  `source_end_date` date,
  `source_end_timestamp` timestamp,
  `target_end_date` date,
  `target_end_timestamp` timestamp,
  `active` string,
  `is_deleted` string,
  `status_flag` string,
  `first_name` string,
  `last_name` string,
  `city` string)
PARTITIONED BY (
  `join_dt` date)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'xxx'
TBLPROPERTIES (
  'transient_lastDdlTime'='1461674429')

Any pointers are appreciated.


Solution

  • I did not find a way to make the DATE columns work directly, but I found a workaround: create a temp table with the date columns cast to string. For example, say you created a table like:

    Create table person ( person_id int, joining_date date);
    

    From this table you can create a temp Hive table, using an alias so the cast column keeps its name:

    Create table temp_person as select person_id, cast(joining_date as string) as joining_date from person;
    

    On the target Teradata side you can keep the data type as DATE:

    Create multiset table person ( person_id int, joining_date date);
    

    And now you can run the job like:

    hadoop com.teradata.hadoop.tool.TeradataExportTool \
      -url jdbc:teradata://tdx/TMODE=ANSI,CHARSET=UTF8,database=db \
      -username xxx -password xxx -jobtype hcat \
      -sourcetable temp_person -sourcedatabase xxx \
      -nummappers 1 -targettable person
    

    This works fine, but due to the overhead of copying the table, performance slows down a little. A sketch of the same approach applied to the original customers table is shown below.
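
    Applied to the customers table from the question, the workaround would look roughly like the sketch below. This is only an illustration: temp_customers is a name chosen here, only the DATE columns from the question's DDL are cast, and it assumes the target Teradata table keeps the matching column names.

    -- Hypothetical staging table: cast only the DATE columns to STRING;
    -- TIMESTAMP and STRING columns are copied as-is.
    CREATE TABLE temp_customers AS
    SELECT
      row_id,
      CAST(source_start_date AS string) AS source_start_date,
      source_start_timestamp,
      target_start_timestamp,
      CAST(target_start_date AS string) AS target_start_date,
      CAST(source_end_date AS string)   AS source_end_date,
      source_end_timestamp,
      CAST(target_end_date AS string)   AS target_end_date,
      target_end_timestamp,
      active,
      is_deleted,
      status_flag,
      first_name,
      last_name,
      city,
      CAST(join_dt AS string)           AS join_dt
    FROM customers;

    Point -sourcetable at temp_customers in the export command, and drop the staging copy afterwards (DROP TABLE temp_customers;) to limit the overhead of keeping two copies of the data.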