Before the question/issue, here's the setup:
Table 1
CREATE EXTERNAL TABLE `table1`(
`mac_address` string,
`node` string,
`wave_found` string,
`wave_data` string,
`calc_dt` string,
`load_dt` string)
PARTITIONED BY (
`site_id` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://foobucket/object-thing'
TBLPROPERTIES (
'has_encrypted_data'='false',
'transient_lastDdlTime'='1654609315')
Table 2
CREATE EXTERNAL TABLE `table2`(
`mac_address` string,
`node` string,
`wave_found` string,
`wave_data` string,
`calc_dt` string)
PARTITIONED BY (
`load_dt` string,
`site_id` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://foobucket/object-thing'
TBLPROPERTIES (
'has_encrypted_data'='false',
'transient_lastDdlTime'='1654147830')
When the following Athena SQL is executed, the error below is thrown:
insert into table2
select * from table1;
"HIVE_UNSUPPORTED_FORMAT: Output format org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat with SerDe org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe is not supported."
That error seems relatively straightforward, but I'm still stuck on
building a solution, despite looking for alternatives to the so-called
HiveIgnoreKeyTextOutputFormat
. There's also the partition difference between the two tables,
but I'm not sure whether that has any bearing on the error shown here.
I believe you should be using the following SerDe/format combination:
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
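Since Athena has no statement to change an existing table's format, the practical fix is to drop and recreate table2 with that combination. A sketch below; the LOCATION here uses a hypothetical separate path (`object-thing-table2`) rather than the question's shared one, since both tables pointing at the same prefix will cause each to read the other's files:

```sql
-- Sketch: table2 recreated with the Parquet input/output formats.
-- Dropping an EXTERNAL table removes only the metadata, not the S3 data.
DROP TABLE IF EXISTS `table2`;

CREATE EXTERNAL TABLE `table2`(
  `mac_address` string,
  `node` string,
  `wave_found` string,
  `wave_data` string,
  `calc_dt` string)
PARTITIONED BY (
  `load_dt` string,
  `site_id` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://foobucket/object-thing-table2/';  -- hypothetical, non-overlapping path
```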
You can also create table2
via CTAS, avoiding the manual CREATE/INSERT entirely.
Note that the two tables should have separate, non-overlapping S3 paths:
create table table2
with
(
format='parquet',
parquet_compression='snappy',
partitioned_by=array['load_dt', 'site_id'],
external_location = 's3://foobucket/object-thing-table2/'
)
as
select * from table1;
If you now run show create table table2;
you will see that CTAS has generated the MapredParquetInputFormat
+ MapredParquetOutputFormat combination automatically.
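Once table2 uses the Parquet output format, the original INSERT should succeed. In Athena, partition columns must come last in the SELECT list; table1's column order already satisfies that, but spelling the columns out makes it explicit:

```sql
-- Partition columns (load_dt, site_id) must be the last expressions selected.
INSERT INTO table2
SELECT mac_address, node, wave_found, wave_data, calc_dt,
       load_dt, site_id
FROM table1;
```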