I have a HIVE connector in Trino setup.. the files are in S3 and I can start to create tables to query data.. however I am getting an error.. and I suspect it could be the compression method.. is my TABLE
creation step looking okay..
SELECT * from ibtickers;
Query 20220925_104301_00008_h8me8, FAILED, 1 node
Splits: 3 total, 0 done (0.00%)
0.15 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20220925_104301_00008_h8me8 failed: Failed to read Parquet file: s3a://MYBUCKET/data_dump/ib_tickers/20220918_192603-092364.snappy.parquet
here is the lines I used to create the table
CREATE TABLE IF NOT EXISTS hive.<MY_SCHEMA>.<MY_TABLE> (
-> column_one VARCHAR,
-> column_two VARCHAR,
-> column_three VARCHAR,
-> column_four DOUBLE,
-> column_five VARCHAR,
-> column_six VARCHAR,
-> query_start_time TIMESTAMP)
-> WITH (
-> external_location = 's3a://<MY_S3_BUCKET_NAME>/dir_one/dir_two',
-> format = 'PARQUET'
-> );
The files work locally when I am utilizing a viewer in pycharm just fine so.. my guess is my TRINO or HIVE setup needs fixing..?
I just tried the other compression types from meltano.. and none seem to work.. I've already confirmed the s3 files are reachable and downloadable via boto3 ... so this leads me to think this has to do with how I create the tables.. so any help here is appreciated.
I found the fuller stack trace and it looks like i need to just.. figure out how to add the compression support to ...HIVE Metastore?
io.trino.spi.TrinoException: Failed to read Parquet file: s3a://test-bucket-13549875235/data_dump/ib_tickers_gzip/20220925_173820-404835.gz.parquet
at io.trino.plugin.hive.parquet.ParquetPageSource.handleException(ParquetPageSource.java:169)
at io.trino.plugin.hive.parquet.ParquetPageSourceFactory.lambda$createPageSource$6(ParquetPageSourceFactory.java:271)
at io.trino.parquet.reader.ParquetBlockFactory$ParquetBlockLoader.load(ParquetBlockFactory.java:75)
at io.trino.spi.block.LazyBlock$LazyData.load(LazyBlock.java:406)
at io.trino.spi.block.LazyBlock$LazyData.getFullyLoadedBlock(LazyBlock.java:385)
at io.trino.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:292)
at io.trino.spi.Page.getLoadedPage(Page.java:229)
at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:314)
at io.trino.operator.Driver.processInternal(Driver.java:411)
at io.trino.operator.Driver.lambda$process$10(Driver.java:314)
at io.trino.operator.Driver.tryWithLock(Driver.java:706)
at io.trino.operator.Driver.process(Driver.java:306)
at io.trino.operator.Driver.processForDuration(Driver.java:277)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:736)
at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:164)
at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:515)
at io.trino.$gen.Trino_397____20220925_161652_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.UnsupportedOperationException: io.trino.spi.type.DoubleType
at io.trino.spi.type.AbstractType.writeLong(AbstractType.java:91)
at io.trino.parquet.reader.LongColumnReader.readValue(LongColumnReader.java:31)
at io.trino.parquet.reader.PrimitiveColumnReader.lambda$readValues$2(PrimitiveColumnReader.java:248)
at io.trino.parquet.reader.PrimitiveColumnReader.processValues(PrimitiveColumnReader.java:304)
at io.trino.parquet.reader.PrimitiveColumnReader.readValues(PrimitiveColumnReader.java:246)
at io.trino.parquet.reader.PrimitiveColumnReader.readPrimitive(PrimitiveColumnReader.java:235)
at io.trino.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:441)
at io.trino.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:540)
at io.trino.parquet.reader.ParquetReader.readBlock(ParquetReader.java:523)
at io.trino.parquet.reader.ParquetReader.lambda$nextPage$3(ParquetReader.java:272)
at io.trino.parquet.reader.ParquetBlockFactory$ParquetBlockLoader.load(ParquetBlockFactory.java:72)
... 17 more
io.trino.spi.TrinoException: Failed to read Parquet file: s3a://test-bucket-13549875235/data_dump/ib_tickers_zstd/20220925_175330-188177.zstd.parquet
at io.trino.plugin.hive.parquet.ParquetPageSource.handleException(ParquetPageSource.java:169)
at io.trino.plugin.hive.parquet.ParquetPageSourceFactory.lambda$createPageSource$6(ParquetPageSourceFactory.java:271)
at io.trino.parquet.reader.ParquetBlockFactory$ParquetBlockLoader.load(ParquetBlockFactory.java:75)
at io.trino.spi.block.LazyBlock$LazyData.load(LazyBlock.java:406)
at io.trino.spi.block.LazyBlock$LazyData.getFullyLoadedBlock(LazyBlock.java:385)
at io.trino.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:292)
at io.trino.spi.Page.getLoadedPage(Page.java:229)
at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:314)
at io.trino.operator.Driver.processInternal(Driver.java:411)
at io.trino.operator.Driver.lambda$process$10(Driver.java:314)
at io.trino.operator.Driver.tryWithLock(Driver.java:706)
at io.trino.operator.Driver.process(Driver.java:306)
at io.trino.operator.Driver.processForDuration(Driver.java:277)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:736)
at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:164)
at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:515)
at io.trino.$gen.Trino_397____20220925_161652_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.UnsupportedOperationException: io.trino.spi.type.DoubleType
at io.trino.spi.type.AbstractType.writeLong(AbstractType.java:91)
at io.trino.parquet.reader.LongColumnReader.readValue(LongColumnReader.java:31)
at io.trino.parquet.reader.PrimitiveColumnReader.lambda$readValues$2(PrimitiveColumnReader.java:248)
at io.trino.parquet.reader.PrimitiveColumnReader.processValues(PrimitiveColumnReader.java:304)
at io.trino.parquet.reader.PrimitiveColumnReader.readValues(PrimitiveColumnReader.java:246)
at io.trino.parquet.reader.PrimitiveColumnReader.readPrimitive(PrimitiveColumnReader.java:235)
at io.trino.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:441)
at io.trino.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:540)
at io.trino.parquet.reader.ParquetReader.readBlock(ParquetReader.java:523)
at io.trino.parquet.reader.ParquetReader.lambda$nextPage$3(ParquetReader.java:272)
at io.trino.parquet.reader.ParquetBlockFactory$ParquetBlockLoader.load(ParquetBlockFactory.java:72)
... 17 more
io.trino.spi.TrinoException: Failed to read Parquet file: s3a://test-bucket-13549875235/data_dump/ib_tickers_br/20220925_181026-076690.br.parquet
at io.trino.plugin.hive.parquet.ParquetPageSource.handleException(ParquetPageSource.java:169)
at io.trino.plugin.hive.parquet.ParquetPageSourceFactory.lambda$createPageSource$6(ParquetPageSourceFactory.java:271)
at io.trino.parquet.reader.ParquetBlockFactory$ParquetBlockLoader.load(ParquetBlockFactory.java:75)
at io.trino.spi.block.LazyBlock$LazyData.load(LazyBlock.java:406)
at io.trino.spi.block.LazyBlock$LazyData.getFullyLoadedBlock(LazyBlock.java:385)
at io.trino.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:292)
at io.trino.spi.Page.getLoadedPage(Page.java:223)
at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:314)
at io.trino.operator.Driver.processInternal(Driver.java:411)
at io.trino.operator.Driver.lambda$process$10(Driver.java:314)
at io.trino.operator.Driver.tryWithLock(Driver.java:706)
at io.trino.operator.Driver.process(Driver.java:306)
at io.trino.operator.Driver.processForDuration(Driver.java:277)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:736)
at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:164)
at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:515)
at io.trino.$gen.Trino_397____20220925_161652_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.RuntimeException: Error reading dictionary page
at io.trino.parquet.reader.PageReader.readDictionaryPage(PageReader.java:118)
at io.trino.parquet.reader.PrimitiveColumnReader.setPageReader(PrimitiveColumnReader.java:188)
at io.trino.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:439)
at io.trino.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:540)
at io.trino.parquet.reader.ParquetReader.readBlock(ParquetReader.java:523)
at io.trino.parquet.reader.ParquetReader.lambda$nextPage$3(ParquetReader.java:272)
at io.trino.parquet.reader.ParquetBlockFactory$ParquetBlockLoader.load(ParquetBlockFactory.java:72)
... 17 more
Caused by: io.trino.parquet.ParquetCorruptionException: Codec not supported in Parquet: BROTLI
at io.trino.parquet.ParquetCompressionUtils.decompress(ParquetCompressionUtils.java:69)
at io.trino.parquet.reader.PageReader.readDictionaryPage(PageReader.java:113)
... 23 more
io.trino.spi.TrinoException: Failed to read Parquet file: s3a://test-bucket-13549875235/data_dump/ib_tickers_snappy/20220918_194105-135895.snappy.parquet
at io.trino.plugin.hive.parquet.ParquetPageSource.handleException(ParquetPageSource.java:169)
at io.trino.plugin.hive.parquet.ParquetPageSourceFactory.lambda$createPageSource$6(ParquetPageSourceFactory.java:271)
at io.trino.parquet.reader.ParquetBlockFactory$ParquetBlockLoader.load(ParquetBlockFactory.java:75)
at io.trino.spi.block.LazyBlock$LazyData.load(LazyBlock.java:406)
at io.trino.spi.block.LazyBlock$LazyData.getFullyLoadedBlock(LazyBlock.java:385)
at io.trino.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:292)
at io.trino.spi.Page.getLoadedPage(Page.java:229)
at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:314)
at io.trino.operator.Driver.processInternal(Driver.java:411)
at io.trino.operator.Driver.lambda$process$10(Driver.java:314)
at io.trino.operator.Driver.tryWithLock(Driver.java:706)
at io.trino.operator.Driver.process(Driver.java:306)
at io.trino.operator.Driver.processForDuration(Driver.java:277)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:736)
at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:164)
at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:515)
at io.trino.$gen.Trino_397____20220925_161652_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.UnsupportedOperationException: io.trino.spi.type.DoubleType
at io.trino.spi.type.AbstractType.writeLong(AbstractType.java:91)
at io.trino.parquet.reader.LongColumnReader.readValue(LongColumnReader.java:31)
at io.trino.parquet.reader.PrimitiveColumnReader.lambda$readValues$2(PrimitiveColumnReader.java:248)
at io.trino.parquet.reader.PrimitiveColumnReader.processValues(PrimitiveColumnReader.java:304)
at io.trino.parquet.reader.PrimitiveColumnReader.readValues(PrimitiveColumnReader.java:246)
at io.trino.parquet.reader.PrimitiveColumnReader.readPrimitive(PrimitiveColumnReader.java:235)
at io.trino.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:441)
at io.trino.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:540)
at io.trino.parquet.reader.ParquetReader.readBlock(ParquetReader.java:523)
at io.trino.parquet.reader.ParquetReader.lambda$nextPage$3(ParquetReader.java:272)
at io.trino.parquet.reader.ParquetBlockFactory$ParquetBlockLoader.load(ParquetBlockFactory.java:72)
... 17 more
I realized the LONG
in a parquet file doesn't convert to DOUBLE
but to BIGINT
when using HIVE.. this allowed me to proceed to take in the offending column
CREATE TABLE IF NOT EXISTS hive.<MY_SCHEMA>.<MY_TABLE> (
-> column_one VARCHAR,
-> column_two VARCHAR,
-> column_three VARCHAR,
-> column_four BIGINT, <--- change this
-> column_five VARCHAR,
-> column_six VARCHAR)
-> WITH (
-> external_location = 's3a://<MY_S3_BUCKET_NAME>/dir_one/dir_two',
-> format = 'PARQUET'
-> );