I have a local setup of Trino, Hive Metastore, and Minio storage. I have enabled and configured Alluxio caching and Disk Spilling on Trino. The number of requests made to the object storage is higher than expected. Given that I am only testing on a few megabytes of Parquet files.
What could be the problem? and the solution?
Here are my configurations in /etc/trino/config.properties .
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery.uri=http://localhost:8080
catalog.management=${ENV:CATALOG_MANAGEMENT}
query.max-memory=2GB
query.max-memory-per-node=700MB
exchange.http-client.max-requests-queued-per-destination=999999
scheduler.http-client.max-requests-queued-per-destination=999999
exchange.http-client.request-timeout=30s
task.info-update-interval=2s
spill-enabled=true
spiller-spill-path=/tmp/spill
spiller-max-used-space-threshold=0.7
spiller-threads= 16
max-spill-per-node=100GB
query-max-spill-per-node=100GB
aggregation-operator-unspill-memory-limit=15MB
spill-compression-codec=LZ4
spill-encryption-enabled=false
Here are my catalog configurations in /etc/trino/catalog/hive.properties
connector.name=hive
hive.metastore=thrift
hive.metastore.uri=thrift://hive-metastore:9083
hive.s3.path-style-access=true
hive.s3.endpoint=http://minio:9000
hive.s3.aws-access-key=XXX
hive.s3.aws-secret-key=XXX
hive.non-managed-table-writes-enabled=true
hive.s3.ssl.enabled=false
hive.s3.max-connections=1000
hive.metastore.thrift.client.read-timeout=3000s
hive.timestamp-precision=MILLISECONDS
hive.collect-column-statistics-on-write=false
hive.storage-format=PARQUET
hive.security=allow-all
fs.cache.enabled=true
fs.cache.directories=/tmp/cache
fs.cache.max-disk-usage-percentages=70
fs.cache.ttl=32d
fs.cache.preferred-hosts-count=5
fs.cache.page-size=15MB
Thanks in advance.
-- Edit: Test with both cach and spilling disabled
Disabling caching and spilling, has affected the latency and throughput, and caused far more GetObject requests:
-- Edit: Cache Tracing --
I have enabled cache tracing, and the cache is being hit.
Instruction to enable tracing were found at: File System Cache-Monitoring
Setup in Docker Compose is simply:
jaeger:
image: jaegertracing/all-in-one:latest
hostname: jaeger
ports:
- "16686:16686"
- "4317:4317"
environment:
- COLLECTOR_OTLP_ENABLED=true
And /etc/trino/config.properties
################################# Tracing
tracing.enabled=true
tracing.exporter.endpoint=http://jaeger:4317
-- Edit: Provide more context Part II--
Based on Slack chat, the following has been confirmed:
On Trino startup, I get:
024-05-08T09:14:37.105Z INFO main Bootstrap hive.s3.security-mapping.refresh-period ---- ---- How often to refresh the security mapping configuration
2024-05-08T09:14:37.106119355Z 2024-05-08T09:14:37.105Z INFO main Bootstrap hive.s3.security-mapping.iam-role-credential-name ---- ---- Name of the extra credential used to provide IAM role
2024-05-08T09:14:37.106120682Z 2024-05-08T09:14:37.105Z INFO main Bootstrap jmx.base-name ---- ----
2024-05-08T09:14:37.237666777Z 2024-05-08T09:14:37.237Z INFO main alluxio.client.file.cache.PageStore Opening PageStore with option=alluxio.client.file.cache.store.PageStoreOptions@780e214b
2024-05-08T09:14:37.256328387Z 2024-05-08T09:14:37.256Z INFO pool-51-thread-1 alluxio.client.file.cache.LocalCacheManager Restoring PageStoreDir (/tmp/alluxio_cache/LOCAL)
2024-05-08T09:14:37.257456265Z 2024-05-08T09:14:37.257Z INFO pool-51-thread-1 alluxio.client.file.cache.LocalCacheManager PageStore (/tmp/alluxio_cache/LOCAL) restored with 0 pages (0 bytes), discarded 0 pages (0 bytes)
2024-05-08T09:14:37.257498782Z 2024-05-08T09:14:37.257Z INFO pool-51-thread-1 alluxio.client.file.cache.LocalCacheManager Cache is in READ_WRITE.
2024-05-08T09:14:37.552235866Z 2024-05-08T09:14:37.551Z INFO main org.ishugaliy.allgood.consistent.hash.HashRing Ring [hash_ring_8605] created: hasher [METRO_HASH], partitionRate [1000]
2024-05-08T09:14:37.566095594Z 2024-05-08T09:14:37.565Z INFO main org.ishugaliy.allgood.consistent.hash.HashRing Ring [hash_ring_8605]: node [TrinoNode[nodeIdentifier=5b97f235d043, hostAndPort=172.23.0.12:8080]] added
2024-05-08T09:14:37.721904967Z 2024-05-08T09:14:37.721Z DEBUG main io.trino.connector.CoordinatorDynamicCatalogManager -- Added catalog hive using connector hive --
2024-05-08T09:14:37.724651195Z 2024-05-08T09:14:37.724Z INFO main io.trino.security.AccessControlManager Using system access control: default
2024-05-08T09:14:37.735207694Z 2024-05-08T09:14:37.734Z INFO main io.trino.server.Server Server startup completed in 14.36s
2024-05-08T09:14:37.735276670Z 2024-05-08T09:14:37.734Z INFO main io.trino.server.Server ======== SERVER STARTED ========
-- Edit: Provide more context Part I--
The test simply runs a few simple queries in a loop, so regardless of partitioning or queries joins cardinalities, the caching should have been triggered.
By "expected" I meant, I would have expected a 10-megabyte dataset not to cause 60 GB of traffic, as in the screenshot below.
I assume that the Alluxio cache is bypassed altogether, so I doubt that it is not only a matter of configuration, as one might infer from: Trino: File System Cache
By consulting more references, it's clear that more steps should be included:
Besides, I doubt compatibility issues, since I am on Trino 445. The Alluxio docs mentions that the step were only tested against an earlier version of Trino:
Deploy Trino. This guide is tested with Trino-352.
Here is a distribution of the requests trino makes to Minio bypassing any caching, I include it in case it hints Trino team to an optimization idea, especially head requests are triggered very often:
Finally, after following the steps in the above mentioned tutorials, the following error occurs:
trino> CREATE SCHEMA hive.lakehouse
-> WITH (location = 'alluxio://alluxio-leader:19998/lakehouse');
Query 20240508_071745_00002_vyeug failed: Invalid location URI: alluxio://alluxio-leader:19998/lakehouse
Logs:
2024-05-08 09:17:45 2024-05-08T07:17:45.210Z DEBUG dispatcher-query-4 io.trino.security.AccessControl Invocation of checkCanSetUser(principal=Optional[trino], userName='trino') succeeded in 27.47us
2024-05-08 09:17:45 2024-05-08T07:17:45.210Z DEBUG dispatcher-query-4 io.trino.security.AccessControl Invocation of checkCanExecuteQuery(identity=Identity{user='trino', principal=trino}, queryId=20240508_071745_00002_vyeug) succeeded in 13.49us
2024-05-08 09:17:45 2024-05-08T07:17:45.211Z DEBUG dispatcher-query-5 io.trino.execution.QueryStateMachine Query 20240508_071745_00002_vyeug is QUEUED
2024-05-08 09:17:45 2024-05-08T07:17:45.211Z DEBUG dispatcher-query-1 io.trino.execution.QueryStateMachine Query 20240508_071745_00002_vyeug is WAITING_FOR_RESOURCES
2024-05-08 09:17:45 2024-05-08T07:17:45.212Z DEBUG dispatcher-query-1 io.trino.execution.QueryStateMachine Query 20240508_071745_00002_vyeug is DISPATCHING
2024-05-08 09:17:45 2024-05-08T07:17:45.212Z DEBUG dispatcher-query-1 io.trino.execution.QueryStateMachine Query 20240508_071745_00002_vyeug is RUNNING
2024-05-08 09:17:45 2024-05-08T07:17:45.213Z DEBUG Query-20240508_071745_00002_vyeug-172 io.trino.security.AccessControl Invocation of checkCanCreateSchema(context=SecurityContext{identity=Identity{user='trino', principal=trino}, queryId=20240508_071745_00002_vyeug}, schemaName=hive.lakehouse, properties={location=alluxio://alluxio-leader:19998/lakehouse}) succeeded in 68.04us
2024-05-08 09:17:45 2024-05-08T07:17:45.223Z DEBUG Query-20240508_071745_00002_vyeug-172 io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastoreClient Invocation of getDatabase(name='lakehouse') took 7.32ms and failed with NoSuchObjectException(message:database hive.lakehouse)
2024-05-08 09:17:45 2024-05-08T07:17:45.224Z DEBUG dispatcher-query-1 io.trino.execution.QueryStateMachine Query 20240508_071745_00002_vyeug is FAILED
2024-05-08 09:17:45 2024-05-08T07:17:45.224Z DEBUG Query-20240508_071745_00002_vyeug-172 io.trino.execution.QueryStateMachine Query 20240508_071745_00002_vyeug failed
2024-05-08 09:17:45 io.trino.spi.TrinoException: Invalid location URI: alluxio://alluxio-leader:19998/lakehouse
2024-05-08 09:17:45 at io.trino.plugin.hive.HiveMetadata.lambda$createSchema$22(HiveMetadata.java:954)
2024-05-08 09:17:45 at java.base/java.util.Optional.map(Optional.java:260)
2024-05-08 09:17:45 at io.trino.plugin.hive.HiveMetadata.createSchema(HiveMetadata.java:949)
2024-05-08 09:17:45 at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.createSchema(ClassLoaderSafeConnectorMetadata.java:417)
2024-05-08 09:17:45 at io.trino.tracing.TracingConnectorMetadata.createSchema(TracingConnectorMetadata.java:348)
2024-05-08 09:17:45 at io.trino.metadata.MetadataManager.createSchema(MetadataManager.java:769)
2024-05-08 09:17:45 at io.trino.tracing.TracingMetadata.createSchema(TracingMetadata.java:373)
2024-05-08 09:17:45 at io.trino.execution.CreateSchemaTask.internalExecute(CreateSchemaTask.java:128)
2024-05-08 09:17:45 at io.trino.execution.CreateSchemaTask.execute(CreateSchemaTask.java:82)
2024-05-08 09:17:45 at io.trino.execution.CreateSchemaTask.execute(CreateSchemaTask.java:54)
2024-05-08 09:17:45 at io.trino.execution.DataDefinitionExecution.start(DataDefinitionExecution.java:146)
2024-05-08 09:17:45 at io.trino.execution.SqlQueryManager.createQuery(SqlQueryManager.java:272)
2024-05-08 09:17:45 at io.trino.dispatcher.LocalDispatchQuery.startExecution(LocalDispatchQuery.java:145)
2024-05-08 09:17:45 at io.trino.dispatcher.LocalDispatchQuery.lambda$waitForMinimumWorkers$2(LocalDispatchQuery.java:129)
2024-05-08 09:17:45 at io.airlift.concurrent.MoreFutures.lambda$addSuccessCallback$12(MoreFutures.java:570)
2024-05-08 09:17:45 at io.airlift.concurrent.MoreFutures$3.onSuccess(MoreFutures.java:545)
2024-05-08 09:17:45 at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1137)
2024-05-08 09:17:45 at io.trino.$gen.Trino_445____20240508_071639_2.run(Unknown Source)
2024-05-08 09:17:45 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
2024-05-08 09:17:45 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
2024-05-08 09:17:45 at java.base/java.lang.Thread.run(Thread.java:1570)
2024-05-08 09:17:45 Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "alluxio"
2024-05-08 09:17:45 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3553)
2024-05-08 09:17:45 at io.trino.hdfs.TrinoFileSystemCache.createFileSystem(TrinoFileSystemCache.java:155)
2024-05-08 09:17:45 at io.trino.hdfs.TrinoFileSystemCache$FileSystemHolder.createFileSystemOnce(TrinoFileSystemCache.java:298)
2024-05-08 09:17:45 at io.trino.hdfs.TrinoFileSystemCache.getInternal(TrinoFileSystemCache.java:140)
2024-05-08 09:17:45 at io.trino.hdfs.TrinoFileSystemCache.get(TrinoFileSystemCache.java:91)
2024-05-08 09:17:45 at org.apache.hadoop.fs.ForwardingFileSystemCache.get(ForwardingFileSystemCache.java:39)
2024-05-08 09:17:45 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:557)
2024-05-08 09:17:45 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
2024-05-08 09:17:45 at io.trino.hdfs.HdfsEnvironment.lambda$getFileSystem$0(HdfsEnvironment.java:110)
2024-05-08 09:17:45 at io.trino.hdfs.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:25)
2024-05-08 09:17:45 at io.trino.hdfs.HdfsEnvironment.getFileSystem(HdfsEnvironment.java:109)
2024-05-08 09:17:45 at io.trino.hdfs.HdfsEnvironment.getFileSystem(HdfsEnvironment.java:102)
2024-05-08 09:17:45 at io.trino.filesystem.hdfs.HdfsFileSystem.directoryExists(HdfsFileSystem.java:254)
2024-05-08 09:17:45 at io.trino.filesystem.manager.SwitchingFileSystem.directoryExists(SwitchingFileSystem.java:117)
2024-05-08 09:17:45 at io.trino.filesystem.cache.CacheFileSystem.directoryExists(CacheFileSystem.java:104)
2024-05-08 09:17:45 at io.trino.filesystem.tracing.TracingFileSystem.lambda$directoryExists$5(TracingFileSystem.java:119)
2024-05-08 09:17:45 at io.trino.filesystem.tracing.Tracing.withTracing(Tracing.java:47)
2024-05-08 09:17:45 at io.trino.filesystem.tracing.TracingFileSystem.directoryExists(TracingFileSystem.java:119)
2024-05-08 09:17:45 at io.trino.plugin.hive.HiveMetadata.lambda$createSchema$22(HiveMetadata.java:951)
2024-05-08 09:17:45 ... 20 more
2024-05-08 09:17:45
2024-05-08 09:17:45
2024-05-08 09:17:45 2024-05-08T07:17:45.225Z INFO dispatcher-query-1 io.trino.event.QueryMonitor TIMELINE: Query 20240508_071745_00002_vyeug :: FAILED (INVALID_SCHEMA_PROPERTY) :: elapsed 12ms :: planning 12ms :: waiting 0ms :: scheduling 0ms :: running 0ms :: finishing 0ms :: begin 2024-05-08T07:17:45.211Z :: end 2024-05-08T07:17:45.223Z
2024-05-08 09:17:45 2024-05-08T07:17:45.430Z DEBUG http-client-node-manager-63 io.trino.connector.CatalogPruneTask Pruned catalogs on server: http://172.21.0.14:8080/v1/task/pruneCatalogs
The alluxio jars, I have even added more in a trial and error one by one, are included in both of Trino and Hive metastore containers:
[trino@4515602f2e82 hive]$ pwd
/usr/lib/trino/plugin/hive
[trino@4515602f2e82 hive]$ ls -ll | grep allu
-rwxrwxrwx 1 root root 0 May 8 06:48 alluxio-2.9.3-client.jar
-rwxrwxrwx 1 root root 90338926 Mar 24 2023 alluxio-2.9.3-hadoop2-client.jar
-rw-r--r-- 4 trino trino 519152 Jan 1 1980 alluxio-core-client-fs-312.jar
-rw-r--r-- 4 trino trino 1499627 Jan 1 1980 alluxio-core-common-312.jar
-rw-r--r-- 4 trino trino 6097283 Jan 1 1980 alluxio-core-transport-312.jar
-rwxrwxrwx 1 root root 90338925 May 8 07:06 alluxio-shaded-client-2.9.3.jar
-rw-r--r-- 4 trino trino 34723 Jan 1 1980 trino-filesystem-cache-alluxio-445.jar
And
I would suggest to NOT mix spilling and file system caching for starters, they are not designed to work together. Beyond that I would say that it completely depends on what your queries are, what data they have to access, how your files and partitions are structured and how you define "expected". So I really cant answer with any more detail at this stage.