I would appreciate your help with my problem.
I'm running a spark application on AWS EMR serverless with emr 6.11 release. I'm using Spark 3.3.2 with java 17, with configuration: maximum recourses of 200 vCPU and 1600gb memory. My application is structured to read records from s3 and saving into a broadcast variable, Then have reading some other records from s3 and iterating over them with parallelize rdd and checking them against the first records, and saving them at last. To simplify it is something like this :
Dataset<Row> groupA = readAllRecords();
groupA = manipulateData(groupA);
Dataset<Row> groupB = readAllRecords();
groupB.parallelize( recordB -> {
checkRecord(groupA, recordB)
saveRecord(record)
});
My problem is I don't see in any stage that spark is scaling up any executors, although I have heavy workloads.
This is my dashboard : Server UI history while running local:
Is it possible it is all running on the driver ? what it is i don't see ?
Please let me know if more information is needed. I attached some of the log at start of the app. I noticed this error :
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.spark.storage.BlockManagerId.executorId()" because "idWithoutTopologyInfo" is null
but it is ignored and i'm not sure it's related.
3.3.2-amzn-0 23/07/05 11:17:44 INFO ResourceUtils: ============================================================== 23/07/05 11:17:44 INFO ResourceUtils: No custom resources configured for spark.driver. 23/07/05 11:17:44 INFO ResourceUtils: ============================================================== 23/07/05 11:17:44 INFO SparkContext: Submitted application: FusionApp 23/07/05 11:17:44 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 4, script: , vendor: , memory -> name: memory, amount: 14336, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) 23/07/05 11:17:44 INFO ResourceProfile: Limiting resource is cpus at 4 tasks per executor 23/07/05 11:17:44 INFO ResourceProfileManager: Added ResourceProfile id: 0 23/07/05 11:17:44 INFO SecurityManager: Changing view acls to: hadoop 23/07/05 11:17:44 INFO SecurityManager: Changing modify acls to: hadoop 23/07/05 11:17:44 INFO SecurityManager: Changing view acls groups to: 23/07/05 11:17:44 INFO SecurityManager: Changing modify acls groups to: 23/07/05 11:17:44 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set() 23/07/05 11:17:45 INFO Utils: Successfully started service 'sparkDriver' on port 41285. 23/07/05 11:17:45 INFO SparkEnv: Registering MapOutputTracker 23/07/05 11:17:45 INFO SparkEnv: Registering BlockManagerMaster 23/07/05 11:17:45 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 23/07/05 11:17:45 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 23/07/05 11:17:45 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 23/07/05 11:17:45 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-98a76adc-5862-4053-93a5-584fec796959 23/07/05 11:17:45 INFO MemoryStore: MemoryStore started with capacity 8.2 GiB 23/07/05 11:17:45 INFO SparkEnv: Registering OutputCommitCoordinator 23/07/05 11:17:45 INFO SubResultCacheManager: Sub-result caches are disabled. 23/07/05 11:17:45 INFO Utils: Successfully started service 'SparkUI' on port 4040. 23/07/05 11:17:45 INFO SparkContext: Added JAR s3://jobjars/fusion/fusion-job-1.0.0--29114-SNAPSHOT-jar-with-dependencies.jar at s3://jobjars/fusion/fusion-job-1.0.0-29114-SNAPSHOT-jar-with-dependencies.jar with timestamp 1688555864852 23/07/05 11:17:45 INFO Executor: Starting executor ID driver on host ip-172-31.ec2.internal 23/07/05 11:17:45 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): 'file:/usr/lib/hadoop-lzo/lib/*,file:/usr/lib/hadoop/hadoop-aws.jar,file:/usr/share/aws/aws-java-sdk/*,file:/usr/share/aws/emr/emrfs/conf/,file:/usr/share/aws/emr/emrfs/lib/*,file:/usr/share/aws/emr/emrfs/auxlib/*,file:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar,file:/usr/share/aws/emr/goodies/lib/emr-serverless-spark-goodies.jar,file:/usr/share/aws/emr/security/conf,file:/usr/share/aws/emr/security/lib/*,file:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar,file:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar,file:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar,file:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar,file:/docker/usr/lib/hadoop-lzo/lib/*,file:/docker/usr/lib/hadoop/hadoop-aws.jar,file:/docker/usr/share/aws/aws-java-sdk/*,file:/docker/usr/share/aws/emr/emrfs/conf,file:/docker/usr/share/aws/emr/emrfs/lib/*,file:/docker/usr/share/aws/emr/emrfs/auxlib/*,file:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar,file:/docker/usr/share/aws/emr/security/conf,file:/docker/usr/share/aws/emr/security/lib/*,file:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar,file:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar,file:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar,file:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar,file:/usr/share/aws/redshift/jdbc/RedshiftJDBC.jar,file:/usr/share/aws/redshift/spark-redshift/lib/*,file:/home/hadoop/conf,file:/home/hadoop/emr-serverless-spark-goodies.jar,file:/home/hadoop/emr-spark-goodies.jar,file:/home/hadoop/*,file:/home/hadoop/aws-glue-datacatalog-spark-client.jar,file:/home/hadoop/hive-openx-serde.jar,file:/home/hadoop/sagemaker-spark-sdk.jar,file:/home/hadoop/hadoop-aws.jar,file:/home/hadoop/RedshiftJDBC.jar,file:/home/hadoop/emr-s3-select-spark-connector.jar' 23/07/05 11:17:45 INFO Executor: Fetching s3://jobjars-/fusion/yyyy-fusion-job-1.0.0--29114-SNAPSHOT-jar-with-dependencies.jar with timestamp 1688555864852 23/07/05 11:17:45 INFO S3NativeFileSystem: Opening 's3://jobjars-29114/fusion/xxx-fusion-job-1.0.0--29114-SNAPSHOT-jar-with-dependencies.jar' for reading 23/07/05 11:17:45 INFO Utils: Fetching s3://jobjars-/fusion/fusion-job-1.0.0--29114-SNAPSHOT-jar-with-dependencies.jar to /tmp/spark-fd0b8589-fbd6-4b4a-b69f-98955f2f906f/userFiles-7edecef0-92cd-46c4-9ca8-bdabe9c64f86/fetchFileTemp15434886852656050499.tmp 23/07/05 11:17:57 INFO Executor: Told to re-register on heartbeat 23/07/05 11:17:57 INFO BlockManager: BlockManager null re-registering with master 23/07/05 11:17:57 INFO BlockManagerMaster: Registering BlockManager null 23/07/05 11:17:57 WARN Executor: Issue communicating with driver in heartbeater org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:103) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:87) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:89) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.storage.BlockManager.reregister(BlockManager.scala:643) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.executor.Executor.reportHeartBeat(Executor.scala:1057) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.executor.Executor.$anonfun$heartbeater$1(Executor.scala:238) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.15.jar:?] at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2100) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.Heartbeater$$anon$1.run(Heartbeater.scala:46) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?] at java.lang.Thread.run(Thread.java:833) ~[?:?] Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.spark.storage.BlockManagerId.executorId()" because "idWithoutTopologyInfo" is null at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:591) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:123) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?] at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] 3 more 23/07/05 11:17:57 ERROR Inbox: Ignoring error java.lang.NullPointerException: Cannot invoke "org.apache.spark.storage.BlockManagerId.executorId()" because "idWithoutTopologyInfo" is null at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:591) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:123) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?] at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?] at java.lang.Thread.run(Thread.java:833) ~[?:?] 23/07/05 11:18:00 INFO Executor: Adding file:/tmp/spark-fd0b8589-fbd6-4b4a-b69f-98955f2f906f/userFiles-92cd-46c4-9ca8-/fusion-job-1.0.0-29114-SNAPSHOT-jar-with-dependencies.jar to class loader 23/07/05 11:18:00 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35657. 23/07/05 11:18:00 INFO NettyBlockTransferService: Server created on [2600:1f18:6a92:2601:18f2:3af:f80f:4dab]:35657 23/07/05 11:18:00 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 23/07/05 11:18:00 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, [2600:1f18:6a92:2601:18f2:3af:f80f:4dab], 35657, None) 23/07/05 11:18:00 INFO BlockManagerMasterEndpoint: Registering block manager [2600:1f18:6a92:2601:18f2:3af:f80f:4dab]:35657 with 8.2 GiB RAM, BlockManagerId(driver, [2600:1f18:6a92:2601:18f2:3af:f80f:4dab], 35657, None) 23/07/05 11:18:00 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, [2600:1f18:6a92:2601:18f2:3af:f80f:4dab], 35657, None) 23/07/05 11:18:00 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, [2600:1f18:6a92:2601:18f2:3af:f80f:4dab], 35657, None) 23/07/05 11:18:00 INFO SingleEventLogFileWriter: Logging events to file:/var/log/spark/apps/local-1688555865619.inprogress 23/07/05 11:18:00 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir. 23/07/05 11:18:00 INFO SharedState: Warehouse path is 'file:/home/hadoop/spark-warehouse'. 23/07/05 11:18:01 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 72.0 B, free 8.2 GiB) 23/07/05 11:18:01 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 124.0 B, free 8.2 GiB) 23/07/05 11:18:01 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on [2600:1f18:6a92:2601:18f2:3af:f80f:4dab]:35657 (size: 124.0 B, free: 8.2 GiB) 23/07/05 11:18:01 INFO SparkContext: Created broadcast 0 from broadcast at Transformer.java:76 23/07/05 11:18:01 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 168.0 B, free 8.2 GiB) 23/07/05 11:18:01 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 304.0 B, free 8.2 GiB) 23/07/05 11:18:01 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on [2600:1f18:6a92:2601:18f2:3af:f80f:4dab]:35657 (size: 304.0 B, free: 8.2 GiB) 23/07/05 11:18:01 INFO SparkContext: Created broadcast 1 from broadcast at Transformer.java:80 23/07/05 11:18:02 INFO SparkContext: Starting job: collect at Transformer.java:123 23/07/05 11:18:02 INFO DAGScheduler: Got job 0 (collect at Transformer.java:123) with 52 output partitions 23/07/05 11:18:02 INFO DAGScheduler: Final stage: ResultStage 0 (collect at Transformer.java:123) 23/07/05 11:18:02 INFO DAGScheduler: Parents of final stage: List() 23/07/05 11:18:02 INFO DAGScheduler: Missing parents: List() 23/07/05 11:18:02 INFO DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[0] at parallelize at Transformer.java:122), which has no missing parents 23/07/05 11:18:02 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.2 KiB, free 8.2 GiB) 23/07/05 11:18:02 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1899.0 B, free 8.2 GiB) 23/07/05 11:18:02 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on [2600:1f18:6a92:2601:18f2:3af:f80f:4dab]:35657 (size: 1899.0 B, free: 8.2 GiB) 23/07/05 11:18:02 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1570 23/07/05 11:18:02 INFO DAGScheduler: Submitting 52 missing tasks from ResultStage 0 (ParallelCollectionRDD[0] at parallelize at Transformer.java:122) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)) 23/07/05 11:18:02 INFO TaskSchedulerImpl: Adding task set 0.0 with 52 tasks resource profile 0 23/07/05 11:18:02 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (ip-172-31.ec2.internal, executor driver, partition 0, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1) (ip-172-31.ec2.internal, executor driver, partition 1, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2) (ip-172-31.ec2.internal, executor driver, partition 2, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3) (ip-172-31.ec2.internal, executor driver, partition 3, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Running task 1.0 in stage 0.0 (TID 1) 23/07/05 11:18:02 INFO Executor: Running task 2.0 in stage 0.0 (TID 2) 23/07/05 11:18:02 INFO Executor: Running task 3.0 in stage 0.0 (TID 3) 23/07/05 11:18:02 INFO Executor: Running task 0.0 in stage 0.0 (TID 0) 23/07/05 11:18:02 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4) (ip-172-31.ec2.internal, executor driver, partition 4, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Running task 4.0 in stage 0.0 (TID 4) 23/07/05 11:18:02 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO Executor: Finished task 4.0 in stage 0.0 (TID 4). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5) (ip-172-31.ec2.internal, executor driver, partition 5, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Running task 5.0 in stage 0.0 (TID 5) 23/07/05 11:18:02 INFO Executor: Finished task 5.0 in stage 0.0 (TID 5). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6) (ip-172-31.ec2.internal, executor driver, partition 6, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Running task 6.0 in stage 0.0 (TID 6) 23/07/05 11:18:02 INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7) (ip-172-31.ec2.internal, executor driver, partition 7, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Running task 7.0 in stage 0.0 (TID 7) 23/07/05 11:18:02 INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8) (ip-172-31.ec2.internal, executor driver, partition 8, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Running task 8.0 in stage 0.0 (TID 8) 23/07/05 11:18:02 INFO TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9) (ip-172-31.ec2.internal, executor driver, partition 9, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Finished task 6.0 in stage 0.0 (TID 6). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO Executor: Running task 9.0 in stage 0.0 (TID 9) 23/07/05 11:18:02 INFO TaskSetManager: Starting task 10.0 in stage 0.0 (TID 10) (ip-172-31.ec2.internal, executor driver, partition 10, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Running task 10.0 in stage 0.0 (TID 10) 23/07/05 11:18:02 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 143 ms on ip-172-31.ec2.internal (executor driver) (1/52) 23/07/05 11:18:02 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 122 ms on ip-172-31.ec2.internal (executor driver) (2/52) 23/07/05 11:18:02 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 33 ms on ip-172-31.ec2.internal (executor driver) (3/52) 23/07/05 11:18:02 INFO Executor: Finished task 9.0 in stage 0.0 (TID 9). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 25 ms on ip-172-31.ec2.internal (executor driver) (4/52) 23/07/05 11:18:02 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 124 ms on ip-172-31.ec2.internal (executor driver) (5/52) 23/07/05 11:18:02 INFO Executor: Finished task 10.0 in stage 0.0 (TID 10). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO Executor: Finished task 8.0 in stage 0.0 (TID 8). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 140 ms on ip-172-31.ec2.internal (executor driver) (6/52) 23/07/05 11:18:02 INFO TaskSetManager: Starting task 11.0 in stage 0.0 (TID 11) (ip-172-31.ec2.internal, executor driver, partition 11, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Running task 11.0 in stage 0.0 (TID 11) 23/07/05 11:18:02 INFO Executor: Finished task 7.0 in stage 0.0 (TID 7). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 34 ms on ip-172-31.ec2.internal (executor driver) (7/52) 23/07/05 11:18:02 INFO Executor: Finished task 11.0 in stage 0.0 (TID 11). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Starting task 12.0 in stage 0.0 (TID 12) (ip-172-31.ec2.internal, executor driver, partition 12, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 30 ms on ip-172-31.ec2.internal (executor driver) (8/52) 23/07/05 11:18:02 INFO TaskSetManager: Finished task 10.0 in stage 0.0 (TID 10) in 29 ms on ip-172-31.ec2.internal (executor driver) (9/52) 23/07/05 11:18:02 INFO Executor: Running task 12.0 in stage 0.0 (TID 12) 23/07/05 11:18:02 INFO TaskSetManager: Starting task 13.0 in stage 0.0 (TID 13) (ip-172-31.ec2.internal, executor driver, partition 13, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Finished task 12.0 in stage 0.0 (TID 12). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO Executor: Running task 13.0 in stage 0.0 (TID 13) 23/07/05 11:18:02 INFO TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 39 ms on ip-172-31.ec2.internal (executor driver) (10/52) 23/07/05 11:18:02 INFO TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 42 ms on ip-172-31.ec2.internal (executor driver) (11/52) 23/07/05 11:18:02 INFO TaskSetManager: Starting task 14.0 in stage 0.0 (TID 14) (ip-172-31.ec2.internal, executor driver, partition 14, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Finished task 13.0 in stage 0.0 (TID 13). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO Executor: Running task 14.0 in stage 0.0 (TID 14) 23/07/05 11:18:02 INFO TaskSetManager: Starting task 15.0 in stage 0.0 (TID 15) (ip-172-31.ec2.internal, executor driver, partition 15, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Running task 15.0 in stage 0.0 (TID 15) 23/07/05 11:18:02 INFO TaskSetManager: Finished task 11.0 in stage 0.0 (TID 11) in 21 ms on ip-172-31.ec2.internal (executor driver) (12/52) 23/07/05 11:18:02 INFO Executor: Finished task 14.0 in stage 0.0 (TID 14). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO Executor: Finished task 15.0 in stage 0.0 (TID 15). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Starting task 16.0 in stage 0.0 (TID 16) (ip-172-31.ec2.internal, executor driver, partition 16, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO TaskSetManager: Finished task 12.0 in stage 0.0 (TID 12) in 24 ms on ip-172-31.ec2.internal (executor driver) (13/52) 23/07/05 11:18:02 INFO TaskSetManager: Finished task 13.0 in stage 0.0 (TID 13) in 18 ms on ip-172-31.ec2.internal (executor driver) (14/52) 23/07/05 11:18:02 INFO Executor: Running task 16.0 in stage 0.0 (TID 16) 23/07/05 11:18:02 INFO TaskSetManager: Starting task 17.0 in stage 0.0 (TID 17) (ip-172-31.ec2.internal, executor driver, partition 17, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Finished task 16.0 in stage 0.0 (TID 16). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Starting task 18.0 in stage 0.0 (TID 18) (ip-172-31.ec2.internal, executor driver, partition 18, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Running task 18.0 in stage 0.0 (TID 18) 23/07/05 11:18:02 INFO TaskSetManager: Finished task 14.0 in stage 0.0 (TID 14) in 20 ms on ip-172-31.ec2.internal (executor driver) (15/52) 23/07/05 11:18:02 INFO Executor: Finished task 18.0 in stage 0.0 (TID 18). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Finished task 15.0 in stage 0.0 (TID 15) in 22 ms on ip-172-31.ec2.internal (executor driver) (16/52) 23/07/05 11:18:02 INFO Executor: Running task 17.0 in stage 0.0 (TID 17) 23/07/05 11:18:02 INFO TaskSetManager: Starting task 19.0 in stage 0.0 (TID 19) (ip-172-31.ec2.internal, executor driver, partition 19, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Running task 19.0 in stage 0.0 (TID 19) 23/07/05 11:18:02 INFO TaskSetManager: Finished task 16.0 in stage 0.0 (TID 16) in 16 ms on ip-172-31.ec2.internal (executor driver) (17/52) 23/07/05 11:18:02 INFO TaskSetManager: Starting task 20.0 in stage 0.0 (TID 20) (ip-172-31.ec2.internal, executor driver, partition 20, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Running task 20.0 in stage 0.0 (TID 20) 23/07/05 11:18:02 INFO TaskSetManager: Starting task 21.0 in stage 0.0 (TID 21) (ip-172-31.ec2.internal, executor driver, partition 21, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Finished task 17.0 in stage 0.0 (TID 17). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO Executor: Finished task 19.0 in stage 0.0 (TID 19). 1453 bytes result sent to driver 23/07/05 11:18:02 INFO Executor: Running task 21.0 in stage 0.0 (TID 21) 23/07/05 11:18:02 INFO TaskSetManager: Starting task 22.0 in stage 0.0 (TID 22) (ip-172-31.ec2.internal, executor driver, partition 22, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Running task 22.0 in stage 0.0 (TID 22) ```
Any insight would be precious :) Thanks a lot!
I already was trying to add executors.instances and dynamicAllocation = true also tried Dataset recordsDF.repartition(100) for example
For anyone encounter the same problem, i found the problem was quite simple - it appears that 'spark.setMaster(local[*])' should only be used when you run locally and not to run on the cluster.