javaapache-sparkapache-spark-sqlamazon-emremr-serverless

Executors not seem to be created or scaling up on Spark Application on AWS EMR Serverless


I would appreciate your help with my problem.

I'm running a spark application on AWS EMR serverless with emr 6.11 release. I'm using Spark 3.3.2 with java 17, with configuration: maximum recourses of 200 vCPU and 1600gb memory. My application is structured to read records from s3 and saving into a broadcast variable, Then have reading some other records from s3 and iterating over them with parallelize rdd and checking them against the first records, and saving them at last. To simplify it is something like this :

Dataset<Row> groupA = readAllRecords();
groupA = manipulateData(groupA);
Dataset<Row> groupB = readAllRecords();

groupB.parallelize( recordB -> {
   checkRecord(groupA, recordB) 
   saveRecord(record)      
});

My problem is I don't see in any stage that spark is scaling up any executors, although I have heavy workloads.

This is my dashboard : cloudwatch dashboard Server UI history while running local: Server UI history

Is it possible it is all running on the driver ? what it is i don't see ?

Please let me know if more information is needed. I attached some of the log at start of the app. I noticed this error :

Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.spark.storage.BlockManagerId.executorId()" because "idWithoutTopologyInfo" is null

but it is ignored and i'm not sure it's related.

3.3.2-amzn-0 23/07/05 11:17:44 INFO ResourceUtils: ============================================================== 23/07/05 11:17:44 INFO ResourceUtils: No custom resources configured
for spark.driver. 23/07/05 11:17:44 INFO ResourceUtils:
============================================================== 23/07/05 11:17:44 INFO SparkContext: Submitted application: FusionApp
23/07/05 11:17:44 INFO ResourceProfile: Default ResourceProfile
created, executor resources: Map(cores -> name: cores, amount: 4,
script: , vendor: , memory -> name: memory, amount: 14336, script: ,
vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ),
task resources: Map(cpus -> name: cpus, amount: 1.0) 23/07/05 11:17:44
INFO ResourceProfile: Limiting resource is cpus at 4 tasks per
executor 23/07/05 11:17:44 INFO ResourceProfileManager: Added
ResourceProfile id: 0 23/07/05 11:17:44 INFO SecurityManager: Changing
view acls to: hadoop 23/07/05 11:17:44 INFO SecurityManager: Changing
modify acls to: hadoop 23/07/05 11:17:44 INFO SecurityManager:
Changing view acls groups to:  23/07/05 11:17:44 INFO SecurityManager:
Changing modify acls groups to:  23/07/05 11:17:44 INFO
SecurityManager: SecurityManager: authentication enabled; ui acls
disabled; users  with view permissions: Set(hadoop); groups with view
permissions: Set(); users  with modify permissions: Set(hadoop);
groups with modify permissions: Set() 23/07/05 11:17:45 INFO Utils:
Successfully started service 'sparkDriver' on port 41285. 23/07/05
11:17:45 INFO SparkEnv: Registering MapOutputTracker 23/07/05 11:17:45
INFO SparkEnv: Registering BlockManagerMaster 23/07/05 11:17:45 INFO
BlockManagerMasterEndpoint: Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology
information 23/07/05 11:17:45 INFO BlockManagerMasterEndpoint:
BlockManagerMasterEndpoint up 23/07/05 11:17:45 INFO SparkEnv:
Registering BlockManagerMasterHeartbeat 23/07/05 11:17:45 INFO
DiskBlockManager: Created local directory at
/tmp/blockmgr-98a76adc-5862-4053-93a5-584fec796959 23/07/05 11:17:45
INFO MemoryStore: MemoryStore started with capacity 8.2 GiB 23/07/05
11:17:45 INFO SparkEnv: Registering OutputCommitCoordinator 23/07/05
11:17:45 INFO SubResultCacheManager: Sub-result caches are disabled.
23/07/05 11:17:45 INFO Utils: Successfully started service 'SparkUI'
on port 4040. 23/07/05 11:17:45 INFO SparkContext: Added JAR
s3://jobjars/fusion/fusion-job-1.0.0--29114-SNAPSHOT-jar-with-dependencies.jar
at
s3://jobjars/fusion/fusion-job-1.0.0-29114-SNAPSHOT-jar-with-dependencies.jar
with timestamp 1688555864852 23/07/05 11:17:45 INFO Executor: Starting
executor ID driver on host ip-172-31.ec2.internal 23/07/05 11:17:45
INFO Executor: Starting executor with user classpath
(userClassPathFirst = false):
'file:/usr/lib/hadoop-lzo/lib/*,file:/usr/lib/hadoop/hadoop-aws.jar,file:/usr/share/aws/aws-java-sdk/*,file:/usr/share/aws/emr/emrfs/conf/,file:/usr/share/aws/emr/emrfs/lib/*,file:/usr/share/aws/emr/emrfs/auxlib/*,file:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar,file:/usr/share/aws/emr/goodies/lib/emr-serverless-spark-goodies.jar,file:/usr/share/aws/emr/security/conf,file:/usr/share/aws/emr/security/lib/*,file:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar,file:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar,file:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar,file:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar,file:/docker/usr/lib/hadoop-lzo/lib/*,file:/docker/usr/lib/hadoop/hadoop-aws.jar,file:/docker/usr/share/aws/aws-java-sdk/*,file:/docker/usr/share/aws/emr/emrfs/conf,file:/docker/usr/share/aws/emr/emrfs/lib/*,file:/docker/usr/share/aws/emr/emrfs/auxlib/*,file:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar,file:/docker/usr/share/aws/emr/security/conf,file:/docker/usr/share/aws/emr/security/lib/*,file:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar,file:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar,file:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar,file:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar,file:/usr/share/aws/redshift/jdbc/RedshiftJDBC.jar,file:/usr/share/aws/redshift/spark-redshift/lib/*,file:/home/hadoop/conf,file:/home/hadoop/emr-serverless-spark-goodies.jar,file:/home/hadoop/emr-spark-goodies.jar,file:/home/hadoop/*,file:/home/hadoop/aws-glue-datacatalog-spark-client.jar,file:/home/hadoop/hive-openx-serde.jar,file:/home/hadoop/sagemaker-spark-sdk.jar,file:/home/hadoop/hadoop-aws.jar,file:/home/hadoop/RedshiftJDBC.jar,file:/home/hadoop/emr-s3-select-spark-connector.jar'
23/07/05 11:17:45 INFO Executor: Fetching
s3://jobjars-/fusion/yyyy-fusion-job-1.0.0--29114-SNAPSHOT-jar-with-dependencies.jar
with timestamp 1688555864852 23/07/05 11:17:45 INFO
S3NativeFileSystem: Opening
's3://jobjars-29114/fusion/xxx-fusion-job-1.0.0--29114-SNAPSHOT-jar-with-dependencies.jar'
for reading 23/07/05 11:17:45 INFO Utils: Fetching
s3://jobjars-/fusion/fusion-job-1.0.0--29114-SNAPSHOT-jar-with-dependencies.jar
to
/tmp/spark-fd0b8589-fbd6-4b4a-b69f-98955f2f906f/userFiles-7edecef0-92cd-46c4-9ca8-bdabe9c64f86/fetchFileTemp15434886852656050499.tmp
23/07/05 11:17:57 INFO Executor: Told to re-register on heartbeat
23/07/05 11:17:57 INFO BlockManager: BlockManager null re-registering
with master 23/07/05 11:17:57 INFO BlockManagerMaster: Registering
BlockManager null 23/07/05 11:17:57 WARN Executor: Issue communicating
with driver in heartbeater org.apache.spark.SparkException: Exception
thrown in awaitResult:    at
org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:103)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:87)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:89)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.storage.BlockManager.reregister(BlockManager.scala:643)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.executor.Executor.reportHeartBeat(Executor.scala:1057)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.executor.Executor.$anonfun$heartbeater$1(Executor.scala:238)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
~[scala-library-2.12.15.jar:?]    at
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2100)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.Heartbeater$$anon$1.run(Heartbeater.scala:46)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
~[?:?]    at
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
~[?:?]    at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
~[?:?]    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
~[?:?]    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
~[?:?]    at java.lang.Thread.run(Thread.java:833) ~[?:?] Caused by:
java.lang.NullPointerException: Cannot invoke
"org.apache.spark.storage.BlockManagerId.executorId()" because
"idWithoutTopologyInfo" is null   at
org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:591)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:123)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
~[?:?]    at java.util.concurrent.FutureTask.run(FutureTask.java:264)
~[?:?] 3 more 23/07/05 11:17:57 ERROR Inbox: Ignoring error
java.lang.NullPointerException: Cannot invoke
"org.apache.spark.storage.BlockManagerId.executorId()" because
"idWithoutTopologyInfo" is null   at
org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:591)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:123)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]  at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
~[?:?]    at java.util.concurrent.FutureTask.run(FutureTask.java:264)
~[?:?]    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
~[?:?]    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
~[?:?]    at java.lang.Thread.run(Thread.java:833) ~[?:?] 23/07/05
11:18:00 INFO Executor: Adding
file:/tmp/spark-fd0b8589-fbd6-4b4a-b69f-98955f2f906f/userFiles-92cd-46c4-9ca8-/fusion-job-1.0.0-29114-SNAPSHOT-jar-with-dependencies.jar
to class loader 23/07/05 11:18:00 INFO Utils: Successfully started
service 'org.apache.spark.network.netty.NettyBlockTransferService' on
port 35657. 23/07/05 11:18:00 INFO NettyBlockTransferService: Server
created on [2600:1f18:6a92:2601:18f2:3af:f80f:4dab]:35657 23/07/05
11:18:00 INFO BlockManager: Using
org.apache.spark.storage.RandomBlockReplicationPolicy for block
replication policy 23/07/05 11:18:00 INFO BlockManagerMaster:
Registering BlockManager BlockManagerId(driver,
[2600:1f18:6a92:2601:18f2:3af:f80f:4dab], 35657, None) 23/07/05
11:18:00 INFO BlockManagerMasterEndpoint: Registering block manager
[2600:1f18:6a92:2601:18f2:3af:f80f:4dab]:35657 with 8.2 GiB RAM,
BlockManagerId(driver, [2600:1f18:6a92:2601:18f2:3af:f80f:4dab],
35657, None) 23/07/05 11:18:00 INFO BlockManagerMaster: Registered
BlockManager BlockManagerId(driver,
[2600:1f18:6a92:2601:18f2:3af:f80f:4dab], 35657, None) 23/07/05
11:18:00 INFO BlockManager: Initialized BlockManager:
BlockManagerId(driver, [2600:1f18:6a92:2601:18f2:3af:f80f:4dab],
35657, None) 23/07/05 11:18:00 INFO SingleEventLogFileWriter: Logging
events to file:/var/log/spark/apps/local-1688555865619.inprogress
23/07/05 11:18:00 INFO SharedState: Setting
hive.metastore.warehouse.dir ('null') to the value of
spark.sql.warehouse.dir. 23/07/05 11:18:00 INFO SharedState: Warehouse
path is 'file:/home/hadoop/spark-warehouse'. 23/07/05 11:18:01 INFO
MemoryStore: Block broadcast_0 stored as values in memory (estimated
size 72.0 B, free 8.2 GiB) 23/07/05 11:18:01 INFO MemoryStore: Block
broadcast_0_piece0 stored as bytes in memory (estimated size 124.0 B,
free 8.2 GiB) 23/07/05 11:18:01 INFO BlockManagerInfo: Added
broadcast_0_piece0 in memory on
[2600:1f18:6a92:2601:18f2:3af:f80f:4dab]:35657 (size: 124.0 B, free:
8.2 GiB) 23/07/05 11:18:01 INFO SparkContext: Created broadcast 0 from broadcast at Transformer.java:76 23/07/05 11:18:01 INFO MemoryStore:
Block broadcast_1 stored as values in memory (estimated size 168.0 B,
free 8.2 GiB) 23/07/05 11:18:01 INFO MemoryStore: Block
broadcast_1_piece0 stored as bytes in memory (estimated size 304.0 B,
free 8.2 GiB) 23/07/05 11:18:01 INFO BlockManagerInfo: Added
broadcast_1_piece0 in memory on
[2600:1f18:6a92:2601:18f2:3af:f80f:4dab]:35657 (size: 304.0 B, free:
8.2 GiB) 23/07/05 11:18:01 INFO SparkContext: Created broadcast 1 from broadcast at Transformer.java:80 23/07/05 11:18:02 INFO SparkContext:
Starting job: collect at Transformer.java:123 23/07/05 11:18:02 INFO
DAGScheduler: Got job 0 (collect at Transformer.java:123) with 52
output partitions 23/07/05 11:18:02 INFO DAGScheduler: Final stage:
ResultStage 0 (collect at Transformer.java:123) 23/07/05 11:18:02 INFO
DAGScheduler: Parents of final stage: List() 23/07/05 11:18:02 INFO
DAGScheduler: Missing parents: List() 23/07/05 11:18:02 INFO
DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[0] at
parallelize at Transformer.java:122), which has no missing parents
23/07/05 11:18:02 INFO MemoryStore: Block broadcast_2 stored as values
in memory (estimated size 3.2 KiB, free 8.2 GiB) 23/07/05 11:18:02
INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory
(estimated size 1899.0 B, free 8.2 GiB) 23/07/05 11:18:02 INFO
BlockManagerInfo: Added broadcast_2_piece0 in memory on
[2600:1f18:6a92:2601:18f2:3af:f80f:4dab]:35657 (size: 1899.0 B, free:
8.2 GiB) 23/07/05 11:18:02 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1570 23/07/05 11:18:02 INFO
DAGScheduler: Submitting 52 missing tasks from ResultStage 0
(ParallelCollectionRDD[0] at parallelize at Transformer.java:122)
(first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14)) 23/07/05 11:18:02 INFO TaskSchedulerImpl:
Adding task set 0.0 with 52 tasks resource profile 0 23/07/05 11:18:02
INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0)
(ip-172-31.ec2.internal, executor driver, partition 0, PROCESS_LOCAL,
5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1)
(ip-172-31.ec2.internal, executor driver, partition 1, PROCESS_LOCAL,
5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2)
(ip-172-31.ec2.internal, executor driver, partition 2, PROCESS_LOCAL,
5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3)
(ip-172-31.ec2.internal, executor driver, partition 3, PROCESS_LOCAL,
5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
Executor: Running task 1.0 in stage 0.0 (TID 1) 23/07/05 11:18:02 INFO
Executor: Running task 2.0 in stage 0.0 (TID 2) 23/07/05 11:18:02 INFO
Executor: Running task 3.0 in stage 0.0 (TID 3) 23/07/05 11:18:02 INFO
Executor: Running task 0.0 in stage 0.0 (TID 0) 23/07/05 11:18:02 INFO
Executor: Finished task 2.0 in stage 0.0 (TID 2). 1496 bytes result
sent to driver 23/07/05 11:18:02 INFO Executor: Finished task 3.0 in
stage 0.0 (TID 3). 1496 bytes result sent to driver 23/07/05 11:18:02
INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1496 bytes
result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Starting
task 4.0 in stage 0.0 (TID 4) (ip-172-31.ec2.internal, executor
driver, partition 4, PROCESS_LOCAL, 5097 bytes)
taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Running
task 4.0 in stage 0.0 (TID 4) 23/07/05 11:18:02 INFO Executor:
Finished task 1.0 in stage 0.0 (TID 1). 1496 bytes result sent to
driver 23/07/05 11:18:02 INFO Executor: Finished task 4.0 in stage 0.0
(TID 4). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO
TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5)
(ip-172-31.ec2.internal, executor driver, partition 5, PROCESS_LOCAL,
5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
Executor: Running task 5.0 in stage 0.0 (TID 5) 23/07/05 11:18:02 INFO
Executor: Finished task 5.0 in stage 0.0 (TID 5). 1496 bytes result
sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Starting task
6.0 in stage 0.0 (TID 6) (ip-172-31.ec2.internal, executor driver, partition 6, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map()
23/07/05 11:18:02 INFO Executor: Running task 6.0 in stage 0.0 (TID 6)
23/07/05 11:18:02 INFO TaskSetManager: Starting task 7.0 in stage 0.0
(TID 7) (ip-172-31.ec2.internal, executor driver, partition 7,
PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05
11:18:02 INFO Executor: Running task 7.0 in stage 0.0 (TID 7) 23/07/05
11:18:02 INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8)
(ip-172-31.ec2.internal, executor driver, partition 8, PROCESS_LOCAL,
5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
Executor: Running task 8.0 in stage 0.0 (TID 8) 23/07/05 11:18:02 INFO
TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9)
(ip-172-31.ec2.internal, executor driver, partition 9, PROCESS_LOCAL,
5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
Executor: Finished task 6.0 in stage 0.0 (TID 6). 1496 bytes result
sent to driver 23/07/05 11:18:02 INFO Executor: Running task 9.0 in
stage 0.0 (TID 9) 23/07/05 11:18:02 INFO TaskSetManager: Starting task
10.0 in stage 0.0 (TID 10) (ip-172-31.ec2.internal, executor driver, partition 10, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map()
23/07/05 11:18:02 INFO Executor: Running task 10.0 in stage 0.0 (TID
10) 23/07/05 11:18:02 INFO TaskSetManager: Finished task 0.0 in stage
0.0 (TID 0) in 143 ms on ip-172-31.ec2.internal (executor driver) (1/52) 23/07/05 11:18:02 INFO TaskSetManager: Finished task 3.0 in
stage 0.0 (TID 3) in 122 ms on ip-172-31.ec2.internal (executor
driver) (2/52) 23/07/05 11:18:02 INFO TaskSetManager: Finished task
4.0 in stage 0.0 (TID 4) in 33 ms on ip-172-31.ec2.internal (executor driver) (3/52) 23/07/05 11:18:02 INFO Executor: Finished task 9.0 in
stage 0.0 (TID 9). 1496 bytes result sent to driver 23/07/05 11:18:02
INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 25 ms
on ip-172-31.ec2.internal (executor driver) (4/52) 23/07/05 11:18:02
INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 124 ms
on ip-172-31.ec2.internal (executor driver) (5/52) 23/07/05 11:18:02
INFO Executor: Finished task 10.0 in stage 0.0 (TID 10). 1496 bytes
result sent to driver 23/07/05 11:18:02 INFO Executor: Finished task
8.0 in stage 0.0 (TID 8). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1)
in 140 ms on ip-172-31.ec2.internal (executor driver) (6/52) 23/07/05
11:18:02 INFO TaskSetManager: Starting task 11.0 in stage 0.0 (TID 11)
(ip-172-31.ec2.internal, executor driver, partition 11, PROCESS_LOCAL,
5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
Executor: Running task 11.0 in stage 0.0 (TID 11) 23/07/05 11:18:02
INFO Executor: Finished task 7.0 in stage 0.0 (TID 7). 1496 bytes
result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Finished
task 6.0 in stage 0.0 (TID 6) in 34 ms on ip-172-31.ec2.internal
(executor driver) (7/52) 23/07/05 11:18:02 INFO Executor: Finished
task 11.0 in stage 0.0 (TID 11). 1496 bytes result sent to driver
23/07/05 11:18:02 INFO TaskSetManager: Starting task 12.0 in stage 0.0
(TID 12) (ip-172-31.ec2.internal, executor driver, partition 12,
PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05
11:18:02 INFO TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9)
in 30 ms on ip-172-31.ec2.internal (executor driver) (8/52) 23/07/05
11:18:02 INFO TaskSetManager: Finished task 10.0 in stage 0.0 (TID 10)
in 29 ms on ip-172-31.ec2.internal (executor driver) (9/52) 23/07/05
11:18:02 INFO Executor: Running task 12.0 in stage 0.0 (TID 12)
23/07/05 11:18:02 INFO TaskSetManager: Starting task 13.0 in stage 0.0
(TID 13) (ip-172-31.ec2.internal, executor driver, partition 13,
PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05
11:18:02 INFO Executor: Finished task 12.0 in stage 0.0 (TID 12). 1496
bytes result sent to driver 23/07/05 11:18:02 INFO Executor: Running
task 13.0 in stage 0.0 (TID 13) 23/07/05 11:18:02 INFO TaskSetManager:
Finished task 8.0 in stage 0.0 (TID 8) in 39 ms on
ip-172-31.ec2.internal (executor driver) (10/52) 23/07/05 11:18:02
INFO TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 42 ms
on ip-172-31.ec2.internal (executor driver) (11/52) 23/07/05 11:18:02
INFO TaskSetManager: Starting task 14.0 in stage 0.0 (TID 14)
(ip-172-31.ec2.internal, executor driver, partition 14, PROCESS_LOCAL,
5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
Executor: Finished task 13.0 in stage 0.0 (TID 13). 1496 bytes result
sent to driver 23/07/05 11:18:02 INFO Executor: Running task 14.0 in
stage 0.0 (TID 14) 23/07/05 11:18:02 INFO TaskSetManager: Starting
task 15.0 in stage 0.0 (TID 15) (ip-172-31.ec2.internal, executor
driver, partition 15, PROCESS_LOCAL, 5097 bytes)
taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Running
task 15.0 in stage 0.0 (TID 15) 23/07/05 11:18:02 INFO TaskSetManager:
Finished task 11.0 in stage 0.0 (TID 11) in 21 ms on
ip-172-31.ec2.internal (executor driver) (12/52) 23/07/05 11:18:02
INFO Executor: Finished task 14.0 in stage 0.0 (TID 14). 1496 bytes
result sent to driver 23/07/05 11:18:02 INFO Executor: Finished task
15.0 in stage 0.0 (TID 15). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Starting task 16.0 in stage 0.0 (TID 16)
(ip-172-31.ec2.internal, executor driver, partition 16, PROCESS_LOCAL,
5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
TaskSetManager: Finished task 12.0 in stage 0.0 (TID 12) in 24 ms on
ip-172-31.ec2.internal (executor driver) (13/52) 23/07/05 11:18:02
INFO TaskSetManager: Finished task 13.0 in stage 0.0 (TID 13) in 18 ms
on ip-172-31.ec2.internal (executor driver) (14/52) 23/07/05 11:18:02
INFO Executor: Running task 16.0 in stage 0.0 (TID 16) 23/07/05
11:18:02 INFO TaskSetManager: Starting task 17.0 in stage 0.0 (TID 17)
(ip-172-31.ec2.internal, executor driver, partition 17, PROCESS_LOCAL,
5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
Executor: Finished task 16.0 in stage 0.0 (TID 16). 1496 bytes result
sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Starting task
18.0 in stage 0.0 (TID 18) (ip-172-31.ec2.internal, executor driver, partition 18, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map()
23/07/05 11:18:02 INFO Executor: Running task 18.0 in stage 0.0 (TID
18) 23/07/05 11:18:02 INFO TaskSetManager: Finished task 14.0 in stage
0.0 (TID 14) in 20 ms on ip-172-31.ec2.internal (executor driver) (15/52) 23/07/05 11:18:02 INFO Executor: Finished task 18.0 in stage
0.0 (TID 18). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Finished task 15.0 in stage 0.0 (TID 15) in 22 ms on
ip-172-31.ec2.internal (executor driver) (16/52) 23/07/05 11:18:02
INFO Executor: Running task 17.0 in stage 0.0 (TID 17) 23/07/05
11:18:02 INFO TaskSetManager: Starting task 19.0 in stage 0.0 (TID 19)
(ip-172-31.ec2.internal, executor driver, partition 19, PROCESS_LOCAL,
5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
Executor: Running task 19.0 in stage 0.0 (TID 19) 23/07/05 11:18:02
INFO TaskSetManager: Finished task 16.0 in stage 0.0 (TID 16) in 16 ms
on ip-172-31.ec2.internal (executor driver) (17/52) 23/07/05 11:18:02
INFO TaskSetManager: Starting task 20.0 in stage 0.0 (TID 20)
(ip-172-31.ec2.internal, executor driver, partition 20, PROCESS_LOCAL,
5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
Executor: Running task 20.0 in stage 0.0 (TID 20) 23/07/05 11:18:02
INFO TaskSetManager: Starting task 21.0 in stage 0.0 (TID 21)
(ip-172-31.ec2.internal, executor driver, partition 21, PROCESS_LOCAL,
5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
Executor: Finished task 17.0 in stage 0.0 (TID 17). 1496 bytes result
sent to driver 23/07/05 11:18:02 INFO Executor: Finished task 19.0 in
stage 0.0 (TID 19). 1453 bytes result sent to driver 23/07/05 11:18:02
INFO Executor: Running task 21.0 in stage 0.0 (TID 21) 23/07/05
11:18:02 INFO TaskSetManager: Starting task 22.0 in stage 0.0 (TID 22)
(ip-172-31.ec2.internal, executor driver, partition 22, PROCESS_LOCAL,
5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
Executor: Running task 22.0 in stage 0.0 (TID 22) ```

Any insight would be precious :) Thanks a lot!

I already was trying to add executors.instances and dynamicAllocation = true also tried Dataset recordsDF.repartition(100) for example


Solution

  • For anyone encounter the same problem, i found the problem was quite simple - it appears that 'spark.setMaster(local[*])' should only be used when you run locally and not to run on the cluster.