I am trying to generate a user-defined log file on Spark 4.0.
OS : Windows 11
Spark : spark-4.0.0-preview2-bin-hadoop3
First, I created a log4j2.properties file in the %SPARK_HOME%\conf folder:
rootLogger.level = info
rootLogger.appenderRef.stdout.ref = file
appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = JsonTemplateLayout
appender.console.layout.eventTemplateUri = classpath:org/apache/spark/SparkLayout.json
appender.file.type = File
appender.file.name = FileAppender
appender.file.fileName = file:///C:/spark-4.0.0-preview2-bin-hadoop3/logs/spark.log
appender.file.layout.type = PatternLayout
appender.file.layout.pattern = "%d{ISO8601} %-5level %logger{36} - %msg%n"
appender.file.append = true
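One thing I am not sure about: in the Log4j 2 properties format, rootLogger.appenderRef.*.ref is supposed to reference an appender by its name, and here the file appender is named FileAppender while the root logger points at file. A variant with matching names, a plain Windows path instead of a file:// URI (which I believe the File appender expects), and no quotes around the pattern would look roughly like this:

rootLogger.level = info
rootLogger.appenderRef.console.ref = console
rootLogger.appenderRef.file.ref = FileAppender

appender.file.type = File
appender.file.name = FileAppender
appender.file.fileName = C:/spark-4.0.0-preview2-bin-hadoop3/logs/spark.log
appender.file.layout.type = PatternLayout
appender.file.layout.pattern = %d{ISO8601} %-5level %logger{36} - %msg%n
appender.file.append = true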
I also created a spark.log file in the %SPARK_HOME%\logs folder.
Below is the Spark sample code:
import java.util.Arrays;
import java.util.List;
import org.apache.log4j.Logger;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
public class Main {
    private static final Logger logger = Logger.getLogger(Main.class);

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            RowFactory.create("father", 35),
            RowFactory.create("mother", 30),
            RowFactory.create("son", 15)
        );
        StructType schema = DataTypes.createStructType(
            new StructField[] {
                DataTypes.createStructField("name", DataTypes.StringType, false),
                DataTypes.createStructField("age", DataTypes.IntegerType, false)
            }
        );
        SparkSession spark = SparkSession.builder()
            .appName("Test")
            .master("local[*]")
            .getOrCreate();
        logger.info("Spark Session built successfully");

        Dataset<Row> df = spark.createDataFrame(rows, schema);
        df.printSchema();
        df.show();
        logger.info("DataFrame shown successfully");

        spark.close();
        logger.info("Spark session stopped");
    }
}
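For completeness, the logger above comes from the Log4j 1.x API (org.apache.log4j.Logger), which as far as I can tell Spark routes through its log4j-1.2-api bridge, so I do not think the API choice itself is the problem. An equivalent sketch against the Log4j 2 API would be:

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class Main {
    // Same logger, obtained through the Log4j 2 API instead of the 1.x bridge.
    private static final Logger logger = LogManager.getLogger(Main.class);
    // ... rest of main() unchanged
}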
The code runs successfully, but the Spark job does not write any logs to the spark.log file at all. Any idea?
== Updated parts ==
In the spark-defaults.conf file:
spark.driver.host localhost
spark.yarn.jars file:///C:/spark-4.0.0-preview2-bin-hadoop3/jars/*.jar
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory 5g
spark.yarn.am.memory 1g
spark.executor.instances 1
spark.files file:///C:/spark-4.0.0-preview2-bin-hadoop3/conf/log4j2.properties
spark.executor.extraJavaOptions -Dlog4j2.configuration=file:///C:/spark-4.0.0-preview2-bin-hadoop3/conf/log4j2.properties
spark.driver.extraJavaOptions -Dlog4j2.configuration=file:///C:/spark-4.0.0-preview2-bin-hadoop3/conf/log4j2.properties
But the console shows the following message:
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
I am afraid my log4j configuration is not being picked up.
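One more thing I could not verify: Log4j 2 normally reads its configuration location from the log4j2.configurationFile (or the legacy log4j.configurationFile) system property rather than log4j2.configuration, so the extraJavaOptions above may simply be ignored. The variant I have in mind:

spark.driver.extraJavaOptions -Dlog4j2.configurationFile=file:///C:/spark-4.0.0-preview2-bin-hadoop3/conf/log4j2.properties
spark.executor.extraJavaOptions -Dlog4j2.configurationFile=file:///C:/spark-4.0.0-preview2-bin-hadoop3/conf/log4j2.properties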
Are you sure you're running Spark in client mode and not in cluster mode? If it's cluster mode, the executors might not have access to the log4j2.properties file located on your local C:\ drive.
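If you are launching through spark-submit, you can pin client mode explicitly, e.g. in spark-defaults.conf (a sketch, assuming that launch path; when you run the code above directly with .master("local[*]"), everything stays in one local JVM and deploy mode does not apply):

spark.submit.deployMode client

or pass --deploy-mode client on the spark-submit command line, together with --files so the executors receive your log4j2.properties.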