apache-spark, log4j2, spark-java

Apache Spark log4j2.properties file does not generate log files


I am trying to generate a user-defined log file on Spark 4.0.

OS: Windows 11
Spark: spark-4.0.0-preview2-bin-hadoop3

First, I created a log4j2.properties file in the %SPARK_HOME%\conf folder:

rootLogger.level = info
rootLogger.appenderRef.stdout.ref = file

appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = JsonTemplateLayout
appender.console.layout.eventTemplateUri = classpath:org/apache/spark/SparkLayout.json

appender.file.type = File
appender.file.name = FileAppender
appender.file.fileName = file:///C:/spark-4.0.0-preview2-bin-hadoop3/logs/spark.log
appender.file.layout.type = PatternLayout
appender.file.layout.pattern = "%d{ISO8601} %-5level %logger{36} - %msg%n"
appender.file.append = true

I also created an empty spark.log file in the %SPARK_HOME%\logs folder.

Below is my Spark sample code:

import java.util.Arrays;
import java.util.List;

import org.apache.log4j.Logger;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class Main {
    private static final Logger logger = Logger.getLogger(Main.class);
    
    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            RowFactory.create("father", 35),
            RowFactory.create("mother", 30),
            RowFactory.create("son", 15)
        );
            
        StructType schema = DataTypes.createStructType(
            new StructField[] { 
                DataTypes.createStructField("name", DataTypes.StringType, false),
                DataTypes.createStructField("age", DataTypes.IntegerType, false)
            }
        );

        SparkSession spark = SparkSession.builder()
            .appName("Test")
            .master("local[*]")
            .getOrCreate();
        logger.info("Spark Session built successfully");
        
        Dataset<Row> df = spark.createDataFrame(rows, schema);

        df.printSchema();
        df.show();
        logger.info("DataFrame shown successfully");
        
        spark.close();
        logger.info("Spark session stopped");
    }
}

The code runs successfully, but the Spark job does not write any logs to the spark.log file at all. Any idea?

Update

In the spark-defaults.conf file:

spark.driver.host                  localhost
spark.yarn.jars                    file:///C:/spark-4.0.0-preview2-bin-hadoop3/jars/*.jar
 
spark.serializer                   org.apache.spark.serializer.KryoSerializer
spark.driver.memory                5g
spark.yarn.am.memory               1g
spark.executor.instances           1
spark.files                        file:///C:/spark-4.0.0-preview2-bin-hadoop3/conf/log4j2.properties

spark.executor.extraJavaOptions    -Dlog4j2.configuration=file:///C:/spark-4.0.0-preview2-bin-hadoop3/conf/log4j2.properties
spark.driver.extraJavaOptions      -Dlog4j2.configuration=file:///C:/spark-4.0.0-preview2-bin-hadoop3/conf/log4j2.properties

But the console shows the following message:

Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties

so I am afraid my log4j configuration is not being applied.


Solution

  • Are you sure you're running Spark in client mode and not in cluster mode? If it's cluster mode, the executors might not have access to the log4j2.properties file located on your local C: drive.
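  • That said, the posted sample uses .master("local[*]"), so everything runs in a single local JVM and file access should not be the problem. Two details in the posted configuration are worth checking: the root logger references an appender named file, while the appender's name attribute is FileAppender, and log4j 2 reads the log4j2.configurationFile system property rather than log4j2.configuration, which would also explain why Spark reports falling back to its bundled profile. A minimal sketch of a file-appender configuration, reusing the paths and names from the question (so treat them as illustrative), might look like this:

# Sketch only: paths and appender names are taken from the question.
rootLogger.level = info
# The ref value must match the appender's name attribute below.
rootLogger.appenderRef.file.ref = FileAppender

appender.file.type = File
appender.file.name = FileAppender
# fileName takes a plain path; a file: URI here, or quotes around the
# pattern, would be treated literally.
appender.file.fileName = C:/spark-4.0.0-preview2-bin-hadoop3/logs/spark.log
appender.file.layout.type = PatternLayout
appender.file.layout.pattern = %d{ISO8601} %-5level %logger{36} - %msg%n
appender.file.append = true

The driver can then be pointed at it in spark-defaults.conf (the file: URI form is fine for this system property):

spark.driver.extraJavaOptions    -Dlog4j2.configurationFile=file:///C:/spark-4.0.0-preview2-bin-hadoop3/conf/log4j2.properties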