java · scala · apache-spark · logback · apache-spark-standalone

Logging using Logback on Spark StandAlone


We are using Spark Standalone 2.3.2 with logback-core/logback-classic 1.2.3.

We have a very simple Logback configuration file which lets us log to a specific directory. Locally I can pass the VM parameter from the editor:

-Dlogback.configurationFile="C:\path\logback-local.xml"

and it works and logs properly.

On Spark Standalone I am trying to pass the same configuration via spark-submit options:

spark-submit \
  --master spark://127.0.0.1:7077 \
  --driver-java-options "-Dlog4j.configuration=file:/path/logback.xml" \
  --conf "spark.executor.extraJavaOptions=-Dlogback.configurationFile=file:/path/logback.xml"
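Note that the driver option above sets the Log4j property (`log4j.configuration`) while pointing it at a Logback file. If the intent is Logback on both driver and executors, a hedged variant would look like the sketch below (the jar name is hypothetical, and it assumes the logback jars are on both classpaths; `--files` ships the config into each executor's working directory, so the executor side can reference it by bare name):

```shell
# Sketch only: use the Logback system property on both sides,
# and ship the config file to executors with --files.
# myapp.jar is a hypothetical application jar.
spark-submit \
  --master spark://127.0.0.1:7077 \
  --files /path/logback.xml \
  --driver-java-options "-Dlogback.configurationFile=file:/path/logback.xml" \
  --conf "spark.executor.extraJavaOptions=-Dlogback.configurationFile=logback.xml" \
  myapp.jar
```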

Here is the config file (slightly Ansible-templated). I have verified that the actual paths exist. Any idea what the issue could be on the cluster? I have checked the environment variables on the Spark UI, and they reflect the same driver and executor options.

Any potential issues with Logback and Spark StandAlone together?

There is nothing special about the configuration file itself; it just separates JSON-formatted log lines from plain file logging, for better visualization on the log server.

<configuration>
    <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>{{ app_log_file_path }}</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <!--daily-->
            <fileNamePattern>{{ app_log_dir }}/{{ app_name }}.%d{yyyy-MM-dd}.%i.log.gz</fileNamePattern>
            <maxFileSize>100MB</maxFileSize>
            <maxHistory>90</maxHistory>
            <totalSizeCap>10GB</totalSizeCap>
        </rollingPolicy>
        <encoder>
            <pattern>%d [%thread] %-5level %logger{36} %X{user} - %msg%n</pattern>
        </encoder>
    </appender>
    <appender name="FILE_JSON" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <filter class="ch.qos.logback.core.filter.EvaluatorFilter">
            <evaluator>
                <expression>
                    return message.contains("timeStamp") &amp;&amp;
                    message.contains("logLevel") &amp;&amp;
                    message.contains("sourceLocation") &amp;&amp;
                    message.contains("exception");
                </expression>
            </evaluator>
            <OnMismatch>DENY</OnMismatch>
            <OnMatch>NEUTRAL</OnMatch>
        </filter>
        <file>{{ app_json_log_file_path }}</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <!--daily-->
            <fileNamePattern>{{ app_log_dir }}/{{ app_name }}_json.%d{yyyy-MM-dd}.%i.log.gz</fileNamePattern>
            <maxFileSize>100MB</maxFileSize>
            <maxHistory>90</maxHistory>
            <totalSizeCap>10GB</totalSizeCap>
        </rollingPolicy>
        <encoder>
            <pattern>%msg%n</pattern>
        </encoder>
    </appender>
    <logger name="com.baml.ctrltech.greensheet.logging.GSJsonLogging" level="info" additivity="false">
        <appender-ref ref="FILE_JSON" />
    </logger>
    <root level="INFO">
        <appender-ref ref="FILE" />
        <appender-ref ref="FILE_JSON"/>
    </root>
</configuration>
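Since the file above is rendered from an Ansible template, one low-cost check is that the rendered result is still well-formed XML before shipping it to the cluster. A minimal sketch (the file written here is a stand-in for the real rendered config, whose path would differ):

```shell
# Minimal sketch: validate that a rendered Logback config is well-formed XML.
# CONFIG would point at the real rendered file; a stand-in is created here
# so the snippet is self-contained.
CONFIG=logback-rendered.xml
cat > "$CONFIG" <<'XML'
<configuration>
  <root level="INFO"/>
</configuration>
XML
# Exits non-zero (with a parse error) if templating broke the XML.
python3 -c "import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1]); print('well-formed')" "$CONFIG"
```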

Solution

  • We couldn't get Logback to work with Spark: Spark uses Log4j internally, so we had to switch to Log4j as well.
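For reference, the switch amounts to shipping a Log4j 1.x properties file instead of the Logback XML (Spark 2.3.x bundles Log4j 1.2). A minimal sketch, with illustrative paths and appender names rather than our actual config:

```shell
# Minimal sketch of a Log4j 1.x properties file for Spark 2.3.x.
# Paths and appender names are illustrative only.
cat > log4j.properties <<'EOF'
log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/myapp/app.log
log4j.appender.file.MaxFileSize=100MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d [%t] %-5p %c{1} %X{user} - %m%n
EOF

# Then point spark-submit at it (not executed here; myapp.jar is hypothetical):
#   spark-submit \
#     --files log4j.properties \
#     --driver-java-options "-Dlog4j.configuration=file:log4j.properties" \
#     --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
#     myapp.jar
```

The ConversionPattern mirrors the Logback pattern from the question (`%t`/`%p`/`%c` replacing `%thread`/`%level`/`%logger`), so log lines keep roughly the same shape after the switch.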