Appendable files in JCR

I am going to run a process that may take minutes or even hours. To track history of such runs I create a node of custom type per run with relevant process metadata stored within. Additionaly I want to store log file under such node. This seems to be more consistent and convenient approach rather than storing log file on a disk, separately from process meta.

Now nt:file nodetype itself has a jcr:content subnode with jcr:data property that allows me to store binary content. This is fine for one-time or infrequent content change of file.

However I am going to continuously append new content to that file and, additionally to that, poll it's content in separate threads (to track progress).

The JCR API in face of javax.jcr.ValueFactory, javax.jcr.Binary doesn't really seem to support such approach, I'd rather be forced to override that file (or more accurately - binary property) over and over again every time I add a single line of log. I am concerned about performance.

I've searched documentation for tools that would allow me to open output stream for that file and periodically flush changes from that stream to JCR, but there seems to be nothing like that available.

So is there anything more clever than going with plain javax.jcr.ValueFactory and javax.jcr.Binary?

Solution

After considering a while all options I have here:

Keep logs in memory, save them to CRX every time user invokes info/warn/error. Pros: Logs stored at the same place as migration task metadata, easy to locate and access. Cons: Potentially the slowest and least resource effective of all approaches in case of large amount of log entries.
Keep logs in memory, save them to JCR only at the end of migration. Pros: Simple to refactor current solution, less pressure on CRX during migration process. Cons: Cannot track real time progress, potentiall loss of logs during unexpected error or instance shutdown.
Create a node of custom type for each log entry instead of log.txt. Aggregate logs in text file via special log servlet. i.e. /var/migration/uuid/log.txt or /var/migration/uuid/log.json. Pros: More JCR way to store such content. With custom node type and indexes should be fast enough to consider as an option. Has diversity to support text and json format of logs. Cons: Unclear performance comparison to current approach. Potential issues due to large amount of nodes located at the same level. User should be aware of log servlet existence, otherwise user cannot see them in convenient format. Log servlet performance is not clear in case of large amount of log entries.
Create log files on a file system (let's say at crx-quickstart/logs/migration/<uuid>.log), display it's content (if needed) through API, with ability to trunkate log API response to last 100-1000 lines. Pros: Classic and well-known log approach when log files stored on a file system. Sling provides configured binding slf4j to LogBack with all needed LogBack dependencies exported for usage in your custom bundles. Cons: Separation of logs and task metadata. User should be aware of log files location on a disk.

Starting with option 1 I then realized that log entries amount potentially can scale to hundreds of thousands - rare, but possible scenario. So finally decided to go with option 4.

In case anyone will face similar task I am posting here the implementation details for the option 4, since it is not that trivial as it might seem at first.

I am using AEM 6.2 (Felix-Jackrabbit-Sling under the hood) and I want every migration task run - which is essentially just a separate thread - to create it's own log file with a special name - a unique identifier of that migration process.

Now, Sling itself allows you to define multiple log configurations through org.apache.sling.commons.log.LogManager.factory.config OSGi configuration. However these log configs too simple for this case - you cannot create with it what is called in LogBack a SiftingAppender - special case of log appender that will instantiate appenders for particular loggers per thread, rather than once and application-wide - in other words you cannot instruct LogBack to create file per thread using OSGi configuration.

So logically thinking you would want to programmatically take Sling's LogBack configuration on a runtime (for example at the moment you've uploaded your custom bundle and activated it) and use it to configure such appender for specific loggers. Unfortunatelly, though there is a lot of documentation on how to configure LogBack through logback.xml, there are quite few docs that describe how to do it programmatically through LogBack's Java objects like ch.qos.logback.classic.LoggerContext, and like zero of them explain how to configure that way a SiftingAppender.

So after reading LogBack sources and tests I ended up with this helper class:

import ch.qos.logback.classic.LoggerContext;
import ch.qos.logback.classic.encoder.PatternLayoutEncoder;
import ch.qos.logback.classic.sift.MDCBasedDiscriminator;
import ch.qos.logback.classic.sift.SiftingAppender;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.Appender;
import ch.qos.logback.core.Context;
import ch.qos.logback.core.FileAppender;
import ch.qos.logback.core.joran.spi.JoranException;
import ch.qos.logback.core.sift.AppenderFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

import java.util.Objects;

/**
 * This class dynamically adds configuration to AEM's LogBack logging implementation behind slf4j.
 * The point is to provide loggers bound to specific task ID and therefore specific log file, so
 * each migration task run will be written in it's standalone log file.
 * */
public class LogUtil {

    static {
        LoggerContext rootContext = (LoggerContext) LoggerFactory.getILoggerFactory();

        ch.qos.logback.classic.Logger logger = rootContext.getLogger("migration-logger");

        //since appender lives until AEM instance restarted
        //we are checking if appender had being registered previously 
        //to ensure we won't do it more than once
        if(logger.getAppender("MIGRATION-TASK-SIFT") == null) {
            MDCBasedDiscriminator mdcBasedDiscriminator = new MDCBasedDiscriminator();
            mdcBasedDiscriminator.setContext(rootContext);
            mdcBasedDiscriminator.setKey("taskId");
            mdcBasedDiscriminator.setDefaultValue("no-task-id");
            mdcBasedDiscriminator.start();

            SiftingAppender siftingAppender = new SiftingAppender();
            siftingAppender.setContext(rootContext);
            siftingAppender.setName("MIGRATION-TASK-SIFT");
            siftingAppender.setDiscriminator(mdcBasedDiscriminator);
            siftingAppender.setAppenderFactory(new FileAppenderFactory());
            siftingAppender.start();

            logger.setAdditive(false);
            logger.setLevel(ch.qos.logback.classic.Level.ALL);
            logger.addAppender(siftingAppender);
        }
    }

    public static class FileAppenderFactory implements AppenderFactory<ILoggingEvent> {

        @Override
        public Appender<ILoggingEvent> buildAppender(Context context, String taskId) throws JoranException {
            PatternLayoutEncoder logEncoder = new PatternLayoutEncoder();
            logEncoder.setContext(context);
            logEncoder.setPattern("%-12date{YYYY-MM-dd HH:mm:ss.SSS} %-5level - %msg%n");
            logEncoder.start();

            FileAppender<ILoggingEvent> appender = new FileAppender<>();
            appender.setContext(context);
            appender.setName("migration-log-file");
            appender.setFile("crx-quickstart/logs/migration/task-" + taskId + ".log");
            appender.setEncoder(logEncoder);
            appender.setAppend(true);
            appender.start();

            //need to add cleanup configuration for old logs ?

            return appender;
        }
    }

    private LogUtil(){
    }

    public static Logger getTaskLogger(String taskId) {
        Objects.requireNonNull(taskId);
        MDC.put("taskId", taskId);
        return LoggerFactory.getLogger("migration-logger");
    }

    public static void releaseTaskLogger() {
        MDC.remove("taskId");
    }
}

The part to pay attention is that SiftingAppender requires you to implement AppenderFactory interface, that will produce configured appenders for logger per thread to work with.

Now you can obtain logger via:

LogUtil.getTaskLogger("some-task-uuid")

And use it to create log files like crq-quickstart/logs/migration/task-<taskId>.log per provided taskId

According to the docs you are also required to release such logger once you've done with it

LogUtil.releaseTaskLogger()

And pretty much that's it.