
How to get python logs to show up in Heroku logging add-ons


I have some Python code running in a Heroku project. Specifically, it's running Scrapy. Scrapy generates logs using Python's native logging module (source). My code also has print statements as my own hacky logging.

When I run the code, either locally or in a bash shell opened on my Heroku dyno via the CLI, I see logs as the code runs, such as:

2024-04-04 13:20:28 [scrapy.core.engine] INFO: Spider opened
2024-04-04 13:20:28 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-04-04 13:20:28 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-04-04 13:20:28 [root] INFO: Searches left - 3601
This could be a random log from a print() statement

I'm trying to connect my Heroku instance to one of their logging add-ons so I can search past logs more easily. I've installed both Mezmo and Papertrail. The same logs are flowing into both, but they appear to be only "system logs", like:

Apr 4 09:18:45 paymosaic app[api] notice Release v40 created by user logdna@addons.heroku.com
Apr 4 09:18:45 paymosaic app[api] notice Update LOGDNA by logdna by user logdna@addons.heroku.com

None of the Scrapy logs are getting into the add-ons. Is there something I need to configure to get my code's logs to flow through?


Solution

  • This should more-or-less "just work".

    Heroku aggregates runtime logs, including things you print to stdout or stderr, into a single stream:

    A Heroku app’s logs come from the output streams of all of its running processes, system components, and backing services. Heroku’s Logplex routes log streams from all sources into a single channel, providing a foundation for comprehensive logging.

    That stream is what logging add-ons like Papertrail typically ingest.

    The reason this doesn't appear to work is that you're running your command interactively. Commands run via heroku run run on a one-off dyno. One distinguishing feature of one-off dynos is that they don't contribute to the log stream (emphasis added):

    One-off dynos run attached to your terminal, with a character-by-character TCP connection for STDIN and STDOUT. This allows you to use interactive processes like a console. Since STDOUT is going to your terminal, the only thing recorded in the app’s logs is the startup and shutdown of the dyno.

    If you run your scraper via the Heroku Scheduler (even though this is also technically a one-off dyno) you should see its output show up in your logs:

    Logs for scheduled jobs go into your logs as process scheduler.X:

    $ heroku logs --ps scheduler.1
    2011-02-04T14:10:16-08:00 heroku[scheduler.1]: State changed from created to starting
    2011-02-04T14:10:16-08:00 app[scheduler.1]: Starting process with command `bin/clean_sessions`
    2011-02-04T14:10:19-08:00 app[scheduler.1]: Deleting stale sessions...
    2011-02-04T14:10:27-08:00 app[scheduler.1]: done.
    2011-02-04T14:10:28-08:00 heroku[scheduler.1]: State changed from up to complete
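    One more gotcha worth ruling out: when stdout isn't a TTY (as on a dyno), Python block-buffers print() output, so lines from a short-lived job can arrive late or be lost if the process exits uncleanly. Setting the config var PYTHONUNBUFFERED=1 disables that buffering, or you can flush explicitly; a sketch of the latter (the log helper is a made-up name, not part of Scrapy or Heroku):

```python
import sys

def log(msg: str) -> None:
    """Hacky print-based logging that is safe on Heroku: flush so the
    line reaches Logplex immediately instead of sitting in a buffer."""
    print(msg, file=sys.stdout, flush=True)

log("This could be a random log from a print() statement")
```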