
How to filter bots on an Express.js server


I have created an Express (Node.js) API and deployed it to AWS (Elastic Beanstalk with 2 EC2 instances). I am using the morgan-body package to log the requests and responses on my endpoints, but tons of bots seem to be "attacking" my API. This results in millions of log entries every month, which costs me a fortune with Datadog. I have used morgan-body's built-in "skip" option to filter requests based on their user agents, but new ones seem to appear every day. Is there a way to skip logging for all kinds of bots without checking them one by one? Here is my code; many thanks for your help! :)

```javascript
const express = require('express');
const morganBody = require('morgan-body');

const app = express();

// User-agent prefixes whose requests should not be logged
const SKIPPED_AGENT_PREFIXES = [
  'ELB-HealthChecker',
  'Mozilla',
  'Mozlila',
  'Python',
  'python',
  'l9explore',
  'Go-http-client',
];

morganBody(app, {
  // Returning true tells morgan-body to skip logging this request
  skip: (req, res) => {
    const userAgent = req.get('user-agent');
    if (!userAgent) {
      return false;
    }
    return SKIPPED_AGENT_PREFIXES.some((prefix) => userAgent.startsWith(prefix));
  },
  logRequestBody: false,
  logResponseBody: false,
});
```
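Because the prefixes above form an open-ended denylist, one way to cut the day-to-day maintenance is to match case-insensitively on substrings that automated clients commonly include (such as "bot", "crawl", "spider", or an HTTP library name). The sketch below uses a hypothetical helper, `isLikelyBot`, and its pattern is an assumption to tune against your own logs. It deliberately leaves out "Mozilla", since nearly every real browser sends a user agent starting with "Mozilla/5.0", and it treats a missing user agent as a bot (the opposite of the code above), which you may want to flip if your own clients omit the header. Any such list can still be evaded by a client that spoofs its user agent.

```javascript
// Hypothetical pattern of substrings that commonly appear in automated
// clients' user agents; extend it to match what you see in your own logs.
const BOT_UA_PATTERN =
  /bot|crawl|spider|scan|slurp|curl|wget|python|go-http-client|l9explore|elb-healthchecker/i;

// Returns true when the request should NOT be logged.
function isLikelyBot(userAgent) {
  // Browsers always send a User-Agent, so a missing one is treated as automated here.
  if (!userAgent) {
    return true;
  }
  return BOT_UA_PATTERN.test(userAgent);
}

// Wiring it into morgan-body (assumes `app` and `morganBody` as in the question):
// morganBody(app, {
//   skip: (req, res) => isLikelyBot(req.get('user-agent')),
//   logRequestBody: false,
//   logResponseBody: false,
// });
```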

Solution

  • Welcome to the internet. Bot/spam detection is one of the hardest problems to solve: every piece of logic you add on the server can be circumvented by counter-logic on the client side.

    AWS itself offers a tool for this, AWS WAF Bot Control: https://aws.amazon.com/waf/features/bot-control/

    A good filtering strategy depends on your use case.

    Some suggestions:

    1. Introduce logins/sessions and allow only authenticated sessions.
    2. Filter on request headers.
    3. Filter by IP range.
    4. Limit the amount of traffic from a single IP.
    5. Limit the request rate across different IPs, etc.
    6. Take the service offline when it is not needed.
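Suggestions 4 and 5 amount to rate limiting. Below is a minimal in-memory fixed-window limiter keyed by client IP; `createRateLimiter`, its `windowMs`/`max` options and the example limits are hypothetical, not part of any library. Counts live in each process, so with two EC2 instances each instance limits independently, and the map is never pruned, so a long-running server would need to evict stale entries (or use an established package or a shared store instead). Behind a proxy or load balancer, `req.ip` is the proxy's address unless Express's `trust proxy` setting is enabled.

```javascript
// Minimal in-memory fixed-window rate limiter keyed by client IP (a sketch:
// state is per process and entries are never evicted).
function createRateLimiter({ windowMs, max }) {
  const hits = new Map(); // ip -> { windowStart, count }

  return function isAllowed(ip, now = Date.now()) {
    const entry = hits.get(ip);
    // Start a new window if this IP is unseen or its window has expired.
    if (!entry || now - entry.windowStart >= windowMs) {
      hits.set(ip, { windowStart: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= max;
  };
}

// Possible Express wiring with hypothetical limits. Registered before
// morgan-body, a rejected request ends here with a 429 and, assuming
// morgan-body attaches itself as ordinary middleware, is never logged:
// const isAllowed = createRateLimiter({ windowMs: 60000, max: 100 });
// app.use((req, res, next) => {
//   if (!isAllowed(req.ip)) {
//     return res.status(429).send('Too Many Requests');
//   }
//   next();
// });
```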

    There is plenty more material on this topic available online.
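Suggestion 1 also suggests a way to address the original question without chasing user agents: flip the denylist into an allowlist and log only requests that carry a credential issued to your real clients. This is a minimal sketch, assuming your clients send an API key in a header; the `x-api-key` header name, the `VALID_API_KEYS` set and the `skipUnauthenticated` helper are all hypothetical. It only stops the logging; actually rejecting such requests would need a separate middleware.

```javascript
// Hypothetical set of API keys issued to your real clients; in practice,
// load these from configuration or a secrets store.
const VALID_API_KEYS = new Set(['key-for-web-app', 'key-for-mobile-app']);

// Returns true when the request should NOT be logged: anything that does not
// present a known key is treated as noise.
function skipUnauthenticated(headers) {
  const apiKey = headers['x-api-key']; // hypothetical header name
  return typeof apiKey !== 'string' || !VALID_API_KEYS.has(apiKey);
}

// Wiring into morgan-body (Node lowercases incoming header names in req.headers):
// morganBody(app, {
//   skip: (req, res) => skipUnauthenticated(req.headers),
//   logRequestBody: false,
//   logResponseBody: false,
// });
```

Because new bots will not have a valid key, this keeps working as new user agents appear, at the cost of not logging unauthenticated traffic at all.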