Erlang: Functions work in shell but not in YAWS

My sole method of debugging (io:format/2) is not working in YAWS, and I'm at a loss. My supervisor starts three processes: ETS Manager, YAWS Init, and Ratelimiter. This is successful. I can play around with the rate limiter in the shell, calling the same functions YAWS should be calling. The difference is that the shell behaves as I expect, whereas I have no idea what is happening inside YAWS.

I do know that if I spam ratelimiter:limit(IP) in the shell, it eventually returns true. I can also execute ratelimiter:lockout(IP) and ratelimiter:blacklist(IP), and those return true as well. The limiter is a gen_server.

The functions do the following:

limit/1: Check ETS table if counter > threshold; update counter. If counter > blacklist threshold make entry in mnesia table
blacklist/1: Check mnesia table if entry exists; Yes: reset timer
lockout/1: Immediately enters ID into mnesia table

In my arg_rewrite_mod module I'm doing some checks to ensure I'm getting the HTTP requests I expect, namely GET, POST, and HEAD. I thought this would also be a nice place to do the rate limiting, as early as possible in the web server's chain of events.

All the changes I've made to the arg_rewrite module seem to work, except for the "printf"s and the limiter. I'm new to the language, so I'm not sure whether my mistake is obvious or not.

Skeleton of my arg_rewrite_mod:

-module(arg_preproc).
-export([arg_rewrite/1]).

-include("limiter_def.hrl").
-include_lib("/usr/lib/yaws/include/yaws_api.hrl").


is_blacklisted(ID) ->
    case ratelimiter:blacklist(ID) of
    false ->    continue;
    true ->     throw(blacklist)
    end.

is_limited(ID) ->
    case ratelimiter:limit(ID) of
    false ->    continue;
    true ->     throw(limit)
    end.


arg_rewrite(A) ->
    Allow = ['GET','POST', 'HEAD'],

    try
        {IP, _} = A#arg.client_ip_port,

        ID = IP,
        is_blacklisted(ID),

        io:format("~p ~p ~n",[ID, is_blacklisted(ID)]),

        %% === Allow expected HTTP requests
        HttpReq = (A#arg.req)#http_request.method,

        case lists:member(HttpReq, Allow) of
            true ->
                {_,ReqTgt} = (A#arg.req)#http_request.path,
                PassThru = [".css",".jpg",".jpeg",".png",".js"],
                %% ... much more ...
            false ->
                is_limited(ID),
                throw(http_method_denied)
        end
    catch
        throw:blacklist -> %% Send back a 429;
        throw:limit -> %% Same but no Retry-After;
        throw:http_method_denied ->
        %% The only throw I have actually seen so far
            AllowedReq = string:join([atom_to_list(M) || M <- Allow], ","),
            A#arg{state=#rewrite_response{status=405,
                        headers=[{header, {"Allow", AllowedReq}},{header, {connection, "close"}}]
            }};
        Type:Reason -> {error, {unhandled,{Type, Reason}}}
    end.

I can spam curl -I -X HEAD <<any page>> as fast as I can in a bash shell and all I get is HTTP 200. The ETS table has zero entries as well. Using PUT I get an HTTP 405 as intended. I can call ratelimiter:lockout({MY_IP}) and still get the web page to load in my browser, and still get an HTTP 200 with curl.

I'm confused. Is it the way I started YAWS?

start() ->
    os:putenv("YAWSHOME", ?HOMEPATH_YAWS),
    code:add_patha(?MODPATH_YAWS),

    ok = case (R = application:start(yaws)) of
        {error, {already_started, _}} -> ok;
        _ -> R
    end,

    {ok,self()}. %% Tell supervisor everything okay in a manner it expects.

I did this because I thought it would be "easier."

Solution

When starting Yaws as part of another application, it's important to use its embedding support. One important thing the Yaws embedding startup code does is set the application environment variable embedded to true:

application:set_env(yaws, embedded, true),

Yaws checks this variable in several of its code paths, especially during initialization, in order to avoid assuming that it's running as a stand-alone daemon process.
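
As a rough illustration of what an embedded start could look like in place of the start/0 from the question, here is a sketch modeled on the example in the Yaws embedding documentation. The module name yaws_embed_sketch, the supervisor name my_sup, the docroot, port, and server name are all placeholders, and passing arg_rewrite_mod through the server configuration list is an assumption to verify against your Yaws version.

```erlang
-module(yaws_embed_sketch).
-export([start/0, run/0]).

%% Called by the supervisor (assumed to be registered as my_sup).
%% The real work happens in a separate process because calling
%% supervisor:start_child/2 from inside the supervisor's own child
%% start function would deadlock.
start() ->
    {ok, spawn(?MODULE, run, [])}.

run() ->
    Id        = "embedded",
    Docroot   = "/var/www",                       % placeholder path
    GconfList = [{id, Id}],
    SconfList = [{port, 8080},                    % placeholder port
                 {servername, "localhost"},       % placeholder name
                 {listen, {0,0,0,0}},
                 {docroot, Docroot},
                 %% assumption: the rewrite module from the question
                 {arg_rewrite_mod, arg_preproc}],
    %% Builds the configuration and sets up the embedded environment,
    %% returning child specs for the Yaws processes.
    {ok, SCList, GC, ChildSpecs} =
        yaws_api:embedded_start_conf(Docroot, SconfList, GconfList, Id),
    %% Start the Yaws processes under our own supervisor.
    [supervisor:start_child(my_sup, Ch) || Ch <- ChildSpecs],
    %% Hand the configuration to the running Yaws server.
    yaws_api:setconf(GC, SCList),
    {ok, self()}.
```

If Yaws does not need to live under your own supervisor, yaws:start_embedded/4 takes the same four arguments and is a simpler entry point.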

Regarding rate limiting, rather than using an arg rewriter, you might consider using a shaper. The yaws_shaper module provides a behavior that expects its callback module to implement two functions:

check/1: yaws_shaper calls this to allow the callback module to decide whether to allow the request from the client. It passes client host information as the callback argument. Your shaper callback module returns either the atom allow to allow the request to proceed, or the tuple {deny, Status, Message} where Status is an HTTP status code to return to the client, such as 429 to indicate the client is making too many requests, and Message is any extra HTML to be returned to the client. (It might be nice if Message could include a reply header such as Retry-After as well; this is something I'll consider adding to Yaws.)
update/3: yaws_shaper calls this when the response for a client is ready to be returned. The first argument is the client host information, the second argument is the number of "hits" (the value 1 for each request), and the third argument is the number of bytes being delivered in response to the client's request. Your shaper callback module can return ok from update/3 (Yaws does not use the return value).

A shaper can use this framework to track how many requests each client is making and how much data Yaws is delivering to each client, and use that information to limit or deny particular clients.
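
A minimal sketch of such a callback module, following the check/1 and update/3 contract described above, might look like this. The module name shaper_sketch, the table name, the init/0 helper, and the threshold are hypothetical; the counters here never expire, which a real limiter would need to handle.

```erlang
-module(shaper_sketch).
%% Intended as a yaws_shaper callback module; the -behaviour attribute
%% is left out so the sketch compiles without Yaws on the code path.
-export([init/0, check/1, update/3]).

-define(TAB, shaper_sketch_hits).   % hypothetical table name
-define(MAX_HITS, 3).               % hypothetical threshold

%% Must be called once from a long-lived process (for example an ETS
%% manager), because an ETS table is deleted when its owner exits.
init() ->
    ?TAB = ets:new(?TAB, [named_table, public, set]),
    ok.

%% Decide whether the client may proceed.
check(Host) ->
    case ets:lookup(?TAB, Host) of
        [{Host, Hits}] when Hits > ?MAX_HITS ->
            {deny, 429, "<p>Too many requests.</p>"};
        _ ->
            allow
    end.

%% Record the hits for this client; the byte count is ignored here.
update(Host, Hits, _Bytes) ->
    _ = ets:update_counter(?TAB, Host, {2, Hits}, {Host, 0}),
    ok.
```

With the threshold of 3 used here, a client is allowed until its fourth recorded hit, after which check/1 returns the deny tuple.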

And finally, while "printf debugging" works, it's less than ideal especially in Erlang, which has built-in tracing. You should consider learning the dbg module so you can trace any function you want, see who called it, see what arguments are being passed to it, see what it returns, etc.
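
As a starting point, a tracing session for the functions in the question could look roughly like the following. The module and function names are taken from the question; adjust them to whatever you actually want to observe.

```erlang
%% Run in the Erlang shell of the node that hosts Yaws.
dbg:tracer(),       % start the default tracer, which prints to the shell
dbg:p(all, c),      % enable call tracing in all processes

%% Trace ratelimiter:limit/1, showing its arguments, its caller,
%% and its return value.
dbg:tpl(ratelimiter, limit, 1,
        [{'_', [], [{return_trace}, {message, {caller}}]}]),

%% Trace the rewrite hook too, to confirm that Yaws calls it at all.
dbg:tpl(arg_preproc, arg_rewrite, 1,
        [{'_', [], [{return_trace}]}]).

%% ... issue a few requests with curl, then stop tracing:
dbg:stop().
```

If no trace output appears for arg_preproc:arg_rewrite/1 while requests are being served, the rewrite module is not being invoked, which points at configuration rather than at the limiter itself.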