erlang, yaws

Erlang: Functions work in shell but not in YAWS


My sole method of debugging, io:format/2, is not working in YAWS, and I'm at a loss. My supervisor starts three processes: ETS Manager, YAWS Init, and Ratelimiter. All three start successfully. I can play around with the rate limiter in the shell, calling the same functions YAWS should be calling. The difference is that the shell behaves as I would expect, whereas I have no idea what is happening in YAWS.

I do know that if I spam ratelimiter:limit(IP) in the shell, it will eventually return true. I can also execute ratelimiter:lockout(IP) and ratelimiter:blacklist(IP), and both return true. The limiter is a gen_server.

The functions do the following:

In my arg_rewrite_mod module I'm doing some checks to ensure I'm getting the HTTP requests I expect, namely GET, POST, and HEAD. I thought this would also be a nice place to do the rate limiting, so that it happens as early as possible in the web server's chain of events.

All the changes I've made to the arg_rewrite module seem to work except the "printf"s and the limiter. I'm new to the language, so I'm not sure whether my mistake is obvious.

Skeleton of my arg_rewrite_mod:

-module(arg_preproc).
-export([arg_rewrite/1]).

-include("limiter_def.hrl").
-include_lib("/usr/lib/yaws/include/yaws_api.hrl").


is_blacklisted(ID) ->
    case ratelimiter:blacklist(ID) of
    false ->    continue;
    true ->     throw(blacklist)
    end.

is_limited(ID) ->
    case ratelimiter:limit(ID) of
    false ->    continue;
    true ->     throw(limit)
    end.


arg_rewrite(A) ->
    Allow = ['GET','POST', 'HEAD'],

    try
        {IP, _} = A#arg.client_ip_port,

        ID = IP,
        is_blacklisted(ID),

io:format("~p ~p ~n",[ID, is_blacklisted(ID)]),         

        %% === Allow expected HTTP requests
        HttpReq = (A#arg.req)#http_request.method,

        case lists:member(HttpReq, Allow) of
            true ->
                {_,ReqTgt} = (A#arg.req)#http_request.path,
                PassThru = [".css",".jpg",".jpeg",".png",".js"],
                %% ... much more ...
            false ->
                is_limited(ID),
                throw(http_method_denied)
        end
    catch
        throw:blacklist -> %% Send back a 429;
        throw:limit -> %% Same but no Retry-After;
        throw:http_method_denied ->
        %% The only throw I have actually observed
            AllowedReq = string:join([atom_to_list(M) || M <- Allow], ","),
            A#arg{state=#rewrite_response{status=405,
                        headers=[{header, {"Allow", AllowedReq}},{header, {connection, "close"}}]
            }};
        Type:Reason -> {error, {unhandled,{Type, Reason}}}
    end.

I can spam curl -I -X HEAD <<any page>> as fast as I can in a bash shell and all I get is HTTP 200. The ETS table has zero entries as well. Using PUT I get an HTTP 405 as intended. I can call ratelimiter:lockout({MY_IP}) and still get the web page to load in my browser and an HTTP 200 with curl.

I'm confused. Is it the way I started YAWS?

start() ->
    os:putenv("YAWSHOME", ?HOMEPATH_YAWS),
    code:add_patha(?MODPATH_YAWS),

    ok = case (R = application:start(yaws)) of
        {error, {already_started, _}} -> ok;
        _ -> R
    end,

    {ok,self()}. %% Tell supervisor everything okay in a manner it expects.

I did this because I thought it would be "easier."


Solution

  • When starting Yaws as part of another application, it's important to use its embedding support. One important thing the Yaws embedding startup code does is set the application environment variable embedded to true:

    application:set_env(yaws, embedded, true),
    

    Yaws checks this variable in several of its code paths, especially during initialization, in order to avoid assuming that it's running as a stand-alone daemon process.
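
    As a rough sketch of what that could look like, the start function from the question might hand its configuration to yaws:start_embedded/4, which takes care of setting embedded before it starts the application. The module name yaws_init and the docroot, port, server name, and id are placeholder assumptions, and passing arg_rewrite_mod in the server configuration list assumes arg_preproc is the rewrite module you want installed:

    ```erlang
    -module(yaws_init).               % hypothetical module name
    -export([start/0]).

    %% Sketch only: Docroot, port, servername and Id are placeholder values.
    start() ->
        Id = "embedded",
        Docroot = "/var/www",
        GconfList = [{id, Id}],
        SconfList = [{port, 8080},
                     {servername, "localhost"},
                     {listen, {0,0,0,0}},
                     {docroot, Docroot},
                     {arg_rewrite_mod, arg_preproc}],
        %% start_embedded/4 sets the embedded flag, starts the yaws
        %% application, and then applies this configuration.
        ok = yaws:start_embedded(Docroot, SconfList, GconfList, Id),
        {ok, self()}.
    ```

    If you want the Yaws processes to live under your own supervisor instead, yaws_api:embedded_start_conf/4 returns child specs you can start yourself, followed by a call to yaws_api:setconf/2.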

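    Regarding rate limiting, rather than using an arg rewriter, you might consider using a shaper. The yaws_shaper module provides a behavior that expects its callback module to implement two functions: check/1, which Yaws calls with the client's IP address so the shaper can allow or deny the request, and update/3, which Yaws calls after replying so the shaper can record the client's IP address, hit count, and number of bytes delivered.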

    A shaper can use this framework to track how many requests each client is making and how much data Yaws is delivering to each client, and use that information to limit or deny particular clients.
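
    A minimal sketch of such a callback module follows, reusing the ratelimiter functions from the question. It assumes the callbacks are check/1 and update/3, that check/1 returns either allow or {deny, Status, Msg}, and that ratelimiter:blacklist/1 and ratelimiter:limit/1 return booleans as they do in the question; the module name limiter_shaper is made up, so please verify the details against the yaws_shaper source for your Yaws version:

    ```erlang
    -module(limiter_shaper).          % hypothetical module name
    -behaviour(yaws_shaper).
    -export([check/1, update/3]).

    %% Called by Yaws before it serves a request from Ip.
    %% Assumes both ratelimiter functions return true | false.
    check(Ip) ->
        case ratelimiter:blacklist(Ip) orelse ratelimiter:limit(Ip) of
            true  -> {deny, 429, "Too Many Requests"};
            false -> allow
        end.

    %% Called by Yaws after it has replied to Ip.
    %% This sketch tracks nothing extra, since ratelimiter already counts hits.
    update(_Ip, _Hits, _Bytes) ->
        ok.
    ```

    The module would then be named in the virtual server's shaper setting, either in yaws.conf or as {shaper, limiter_shaper} in an embedded server configuration list.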

    And finally, while "printf debugging" works, it's less than ideal, especially in Erlang, which has built-in tracing. You should consider learning the dbg module so you can trace any function you want, see who called it, see what arguments are being passed to it, see what it returns, etc.
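
    For instance, a short tracing session for the functions in the question might look like the following. It is meant to be typed into the Erlang shell of the node that runs Yaws, and the traced module and function names are simply the ones from the question:

    ```erlang
    %% Start the default tracer, which prints trace messages to the shell.
    dbg:tracer(),
    %% Trace function calls made by all processes.
    dbg:p(all, c),
    %% cx = show the caller plus return values and exceptions.
    dbg:tpl(ratelimiter, limit, cx),
    dbg:tpl(arg_preproc, arg_rewrite, cx).

    %% ... send a few requests with curl and watch the output, then:
    dbg:stop().
    ```

    If arg_rewrite/1 never shows up in the trace while requests are being served, that points at configuration (the rewrite module is not installed) rather than at the limiter logic.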