My sole method of debugging (io:format/2) is not working in YAWS. I'm at a loss. My supervisor starts three processes: ETS Manager, YAWS Init, and Ratelimiter. This is successful. I can play around with the rate limiter in the shell, calling the same functions YAWS should be calling. The difference is that the shell behaves as I would expect, whereas I have no idea what is happening in YAWS.
I do know that if I spam the command ratelimiter:limit(IP) in the shell, it will eventually return true. I can also execute ratelimiter:lockout(IP) and ratelimiter:blacklist(IP), and they return true as well. The limiter is a gen_server.
The functions do the following:
limit/1: Check the ETS table to see whether counter > threshold; update the counter. If counter > blacklist threshold, make an entry in the mnesia table.
blacklist/1: Check the mnesia table to see whether an entry exists; if yes, reset the timer.
lockout/1: Immediately enter the ID into the mnesia table.
In my arg_rewrite_mod module I'm doing some checks to ensure I'm getting the HTTP requests I expect, namely GET, POST, and HEAD. I thought this would be a nice place to also do the rate limiting, so that it happens as early as possible in the web server's chain of events.
All the changes I've made to the arg_rewrite module seem to work, except for the "printf"s and the limiter. I'm new to the language, so I'm not sure whether my mistake is obvious or not.
Skeleton of my arg_rewrite_mod:
    -module(arg_preproc).
    -export([arg_rewrite/1]).

    -include("limiter_def.hrl").
    -include_lib("/usr/lib/yaws/include/yaws_api.hrl").

    is_blacklisted(ID) ->
        case ratelimiter:blacklist(ID) of
            false -> continue;
            true -> throw(blacklist)
        end.

    is_limited(ID) ->
        case ratelimiter:limit(ID) of
            false -> continue;
            true -> throw(limit)
        end.

    arg_rewrite(A) ->
        Allow = ['GET','POST', 'HEAD'],
        try
            {IP, _} = A#arg.client_ip_port,
            ID = IP,
            is_blacklisted(ID),
            io:format("~p ~p ~n",[ID, is_blacklisted(ID)]),
            %% === Allow expected HTTP requests
            HttpReq = (A#arg.req)#http_request.method,
            case lists:member(HttpReq, Allow) of
                true ->
                    {_,ReqTgt} = (A#arg.req)#http_request.path,
                    PassThru = [".css",".jpg",".jpeg",".png",".js"],
                    %% ... much more ...
                false ->
                    is_limited(ID),
                    throw(http_method_denied)
            end
        catch
            throw:blacklist -> %% Send back a 429;
            throw:limit -> %% Same but no Retry-After;
            throw:http_method_denied ->
                %%Only thrown experienced
                AllowedReq = string:join([atom_to_list(M) || M <- Allow], ","),
                A#arg{state=#rewrite_response{status=405,
                    headers=[{header, {"Allow", AllowedReq}},{header, {connection, "close"}}]
                }};
            Type:Reason -> {error, {unhandled,{Type, Reason}}}
        end.
I can spam curl -I -X HEAD <<any page>> as fast as I can in a bash shell and all I get is HTTP 200. The ETS table has zero entries as well. Using PUT I get an HTTP 405 as intended. I can call ratelimiter:lockout({MY_IP}) and still get the web page to load in my browser and an HTTP 200 with curl.
I'm confused. Is it the way I started YAWS?
    start() ->
        os:putenv("YAWSHOME", ?HOMEPATH_YAWS),
        code:add_patha(?MODPATH_YAWS),
        ok = case (R = application:start(yaws)) of
            {error, {already_started, _}} -> ok;
            _ -> R
        end,
        {ok,self()}. %% Tell supervisor everything okay in a manner it expects.
I did this because I thought it would be "easier."
When starting Yaws as part of another application, it's important to use its embedding support. One important thing the Yaws embedding startup code does is set the application environment variable embedded to true:
    application:set_env(yaws, embedded, true),
Yaws checks this variable in several of its code paths, especially during initialization, in order to avoid assuming that it's running as a stand-alone daemon process.
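As a rough illustration of what an embedded start could look like, here is a minimal sketch built on yaws:start_embedded/4, which loads and starts the yaws application in embedded mode and applies the given configuration. The module name yaws_embed_sketch, the docroot, the port, the server name, and the id are all placeholders rather than values from the question, and passing arg_rewrite_mod in the server configuration list is an assumption about how the rewriter would be wired in; check it against the Yaws version in use.

```erlang
-module(yaws_embed_sketch).
-export([start/0]).

%% Hypothetical values throughout: adjust the id, docroot, port and
%% server name to match the real deployment.
start() ->
    Id = "embedded",
    DocRoot = "/var/www",
    GconfList = [{id, Id}],
    SconfList = [{port, 8080},
                 {servername, "localhost"},
                 {listen, {0,0,0,0}},
                 {docroot, DocRoot},
                 %% Assumption: the rewriter from the question is
                 %% configured per server through this option.
                 {arg_rewrite_mod, arg_preproc}],
    %% start_embedded/4 sets the embedded flag, starts the yaws
    %% application and installs the configuration built above.
    ok = yaws:start_embedded(DocRoot, SconfList, GconfList, Id),
    ok.
```

If Yaws needs to live under your own supervisor instead, yaws_api:embedded_start_conf/4 returns child specifications that can be started there, followed by a call to yaws_api:setconf/2.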
Regarding rate limiting, rather than using an arg rewriter, you might consider using a shaper. The yaws_shaper module provides a behavior that expects its callback module to implement two functions:
check/1: yaws_shaper calls this to allow the callback module to decide whether to allow the request from the client. It passes client host information as the callback argument. Your shaper callback module returns either the atom allow to allow the request to proceed, or the tuple {deny, Status, Message}, where Status is an HTTP status code to return to the client, such as 429 to indicate the client is making too many requests, and Message is any extra HTML to be returned to the client. (It might be nice if Message could include a reply header such as Retry-After as well; this is something I'll consider adding to Yaws.)
update/3: yaws_shaper calls this when the response for a client is ready to be returned. The first argument is the client host information, the second argument is the number of "hits" (the value 1 for each request), and the third argument is the number of bytes being delivered in response to the client's request. Your shaper callback module can return ok from update/3 (Yaws does not use the return value).
A shaper can use this framework to track how many requests each client is making and how much data Yaws is delivering to each client, and use that information to limit or deny particular clients.
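To make the two callbacks concrete, here is a minimal sketch of a shaper callback module that counts hits per client in an ETS table. The module name my_shaper, the table name, the MAX_HITS limit, and the init/0 helper are all hypothetical; the sketch also never expires its counters, which a real limiter would need to do, and it treats the client host information as an opaque key. The -behaviour(yaws_shaper) attribute is left out so the module compiles without Yaws on the code path.

```erlang
-module(my_shaper).
-export([init/0, check/1, update/3]).

-define(TAB, my_shaper_hits).   %% hypothetical table name
-define(MAX_HITS, 3).           %% hypothetical per-client limit

%% Create the counter table. Call this from a long-lived process,
%% because the table disappears when its owner exits.
init() ->
    ?TAB = ets:new(?TAB, [named_table, public, set]),
    ok.

%% Decide whether the client may proceed.
check(Client) ->
    case ets:lookup(?TAB, Client) of
        [{Client, Hits}] when Hits >= ?MAX_HITS ->
            {deny, 429, "Too many requests"};
        _ ->
            allow
    end.

%% Record the hits for a finished response; the byte count is
%% ignored in this sketch.
update(Client, Hits, _Bytes) ->
    _ = ets:update_counter(?TAB, Client, {2, Hits}, {Client, 0}),
    ok.
```

With this in place, a client is allowed until three responses have been recorded for it, after which check/1 returns the deny tuple with status 429.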
And finally, while "printf debugging" works, it's less than ideal, especially in Erlang, which has built-in tracing. You should consider learning the dbg module so you can trace any function you want, see who called it, see what arguments are being passed to it, see what it returns, etc.
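As a starting point, a tracing session for the functions in the question might look like the following, typed into the Erlang shell of the node running Yaws. The traced module and function names are taken from the question; everything else is standard dbg usage, shown as a sketch rather than a complete recipe.

```erlang
%% Start the default tracer, which prints trace messages to the shell.
dbg:tracer().

%% Enable call tracing for all processes.
dbg:p(all, c).

%% Trace ratelimiter:limit and arg_preproc:arg_rewrite (any arity),
%% including local calls. The built-in alias x also reports return
%% values and exceptions; cx additionally reports the caller.
dbg:tpl(ratelimiter, limit, x).
dbg:tpl(arg_preproc, arg_rewrite, cx).

%% ... send a few requests with curl and watch the output ...

%% Stop tracing when finished.
dbg:stop().
```

If no trace output appears for arg_preproc:arg_rewrite while requests are coming in, that in itself suggests the rewrite module is not being invoked by the running Yaws instance.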