algorithm statistics ranking android-1.5-cupcake

Cake Comparison Algorithm

This is literally about comparing cakes. My friend is having a cupcake party with the goal of determining the best cupcakery in Manhattan. Actually, it's much more ambitious than that. Read on.

There are 27 bakeries, and 19 people attending (with maybe one or two no-shows). There will be 4 cupcakes from each bakery, if possible including the staples -- vanilla, chocolate, and red velvet -- and rounding out the 4 with wildcard flavors. There are 4 attributes on which to rate the cupcakes: flavor, moistness, presentation (prettiness), and general goodness. People will provide ratings on a 5-point scale for each attribute for each cupcake they sample. Finally, each cupcake can be cut into 4 or 5 pieces.
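To get a feel for how thin the data will be, here is a back-of-the-envelope sketch in Python. The counts come from the paragraph above. The assumptions are mine: all 19 people show up, every cupcake is cut the same way, and each piece is eaten and rated by exactly one person.

```python
# Figures taken from the post; everything else is derived arithmetic.
BAKERIES = 27
CUPCAKES_PER_BAKERY = 4
TASTERS = 19  # assumes no no-shows

total_cupcakes = BAKERIES * CUPCAKES_PER_BAKERY


def budget(pieces_per_cupcake: int) -> dict:
    """Tasting budget if every cupcake is cut into the given number of pieces."""
    total_pieces = total_cupcakes * pieces_per_cupcake
    return {
        "total_pieces": total_pieces,
        # Average number of pieces each taster must eat to use them all up.
        "pieces_per_taster": total_pieces / TASTERS,
        # One taster per piece caps the ratings per cupcake, per attribute.
        "max_ratings_per_cupcake": pieces_per_cupcake,
        # Upper bound on ratings per bakery, per attribute (4 cupcakes).
        "max_ratings_per_bakery": CUPCAKES_PER_BAKERY * pieces_per_cupcake,
    }


for pieces in (4, 5):
    print(pieces, budget(pieces))
```

So under those assumptions each per-flavor ranking rests on at most 4 or 5 ratings per bakery, and each per-attribute ranking on at most 16 or 20.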

The question is: what is a procedure for coming up with a statistically meaningful ranking of the bakeries for each attribute, and for each flavor (treating "wildcard" as a flavor)? Specifically, we want to rank the bakeries 8 times: for each of the 4 flavors we want to rank the bakeries by goodness (goodness being one of the attributes), and for each of the 4 attributes we want to rank the bakeries across all flavors (i.e., aggregating over all flavors). The grand prize goes to the top-ranked bakery for the goodness attribute.
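To spell out which 8 rankings are meant, here is a small Python sketch that just enumerates them. The names `FLAVORS`, `ATTRIBUTES`, `rankings`, and `grand_prize` are only for illustration. Note that "flavor" appears both as a rated attribute and as a kind of cupcake, as in the description above.

```python
FLAVORS = ["vanilla", "chocolate", "red velvet", "wildcard"]
ATTRIBUTES = ["flavor", "moistness", "presentation", "goodness"]

# Each ranking is a (flavor filter, attribute) pair.
# A flavor filter of None means "aggregate over all flavors".
rankings = [(f, "goodness") for f in FLAVORS] + [(None, a) for a in ATTRIBUTES]

for flavor, attribute in rankings:
    scope = flavor if flavor is not None else "all flavors"
    print(f"rank bakeries by {attribute} over {scope}")

# The grand prize is the goodness ranking aggregated over all flavors.
grand_prize = (None, "goodness")
```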

Bonus points for generalizing this, of course.

This is happening in about 12 hours, so if no one answers in the meantime, I'll post what we ended up doing as an answer.

PS: Here's the post-party blog post about it: http://gracenotesnyc.com/2009/08/05/gracenotes-nycs-cupcake-cagematch-the-sweetest-battle-ever/

Solution

Here's what we ended up doing. I made a huge table to collect everyone's ratings at http://etherpad.com/sugarorgy (Revision 25, in case it gets vandalized now that I'm adding this public link to it) and then used the following Perl script to parse the data into a CSV file:

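#!/usr/bin/env perl
# Grabs the cupcake data from etherpad and parses it into a CSV file.

use strict;
use warnings;
use LWP::Simple qw(get);

my $content = get("http://etherpad.com/ep/pad/export/sugarorgy/latest?format=txt");
die "Could not fetch the pad\n" unless defined $content;

# Keep only the text between the BEGIN_MAGIC and END_MAGIC markers.
$content =~ s/^.*BEGIN_MAGIC\s*//s;
$content =~ s/END_MAGIC.*$//s;

my $bakery = "none";
for my $line (split(/\n/, $content)) {
  next if $line =~ /sar kri and deb/;
  if ($line =~ s/bakery\s+(\w+)//) { $bakery = $1; } # remember the current bakery.
  $line =~ s/\([^\)]*\)//g;       # strip out stuff in parens.
  $line =~ s/^\s+(\w)(\w)/$1 $2/; # split the leading two-character code into two fields.
  $line =~ s/\-/\-1/g;            # a dash becomes -1 (filtered out later).
  $line =~ s/^\s+//;
  $line =~ s/\s+$//;
  $line =~ s/\s+/\,/g;            # whitespace-separated fields become CSV.
  print "$bakery,$line\n";
}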
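
To show what each substitution in the loop above does, here is a rough Python sketch of the same steps run on a made-up input. The sample text, the bakery name "alpha", and the function name `parse_lines` are hypothetical; the real pad layout may differ from what I assume here. It mirrors the regexes rather than replacing the Perl, and ignores edge cases such as trailing blank lines.

```python
import re


def parse_lines(text: str) -> list[str]:
    """Mirror the Perl loop: one CSV row per input line, prefixed by the bakery."""
    rows = []
    bakery = "none"
    for line in text.split("\n"):
        if re.search(r"sar kri and deb", line):
            continue
        m = re.search(r"bakery\s+(\w+)", line)
        if m:
            bakery = m.group(1)  # remember the current bakery
            line = line[:m.start()] + line[m.end():]  # and cut it from the line
        line = re.sub(r"\([^)]*\)", "", line)           # strip stuff in parens
        line = re.sub(r"^\s+(\w)(\w)", r"\1 \2", line)  # "  Vg" -> "V g"
        line = line.replace("-", "-1")                  # dash -> -1
        line = line.strip()
        line = re.sub(r"\s+", ",", line)                # whitespace -> commas
        rows.append(f"{bakery},{line}")
    return rows


# Hypothetical input: a bakery header, then indented rating lines.
sample = "bakery alpha\n  Vg 4 5 - 3 (dry)\n  Cg 3 - 4 4"
rows = parse_lines(sample)
for row in rows:
    print(row)
```

Note that the bakery header line itself also produces a (nearly empty) row, just as the Perl script prints a line for every input line it does not skip.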

Then I did the averaging and whatnot in Mathematica:

data = Import["!~/svn/sugar.pl", "CSV"];

(* return a bakery's list of ratings for the given type of cupcake *)
tratings[bak_, t_] := Select[Drop[First@Select[data, 
                        #[[1]]==bak && #[[2]]==t && #[[3]]=="g" &], 3], #!=-1&]

(* return a bakery's list of ratings for the given cupcake attribute *)
aratings[bak_, a_] := Select[Flatten[Drop[#,3]& /@ 
                        Select[data, #[[1]]==bak && #[[3]]==a&]], #!=-1&]

(* overall rating for a bakery *)
oratings[bak_] := Join @@ (tratings[bak, #] & /@ {"V", "C", "R", "W"})

bakeries = Union@data[[All, 1]]

SortBy[{#, oratings@#, Round[Mean@oratings[#], .01]}& /@ bakeries, -#[[3]]&]
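
For anyone without Mathematica, here is a rough Python sketch of the same averaging. The rows in `data` are made up (hypothetical bakery names and ratings) and only follow the CSV layout the script produces: bakery, flavor code, attribute code, then ratings, with -1 for a missing rating. One deliberate difference: `tratings` returns an empty list when a row is missing, where the Mathematica version would complain.

```python
from statistics import mean

# Hypothetical rows: bakery, flavor code, attribute code, ratings...
data = [
    ["alpha", "V", "g", 4, 5, -1, 3],
    ["alpha", "C", "g", 3, -1, 4, 4],
    ["beta", "V", "g", 5, 5, 4, -1],
    ["beta", "C", "g", 2, 3, -1, -1],
]

FLAVOR_CODES = ["V", "C", "R", "W"]


def tratings(bak, t):
    """Goodness ratings for one bakery and flavor code, missing values dropped."""
    for row in data:
        if row[0] == bak and row[1] == t and row[2] == "g":
            return [r for r in row[3:] if r != -1]
    return []


def aratings(bak, a):
    """Ratings for one bakery and attribute code, pooled over all flavors."""
    return [r for row in data if row[0] == bak and row[2] == a
            for r in row[3:] if r != -1]


def oratings(bak):
    """Overall ratings: goodness ratings joined across the four flavor codes."""
    return [r for t in FLAVOR_CODES for r in tratings(bak, t)]


bakeries = sorted({row[0] for row in data})

# Sort bakeries by mean overall rating, best first.
ranking = sorted(
    ((b, oratings(b), round(mean(oratings(b)), 2)) for b in bakeries),
    key=lambda entry: -entry[2],
)
for entry in ranking:
    print(entry)
```

This is a plain mean per bakery, as in the Mathematica code; `mean` raises an error for a bakery with no ratings at all, so real data would need a guard for that case.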

The results are at the bottom of http://etherpad.com/sugarorgy.