amazon-simpledb

Implementing fraud-proof voting system using SimpleDB


I'd like to implement a voting system with all data stored in Amazon SimpleDB. The frontend is also running on Amazon EC2. My main concern is cost of the service.

The voting system needs to be fraud-proof. That is, each visitor is only allowed one vote per election, and there are no user accounts. I have considered storing a SimpleDB item for each vote along with the IP address it came from. Then I would only insert a new vote if there is no entry for that IP/election pair.

However, this sounds very expensive in terms of storage and processing. I would have to retrieve all votes and tally them up to display an up-to-date result.

The other option I have considered is storing the votes and only occasionally summing them up and caching the result. But this still requires me to store all of the votes.

What is your recommendation?


Solution

  • Let's address your main concern first: cost. Let's assume you have 1,000,000 votes in your system. Each vote is represented by one SimpleDB item, and each item contains three attributes: a timestamp, the actual vote, and a unique value identifying the user (I'll get to that part later).

    Now, the SimpleDB overview page gives us a simple way of calculating the billable storage size of an item:

    Amazon SimpleDB measures the size of your billable data by adding the raw byte size of the data you upload + 45 bytes of overhead for each item, attribute name and attribute-value pair.
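The billing rule quoted above can be turned into a rough per-item estimate. This is only a sketch: the item name and attribute values are hypothetical, and it assumes the attribute-name overhead is shared across items (counted once per unique name), so it is left out of the per-item figure.

```python
OVERHEAD_BYTES = 45  # per item, attribute name and attribute-value pair (from the quote)


def item_size_bytes(item_name, attributes):
    """Rough billable size of one item.

    Counts the raw bytes of the item name and the attribute values, plus
    45 bytes of overhead for the item itself and for each attribute-value
    pair. Attribute names are NOT counted here (assumption: they are
    shared by all items, so their cost does not grow with the vote count).
    """
    raw = len(item_name.encode("utf-8"))
    raw += sum(len(str(value).encode("utf-8")) for value in attributes.values())
    return raw + OVERHEAD_BYTES * (1 + len(attributes))


# Hypothetical vote item with the three attributes described above.
sample_size = item_size_bytes(
    "election42:3f2a9c",
    {
        "timestamp": "2010-05-01T12:00:00Z",
        "vote": "candidate-a",
        "voter": "3f2a9c",
    },
)
print(sample_size)
```

Under these assumptions, most of the size of a small vote item is the fixed 45-byte overhead rather than the data itself.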

    Let's calculate with a bit of headroom in case you decide to include more data, and say that each item/vote will cost you 300 bytes of storage. The total storage size for your data will then be 1,000,000 × 300 bytes ≈ 286 MB, well within the free tier limit. Then there's the cost of inserting your items, but that will probably be negligible. There's also a cost associated with tallying votes, but as you've already suggested, caching can alleviate this significantly.
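As a rough illustration of the caching idea, here is a minimal sketch of a tally that is recomputed at most once per time window. The `fetch_votes` callable is hypothetical: it stands in for whatever query you use to read the vote values from SimpleDB, and the 60-second window is an arbitrary choice.

```python
import time
from collections import Counter


class CachedTally:
    """Serve a cached vote tally and recompute it at most once per TTL."""

    def __init__(self, fetch_votes, ttl_seconds=60.0, clock=time.monotonic):
        # fetch_votes: hypothetical callable returning an iterable of vote values
        self._fetch_votes = fetch_votes
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._cached is None or now >= self._expires_at:
            # Cache miss or expired: this is the only place the store is hit.
            self._cached = Counter(self._fetch_votes())
            self._expires_at = now + self._ttl
        return self._cached
```

With this shape, the number of tally queries depends on the cache window rather than on the number of visitors viewing the results.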

    I threw these numbers into the excellent Amazon Simple Monthly Calculator service to get an approximate figure and got ~$4/month for 1 GB of storage, 1M puts, 250k gets and 100k selects. In my experience it's very hard to estimate usage beforehand, so you have to keep an eye on your usage as you go along. The usage reports provided by Amazon contain detailed information about requests, and you can use them to look at the effects of simulated real-world usage of your app.

    Fraud proof

    Now, as for the fraud-proof part. It's a bit hard for me to assess the level of fraud prevention you're looking for, but in any case you're simply not going to have a fraud-proof voting system without user accounts. Even if you have accounts, you have to be extremely careful to avoid XSS and CSRF so that malicious users can't exploit other users and their votes.

    Limiting votes per IP has a number of problems.

    There's even a possibility of users having a different IP address on each request(!)

    Speaking of sticky sessions, we were surprised to find that there are those rare few users whose IP addresses will change radically from request to request.

    https://blog.stackoverflow.com/2009/07/

    If you're truly serious about creating a fool-proof online voting system, you'll have to look into user accounts with real-world identity verification of some sort (e.g. sending verification codes by post to the user's registered address).

    And last, but not least: regardless of the sophistication of your fraud prevention mechanism, you have to perform regular audits of your data to detect unexpected fraud scenarios early on.
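One possible shape for such an audit, purely as a sketch: bucket the stored vote timestamps by hour and flag hours whose vote count is far above the typical hour. The function name, the one-hour bucket and the factor of 5 are all assumptions for illustration, not a recommended threshold.

```python
from collections import Counter
from statistics import median


def flag_bursts(timestamps, bucket_seconds=3600, factor=5.0):
    """Return start times of buckets with unusually many votes.

    timestamps: iterable of vote times in seconds (e.g. Unix time).
    A bucket is flagged when its count exceeds `factor` times the
    median count over all non-empty buckets.
    """
    buckets = Counter(
        int(ts // bucket_seconds) * bucket_seconds for ts in timestamps
    )
    if not buckets:
        return []
    typical = median(buckets.values())
    return sorted(start for start, n in buckets.items() if n > factor * typical)
```

A flagged bucket is only a prompt to look closer at those votes; a legitimate spike (for example, after a link is shared) would trip the same check.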