I am developing a user-generated content site. The goal is that users are rewarded if their content is viewed by a certain number of people. Whereas a user account is required to post content, an account is not required to view content.
I am currently developing the algorithm to count the number of valid views, and I am concerned about the possibility that users create bots to falsely increase their number of views. I would exclude views from the content generator’s IP, but I do not want to exclude valid views from other users with the same external IP address. The same external IP address could in fact account for a large amount of valid views in a college campus or corporate setting.
The site is implemented in python, and hosted on apache servers. The question is more theoretical in nature, as how can I establish whether or not traffic from the same IP is legitimate or not. I can’t find any content management systems that do this, and was just going to implement it myself.
You cannot reliably do this. Any method you create can be automated.
That said, you can raise the bar. For instance every page viewed can have a random number encoded into a piece of JavaScript that will submit an AJAX request. Any view where you have that corresponding AJAX request is probably a real browser, and is likely to be a real human since few bots handle JavaScript correctly. But absolutely nothing stops someone from having an automatic script to drive a real browser.