I am very curious to know how malware-detection techniques (like Google Safe Browsing) work. Googling has not helped my cause. I did find something called Cuckoo Sandbox that does this kind of analysis.
How exactly does malware detection for a website work? What might the algorithm look like? What algorithm does Google Safe Browsing use?
Is there any Python script available?
It is an interesting problem which is best served using multiple solutions.
Google probably keeps a list of malicious domains. Visit a domain: did it attempt to serve you an .exe without user interaction? Does the content look like gibberish? Apply other such quantifiers, and if they trip, mark the domain as malicious. Visit another domain: did it redirect you to one already on your malicious list? Mark it as untrusted. You can then apply machine learning/regression analysis to increase confidence and decrease false positives. You could go further and run a light scan for some domains and a deep scan for others (a deep scan may use something like Cuckoo, which takes far more resources). Is the domain name a sensible word, and does it match the WHOIS information? Or is it gibberish?
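As a toy illustration of those heuristics, here is a minimal sketch in Python. Everything in it is an assumption for demonstration: the seed blocklist, the score weights, and the entropy threshold for "gibberish" names are all made up, and a real system would combine far more signals.

```python
import math
from urllib.parse import urlparse

# Hypothetical seed list of known-bad domains (an assumption, not a real feed).
KNOWN_MALICIOUS = {"evil-downloads.example", "free-cr4ck5.example"}

def shannon_entropy(s: str) -> float:
    """Character entropy; high values often flag machine-generated names."""
    if not s:
        return 0.0
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def score_url(url: str, redirects_to=()) -> int:
    """Toy heuristic score: higher means more suspicious."""
    host = urlparse(url).hostname or ""
    score = 0
    if host in KNOWN_MALICIOUS:
        score += 10                      # direct hit on the blocklist
    if any(r in KNOWN_MALICIOUS for r in redirects_to):
        score += 5                       # redirects into a known-bad domain
    label = host.split(".")[0]
    if shannon_entropy(label) > 3.5:     # threshold is an arbitrary assumption
        score += 2                       # gibberish-looking name
    return score
```

A crawler could feed these scores, together with many other features, into the regression/ML stage mentioned above instead of thresholding them directly.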
Another approach is to keep a list of known exploits (their names and code signatures) for vulnerabilities in web browsers and common plugins, then check whether a website attempts to serve you an exploit you already know about. To generate the list, scan CVE or another open database, fetch the exploit payloads, hash them, and so on. This will not catch all of the crap, but it catches most of it.
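The signature-lookup idea above can be sketched in a few lines. The sample payload and the hash set here are placeholders: a real scanner would populate the database from CVE feeds or exploit archives, and would also use fuzzy matching or YARA-style rules, since an exact hash breaks the moment an attacker changes a single byte.

```python
import hashlib

# Hypothetical signature database: SHA-256 hashes of known exploit payloads.
# In practice this would be built from CVE/exploit-database feeds.
KNOWN_EXPLOIT_HASHES = {
    hashlib.sha256(b"<script>old_plugin_heap_spray()</script>").hexdigest(),
}

def is_known_exploit(served_content: bytes) -> bool:
    """Exact-match check: does the served payload hash to a known signature?"""
    return hashlib.sha256(served_content).hexdigest() in KNOWN_EXPLOIT_HASHES
```

You would run every script, object, and download a page serves through this check while crawling.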