I'm in the middle of writing a commercial application that takes a list of URLs as input (in this case from Google Custom Search), processes the pages those URLs point to, and stores the processed information alongside the URLs.
I was wondering whether anyone knows if this breaks the rule in its TOS which states: "You may not in any way frame, cache or modify the Results produced by Google".
Source: http://www.google.com/cse/docs/tos.html
I would also be interested to know if anyone has any good search engine APIs to recommend.
You need to distinguish between Google Custom Search (CSE) and the Google Custom Search API.
CSE is the Google search functionality you can embed into your website. As far as I know, it may only be used from the client's web browser, and you must not modify, frame, or otherwise alter the results in any way.
Documentation for the Custom Search API can be found here:
https://developers.google.com/custom-search/v1/overview
Note that queries are limited to 100 per day.
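For illustration, a minimal sketch of calling the Custom Search JSON API (v1) with Python's standard library. It assumes the documented `https://www.googleapis.com/customsearch/v1` endpoint with the `key`, `cx`, `q` and `start` query parameters, and that result URLs arrive in the `link` field of each entry in `items`. `DailyQuota` and the other helper names are hypothetical; the counter is purely local, so it only avoids wasting requests and does not replace the quota Google enforces on its side.

```python
import json
import urllib.parse
import urllib.request
from datetime import date
from typing import List, Optional

# Endpoint of the Custom Search JSON API (v1).
ENDPOINT = "https://www.googleapis.com/customsearch/v1"
DAILY_LIMIT = 100  # the per-day query limit mentioned above


def build_search_url(api_key: str, engine_id: str, query: str, start: int = 1) -> str:
    """Return the request URL for one page of results (assumed parameter names)."""
    params = {"key": api_key, "cx": engine_id, "q": query, "start": start}
    return ENDPOINT + "?" + urllib.parse.urlencode(params)


def extract_links(response: dict) -> List[str]:
    """Pull the result URLs out of a decoded JSON response; no 'items' means no results."""
    return [item["link"] for item in response.get("items", []) if "link" in item]


class DailyQuota:
    """Hypothetical local guard: allows at most `limit` queries per calendar day."""

    def __init__(self, limit: int = DAILY_LIMIT):
        self.limit = limit
        self.day = date.today()
        self.used = 0

    def acquire(self, today: Optional[date] = None) -> bool:
        today = today or date.today()
        if today != self.day:  # new day: reset the local counter
            self.day, self.used = today, 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


def search(api_key: str, engine_id: str, query: str, quota: DailyQuota) -> List[str]:
    """Run one query against the live API (needs network access and valid credentials)."""
    if not quota.acquire():
        raise RuntimeError("daily query budget exhausted")
    with urllib.request.urlopen(build_search_url(api_key, engine_id, query)) as resp:
        return extract_links(json.load(resp))
```

With only 100 queries available, each call returns one page of results, so the budget translates directly into a small number of result pages per day.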
If you enable this API in the developer console, you will be presented with explicit terms of service for it, which are probably these:
https://developers.google.com/custom-search/terms
https://developers.google.com/terms/
Note that these include the following:
Prohibitions on Content
Unless expressly permitted by the content owner or by applicable law, you agree that you will not, and will not permit your end users to, do the following with content returned from the APIs:
Scrape, build databases or otherwise create permanent copies of such content, or keep cached copies longer than permitted by the cache header;
Copy, translate, modify, create a derivative work of, sell, lease, lend, convey, distribute, publicly display or sublicense to any third party;
Misrepresent the source or ownership; or
Remove, obscure, or alter any copyright, trademark or other proprietary rights notices, falsify or delete any author attributions, legal notices or other labels of the origin or source of material.
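The one form of retention these terms do allow is a cached copy kept no longer than the cache header permits. A minimal sketch of such a cache follows, assuming the relevant header is the standard HTTP `Cache-Control` header and its `max-age` directive; `HeaderBoundedCache` and `max_age_seconds` are hypothetical names. `no-store` and `no-cache` are conservatively treated as "do not keep a copy". This only bounds how long a copy lives; it does not make building a database permissible.

```python
import re
import time
from typing import Callable, Dict, Optional, Tuple

# Matches "max-age=N" (but not "s-maxage=N", which has no hyphen after "max").
_MAX_AGE = re.compile(r"max-age\s*=\s*(\d+)", re.IGNORECASE)


def max_age_seconds(cache_control: Optional[str]) -> int:
    """Return the max-age from a Cache-Control header, or 0 meaning 'do not cache'."""
    if not cache_control:
        return 0
    directives = cache_control.lower()
    if "no-store" in directives or "no-cache" in directives:
        return 0  # conservative: keep no copy at all
    match = _MAX_AGE.search(cache_control)
    return int(match.group(1)) if match else 0


class HeaderBoundedCache:
    """In-memory cache whose entries expire when the response's max-age runs out."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[str, Tuple[float, object]] = {}

    def put(self, key: str, value: object, cache_control: Optional[str]) -> None:
        ttl = max_age_seconds(cache_control)
        if ttl > 0:
            self._entries[key] = (self._clock() + ttl, value)
        else:
            self._entries.pop(key, None)  # header forbids caching: drop any old copy

    def get(self, key: str) -> Optional[object]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: discard rather than keep a stale copy
            return None
        return value
```

Injecting the clock keeps expiry testable without sleeping; in production the default `time.monotonic` is unaffected by wall-clock changes.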
Your use case sounds as if it falls into the "build databases" category.
Since you only get 100 requests a day and are not allowed to build a database from the results, I suspect the API will not meet your needs.