google-cloud-logging

Something like GROUP BY with Logs Explorer


I am trying to find out details about suspicious traffic on my website which is running on Google Cloud (Google App Engine with Java, to be more specific). One idea is to analyze which IP addresses are sending requests very often. In SQL I would do something like

SELECT 
  protoPayload.ip,
  COUNT(protoPayload.ip) AS `ip_occurrence` 
FROM 
  foo /* TODO replace foo with correct table name */ 
WHERE 
  protoPayload.ip NOT LIKE '66.249.77.%' /* ignore Google bots */
GROUP BY 
  protoPayload.ip
ORDER BY 
  `ip_occurrence` DESC
LIMIT 100

But I have no idea how to do this with Logs Explorer. “Log Analytics” seems to allow such SQL, but apparently it is only supposed to be used on non-production projects.

I also tried to download the logs from Logs Explorer, but there is a limit of 10,000 logs, which is not enough at all.

Is there any easy way?

Looking at the bigger picture, I am trying to get my AdSense account reopened. So far I have failed. Maybe the proof I provided, my Google Analytics data, is not strong enough. The field description on the form mentions IP addresses. But in Google Analytics I don't see any IP addresses ...


Solution

  • Logs Explorer lets you write simple queries for filtering log entries, but it has no equivalent of GROUP BY.

    To achieve something similar, you can use a sink:

    Sinks control how Cloud Logging routes logs. Using sinks, you can route some or all of your logs to supported destinations. Some of the reasons that you might want to control how your logs are routed include the following:

    • To store logs that are unlikely to be read but that must be retained for compliance purposes.
    • To organize your logs in buckets in a format that is useful to you.
    • To use big-data analysis tools on your logs.
    • To stream your logs to other applications, other repositories, or third parties.

    Supported destinations are:

    • Cloud Storage: JSON files stored in Cloud Storage buckets.
    • Pub/Sub: JSON messages delivered to Pub/Sub topics. Supports third-party integrations, such as Splunk, with Logging.
    • BigQuery: Tables created in BigQuery datasets.
    • Another Cloud Logging bucket: Log entries held in Cloud Logging log buckets.

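    For your scenario, a BigQuery sink would be the best fit: once the logs land in BigQuery tables, you can run SQL with GROUP BY on them, much like the query in your question.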

    The documentation has a step-by-step guide on how to create a sink.
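
    If you would rather not set up BigQuery, the same per-IP counting can be done on logs routed to one of the other destinations. The following is only a rough sketch, not an official method. It assumes the exported files contain one JSON log entry per line and that each App Engine request entry carries the client address in protoPayload.ip, as in your query. The function name top_ips and its parameters are made up for illustration.

```python
import json
from collections import Counter


def top_ips(lines, ignore_prefixes=("66.249.77.",), limit=100):
    """Count requests per IP, mirroring GROUP BY / ORDER BY ... DESC / LIMIT.

    `lines` is any iterable of strings, each holding one JSON log entry
    (assumed layout: {"protoPayload": {"ip": "..."}, ...}).
    `ignore_prefixes` must be a tuple of IP prefixes to skip.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        ip = (entry.get("protoPayload") or {}).get("ip")
        # Skip entries without an IP and IPs matching an ignored prefix
        # (the equivalent of NOT LIKE '66.249.77.%').
        if ip is None or ip.startswith(ignore_prefixes):
            continue
        counts[ip] += 1
    # most_common sorts by count, highest first, and applies the limit.
    return counts.most_common(limit)


if __name__ == "__main__":
    # Tiny in-memory sample; in practice, pass an open file object instead.
    sample = [
        '{"protoPayload": {"ip": "203.0.113.7"}}',
        '{"protoPayload": {"ip": "66.249.77.12"}}',
        '{"protoPayload": {"ip": "203.0.113.7"}}',
        '{"protoPayload": {"ip": "198.51.100.2"}}',
    ]
    for ip, n in top_ips(sample):
        print(ip, n)
```

    For a real export you would call it as top_ips(open("exported-log.json")), where the file name is a placeholder. For large volumes, BigQuery is still the more comfortable option.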
