google-bigquerygoogle-patent-search

Acquire patents' forward citation data from BigQuery by application


I want to collect data by application_number like this. The real application number is CN 201510747352.

SELECT c.application_number AS Pub, COUNT(p.publication_number) AS CitedBy 
     FROM `patents-public-data.patents.publications` AS p, UNNEST(citation) AS c 
     WHERE c.application_number IN ('CN-201510747352-A') 
     GROUP BY c.application_number

But it can't work. The url is the patent page. Who can do me a favor? https://patents.google.com/patent/CN105233911B/zh?oq=CN201510747352.8


Solution

  • My guess is that patent can be cited after it's status is Application - so instead of using initial number CN-201510747352 - you should use app/pub number when status is Application - Also you need to apply not only distinct counting but also excluding counting same app with -A or -B or etc. suffix - that is why you will see use of the regex_extract function

    #standardSQL
    SELECT 
      c.publication_number AS Pub, 
      COUNT(DISTINCT REGEXP_EXTRACT(p.publication_number, r'(.+-.+)-')) AS CitedByCount
    FROM `patents-public-data.patents.publications` AS p, 
    UNNEST(citation) AS c 
    WHERE c.publication_number LIKE ('CN-105233911%') 
    GROUP BY c.publication_number  
    

    with result

    Row Pub             CitedBy  
    1   CN-105233911-A  10   
    

    ... If I only have the application data, how can I realize it?

    #standardSQL
    SELECT 
      c.publication_number AS Pub, 
      COUNT(DISTINCT REGEXP_EXTRACT(p.publication_number, r'(.+-.+)-')) AS CitedByCount
    FROM `patents-public-data.patents.publications` AS p, 
    UNNEST(citation) AS c 
    WHERE c.publication_number IN (
      SELECT publication_number 
      FROM `patents-public-data.patents.publications`
      WHERE application_number IN ('CN-201510747352-A') 
    )
    GROUP BY c.publication_number