I want to collect data by application_number
like this. The real application number is CN 201510747352
.
SELECT c.application_number AS Pub, COUNT(p.publication_number) AS CitedBy
FROM `patents-public-data.patents.publications` AS p, UNNEST(citation) AS c
WHERE c.application_number IN ('CN-201510747352-A')
GROUP BY c.application_number
But it can't work. The url is the patent page. Who can do me a favor? https://patents.google.com/patent/CN105233911B/zh?oq=CN201510747352.8
My guess is that patent can be cited after it's status is Application - so instead of using initial number CN-201510747352
- you should use app/pub number when status is Application - Also you need to apply not only distinct counting but also excluding counting same app with -A or -B or etc. suffix - that is why you will see use of the regex_extract function
#standardSQL
SELECT
c.publication_number AS Pub,
COUNT(DISTINCT REGEXP_EXTRACT(p.publication_number, r'(.+-.+)-')) AS CitedByCount
FROM `patents-public-data.patents.publications` AS p,
UNNEST(citation) AS c
WHERE c.publication_number LIKE ('CN-105233911%')
GROUP BY c.publication_number
with result
Row Pub CitedBy
1 CN-105233911-A 10
... If I only have the application data, how can I realize it?
#standardSQL
SELECT
c.publication_number AS Pub,
COUNT(DISTINCT REGEXP_EXTRACT(p.publication_number, r'(.+-.+)-')) AS CitedByCount
FROM `patents-public-data.patents.publications` AS p,
UNNEST(citation) AS c
WHERE c.publication_number IN (
SELECT publication_number
FROM `patents-public-data.patents.publications`
WHERE application_number IN ('CN-201510747352-A')
)
GROUP BY c.publication_number