python nlp n-gram

How do I get the occurrence of a sentence with Google Ngram Viewer and Python?


Short background: I am trying to improve Peter Norvig's spelling corrector in Python. For that I need the occurrence (relative frequency) of a short sentence of up to 3-4 words. Google's Ngram Viewer would help me a lot, but I don't know how to get the values through an API or some other way.

Pseudocode:

# Sentence that makes no sense, although every word is spelled correctly.
>> occurrence("were are you")
0.0000000978

# Sentence that makes sense
>> occurrence("where are you")
0.000148

# My method should then return the sentence with the highest value (but that's not the problem).

Sorry for my English :-D Thank you!


Solution

  • Google Ngram Viewer actually has an undocumented JSON API.

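    import requests
    
    term = "where are you"
    # Pass the query as params so requests URL-encodes the spaces in the phrase.
    params = {"content": term, "year_start": 1800, "year_end": 2000,
              "corpus": 26, "smoothing": 3}
    resp = requests.get("https://books.google.com/ngrams/json", params=params)
    resp.raise_for_status()  # fail loudly instead of leaving results undefined
    results = resp.json()    # a list with one dict per matched n-gram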
    

    results[0]['timeseries'] has the frequencies you need, as a list of relative frequencies with one value per year in the requested range:

    [2.854326695000964e-07,
     3.4926038665616944e-07,
     3.3916604043800663e-07,
     ...]
    

    Source: https://jameshfisher.com/2018/11/25/google-ngram-api/
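
    To get from that time series to the single number the question asks for, one option is to collapse the yearly values into one score and compare candidates. The sketch below is only an illustration under a few assumptions: averaging over the years is my choice (taking the last year or the maximum would also be reasonable), the names fetch_ngram, mean_frequency, occurrence and best_candidate are made up for this example, and a missing or empty result is treated as frequency 0.0. It uses the standard library instead of requests so it runs without extra installs. Since the endpoint is undocumented, its parameters and response shape may change.

```python
import json
from statistics import fmean
from urllib.parse import urlencode
from urllib.request import urlopen

# Undocumented endpoint taken from the answer above; it may change without notice.
NGRAM_URL = "https://books.google.com/ngrams/json"


def fetch_ngram(phrase, year_start=1800, year_end=2000, corpus=26, smoothing=3):
    """Query the endpoint and return the decoded JSON (a list of result dicts)."""
    query = urlencode({
        "content": phrase,
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,
        "smoothing": smoothing,
    })
    with urlopen(f"{NGRAM_URL}?{query}", timeout=10) as resp:
        return json.load(resp)


def mean_frequency(results):
    """Average the first result's yearly frequencies (assumption: mean as score)."""
    if not results:
        return 0.0  # assumption: no result counts as frequency zero
    series = results[0].get("timeseries", [])
    return fmean(series) if series else 0.0


def occurrence(phrase, fetch=fetch_ngram):
    """Single score for a phrase; `fetch` is injectable so it can be stubbed."""
    return mean_frequency(fetch(phrase))


def best_candidate(candidates, fetch=fetch_ngram):
    """Return the candidate phrase with the highest score."""
    return max(candidates, key=lambda phrase: occurrence(phrase, fetch))
```

    Usage would look like best_candidate(["were are you", "where are you"]), which sends one request per candidate. For a spelling corrector that scores many candidates, caching the results (for example with functools.lru_cache around occurrence) is probably worth adding.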