I want to download a CSV file from an API endpoint with pandas. I am using the following code:
df = pd.read_csv('https://data.cityofnewyork.us/resource/nu7n-tubp.csv')
However, the resulting dataframe has only 1,000 rows, even though the dataset is much larger (around 121k rows). How can I download all the rows?
I tried passing a number larger than 1,000 to nrows, but I get the same result.
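Socrata APIs return data in pages, and the default page size is 1,000 rows. That is also why nrows has no effect: it only limits how many rows pandas parses from the response, and the server has already stopped at 1,000. You can raise the page size with the $limit parameter. According to the dataset page there are about 122k rows, so a limit of 130,000 gets them all: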
df=pd.read_csv('https://data.cityofnewyork.us/resource/nu7n-tubp.csv?$limit=130000')
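If you don't know the row count in advance, or don't want to fetch everything in one request, you can page through the data with $limit plus $offset instead of guessing a large limit. This is a minimal sketch, not an official client: read_socrata_paged and its fetch parameter are hypothetical names (fetch exists only so the function can be tested without a network), the 50,000 page size is an arbitrary choice, and ordering by the :id system field to keep pages stable is an assumption based on common SoQL paging practice.

```python
from typing import Callable
from urllib.parse import urlencode

import pandas as pd


def read_socrata_paged(
    base_url: str,
    page_size: int = 50_000,
    fetch: Callable[[str], pd.DataFrame] = pd.read_csv,
) -> pd.DataFrame:
    """Read all rows of a Socrata CSV endpoint by paging with $limit/$offset.

    Hypothetical helper; `fetch` defaults to pd.read_csv and takes a URL.
    """
    pages = []
    offset = 0
    while True:
        # Keep '$' and ':' unescaped so the query matches SoQL syntax.
        # Assumption: $order=:id gives a stable order so pages don't overlap.
        query = urlencode(
            {"$limit": page_size, "$offset": offset, "$order": ":id"},
            safe="$:",
        )
        page = fetch(f"{base_url}?{query}")
        pages.append(page)
        if len(page) < page_size:
            # A short (or empty) page means there are no more rows.
            break
        offset += page_size
    return pd.concat(pages, ignore_index=True)
```

Usage would be df = read_socrata_paged('https://data.cityofnewyork.us/resource/nu7n-tubp.csv'). Each request stays bounded in size, and the loop stops on its own, so the code keeps working as the dataset grows past whatever fixed limit you might have picked.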
You may also want to explore the SodaPy library.
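A rough sketch of the same download through sodapy follows, assuming the package is installed (pip install sodapy) and that the dataset is public so no app token is needed. read_with_sodapy is a hypothetical wrapper name; Socrata, get and close are sodapy's own client methods. The import sits inside the function so the rest of a script doesn't require sodapy.

```python
import pandas as pd


def read_with_sodapy(domain: str, dataset_id: str, limit: int) -> pd.DataFrame:
    """Fetch up to `limit` rows of a public Socrata dataset via sodapy.

    Hypothetical wrapper around sodapy's Socrata client.
    """
    # Optional third-party dependency, imported lazily.
    from sodapy import Socrata

    # None means no app token; unauthenticated requests may be throttled.
    client = Socrata(domain, None)
    try:
        # get() returns the JSON rows as a list of dicts.
        results = client.get(dataset_id, limit=limit)
    finally:
        client.close()
    return pd.DataFrame.from_records(results)
```

For this dataset that would be df = read_with_sodapy('data.cityofnewyork.us', 'nu7n-tubp', limit=130000). Because sodapy reads the JSON endpoint, values typically arrive as strings, so you may need to convert numeric and date columns yourself.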