Tags: python, dask, dask-delayed

Splitting very large CSV files into smaller files


Is Dask suitable for reading large CSV files in parallel and splitting them into multiple smaller files?


Solution

  • Yes, Dask can read large CSV files. It splits them into chunks (partitions) that are read in parallel:

    import dask.dataframe as dd

    df = dd.read_csv("/path/to/myfile.csv")


    Then, when saving, Dask writes the CSV data to multiple files, one per partition; the "*" in the path is replaced by the partition number:

    df.to_csv("/output/path/*.csv")


    See the read_csv and to_csv docstrings for much more information about this, and the end-to-end sketch below.
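
    For reference, here is a minimal end-to-end sketch combining the two steps.
    The paths, the 64MB blocksize, and the "part-*.csv" naming pattern are
    placeholder choices for illustration, not requirements:

    import dask.dataframe as dd

    # blocksize controls how the input is partitioned: one task
    # (and later one output file) per ~64MB chunk of the input.
    df = dd.read_csv("/path/to/myfile.csv", blocksize="64MB")

    # Each partition is written to its own file; the "*" in the
    # pattern is replaced by the partition number (0, 1, 2, ...).
    df.to_csv("/output/path/part-*.csv", index=False)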