I'm looking to convert a given excel file to a csv using csvkit as a library, not from the command line. I'm having trouble finding any information on library usage syntax. Can anyone shed some light on how to use csvkit as a library for this purpose?
My test case is simple - take input.xlsx or input.xls, convert, and save as output.csv. Here's what I've tried so far, which is based on suggestions found elsewhere :
import csvkit
with open('input.xlsx') as csvfile:
reader = in2csv(csvfile)
# below is just to test whether the file could be accessed
for row in reader:
print(row)
gives
Traceback (most recent call last):
File "excelconvert.py", line 6, in <module>
reader = in2csv(csvfile)
NameError: name 'in2csv' is not defined
There's a similar question here, but the answers just seem to reference documentation that either isn't up or doesn't actually explain library usage syntax, it just lists classes. There is an answer suggesting the syntax might be similar to the csv module, which is what I used to make the attempt above, but I'm getting nowhere.
The docs strongly suggest this is meant to be a command line tool, not to be used from inside the Python interpreter. You can do something like this to convert a file to csv from the command line (or you could pop it in a shell script):
in2csv your_file.xlsx > your_new_file.csv
If you'd like to read the file, just do this (it's similar to what you have, but you don't need any external modules, just use built-in Python):
with open('input.xlsx') as csvfile:
reader = csvfile.readlines() # This was the only line of your code I changed
# below is just to test whether the file could be accessed
for row in reader:
print(row)
Or you could call your command line using the os
module:
# Careful, raw sys call. Use subprocess.Popen
# if you need to accept untrusted user input here
os.popen("in2csv your_file.xlsx > your_new_file.csv").read()
One of the snippets above is is probably what you need, but if you're really looking for punishment, you can attempt to use the in2csv
file from inside the interpreter. Here's how you might go about doing it (there is no support for this in the docs that I could find, it's just me poking around in the interpreter):
>>> from csvkit import in2csv
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name in2csv
>>> import csvkit
>>> help(csvkit)
Help on package csvkit:
NAME
csvkit
FILE
c:\python27\lib\site-packages\csvkit\__init__.py
DESCRIPTION
This module contains csvkit's superpowered alternative to the standard Python
CSV reader and writer. It can be used as a drop-in replacement for the standard
module.
.. warn::
Since version 1.0 csvkit relies on `agate <http://agate.rtfd.org>`_'s
CSV reader and writer. This module is supported for legacy purposes only and you
should migrate to using agate.
PACKAGE CONTENTS
cleanup
cli
convert (package)
exceptions
grep
utilities (package)
So you can't import in2csv directly from csvkit (since it isn't listed under PACKAGE CONTENTS
). However, if you do a little hunting, you'll find that you can access the package from csvkit.utilities
. But it only gets worse from here. If you do more "help hunting" (i.e. calling help from the interpreter) like above, you'll find that the class was designed to be used from the command line. So it's a real pain in the ass to use from inside the interpreter. Here's an example of trying to use the defaults (results in an explosion):
>>> from csvkit.utilities import in2csv
>>> i = in2csv.In2CSV()
>>> i.main()
usage: [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
[-p ESCAPECHAR] [-z FIELD_SIZE_LIMIT] [-e ENCODING] [-S] [-H] [-v]
[-l] [--zero] [-f FILETYPE] [-s SCHEMA] [-k KEY] [--sheet SHEET]
[-y SNIFF_LIMIT] [--no-inference]
[FILE]
: error: You must specify a format when providing data via STDIN (pipe).
Taking a look at the in2csv.py module, you'll have to monkey patch the args
to get it to do what you want from inside the interpreter. Again, this was not designed to be used from inside the interpreter, it was designed to be called from the cmd line (so args
is defined if you call it from the cmd line). Something like this seemed to run, but I didn't thoroughly test it:
>>> from csvkit.utilities import in2csv
>>> i = in2csv.In2CSV()
>>> from collections import namedtuple
>>> i.args = namedtuple("patched_args", "input_path filetype no_inference")
>>> i.args.input_path = "/path/to/your/file.xlsx"
>>> i.args.no_inference = True
>>> i.args.filetype = None
>>> i.main()