
How do I create a sklearn.datasets.base.Bunch object in scikit-learn from my own data?


In most scikit-learn examples, the data is loaded as a Bunch object. Many of the tutorial examples use load_files() or similar functions to populate the Bunch. Functions like load_files() expect the data to be laid out in a particular format, but my data is stored differently: it is a CSV file with a string in each field.

How do I parse this file and load the data into a Bunch object?


Solution

  • You don't have to create Bunch objects. They are just useful for loading the internal sample datasets of scikit-learn.

    You can directly feed a list of Python strings to your vectorizer object.
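
As a minimal sketch of the above, assuming a CSV with a text column and a label column (the column names `text` and `label` and the sample rows are made up for illustration, and CountVectorizer stands in for whichever vectorizer you use), you could read the strings with the standard csv module and pass the list straight to the vectorizer. If you still want a Bunch for convenience, recent scikit-learn versions expose it as sklearn.utils.Bunch, and it behaves like a dict with attribute access:

```python
import csv
import io

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.utils import Bunch

# Hypothetical CSV content; in practice, open your own file instead,
# e.g. open("my_data.csv", newline="").
csv_text = """text,label
"the cat sat on the mat",animal
"stocks fell sharply today",finance
"the dog chased the cat",animal
"""

# Collect the raw strings and their labels into plain Python lists.
texts, labels = [], []
for row in csv.DictReader(io.StringIO(csv_text)):
    texts.append(row["text"])
    labels.append(row["label"])

# The vectorizer accepts any iterable of strings; no Bunch is required.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)  # sparse matrix, one row per document

# Optional: wrap the lists in a Bunch if you like the attribute-style access.
dataset = Bunch(data=texts, target=labels)
print(X.shape, dataset.target)
```

X and labels can then be passed to an estimator's fit() method in the usual way.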