
Transforming sentences into numbers using scikit-learn’s CountVectorizer()


I am trying to convert each input sentence in the Review column into a vector of word counts using CountVectorizer. I am struggling with how to pass the sentences in. How do I handle the sentences and turn them into vectors? Any assistance will be highly appreciated.

Input Data:

Sentiment   Review
Neg  The new Ford Focus came highly recommended to me when I was looking to buy my first new car  I researched its history and found that it received great reviews for comfort and safety during its European release  Test driving the car  I found it to be comfortable  well equipped and stylish  I have now driven the car for for 6 months and have put only 5000 miles on it  While I have been happy with the overall performance of the car  I have been sorely disappointed with the workmanship involved  Realizing that  new models  are notorious for having  manufacturing bugs  I felt somewhat reassured that these would have been worked out from 1998 1999 during the first European release  I was wrong  My car has been in the repair shop a total of five times for manufacturers defects including a flooded passenger compartment  repaired twice to date  faulty master clutch cylinder  misaligned striker plate on seat back latch  broken break switch and cruise control  While I really love my car  I would hesitate to recommend it to any but my worst enemies  Time will tell if the problems my Focus has had are unique or are related to intrinsic design flaws 
Neg  We bought the Focus ZTS sedan because my wife needed an economical car to haul the grandkids around with  We traded in a  94 Explorer with a 5 speed just before the Firestone tire fiasco became public My wife loves driving the car  Although it is a bit small for me  6 1     290lbs  it is OK  The car handles great  and with the Zetec engine  it has adequate performance  although I wouldnt want any less   go   than its got Now for the problems   the main one of which is because I do my own oil changes  A particular sore point for me with most cars is that the manufacturers dont make it easy to change the oil and filter without creating a mess  This new Focus is particularly bad First  the owners manual indicates a Motocraft FL2005 filter  The car had an FL801 on it  which some ham fisted factory idiot had torqued to about a million foot pounds  I had to use some very large pliers and turn the filter almost 3 4 turn before it was loose enough to move by hand  Poor quality control The filter happens to be mounted in a horizontal position and is almost flush with the side of the engine  When I finally got it loose  oil ran down the side of the engine  onto the drive axle  onto the frame  down my arm  and all over the driveway  Very bad design  On other cars  I have been able to use a cut off soda bottle placed over the filter to catch the drips  On the Focus   it wont work The hood on this car is aluminum  It bends very easy   mine already has a dent in it   and I didnt have an accident  A minor problem is the power windows  They wont operate with the key in the accessory position  Tilt wheel also difficult to operate Bottom line   only 3 000 miles on this car  but its going to get traded off as soon as possible for a vehicle with a little more   substance   and which is easier to maintain  ive owned 9 Fords since 1986   still have 3  If all the newer Fords are made this way  the Focus may be the last Ford product I buy 
Neg  Recently I had the need to rent a car  I picked the Ford Focus  I was amazed with this car  I liked it better than my own  more expensive 1999 Toyota Corolla LE  The steering wheel is not only height adjustable  but also telescopes  something you do not normally find on such a reasonably priced car  The drivers seat also adjusted forward and back and in height  nice feature for someone tall like myself  The front seats were roomy and comfortable and the back seat had I think the most leg room I HAVE EVER SEEN in a compact car The stereo sounded good considering it was stock  and the face of the radio has an upward tilt to it so that it is driver friendly  All the bells and whistles were located within easy reach and the air worked well In addition to having a roomy trunk  there are 60 40 split rear seats  Child safety seat anchors and shoulder harness seat belts for 5 passengers I rented the 4 door sedan  but there are 3 body styles  The 4 door sedan  4 door wagon and a sporty little hatch back  I have read the safety ratings for the hatch back and from what I recall it got 5 stars This car is definately on my list of cars to consider purchasing in the near future  you should take a look at it too 
Pos   Cruising In My Big T  I have had my  91 Thunderbird for 4 years now  bought it way back in my freshman year and it has served me well throughout college  I am a horrible Northern driver and brutal on my vehicles  but this piece of Ford craftsmanship refuses to bail out on me  Its a rough and tumble vehicle that remains an incredible deal for the price  especially when bought used from a reputable dealer The Advantages  1 Seat Space These are big seats people  with the kind of legroom that only those pretentious you know whats in first class usually get their hands on  And that spaciousness isnt just about spoiling the people up front either  it extends to the back seat as well  which means that everyone feels just a little bit more comfortable and relaxed when you get to wherever you re going And not only are the seats big  but the generous amount of padding in each makes for an especially comfortable ride 2 Appearance ive gt to admit it to you  I just love the look of the Thunderbird  though it is an acquired taste to be sure  I can best describe the style as   Italian   sleek in a chunky way and available in colors  like burgundy  that make it look like a cross between a hit mobile and a hearse 3 Smooth Ride Riding in my Thunderbird has always seemed quite smooth to me  especially when you consider how low to the ground it is  Why so low  That kind of positioning allows the Thunderbird to provide the rider with great control  as your   feel   for the road is significantly enhanced In the same arena as the ride is the ease of use of the console  which for me equals smoothness and ease  The Thunderbirds radio and air console is incredibly well designed with everything within reach and intuitively organized  Seem trivial to you  Try changing the station at 75 miles an hour and see how important knob placement is 4 Trunk This is a important feature for me  as I seem to move every 3 6 months  The trunk on the Thunderbird is big enough for all of your luggage  not to mention the corpse of Vinnie The Chin from a rival family My Defense  I have read another review of this vehicle that criticizes the brake quality  and I have to vehemently disagree with it  I ride my brakes hard  and I have never had a lockup or other incident  The brakes do tend to squeek a bit  but the noise is no indication of a performance issue The Final Verdict  The bottom line is that the Thunderbird is a comfortable and well designed car at a reasonable price  As long as you like burgundy vehicles and live in an area thats at least 30  Italian  the Thunderbird is a great option 
Pos  I arrived in the states from Australia at the end of March 1999 to stay there for a year and come home at the end of March 2000  I stayed with friends in South Carolina who is a Ford man as I have always owned GM or Chevs  they lent me a 1979 red corvette until I bought myself a car so after 3 months I did buy a 1985 Z28 Camaro  350  to fix up and use for the 9 months after looking at it I thought this was a bad idea so looked in the local paper and found a red 1991 V8 Thunderbird with 114 000 miles on it for  3000 After taking it for a test drive offered the lady  2800 and drove it home  it had a slight water leak from the water pump  so while replacing it I installed a set of under drive pulleys  which I could notice the power increase the first time I drove it  put a K amp N air cleaner in as well I had a friend come over from Australia so we drove from Greenville SC across to Sequin TX  did about 3000 miles in that trip  we took the long way  and had no trouble at all and got 27 MPG sitting on 80  MPH had a radar detector  it has a highway ratio in it 2 75 My brother came over from Australia so we went from Greenville SC down to Daytona Beach and back  then drove across America to California  which we did about 4600 miles trouble free  When we left Williams AZ the car was buried under snow as we had a cold snap and snow dig the snow away and turn the key starting the engine at once and never missing a beat The car came with the premium sound system  but the radio cassette was playing up so replaced it with a Pioneer radio CD I went to the wreckers and bought an electric motor seat assembly for the right hand side so when converting to RHD will have an electric adjustment  Also bought a sports instrument cluster and centre handbrake assy out of a super coupe These cars were never made for export or for Right Hand Drive  so have to get all the parts needed now for conversion I did 18 000 miles in 9 months without the car stopping or letting me down I gave the car 4 oil changes  added fuel octane booster with every tank of gas  it has the factory 15   alloy wheels with Michelin tyres  I found the car very easy to drive and steer  but did experience brake shudder  which appears to be a common problem due to thin brake rotors I added a rear spoiler and had the windows tinted  which makes the car look sporty  as in Australia the only 2 door cars are mainly Jap imports  so in the end I shipped the car back to Australia where I have to convert it over to Right Hand Drive for our road rules  this cars owes me  10 000 Australia  5200 US  landed back at my house in Australia  which when converted to RHD they sell from  35 000 to  40 000  18 000 to  21 000 US BEST CAR I HAVE EVER OWNED 
Pos  This review is about  Ford Mustang 3 8L Coupe  with stick shift I test drove when I considered buying it  I say  considered  because I did not buy it and here is why Test Drive  The dealer talked too much during the test drive  They always try to do that to distract you  but I noticed the following things Styling  You can argue  but I think it could be better  The car looks bulky  the C pillars are thick  which increases  blind spots    I was afraid to run over somebody while backing up  the standard wheels look crude  The previous Mustang looked more balanced Engine  The 3 8L 193 hp engine does not seem all that powerful  even with stick  We went on the freeway onramp and I was disappointed  Strange  considering the 220  lb ft of torque rating at as low as 2800 rpm  European and Japanese manufacturers manage to extract more than 200 hp out of 3 0 liter engines Note  the A C was on during the test drive and was very efficient  It might eat some power  but not that much Transmission  The shifter has quite short travel  which is good  but the clutch does not provide any feedback   you cannot feel it engage by the pedal pressure  or the dealer talked too much  The clutch also engaged very high in the pedal travel  I drove some Eastern European cars for several years and never had complaints like this one  Or maybe im getting old and grumpy Suspension  The suspension is not only stiff  but creates a lot of unnecessary up and down motions  The car uses live axle in the rear  so I didnt expect much anyway Standard Equipment  The list of standard equipment looks good  It includes power windows  mirrors  locks and remote keyless entry  alloy  ugly  wheels  AM FM CD cassette player  A C  dual vanity mirrors  etc Interior  Interior  materials  fit and finish  looks cheap  I did not expect walnut for  16K  but Ford could have done better  As I said  the C pillars are wide  in coupe  and the interior room is smaller than Id like  The steering wheel tilts but does not telescope  which might be a problem for the tall people Insurance and Safety  Insurance rates are high  especially if you are a male younger than 25  The crash test results are not encouraging either   the overall rating is  Acceptable  with  Poor  death rate and  Marginal  injury rate Fuel Economy  I didnt get a chance to see the actual fuel consumption myself  but on paper its 19 MPG city   29 MPG highway  Not impressive for the car of this size with manual transmission Warranty and Reliability  Consumer Reports  magazine says that Mustang has poor reliability  Ford provides 36 000 mile   3 year warranty and 5 year corrosion warranty  Majority of other manufacturers offers 60 000 mile   5 year powertrain warranty  100 000 mile   10 year warranty for Hyundai Kia The last three  safety  fuel economy and reliability  also depend on the way you drive Pricing  The price was good  in theory  I know that you can get the car for less than  16K  at CarsDirect com  for example  but the particular dealership I went to wanted more than  17K and did not want to negotiate the price at all  Besides they were very pushy and rude  Needles to say  they did not earn my business  they didnt even try The dealer was constantly asking what monthly payment I can afford  Well  I can afford the payment I need to get better car  I walked  after which they called me several times asking how they can make me buy the car  today  I was unable to produce any kind of positive reply on this one I In car buying a lot depends on personal taste  If you like Mustangs styling and features and decide to buy it  it is a good deal  providing you with electric everything  remote keyless entry  radio CD cassette  V6 engine and alloy wheels for less than  16  If you want refinement  fit and finish  safety and reliability  get ready to pay more for something else I  

Code attempt:

from sklearn.feature_extraction.text import CountVectorizer
#instantiate the class
cv = CountVectorizer()

#list of sentences
for i in range(len(df['clean_Review'])):
    text=df.loc[i, "clean_Review"]
    #tokenize and build vocab
    cv.fit(text)
    print(cv.vocabulary_)
    #transform the text
    vector = cv.transform(text)
    print(vector.toarray())
    #df.loc[i,"porter"]=test
    i=i+1
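To see concretely why the loop above fails, here is a minimal sketch using two made-up review strings (hypothetical data standing in for df["clean_Review"]). It assumes a recent scikit-learn release: passing one review (a plain str) to fit() raises a ValueError, and wrapping each review in a list only hides the deeper problem that refitting per review builds a different vocabulary every time.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-ins for two rows of df["clean_Review"].
reviews = ["the car handles great", "the brakes squeak a bit"]

cv = CountVectorizer()

# 1) A single review string is not an iterable of documents, so
#    CountVectorizer rejects it with a ValueError.
error_message = None
try:
    cv.fit(reviews[0])
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)

# 2) Wrapping each review in a list avoids the error, but refitting inside
#    the loop rebuilds the vocabulary for every review, so column j means a
#    different word in each resulting vector.
vocabularies = []
for text in reviews:
    cv.fit([text])
    vocabularies.append(sorted(cv.vocabulary_))
print(vocabularies)
```

The one-letter word "a" is missing from the second vocabulary because the default token_pattern only keeps tokens of two or more characters.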

Solution

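  • You don't need the loop. According to the documentation, fit() and fit_transform() expect an iterable of strings (one string per document), such as a list or a pandas Series, so you can pass the whole column in one call. Passing a single review string, as the loop does, is rejected with a ValueError, and even if each review were wrapped in a list, refitting inside the loop would give every review its own vocabulary, so the resulting vectors could not be compared.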

    from sklearn.feature_extraction.text import CountVectorizer
    # instantiate the class
    cv = CountVectorizer()
    # fit on the whole column in one call: each row is one document.
    # vector is a sparse matrix (one row per review, one column per word),
    # i.e. a "bag of words" representation
    vector = cv.fit_transform(df["clean_Review"])

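    I assume that you have already performed the corpus-cleaning step (lowercasing, ASCII encoding, stop-word removal, etc.) before using CountVectorizer to convert your corpus into a bag of words, so I have left the arguments of CountVectorizer() at their defaults. Note that CountVectorizer already lowercases by default (lowercase=True), and it can also remove stop words with stop_words="english" and strip accents with strip_accents="ascii".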

    Example:

    from sklearn.feature_extraction.text import CountVectorizer
    import pandas as pd
    
    # sample text corpus
    corpus = pd.Series(["aa bb cc dd ee", "bb cc dd ee", "cc dd ee", "dd ee", "ee", "ee ff"])
    # instantiate the class
    cv = CountVectorizer()
    vector = cv.fit_transform(corpus)
    
    print(corpus)
    # 0    aa bb cc dd ee
    # 1       bb cc dd ee
    # 2          cc dd ee
    # 3             dd ee
    # 4                ee
    # 5             ee ff
    # dtype: object
    
    # column order of the matrix (get_feature_names_out needs scikit-learn >= 1.0)
    print(cv.get_feature_names_out())
    # ['aa' 'bb' 'cc' 'dd' 'ee' 'ff']
    
    # one row per document, one column per word, values are word counts
    print(vector.toarray())
    # [[1 1 1 1 1 0]
    #  [0 1 1 1 1 0]
    #  [0 0 1 1 1 0]
    #  [0 0 0 1 1 0]
    #  [0 0 0 0 1 0]
    #  [0 0 0 0 1 1]]