Can someone give an example of saving a table from a web page to an Excel spreadsheet? Let's say the page contains the code below. Do we need to save each player one by one with a CSS selector, or is there some magic function that can copy the whole table by its class? Eventually, my goal is to save the data to MySQL, but can someone first show how to save it to an Excel spreadsheet?
<table class="color-alt a-center sobre">
<tbody><tr class="color-5 negri a-bottom">
<td rowspan="2" class="color-5 " width="8%">Rk</td>
<td rowspan="2" class="color-5 a-left">
<div class="left" style="min-width: 120px; max-width:207px; width: 77%">
<div class="left">Player</div>
<div class="right"> (Team)</div>
</div></td>
<td colspan="2"><strong>3-Pointers Made</strong></td>
<td rowspan="2" class="color-5">Gms</td>
</tr>
<tr class="color-5 negri a-bottom">
<td class="negri">Total</td>
<td>Per Game</td>
</tr>
<tr class="a-top ">
<td class="a-center negri ">1</td>
<td class="a-left">
<div class="left negri " style="min-width: 150px">
<a href="/nba_players/stephen_curry.htm">Stephen Curry</a>
</div>
<div class="left margen-l2 " style="width:111px">(Warriors)</div>
</td>
<td class=" negri "><strong>337</strong></td>
<td>5.3</td>
<td class="">63</td>
</tr>
<tr class="a-top ">
<td class="a-center ">2</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/buddy_hield.htm">Buddy Hield</a>
</div>
<div class="left margen-l2 " style="width:111px">(Kings)</div>
</td>
<td class=""><strong>282</strong></td>
<td>4.0</td>
<td class="">71</td>
</tr>
<tr class="a-top ">
<td class="a-center ">3</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/damian_lillard.htm">Damian Lillard</a>
</div>
<div class="left margen-l2 " style="width:111px">(Trail Blazers)</div>
</td>
<td class=""><strong>275</strong></td>
<td>4.1</td>
<td class="">67</td>
</tr>
<tr class="a-top ">
<td class="a-center ">4</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/duncan_robinson.htm">Duncan Robinson</a>
</div>
<div class="left margen-l2 " style="width:111px">(Heat)</div>
</td>
<td class=""><strong>250</strong></td>
<td>3.5</td>
<td class="">72</td>
</tr>
<tr class="a-top ">
<td class="a-center ">5</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/terry_rozier.htm">Terry Rozier</a>
</div>
<div class="left margen-l2 " style="width:111px">(Hornets)</div>
</td>
<td class=""><strong>222</strong></td>
<td>3.2</td>
<td class="">69</td>
</tr>
<tr class="a-top ">
<td class="a-center ">6</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/joe_harris.htm">Joe Harris</a>
</div>
<div class="left margen-l2 " style="width:111px">(Nets)</div>
</td>
<td class=""><strong>211</strong></td>
<td>3.1</td>
<td class="">69</td>
</tr>
<tr class="a-top ">
<td class="a-center ">7</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/jordan_clarkson.htm">Jordan Clarkson</a>
</div>
<div class="left margen-l2 " style="width:111px">(Jazz)</div>
</td>
<td class=""><strong>208</strong></td>
<td>3.1</td>
<td class="">68</td>
</tr>
<tr class="a-top ">
<td class="a-center ">8</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/tim_hardaway_jr.htm">Tim Hardaway Jr.</a>
</div>
<div class="left margen-l2 " style="width:111px">(Mavericks)</div>
</td>
<td class=""><strong>207</strong></td>
<td>3.0</td>
<td class="">70</td>
</tr>
<tr class="a-top ">
<td class="a-center ">9</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/zach_lavine.htm">Zach LaVine</a>
</div>
<div class="left margen-l2 " style="width:111px">(Bulls)</div>
</td>
<td class=""><strong>200</strong></td>
<td>3.4</td>
<td class="">58</td>
</tr>
<tr class="a-top ">
<td class="a-center ">10</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/luka_doncic.htm">Luka Doncic</a>
</div>
<div class="left margen-l2 " style="width:111px">(Mavericks)</div>
</td>
<td class=""><strong>192</strong></td>
<td>2.9</td>
<td class="">66</td>
</tr>
<tr class="a-top ">
<td class="a-center ">11</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/jayson_tatum.htm">Jayson Tatum</a>
</div>
<div class="left margen-l2 " style="width:111px">(Celtics)</div>
</td>
<td class=""><strong>187</strong></td>
<td>2.9</td>
<td class="">64</td>
</tr>
<tr class="a-top ">
<td class="a-center ">12</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/joe_ingles.htm">Joe Ingles</a>
</div>
<div class="left margen-l2 " style="width:111px">(Jazz)</div>
</td>
<td class=""><strong>183</strong></td>
<td>2.7</td>
<td class="">67</td>
</tr>
<tr class="a-top ">
<td class="a-center ">13</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/bojan_bogdanovic.htm">Bojan Bogdanovic</a>
</div>
<div class="left margen-l2 " style="width:111px">(Jazz)</div>
</td>
<td class=""><strong>180</strong></td>
<td>2.5</td>
<td class="">72</td>
</tr>
<tr class="a-top ">
<td class="a-center ">14</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/devonte_graham.htm">Devonte' Graham</a>
</div>
<div class="left margen-l2 " style="width:111px">(Hornets)</div>
</td>
<td class=""><strong>179</strong></td>
<td>3.3</td>
<td class="">55</td>
</tr>
<tr class="a-top ">
<td class="a-center ">15</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/donovan_mitchell.htm">Donovan Mitchell</a>
</div>
<div class="left margen-l2 " style="width:111px">(Jazz)</div>
</td>
<td class=""><strong>178</strong></td>
<td>3.4</td>
<td class="">53</td>
</tr>
<tr class="a-top ">
<td class="a-center ">16</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/nikola_vucevic.htm">Nikola Vucevic</a>
</div>
<div class="left margen-l2 " style="width:111px">(2 teams)</div>
</td>
<td class=""><strong>176</strong></td>
<td>2.5</td>
<td class="">70</td>
</tr>
<tr class="a-top ">
<td class="a-center ">17</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/saddiq_bey.htm">Saddiq Bey</a>
</div>
<div class="left margen-l2 " style="width:111px">(Pistons)</div>
</td>
<td class=""><strong>175</strong></td>
<td>2.5</td>
<td class="">70</td>
</tr>
<tr class="a-top ">
<td class="a-center "></td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/danny_green.htm">Danny Green</a>
</div>
<div class="left margen-l2 " style="width:111px">(76ers)</div>
</td>
<td class=""><strong>175</strong></td>
<td>2.5</td>
<td class="">69</td>
</tr>
<tr class="a-top ">
<td class="a-center ">19</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/fred_vanvleet.htm">Fred VanVleet</a>
</div>
<div class="left margen-l2 " style="width:111px">(Raptors)</div>
</td>
<td class=""><strong>174</strong></td>
<td>3.3</td>
<td class="">52</td>
</tr>
<tr class="a-top ">
<td class="a-center ">20</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/justin_holiday.htm">Justin Holiday</a>
</div>
<div class="left margen-l2 " style="width:111px">(Pacers)</div>
</td>
<td class=""><strong>173</strong></td>
<td>2.4</td>
<td class="">72</td>
</tr>
<tr class="a-top ">
<td class="a-center ">21</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/lonzo_ball.htm">Lonzo Ball</a>
</div>
<div class="left margen-l2 " style="width:111px">(Pelicans)</div>
</td>
<td class=""><strong>172</strong></td>
<td>3.1</td>
<td class="">55</td>
</tr>
<tr class="a-top ">
<td class="a-center ">22</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/anthony_edwards.htm">Anthony Edwards</a>
</div>
<div class="left margen-l2 " style="width:111px">(Timberwolves)</div>
</td>
<td class=""><strong>171</strong></td>
<td>2.4</td>
<td class="">72</td>
</tr>
<tr class="a-top ">
<td class="a-center "></td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/paul_george.htm">Paul George</a>
</div>
<div class="left margen-l2 " style="width:111px">(Clippers)</div>
</td>
<td class=""><strong>171</strong></td>
<td>3.2</td>
<td class="">54</td>
</tr>
<tr class="a-top ">
<td class="a-center "></td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/norman_powell.htm">Norman Powell</a>
</div>
<div class="left margen-l2 " style="width:111px">(2 teams)</div>
</td>
<td class=""><strong>171</strong></td>
<td>2.5</td>
<td class="">69</td>
</tr>
<tr class="a-top ">
<td class="a-center ">25</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/michael_porter_jr.htm">Michael Porter Jr.</a>
</div>
<div class="left margen-l2 " style="width:111px">(Nuggets)</div>
</td>
<td class=""><strong>170</strong></td>
<td>2.8</td>
<td class="">61</td>
</tr>
<tr class="a-top ">
<td class="a-center ">26</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/davis_bertans.htm">Davis Bertans</a>
</div>
<div class="left margen-l2 " style="width:111px">(Wizards)</div>
</td>
<td class=""><strong>169</strong></td>
<td>3.0</td>
<td class="">57</td>
</tr>
<tr class="a-top ">
<td class="a-center "></td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/cj_mccollum.htm">C.J. McCollum</a>
</div>
<div class="left margen-l2 " style="width:111px">(Trail Blazers)</div>
</td>
<td class=""><strong>169</strong></td>
<td>3.6</td>
<td class="">47</td>
</tr>
<tr class="a-top ">
<td class="a-center ">28</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/gary_trent_jr.htm">Gary Trent Jr.</a>
</div>
<div class="left margen-l2 " style="width:111px">(2 teams)</div>
</td>
<td class=""><strong>165</strong></td>
<td>2.8</td>
<td class="">58</td>
</tr>
<tr class="a-top ">
<td class="a-center ">29</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/jaylen_brown.htm">Jaylen Brown</a>
</div>
<div class="left margen-l2 " style="width:111px">(Celtics)</div>
</td>
<td class=""><strong>163</strong></td>
<td>2.8</td>
<td class="">58</td>
</tr>
<tr class="a-top ">
<td class="a-center "></td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/reggie_bullock.htm">Reggie Bullock</a>
</div>
<div class="left margen-l2 " style="width:111px">(Knicks)</div>
</td>
<td class=""><strong>163</strong></td>
<td>2.5</td>
<td class="">65</td>
</tr>
<tr class="a-top ">
<td class="a-center "></td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/coby_white.htm">Coby White</a>
</div>
<div class="left margen-l2 " style="width:111px">(Bulls)</div>
</td>
<td class=""><strong>163</strong></td>
<td>2.4</td>
<td class="">69</td>
</tr>
<tr class="a-top ">
<td class="a-center ">32</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/patty_mills.htm">Patty Mills</a>
</div>
<div class="left margen-l2 " style="width:111px">(Spurs)</div>
</td>
<td class=""><strong>161</strong></td>
<td>2.4</td>
<td class="">68</td>
</tr>
<tr class="a-top ">
<td class="a-center ">33</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/julius_randle.htm">Julius Randle</a>
</div>
<div class="left margen-l2 " style="width:111px">(Knicks)</div>
</td>
<td class=""><strong>160</strong></td>
<td>2.3</td>
<td class="">71</td>
</tr>
<tr class="a-top ">
<td class="a-center ">34</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/bryn_forbes.htm">Bryn Forbes</a>
</div>
<div class="left margen-l2 " style="width:111px">(Bucks)</div>
</td>
<td class=""><strong>154</strong></td>
<td>2.2</td>
<td class="">70</td>
</tr>
<tr class="a-top ">
<td class="a-center ">35</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/kyrie_irving.htm">Kyrie Irving</a>
</div>
<div class="left margen-l2 " style="width:111px">(Nets)</div>
</td>
<td class=""><strong>152</strong></td>
<td>2.8</td>
<td class="">54</td>
</tr>
<tr class="a-top ">
<td class="a-center ">36</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/khris_middleton.htm">Khris Middleton</a>
</div>
<div class="left margen-l2 " style="width:111px">(Bucks)</div>
</td>
<td class=""><strong>151</strong></td>
<td>2.2</td>
<td class="">68</td>
</tr>
<tr class="a-top ">
<td class="a-center ">37</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/jae_crowder.htm">Jae Crowder</a>
</div>
<div class="left margen-l2 " style="width:111px">(Suns)</div>
</td>
<td class=""><strong>148</strong></td>
<td>2.5</td>
<td class="">60</td>
</tr>
<tr class="a-top ">
<td class="a-center ">38</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/bogdan_bogdanovic.htm">Bogdan Bogdanovic</a>
</div>
<div class="left margen-l2 " style="width:111px">(Hawks)</div>
</td>
<td class=""><strong>146</strong></td>
<td>3.3</td>
<td class="">44</td>
</tr>
<tr class="a-top ">
<td class="a-center ">39</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/malcolm_brogdon.htm">Malcolm Brogdon</a>
</div>
<div class="left margen-l2 " style="width:111px">(Pacers)</div>
</td>
<td class=""><strong>145</strong></td>
<td>2.6</td>
<td class="">56</td>
</tr>
<tr class="a-top ">
<td class="a-center ">40</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/brandon_ingram.htm">Brandon Ingram</a>
</div>
<div class="left margen-l2 " style="width:111px">(Pelicans)</div>
</td>
<td class=""><strong>143</strong></td>
<td>2.3</td>
<td class="">61</td>
</tr>
<tr class="a-top ">
<td class="a-center ">41</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/kevin_huerter.htm">Kevin Huerter</a>
</div>
<div class="left margen-l2 " style="width:111px">(Hawks)</div>
</td>
<td class=""><strong>140</strong></td>
<td>2.0</td>
<td class="">69</td>
</tr>
<tr class="a-top ">
<td class="a-center "></td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/marcus_morris.htm">Marcus Morris</a>
</div>
<div class="left margen-l2 " style="width:111px">(Clippers)</div>
</td>
<td class=""><strong>140</strong></td>
<td>2.5</td>
<td class="">57</td>
</tr>
<tr class="a-top ">
<td class="a-center "></td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/andrew_wiggins.htm">Andrew Wiggins</a>
</div>
<div class="left margen-l2 " style="width:111px">(Warriors)</div>
</td>
<td class=""><strong>140</strong></td>
<td>2.0</td>
<td class="">71</td>
</tr>
<tr class="a-top ">
<td class="a-center ">44</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/mike_conley.htm">Mike Conley</a>
</div>
<div class="left margen-l2 " style="width:111px">(Jazz)</div>
</td>
<td class=""><strong>138</strong></td>
<td>2.7</td>
<td class="">51</td>
</tr>
<tr class="a-top ">
<td class="a-center ">45</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/kyle_kuzma.htm">Kyle Kuzma</a>
</div>
<div class="left margen-l2 " style="width:111px">(Lakers)</div>
</td>
<td class=""><strong>137</strong></td>
<td>2.0</td>
<td class="">68</td>
</tr>
<tr class="a-top ">
<td class="a-center ">46</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/trae_young.htm">Trae Young</a>
</div>
<div class="left margen-l2 " style="width:111px">(Hawks)</div>
</td>
<td class=""><strong>136</strong></td>
<td>2.2</td>
<td class="">63</td>
</tr>
<tr class="a-top ">
<td class="a-center ">47</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/robert_covington.htm">Robert Covington</a>
</div>
<div class="left margen-l2 " style="width:111px">(Trail Blazers)</div>
</td>
<td class=""><strong>135</strong></td>
<td>1.9</td>
<td class="">70</td>
</tr>
<tr class="a-top ">
<td class="a-center ">48</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/mikal_bridges.htm">Mikal Bridges</a>
</div>
<div class="left margen-l2 " style="width:111px">(Suns)</div>
</td>
<td class=""><strong>134</strong></td>
<td>1.9</td>
<td class="">72</td>
</tr>
<tr class="a-top ">
<td class="a-center ">49</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
<a href="/nba_players/carmelo_anthony.htm">Carmelo Anthony</a>
</div>
<div class="left margen-l2 " style="width:111px">(Trail Blazers)</div>
</td>
<td class=""><strong>133</strong></td>
<td>1.9</td>
<td class="">69</td>
</tr>
<tr class="a-top ">
<td class="a-center ">50</td>
<td class="a-left">
<div class="left " style="min-width: 150px">
Here is how you can save data in an Excel file:
import xlsxwriter
import mysql.connector
Here is the main parser:
def parse(self, response):
    # The rows carry class="a-top " (note the trailing space), so an exact
    # @class='a-top' comparison would not match; contains() does.
    for x in response.xpath("//table//tr[contains(@class, 'a-top')]"):
        # Yield a fresh dict per row instead of mutating response.meta.
        # Tied players have an empty rank cell, hence the "" defaults.
        yield {
            "serial_code": x.xpath("./td[1]/text()").get(default=""),
            "team_name": x.xpath("./td[2]/div[1]/a/text()").get(default=""),
            "team_href": x.xpath("./td[2]/div[1]/a/@href").get(default=""),
            "team_title": x.xpath("./td[2]/div[2]/text()").get(default=""),
            "total_points": x.xpath("./td[3]/strong/text()").get(default=""),
            "per_game": x.xpath("./td[4]/text()").get(default=""),
            "total_games": x.xpath("./td[5]/text()").get(default=""),
        }
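One detail worth checking in the HTML from the question: the data rows are written as class="a-top " with a trailing space, so an exact comparison against 'a-top' may match nothing, while a test such as contains(@class, 'a-top') would. The following is a minimal standard-library sketch of that difference, not Scrapy code; it uses xml.etree.ElementTree on one row copied from the question, and row_to_dict is a hypothetical helper that mirrors the field names used in the parser.

```python
import xml.etree.ElementTree as ET

# One row copied from the table in the question, wrapped in a <table> element.
ROW_HTML = """
<table>
<tr class="a-top ">
<td class="a-center negri ">1</td>
<td class="a-left">
<div class="left negri " style="min-width: 150px">
<a href="/nba_players/stephen_curry.htm">Stephen Curry</a>
</div>
<div class="left margen-l2 " style="width:111px">(Warriors)</div>
</td>
<td class=" negri "><strong>337</strong></td>
<td>5.3</td>
<td class="">63</td>
</tr>
</table>
"""

root = ET.fromstring(ROW_HTML)

# Exact comparison misses the row because the attribute value is "a-top ".
exact_matches = root.findall(".//tr[@class='a-top']")

# Comparing class tokens ignores the surrounding whitespace.
token_matches = [
    tr for tr in root.iter("tr") if "a-top" in tr.get("class", "").split()
]


def row_to_dict(tr):
    """Hypothetical helper: map one <tr> to the fields used in the parser."""
    cells = tr.findall("td")
    link = cells[1].find("div/a")
    return {
        "serial_code": (cells[0].text or "").strip(),  # empty for tied ranks
        "team_name": link.text.strip(),
        "team_href": link.get("href"),
        "team_title": cells[1].findall("div")[1].text.strip(),
        "total_points": cells[2].find("strong").text,
        "per_game": cells[3].text,
        "total_games": cells[4].text,
    }


rows = [row_to_dict(tr) for tr in token_matches]
print(len(exact_matches), len(token_matches))
print(rows[0])
```

If the exact match finds nothing against the live page as well, the spider would yield no items, which is easy to mistake for a problem in the pipeline.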
Here is the pipeline.py:
class PlayersExcelPipeline(object):
    # Column order in the spreadsheet; the names match the item keys.
    columns = ["serial_code", "team_name", "team_href", "team_title",
               "total_points", "per_game", "total_games"]

    def open_spider(self, spider):
        # Create the workbook when the spider starts rather than at import time.
        self.workbook = xlsxwriter.Workbook('Players-Data.xlsx')
        self.worksheet = self.workbook.add_worksheet("Final Data")
        self.worksheet.set_zoom(80)
        # Header goes in the first row (rows and columns are zero-indexed here).
        self.worksheet.write_row(0, 0, self.columns)
        self.rowNumb = 0

    def process_item(self, item, spider):
        self.rowNumb += 1
        # Missing values (e.g. the empty rank cell of tied players) become "".
        values = [item.get(name) or "" for name in self.columns]
        self.worksheet.write_row(self.rowNumb, 0, values)
        # Return the item so that any later pipeline still receives it.
        return item

    def close_spider(self, spider):
        # Closing the workbook is what writes the file; __del__ is not guaranteed to run.
        self.workbook.close()
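For Scrapy to call the pipeline at all, it normally has to be enabled through the ITEM_PIPELINES setting in settings.py. A minimal sketch follows; the module path myproject.pipelines is a hypothetical project name, and 300 is an arbitrary priority (lower numbers run first).

```python
# settings.py (fragment)
# "myproject" is a placeholder; use the name of your own Scrapy project.
ITEM_PIPELINES = {
    "myproject.pipelines.PlayersExcelPipeline": 300,
}
```

A MySQL pipeline such as PlayersMySQLPipeline would be registered the same way, with its own entry and priority.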
Once you can see how the data is saved to an Excel file, saving the same data to a MySQL database should be relatively easy. Here is how it might look:
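class PlayersMySQLPipeline(object):
    # Placeholder credentials: host, user, password, database.
    credentials = ["localhost", "user", "password", "database"]

    def open_spider(self, spider):
        # Open one connection for the whole crawl.
        self.mydb = mysql.connector.connect(host=self.credentials[0], user=self.credentials[1], passwd=self.credentials[2], database=self.credentials[3])
        self.mycursor = self.mydb.cursor()

    def process_item(self, item, spider):
        sql = """your insert sql query goes here"""
        self.mycursor.execute(sql)
        self.mydb.commit()
        print("record inserted")
        # Return the item so that any later pipeline still receives it.
        return item

    def close_spider(self, spider):
        # Close both the cursor and the connection when the spider finishes.
        self.mycursor.close()
        self.mydb.close()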
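As for the insert query itself, one possible shape is a parameterized statement, so that names containing an apostrophe (Devonte' Graham, for example) do not break the SQL. The sketch below only builds the statement and its parameters; the table name players and the build_insert helper are assumptions made for illustration, and the columns are assumed to be named after the item keys.

```python
# Item keys, assumed to double as column names in a hypothetical "players" table.
COLUMNS = ["serial_code", "team_name", "team_href", "team_title",
           "total_points", "per_game", "total_games"]


def build_insert(item, table="players", placeholder="%s"):
    """Return (sql, params) for a parameterized INSERT statement."""
    column_list = ", ".join(COLUMNS)
    marks = ", ".join([placeholder] * len(COLUMNS))
    sql = "INSERT INTO {} ({}) VALUES ({})".format(table, column_list, marks)
    # Values are passed separately so the driver can escape them.
    params = tuple(item.get(name) for name in COLUMNS)
    return sql, params


item = {
    "serial_code": "14",
    "team_name": "Devonte' Graham",
    "team_href": "/nba_players/devonte_graham.htm",
    "team_title": "(Hornets)",
    "total_points": "179",
    "per_game": "3.3",
    "total_games": "55",
}

sql, params = build_insert(item)
print(sql)
print(params)
```

Inside process_item this would presumably be used as self.mycursor.execute(sql, params), since mysql.connector accepts %s placeholders with a tuple of values.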