ruby xml rexml

REXML: how to get the first following sibling of an element


I am attempting to parse the information available in MLB GameDay. Here is an example file that I'm working with:

http://gd2.mlb.com/components/game/mlb/year_2014/month_04/day_05/gid_2014_04_05_anamlb_houmlb_1/game_events.xml

Specifically, here's an example XML output:

<bottom>
<action b="0" s="0" o="0" des="Kole Calhoun remains in the game as the right fielder. " des_es="Kole Calhoun permanece en el juego como el jardinero derecho. " event="Defensive Switch" tfs="010840" tfs_zulu="2014-04-06T01:08:40Z" player="594777" pitch="5"/>
<atbat num="50" b="4" s="0" o="0" start_tfs="010846" start_tfs_zulu="2014-04-06T01:08:46Z" batter="514888" pitcher="572140" des="Jose Altuve walks. " des_es="Jose Altuve recibe base por bolas. " event="Walk" b1="514888" b2="" b3="">
<pitch sv_id="140405_200702" des="Ball" des_es="Bola mala" type="B" start_speed="90.6" pitch_type="FT"/>
<pitch sv_id="140405_200718" des="Ball" des_es="Bola mala" type="B" start_speed="90.8" pitch_type="FF"/>
<pitch sv_id="140405_200738" des="Ball" des_es="Bola mala" type="B" start_speed="91.5" pitch_type="FT"/>
<pitch sv_id="140405_200757" des="Ball" des_es="Bola mala" type="B" start_speed="90.2" pitch_type="FF"/>
</atbat>
<atbat num="51" b="1" s="1" o="2" start_tfs="011113" start_tfs_zulu="2014-04-06T01:11:13Z" batter="461882" pitcher="572140" des="Jesus Guzman grounds into a double play, third baseman John McDonald to second baseman Howie Kendrick to first baseman Albert Pujols. Jose Altuve out at 2nd. " des_es="Jesus Guzman batea rodado batea para doble matanza, tercera base John McDonald a segunda base Howie Kendrick a primera base Albert Pujols. Jose Altuve a cabo a 2da. " event="Grounded Into DP" b1="" b2="" b3="">
<pitch sv_id="140405_200849" des="Called Strike" des_es="Strike cantado" type="S" start_speed="90.2" pitch_type="FT"/>
<pitch sv_id="140405_200915" des="Ball" des_es="Bola mala" type="B" start_speed="74.0" pitch_type="CU"/>
<pitch sv_id="140405_200941" des="In play, out(s)" des_es="En juego, out(s)" type="X" start_speed="83.7" pitch_type="CH"/>
</atbat>
<atbat num="52" b="2" s="3" o="3" start_tfs="011242" start_tfs_zulu="2014-04-06T01:12:42Z" batter="474892" pitcher="572140" des="Chris Carter called out on strikes. " des_es="Chris Carter se poncha sin tirarle. " event="Strikeout" b1="" b2="" b3="">
<pitch sv_id="140405_201027" des="Ball" des_es="Bola mala" type="B" start_speed="73.5" pitch_type="CU"/>
<pitch sv_id="140405_201044" des="Ball" des_es="Bola mala" type="B" start_speed="91.2" pitch_type="FT"/>
<pitch sv_id="140405_201111" des="Called Strike" des_es="Strike cantado" type="S" start_speed="91.1" pitch_type="FT"/>
<pitch sv_id="140405_201132" des="Swinging Strike" des_es="Strike tirándole" type="S" start_speed="91.6" pitch_type="FT"/>
<pitch sv_id="140405_201155" des="Called Strike" des_es="Strike cantado" type="S" start_speed="92.5" pitch_type="FT"/>
</atbat>
</bottom>

I am trying to get the first <atbat> sibling element that follows an <action> element. Here's how I grab all of the <action> elements I need:

    def set_bottom_actions
      @xml_doc.elements.each("inning/bottom/action") { |element|
        action = Action.new
        action.init(element, @gid, @num)
        @bottom_actions.push action
      }
    end

Ideally, Action#init would accept a new argument called atbat. The question is, how do I grab the first atbat element that follows the action I'm instantiating?

In the example above, if I were to grab the first action, it should also be able to grab the next atbat sibling which is the following:

<atbat num="50" b="4" s="0" o="0" start_tfs="010846" start_tfs_zulu="2014-04-06T01:08:46Z" batter="514888" pitcher="572140" des="Jose Altuve walks. " des_es="Jose Altuve recibe base por bolas. " event="Walk" b1="514888" b2="" b3="">

Solution

  • This is untested (I use and would recommend using Nokogiri for parsing XML), but should work:

    @xml_doc.elements.each("inning/bottom/action") { |element| 
      at_bat = REXML::XPath.first(element, 'following-sibling::atbat[1]')
      action = Action.new
      action.init(element, at_bat, @gid, @num)
      @bottom_actions.push action
    }
    # ...
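
    To try the lookup outside the app, here is a minimal standalone sketch. The sample XML is a trimmed-down stand-in for the GameDay markup, and the helper names next_atbat_xpath and next_atbat_walk are made up for illustration; they are not part of REXML. It shows the XPath approach next to a manual walk over siblings with Element#next_element, which should give the same element:

```ruby
require 'rexml/document'

# Trimmed-down sample in the shape of the GameDay markup
# (most attributes dropped for brevity).
SAMPLE_XML = <<~XML
  <inning>
    <bottom>
      <action event="Defensive Switch" player="594777"/>
      <atbat num="50" event="Walk"/>
      <atbat num="51" event="Grounded Into DP"/>
    </bottom>
  </inning>
XML

doc = REXML::Document.new(SAMPLE_XML)

# Hypothetical helper: nearest <atbat> sibling after the element, via XPath.
# Returns nil when no such sibling exists.
def next_atbat_xpath(element)
  REXML::XPath.first(element, 'following-sibling::atbat[1]')
end

# Hypothetical helper: same lookup by stepping through sibling elements.
# Element#next_element skips text nodes (such as whitespace) between tags.
def next_atbat_walk(element)
  sibling = element.next_element
  sibling = sibling.next_element until sibling.nil? || sibling.name == 'atbat'
  sibling
end

action = REXML::XPath.first(doc, '//bottom/action')
atbat  = next_atbat_xpath(action)

puts atbat.attributes['num']
puts next_atbat_walk(action).attributes['num']
```

    Either helper returns nil when an action is the last thing in the half-inning, so Action#init should be prepared for a missing atbat.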