ruby parsing web-scraping nokogiri

How to merge 3 hashes?


I have been trying to get some information from an HTML table (shown below) into a hash. So far I'm extracting the party names and party types and merging them into a single hash. Now I need to merge in another hash containing the party addresses. I am able to get the addresses, but the table structure is a bit unusual, so I'm not sure how to attach each address to the party it belongs to (not every party has one).

require 'nokogiri'

html = '    <table class="detailRecordTable"><tbody><tr>
                                                        <td width="3%" class="detailSeperator" style="width:3%;"></td>
                                                        <td width="30%" class="detailSeperator" style="width:30%;text-align:left">
                                                                SMALL   , DANIEL, Appellant&nbsp;&nbsp;&nbsp            </td>       <td width="20%" class="detailSeperator" style="width:20%;font-weight: normal">  represented by&nbsp;&nbsp;&nbsp;
                                                        </td>
                                                        <td width="47%" class="detailSeperator" style="width:47%;text-align:left">
                                                                KELLY   , MARK EDWARD
                                                                , Attorney for Appellant
                                                        </td>
                                                    </tr>
                                                    <tr>
                                                        <td width="3%" class="detailData" style="width:3%;text-align:right">
                                                        </td>
                                                        <td width="30%" class="detailData">

                                                        </td>   <td width="20%" class="detailData">

                                                        </td><td width="47%" class="detailData">
                                                                    134 N WATER STREET<br>
                                                                    LIBERTY,
                                                                    MO
                                                                    64068<br>   <br>
                                                            <p></p>
                                                        </td>
                                                    </tr>
                                                    <tr>
                                                        <td width="3%" class="detailData">&nbsp;</td>
                                                        <td width="30%" class="detailData">&nbsp;</td>
                                                        <td width="20%" class="detailData">&nbsp;</td>
                                                        <td width="47%" class="detailData"></td>
                                                    </tr>

                                                    <tr>
                                                        <td class="detailSeperator" style="width:3%;text-align:right"></td>
                                                        <td class="detailSeperator" style="width:30%;text-align:left"></td>
                                                        <td class="detailSeperator" style="width:20%;font-weight: normal">co-counsel</td>
                                                        <td class="detailSeperator" style="width:47%;text-align:left">
                                                            PITTMAN     , KRISTI LANAE  , Co-Counsel for Appellant</td>
                                                    </tr>
                                                    <tr>
                                                        <td width="3%" class="detailData">&nbsp;</td>
                                                        <td width="30%" class="detailData">&nbsp;</td>
                                                        <td width="20%" class="detailData">&nbsp;</td>
                                                        <td width="47%" class="detailData">
                                                                134 NORTH WATER STREET<br>
                                                                LIBERTY,
                                                                MO
                                                                64068<br>               <br>

                                                        </td>
                                                    </tr>
                                                    <tr>
                                                        <td width="3%" class="detailSeperator" style="width:3%;">&nbsp;
                                                        </td>
                                                        <td width="30%" class="detailSeperator" style="width:30%;text-align:left">
                                                                RED SIMPSON, INC.
                                                                , Respondent&nbsp;&nbsp;&nbsp;
                                                        </td>

                                                        <td width="20%" class="detailSeperator" style="width:20%;font-weight: normal">  represented by&nbsp;&nbsp;&nbsp;
                                                        </td>
        <td width="47%" class="detailSeperator" style="width:47%;text-align:left">
                                                                GREENWALD   , DOUGLAS   MARK
                                                                , Attorney for Respondent
                                                        </td>
                                                    </tr>
                                                    <tr>
                                                        <td width="3%" class="detailData" style="width:3%;text-align:right">
                                                        </td>
                                                        <td width="30%" class="detailData">
                                                        </td>
                                                        <td width="20%" class="detailData">

                                                        </td>

                                                        <td width="47%" class="detailData">

                                                                    10 EAST CAMBRIDGE CIRCLE DRIVE<br>
                                                                    KANSAS CITY,
                                                                    KS
                                                                    66103<br><br>
                                                            <p></p>
                                                        </td>
                                                    </tr>
                                                    <tr>
                                                        <td width="3%" class="detailData">&nbsp;</td>
                                                        <td width="30%" class="detailData">&nbsp;</td>
                                                        <td width="20%" class="detailData">&nbsp;</td>
                                                        <td width="47%" class="detailData"></td>
                                                    </tr>

                                                    <tr>
                                                        <td class="detailSeperator" style="width:3%;text-align:right"></td>
                                                        <td class="detailSeperator" style="width:30%;text-align:left"></td>
                                                        <td class="detailSeperator" style="width:20%;font-weight: normal">co-counsel</td>
                                                        <td class="detailSeperator" style="width:47%;text-align:left">
                                                            BENJAMIN, SAMANTHA  NICOLE
                                                            , Co-Counsel for Respondent</td>
                                                    </tr>
                                                    <tr>
                                                        <td width="3%" class="detailData">&nbsp;</td>
                                                        <td width="30%" class="detailData">&nbsp;</td>
                                                        <td width="20%" class="detailData">&nbsp;</td>
                                                        <td width="47%" class="detailData">

                                                                MCANANY VAN CLEVE AND PHILLIPS<br>

                                                                10 E CAMBRIDGE CIRCLE DR<br>

                                                                STE 300<br>

                                                                KANSAS CITY,
                                                                KS
                                                                66103<br>
                                                            <b>Business: </b>
                                                            (913)
                                                            573-3319 <br>   <br>

                                                        </td>
                                                    </tr>

                                </tbody></table>'

doc = Nokogiri::HTML(html)
rows = doc.xpath("//table[@class='detailRecordTable']//tr")
# address2 = doc.css('td:nth-of-type(4)').text.strip

# puts address2

@party_names = []
@party_types = []
@party_des = []

rows.each do |row|
  nodes = row.css('.detailSeperator:nth-of-type(2), .detailSeperator:nth-of-type(4)')
  nodes.each do |node|
    name = node.text.strip.gsub("\n", '').gsub("\t", '')
    parts = name.split(',')
    name = if parts.length == 3
             "#{parts[0]}, #{parts[1]}"
           else
             parts[0]
           end
    party_type = parts[-1].strip if parts && parts.length >= 2
    addr = ("#{parts[0]}, #{parts[1]}" if parts.length == 2)
    @party_names << name
    @party_types << party_type
    @party_des   <<  addr
  end

  address = row.css('td:nth-of-type(2),td:nth-of-type(4)')
  address.each do |node|
    addr = node.text.strip.gsub("\n", '').gsub("\t", '')
    parts = addr.split(',')
    addr = ("#{parts[0]}, #{parts[1]}" if parts.length == 2)
    @party_des << addr
  end
end
@party_names.compact!
@party_names.reject!(&:empty?) # reject without the bang returns a copy and leaves the array unchanged
@party_types.compact!
@party_des.compact!
@party_names_and_types = @party_names.zip(@party_types).map { |name, type| { part_name: name, party_type: type } }

The output I currently have is:

{:part_name=>"SMALL,  DANIEL", :party_type=>"Appellant  &nbsp"}
{:part_name=>"KELLY,  MARK EDWARD", :party_type=>"Attorney for Appellant"}
{:part_name=>"PITTMAN,  KRISTI LANAE", :party_type=>"Co-Counsel for Appellant"}
{:part_name=>"RED SIMPSON,  INC.", :party_type=>"Respondent   "}
{:part_name=>"GREENWALD,  DOUGLASMARK", :party_type=>"Attorney for Respondent"}
{:part_name=>"BENJAMIN,  SAMANTHA NICOLE", :party_type=>"Co-Counsel for Respondent"}

I am able to get the party addresses, but how can I merge them with @party_names_and_types so that the output looks like this?

{:part_name=>"SMALL,  DANIEL", :party_type=>"Appellant  &nbsp"}
{:part_name=>"KELLY,  MARK EDWARD", :party_type=>"Attorney for Appellant", :party_address => "134 N WATER STREETLIBERTY,MO 64068"}
{:part_name=>"PITTMAN,  KRISTI LANAE", :party_type=>"Co-Counsel for Appellant",:party_address => "134 N WATER STREETLIBERTY,MO 64068"}
{:part_name=>"RED SIMPSON,  INC.", :party_type=>"Respondent  "}
{:part_name=>"GREENWALD,  DOUGLASMARK", :party_type=>"Attorney for Respondent", :party_address => " 10 EAST CAMBRIDGE CIRCLE DRIVE KANSAS CITY,KS 66103"}
{:part_name=>"BENJAMIN,  SAMANTHA NICOLE", :party_type=>"Co-Counsel for Respondent", :party_address => "    MCANANY VAN CLEVE AND PHILLIPS  10 E CAMBRIDGE CIRCLE DR STE 300 KANSAS CITY,KS 66103", :party_des => "Business:(913) 573-3319"}


                                                    

Solution

  • You were right that the table structure is "a bit unusual". I wouldn't call your logic wrong, but I wouldn't use it for this table, because associated values (such as a party's name and its address) sit in different rows.
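
To illustrate why separately collected arrays are hard to merge here, the following is a minimal sketch in plain Ruby. The values are copied from the table in the question, but the variable names are made up for the illustration. Because the first party has no address, pairing by position attaches every address to the wrong party unless a nil placeholder is kept for the gap.

```ruby
# Names and addresses collected independently, as in the question.
names     = ['SMALL, DANIEL', 'KELLY, MARK EDWARD', 'PITTMAN, KRISTI LANAE']
addresses = ['134 N WATER STREET', '134 NORTH WATER STREET'] # SMALL has none

# zip pairs by position, so each address lands one party too early.
misaligned = names.zip(addresses).map do |name, address|
  { party_name: name, party_address: address }.compact
end

# Keeping a nil placeholder for a party without an address preserves alignment.
addresses_with_gaps = [nil, '134 N WATER STREET', '134 NORTH WATER STREET']
aligned = names.zip(addresses_with_gaps).map do |name, address|
  { party_name: name, party_address: address }.compact
end

puts misaligned
puts aligned
```

Hash#compact drops the nil entries, so parties without an address simply have no :party_address key.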

    Here is the code I wrote to produce the output you expect:

    require 'nokogiri'
    
    # html = 'your provided html code...'
    
    doc = Nokogiri::HTML(html)
    rows = doc.xpath("//table[@class='detailRecordTable']//tr")
    
    @party_names_and_types = []
    
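    # The rows repeat in groups of 5: party/attorney names, attorney address,
    # spacer, co-counsel name, co-counsel address.
    start = 0
    step  = 5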
    
    def format_text(text)
      text.strip.gsub("  ", "").gsub("\n", ' ').gsub("\t", '')
    end
    
    def get_party_name_and_type(text)
      parts = text.split(',')
      name = parts.length == 3 ? "#{parts[0]}, #{parts[1]}" : parts[0]
      party_type = format_text(parts[-1].strip) if parts && parts.length >= 2
      { party_name: name, party_type: party_type }
    end
    
    while start < rows.count
      data_rows = rows.slice(start, step)
      [0, 3].each do |row_num|
        if row_num == 0
          [1, 3].each do |col_num|
            party_details = get_party_name_and_type(
              format_text(data_rows[row_num].children.filter("td")[col_num].text)
            )
            address = data_rows[row_num+1].children.filter("td")[3].text if col_num == 3
            party_details[:party_address] = format_text(address) unless address.nil? || address.empty?
            @party_names_and_types << party_details
          end
        else
          party_details = get_party_name_and_type(
            format_text(data_rows[row_num].children.filter("td")[3].text)
          )
          address = data_rows[row_num+1].children.filter("td")[3].text
          party_details[:party_address] = format_text(address) unless address.nil? || address.empty?
          @party_names_and_types << party_details
        end
      end
      start += step
    end
    
    puts "======@party_names_and_types======"
    puts @party_names_and_types
    
    

    Output:

    ======@party_names_and_types======
    {:party_name=>"SMALL ,  DANIEL", :party_type=>"Appellant  &nbsp"}
    {:party_name=>"KELLY ,  MARK EDWARD ", :party_type=>"Attorney for Appellant", :party_address=>"134 N WATER STREETLIBERTY, MO 64068"}
    {:party_name=>"PITTMAN ,  KRISTI LANAE ", :party_type=>"Co-Counsel for Appellant", :party_address=>"134 NORTH WATER STREETLIBERTY, MO 64068"}
    {:party_name=>"RED SIMPSON,  INC. ", :party_type=>"Respondent   "}
    {:party_name=>"GREENWALD ,  DOUGLAS MARK ", :party_type=>"Attorney for Respondent", :party_address=>"10 EAST CAMBRIDGE CIRCLE DRIVEKANSAS CITY, KS 66103"}
    {:party_name=>"BENJAMIN,  SAMANTHA NICOLE ", :party_type=>"Co-Counsel for Respondent", :party_address=>"MCANANY VAN CLEVE AND PHILLIPS10 E CAMBRIDGE CIRCLE DRSTE 300KANSAS CITY, KS 66103  Business:(913) 573-3319"}
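
The output above still carries stray whitespace and a leftover "&nbsp". One likely cause is that Ruby's String#strip and the \s regex class do not treat the non-breaking space (U+00A0) as whitespace. The following is a sketch of a cleanup helper; clean_text is a hypothetical name, not part of Nokogiri or of the code above.

```ruby
# Hypothetical helper: normalise text scraped from a table cell.
def clean_text(text)
  text
    .gsub("\u00A0", ' ')   # real non-breaking spaces (decoded &nbsp;)
    .gsub(/&nbsp;?/, ' ')  # literal "&nbsp" left over from a malformed entity
    .gsub(/\s+/, ' ')      # collapse runs of spaces, tabs and newlines
    .gsub(/\s+,/, ',')     # drop the space before a comma
    .strip
end

type_text = clean_text("Appellant \u00A0&nbsp")
name_text = clean_text('SMALL ,  DANIEL')
long_name = clean_text("GREENWALD   , DOUGLAS   MARK \n")

puts type_text # => Appellant
puts name_text # => SMALL, DANIEL
puts long_name # => GREENWALD, DOUGLAS MARK
```

Applying something like this to the name, type and address strings before storing them would keep the hashes free of padding.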
    

    I'll update the answer with an explanation of the logic later.
    Hope this helps.