Hello Stack Overflow, I keep running into problems with XML and Ruby.
require 'rexml/document'
require 'nokogiri' # doc below is assumed to be a Nokogiri document

# rexml
xmlfile = File.new("sample.xml")
xmldoc = REXML::Document.new xmlfile
root = xmldoc.root
# doc (Nokogiri::XML of the same file), file_name and year are assumed to be defined elsewhere
# count = 0
REXML::XPath.each(xmldoc, "//CRDoc/speech/speaking") do |element|
  # puts element.attributes['name']
  # puts element.text
  File.open(file_name + "_" + element.attributes['name'] + "-" + year + ".xml", 'a+') do |f|
    f.write("<speaker>" + element.attributes['name'] + "</speaker>")
    f.write("<speech>" + doc.xpath("//speech/speaking[@name='#{element.attributes['name']}']").text + "</speech>" + "\n")
    # f.write("<speaker>" + element.attributes['name'] + "</speaker>")
    # f.write("<speech>" + doc.xpath('//CRDoc/speech/speaking').text + "</speech>" + "\n")
    # f.write("<speech>" + doc.xpath("//CRDoc/speech/speaking[@name=#{element.attributes['name']}]").text + "</speech>" + "\n")
  end
end
The code above reads in the XML file shown below. The problem is that it writes the same speech over and over, instead of writing each individual speech in the XML file to a new XML file.
<?xml version="1.0" encoding="UTF-8"?>
<CRDoc>
<volume>141</volume>
<number>1</number>
<weekday>Wednesday</weekday>
<month>January</month>
<day>4</day>
<year>1995</year>
<chamber>House</chamber>
<pages>H3</pages>
<congress>104</congress>
<session>1</session>
<document_title>(Applause, the Members rising.)</document_title>
<title>TRIBUTE TO THE HONORABLE DONNALD K. ANDERSON</title>
<recorder>(Mr. BOEHNER asked and was given permission to address the House for
1 minute.)</recorder>
<speech><speaker name="Mr. BOEHNER">Mr. BOEHNER</speaker>.<speaking name="Mr. BOEHNER">Mr. Clerk, before we proceed with the nominations for
Speaker of the House, on behalf of Republican Members of the House, we
want to thank you for your 35 years of service to this institution, and
your 35 years of service to the American people. You have done your job
ably on behalf of all Members on both sides of the aisle.</speaking>
<speaking name="Mr. BOEHNER">And to the other officers of the House, who have served the House so
ably and the American people so ably, we want to thank them as well for
their service in this House.</speaking>
<speaking name="Mr. BOEHNER">Farewell, and best wishes from all of us.</speaking></speech>
<speech> <speaker name="Mr. FAZIO">Mr. FAZIO</speaker>.<speaking name="Mr. FAZIO">Will the gentleman yield?</speaking></speech>
<speech><speaker name="Mr. BOEHNER">Mr. BOEHNER</speaker>.<speaking name="Mr. BOEHNER">I yield to my friend, the gentleman from California [Mr.
Fazio].</speaking></speech>
<speech> <speaker name="Mr. FAZIO">Mr. FAZIO</speaker>.<speaking name="Mr. FAZIO">I appreciate my friend yielding.</speaking>
<speaking name="Mr. FAZIO">I, too, would like to add a few words of tribute to our friend.</speaking>
<speaking name="Mr. FAZIO">When the 103d Congress came to an official close on noon Tuesday, the
House literally lived on for the next 24 hours in the person of the
gentleman from Sacramento, CA, the Clerk of the House, Donnald K.
Anderson. In serving as the first presiding officer for the purpose of
organizing the 104th Congress, he fulfilled his last ministerial duty
to this institution. After four successive terms as Clerk and a career
with the House that began as a Page when Dwight Eisenhower was
President and Sam Rayburn sat in the Speaker's chair, Donn Anderson now
leaves a distinguished career of public service.</speaking>
<speaking name="Mr. FAZIO">On a personal level for many of us in this Chamber, it was only
natural for Donn Anderson to have been the thread of continuity from
one Congress to the next. For over 30 years, Donn has embodied every
good virtue of this House. He has been its memory, its defender, its
champion and often its conscience. He understood perhaps better than
anyone here the meaning of the word ``bipartisanship'' and he lived it
daily in his work with the Members. In his 8 years as the second
highest ranking officer of the House, he worked tirelessly to move the
House into the information age and so greatly benefited our
constituents, the American people.</speaking>
<speaking name="Mr. FAZIO">As chairman of the Subcommittee on Legislative Appropriations, I
looked forward to our annual ritual of hearings knowing that I could
always count on the Clerk for the most splendid testimony. Although
Donn himself admitted to his preference for Victorian manners, there
was nothing old-fashioned about the direction of his office. He was
thoroughly modern in his vision for the future of the House, and he
fought hard to keep us current with the times. Just as Donn could
explain the artistic nuances of paintings in the Rotunda, he could just
as easily give you the technical lowdown of cameras in this Chamber and
on this floor. As the House moves forward today with the institutional
reforms and the reorganization, we do so with the solid foundation left
behind by Donn Anderson.</speaking>
<speaking name="Mr. FAZIO">Perhaps in parting we can borrow a phrase from our late and great
Speaker Tip O'Neill. He simply said on so many occasions, ``So long,
old pal.''</speaking>
<speaking name="Mr. FAZIO">Thank you, Donn Anderson.</speaking></speech>
</CRDoc>
If you execute the code above, it creates one XML file for each speaker in the sample, as intended. However, each file then contains several speaker/speech nodes that all hold the SAME speech. What I don't understand is why REXML isn't traversing the nodes and writing each speech in turn, rather than the same speech over and over.
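The repetition most likely comes from the absolute XPath inside the loop: a path starting with // is evaluated from the document root, not from the current element, so every iteration for the same speaker selects the same set of speaking nodes (and .text on a Nokogiri node set concatenates all of them). The minimal REXML sketch below, run on a hypothetical trimmed-down stand-in for sample.xml (SAMPLE_XML), contrasts that absolute lookup with a relative one evaluated against the enclosing speech element.

```ruby
require 'rexml/document'

# Hypothetical, trimmed-down stand-in for sample.xml so the sketch runs on its own.
SAMPLE_XML = <<~XML
  <CRDoc>
    <speech><speaker name="Mr. BOEHNER">Mr. BOEHNER</speaker>.<speaking name="Mr. BOEHNER">Mr. Clerk, before we proceed.</speaking><speaking name="Mr. BOEHNER">Farewell.</speaking></speech>
    <speech><speaker name="Mr. FAZIO">Mr. FAZIO</speaker>.<speaking name="Mr. FAZIO">Will the gentleman yield?</speaking></speech>
    <speech><speaker name="Mr. BOEHNER">Mr. BOEHNER</speaker>.<speaking name="Mr. BOEHNER">I yield to my friend.</speaking></speech>
  </CRDoc>
XML

doc = REXML::Document.new(SAMPLE_XML)
counts = []

REXML::XPath.each(doc, "//CRDoc/speech/speaking") do |element|
  name = element.attributes['name']
  # "//..." is an absolute path: it searches the whole document, so it returns
  # the same node set on every iteration for a given speaker.
  absolute = REXML::XPath.match(doc, "//speech/speaking[@name='#{name}']")
  # A path without a leading "/" is evaluated relative to its context node,
  # here the <speech> element that contains the current <speaking>.
  relative = REXML::XPath.match(element.parent, "speaking[@name='#{name}']")
  counts << [name, absolute.size, relative.size]
end

counts.each { |name, abs, rel| puts "#{name}: absolute=#{abs} relative=#{rel}" }
```

The absolute count stays at 3 for every Boehner iteration, while the relative count reflects only the speech being processed.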
I am sure there is a better way to write the code but I am new to working with XML and XPath.
Thanks!
The expected output in the BOEHNER.xml file would be:
<speech><speaker name="Mr. BOEHNER">Mr. BOEHNER</speaker>.<speaking name="Mr. BOEHNER">Mr. Clerk, before we proceed with the nominations for
Speaker of the House, on behalf of Republican Members of the House, we
want to thank you for your 35 years of service to this institution, and
your 35 years of service to the American people. You have done your job
ably on behalf of all Members on both sides of the aisle.</speaking>
<speaking name="Mr. BOEHNER">And to the other officers of the House, who have served the House so
ably and the American people so ably, we want to thank them as well for
their service in this House.</speaking>
<speaking name="Mr. BOEHNER">Farewell, and best wishes from all of us.</speaking></speech>
<speech><speaker name="Mr. BOEHNER">Mr. BOEHNER</speaker>.<speaking name="Mr. BOEHNER">I yield to my friend, the gentleman from California [Mr.
Fazio].</speaking></speech>
As you can see, Mr. Boehner's file contains both of his speeches (four speaking elements in total), which correspond to his speeches in the sample.xml file posted above.
So each speech in the sample.xml file goes to a file named after its speaker.
<speaker>Mr. BOEHNER</speaker>
<speech>Mr. BOEHNERMr. Clerk, before we proceed with the nominations for
Speaker of the House, on behalf of Republican Members of the House, we
want to thank you for your 35 years of service to this institution, and
your 35 years of service to the American people. You have done your job
ably on behalf of all Members on both sides of the aisle.And to the other officers of the House, who have served the House so
ably and the American people so ably, we want to thank them as well for
their service in this House.Farewell, and best wishes from all of us.Mr. BOEHNERI yield to my friend, the gentleman from California [Mr.
Fazio].</speech>
The output will be in the format above: every speech made by a speaker is placed in [speakername].xml.
I believe the code below will give you what you want:
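require 'nokogiri'

doc = Nokogiri::XML(File.read("sample.xml"))
# file_name and year are the variables from the question
doc.xpath("//speech/speaking/@name").map(&:text).uniq.each do |name|
  File.open(file_name + "_" + name + "-" + year + ".xml", 'a+') do |f|
    doc.xpath('//speech').each do |speech|
      # Relative XPath: only the <speaking> children of this <speech>
      speakings = speech.xpath("speaking[@name='#{name}']")
      # Skip speeches by other speakers instead of writing an empty <speech>
      next if speakings.empty?
      f.write '<speech>'
      f.write "<speaker name=\"#{name}\">#{name}</speaker>."
      # to_xml serializes the node with its text properly escaped
      speakings.each { |speaking| f.write speaking.to_xml }
      f.write "</speech>\n"
    end
  end
end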
(Note: I'm using only Nokogiri here, not REXML.)
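Since the question uses REXML, here is a minimal sketch of the same per-speech grouping with REXML alone. It assumes, as in sample.xml, that every speech element has exactly one speaker child, and it copies each whole speech element into a list for that speaker, which matches the expected BOEHNER output shown in the question. SAMPLE_XML is a hypothetical trimmed stand-in for sample.xml, and speeches_by_speaker and write_speaker_files are illustrative helper names.

```ruby
require 'rexml/document'
require 'tmpdir'

# Hypothetical stand-in for sample.xml, trimmed to three speeches.
SAMPLE_XML = <<~XML
  <CRDoc>
    <speech><speaker name="Mr. BOEHNER">Mr. BOEHNER</speaker>.<speaking name="Mr. BOEHNER">Mr. Clerk, before we proceed.</speaking><speaking name="Mr. BOEHNER">Farewell.</speaking></speech>
    <speech> <speaker name="Mr. FAZIO">Mr. FAZIO</speaker>.<speaking name="Mr. FAZIO">Will the gentleman yield?</speaking></speech>
    <speech><speaker name="Mr. BOEHNER">Mr. BOEHNER</speaker>.<speaking name="Mr. BOEHNER">I yield to my friend.</speaking></speech>
  </CRDoc>
XML

# Returns { speaker name => [speech XML string, ...] } in document order.
def speeches_by_speaker(xml)
  doc = REXML::Document.new(xml)
  groups = Hash.new { |hash, name| hash[name] = [] }
  REXML::XPath.each(doc, "/CRDoc/speech") do |speech|
    # Relative path: only the <speaker> inside this <speech>.
    speaker = REXML::XPath.first(speech, "speaker")
    next if speaker.nil?
    groups[speaker.attributes['name']] << speech.to_s
  end
  groups
end

# Writes one file per speaker; file_name and year mirror the question's variables.
def write_speaker_files(xml, dir, file_name, year)
  speeches_by_speaker(xml).each do |name, speeches|
    path = File.join(dir, "#{file_name}_#{name}-#{year}.xml")
    File.open(path, 'a+') { |f| speeches.each { |s| f.write(s + "\n") } }
  end
end

Dir.mktmpdir do |dir|
  write_speaker_files(SAMPLE_XML, dir, "sample", "1995")
  puts Dir.children(dir).sort
end
```

REXML may serialize attributes with single quotes (name='Mr. BOEHNER'); that is still well-formed XML.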
You could also use Nokogiri to build your output XML:
doc.xpath("//speech/speaking/@name").map(&:text).uniq.each do |name|
  speaker = Nokogiri::XML('<root/>')
  doc.xpath('//speech').each do |speech|
    speech_node = Nokogiri::XML('<speech/>')
    # "*" matches both the <speaker> and the <speaking> children with this name
    speech.xpath("*[@name='#{name}']").each do |speaking|
      # dup copies the node, so the source document is left unchanged
      speech_node.root.add_child(speaking.dup)
    end
    speaker.root.add_child(speech_node.root) unless speech_node.root.children.empty?
  end
  File.open(file_name + "_" + name + "-" + year + ".xml", 'a+') do |f|
    f.write speaker.root.children
  end
end
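If you want the flatter speaker/speech format described in the question instead, the output elements can also be built with REXML rather than by concatenating strings, so that characters such as & in the speech text are escaped. This is a sketch under the assumption that each original speech becomes one speaker/speech pair whose text is the speaking texts joined with spaces; flat_output and SAMPLE_XML are hypothetical names.

```ruby
require 'rexml/document'

# Hypothetical stand-in for sample.xml; the "&amp;" checks that output is escaped.
SAMPLE_XML = <<~XML
  <CRDoc>
    <speech><speaker name="Mr. BOEHNER">Mr. BOEHNER</speaker>.<speaking name="Mr. BOEHNER">Thanks to the Clerk &amp; officers.</speaking><speaking name="Mr. BOEHNER">Farewell.</speaking></speech>
    <speech><speaker name="Mr. FAZIO">Mr. FAZIO</speaker>.<speaking name="Mr. FAZIO">Will the gentleman yield?</speaking></speech>
  </CRDoc>
XML

# Builds { speaker name => flat XML string } where each original <speech>
# becomes a <speaker>NAME</speaker><speech>TEXT</speech> pair.
def flat_output(xml)
  doc = REXML::Document.new(xml)
  out = Hash.new { |hash, name| hash[name] = +"" }
  REXML::XPath.each(doc, "/CRDoc/speech") do |speech|
    name = REXML::XPath.first(speech, "speaker/@name")&.value
    next if name.nil?
    # Join the (unescaped) text of this speech's <speaking> children only.
    text = REXML::XPath.match(speech, "speaking").map(&:text).join(" ")
    speaker_el = REXML::Element.new("speaker")
    speaker_el.add_text(name)
    speech_el = REXML::Element.new("speech")
    speech_el.add_text(text) # escaped again when the element is serialized
    out[name] << speaker_el.to_s << speech_el.to_s << "\n"
  end
  out
end

flat_output(SAMPLE_XML).each { |name, body| puts "== #{name}", body }
```

Each value could then be written to the speaker's file with File.open(..., 'a+') as in the code above.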