Given is an XML structure like
<span class="abbreviation">AGB<span class="explanation">Allgemeine Geschäftsbedingungen</span></span>
and the result after the transformation should be:
<abbr title="Allgemeine Geschäftsbedingungen">AGB</abbr>
I know that SAX is an event-based XML parser, and that with methods like #startElement(...) and #endElement(...) I can capture events (like opening a tag or closing a tag), and with #characters I can extract the text between the tags.
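To make the event order concrete, here is a minimal sketch that records the SAX events for the structure above. It assumes the JDK's built-in SAX parser; the class name SaxEventTrace and the trace method are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventTrace {

    /** Parses the given XML and returns the SAX events in the order they were reported. */
    static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("start " + qName + " class=" + atts.getValue("class"));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser is allowed to split one text node into several characters() calls.
                events.add("text " + new String(ch, start, length));
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<span class=\"abbreviation\">AGB"
                + "<span class=\"explanation\">Allgemeine Gesch\u00e4ftsbedingungen</span></span>";
        trace(xml).forEach(System.out::println);
    }
}
```

The outer span is reported first, then the text AGB, then the inner span with its text, and finally the two closing tags. Any transformation has to be assembled from exactly these callbacks.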
My Question is:
Can I create the transformation mentioned above (is it possible)?
My Problem is:
The answer is: yes, it's possible!
The main hint comes from this StackOverflow link. Here is what has to be done:
- collect the text of the two span tags (in the #characters method)
- write the abbr tag

For completeness, here is the source code of the CoreMedia CAE filter:
import com.coremedia.blueprint.cae.richtext.filter.FilterFactory;
import com.coremedia.xml.Filter;
import org.apache.commons.lang3.StringUtils;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class GlossaryFilter extends Filter implements FilterFactory {

    private static final String SPAN = "span";
    private static final String CLASS = "class";

    // State: which of the two span tags is currently open.
    private boolean isAbbreviation = false;
    private boolean isExplanation = false;

    // Text of the abbreviation span, kept until the explanation text arrives.
    private String abbreviation;
    private String currentUri;

    // State: which closing span tags still have to be swallowed.
    private boolean spanExplanationClose = false;
    private boolean spanAbbreviationClose = false;

    @Override
    public Filter getInstance(final HttpServletRequest request, final HttpServletResponse response) {
        return new GlossaryFilter();
    }

    @Override
    public void startElement(final String uri, final String localName, final String qName,
                             final Attributes attributes) throws SAXException {
        if (isSpanAbbreviationTag(qName, attributes)) {
            // Only remember the state; the span itself is not passed on.
            isAbbreviation = true;
        } else if (isSpanExplanationTag(qName, attributes)) {
            isExplanation = true;
            currentUri = uri;
        } else {
            super.startElement(uri, localName, qName, attributes);
        }
    }

    private boolean isSpanExplanationTag(final String qName, final Attributes attributes) {
        // Constant-first equals() avoids a NullPointerException for spans without a class attribute.
        //noinspection OverlyComplexBooleanExpression
        return StringUtils.isNotEmpty(qName) && qName.equalsIgnoreCase(SPAN) && (
                attributes.getLength() > 0) && "explanation".equals(attributes.getValue(CLASS));
    }

    private boolean isSpanAbbreviationTag(final String qName, final Attributes attributes) {
        //noinspection OverlyComplexBooleanExpression
        return StringUtils.isNotEmpty(qName) && qName.equalsIgnoreCase(SPAN) && (
                attributes.getLength() > 0) && "abbreviation".equals(attributes.getValue(CLASS));
    }

    @Override
    public void endElement(final String uri, final String localName, final String qName)
            throws SAXException {
        if (spanExplanationClose) {
            // Closing tag of the explanation span: swallow it.
            spanExplanationClose = false;
        } else if (spanAbbreviationClose) {
            // Closing tag of the abbreviation span: swallow it.
            spanAbbreviationClose = false;
        } else {
            super.endElement(uri, localName, qName);
        }
    }

    @Override
    public void characters(final char[] ch, final int start, final int length) throws SAXException {
        // Note: this assumes the text of each span arrives in a single characters() call.
        if (isAbbreviation && isExplanation) {
            // Text of the explanation span: now the abbr tag can be written.
            final String explanation = new String(ch, start, length);
            final AttributesImpl newAttributes = createAttributes(explanation);
            writeAbbrTag(newAttributes);
            changeState();
        } else if (isAbbreviation && !isExplanation) {
            // Text of the abbreviation span: keep it for later use.
            abbreviation = new String(ch, start, length);
        } else {
            super.characters(ch, start, length);
        }
    }

    private void changeState() {
        isExplanation = false;
        isAbbreviation = false;
        spanExplanationClose = true;
        spanAbbreviationClose = true;
    }

    @SuppressWarnings("TypeMayBeWeakened")
    private void writeAbbrTag(final AttributesImpl newAttributes) throws SAXException {
        super.startElement(currentUri, "abbr", "abbr", newAttributes);
        super.characters(abbreviation.toCharArray(), 0, abbreviation.length());
        super.endElement(currentUri, "abbr", "abbr");
    }

    private AttributesImpl createAttributes(final String explanation) {
        final AttributesImpl newAttributes = new AttributesImpl();
        newAttributes.addAttribute(currentUri, "title", "abbr:title", "CDATA", explanation);
        return newAttributes;
    }
}
The interesting stuff is in the methods startElement(...), endElement(...) and characters(...).

startElement(...):
Here you store which tag the SAX parser is currently at. More precisely, you store which span tag (class="abbreviation" or class="explanation") was opened:
- isAbbreviation for an opened span tag with class="abbreviation"
- isExplanation for an opened span tag with class="explanation"
You only store state here. The two span tags themselves are not passed on, so they are removed from the output. Every other tag is passed through without modification (that's the else block).
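One detail worth guarding against when detecting the two span tags: Attributes.getValue returns null when the attribute is missing, so a span without a class attribute must not be dereferenced. The following is a small sketch of such a check; the helper name isSpanWithClass is hypothetical and not part of the filter above.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

public class SpanClassCheck {

    /** Returns true if the element is a span whose class attribute equals expectedClass. */
    static boolean isSpanWithClass(String qName, Attributes attributes, String expectedClass) {
        // Calling equals() on the expected value is null-safe:
        // getValue("class") yields null when the span has no class attribute.
        return "span".equalsIgnoreCase(qName)
                && expectedClass.equals(attributes.getValue("class"));
    }

    public static void main(String[] args) {
        AttributesImpl withClass = new AttributesImpl();
        withClass.addAttribute("", "class", "class", "CDATA", "abbreviation");

        System.out.println(isSpanWithClass("span", withClass, "abbreviation"));
        System.out.println(isSpanWithClass("span", new AttributesImpl(), "abbreviation"));
        System.out.println(isSpanWithClass("div", withClass, "abbreviation"));
    }
}
```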
endElement(...):
Here you want to pass through every closing tag except those of the two span tags; all other tags are forwarded without modification (the else block). If the SAX parser is at the closing tag of one of the span tags (class="abbreviation" or class="explanation"), you do nothing except update the stored state.
characters(...):
In this method the magic (creating a tag with the parser) happens. Depending on the state:
1. isAbbreviation && isExplanation
2. isAbbreviation && !isExplanation
3. else

For state 3, simply copy the text you find.
For state 2, extract the content of the span tag with class="abbreviation" for later use.
For state 1:
- create the attribute for the abbr tag (title=...)
- write the abbr tag (instead of the two span tags)
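For readers without CoreMedia, the same state machine can be sketched with the JDK's XMLFilterImpl. This is an illustrative variant, not the filter above: the class name AbbrFilterSketch and the transform helper are made up, the text is buffered so that split characters() calls are tolerated, the abbr tag is written when the outer span closes, and both spans are assumed to contain only text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

public class AbbrFilterSketch extends XMLFilterImpl {
    private final StringBuilder abbreviation = new StringBuilder();
    private final StringBuilder explanation = new StringBuilder();
    private boolean inAbbreviation;
    private boolean inExplanation;

    AbbrFilterSketch(XMLReader parent) {
        super(parent);
    }

    private static boolean isSpan(String qName, Attributes atts, String cls) {
        return "span".equalsIgnoreCase(qName) && cls.equals(atts.getValue("class"));
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (!inAbbreviation && isSpan(qName, atts, "abbreviation")) {
            inAbbreviation = true;          // swallow the outer span, start collecting
            abbreviation.setLength(0);
            explanation.setLength(0);
        } else if (inAbbreviation && !inExplanation && isSpan(qName, atts, "explanation")) {
            inExplanation = true;           // swallow the inner span
        } else {
            super.startElement(uri, localName, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (inExplanation) {
            inExplanation = false;          // closing tag of the explanation span
        } else if (inAbbreviation) {
            inAbbreviation = false;         // closing tag of the abbreviation span: write <abbr>
            AttributesImpl atts = new AttributesImpl();
            atts.addAttribute("", "title", "title", "CDATA", explanation.toString());
            super.startElement(uri, "abbr", "abbr", atts);
            char[] text = abbreviation.toString().toCharArray();
            super.characters(text, 0, text.length);
            super.endElement(uri, "abbr", "abbr");
        } else {
            super.endElement(uri, localName, qName);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (inExplanation) {
            explanation.append(ch, start, length);
        } else if (inAbbreviation) {
            abbreviation.append(ch, start, length);
        } else {
            super.characters(ch, start, length);
        }
    }

    /** Runs the filter over the XML string and serializes the result with an identity transformer. */
    static String transform(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(
                new SAXSource(new AbbrFilterSketch(reader), new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<p>See <span class=\"abbreviation\">AGB"
                + "<span class=\"explanation\">Allgemeine Gesch\u00e4ftsbedingungen</span></span>.</p>"));
    }
}
```

Writing the abbr tag in endElement rather than in characters keeps the sketch independent of how the parser chunks the text; the trade-off is that nested markup inside the spans is not handled.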