Thornton Rose
5/23/2002
Published as "Jumping into JAXP", 5/31/2002, Gamelan.com.
Copyright © 2001, Thornton Rose
Java plus XML is a combination of skills that is currently in high demand. For Java programmers who want to jump into the XML fray, this article shows the basics of using the Java API for XML Processing (JAXP).
To try the examples in this article, you will need the following tools:
JAXP gives Java programmers a standardized API for working with XML documents, independent of the actual XML parser that is used. The classes and interfaces that comprise JAXP can be divided into three general categories:
The Document Object Model (DOM) is a standardized API for representing, navigating, and manipulating the structure and content of structured documents, such as valid HTML and XML. In DOM, a document is represented as a tree: it contains one root node, which has zero or more child nodes, each of which can in turn be the root of a subtree.
To create a DOM document from an XML file, you use classes in the javax.xml.parsers package. DocumentBuilderFactory is used to create instances of DocumentBuilder, which is used to create DOM documents from XML sources. To navigate and manipulate the document, you use the classes in the org.w3c.dom package, such as Document and Node.
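As a minimal sketch of those steps, the class below gets a DocumentBuilder from a DocumentBuilderFactory, parses XML text into a Document, and lists the element children of the root. The class name DomParseDemo, its method names, and the sample XML are hypothetical. The XML is read from a String through org.xml.sax.InputSource only to keep the example self-contained; DocumentBuilder.parse(File) reads a file directly.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomParseDemo {

    // Parses XML text into a DOM Document using the JAXP factory/builder pair.
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Returns the names of the root element's child elements, comma-separated.
    public static String childElementNames(Document doc) {
        Element root = doc.getDocumentElement();
        NodeList children = root.getChildNodes();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip text nodes (such as whitespace between tags); keep elements only.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                if (sb.length() > 0) {
                    sb.append(",");
                }
                sb.append(child.getNodeName());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<project name=\"demo\"><target/><property/></project>");
        System.out.println(doc.getDocumentElement().getNodeName()); // prints "project"
        System.out.println(childElementNames(doc));                 // prints "target,property"
    }
}
```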
DomPrint is an example that uses the JAXP DOM classes. It creates a DOM document from a given XML file, then prints the content as plain text, using indentation to show nested elements. Even though it is recursive, the algorithm for DomPrint is straightforward:
1. Check command-line arguments. If not enough arguments, print usage message, then exit.
2. Create a File object from the first command-line argument.
3. Get an instance of DocumentBuilderFactory and configure it.
4. Get a DocumentBuilder from the DocumentBuilderFactory.
5. Tell the DocumentBuilder to parse the given file and return a DOM Document.
6. Print the tree, starting from the root node:
   a. Print indentation for the given nesting level (0 = no indentation).
   b. Print the node name.
   c. If the node has attributes, print them, one per line, indented under the node name.
   d. Print the node value on the next line after the node name.
   e. If the node has children:
      i. Increment indentation level.
      ii. For each child: print the tree, starting from the child.
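The steps above might be sketched in Java as follows. This is an illustrative reconstruction, not the original DomPrint source. Several details are assumptions: the two-space indent, the name=value attribute format, setIgnoringComments(true) as the factory "configuration", skipping whitespace-only text nodes, and printing a value only when it is non-null (element nodes have a null value). The tree is appended to a StringBuilder so the output is easy to check, and String.repeat assumes Java 11 or later.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomPrint {

    public static void main(String[] args) throws Exception {
        // 1. Check command-line arguments.
        if (args.length < 1) {
            System.err.println("Usage: java DomPrint <xml-file>");
            System.exit(1);
        }
        // 2-5. Create a File, configure the factory, get a builder, parse the file.
        File file = new File(args[0]);
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(true); // assumed configuration
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(file);
        // 6. Print the tree, starting from the root element.
        StringBuilder out = new StringBuilder();
        printTree(doc.getDocumentElement(), 0, out);
        System.out.print(out);
    }

    // Recursively appends a node's name, attributes, value, and children.
    public static void printTree(Node node, int level, StringBuilder out) {
        // Assumption: whitespace-only text nodes are skipped to reduce noise.
        if (node.getNodeType() == Node.TEXT_NODE && node.getNodeValue().trim().isEmpty()) {
            return;
        }
        String indent = "  ".repeat(level);
        out.append(indent).append(node.getNodeName()).append('\n');
        // getAttributes() returns null for nodes that are not elements.
        NamedNodeMap attrs = node.getAttributes();
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                out.append(indent).append("  ").append(attr.getNodeName())
                   .append('=').append(attr.getNodeValue()).append('\n');
            }
        }
        if (node.getNodeValue() != null) {
            out.append(indent).append("  ").append(node.getNodeValue().trim()).append('\n');
        }
        // Recurse into children with one more level of indentation.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            printTree(children.item(i), level + 1, out);
        }
    }
}
```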
Running DomPrint on an Ant project file produces this output. Running it on a DocBook article produces this output.
The Simple API for XML (SAX) is an event-based API for processing XML documents. As a document is parsed, events such as the start of the document or the start of an element are reported to the application. To handle these events, the application implements event-handler interfaces.
To parse an XML document with SAX, you use the classes in the javax.xml.parsers package. SAXParserFactory is used to create instances of SAXParser, which is used to parse XML documents. To handle parsing events, you extend org.xml.sax.helpers.DefaultHandler or implement org.xml.sax.ContentHandler.
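A small SAX program can extend DefaultHandler and override only the callbacks it needs, inheriting no-op defaults for the rest. The sketch below (the class and method names are hypothetical) counts startElement events. The XML is fed from a byte array rather than a file so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCountDemo {

    // Counts startElement events; every other callback keeps DefaultHandler's no-op behavior.
    static class ElementCounter extends DefaultHandler {
        int count = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            count++;
        }
    }

    // Parses the XML text and returns the number of elements it contains.
    public static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        ElementCounter handler = new ElementCounter();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<book><title/><chapter><para/></chapter></book>")); // prints 4
    }
}
```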
SaxPrint is an example of using SAX to parse an XML document. It parses a given XML file and prints the content as block-structured text. Here is the algorithm:
1. Check command-line arguments. If not enough arguments, print usage message, then exit.
2. Create a File object from the first command-line argument.
3. Get an instance of SAXParserFactory and configure it.
4. Get a SAXParser from the SAXParserFactory.
5. Tell the SAXParser to parse the given file.
6. Handle events:
   a. On startDocument: print "BEGIN DOCUMENT".
   b. On endDocument: print "END DOCUMENT".
   c. On startElement:
      i. Print "BEGIN " + element name.
      ii. If the element has attributes, print them, indented under the element name.
   d. On endElement: print "END " + element name.
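A sketch of SaxPrint that follows those steps might look like this. It is not the original source. The handler appends to a StringBuilder so the output is easy to check, the name=value attribute format with a two-space indent is an assumption, and the factory is left non-namespace-aware so that qName carries the element name.

```java
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxPrint extends DefaultHandler {

    private final StringBuilder out;

    public SaxPrint(StringBuilder out) {
        this.out = out;
    }

    @Override
    public void startDocument() {
        out.append("BEGIN DOCUMENT\n");
    }

    @Override
    public void endDocument() {
        out.append("END DOCUMENT\n");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        out.append("BEGIN ").append(qName).append('\n');
        // Attributes are printed one per line, indented under the element name.
        for (int i = 0; i < atts.getLength(); i++) {
            out.append("  ").append(atts.getQName(i)).append('=').append(atts.getValue(i)).append('\n');
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        out.append("END ").append(qName).append('\n');
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("Usage: java SaxPrint <xml-file>");
            System.exit(1);
        }
        File file = new File(args[0]);
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(false); // the default; qName holds the element name
        SAXParser parser = factory.newSAXParser();
        StringBuilder out = new StringBuilder();
        parser.parse(file, new SaxPrint(out));
        System.out.print(out);
    }
}
```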
Running SaxPrint on an Ant project file produces this output. Running it on a DocBook article produces this output.
The Extensible Stylesheet Language Transformations (XSLT) classes are used to transform XML documents into other forms, such as other XML structures, HTML, or plain text. A transformation applies the instructions (rules) in an XSL stylesheet to an input source and produces an output result. Both the input source and the output result can be a DOM document, SAX events, or an XML stream.
To transform an XML document with XSLT, you use the classes in the javax.xml.transform package. TransformerFactory is used to create instances of Transformer, which is used to run transformations. Input sources and output results are created with the classes in the package that corresponds to the type of source or result. For example, stream sources are created with the classes in the javax.xml.transform.stream package.
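To illustrate how source and result classes come from different packages, the hypothetical DomToStream class below passes a DOMSource (from javax.xml.transform.dom) to a Transformer and collects a StreamResult (from javax.xml.transform.stream). It calls TransformerFactory.newTransformer() with no stylesheet, which gives an identity transformation, so the effect is simply to serialize an in-memory DOM document as XML text.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomToStream {

    // Builds <greeting lang="en">hello</greeting> in memory.
    public static Document buildGreeting() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("greeting");
        root.setAttribute("lang", "en");
        root.appendChild(doc.createTextNode("hello"));
        doc.appendChild(root);
        return doc;
    }

    // Serializes a DOM document with an identity transformation (no stylesheet).
    public static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(buildGreeting()));
    }
}
```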
Transform is an example of transforming a given XML file with a given XSL stylesheet. Both the input and the result are streams. Here is the algorithm:
Running Transform on article.xml using article2html.xsl produces this output.
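A minimal sketch of such a Transform program is shown below, assuming the XML file is the first command-line argument, the stylesheet is the second, and the result goes to standard output. Both the input and the result are streams; this is an illustration, not the article's original code.

```java
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Transform {

    // Applies the stylesheet's rules to the input source and writes the output result.
    public static void transform(StreamSource input, StreamSource stylesheet, StreamResult result)
            throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(stylesheet); // compiles the XSL stylesheet
        transformer.transform(input, result);
    }

    public static void main(String[] args) throws Exception {
        // Assumed argument order: XML file first, then the XSL stylesheet.
        if (args.length < 2) {
            System.err.println("Usage: java Transform <xml-file> <xsl-file>");
            System.exit(1);
        }
        transform(new StreamSource(new File(args[0])),
                  new StreamSource(new File(args[1])),
                  new StreamResult(System.out));
    }
}
```

Under these assumptions, the run described above would be something like `java Transform article.xml article2html.xsl > article.html`. Because a Transformer compiled from one stylesheet can be reused, a program that transforms many files could create it once and call transform repeatedly.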