Thursday, December 07, 2006

JAXP Introduction

JAXP (Java API for XML Processing) is very adaptable. It acts as an abstraction layer between your code and the XML processor implementations supplied by different vendors.

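It lets you plug in different DOM parsers, SAX parsers, and XSLT (Extensible Stylesheet Language Transformations) transformers as you require, without changing your code. This is known as the pluggability layer. The implementation is selected at run time, for example by setting the appropriate Java system property (such as javax.xml.parsers.DocumentBuilderFactory). Note, however, that this does not mean you can swap between DOM and SAX: the two APIs work in fundamentally different ways, so code written against one cannot simply be switched to the other.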

JAXP also comes with its own default parsers that implement SAX and DOM functionality.
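To see the pluggability layer at work, the short sketch below asks each JAXP factory for an instance and prints which implementation class was chosen. The class name JaxpImplementations is only illustrative. Each newInstance() call runs the standard JAXP lookup (system property, then a jaxp.properties file, then service providers, then the platform default), so with no configuration you should see the JDK's bundled defaults.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

public class JaxpImplementations {
    // Each newInstance() performs the JAXP lookup: system property first,
    // then jaxp.properties, then service providers, then the platform default.
    public static String domFactory() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static String saxFactory() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static String xsltFactory() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // The printed class names depend on the JDK and on any configured overrides
        System.out.println("DOM:  " + domFactory());
        System.out.println("SAX:  " + saxFactory());
        System.out.println("XSLT: " + xsltFactory());
    }
}
```

Launching the same program with a -D option such as -Djavax.xml.parsers.DocumentBuilderFactory=<some factory class on your classpath> should change the DOM line of the output without any change to the code.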

Simple API for XML (SAX) is an API for event-based parsing of XML documents. A SAX parser reads the XML document as a stream, from beginning to end. Each time it encounters something significant, such as the start of an element, the end of an element, or a run of character data, it fires an event by calling a method on a handler supplied by the application. The application can then handle these events as required.

SAX defines a set of callback methods, such as startElement, endElement and characters, one for each kind of event. You respond to these events either by implementing an interface such as ContentHandler or by extending the DefaultHandler class and overriding only the methods you need. SAX does not allow you to modify the XML document; it can only read it.
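As a minimal sketch of the DefaultHandler approach, the example below overrides only startElement and records each element name in document order. The class name SaxElementLister and the sample library/book document are hypothetical. Note that the handler only ever sees events as they stream past; it never gets a tree it could edit.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxElementLister {
    // Parses the XML with SAX and returns element names in the order they start
    public static List<String> listElements(String xml) throws Exception {
        List<String> names = new ArrayList<>();

        // Extend DefaultHandler and override only the callback we care about
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add(qName); // qName is the raw element name
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        // prints [library, book, book]
        System.out.println(listElements("<library><book/><book/></library>"));
    }
}
```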

The Document Object Model (DOM) API is a World Wide Web Consortium (W3C) specification for representing and manipulating XML documents. A DOM parser builds a representation of the whole XML document in memory as a tree structure. You can navigate this tree to search for elements, which correspond to nodes in the tree. You can also insert new elements into the tree and remove existing ones.
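The following sketch shows the in-memory tree in action: it parses a small document with DocumentBuilder, appends a new element, and removes an existing one. The class DomEditExample and the library/book/title names are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomEditExample {
    // Parses the whole document into an in-memory DOM tree
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Appends a new <book>, removes the first existing one, and returns the final count
    public static int editBooks(Document doc) {
        Element root = doc.getDocumentElement();

        // Insert: create a node and attach it to the tree
        Element newBook = doc.createElement("book");
        newBook.setAttribute("title", "New Title");
        root.appendChild(newBook);

        // Remove: detach the first <book> from its parent
        NodeList books = doc.getElementsByTagName("book");
        Node first = books.item(0);
        first.getParentNode().removeChild(first);

        return doc.getElementsByTagName("book").getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<library><book title=\"A\"/><book title=\"B\"/></library>");
        System.out.println(editBooks(doc)); // prints 2
    }
}
```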

When deciding which type of parser to use, consider your needs. The SAX API is fast and uses little memory, because it examines the XML serially and never holds the whole document at once. The DOM API is much more memory-intensive and CPU-intensive, because it must load the entire document into a tree before you can work with it. However, the DOM API allows you to modify the XML structure and offers greater flexibility, such as moving back and forth through the tree. Choose the API that provides the best tradeoff between your requirements and the limitations of your system.

JAXP also supports XSLT, a language for transforming XML documents into other documents by means of stylesheets. For example, using XSLT you could transform an XML document into HTML, or convert an XML document based on one schema into one that conforms to another schema.
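Here is a minimal sketch of the XML-to-HTML case using the JAXP Transformer API. The stylesheet, the class name XsltExample and the library/book input format are all hypothetical; the stylesheet simply turns each book's title attribute into an HTML list item.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltExample {
    // Hypothetical XSLT 1.0 stylesheet: <library><book title="..."/></library> -> <ul><li>...</li></ul>
    static final String XSL =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\"/>"
        + "<xsl:template match=\"/library\">"
        + "<ul><xsl:for-each select=\"book\"><li><xsl:value-of select=\"@title\"/></li></xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Compiles the stylesheet and applies it to the given XML, returning the HTML output
    public static String toHtml(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHtml("<library><book title=\"A\"/><book title=\"B\"/></library>"));
    }
}
```

The exact whitespace in the output depends on the bundled XSLT processor, since the html output method may indent by default.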
