Published:  April 28, 2010
 
Java and XSLT
Eric M. Burke
Publisher: O'Reilly
First Edition September 2001
ISBN: 0-596-00143-6, 528 pages

Learn how to use XSL transformations in Java programs ranging from stand-alone applications to servlets. Java and XSLT introduces XSLT and then shows you how to apply transformations in real-world situations, such as developing a discussion forum, transforming documents from one form to another, and generating content for wireless devices.

Table of Contents

Preface
    Audience
    Software and Versions
    Organization
    Conventions Used in This Book
    How to Contact Us
    Acknowledgments
1. Introduction
    1.1 Java, XSLT, and the Web
    1.2 XML Review
    1.3 Beyond Dynamic Web Pages
    1.4 Getting Started
    1.5 Web Browser Support for XSLT
2. XSLT Part 1 -- The Basics
    2.1 XSLT Introduction
    2.2 Transformation Process
    2.3 Another XSLT Example, Using XHTML
    2.4 XPath Basics
    2.5 Looping and Sorting
    2.6 Outputting Dynamic Attributes
3. XSLT Part 2 -- Beyond the Basics
    3.1 Conditional Processing
    3.2 Parameters and Variables
    3.3 Combining Multiple Stylesheets
    3.4 Formatting Text and Numbers
    3.5 Schema Evolution
    3.6 Ant Documentation Stylesheet
4. Java-Based Web Technologies
    4.1 Traditional Approaches
    4.2 The Universal Design
    4.3 XSLT and EJB
    4.4 Summary of Key Approaches
5. XSLT Processing with Java
    5.1 A Simple Example
    5.2 Introduction to JAXP 1.1
    5.3 Input and Output
    5.4 Stylesheet Compilation
6. Servlet Basics and XSLT
    6.1 Servlet Syntax
    6.2 WAR Files and Deployment
    6.3 Another Servlet Example
    6.4 Stylesheet Caching Revisited
    6.5 Servlet Threading Issues
7. Discussion Forum
    7.1 Overall Process
    7.2 Prototyping the XML
    7.3 Making the XML Dynamic
    7.4 Servlet Implementation
    7.5 Finishing Touches
8. Additional Techniques
    8.1 XSLT Page Layout Templates
    8.2 Session Tracking Without Cookies
    8.3 Identifying the Browser
    8.4 Servlet Filters
    8.5 XSLT as a Code Generator
    8.6 Internationalization with XSLT
9. Development Environment, Testing, and Performance
    9.1 Development Environment
    9.2 Testing and Debugging
    9.3 Performance Techniques
10. Wireless Applications
    10.1 Wireless Technologies
    10.2 The Wireless Architecture
    10.3 Java, XSLT, and WML
    10.4 The Future of Wireless
A. Discussion Forum Code
B. JAXP API Reference
C. XSLT Quick Reference
Colophon

Preface

Java and Extensible Stylesheet Language Transformations (XSLT) are very different technologies that complement one another, rather than compete. Java's strengths are portability, its vast collection of standard libraries, and widespread acceptance by most companies. One weakness of Java, however, is in its ability to process text. For instance, Java may not be the best technology for merely converting XML files into another format such as XHTML or Wireless Markup Language (WML). Using Java for such a task requires skilled programmers who understand APIs such as DOM, SAX, or JDOM. For web sites in particular, it is desirable to simplify the page generation process so nonprogrammers can participate.

XSLT is explicitly designed for XML transformations. With XSLT, XML data can be transformed into any other text format, including HTML, XHTML, WML, and even unexpected formats such as Java source code. In terms of complexity and sophistication, XSLT is harder than HTML but easier than Java. This means that page authors can probably learn how to use XSLT successfully but will require assistance from programmers as pages are developed.

XSLT processors are required to interpret and execute the instructions found in XSLT stylesheets. Many of these processors are written in Java, making Java an excellent choice for applications that must interoperate with XML and XSLT. For web sites that utilize XSLT, Java servlets and EJBs are still required to intercept client requests, fetch data from databases, and implement business logic. XSLT may be used to generate each of the XHTML web pages, but this cannot be done without a language like Java acting as the coordinator.

This book explains the most important concepts behind the XSLT markup language but is not a comprehensive reference on that subject. Instead, the focus is on interoperability with Java, with particular emphasis on servlets and web applications.
Every concept is backed by working examples, all of which run on widely available, free tools.

Audience

Java programmers who want to learn how to use XSLT are the target audience for this book. Java programming experience is essential, and basic familiarity with XML terminology is helpful, but not required. Since so many of the examples revolve around web applications and servlets, Chapters 4 and 6 are devoted to this topic, offering a fast-paced tutorial to servlet technology. Chapters 2 and 3 contain a detailed XSLT tutorial, so no prior knowledge of XSLT is required.

This book is particularly well-suited for readers who may have read a lot about these technologies but have not used everything together in a complete application. Chapter 7, for example, presents the implementation of a web-based discussion forum from start to finish. Fully worked examples can be found in every chapter, ranging from an Ant build file documentation stylesheet in Chapter 3 to internationalization techniques in Chapter 8.

Software and Versions

Keeping up with the latest technologies is always a challenge, particularly when writing about XML-related tools. The set of tools listed in Table P-1 is sufficient to run just about every example in this book.

Table P-1. Software and versions

Tool          URL                         Description
Crimson       Included with JAXP 1.1      XML parser from Apache
JAXP 1.1      http://java.sun.com/xml     Java API for XML Processing
JDK 1.2.x     http://java.sun.com         Any Java 2 Standard Edition SDK
JDOM beta 6   http://www.jdom.org         Open source alternative to DOM
JUnit 3.7     http://www.junit.org        Open source unit testing framework
Tomcat 4.0    http://jakarta.apache.org   Open source servlet container
Xalan         Included with JAXP 1.1      XSLT processor

There are certainly other tools, most notably the SAXON XSLT processor available from http://users.iclway.co.uk/mhkay/saxon. This can easily be substituted for Xalan because of the vendor-independence that JAXP offers.

All of the examples, as well as JAR files for the tools listed in Table P-1, are available for download from http://www.javaxslt.com and from the O'Reilly web site at http://www.oreilly.com/catalog/javaxslt. The included README.txt file contains instructions for compiling and running the examples.

Organization

This book consists of 10 chapters and 3 appendixes, as follows:

Chapter 1
Provides a broad overview of the technologies covered in this book and explains how XML, XSLT, Java, and other APIs are related. Also reviews basic XML concepts for readers who are familiar with Java but do not have a lot of XML experience.

Chapter 2
Introduces XSLT syntax through a series of small examples and descriptions. Describes how to produce HTML and XHTML output and explains how XSLT works as a language. XPath syntax is also introduced in this chapter.

Chapter 3
Continues with material presented in the previous chapter, covering more sophisticated XSLT language features such as conditional logic, parameters and variables, text and number formatting, and producing XML output. This chapter concludes with a more sophisticated example that produces summary reports for Ant build files.

Chapter 4
Offers comparisons between popular web development technologies, comparing each with the Java and XSLT approach.
The model-view-controller architecture is discussed in detail, and the relationship between XSLT web applications and EJB is touched upon.

Chapter 5
Shows how to use XSLT processors with Java applications and servlets. Older Xalan and SAXON APIs are mentioned, but the primary focus is on Sun's JAXP. Key examples show how to use XSLT and SAX to transform non-XML files and data sources, how to improve performance through caching techniques, and how to interoperate with DOM and JDOM.

Chapter 6
Provides a detailed review of Java servlet programming techniques. Shows how to create web applications and WAR files, how to deploy XML and XSLT files within these web applications, and how to perform XSLT transformations from servlets.

Chapter 7
Implements a complete web application from start to finish. In this chapter, a web-based discussion forum is designed and implemented using Java, XML, and XSLT techniques. The relationship between CSS and XSLT is presented, and XHTML Strict is used for all web pages.

Chapter 8
Covers important Java and XSLT programming techniques that build upon concepts presented in earlier chapters, concluding with a detailed discussion of XSLT internationalization. Other topics include XSLT page layout templates, servlet session tracking without cookies, browser identification, and servlet filters.

Chapter 9
Offers practical advice for making a wide range of XML parsers, XSLT processors, and various other Java tools work together. Shows how to resolve conflicts with incompatible XML JAR files, how to write simple unit tests with JUnit, and how to write custom JAXP error handlers. Also discusses performance techniques and the relationship between XSLT and EJB.

Chapter 10
Describes the world of wireless technologies, with emphasis on Wireless Markup Language (WML). Shows how to detect wireless devices from a servlet, how to write XSLT stylesheets for these devices, and how to test using a variety of cell phone simulators. An online movie theater application is developed to reinforce the concepts.

Appendix A
Contains all of the remaining code from the discussion forum example presented in Chapter 7.

Appendix B
Lists and briefly describes each of the classes in Version 1.1 of the JAXP API.

Appendix C
Contains a quick reference for the XSLT language.
Lists all XSLT elements along with required and optional attributes and allowable content within each element. Also cross references each element with the W3C XSLT specification.

Conventions Used in This Book

Italic is used for:
• Pathnames, filenames, and program names
• New terms where they are defined
• Internet addresses, such as domain names and URLs

Constant width is used for:
• Anything that appears literally in a Java program, including keywords, datatypes, constants, method names, variables, class names, and interface names
• All Java code listings
• HTML, XML, and XSLT documents, tags, and attributes

Constant width italic is used for:
• General placeholders that indicate that an item is replaced by some actual value in your own program

Constant width bold is used for:
• Command-line entries
• Emphasis within a Java or XML source file

How to Contact Us

We have tested and verified the information in this book to the best of our ability, but you may find that features have changed (or even that we have made mistakes!). Please let us know about any errors you find, as well as your suggestions for future editions, by writing to:

O'Reilly & Associates, Inc.
101 Morris Street
Sebastopol, CA 95472
(800) 998-9938 (in the U.S. or Canada)
(707) 829-0515 (international/local)
(707) 829-0104 (FAX)

There is a web page for this book, which lists errata, examples, or any additional information. You can access this page at:

http://www.oreilly.com/catalog/javaxslt

To comment or ask technical questions about this book, send email to:

bookquestions@oreilly.com

For more information about books, conferences, software, Resource Centers, and the O'Reilly Network, see the O'Reilly web site at:

http://www.oreilly.com

Acknowledgments

I would like to thank my wife Jennifer for tolerating my absence during the past six months, as I have locked myself in the basement researching, writing, and thinking. I also feel fortunate that my two-year-old son Aidan goes to bed early; a vast majority of this book was written well after 8:30 P.M.!

Coming up with a list of people to thank is a difficult job because so many have influenced the material in this book. I only hope that I do not leave anyone out. All of the technical reviewers did an amazing amount of work, each offering a unique perspective and useful advice.
The official reviewers were Dean Wette, Kevin Heifner, Paul Jensen, Shane Curcuru, and Tim Brown. I would also like to thank Weiqi Gao, Shu Zhu, Santosh Shanbhag, and Suman Ganesh for help with the internationalization example in Chapter 8. A technical article by Dan Troesser inspired my servlet filter implementation, and Justin Michel and Brent Roberts reviewed some of the first chapters that I wrote.
There are two companies that I really want to thank. O'Reilly has this little link on their home page called "Write for Us." This book came into existence because I casually clicked on that link one day and decided to submit a proposal. Although my original idea was not accepted, Mike Loukides and I exchanged several emails after that in a virtual brainstorming session, and eventually the proposal for this book emerged. I am still amazed that an unknown visitor to a web site can become an O'Reilly author.

The other company I would like to thank is Object Computing, Inc. (OCI), my employer. They have a remarkable group of highly talented software engineers, all of whom are always available to answer questions, offer advice, and inspire me to learn more. These people are the reason I work for OCI and are the reason this book was possible.

Finally, I would like to thank Mark Volkmann of OCI for teaching me about XML in the first place and for answering countless questions during the past five years.

Chapter 1. Introduction

When XML first appeared, people widely believed that it was the imminent successor to HTML. This viewpoint was influenced by a variety of factors, including media hype, wishful thinking, and simple confusion about the number of new technologies associated with XML. The reality is that millions of web sites are written in HTML, and no widely used browser fully supports XML and its related standards. Even when browser vendors incorporate full support for XML and its family of related technologies, it will take years before enough people use these new versions to justify rewriting most web sites in XML. Although maintaining compatibility with older browsers is essential, companies should not hesitate to move forward with XML and related technologies on the server.

From the browser perspective, HTML will remain dominant on the Web for many years to come.
Looking beneath the hood will reveal a much different picture, however, in which HTML is used only during the last instant of presentation. Web applications must support a multitude of browsers, and the easiest way to do this is to simply transform data into HTML before sending it to the client. On the server side, XML is the preferred way to process and exchange data because it is portable, standard, and easy to work with. This is where Java and XSLT enter the picture.

1.1 Java, XSLT, and the Web

Extensible Stylesheet Language Transformations (XSLT) is designed to transform XML data into some other form, most commonly HTML, XHTML, or another XML format. An XSLT processor, such as Apache's Xalan, performs transformations using one or more XSLT stylesheets, which are also XML documents. As Figure 1-1 illustrates, XSLT can be utilized on the web tier while web browsers on the client tier deal only with HTML.

Figure 1-1. XSLT transformation
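The flow in Figure 1-1 can be sketched with the JAXP transformation API (javax.xml.transform), which ships with current JDKs. This is only a minimal sketch under stated assumptions, not code from the book: the class name TransformSketch, the <greeting> document, and the stylesheet are made up for illustration, and the XSLT processor is whatever the JDK supplies by default rather than a particular Xalan release.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: one XML input plus one XSLT input yields HTML output.
class TransformSketch {

    // Made-up XML data; a real application would build this from a database query.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<greeting>Hello, World!</greeting>";

    // Made-up stylesheet that wraps the greeting text in a minimal HTML page.
    static final String XSLT =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\"/>"
        + "<xsl:template match=\"/\">"
        + "<html><body><h1><xsl:value-of select=\"greeting\"/></h1></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Feeds the XML and the stylesheet to the processor and returns the result as text.
    static String transform(String xml, String xslt) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(XML, XSLT));
    }
}
```

In a servlet, the StreamResult would typically wrap the response's output stream or writer instead of a StringWriter, so the generated HTML goes straight to the browser.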
Typically in an XSLT- and Java-based web application, XML data is generated dynamically based on database queries. Although some newer databases can export data directly as XML, you will often write custom Java code to extract data using JDBC and convert it to XML. This XML data, such as a customized list of benefit elections or perhaps an airline schedule for a specific time window, may be different for each client using the application. In order to display this XML data on most browsers, it must first be converted to HTML. As Figure 1-1 shows, the XML data is fed into the processor as one input, and an XSLT stylesheet is provided as a second input. The output is then sent directly to the web browser as a stream of HTML. The XSLT stylesheet produces HTML formatting instructions, while the XML provides raw data.

1.1.1 What's Wrong with HTML?

One of the fundamental problems with HTML is its haphazard implementation. Although the specification for HTML is available from the World Wide Web Consortium (W3C), its evolution was driven mostly by competition between Netscape and Microsoft rather than a thoughtful design process and open standards. This resulted in a bloated language littered with browser-specific tags and varying support for standards. Since no two browsers support the exact same set of HTML features, web authors often limit themselves to a subset of HTML. Another approach is to create and maintain separate copies of each web page, which take advantage of the unique features found in a particular browser.

The limitations of HTML are compounded for dynamic sites, in which Java programs are often responsible for accessing enterprise data sources and presenting that information through the browser. Extracting information from back-end data sources is much more difficult than simple web page authoring. This requires skilled developers who know how to interact with Enterprise JavaBeans or relational databases.
Since skilled Java developers are a scarce and expensive resource, it makes sense to let them work on the back-end data sources and business logic while web page developers and less experienced programmers work on the HTML user interface. As we will see in Chapter 4, this can be difficult with traditional Java servlet approaches because Java code is often cluttered with HTML generation code.

1.1.2 Keeping Data and Presentation Separate

HTML does not separate data from presentation. For example, the following fragment of HTML displays some information about a customer. In it, data fields such as "Aidan" and "Burke" are clearly intertwined with formatting elements such as <tr> and <td>:

    <h3>Customer Information</h3>
    <table border="1" cellpadding="2" cellspacing="0">
      <tr><td>First Name:</td><td>Aidan</td></tr>
      <tr><td>Last Name:</td><td>Burke</td></tr>
      <!-- etc... -->
    </table>

Traditionally, this sort of HTML is generated dynamically using println( ) statements in a servlet, or perhaps through a JavaServer Page (JSP). Both require Java programmers, and neither technology explicitly keeps business logic and data separated from the HTML generation code. To support multiple incompatible browsers, you have to be careful to avoid duplication of a lot of Java code and the HTML itself. This places additional burdens on Java developers who should be working on more important problems.

There are ways to keep programming logic separate from the HTML generation, but extracting meaningful data from HTML pages is next to impossible. This is because the HTML does not clearly indicate how its data is structured. A human can look at HTML and determine what its fields mean, but it is quite difficult to write a computer program that can reliably extract meaningful data. Although you can search for text patterns such as First Name: followed by <td>, this approach[1] fails as soon as the presentation is modified. For example, changing the page as follows would cause this approach to fail:

    <tr><td>Full Name:</td><td>Aidan Burke</td></tr>

[1] This approach is commonly known as "screen scraping."

1.1.3 The XSLT Solution

XSLT makes it possible to define clearly the roles of Java, XML, XSLT, and HTML. Java is used for business logic, database queries and updates, and for creating XML data. The XML is responsible for raw data, while XSLT transforms the XML into HTML for viewing by a browser. A key advantage of this approach is the clean separation between the XML data and the HTML views. In order to support multiple browsers, multiple XSLT stylesheets are written, but the same XML data is reused on the server. In the previous example, the XML data for the customer did not contain any formatting instructions:

    <customer>
      <firstName>Aidan</firstName>
      <lastName>Burke</lastName>
    </customer>

Since XML contains only data, it is almost always much simpler than HTML. Additionally, XML can be created using a Java API such as JDOM (http://www.jdom.org). This facilitates error checking and validation, something that cannot be achieved if you are simply printing HTML as text using PrintWriter and println( ) statements in a servlet. Best of all, the XML-generation code has to be written only once. The XML data can then be transformed by any number of XSLT stylesheets in order to support different browsers, alternate languages, or even nonbrowser devices such as web-enabled cell phones.

1.2 XML Review

In a nutshell, XML is a format for storing structured data. Although it looks a lot like HTML, XML is much more strict with quotes, properly terminated tags, and other such details. XML does not define tag names, so document authors must invent their own set of tags or look towards a standards organization that defines a suitable XML markup language.
A markup language is essentially a set of custom tags with semantic meaning behind each tag; XSLT is one such markup language, since it is expressed using XML syntax.

The terms element and tag are often used interchangeably, and both are used in this book. Speaking from a more technical viewpoint, element refers to the concept being modeled, while tag refers to the actual markup that appears in the XML document. So <account> is a tag that represents an account element in a computer program.

1.2.1 SGML, XML, and Markup Languages

Standard Generalized Markup Language (SGML) forms the basis for HTML, XHTML, XML, and XSLT, but in very different ways for each. Figure 1-2 illustrates the relationships between these technologies.

Figure 1-2. SGML heritage
SGML is a very sophisticated metalanguage designed for large and complex documentation. As a metalanguage, it defines syntax rules for tags but does not define any specific tags. HTML, on the other hand, is a specific markup language implemented using SGML. A markup language defines its own set of tags, such as <h1> and <p>. Because HTML is a markup language instead of a metalanguage, you cannot add new tags and are at the mercy of the browser vendor to properly implement those tags.

XML, as shown in Figure 1-2, is a subset of SGML. XML documents are compatible with SGML documents; however, XML is a much smaller language. A key goal of XML is simplicity, since it has to work well on the Web, where bandwidth and limited client processing power are concerns. Because of its simplicity, XML is easier to parse and validate, making it a better performer than SGML. XML is also a metalanguage, which explains why XML does not define any tags of its own. XSLT is a particular markup language implemented using XML, and will be covered in detail in the next two chapters.

XHTML, like XSLT, is also an XML-based markup language. XHTML is designed to be a replacement for HTML and is almost completely compatible with existing web browsers. Unlike HTML, however, XHTML is based strictly on XML, and the rules for well-formed documents are very clearly defined. This means that it is much easier for vendors to develop editors and programming tools to deal with XHTML, because the syntax is much more predictable and can be validated just like any other XML document. Many of the examples in this book use XHTML instead of HTML, although XSLT can easily handle either format.

XHTML Basics

XHTML is a W3C Recommendation that represents the future of HTML. Based on HTML 4.0, XHTML is designed to be compatible with existing web browsers while complying fully with XML. This means that a properly written XHTML document is always a well-formed XML document.
Furthermore, XHTML documents must adhere to one or more of the XHTML DTDs; therefore, XHTML pages can be validated using today's XML parsers such as Apache's Crimson.

XHTML is designed to be modular; therefore, subsets can be extracted and utilized for wireless devices such as cell phones. XHTML Basic, also a W3C Recommendation, is one such modularization effort, and will likely become a force to be reckoned with in the wireless space.

Here is an example XHTML document:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Hello, World!</title>
      </head>
      <body>
        <p>Hello, World!</p>
      </body>
    </html>

Some of the most important XHTML rules include:
• XHTML documents must be well-formed XML and must adhere to one of the XHTML DTDs. As expected with XML, all elements must be properly terminated, attribute values must be quoted, and elements must be properly nested.
• The <!DOCTYPE ...> tag is required.
• Unlike HTML, tags must be lowercase.
• The root element must be <html> and must designate the XHTML namespace as shown in the previous example.
• <head> and <body> are required.

The preceding document adheres to the strict DTD, which eliminates deprecated HTML tags and many style-related tags. Two other DTDs, transitional and frameset, provide more compatibility with existing web browsers but should be avoided when possible. For full information, refer to the W3C's specifications and documentation at http://www.w3.org.

As we look at more advanced techniques for processing XML with XSLT, we will see that XML is not always dealt with in terms of a text file containing tags. From a certain perspective, XML files and their tags are really just a serialized representation of the underlying XML elements. This serialized form is good for storing XML data in files but may not be the most efficient format for exchanging data between systems or programmatically modifying the underlying data. For particularly large documents, a relational or object database offers far better scalability and performance than native XML text files.

1.2.2 XML Syntax

Example 1-1 shows a sample XML document that contains data about U.S. Presidents. This document is said to be well-formed because it adheres to several basic rules about proper XML formatting.

Example 1-1. presidents.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE presidents SYSTEM "presidents.dtd">
    <presidents>
      <president>
        <term from="1789" to="1797"/>
        <name>
          <first>George</first>
          <last>Washington</last>
        </name>
        <party>Federalist</party>
        <vicePresident>
          <name>
            <first>John</first>
            <last>Adams</last>
          </name>
        </vicePresident>
      </president>
      <president>
        <term from="1797" to="1801"/>
        <name>
          <first>John</first>
          <last>Adams</last>
        </name>
        <party>Federalist</party>
        <vicePresident>
          <name>
            <first>Thomas</first>
            <last>Jefferson</last>
          </name>
        </vicePresident>
      </president>
      <!-- remaining presidents omitted -->
    </presidents>

In HTML, a missing tag here and there or mismatched quotes are not disastrous. Browsers make every effort to go ahead and display these poorly formatted documents anyway. This makes the Web a much more enjoyable environment because users are not bombarded with constant syntax errors.

Since the primary role of XML is to represent structured data, being well-formed is very important. When two banking systems exchange data, if the message is corrupted in any way, the receiving system must reject the message altogether or risk making the wrong assumptions. This is important for XSLT programmers to understand because XSLT itself is expressed using XML. When writing stylesheets, you must always adhere to the basic rules for well-formed documents.

All well-formed XML documents must have exactly one root element. In Example 1-1, the root element is <presidents>. This forms the base of a tree data structure in which every other element has exactly one parent and zero or more children. Elements must also be properly terminated and nested:

    <name>
      <first>George</first>
      <last>Washington</last>
    </name>

Although whitespace (spaces, tabs, and linefeeds) between elements is typically irrelevant, it can make documents more readable if you take the time to indent consistently. Although XML parsers preserve whitespace, it does not affect the meaning of the underlying elements. In this example, the <first> tag must be terminated with a corresponding </first>. The following XML would be illegal because the tags are not properly nested:

    <name>
      <first>George
      <last>Washington</first>
      </last>
    </name>

XML provides an alternate syntax for terminating elements that do not have children, formally known as empty elements. The <term> element is one such example:

    <term from="1797" to="1801"/>

The closing slash indicates that this element does not contain any content, although it may contain attributes. An attribute is a name/value pair, such as from="1797". Another requirement for well-formed XML is that all attribute values be enclosed in quotes ("") or apostrophes ('').

Most presidents had middle names, some did not have vice presidents, and others had several vice presidents. For our example XML file, these are known as optional elements. Ulysses Grant, for example, had two vice presidents. He also had a middle name:

    <president>
      <term from="1869" to="1877"/>
      <name>
        <first>Ulysses</first>
        <middle>Simpson</middle>
        <last>Grant</last>
      </name>
      <party>Republican</party>
      <vicePresident>
        <name>
          <first>Schuyler</first>
          <last>Colfax</last>
        </name>
      </vicePresident>
      <vicePresident>
        <name>
          <first>Henry</first>
          <last>Wilson</last>
        </name>
      </vicePresident>
    </president>

Capitalization is also important in XML. Unlike HTML, all XML tags are case sensitive. This means that <president> is not the same as <PRESIDENT>. It does not matter which capitalization scheme you use, provided you are consistent. As you might guess, since XHTML documents are also XML documents, they too are case sensitive. In XHTML, all tags must be lowercase, such as <html>, <body>, and <head>.

The following list summarizes the basic rules for a well-formed XML document:
• It must contain exactly one root element; the remainder of the document forms a tree structure, in which every element is contained within exactly one parent.
• All elements must be properly terminated.
For example, <name>Eric</name> is properly terminated because the <name> tag is terminated with </name>. In XML, you can also create empty elements like <married/>.
Slide 14:
• Elements must be properly nested. This is legal:

    <b><i>bold and italic</i></b>

  But this is illegal:

    <b><i>bold and italic</b></i>

• Attributes must be quoted using either quotes or apostrophes. For example:

    <date month="march" day='01' year="1971"/>

• Attributes must contain name/value pairs. Some HTML elements contain marker attributes, such as <td nowrap>. In XHTML, you would write this as <td nowrap="nowrap"/>. This is compatible with XML and should work in existing web browsers.

This is not the complete list of rules but is sufficient to get you through the examples in this book. Clearly, most HTML documents are not well-formed. Many tags, such as <br> or <hr>, violate the rule that all elements must be properly terminated. In addition, browsers do not complain when attribute values are not quoted. This will have interesting ramifications for us when we write XSLT stylesheets, which are themselves written in XML but often produce HTML. What this basically means is that the stylesheet must contain well-formed XML, so it is difficult to produce HTML that is not well-formed. XHTML is certainly a more natural fit because it is also XML, just like the XSLT stylesheet.

1.2.3 Validation

A well-formed XML document adheres to the basic syntax guidelines just outlined. A valid XML document goes one step further by adhering to either a Document Type Definition (DTD) or an XML Schema. In order to be considered valid, an XML document must first be well-formed. Stated simply, DTDs are the traditional approach to validation, and XML Schemas are the logical successor. XML Schema is another specification from the W3C and offers much more sophisticated validation capabilities than DTDs. Since XML Schema is very new, DTDs will continue to be used for quite some time. You can learn more about XML Schema at http://www.w3.org/XML/Schema.
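The well-formedness rules summarized in this section can be exercised directly with a parser. The following is a minimal sketch, assuming the JAXP parser bundled with the JDK; the class name WellFormedCheck and its method are hypothetical and are not part of the book's example code. It only checks well-formedness; it does not validate against a DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is well-formed XML.
class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Any well-formedness violation surfaces as a SAXException.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Properly nested: prints true
        System.out.println(isWellFormed("<b><i>bold and italic</i></b>"));
        // Improperly nested: prints false
        System.out.println(isWellFormed("<b><i>bold and italic</b></i>"));
        // Marker attribute without a value: prints false
        System.out.println(isWellFormed("<td nowrap>x</td>"));
    }
}
```

The HTML-style marker attribute fails for the same reason the list gives: every attribute must be a quoted name/value pair.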
The second line of Example 1-1 contains the following document type declaration:

    <!DOCTYPE presidents SYSTEM "presidents.dtd">

This refers to the DTD that exists in the same directory as the presidents.xml file. In many cases, the DTD will be referenced by a URI instead:

    <!DOCTYPE presidents SYSTEM "http://www.javaxslt.com/dtds/presidents.dtd">

Regardless of where the DTD is located, it contains rules that define the allowable structure of the XML data. Example 1-2 shows the DTD for our list of presidents.

Example 1-2. presidents.dtd

    <!ELEMENT presidents (president+)>
    <!ELEMENT president (term, name, party, vicePresident*)>
    <!ELEMENT name (first, middle*, last, nickname?)>
    <!ELEMENT vicePresident (name)>
    <!ELEMENT first (#PCDATA)>
    <!ELEMENT last (#PCDATA)>
    <!ELEMENT middle (#PCDATA)>
    <!ELEMENT nickname (#PCDATA)>
    <!ELEMENT party (#PCDATA)>
    <!ELEMENT term EMPTY>
Slide 15:
    <!ATTLIST term
      from CDATA #REQUIRED
      to   CDATA #REQUIRED
    >

The first line in the DTD says that the <presidents> element can contain one or more <president> elements as children. The <president>, in turn, contains one each of <term>, <name>, and <party> in that order. It then may contain zero or more <vicePresident> elements. If the XML data did not adhere to these rules, a validating XML parser would have rejected it as invalid.

The <name> element can contain the following content: exactly one <first>, followed by zero or more <middle>, followed by exactly one <last>, followed by zero or one <nickname>. If you are wondering why <middle> can occur many times, consider this former president:

    <name>
      <first>George</first>
      <middle>Herbert</middle>
      <middle>Walker</middle>
      <last>Bush</last>
    </name>

Elements such as <first>George</first> are said to contain #PCDATA, which stands for parsed character data. This is ordinary text that the parser scans for markup, so characters such as < and & must be escaped; an element declared as (#PCDATA) holds text only, not child elements. The CDATA type, which is used for attribute values, cannot contain markup. This means that < characters appearing in attribute values will have to be encoded in your XML documents as &lt;.

The <term> element is EMPTY, meaning that it cannot have content. This is not to say that it cannot contain attributes, however. This DTD specifies that <term> must have from and to attributes:

    <term from="1869" to="1877"/>

We will not cover the remaining syntax rules for DTDs in this book, primarily because they do not have much impact on our code as we apply XSLT stylesheets. DTDs are primarily used during the parsing process, when XML data is read from a file into memory. When generating XML for a web site, you generally produce new XML rather than parse existing XML, so there is much less need to validate. One area where we will use DTDs, however, is when we examine how to write unit tests for our Java and XSLT code. This will be covered in Chapter 9.
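To see a DTD rejecting data, here is a small sketch that assumes the JDK's built-in JAXP parser. The DTD from Example 1-2 is embedded as an internal subset so that no presidents.dtd file is needed; the class name DtdValidationSketch and its validate method are hypothetical. Validity problems arrive through the error callback rather than as exceptions, so the sketch collects them in a list.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: validates a <presidents> document against the DTD.
class DtdValidationSketch {
    // The declarations of Example 1-2, inlined as an internal DTD subset.
    static final String DOCTYPE =
        "<!DOCTYPE presidents [\n"
        + "<!ELEMENT presidents (president+)>\n"
        + "<!ELEMENT president (term, name, party, vicePresident*)>\n"
        + "<!ELEMENT name (first, middle*, last, nickname?)>\n"
        + "<!ELEMENT vicePresident (name)>\n"
        + "<!ELEMENT first (#PCDATA)>\n"
        + "<!ELEMENT last (#PCDATA)>\n"
        + "<!ELEMENT middle (#PCDATA)>\n"
        + "<!ELEMENT nickname (#PCDATA)>\n"
        + "<!ELEMENT party (#PCDATA)>\n"
        + "<!ELEMENT term EMPTY>\n"
        + "<!ATTLIST term from CDATA #REQUIRED to CDATA #REQUIRED>\n"
        + "]>\n";

    /** Returns the validity error messages; an empty list means valid. */
    static List<String> validate(String body) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = dbf.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage()); // recoverable validity error
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        // <term> must come before <name>, so this record is invalid.
        String wrongOrder = "<presidents><president>"
            + "<name><first>George</first><last>Washington</last></name>"
            + "<term from=\"1789\" to=\"1797\"/><party>Federalist</party>"
            + "</president></presidents>";
        for (String message : validate(wrongOrder)) {
            System.out.println(message);
        }
    }
}
```

The exact wording of the messages depends on the parser, so the sketch prints whatever the parser reports rather than assuming a specific text.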
1.2.4 Java and XML Java APIs for XML such as SAX, DOM, and JDOM will be used throughout this book. Although we will not go into a great deal of detail on specific parsing APIs, the Java-based XSLT tools do build on these technologies, so it is important to have a basic understanding of what each API does and where it fits into the XML landscape. For in-depth information on any of these topics, you might want to pick up a copy of Java & XML by Brett McLaughlin (O'Reilly). A parser is a tool that reads XML data into memory. The most common pattern is to parse the XML data from a text file, although Java XML parsers can also read XML from any Java InputStream or even a URL. If a DTD or Schema is used, then validating parsers will ensure that the XML is valid during the parsing process. This means that once your XML files have been successfully parsed into memory, a lot less custom Java validation code has to be written. 1.2.4.1 SAX In the Java community, Simple API for XML (SAX) is the most commonly used XML parsing method today. SAX is a free API available from David Megginson and members of the XML-DEV mailing list (http://www.xml.org/xml-dev). It can be downloaded[2] from
Slide 16: http://www.megginson.com/SAX. Although SAX has been ported to several other languages, we will focus on the Java features. SAX is only responsible for scanning through XML data top to bottom and sending event notifications as elements, text, and other items are encountered; it is up to the recipient of these events to process the data. SAX parsers do not store the entire document in memory, therefore they have the potential to be very fast for even huge files. [2] One does not generally need to download SAX directly because it is supported by and included with all of the popular XML parsers. Currently, there are two versions of SAX: 1.0 and 2.0. Many changes were made in version 2.0, and the SAX examples in this book use this version. Most SAX parsers should support the older 1.0 classes and interfaces, however, you will receive deprecation warnings from the Java compiler if you use these older features. Java SAX parsers are implemented using a series of interfaces. The most important interface is org.xml.sax.ContentHandler , which has methods such as startDocument( ) , startElement( ) , characters( ) , endElement( ) , and endDocument( ) . During the parsing process, startDocument( ) is called once, then startElement( ) and endElement( ) are called once for each tag in the XML data. For the following XML: <first>George</first> the startElement( ) method will be called, followed by characters( ), followed by endElement( ). The characters( ) method provides the text "George" in this example. This basic process continues until the end of the document, at which time endDocument( ) is called. Depending on the SAX implementation, the characters( ) method may break up contiguous character data into several chunks of data. In this case, the characters( ) method will be called several times until the character data is entirely parsed. 
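The callback sequence just described can be observed with a short handler. This is a sketch under stated assumptions: it uses the JAXP SAXParserFactory shipped with the JDK and extends org.xml.sax.helpers.DefaultHandler, a convenience class with empty implementations of the ContentHandler methods; the class name SaxEventLogger is hypothetical. Because characters( ) may deliver text in several chunks, the text is buffered until the end tag arrives.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records each SAX callback it receives.
class SaxEventLogger extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        text.setLength(0); // discard whitespace seen between tags
        events.add("startElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one run of text, so accumulate.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String content = text.toString().trim();
        if (!content.isEmpty()) {
            events.add("text " + content);
        }
        text.setLength(0);
        events.add("endElement " + qName);
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    /** Parses the XML text and returns the recorded events in order. */
    static List<String> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            SaxEventLogger handler = new SaxEventLogger();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints: startDocument, startElement first, text George,
        // endElement first, endDocument (one per line)
        for (String event : parse("<first>George</first>")) {
            System.out.println(event);
        }
    }
}
```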
Since ContentHandler is an interface, it is up to your application code to somehow implement this interface and subsequently do something when the parser invokes its methods. SAX does provide a class called DefaultHandler that implements the ContentHandler interface. To use DefaultHandler, create a subclass and override the methods that interest you. The other methods can safely be ignored, since they are just empty methods. If you are familiar with AWT programming, you may recognize that this idiom is identical to event adapter classes such as java.awt.event.WindowAdapter. Getting back to XSLT, you may be wondering where SAX fits into the picture. It turns out that XSLT processors typically have the ability to gather input from a series of SAX events as an alternative to static XML files. Somewhat nonintuitively, it also turns out that you can generate your own series of SAX events rather easily -- without using a SAX parser. Since a SAX parser just calls a series of methods on the ContentHandler interface, you can write your own pseudo-parser that does the same thing. We will explore this in Chapter 5 when we talk about using SAX and an XSLT processor to apply transformations to non-XML data, such as results from a database query or content of a comma separated values (CSV) file. 1.2.4.2 DOM
Slide 17: The Document Object Model (DOM) is an API that allows computer programs to manipulate the underlying data structure of an XML document. DOM is a W3C Recommendation, and implementations are available for many programming languages. The in-memory representation of XML is typically referred to as a DOM tree because DOM is a tree data structure. The root of the tree represents the XML document itself, using the org.w3c.dom.Document interface. The document root element, on the other hand, is represented using the org.w3c.dom.Element interface. In the presidents example, the <presidents> element is the document root element. In DOM, almost every interface extends from the org.w3c.dom.Node interface; Document and Element are no exception. The Node interface provides numerous methods to navigate and modify the DOM tree consistently. Strangely enough, the DOM Level 2 Recommendation does not provide standard mechanisms for reading or writing XML data. Instead, each vendor implementation does this a little bit differently. This is generally not a big problem because every DOM implementation out there provides some mechanism for both parsing and serializing, or writing out XML files. The unfortunate result, however, is that reading and writing XML will cause vendor-specific code to creep into any application you write. At the time of this writing, a new W3C document called "Document Object Model (DOM) Level 3 Content Models and Load and Save Specification" was in the working draft status. Once this specification reaches the recommendation status, DOM will provide a standard mechanism for reading and writing XML. Since DOM does not specify a standard way to read XML data into memory, most DOM (if not all) implementations delegate this task to a dedicated parser. In the case of Java, SAX is the preferred parsing technology. Figure 1-3 illustrates the typical interaction between SAX parsers and DOM implementations. Figure 1-3. 
DOM and SAX interaction Although it is important to understand how these pieces fit together, we will not go into detailed parsing syntax in this book. As we progress to more sophisticated topics, we will almost always be generating XML dynamically rather than parsing in static XML data files. For this reason, let's look at how DOM can be used to generate a new document from scratch. Example 1-3 contains XML for a personal library. Example 1-3. library.xml
Slide 18:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE library SYSTEM "library.dtd">
    <library>
      <!-- This is an XML comment -->
      <publisher id="oreilly">
        <name>O'Reilly</name>
        <street>101 Morris Street</street>
        <city>Sebastopol</city>
        <state>CA</state>
        <postal>95472</postal>
      </publisher>
      <book publisher="oreilly" isbn="1-56592-709-5">
        <edition>1</edition>
        <publicationDate mm="10" yy="1999"/>
        <title>XML Pocket Reference</title>
        <author>Robert Eckstein</author>
      </book>
      <book publisher="oreilly" isbn="0-596-00016-2">
        <edition>1</edition>
        <publicationDate mm="06" yy="2000"/>
        <title>Java and XML</title>
        <author>Brett McLaughlin</author>
      </book>
    </library>

As shown in library.xml, a <library> consists of <publisher> elements and <book> elements. To generate this XML, we will use Java classes called Library, Book, and Publisher. These classes are not shown here, but they are really simple. For example, here is a portion of the Book class:

    public class Book {
        private String author;
        private String title;
        ...
        public String getAuthor( ) {
            return this.author;
        }
        public String getTitle( ) {
            return this.title;
        }
        ...
    }

Each of these three helper classes is merely used to hold data. The code that creates XML is encapsulated in a separate class called LibraryDOMCreator, which is shown in Example 1-4.

Example 1-4. XML generation using DOM

    package chap1;

    import java.io.*;
    import java.util.*;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    /**
Slide 19:
     * An example from Chapter 1. Creates the library XML file using the
     * DOM API.
     */
    public class LibraryDOMCreator {
        /**
         * Create a new DOM org.w3c.dom.Document object from the specified
         * Library object.
         *
         * @param library an application defined class that
         * provides a list of publishers and books.
         * @return a new DOM document.
         */
        public Document createDocument(Library library)
                throws javax.xml.parsers.ParserConfigurationException {
            // Use Sun's Java API for XML Parsing to create the
            // DOM Document
            javax.xml.parsers.DocumentBuilderFactory dbf =
                javax.xml.parsers.DocumentBuilderFactory.newInstance( );
            javax.xml.parsers.DocumentBuilder docBuilder =
                dbf.newDocumentBuilder( );
            Document doc = docBuilder.newDocument( );

            // NOTE: DOM does not provide a factory method for creating:
            // <!DOCTYPE library SYSTEM "library.dtd">
            // Apache's Xerces provides the createDocumentType method
            // on their DocumentImpl class for doing this. Not used here.

            // create the <library> document root element
            Element root = doc.createElement("library");
            doc.appendChild(root);

            // add <publisher> children to the <library> element
            Iterator publisherIter = library.getPublishers().iterator( );
            while (publisherIter.hasNext( )) {
                Publisher pub = (Publisher) publisherIter.next( );
                Element pubElem = createPublisherElement(doc, pub);
                root.appendChild(pubElem);
            }

            // now add <book> children to the <library> element
            Iterator bookIter = library.getBooks().iterator( );
            while (bookIter.hasNext( )) {
                Book book = (Book) bookIter.next( );
                Element bookElem = createBookElement(doc, book);
                root.appendChild(bookElem);
            }
            return doc;
        }

        private Element createPublisherElement(Document doc, Publisher pub) {
            Element pubElem = doc.createElement("publisher");

            // set id="oreilly" attribute
            pubElem.setAttribute("id", pub.getId( ));
Slide 20:
            Element name = doc.createElement("name");
            name.appendChild(doc.createTextNode(pub.getName( )));
            pubElem.appendChild(name);

            Element street = doc.createElement("street");
            street.appendChild(doc.createTextNode(pub.getStreet( )));
            pubElem.appendChild(street);

            Element city = doc.createElement("city");
            city.appendChild(doc.createTextNode(pub.getCity( )));
            pubElem.appendChild(city);

            Element state = doc.createElement("state");
            state.appendChild(doc.createTextNode(pub.getState( )));
            pubElem.appendChild(state);

            Element postal = doc.createElement("postal");
            postal.appendChild(doc.createTextNode(pub.getPostal( )));
            pubElem.appendChild(postal);

            return pubElem;
        }

        private Element createBookElement(Document doc, Book book) {
            Element bookElem = doc.createElement("book");
            bookElem.setAttribute("publisher", book.getPublisher().getId( ));
            bookElem.setAttribute("isbn", book.getISBN( ));

            Element edition = doc.createElement("edition");
            edition.appendChild(doc.createTextNode(
                Integer.toString(book.getEdition( ))));
            bookElem.appendChild(edition);

            Element publicationDate = doc.createElement("publicationDate");
            publicationDate.setAttribute("mm",
                Integer.toString(book.getPublicationMonth( )));
            publicationDate.setAttribute("yy",
                Integer.toString(book.getPublicationYear( )));
            bookElem.appendChild(publicationDate);

            Element title = doc.createElement("title");
            title.appendChild(doc.createTextNode(book.getTitle( )));
            bookElem.appendChild(title);

            Element author = doc.createElement("author");
            author.appendChild(doc.createTextNode(book.getAuthor( )));
            bookElem.appendChild(author);

            return bookElem;
        }

        public static void main(String[] args) throws IOException,
                javax.xml.parsers.ParserConfigurationException {
            Library lib = new Library( );
Slide 21:
            LibraryDOMCreator ldc = new LibraryDOMCreator( );
            Document doc = ldc.createDocument(lib);

            // write the Document using Apache Xerces
            // output the Document with UTF-8 encoding; indent each line
            org.apache.xml.serialize.OutputFormat fmt =
                new org.apache.xml.serialize.OutputFormat(doc, "UTF-8", true);
            org.apache.xml.serialize.XMLSerializer serial =
                new org.apache.xml.serialize.XMLSerializer(System.out, fmt);
            serial.serialize(doc.getDocumentElement( ));
        }
    }

This example starts with the usual series of import statements. Notice that the org.w3c.dom interfaces are imported, but packages such as org.apache.xml.serialize.* are not. The code is written this way in order to make it obvious that many of the classes you will use are not part of the standard DOM API. These nonstandard classes all use fully qualified class and package names in the code. Although DOM itself is a W3C recommendation, many common tasks are not covered by the spec and can only be accomplished by reverting to vendor-specific code.

The workhorse of this class is the createDocument method, which takes a Library as a parameter and returns an org.w3c.dom.Document object. This method could throw a ParserConfigurationException, which indicates that Sun's Java API for XML Parsing (JAXP) could not locate an XML parser:

    public Document createDocument(Library library)
            throws javax.xml.parsers.ParserConfigurationException {

The Library class simply stores data representing a personal library of books. In a real application, the Library class might also be responsible for connecting to a back-end data source. This arrangement provides a clear separation between XML generation code and the underlying database. The sole purpose of LibraryDOMCreator is to crank out DOM trees, making it easy for one programmer to work on this class while another focuses on the implementation of Library, Book, and Publisher.
The next step is to begin constructing a DOM Document object:

    javax.xml.parsers.DocumentBuilderFactory dbf =
        javax.xml.parsers.DocumentBuilderFactory.newInstance( );
    javax.xml.parsers.DocumentBuilder docBuilder =
        dbf.newDocumentBuilder( );
    Document doc = docBuilder.newDocument( );

This code relies on JAXP because the standard DOM API does not provide any support for creating a new Document object in a standard way. Different parsers have their own proprietary way of doing this, which brings us to the whole point of JAXP: it encapsulates differences between various XML parsers, allowing Java programmers to use a consistent API regardless of which parser they use. As we will see in Chapter 5, JAXP 1.1 adds a consistent wrapper around various XSLT processors in addition to standard SAX and DOM parsers.

JAXP provides a DocumentBuilderFactory to construct a DocumentBuilder, which is then used to construct new Document objects. The Document interface is a part of DOM, so most of the remaining code is defined by the DOM specification. In DOM, new XML elements must always be created using factory methods, such as createElement(...), on an instance of Document. These elements must then be added to
Slide 22: either the document itself or one of the elements within the document before they actually become part of the XML:

    // create the <library> document root element
    Element root = doc.createElement("library");
    doc.appendChild(root);

At this point, the <library/> element is empty, but it has been added to the document. The code then proceeds to add all <publisher> children:

    // add <publisher> children to the <library> element
    Iterator publisherIter = library.getPublishers().iterator( );
    while (publisherIter.hasNext( )) {
        Publisher pub = (Publisher) publisherIter.next( );
        Element pubElem = createPublisherElement(doc, pub);
        root.appendChild(pubElem);
    }

For each instance of Publisher, a <publisher> Element is created and then added to <library>. The createPublisherElement method is a private helper method that simply goes through the tedious DOM steps required to create each XML element. One thing that may not seem entirely obvious is the way that text is added to elements, such as O'Reilly in the <name>O'Reilly</name> tag:

    Element name = doc.createElement("name");
    name.appendChild(doc.createTextNode(pub.getName( )));
    pubElem.appendChild(name);

The first line is pretty obvious, simply creating an empty <name/> element. The next line then adds a new text node as a child of the name object rather than setting the value directly on the name. This is indicative of the way that DOM represents XML: any parsed character data is considered to be a child of a node, rather than part of the node itself. DOM uses the org.w3c.dom.Text interface, which extends from org.w3c.dom.Node, to represent text nodes. This is often a nuisance because it results in at least one extra line of code for each element you wish to generate.

The main() method in Example 1-4 creates a Library object, converts it into a DOM tree, then prints the XML text to System.out.
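The text-node idiom can be tried without Xerces on the classpath. The sketch below is an assumption-laden variation, not the book's Example 1-4: it builds a single <publisher> element and prints it with a JAXP identity transform (javax.xml.transform) instead of the Xerces serializer. The class name DomTextNodeSketch and both of its methods are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch of building and serializing a tiny DOM tree.
class DomTextNodeSketch {
    /** Creates <tag>text</tag>; the text is a child Text node. */
    static Element textElement(Document doc, String tag, String text) {
        Element elem = doc.createElement(tag);
        elem.appendChild(doc.createTextNode(text));
        return elem;
    }

    /** Builds a publisher element with an id attribute and a name child. */
    static String buildPublisherXml(String id, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element pub = doc.createElement("publisher");
            pub.setAttribute("id", id);
            doc.appendChild(pub); // elements join the tree only when appended
            pub.appendChild(textElement(doc, "name", name));

            // A Transformer created without a stylesheet copies its input.
            Transformer identity =
                TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildPublisherXml("oreilly", "O'Reilly"));
    }
}
```

Wrapping the two DOM calls in a helper such as textElement removes most of the extra line per element that the text complains about, at the cost of one more method to maintain.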
Since the standard DOM API does not provide a standard way to convert a DOM tree to XML, we introduce Xerces-specific code to convert the DOM tree to text form:

    // write the document using Apache Xerces
    // output the document with UTF-8 encoding; indent each line
    org.apache.xml.serialize.OutputFormat fmt =
        new org.apache.xml.serialize.OutputFormat(doc, "UTF-8", true);
    org.apache.xml.serialize.XMLSerializer serial =
        new org.apache.xml.serialize.XMLSerializer(System.out, fmt);
    serial.serialize(doc.getDocumentElement( ));

As we will see in Chapter 5, JAXP 1.1 does provide a mechanism to perform this task using its transformation APIs, so we do not technically have to use the Xerces code listed here. The JAXP approach maximizes portability but introduces the overhead of an XSLT processor when all we really need is DOM.

1.2.4.3 JDOM

DOM is specified in the language-independent Common Object Request Broker Architecture Interface Definition Language (CORBA IDL), allowing the same interfaces and concepts to be utilized by many different programming languages. Though valuable from a specification perspective, this approach does not take advantage of specific Java language features. JDOM is
Slide 23: a Java-only API that can be used to create and modify XML documents in a more natural way. By taking advantage of Java features, JDOM aims to simplify some of the more tedious aspects of DOM programming. JDOM is not a W3C specification, but is open source software[3] available at http://www.jdom.org.

JDOM is great from a programming perspective because it results in much cleaner, more maintainable code. Since JDOM has the ability to convert its data into a standard DOM tree, it integrates nicely with any other XML tool. JDOM can also utilize whatever XML parser you specify and can write out XML to any Java output stream or file. It even features a class called SAXOutputter that allows the JDOM data to be integrated with any tool that expects a series of SAX events.

[3] Sun has accepted JDOM as Java Specification Request (JSR) 000102; see http://java.sun.com/aboutJava/communityprocess/.

The code in Example 1-5 shows how much easier JDOM is than DOM; it does the same thing as the DOM example, but is about fifty lines shorter. This difference would be greater for more complex applications.

Example 1-5. XML generation using JDOM

    package com.oreilly.javaxslt.chap1;

    import java.io.*;
    import java.util.*;
    import org.jdom.DocType;
    import org.jdom.Document;
    import org.jdom.Element;
    import org.jdom.output.XMLOutputter;

    /**
     * An example from Chapter 1. Creates the library XML file.
     */
    public class LibraryJDOMCreator {

        public Document createDocument(Library library) {
            Element root = new Element("library");
            // JDOM supports the <!DOCTYPE...>
            DocType dt = new DocType("library", "library.dtd");
            Document doc = new Document(root, dt);

            // add <publisher> children to the <library> element
            Iterator publisherIter = library.getPublishers().iterator( );
            while (publisherIter.hasNext( )) {
                Publisher pub = (Publisher) publisherIter.next( );
                Element pubElem = createPublisherElement(pub);
                root.addContent(pubElem);
            }

            // now add <book> children to the <library> element
            Iterator bookIter = library.getBooks().iterator( );
            while (bookIter.hasNext( )) {
                Book book = (Book) bookIter.next( );
                Element bookElem = createBookElement(book);
                root.addContent(bookElem);
            }
            return doc;
Slide 24:
        }

        private Element createPublisherElement(Publisher pub) {
            Element pubElem = new Element("publisher");

            pubElem.addAttribute("id", pub.getId( ));
            pubElem.addContent(new Element("name").setText(pub.getName( )));
            pubElem.addContent(new Element("street").setText(pub.getStreet( )));
            pubElem.addContent(new Element("city").setText(pub.getCity( )));
            pubElem.addContent(new Element("state").setText(pub.getState( )));
            pubElem.addContent(new Element("postal").setText(pub.getPostal( )));

            return pubElem;
        }

        private Element createBookElement(Book book) {
            Element bookElem = new Element("book");

            // add publisher="oreilly" and isbn="1234567" attributes
            // to the <book> element
            bookElem.addAttribute("publisher", book.getPublisher().getId( ))
                    .addAttribute("isbn", book.getISBN( ));

            // now add an <edition> element to <book>
            bookElem.addContent(new Element("edition").setText(
                Integer.toString(book.getEdition( ))));

            Element pubDate = new Element("publicationDate");
            pubDate.addAttribute("mm",
                Integer.toString(book.getPublicationMonth( )));
            pubDate.addAttribute("yy",
                Integer.toString(book.getPublicationYear( )));
            bookElem.addContent(pubDate);

            bookElem.addContent(new Element("title").setText(book.getTitle( )));
            bookElem.addContent(new Element("author").setText(book.getAuthor( )));

            return bookElem;
        }

        public static void main(String[] args) throws IOException {
            Library lib = new Library( );
            LibraryJDOMCreator ljc = new LibraryJDOMCreator( );
            Document doc = ljc.createDocument(lib);

            // Write the XML to System.out, indent two spaces, include
            // newlines after each element
            new XMLOutputter("  ", true, "UTF-8").output(doc, System.out);
        }
Slide 25: } The JDOM example is structured just like the DOM example, beginning with a method that converts a Library object into a JDOM Document: public Document createDocument(Library library) { The most striking difference in this particular method is the way in which the Document and its Elements are created. In JDOM, you simply create Java objects to represent items in your XML data. This contrasts with the DOM approach, which relies on interfaces and factory methods. Creating the Document is also easy in JDOM: Element root = new Element("library"); // JDOM supports the <!DOCTYPE...> DocType dt = new DocType("library", "library.dtd"); Document doc = new Document(root, dt); As this comment indicates, JDOM allows you to refer to a DTD, while DOM does not. This is just another odd limitation of DOM that forces you to include implementation-specific code in your Java applications. Another area where JDOM shines is in its ability to create new elements. Unlike DOM, text is set directly on the Element objects, which is more intuitive to Java programmers: private Element createPublisherElement(Publisher pub) { Element pubElem = new Element("publisher"); pubElem.addAttribute("id", pub.getId( )); pubElem.addContent(new Element("name").setText(pub.getName( ))); pubElem.addContent(new Element("street").setText(pub.getStreet( ))); pubElem.addContent(new Element("city").setText(pub.getCity( ))); pubElem.addContent(new Element("state").setText(pub.getState( ))); pubElem.addContent(new Element("postal").setText(pub.getPostal( ))); return pubElem; } Since methods such as addContent( ) and addAttribute( ) return a reference to the Element instance, the code shown here could have been written as one long line. This is similar to StringBuffer.append( ), which can also be "chained" together: buf.append("a").append("b").append("c"); In an effort to keep the JDOM code more readable, however, our example adds one element per line. 
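The chaining idiom works for any class whose mutator methods return this. The sketch below illustrates the idea with a deliberately tiny, hypothetical class named ChainableElement. It is not JDOM and does not reproduce JDOM's API beyond borrowing three method names from the example above; it also performs no escaping, so it is suitable only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an element class with chainable methods.
class ChainableElement {
    private final String name;
    private final Map<String, String> attributes = new LinkedHashMap<>();
    private final List<ChainableElement> children = new ArrayList<>();
    private String text = "";

    ChainableElement(String name) {
        this.name = name;
    }

    // Each mutator returns this, which is what makes chaining possible.
    ChainableElement addAttribute(String key, String value) {
        attributes.put(key, value);
        return this;
    }

    ChainableElement setText(String newText) {
        this.text = newText;
        return this;
    }

    ChainableElement addContent(ChainableElement child) {
        children.add(child);
        return this;
    }

    /** Serializes without escaping special characters (illustration only). */
    String toXml() {
        StringBuilder sb = new StringBuilder("<").append(name);
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            sb.append(' ').append(a.getKey())
              .append("=\"").append(a.getValue()).append('"');
        }
        sb.append('>').append(text);
        for (ChainableElement child : children) {
            sb.append(child.toXml());
        }
        return sb.append("</").append(name).append('>').toString();
    }

    public static void main(String[] args) {
        // The whole publisher is built in one chained expression.
        String xml = new ChainableElement("publisher")
            .addAttribute("id", "oreilly")
            .addContent(new ChainableElement("name").setText("O'Reilly"))
            .addContent(new ChainableElement("city").setText("Sebastopol"))
            .toXml();
        System.out.println(xml);
    }
}
```

One long chain is compact, but a statement per element, as in Example 1-5, is easier to step through in a debugger.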
The final piece of this pie is the ability to print out the contents of JDOM as an XML file. JDOM includes a class called XMLOutputter, which allows us to generate the XML for a Document object in a single line of code:

    new XMLOutputter("  ", true, "UTF-8").output(doc, System.out);

The three arguments to XMLOutputter indicate that it should use two spaces for indentation, include linefeeds, and encode its output using UTF-8.

1.2.4.4 JDOM and DOM interoperability

Current XSLT processors are very flexible, generally supporting any of the following sources for XML or XSLT input:

• a DOM tree or output from a SAX parser
Slide 26:
• any Java InputStream or Reader
• a URI, file name, or java.io.File object

JDOM is not directly supported by some XSLT processors, although this is changing fast.[4] For this reason, it is typical to convert a JDOM Document instance to some other format so it can be fed into an XSLT processor for transformation. Fortunately, the JDOM package provides a class called DOMOutputter that can easily make the transformation:

    org.jdom.output.DOMOutputter outputter =
        new org.jdom.output.DOMOutputter( );
    org.w3c.dom.Document domDoc = outputter.output(jdomDoc);

[4] As this book went to press, Version 6.4 of SAXON was released with beta support for transforming JDOM trees. Additionally, JDOM beta 7 introduces two new classes, JDOMSource and JDOMResult, that interoperate with any JAXP-compliant XSLT processor.

The DOM Document object can then be used with any of the XSLT processors or a whole host of other XML libraries and tools. JDOM also includes a class that can convert a Document into a series of SAX events and another that can send XML data to an OutputStream or Writer. In time, it seems likely that tools will begin offering native support for JDOM, making extra conversions unnecessary. The details of all these techniques are covered in Chapter 5.

1.3 Beyond Dynamic Web Pages

You probably know a little bit about servlets already. Essentially, they are Java classes that run on the web tier, offering a high-performance, portable alternative to CGI scripts. Java servlets are great for extracting data from a database and then generating XHTML for the browser. They are also good for validating HTTP POST or GET requests from browsers, allowing people to fill out job applications or order books online. But more powerful techniques are required when you create web applications instead of simple web sites.

1.3.1 Web Development Challenges

When compared to GUI applications based on Swing or AWT, developing for the Web can be much more difficult.
Most of the difficulties you will encounter can be traced to one of the following:

• Hypertext Transfer Protocol (HTTP)
• HTML limitations
• browser compatibility problems
• concurrency issues

HTTP is a fairly simple protocol that enables a client to communicate with a server. Web browsers almost always use HTTP to communicate with web servers, although they may use other protocols such as HTTPS for secure connections or even FTP for file downloads. HTTP is a request/response protocol, and the browser must initiate the request. Each time you click on a hyperlink, your browser issues a new request to a web server. The server processes the request and sends a response, thus finishing the exchange.

This request/response cycle is easy to understand but makes it tedious to develop an application that maintains state information as the user moves through a complex web application. For example, as a user adds items to a shopping cart, a servlet must store that data somewhere while waiting for the client to make another request. When that request arrives, the servlet has to associate the cart with that particular client, since the servlet could be dealing with hundreds or
Slide 27: thousands of concurrent clients. Other than establishing a timeout period, the servlet has no idea when the client abandons the cart, deciding to shop on a competitor's site instead. The HTTP protocol makes it impossible for the server to initiate a conversation with the client, so the servlet cannot periodically ping the client as it can with a "normal" client/server application. HTML itself can be another hindrance to web application development. It was not designed to compete with feature-rich GUI toolkits, yet customers are increasingly demanding that applications of all sorts become "web enabled." This presents a significant challenge because HTML offers only a small set of primitive GUI components. Sophisticated HTML generation is not the subject of this book, but we will see how to use XSLT to separate complex HTML generation code from underlying programming logic and servlet code. As HTML grows ever more complex, the benefits of a clean separation become increasingly obvious. As you probably well know, browsers are not entirely compatible with one another. As a web application developer, this generally means that you have to test on a wide variety of platforms. XSLT offers support in this area because you can write reusable stylesheets for the consistent parts of HTML and import or include browser-specific stylesheet fragments to work around browser incompatibilities. Of course, the underlying XML data and programming logic is shared across all browsers, even though you may have multiple stylesheets. Finally, we have the issue of concurrency. In the servlet model, a single servlet instance must handle multiple concurrent requests. Although you can explicitly synchronize access to a servlet, this often results in performance degradation as individual client requests queue up, waiting for their turn. Processing requests in parallel will be an important part of our XSLT-based servlet designs in later chapters. 
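As a rough illustration of processing requests in parallel without synchronizing every transformation, the sketch below uses the standard javax.xml.transform API: a stylesheet is compiled once into a Templates object, which JAXP documents as safe to share between threads, and each task creates its own Transformer, which is not. The class name, the greeting stylesheet, and the <user> input format are invented for this example and are not taken from the book; a thread pool stands in for the servlet container's request threads.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ParallelTransformSketch {
    // Hypothetical stylesheet: greets the user named in the input document.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>Hello, <xsl:value-of select='user/@name'/>!</xsl:template>"
      + "</xsl:stylesheet>";

    // Each call builds its own Transformer; only the Templates object is shared.
    static String render(Templates templates, String name) throws Exception {
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        // Assumes simple names that need no XML escaping.
        String xml = "<user name='" + name + "'/>";
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    // Runs one transformation per name on a small thread pool, preserving input order.
    static List<String> greetAll(List<String> names) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Compile the stylesheet once, before any "request" arrives.
            Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSLT)));
            List<Future<String>> futures = new ArrayList<>();
            for (String name : names) {
                futures.add(pool.submit(() -> { return render(templates, name); }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("parallel transformation failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(greetAll(java.util.Arrays.asList("Ann", "Bob")));
    }
}
```

The design point is that nothing here needs a synchronized block: the only shared object is read-only after compilation, so requests do not queue up behind one another.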
1.3.2 Web Applications The difference between a "web site" and a "web application" is subjective. Although some of the technologies are the same, web applications tend to be far more interactive and more difficult to create than typical web sites. For example, a web site is mostly read-only, with occasional forms for submitting information. For this, simple technologies such as HTML combined with JavaServer Pages (JSPs) can do the job. A web application, on the other hand, is typically a custom application intended to perform a specific business or technical function. They are often written as replacements for existing systems in an effort to enable browser-based access. When replacing existing systems, developers are typically asked to duplicate all of the existing functionality, using a web browser and HTML. This is difficult at best because of HTML's limited support for sophisticated GUI components. Most of the screens in a web application are dynamically generated and customized on a per-user basis, while many pages on a typical web site are static. Java, XML, and XSLT are suitable for web applications because of the high degree of modularity they offer. While one programmer develops the back-end data access code, a graphic designer can be working on the HTML user interface. Yet another servlet expert can be working on the web tier, while someone else is defining and creating the XML data. Programmers and graphic designers will typically work together to define the XSLT stylesheets, although the current lack of interactive tools may make this more of a programming task. Another reason XML is suitable for web applications is its unique ability to interoperate with backend business systems and databases. Once an XML layer has been added to your data tier, the web tier can extract that data in XML form regardless of which operating system or hardware platform is used. 
XSLT can then convert that XML into HTML without a great deal of custom coding, resulting in less work for your development team. 1.3.3 Nonbrowser Clients While web sites typically deliver HTML to browsers, web applications may be asked to interoperate with applications other than browsers. It is typical to provide feature-rich Swing GUI
Slide 28: clients for use within a company, while remote workers access the system via an XHTML interface through a web browser. An XML approach is key in this environment because the raw XML can be sent to the Swing client, while XSLT can be used to generate the XHTML views from the same XML data. If your XML is not in the correct format, XSLT can also be used to transform it into another variant of XML. For example, a client application may expect to see: <name>Eric Burke</name> But the XML data on the web tier deals with the data as: <firstName>Eric</firstName><lastName>Burke</lastName> In this case, XSLT can be used to transform the XML into the simplified format that the client expects. 1.3.3.1 SOAP Sending raw XML data to clients is a good approach because it interoperates with any operating system, hardware platform, or programming language. Allowing Visual Basic clients to extract XML data from a web application allows existing client software to be salvaged while enabling remote access to enterprise data using a more portable solution such as Java. But defining a custom XML format is tedious because it requires you to manually write code that encodes and decodes messages between the client and the web application. Simple Object Access Protocol (SOAP) is a standardized protocol for exchanging data using XML messages. SOAP was originally introduced by Microsoft but has been submitted to the W3C for standardization and is endorsed by many companies. SOAP is fairly simple, allowing vendors to quickly create tools that simplify data exchange between web applications and any type of client. Since SOAP messages are implemented using XML, they can be created and updated using XSLT stylesheets. This means that data can be extracted from a relational database as XML, transformed with XSLT into a standard SOAP message, and then delivered to a client application written in any language. 
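The <firstName>/<lastName> to <name> conversion described earlier in this section can be sketched with the javax.xml.transform API. This is only an illustration: the <person> wrapper element is an assumption (the fragment in the text has no root element, and XML input needs one), and the class and method names are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class NameFormatSketch {
    // Web-tier format; the <person> root element is assumed for this sketch.
    static final String XML =
        "<person><firstName>Eric</firstName><lastName>Burke</lastName></person>";

    // Joins the two name parts into the single <name> element the client expects.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<name><xsl:value-of select='person/firstName'/>"
      + "<xsl:text> </xsl:text>"
      + "<xsl:value-of select='person/lastName'/></name>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the serialized result tree.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSLT));
    }
}
```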
For more information on SOAP standardization efforts, visit http://www.w3.org/TR/SOAP. 1.3.4 Wireless Cell phones, personal digital assistants (PDAs), and other handheld devices seem to be the next big thing. From a marketing perspective, it is not entirely clear how the business model of the Web will translate to the world of wireless. It is also unclear which technologies will be used for this new generation of devices. One currently popular technology is Wireless Application Protocol (WAP), which uses an XML markup language called Wireless Markup Language (WML) to render pages. Other languages have been proposed, such as Compact HTML (CHTML), but perhaps the most promising prospect is XHTML Basic. XHTML Basic is backed by the W3C and is primarily based on several XHTML modules. Its designers had the luxury of coming after WML, so they could incorporate many WML concepts and build on that experience. Because of the uncertainties in the wireless arena, an XML and XSLT approach is the safest available today. Encoding your data in XML enables flexibility to support any markup language or protocol on the client, hopefully without rewriting major pieces of Java code. Instead, new XSLT stylesheets are written to support new devices and protocols. An added benefit of XSLT is its ability to support both traditional browser clients and newer wireless clients from the same underlying XML data and Java business logic. 1.4 Getting Started
Slide 29: The best way to get started with new technologies is to experiment. For example, if you do not know XSLT, you should experiment with plenty of stylesheets as you work through the next two chapters. Aside from trying out the examples that appear in this book, you may want to invent a simple XML data file that represents something of interest to you, such as your personal music collection or family tree. Using XSLT stylesheets, try to create web pages that show your data in many different formats. Once the basics of XSLT are out of the way, servlets will be your next big challenge. Although the servlet API is not particularly difficult to learn, configuration and deployment issues can make it difficult to debug and test your applications. The best advice is to start small, writing a very basic application that proves your environment is configured correctly before moving on to more sophisticated examples. Apache's Tomcat is probably the best servlet container for beginners because it is free, easy to configure, and is the official reference implementation for Sun's servlet API. A servlet container is the server that runs servlets. Chapter 6 covers the essentials of the servlet API, but for all the details you will want to pick up a copy of Java Servlet Programming by Jason Hunter (O'Reilly). You definitely want to get the second edition because it covers the dramatic changes that were introduced in Version 2.2 of the servlet API. 1.4.1 Java XSLT Processor Choices Although this book uses primarily Sun's JAXP and Apache's Xalan, many other XSLT processors are available. Processors based on other languages may offer much higher performance when invoked from the command line, primarily because they do not incur the overhead of a Java Virtual Machine (JVM) at application startup time. When using XSLT from a servlet, however, the JVM is already running, so startup time is no longer an issue. 
Pure Java processors are great for servlets because of the ease with which they can be embedded into the web application. Simply adding a JAR file to the CLASSPATH is generally all that must be done.

Putting an up-to-date list of XSLT processors into a book is futile because the market is maturing too fast. Some of the currently popular Java-based processors are listed here, but a quick web search for "XSLT Processors" would be prudent before you decide to standardize on a particular tool, as new processors are constantly appearing. We will see how to use Xalan in the next chapter; a few other choices are listed here.

1.4.1.1 XT

XT was one of the earliest XSLT processors, written by James Clark. If you have read the XSLT specification, you may recognize him as its editor. As the XSLT specification evolved, XT followed a parallel path of evolution, making it a leader in terms of standards compliance. At the time of this writing, however, XT had not been updated as recently as some of the other Java-based processors. Version 19991105 of XT implements the W3C's proposed-recommendation (PR-xslt-19991008) version of XSLT and is available at http://www.jclark.com/xml/xt.html. Like the other processors listed here, XT is free.

1.4.1.2 LotusXSL

LotusXSL is a Java XSLT processor from IBM Alphaworks available at http://www.alphaworks.ibm.com. In November 1999 IBM donated LotusXSL to Apache, forming the basis for Xalan. LotusXSL continues to exist as a separate product, but it is currently a thin wrapper around the Xalan processor. Future versions of LotusXSL may add features above and beyond those offered by Xalan, but there doesn't seem to be a compelling reason to choose LotusXSL unless you are already using it.

1.4.1.3 SAXON

The SAXON XSLT processor from Michael Kay is available at http://saxon.sourceforge.net. SAXON is open source software in accordance with the Mozilla Public License and is a very
Slide 30: popular alternative to Xalan. SAXON provides full support for the current XSLT specification and is very well documented. It also provides several value-added features such as the ability to output multiple result trees from the same transformation and update the values of variables within stylesheets.

To transform a document using SAXON, first include saxon.jar in your CLASSPATH. Then type java com.icl.saxon.StyleSheet -? to list all available options. The basic syntax for transforming a document is as follows:

    java com.icl.saxon.StyleSheet [options] source-doc style-doc [params...]

To transform the presidents.xml file and send the results to standard output, type the following:

    java com.icl.saxon.StyleSheet presidents.xml presidents.xslt

1.4.1.4 JAXP

Version 1.1 of Sun's Java API for XML Processing (JAXP) contains support for XSLT transformations, a notable omission from earlier versions of JAXP. It can be downloaded from http://java.sun.com/xml. Parsing XML and transforming XSLT are not the primary focus of JAXP. Instead, the key goal is to provide a standard Java interface to a wide variety of XML parsers and XSLT processors. Although JAXP does include reference implementations of XML parsers and an XSLT processor, its key benefit is the choice of tools afforded to Java developers. Vendor lock-in should be much less of an issue thanks to JAXP.

Since JAXP is primarily a Java-based API, we will cover its programmatic interfaces in depth as we talk about XSLT programming techniques in Chapter 5. JAXP currently includes Apache's Xalan as its default XSLT processor, so the Xalan instructions presented in Chapter 2 will also apply to JAXP.

1.5 Web Browser Support for XSLT

In a web application environment, performing XSLT transformations on the client instead of the server is valuable for a number of reasons. Most importantly, it reduces the workload on the server machine, allowing a greater number of clients to be served.
Once a stylesheet is downloaded to the client, subsequent requests will presumably use a cached copy, therefore only the raw XML data will need to be transmitted with each request. This has the potential to greatly reduce bandwidth requirements. Even more interesting tricks are possible when JavaScript is introduced into the equation. You can programmatically modify either the XML data or the XSLT stylesheet on the client side, reapply the stylesheet, and see the results immediately without requesting a new document from the server. Microsoft introduced XSLT support into Version 5.0 of Internet Explorer, but the XSLT specification was not finalized at the time. Unfortunately, significant changes were made to XSLT before it was finally promoted to a W3C Recommendation, but IE had already shipped using the older version of the specification. Although Microsoft has done a good job updating its MSXML parser with full support for the final XSLT Recommendation, millions of users will probably stick to IE 5.0 or 5.5 for quite some time, making it very difficult to perform portable XSLT transformations on the client. For IE 5.0 or 5.5 users, the MSXML parser is available as a separate download from Microsoft. Once downloaded, installed, and configured using a separate program called xmlinst, the browser will be compliant with Version 1.0 of the XSLT recommendation. This is something that developers will want to do, but probably very few end users will have the technical skills to go through these steps. At the time of this writing, Netscape had not introduced support for XSLT into its browsers. We hope this changes by the time this book is published. Although their implementation will be
Slide 31: released much later than Microsoft's, it should be compliant with the latest XSLT Recommendation. Yet another alternative is to utilize a browser plug-in that supports XSLT, although this approach is probably most effective within the confines of a corporation. In this environment, the browser can be controlled to a certain extent, allowing client-side transformations much sooner than possible on public web sites. Because XSLT transformation on the client will likely be mired in browser compatibility issues for several years, the role of Java with respect to XSLT will continue to be important. One use will be to detect the browser using a Java servlet, and then deliver the appropriate stylesheet to the client only if a compliant browser is in use. Otherwise, the servlet will drive the transformation process by invoking the XSLT processor on the web server. Once we finish with XSLT syntax in the next two chapters, the role of Java and XSLT will be covered throughout the remainder of this book. Chapter 2. XSLT Part 1 -- The Basics Extensible Stylesheet Language (XSL) is a specification from the World Wide Web Consortium (W3C) and is broken down into two complementary technologies: XSL Formatting Objects and XSL Transformations (XSLT). XSL Formatting Objects, a language for defining formatting such as fonts and page layout, is not covered in this book. XSLT, on the other hand, was primarily designed to transform a well-formed XML document into XSL Formatting Objects. Even though XSLT was designed to support XSL Formatting Objects, it has emerged as the preferred technology for all sorts of transformations. Transformation from XML to HTML is the most common, but XSLT can also be used to transform well-formed XML into just about any text file format. 
This will give XML- and XSLT-based web sites a major leg up as wireless devices become more prevalent because XSLT can also be used to transform XML into Wireless Markup Language or some other stripped-down format that wireless devices will require. 2.1 XSLT Introduction Why is transformation so important? XML provides a simple syntax for defining markup, but it is up to individuals and organizations to define specific markup languages. There is no guarantee that two organizations will use the exact same markup; in fact, you may struggle to agree on consistent formats within the same group or company. One group may use <employee>, while others may use <worker> or <associate>. In order to share data, the XML data has to be transformed into a common format. This is where XSLT shines -- it eliminates the need to write custom computer programs to transform data. Instead, you simply create one or more XSLT stylesheets. An XSLT processor is an application that applies an XSLT stylesheet to an XML data source. Instead of modifying the original XML data, the result of the transformation is copied into something called a result tree, which can be directed to a static file, sent directly to an output stream, or even piped into another XSLT processor for further transformations. Figure 2-1 illustrates the transformation process, showing how the XML input, XSLT stylesheet, XSLT processor, and result tree relate to one another. Figure 2-1. XSLT transformation
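The flow in Figure 2-1, including the option of piping one result tree into a second transformation, can be sketched with the javax.xml.transform API. The sample data, both stylesheets, and all class and method names below are assumptions made for illustration: the first stylesheet converts <worker> elements into the "common" <employee> format, and the second consumes that in-memory result tree and reports how many employees it contains.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PipelineSketch {
    static final String XSL_NS = "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'";

    // Hypothetical input that uses <worker> where our common format uses <employee>.
    static final String XML = "<staff><worker>Ann</worker><worker>Bob</worker></staff>";

    // Step 1: rename each <worker> to <employee>.
    static final String TO_COMMON_FORMAT =
        "<xsl:stylesheet version='1.0' " + XSL_NS + ">"
      + "<xsl:template match='/'><staff><xsl:apply-templates select='staff/worker'/></staff></xsl:template>"
      + "<xsl:template match='worker'><employee><xsl:value-of select='.'/></employee></xsl:template>"
      + "</xsl:stylesheet>";

    // Step 2: count the <employee> elements in the result tree of step 1.
    static final String COUNT_EMPLOYEES =
        "<xsl:stylesheet version='1.0' " + XSL_NS + ">"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='count(staff/employee)'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Runs both transformations, handing the first result tree to the second as a DOM.
    static String run(String xml) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer first = factory.newTransformer(
                new StreamSource(new StringReader(TO_COMMON_FORMAT)));
            Transformer second = factory.newTransformer(
                new StreamSource(new StringReader(COUNT_EMPLOYEES)));

            // The original XML is not modified; the output goes to a new tree.
            DOMResult resultTree = new DOMResult();
            first.transform(new StreamSource(new StringReader(xml)), resultTree);

            StringWriter out = new StringWriter();
            second.transform(new DOMSource(resultTree.getNode()), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(XML));
    }
}
```

Swapping the StreamResult for one that wraps a java.io.File or an OutputStream would direct the same result tree to a static file or an output stream instead.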
Slide 32: The XML input and XSLT stylesheet are normally two separate entities.[1] For the examples in this chapter, the XML will always reside in a text file. In future chapters, however, we will see how to improve performance by dealing with the XML as an in-memory object tree. This makes sense from a Java/XSLT perspective because most web applications will generate XML dynamically rather than deal with a series of static files. Since the XML data and XSLT stylesheet are clearly separated, it is very plausible to write several different stylesheets that convert the same XML into radically different formats.

[1] Section 2.7 of the XSLT specification covers embedded stylesheets.

XSLT transformation can occur on either the client or server, although server-side transformations are currently dominant. Since a vast majority of Internet users do not use XSLT-compliant browsers (at the time of this writing), the typical model is to transform XML into HTML on the web server so the browser sees only the resulting HTML. In a closed corporate environment where the browser feature set can be controlled, moving the XSLT transformation process to the browser can improve scalability and reduce network traffic.

It should be noted that XSLT stylesheets do not perform the same function as Cascading Style Sheets (CSS), which you may be familiar with. In the CSS model, style elements are applied to HTML or XML on the web browser, affecting formatting such as fonts and colors. CSS does not produce a separate result tree and cannot be applied in advance using a standalone processor as XSLT can. The CSS processing model operates on the underlying data in a top-down fashion in a single pass, while XSLT can iterate and perform conditional logic on the XML data. Although XSLT can produce style instructions, its true role is that of a transformation language rather than a style language. XSL Formatting Objects, on the other hand, is a style language that is much more comparable to CSS.
For wireless applications, HTML is not typically generated. Instead, Wireless Markup Language (WML) is the current standard for cell phones and other wireless devices. In the future, new standards such as XHTML Basic may be used. When using an XSLT approach, the same XML data can be transformed into many forms, all via different stylesheets. Regardless of how many stylesheets are used, the XML data will remain unchanged. A typical web site might have the following stylesheets for a single XML home page:

homeBasic.xslt
    For older web browsers
homeIE5.xslt
    Takes advantage of newer Internet Explorer features
homeMozilla.xslt
    Takes advantage of newer Netscape features
homeWML.xslt
    Transforms into Wireless Markup Language
homeB2B.xslt
    Transforms the XML into another XML format, suitable for "B2B-style" XML data feeds to customers
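One way to picture how a server might pick among the stylesheets listed above is a small selection routine. The sketch below is hypothetical: the stylesheet filenames come from the list, but the class name, the method signature, and the detection rules (substring checks on the User-Agent and Accept request headers) are assumptions chosen for brevity, and real browser detection is considerably messier.

```java
class StylesheetChooser {
    // Returns the stylesheet filename for a client, based on two HTTP request headers.
    // Both arguments may be null when the client did not send the header.
    static String stylesheetFor(String userAgent, String accept) {
        String ua = (userAgent == null) ? "" : userAgent;
        String acc = (accept == null) ? "" : accept;

        // WML-capable devices advertise the WML MIME type in their Accept header.
        if (acc.contains("text/vnd.wap.wml")) {
            return "homeWML.xslt";
        }
        // Assumed heuristic: IE 5.x identifies itself with "MSIE 5".
        if (ua.contains("MSIE 5")) {
            return "homeIE5.xslt";
        }
        // Assumed heuristic: Mozilla-based browsers include "Gecko".
        if (ua.contains("Gecko")) {
            return "homeMozilla.xslt";
        }
        // Anything unrecognized gets the most conservative page.
        return "homeBasic.xslt";
    }

    public static void main(String[] args) {
        System.out.println(stylesheetFor("Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)", "text/html"));
        System.out.println(stylesheetFor(null, null));
    }
}
```

Whichever file is chosen, the XML passed to the processor stays the same; only the stylesheet argument changes.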
Slide 33: Schema evolution implies an upgrade to an existing data source where the structure of the data must be modified. When the data is stored in XML format, XSLT can be used to support schema evolution. For example, Version 1.0 of your application may store all of its files in XML format, but Version 2.0 might add new features that cannot be supported by the old 1.0 file format. A perfect solution is to write a single stylesheet to transform all of the old 1.0 XML files to the new 2.0 file format.

2.1.1 An XSLT Example

You need three components to perform XSLT transformations: an XML data source, an XSLT stylesheet, and an XSLT processor. The XSLT stylesheet is actually a well-formed XML document, so the XSLT processor will also include or use an XML parser. Apache's Xalan is used for most of the examples in this book; the previous chapter listed several other processors that you may want to investigate. You can download Xalan from http://xml.apache.org. It uses and includes Apache's Xerces parser, but can be configured to use other parsers. The ability to swap out parsers is important because this gives you the flexibility to use the latest innovations as competing (and perhaps faster) parsers are released.

Example 2-1 represents an early prototype of a discussion forum home page. The complete discussion forum application will be developed in Chapter 7. This is the raw XML data, without any formatting instructions or HTML. As you can see, the home page simply lists the message boards that the user can choose to view.

Example 2-1. discussionForumHome.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <discussionForumHome>
      <messageBoard id="1" name="Java Programming"/>
      <messageBoard id="2" name="XML Programming"/>
      <messageBoard id="3" name="XSLT Questions"/>
    </discussionForumHome>

It is assumed that this data will be generated dynamically as the result of a database query, rather than hardcoded as a static XML file.
Regardless of its origin, the XML data says nothing about how to actually display the web page. For clarity, we will keep the XSLT stylesheet fairly simple at this point. The beauty of an XML/XSLT approach is that you can beef up the stylesheet later on without compromising any of the underlying XML data structures. Even more importantly, the Java code that will generate the XML data does not have to be cluttered up with HTML and user interface logic; it just produces the basic XML data. Once the format of the data has been defined, a Java programmer can begin working on the database logic and XML generation code, while another team member begins writing the XSLT stylesheets.

Example 2-2 lists the XSLT stylesheet that produces the home page. Don't worry if not everything in this first example makes sense. XSLT is, after all, a completely new language. We will cover everything in detail throughout the remainder of this and the next chapter.

Example 2-2. discussionForumHome.xslt

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html"/>

      <!-- match the document root -->
      <xsl:template match="/">
        <html>
          <head>
Slide 34:
            <title>Discussion Forum Home Page</title>
          </head>
          <body>
            <h1>Discussion Forum Home Page</h1>
            <h3>Please select a message board to view:</h3>
            <ul>
              <xsl:apply-templates select="discussionForumHome/messageBoard"/>
            </ul>
          </body>
        </html>
      </xsl:template>

      <!-- match a <messageBoard> element -->
      <xsl:template match="messageBoard">
        <li>
          <a href="viewForum?id={@id}">
            <xsl:value-of select="@name"/>
          </a>
        </li>
      </xsl:template>
    </xsl:stylesheet>

The filename extension for XSLT stylesheets is irrelevant. In this book, .xslt is used. Many stylesheet authors prefer .xsl.

The first thing that should jump out immediately is the fact that the XSLT stylesheet is also a well-formed XML document. Do not let the xsl: namespace prefix fool you -- everything in this document adheres to the same basic rules that every other XML document must follow. Like other XML files, the first line of the stylesheet is an XML declaration:

    <?xml version="1.0" encoding="UTF-8"?>

Unless you are dealing with internationalization issues, this will remain unchanged for every stylesheet you write. This line is immediately followed by the document root element, which contains the remainder of the stylesheet:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

The <xsl:stylesheet> element has two attributes in this case. The first, version="1.0", specifies the version of the XSLT specification. Although this is the current version at the time of this writing, the next version of the XSLT specification is well underway and may be finished by the time you read this. You can stay abreast of the latest XSLT developments by visiting the W3C home page at http://www.w3.org.

The next attribute declares the XML namespace, defining the meaning of the xsl: prefix you see on all of the XSLT elements. The prefix xsl is conventional, but could be anything you choose.
This is useful if your document already uses the xsl prefix for other elements, and you do not want to introduce a naming conflict. This is really the entire point of namespaces: they help to avoid name conflicts. In XML, <a:book> and <b:book> can be discerned from one another because each book has a different namespace prefix. Since you pick the namespace prefix, this avoids the possibility that two vendors will use conflicting prefixes.
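To check that the prefix really is arbitrary, the sketch below runs the same tiny transformation twice with the javax.xml.transform API: once with the conventional xsl prefix and once with a prefix of t bound to the same namespace URI. The <greeting> input, the stylesheets, and the class and method names are invented for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PrefixSketch {
    static final String XML = "<greeting>hi</greeting>";

    // Builds the same stylesheet using whatever namespace prefix is requested.
    static String stylesheetWithPrefix(String p) {
        return "<" + p + ":stylesheet version='1.0' xmlns:" + p
             + "='http://www.w3.org/1999/XSL/Transform'>"
             + "<" + p + ":output method='text'/>"
             + "<" + p + ":template match='/'>"
             + "<" + p + ":value-of select='greeting'/>"
             + "</" + p + ":template>"
             + "</" + p + ":stylesheet>";
    }

    // Applies the stylesheet to the XML and returns the serialized result tree.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Both lines print the same text because only the namespace URI matters.
        System.out.println(transform(XML, stylesheetWithPrefix("xsl")));
        System.out.println(transform(XML, stylesheetWithPrefix("t")));
    }
}
```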
Slide 35: In the case of XSLT, the namespace prefix does not have to be xsl, but the value does have to be http://www.w3.org/1999/XSL/Transform. The value of a namespace is not necessarily a real web site, but the syntax is convenient because it helps ensure uniqueness. In the case of XSLT, 1999 represents the year that the URL was allocated for this purpose, and is not related to the version number. It is almost certain that future versions of XSLT will continue to use this same URL. Even the slightest typo in the namespace will render the stylesheet useless for most processors. The text must match http://www.w3.org/1999/XSL/Transform exactly, or your stylesheet will not be processed. Spelling or capitalization errors are a common mistake and should be the first thing you check when things are not working as you expect. The next line of the stylesheet simply indicates that the result tree should be treated as an HTML document instead of an XML document: <xsl:output method="html"/> In Version 1.0 of XSLT, processors are not required to fully support this element. Xalan does, however, so we will include this in all of our stylesheets. Since the XSLT stylesheet itself must be written as well-formed XML, some HTML tags are difficult to include. Instead of writing <hr>, you must write <hr/> in your stylesheet. When the output method is html, processors such as Xalan will remove the slash (/) character from the result tree, which produces HTML that typical web browsers expect. The remainder of our stylesheet consists of two templates . Each matches some pattern in the XML input document and is responsible for producing output to the result tree. 
The first template is repeated as follows:

    <xsl:template match="/">
      <html>
        <head>
          <title>Discussion Forum Home Page</title>
        </head>
        <body>
          <h1>Discussion Forum Home Page</h1>
          <h3>Please select a message board to view:</h3>
          <ul>
            <xsl:apply-templates select="discussionForumHome/messageBoard"/>
          </ul>
        </body>
      </html>
    </xsl:template>

When the XSLT processor begins its transformation process, it looks in your stylesheet for a template that matches the "/" pattern. This pattern matches the source XML document that is being transformed. You may recall from Chapter 1 that DOM uses the Document interface to represent the document, which is what we are matching here. This is always the starting point for processing, so nearly every stylesheet you write will contain a template similar to this one. Since this is the first template to be instantiated, it is also where we create the framework for the resulting HTML document.

The second template, which matches the "messageBoard" pattern, is currently ignored. This is because the processor is only looking at the root of the XML document, and the <messageBoard> element is nested beneath the <discussionForumHome> element.
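A small experiment makes it easier to see that the "messageBoard" template only runs when the root template asks for it. The sketch below, which uses the javax.xml.transform API, builds two cut-down variants of the stylesheet: one whose root template contains <xsl:apply-templates> and one whose root template does not. The class name, the helper methods, and the trimmed stylesheet are assumptions made for this illustration; the XML is the data from Example 2-1.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ApplyTemplatesSketch {
    // The data from Example 2-1.
    static final String XML =
        "<discussionForumHome>"
      + "<messageBoard id=\"1\" name=\"Java Programming\"/>"
      + "<messageBoard id=\"2\" name=\"XML Programming\"/>"
      + "<messageBoard id=\"3\" name=\"XSLT Questions\"/>"
      + "</discussionForumHome>";

    static final String APPLY =
        "<xsl:apply-templates select=\"discussionForumHome/messageBoard\"/>";

    // Builds a stylesheet whose root template wraps the given body in a <ul>.
    // The "messageBoard" template is present in both variants.
    static String stylesheet(String rootBody) {
        return "<xsl:stylesheet version=\"1.0\" "
             + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
             + "<xsl:output method=\"html\"/>"
             + "<xsl:template match=\"/\"><ul>" + rootBody + "</ul></xsl:template>"
             + "<xsl:template match=\"messageBoard\">"
             + "<li><xsl:value-of select=\"@name\"/></li>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Applies the stylesheet to the XML and returns the serialized result tree.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Root template only: the list is empty because nothing selects the boards.
        System.out.println(transform(XML, stylesheet("")));
        // With apply-templates: one list item per <messageBoard>.
        System.out.println(transform(XML, stylesheet(APPLY)));
    }
}
```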
Slide 36: Most of the tags in this template do not start with <xsl:, so they are simply copied to the result tree. In fact, the only dynamic content in this particular template is the following line, which tells the processor to continue the transformation process: <xsl:apply-templates select="discussionForumHome/messageBoard"/> Without this line, the transformation process would be complete because the "/" pattern was already located and a corresponding template was instantiated. The <xsl:apply-templates> element tells the XSLT processor to begin a new search for elements in the source XML document that match the "discussionForumHome/messageBoard" pattern and to instantiate an additional template that matches. As we will see shortly, the transformation process is recursive and must be driven by XSLT elements such as <xsl:apply-templates>. Simply including one or more <xsl:template> elements in a stylesheet does not mean that they will be instantiated. In this example, the <xsl:apply-templates> element tells the XSLT processor to first select all <discussionForumHome> elements of the current node. The current node is "/" , or the top of the document, so it only selects the <discussionForumHome> element that occurs at the document's root level. If another <discussionForumHome> element is deeply nested within the XML document, it will not be selected by this pattern. Assuming that the processor locates the <discussionForumHome> element, it then searches for all of its <messageBoard> children. The select attribute in <xsl:apply-templates> does not have to be the same as the match attribute in <xsl:template>. Although the stylesheet presented in Example 2-2 could have specified <xsl:template match="discussionForumHome/messageBoard"> for the second template, this would limit the reusability of the template. Specifically, it could only be applied to <messageBoard> elements that occur as direct children of <discussionForumHome> elements. 
Since our template matches only "messageBoard", it can be reused for <messageBoard> elements that appear anywhere in the XML document.

For each <messageBoard> child, the processor looks for the template in your stylesheet that provides the best match. Since our stylesheet contains a template that matches the "messageBoard" pattern exactly, it is instantiated for each of the <messageBoard> elements. The job of this template is to produce a single HTML list item tag for each <messageBoard> element:

    <xsl:template match="messageBoard">
      <li>
        <a href="viewForum?id={@id}">
          <xsl:value-of select="@name"/>
        </a>
      </li>
    </xsl:template>

As you can see, the list item must be properly terminated; HTML-style standalone <li> tags are not allowed because they break the requirement that XSLT stylesheets be well-formed XML. Terminating the element with </li> also works with HTML, so this is the approach you must
Slide 37: take. The hyperlink is a best guess at this point in the design process because the servlet has not been defined yet. Later, when we develop a servlet to actually process this web page, we will update the link to point to the correct servlet. In the stylesheet, @ is used to select the values of attributes. Curly braces ({}) are known as an attribute value template and will be discussed in Chapter 3. If you look back at Example 2-1, you will see that each message board has two attributes, id and name:

    <messageBoard id="1" name="Java Programming"/>

When the stylesheet processor is executed and the result tree generated, we end up with the HTML shown in Example 2-3. The HTML is minimal at this point, which is exactly what you want. Fancy changes to the page layout can be added later; the important concept is that programmers can get started right away with the underlying application logic because of the clean separation between data and presentation that XML and XSLT provide.

Example 2-3. discussionForumHome.html

    <html>
      <head>
        <title>Discussion Forum Home Page</title>
      </head>
      <body>
        <h1>Discussion Forum Home Page</h1>
        <h3>Please select a message board to view:</h3>
        <ul>
          <li>
            <a href="viewForum?id=1">Java Programming</a>
          </li>
          <li>
            <a href="viewForum?id=2">XML Programming</a>
          </li>
          <li>
            <a href="viewForum?id=3">XSLT Questions</a>
          </li>
        </ul>
      </body>
    </html>

2.1.2 Trying It Out

To try things out, download the examples for this book and locate discussionForumHome.xml and discussionForumHome.xslt. They can be found in the chap1 directory. If you would rather type in the examples, you can use any text editor or a dedicated XML editor such as Altova's XML Spy (http://www.xmlspy.com). After downloading and unzipping the Xalan distribution from Apache, simply add xalan.jar and xerces.jar to your CLASSPATH.
The transformation can then be initiated with the following command:

    java org.apache.xalan.xslt.Process -IN discussionForumHome.xml -XSL discussionForumHome.xslt

This will apply the stylesheet, sending the resulting HTML content to standard output. Adding -OUT filename to the command will cause Xalan to send the result tree directly to a file. To see the complete list of Xalan options, just type java org.apache.xalan.xslt.Process. For example, the -TT option allows you to see (trace) which templates are being called.
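The same transformation can also be driven from Java through the standard JAXP API (javax.xml.transform), which ships with the JDK. The following is only a minimal sketch: the class name ForumTransform is made up for illustration, the XML is an assumed stand-in for Example 2-1 that carries just the id and name attributes mentioned in the text, and the stylesheet is a trimmed version of Example 2-2 that emits only the list.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class; not part of the book's example code.
final class ForumTransform {

    // Assumed stand-in for Example 2-1 (only the attributes named in the text).
    static final String XML =
          "<discussionForumHome>"
        + "<messageBoard id=\"1\" name=\"Java Programming\"/>"
        + "<messageBoard id=\"2\" name=\"XML Programming\"/>"
        + "<messageBoard id=\"3\" name=\"XSLT Questions\"/>"
        + "</discussionForumHome>";

    // Trimmed stylesheet: the "/" template drives the "messageBoard" template.
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\"/>"
        + "<xsl:template match=\"/\"><ul>"
        + "<xsl:apply-templates select=\"discussionForumHome/messageBoard\"/>"
        + "</ul></xsl:template>"
        + "<xsl:template match=\"messageBoard\">"
        + "<li><a href=\"viewForum?id={@id}\"><xsl:value-of select=\"@name\"/></a></li>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the serialized result tree.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSLT));
    }
}
```

Whichever XSLT processor the JDK or CLASSPATH provides is used here; the exact line breaks in the generated HTML may differ between processors.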
Slide 38: Xalan's -IN and -XSL parameters accept URLs as arguments rather than as file names. A simple filename will work if the files are in the current working directory, but you may need to use a full URL syntax, such as file:///path/file.ext, when the file is located elsewhere. In Chapter 5, we will show how to invoke Xalan and other XSLT processors from Java code, which is far more efficient because a separate Java Virtual Machine (JVM) does not have to be invoked for each transformation. Although it can take several seconds to start the JVM, the actual XSLT transformations will usually occur in milliseconds. Another option is to find a web browser that supports XSLT, which allows you to edit your stylesheet and hit the "Reload" button to view the transformation. 2.2 Transformation Process Now that we have seen an example, let's back up and talk about some basics. In particular, it is important to understand the relationship between <xsl:template match=...> and <xsl:apply-templates select=...>. This should help to solidify your understanding of the previous example and lay the groundwork for more sophisticated processing. Although XSLT is a language, it is not intended to be a general-purpose programming language. Because of its specialized mission as a transformation language,[2] the design of XSLT works in the way that XML is structured, which is fundamentally a tree data structure. [2] XSLT is declarative in nature, while mainstream programming languages tend to be more procedural. 2.2.1 XML Tree Data Structure Every well-formed XML document forms a tree data structure. The document itself is always the root of the tree, and every element within the document has exactly one parent. Since the document itself is the root, it has no parent. As you learn XSLT, it can be helpful to draw pictures of your XML data that show its tree structure. Figure 2-2 illustrates the tree structure for discussionForumHome.xml. Figure 2-2. 
Tree structure for discussionForumHome.xml The document itself is the root of the tree and may contain processing instructions, the document root element, and even comments. XSLT has the ability to select any of these items, although you will probably want to select elements and attributes when transforming to HTML. As mentioned earlier, the "/" pattern matches the document itself, which is the root node of the entire tree.
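If drawing the tree by hand is tedious, a few lines of Java can print it. The sketch below uses the JDK's DOM parser and is only an illustration: the class name TreeOutline and the output format are invented here, and the DOM is a close relative of, not identical to, the tree model that XSLT and XPath define.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper that prints an indented outline of an XML document.
final class TreeOutline {

    static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder sb = new StringBuilder();
            walk(doc, 0, sb);
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Recursive walk: each node is a leaf or the root of a smaller tree.
    private static void walk(Node node, int depth, StringBuilder sb) {
        String label;
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                label = "/";
                break;
            case Node.ELEMENT_NODE:
                label = "<" + node.getNodeName() + ">";
                break;
            case Node.TEXT_NODE:
                if (node.getNodeValue().trim().isEmpty()) {
                    return; // skip whitespace used only for indentation
                }
                label = "text: " + node.getNodeValue().trim();
                break;
            case Node.COMMENT_NODE:
                label = "comment";
                break;
            case Node.PROCESSING_INSTRUCTION_NODE:
                label = "processing-instruction: " + node.getNodeName();
                break;
            default:
                label = node.getNodeName();
        }
        indent(depth, sb).append(label).append('\n');

        NamedNodeMap attrs = node.getAttributes(); // null unless an element
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                indent(depth + 1, sb).append('@').append(a.getNodeName())
                    .append('=').append(a.getNodeValue()).append('\n');
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, depth + 1, sb);
        }
    }

    private static StringBuilder indent(int depth, StringBuilder sb) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        return sb;
    }

    public static void main(String[] args) {
        System.out.print(outline(
            "<discussionForumHome>"
            + "<messageBoard id=\"1\" name=\"Java Programming\"/>"
            + "</discussionForumHome>"));
    }
}
```

The first line of the outline is always "/", the document itself, with comments, processing instructions, and the document root element listed beneath it.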
Slide 39: A tree data structure is fundamentally recursive because it consists of leaf nodes and smaller trees. Each of these smaller trees, in turn, also consists of leaf nodes and still smaller trees. Algorithms that deal with tree structures can almost always be expressed recursively, and XSLT is no exception. The processing model adopted by XSLT is explicitly designed to take advantage of the recursive nature of every well-formed XML document. This means that most stylesheets can be broken down into highly modular, easily understandable pieces, each of which processes a subset of the overall tree (i.e., a subtree).

Two important concepts in XSLT are the current node and current node list. The current node is comparable to the current working directory on a file system. The <xsl:value-of select="."/> element is similar to printing the name of the current working directory. The current node list is similar to the list of subdirectories. The key difference is that in XSLT, the current node appears in your source XML document. The current node list is a collection of nodes. As processing proceeds, the current node and current node list are constantly changing as you traverse the source tree, looking for patterns in the data.

2.2.2 Recursive Processing with Templates

Most transformation in XSLT is driven by two elements: <xsl:template> and <xsl:apply-templates>. In XSLT lingo, a node can represent anything that appears within your XML data. Nodes are typically elements such as <message> or element attributes such as id="123". Nodes can also be XML processing instructions, text, or even comments. XSLT transformation begins with a current node list that contains a single entry: the root node. This is the XML document and is represented by the "/" pattern. Processing proceeds as follows:

• For each node "X" in the current node list, the processor searches for all <xsl:template match="pattern"> elements in your stylesheet that potentially match that node.
From this list of templates, the one with the best match[3] is selected.
• The selected <xsl:template match="pattern"> is instantiated using node "X" as its current node. This template typically copies data from the source document to the result tree or produces brand new content in combination with data from the source. If the template contains <xsl:apply-templates select="newPattern"/>, a new current node list is created and the process repeats recursively. The select pattern is relative to node "X", rather than the document root.
• As the XSLT transformation process continues, the current node and current node list are constantly changing. This is a good thing, since you do not want to constantly search for patterns beginning from the document root element. You are not limited to traversing down the tree, however; you can iterate over portions of the XML data many times or navigate back up through the document tree structure. This gives XSLT a huge advantage over CSS because CSS is limited to displaying the XML in the order in which it appears in the document.

[3] See section 5.5 of the XSLT specification for conflict-resolution rules.

Comparing <xsl:template> to <xsl:apply-templates>

One way to understand the difference between <xsl:template> and <xsl:apply-templates> is to think about the difference between a Java method and the code that invokes the method. For example, a method in Java is declared as follows:
Slide 40:

    public void printMessageBoard(MessageBoard board) {
        // print information about the message board
    }

In XSLT, the template plays a similar role:

    <xsl:template match="messageBoard">
      <!-- print information about the message board -->
    </xsl:template>

In order to invoke the Java method, use the following Java code:

    someObject.printMessageBoard(currentBoard);

And in XSLT, use:

    <xsl:apply-templates select="..."/>

to instantiate the template using the current <messageBoard> node. While this is a good comparison to help illustrate the difference between <xsl:template> and <xsl:apply-templates>, it is important to remember that the XSLT model is not really a method call. Instead, <xsl:apply-templates> instructs the processor to scan through the XML document again, looking for nodes that match a pattern. If matching nodes are found, the best matching template is instantiated. In the next chapter, we will see that XSLT also has <xsl:call-template>, which works similarly to a Java method call.

Let's suppose that your source document contains the following XML:

    <school>
      <name>SIUC</name>
      <city>Carbondale</city>
      <state>Illinois</state>
    </school>

The following template could be used to match the <school> element and output its contents:

    <xsl:template match="school">
      <b><xsl:value-of select="name"/> is located in
      <xsl:value-of select="city"/>, <xsl:value-of select="state"/>.</b>
    </xsl:template>

The result will be something like:

    <b>SIUC is located in Carbondale, Illinois.</b>

As you can see, elements that do not start with xsl: are simply copied to the result tree, as is plain text such as "is located in."[4] We do not show this here, but if you try the example you will see that whitespace characters (spaces, tabs, and linefeeds) are also copied to the result tree. When the destination is HTML, it is usually safe to ignore this issue because the browser will collapse that whitespace. If you view the actual source code of the generated HTML, it can look pretty ugly.
[4] Technically, elements that do not belong to the XSLT namespace are simply copied to the result tree; the namespace prefix might not be xsl:.

An alternative to simply including "is located in" is to use:

    <xsl:text> is located in </xsl:text>
Slide 41: This provides explicit control over how whitespace and linefeeds are treated. <xsl:value-of> copies the value of something in the XML source tree to the result tree. In this case, the current node is <school>, so <xsl:value-of select="name"/> selects the text content of the <name> element contained within <school>. This is the simplest usage of XPath, which will be introduced shortly. XPath is not limited to the current node, so it can also be used to locate elements in other parts of the source document. It can even select attributes, processing instructions, or anything else that can occur in XML. 2.2.3 Built-in Template Rules All XSLT processors must include four built-in template rules that have lower precedence than any other rules, so they can be overridden by simply writing a new template rule that matches the same pattern. The best way to think about built-in rules is to assume they are always in the background, ready to be applied if no other rule is found that matches a node. The first rule allows recursive processing to continue in case an explicit rule does not match the current node or the root node: <xsl:template match="*|/"> <xsl:apply-templates/> </xsl:template> This template matches all elements (*) and the root node (/), i.e., the document itself. It will not match processing instructions, comments, attributes, or text. The <xsl:apply-templates/> causes all children that are not attribute nodes or processing instruction nodes to be processed. The second built-in rule is identical to the first, except it applies to each mode used in the stylesheet: <xsl:template match="*|/" mode="m"> <xsl:apply-templates mode="m"/> </xsl:template> Template modes are discussed in the next chapter, so we will not go into details here. 
The third built-in rule simply copies all text and attribute nodes to the result tree: <xsl:template match="text( )|@*"> <xsl:value-of select="."/> </xsl:template> And finally, the built-in rule for processing instructions and comments does nothing. This is why comments and processing instructions in the input XML data do not automatically show up in the result tree: <xsl:template match="processing-instruction()|comment( )"/> 2.2.4 A Skeleton Stylesheet As your XML documents get more complex, you will most likely want to break up your stylesheets into several templates. The starting point is a template that matches the "/" pattern: <xsl:template match="/"> ...content </xsl:template> This template matches the document itself and is usually where you output the basic <html>, <head>, and <body> elements. Somewhere within this template, you must tell the processor to continue searching for additional patterns, thus beginning the recursive transformation process. In a typical stylesheet, <xsl:apply-templates> is used for this purpose, instructing the processor to search for additional content in the XML data.
Slide 42: It should be stressed that this is not the only way to write a stylesheet, but it is a very natural way to handle the recursive nature of XML. Example 2-4 contains a skeleton XSLT stylesheet that you can use as a starting point for most of your projects.

Example 2-4. Skeleton stylesheet

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html"/>

      <!--****************************************************************
        ** "/" template matches the document and is the starting point
        *************************************************************-->
      <xsl:template match="/">
        <html>
          <head>
            <title>[title goes here]</title>
          </head>
          <body>
            <xsl:apply-templates select="[some XPath expression]"/>
          </body>
        </html>
      </xsl:template>

      <!--****************************************************************
        ** "[???]" template
        *************************************************************-->
      <xsl:template match="???">
        [continue the process...]
        <xsl:apply-templates select="[another XPath expression]"/>
        [you can also include more content here...or even include
        multiple apply-templates...]
      </xsl:template>
    </xsl:stylesheet>

Deciding how to modularize the stylesheet is a subjective process. One suggestion is to look for moderately sized chunks of XML data repeated numerous times throughout a document. For example, a <customer> element may contain a name, address, and phone number. Creating a template that matches "customer" is probably a good idea. You may even want to create another template for the <name> element, particularly if the name is broken down into subelements, or if the name is reused in other contexts such as <employee> and <manager>.

When you need to produce HTML tables or unordered lists in the result tree, two templates (instead of one) can make the job very easy. The first template will produce the <table> or <ul> element, and the second will produce each table row or list item.
The following fragment illustrates this basic pattern: <!-- the outer template produces the unordered list --> <!-- (note: plural 'customers') --> <xsl:template match="customers"> <ul> <xsl:apply-templates select="customer"/> </ul> </xsl:template> <!-- the inner template is repeated for each customer --> <xsl:template match="customer"> <li><xsl:value-of select="name"/></li>
Slide 43: </xsl:template> 2.3 Another XSLT Example, Using XHTML Example 2-5 contains XML data from an imaginary scheduling program. A schedule has an owner followed by a list of appointments. Each appointment has a date, start time, end time, subject, location, and optional notes. Needless to say, a true scheduling application probably has a lot more data, such as repeating appointments, alarms, categories, and many other bells and whistles. Assuming that the scheduler stores its data in XML files, we can easily add features later by writing a stylesheet to convert the existing XML files to some new format. Example 2-5. schedule.xml <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="schedule.xslt"?> <schedule> <owner> <name> <first>Eric</first> <last>Burke</last> </name> </owner> <appointment> <when> <date month="03" day="15" year="2001"/> <startTime hour="09" minute="30"/> <endTime hour="10" minute="30"/> </when> <subject>Interview potential new hire</subject> <location>Rm 103</location> <note>Ask Bob for an updated resume.</note> </appointment> <appointment> <when> <date month="03" day="15" year="2001"/> <startTime hour="15" minute="30"/> <endTime hour="16" minute="30"/> </when> <subject>Dr. Appointment</subject> <location>1532 Main Street</location> </appointment> <appointment> <when> <date month="03" day="16" year="2001"/> <startTime hour="11" minute="30"/> <endTime hour="12" minute="30"/> </when> <subject>Lunch w/Boss</subject> <location>Pizza Place on First Capitol Drive</location> </appointment> </schedule> As you can see, the XML document uses both attributes (month="03") and child elements to represent its data. XSLT has the ability to search for and transform both types of data, as well as comments, processing instructions, and text. In our current document, the appointments are stored in chronological order. Later, we will see how to change the sort order using <xsl:sort>.
Slide 44: Unlike the earlier example, the second line of Example 2-5 contains a reference to the XSLT stylesheet: <?xml-stylesheet type="text/xsl" href="schedule.xslt"?> This processing instruction is entirely optional. When viewing the XML document in a web browser that supports XSLT, this is the stylesheet that is used. If you apply the stylesheet from the command line or from a server-side process, however, you normally specify both the XML document and the XSLT document as parameters to the processor. Because of this capability, the processing instruction shown does not force that particular stylesheet to be used. From a development perspective, including this line quickly displays your work because you simply load the XML document into a compatible web browser, and the stylesheet is loaded automatically. In this book, the xml-stylesheet processing instruction uses type="text/xsl". However, some processors use type="text/xml", which does not work with Microsoft Internet Explorer. The XSLT specification contains one example, which uses "text/xml". Figure 2-3 shows the XHTML output from an XSLT transformation of schedule.xml. As you can see, the stylesheet is capable of producing content that does not appear in the original XML data, such as "Subject:". It can also selectively copy element content and attribute values from the XML source to the result tree; nothing requires every piece of data to be copied. Figure 2-3. XHTML output
Slide 45: The XSLT stylesheet that produces this output is shown in Example 2-6. As mentioned previously, XSLT stylesheets must be well-formed XML documents. Once again, we use .xslt as the filename extension, but .xsl is also common. This stylesheet is based on the skeleton document presented in Example 2-4. However, it produces XHTML instead of HTML.

Example 2-6. schedule.xslt

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml"
          doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
          doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>

      <!--****************************************************************
        ** "/" template
        *************************************************************-->
      <xsl:template match="/">
        <html xmlns="http://www.w3.org/1999/xhtml">
          <head>
            <title>Schedule</title>
          </head>
          <body>
            <h2 align="center">
              <xsl:value-of select="schedule/owner/name/first"/>
              <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>
              <xsl:value-of select="schedule/owner/name/last"/>'s Schedule</h2>
            <xsl:apply-templates select="schedule/appointment"/>
          </body>
        </html>
      </xsl:template>

      <!--****************************************************************
        ** "appointment" template
        *************************************************************-->
      <xsl:template match="appointment">
        <hr/>
        <h3>Appointment</h3>
        <xsl:apply-templates select="when"/>
        <table>
          <tr>
            <td>Subject:</td>
            <td>
              <xsl:value-of select="subject"/>
            </td>
          </tr>
          <tr>
            <td>Location:</td>
            <td>
              <xsl:value-of select="location"/>
            </td>
          </tr>
          <tr>
            <td>Note:</td>
            <td>
              <xsl:value-of select="note"/>
Slide 46:
            </td>
          </tr>
        </table>
      </xsl:template>

      <!--****************************************************************
        ** "when" template
        *************************************************************-->
      <xsl:template match="when">
        <p>
          <xsl:value-of select="date/@month"/>
          <xsl:text>/</xsl:text>
          <xsl:value-of select="date/@day"/>
          <xsl:text>/</xsl:text>
          <xsl:value-of select="date/@year"/>
          from
          <xsl:value-of select="startTime/@hour"/>
          <xsl:text>:</xsl:text>
          <xsl:value-of select="startTime/@minute"/>
          until
          <xsl:value-of select="endTime/@hour"/>
          <xsl:text>:</xsl:text>
          <xsl:value-of select="endTime/@minute"/>
        </p>
      </xsl:template>
    </xsl:stylesheet>

The first part of this stylesheet should look familiar. The first four lines are typical of just about any stylesheet you will write. Next, the output method is specified as xml because this stylesheet is producing XHTML instead of HTML:

    <xsl:output method="xml"
        doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
        doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>

The <xsl:output> element produces the following XHTML content:

    <?xml version="1.0" encoding="UTF-16"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Moving on, the first template in the stylesheet matches "/" and outputs the skeleton for the XHTML document. Another requirement for XHTML is the namespace attribute on the <html> element:

    <html xmlns="http://www.w3.org/1999/xhtml">

The remainder of schedule.xslt consists of additional templates, each of which matches a particular pattern in the XML input.
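To see one of these templates in isolation, the "when" template can be run against a single <when> element taken from Example 2-5. This is a rough sketch using the JDK's JAXP API; the class name WhenTemplate is invented for illustration, the small "/" template is added only so the fragment can be processed on its own, and the XML declaration is suppressed to keep the result short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical driver for the "when" template of Example 2-6.
final class WhenTemplate {

    // One <when> element copied from the first appointment in Example 2-5.
    static final String XML =
          "<when>"
        + "<date month=\"03\" day=\"15\" year=\"2001\"/>"
        + "<startTime hour=\"09\" minute=\"30\"/>"
        + "<endTime hour=\"10\" minute=\"30\"/>"
        + "</when>";

    // The "when" template, written compactly to avoid stray whitespace.
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        // Assumption: this driver template is not in the book's stylesheet.
        + "<xsl:template match=\"/\"><xsl:apply-templates select=\"when\"/></xsl:template>"
        + "<xsl:template match=\"when\"><p>"
        + "<xsl:value-of select=\"date/@month\"/><xsl:text>/</xsl:text>"
        + "<xsl:value-of select=\"date/@day\"/><xsl:text>/</xsl:text>"
        + "<xsl:value-of select=\"date/@year\"/> from "
        + "<xsl:value-of select=\"startTime/@hour\"/><xsl:text>:</xsl:text>"
        + "<xsl:value-of select=\"startTime/@minute\"/> until "
        + "<xsl:value-of select=\"endTime/@hour\"/><xsl:text>:</xsl:text>"
        + "<xsl:value-of select=\"endTime/@minute\"/>"
        + "</p></xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet and returns the serialized result.
    static String render() {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```

Each select expression is evaluated relative to the <when> element, so date/@month picks the month attribute of its <date> child.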
Slide 47: Because of its XML syntax, XSLT stylesheets can be hard to read. If you prefix each template with a distinctive comment block as shown in Example 2-6, it is fairly easy to see the overall structure of the stylesheet. Without consistent indentation and comments, the markup tends to run together, making the stylesheet much harder to understand and maintain. The <xsl:text> element is used to insert additional text into the result tree. Although plain text is allowed in XSLT stylesheets, the <xsl:text> element allows more explicit control over whitespace handling. As shown here, a nonbreaking space is inserted into the result tree: <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text> Unfortunately, the following syntax does not work: <!-- does not work... --> <xsl:text>&nbsp;</xsl:text> This is because &nbsp; is not one of the five built-in entities supported by XML. Since XSLT stylesheets are always well-formed XML, the parser complains when &nbsp; is found in the stylesheet. Replacing the first ampersand character with &amp; allows the XML parser to read the stylesheet into memory. The XML parser interprets this entity and sends the following markup to the XSLT processor: <!-- this is what the XSLT processor sees, after the XML parser interprets the &amp; entity --> <xsl:text disable-output-escaping="yes">&nbsp;</xsl:text> The second piece of this solution is the disable-output-escaping="yes" attribute. Without this attribute the XSLT processor may attempt to escape the nonbreaking space by converting it into an actual character. This causes many web browsers to display question marks because they cannot interpret the character. Disabling output escaping tells the XSLT processor to pass &nbsp; to the result tree. Web browsers then interpret and display the nonbreaking space properly. In the final template shown in Example 2-6, you may notice the element <xsl:value-of select="date/@month"/>. 
The @ character represents an attribute, so in this case the stylesheet is outputting the value of the month attribute on the date element. For this element: <date month="03" day="15" year="2001"/>, the value "03" is copied to the result tree. 2.4 XPath Basics XPath is another recommendation from the W3C and is designed for use by XSLT and another technology called XPointer. The primary goal of XPath is to define a mechanism for addressing portions of an XML document, which means it is used for locating element nodes, attribute nodes, text nodes, and anything else that can occur in an XML document. XPath treats these nodes as part of a tree structure rather than dealing with XML as a text string. XSLT also relies on the tree structure that XPath defines. In addition to addressing, XPath contains a set of functions to format text, convert to and from numbers, and deal with booleans.
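XPath can be tried on its own, without a stylesheet, through the javax.xml.xpath API in the JDK. The sketch below is only an illustration of the three kinds of functions just mentioned; the class name XPathFunctions is invented, and the XML is a cut-down copy of the third appointment from Example 2-5.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper that evaluates XPath 1.0 expressions against a fixed document.
final class XPathFunctions {

    // Cut-down copy of one appointment from Example 2-5.
    static final String XML =
          "<appointment>"
        + "<when><date month=\"03\" day=\"16\" year=\"2001\"/></when>"
        + "<subject>Lunch w/Boss</subject>"
        + "</appointment>";

    private static Object eval(String expr, QName type) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(XML)), type);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad XPath expression: " + expr, e);
        }
    }

    static String asString(String expr) {
        return (String) eval(expr, XPathConstants.STRING);
    }

    static double asNumber(String expr) {
        return (Double) eval(expr, XPathConstants.NUMBER);
    }

    static boolean asBoolean(String expr) {
        return (Boolean) eval(expr, XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) {
        String date = "/appointment/when/date";
        // String functions: build a formatted date from three attributes.
        System.out.println(asString(
            "concat(" + date + "/@month, '/', " + date + "/@day, '/', " + date + "/@year)"));
        // Numeric conversion: the attribute text "2001" becomes a number.
        System.out.println(asNumber("number(" + date + "/@year) + 1"));
        // Boolean result: this appointment has no <note> element.
        System.out.println(asBoolean("count(/appointment/note) = 0"));
    }
}
```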
Slide 48: Unlike XSLT, XPath itself is not expressed using XML syntax. A simplified syntax makes sense when you consider that XPath is most commonly used inside of attribute values within other XML documents. XPath includes both a verbose syntax and a set of abbreviations, which end up looking a lot like path names on a file system or web site.

2.4.1 How XSLT Uses XPath

XSLT uses XPath in three basic ways:

• To select and match patterns in the original XML data. Using XPath in this manner is the focus of this chapter. You see this most often in <xsl:template match="pattern"> and <xsl:apply-templates select="node-set-expression"/>. In either case, XPath syntax is used to locate various types of nodes.
• To support conditional processing. We will see the exact syntax of <xsl:if> and <xsl:choose> in the next chapter, both of which rely on XPath's ability to represent boolean values of true and false.
• To generate text. A number of string formatting instructions are provided, giving you the ability to concatenate strings, manipulate substrings, and convert from other data types to strings. Again, this will be covered in the next chapter.

2.4.2 Axes

Whenever XSLT uses XPath, something in the XML data is considered to be the current context node. XPath defines seven different types of nodes, each representing a different part of the XML data. These are the document root, elements, text, attributes, processing instructions, comments, and nodes representing namespaces. An axis represents a relationship to the current context node, which may be any one of the preceding seven items.

A few examples should clear things up. One axis is child, representing all immediate children of the context node. From our earlier schedule.xml example, the child axis of <name> includes the <first> and <last> elements. Another axis is parent, which represents the immediate parent of the context node. In many cases the axis is empty. For example, the document root node has no parent axis.
Figure 2-4 illustrates some of the other axes. Figure 2-4. XPath axes
Slide 49: As you can see, the second <department> element is the context node. The diagram illustrates how some of the more common axes relate to this node. Although the names are singular, in most cases the axes represent node sets rather than individual nodes. The code:

    <xsl:apply-templates select="child::team"/>

selects all <team> children, not just the first one. Table 2-1 lists the available axes in alphabetical order, along with a brief description of each.

Table 2-1. Axes summary

Axis name             Description
ancestor              The parent of the context node, its parent, and so on until the root node is reached. The ancestor of the root is an empty node set.
ancestor-or-self      The same as ancestor, with the addition of the context node. The root node is always included.
attribute             All attributes of the context node.
child                 All immediate children of the context node. Attributes and namespace nodes are not included.
descendant            All children, grandchildren, and so forth. Attribute and namespace nodes are not considered descendants of element nodes.
descendant-or-self    Same as descendant, with the addition of the context node.
following             All elements in the document that occur after the context node. Descendants of the context node are not included.
following-sibling     All following nodes in the document that have the same parent as the context node.
namespace             The namespace nodes of the context node.
parent                The immediate parent of the context node, if a parent exists.
preceding             All nodes in the document that occur before the context node, except for ancestors, attribute nodes, and namespace nodes.
preceding-sibling     All nodes in the document that occur before the context node and have the same parent. This axis is empty if the context node is an attribute node or a namespace node.
self                  The context node itself.

Slide 50: 2.4.3 Location Steps

As you may have guessed, an axis alone is only a piece of the puzzle. A location step is a more complex construct used by XPath and XSLT to select a node set from the XML data. Location steps have the following syntax:

    axis::node-test[predicate-1]...[predicate-n]

The axis and node-test are separated by double colons and are followed by zero or more predicates. As mentioned, the job of the axis is to specify the relationship between the context node and the node-test. The node-test allows you to specify the type of node that will be selected, and the predicates filter the resulting node set.

Once again, discussion of XSLT and XPath tends to sound overly technical until you see a few basic examples. Let's start with a basic fragment of XML:

    <message>
      <header> <!-- the context node -->
        <subject>Hello, World</subject>
        <date mm="03" dd="01" yy="2002"/>
        <sender>pres@whitehouse.gov</sender>
        <recipient>burke_e@ociweb.com</recipient>
        <recipient>burke_e@yahoo.com</recipient>
        <recipient>aidan@burke.com</recipient>
      </header>
      <body> ... </body>
    </message>

If the <header> is the context node, then child::subject will select the <subject> node, child::recipient will select the set of all <recipient> nodes, and child::* will select all children of <header>.
The asterisk (*) character is a wildcard that represents all nodes of the principal node type. Each axis has a principal node type, which is always element unless the axis is attribute or namespace. If <date> is the context node, then attribute::yy will select the yy attribute, and attribute::* will select all attributes of the <date> element.
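These location steps can be checked with the javax.xml.xpath API from the JDK. The following sketch makes a few assumptions: the class name LocationSteps and its two helper methods are invented for illustration, the <body> element is left empty because its content is elided in the fragment above, and the context node is chosen by an absolute path before each step is evaluated.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates one location step relative to a chosen context node.
final class LocationSteps {

    // The message fragment from the text; <body> is empty here by assumption.
    static final String XML =
          "<message><header>"
        + "<subject>Hello, World</subject>"
        + "<date mm=\"03\" dd=\"01\" yy=\"2002\"/>"
        + "<sender>pres@whitehouse.gov</sender>"
        + "<recipient>burke_e@ociweb.com</recipient>"
        + "<recipient>burke_e@yahoo.com</recipient>"
        + "<recipient>aidan@burke.com</recipient>"
        + "</header><body/></message>";

    private static Object eval(String contextPath, String step, QName type) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // First locate the context node, then evaluate the step against it.
            Node context = (Node) xpath.evaluate(contextPath, doc, XPathConstants.NODE);
            return xpath.evaluate(step, context, type);
        } catch (Exception e) {
            throw new IllegalStateException("evaluation failed: " + step, e);
        }
    }

    // String value of the first node selected by the step.
    static String value(String contextPath, String step) {
        return (String) eval(contextPath, step, XPathConstants.STRING);
    }

    // Number of nodes selected by the step.
    static int count(String contextPath, String step) {
        return ((NodeList) eval(contextPath, step, XPathConstants.NODESET)).getLength();
    }

    public static void main(String[] args) {
        String header = "/message/header";
        System.out.println(value(header, "child::subject"));
        System.out.println(count(header, "child::recipient"));
        System.out.println(count(header, "child::*"));
        System.out.println(value(header + "/date", "attribute::yy"));
        System.out.println(count(header + "/date", "attribute::*"));
    }
}
```

Note that child::* counts only element children, because element is the principal node type of the child axis, while attribute::* counts attributes.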
Slide 51: Without any predicates, a location step can result in zero or more nodes. Adding a predicate simply filters the resulting node set, generally reducing the size of the resulting node set. Adding additional predicates applies additional filters. For example, child::recipient[position( )=1] will initially select all <recipient> elements from the previous example then filter (reduce) the list down to the first one: burke_e@ociweb.com. Positions start at 1, rather than 0. As Example 2-8 will show, predicates can contain any XPath expression and can become quite sophisticated. 2.4.4 Location Paths Location paths consist of one or more location steps, separated by slash (/) characters. An absolute location path begins with the slash (/) character and is relative to the document root. All other types of location paths are relative to the context node. Paths are evaluated from left to right, just like a path in a file system or a web site. The XML shown in Example 2-7 is a portion of a larger file containing basic information about U.S. presidents. This is used to demonstrate a few more XSLT and XPath examples. Example 2-7. presidents.xml <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="xpathExamples.xslt"?> <presidents> <president> <term from="1789" to="1797"/> <name> <first>George</first> <last>Washington</last> </name> <party>Federalist</party> <vicePresident> <name> <first>John</first> <last>Adams</last> </name> </vicePresident> </president> <president> <term from="1797" to="1801"/> <name> <first>John</first> <last>Adams</last> </name> <party>Federalist</party> <vicePresident> <name> <first>Thomas</first> <last>Jefferson</last> </name> </vicePresident> </president> /** * remaining presidents omitted */ The complete file is too long to list here but is included with the downloadable files for this book. The <vicePresident> element can occur many times or not at all because some presidents
Slide 52: did not have vice presidents. Names can also contain optional <middle> elements. Using this XML data, the XSLT stylesheet in Example 2-8 shows several location paths.

Example 2-8. Location paths

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html"/>
      <xsl:template match="/">
        <html>
          <body>
            <h1>XPath Examples</h1>
            The third president was:
            <ul>
              <xsl:apply-templates select="presidents/president[position() = 3]/name"/>
            </ul>
            Presidents without vice presidents were:
            <ul>
              <xsl:apply-templates select="presidents/president[count(vicePresident) = 0]/name"/>
            </ul>
            Presidents elected before 1800 were:
            <ul>
              <xsl:apply-templates select="presidents/president[term/@from &lt; 1800]/name"/>
            </ul>
            Presidents with more than one vice president were:
            <ul>
              <xsl:apply-templates select="descendant::president[count(vicePresident) > 1]/name"/>
            </ul>
            Presidents named John were:
            <ul>
              <xsl:apply-templates select="presidents/president/name[child::first='John']"/>
            </ul>
            Presidents elected between 1800 and 1850 were:
            <ul>
              <xsl:apply-templates select="presidents/president[(term/@from > 1800) and (term/@from &lt; 1850)]/name"/>
            </ul>
          </body>
        </html>
      </xsl:template>
Slide 53:

      <xsl:template match="name">
        <li>
          <xsl:value-of select="first"/>
          <xsl:text> </xsl:text>
          <xsl:value-of select="middle"/>
          <xsl:text> </xsl:text>
          <xsl:value-of select="last"/>
        </li>
      </xsl:template>
    </xsl:stylesheet>

In the first <xsl:apply-templates> element, the location path is as follows:

    presidents/president[position() = 3]/name

This path consists of three location steps separated by slash (/) characters, but the final step is what we want to select. This path is read from left to right, so it first selects the <presidents> children of the current context. The next step is relative to the <presidents> context and selects all <president> children. It then filters the list according to the predicate. The third <president> element is now the context, and its <name> children are selected. Since each president has only one <name>, the template that matches "name" is instantiated only once.

This location path shows how to perform basic numeric comparisons:

    presidents/president[term/@from &lt; 1800]/name

Since the less-than (<) character cannot appear in an XML attribute value, the &lt; entity must be substituted. In this particular example, we use the @ abbreviated syntax to represent the attribute axis.

2.4.5 Abbreviated Syntax

Using descendant::, child::, parent::, and other axes is very verbose, requiring a lot of typing. Fortunately, XPath supports an abbreviated syntax for many of these axes that requires a lot less effort. The abbreviated syntax has the added advantage that it looks like you are navigating the file system, so it tends to be somewhat more intuitive. Table 2-2 compares the abbreviated syntax to the verbose syntax. The abbreviated syntax is almost always used and will be used throughout the remainder of this book.

Table 2-2. Abbreviated syntax

    Abbreviation    Axis
    //              descendant
    .               self
    ..              parent
    @               attribute
    (blank)         child

In the last row, the abbreviation for the child axis is blank, indicating that child:: is an implicit part of a location step.
This means that vicePresident/name is equivalent to child::vicePresident/child::name. Additional explanations follow:

• vicePresident selects the vicePresident children of the context node.
• vicePresident/name selects all name children of vicePresident children of the context node.
Slide 54:
• //name selects all name elements in the document, that is, all name descendants of the document root (use .//name to restrict the search to descendants of the context node).
• . selects the context node.
• ../term/@from selects the from attribute of term children of the context node's parent.

2.5 Looping and Sorting

As shown throughout this chapter, you can use <xsl:apply-templates ...> to search for patterns in an XML document. This type of processing is sometimes referred to as a "data-driven" approach because the data of the XML file drives the selection process. Another style of XSLT programming is called "template-driven," which means that the template's code tends to drive the selection process.

2.5.1 Looping with <xsl:for-each>

Sometimes it is convenient to explicitly drive the selection process with an <xsl:for-each> element, which is reminiscent of traditional programming techniques. In this approach, you explicitly loop over a collection of nodes without instantiating a separate template as <xsl:apply-templates> does. The syntax for <xsl:for-each> is as follows:

    <xsl:for-each select="president">
      ...content for each president element
    </xsl:for-each>

The select attribute can contain any XPath location path, and the loop will iterate over each element in the resulting node set. In this example, the context is <president> for all content within the loop. Nested loops are possible and could be used to loop over the list of <vicePresident> elements.

2.5.2 Sorting

Sorting can be applied in either a data-driven or template-driven approach. In either case, <xsl:sort> is added as a child element of <xsl:apply-templates> or <xsl:for-each>. By adding several consecutive <xsl:sort> elements, you can accomplish multifield sorting. Each sort can be in ascending or descending order, and the data type for sorting is either "number" or "text". The sort order defaults to ascending.
Some examples of <xsl:sort> include:

    <xsl:sort select="first"/>
    <xsl:sort select="last" order="descending"/>
    <xsl:sort select="term/@from" order="descending" data-type="number"/>
    <xsl:sort select="name/first" data-type="text" case-order="upper-first"/>

In the last line, the case-order attribute specifies that uppercase letters should be alphabetized before lowercase letters. The other accepted value for this attribute is lower-first. According to the specification, the default behavior is "language dependent."

2.5.3 Looping and Sorting Examples

The easiest way to learn about looping and sorting is to play around with a lot of small examples. The code in Example 2-9 applies numerous different looping and sorting strategies to our list of presidents. Comments in the code indicate what is happening at each step.

Example 2-9. Looping and sorting

    <?xml version="1.0" encoding="UTF-8"?>
Slide 55:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html"/>

      <xsl:template match="/">
        <html>
          <body>
            <h1>Sorting Examples</h1>
            <xsl:apply-templates select="presidents"/>
          </body>
        </html>
      </xsl:template>

      <!--*****************************************************************
         ** presidents template
         **************************************************************-->
      <xsl:template match="presidents">

        <!--***************************************************************
           ** Sorting using xsl:for-each
           ************************************************************-->
        <h2>All presidents sorted by first name using xsl:for-each</h2>
        <xsl:for-each select="president">
          <xsl:sort select="name/first"/>
          <xsl:apply-templates select="name"/>
        </xsl:for-each>

        <!--***************************************************************
           ** Sorting using xsl:apply-templates
           ************************************************************-->
        <h2>All presidents sorted by first name using xsl:apply-templates</h2>
        <xsl:apply-templates select="president/name">
          <xsl:sort select="first"/>
        </xsl:apply-templates>

        <h2>All presidents sorted by date using xsl:apply-templates</h2>
        <xsl:apply-templates select="president/name">
          <xsl:sort select="../term/@from" data-type="number" order="descending"/>
        </xsl:apply-templates>

        <!--***************************************************************
           ** Multi-field sorting
           ************************************************************-->
        <h2>Multi-field sorting example</h2>
        <xsl:apply-templates select="president/name">
          <xsl:sort select="last"/>
          <xsl:sort select="first" order="descending"/>
        </xsl:apply-templates>

        <!--***************************************************************
           ** Nested xsl:for-each loops
           ************************************************************-->
Slide 56:

        <h2>All presidents and vice presidents using xsl:for-each</h2>
        <ul>
          <xsl:for-each select="president">
            <xsl:sort select="name/first" order="descending"/>
            <li>
              <xsl:apply-templates select="name"/>
            </li>
            <ul>
              <xsl:for-each select="vicePresident">
                <xsl:sort select="name/first"/>
                <li>
                  <xsl:apply-templates select="name"/>
                </li>
              </xsl:for-each>
            </ul>
          </xsl:for-each>
        </ul>

        <!--***************************************************************
           ** Same as previous, only using xsl:apply-templates
           ************************************************************-->
        <h2>All presidents and vice presidents using xsl:apply-templates</h2>
        <ul>
          <xsl:apply-templates select="president">
            <xsl:sort select="name/first" order="descending"/>
          </xsl:apply-templates>
        </ul>
      </xsl:template>

      <!--*****************************************************************
         ** 'president' template, outputs the president's name and vice
         ** president's name.
         **************************************************************-->
      <xsl:template match="president">
        <li>
          <xsl:apply-templates select="name"/>
        </li>
        <ul>
          <xsl:for-each select="vicePresident">
            <xsl:sort select="name/first"/>
            <li>
              <xsl:apply-templates select="name"/>
            </li>
          </xsl:for-each>
        </ul>
      </xsl:template>

      <!--*****************************************************************
         ** name template, outputs first, middle, and last name
         **************************************************************-->
      <xsl:template match="name">
        <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>
        <xsl:value-of select="first"/>
        <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>
        <xsl:value-of select="middle"/>
        <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>
        <xsl:value-of select="last"/>
Slide 57:

        <br/>
      </xsl:template>
    </xsl:stylesheet>

Notice that when applying a sort to <xsl:apply-templates>, that element can no longer be an empty element. Instead, one or more <xsl:sort> elements are added as children of <xsl:apply-templates>. You should also note that sorting cannot occur in the <xsl:template match="name"> element. The reason for this is simple: at the <xsl:apply-templates> end, you have a list of nodes to sort. By the time the processing reaches <xsl:template match="name">, the search has narrowed down to a single <name>, so there is no node list left to sort.

2.6 Outputting Dynamic Attributes

Let's assume we have an XML document that lists books in a personal library, and we want to create an HTML document with links to these books on Amazon.com. In order to generate the hyperlink, the href attribute must contain the ISBN of the book, which can be found in our original XML data. An example of the link we would like to generate is as follows:

    <a href="http://www.amazon.com/exec/obidos/ASIN/0596000162">Java and XML</a>

One thought is to include <xsl:value-of select="isbn"/> directly inside of the attribute. However, XML does not allow you to insert the less-than (<) character inside of an attribute value:

    <!-- won't work... -->
    <a href="<xsl:value-of select="isbn"/>">Java and XML</a>

We also need to consider that the attribute value is dynamic rather than static. XSLT does not automatically recognize content of the href="..." attribute as an XPath expression, since the <a> tag is not part of XSLT. There are two possible solutions to this problem.

2.6.1 <xsl:attribute>

In the first approach, <xsl:attribute> is used to add one or more attributes to elements.
In the following template, an href attribute is added to an <a> element:

    <xsl:template match="book">
      <li>
        <a>
          <!-- the href attribute is generated below -->
          <xsl:attribute name="href">
            <xsl:text>http://www.amazon.com/exec/obidos/ASIN/</xsl:text>
            <xsl:value-of select="@isbn"/>
          </xsl:attribute>
          <xsl:value-of select="title"/>
        </a>
      </li>
    </xsl:template>

The <li> tag is used because this is part of a larger stylesheet that presents a bulleted list of links to each book. The <a> tag, as you can see, is missing its href attribute. The <xsl:attribute> element adds the missing href. Any child content of <xsl:attribute> is added to the attribute value. Because we do not want to introduce any unnecessary whitespace, <xsl:text> is used. Finally, <xsl:value-of> is used to select the isbn attribute.

2.6.2 Attribute Value Templates
Slide 58: Using <xsl:attribute> can be quite complex for a simple attribute value. Fortunately, XSLT provides a much simpler syntax called attribute value templates (AVT). The next example uses an AVT to achieve the identical result:

    <xsl:template match="book">
      <li>
        <a href="http://www.amazon.com/exec/obidos/ASIN/{@isbn}">
          <xsl:value-of select="title"/>
        </a>
      </li>
    </xsl:template>

The curly braces ({}) inside of the attribute value cause the magic to happen. Normally, when the stylesheet encounters attribute values for HTML elements, it treats them as static text. The braces tell the processor to treat a portion of the attribute dynamically. In the case of {@isbn}, the contents of the curly braces are treated exactly as <xsl:value-of select="@isbn"/> in the previous approach. This is obviously much simpler. The text inside of the {} characters can be any location path, so you are not limited to selecting attributes. For example, to select the title of the book, simply change the value to {title}.

So where do you use AVTs and where don't you? Whenever you need to treat an attribute value as an XPath expression rather than static text, you may need to use an AVT. For standard XSLT attributes that already expect an expression or pattern, such as the match attribute in <xsl:template match="pattern">, you don't need the AVT syntax. For non-XSLT elements, such as any HTML tag, AVT syntax is required.

2.6.3 <xsl:attribute-set>

There are times when you may want to define a group of attributes that can be reused. For this task, XSLT provides the <xsl:attribute-set> element. Using this element allows you to define a named group of attributes that can be referenced from other points in a stylesheet.
The following stylesheet fragment shows how to define an attribute set:

    <xsl:attribute-set name="body-style">
      <xsl:attribute name="bgcolor">yellow</xsl:attribute>
      <xsl:attribute name="text">green</xsl:attribute>
      <xsl:attribute name="link">navy</xsl:attribute>
      <xsl:attribute name="vlink">red</xsl:attribute>
    </xsl:attribute-set>

This is a "top-level element," which means that it can occur as a direct child of the <xsl:stylesheet> element. The definition of an attribute set does not have to come before templates that use it. The attribute set can be referenced from another <xsl:attribute-set>, from <xsl:element>, or from <xsl:copy> elements. We will talk about <xsl:copy> in the next chapter, but here is how <xsl:element> is used:

    <xsl:template match="/">
      <html>
        <head>
          <title>Demo of attribute-set</title>
        </head>
        <xsl:element name="body" use-attribute-sets="body-style">
          <h1>Books in my library...</h1>
          <ul>
            <xsl:apply-templates select="library/book"/>
          </ul>
        </xsl:element>
      </html>
Slide 59:

    </xsl:template>

As you can probably guess, the code shown here will output an HTML body tag that looks like this:

    <body bgcolor="yellow" text="green" link="navy" vlink="red">
      ...body content
    </body>

In this particular example, the <xsl:attribute-set> was used only once, so its value is minimal. It is possible for one stylesheet to include another, however, as we will see in the next chapter. In this way, you can define the <xsl:attribute-set> in a fragment of XSLT included in many other stylesheets. Changes to the shared fragment are immediately reflected in all of your other stylesheets.

Chapter 3. XSLT Part 2 -- Beyond the Basics

As you may have guessed, this chapter is a continuation of the material presented in the previous chapter. The basic syntax of XSLT should make sense by now. If not, it is probably a good idea to sit down and write a few stylesheets to gain some basic familiarity with the technology. What we have seen so far covers the basic mechanics of XSLT but does not take full advantage of the programming capabilities this language has to offer. In particular, this chapter will show how to write more reusable, modular code through features such as named templates, parameters, and variables. The chapter concludes with a real-world example that uses XSLT to produce HTML documentation for Ant build files. Ant is a Java build tool that uses XML files instead of Makefiles to drive the compilation process. Since XML is used, XSLT is a natural choice for producing documentation about the build process.

3.1 Conditional Processing

In the previous chapter, we saw a template that output the name of a president or vice president. Its basic job was to display the first name, middle name, and last name. A nonbreaking space was printed between each piece of data so the fields did not run into each other.
What we did not see was that many presidents do not have middle names, so our template ended up printing the first name, followed by two spaces, followed by the last name. To fix this, we need to check for the existence of a middle name before simply outputting its content and a space. This requires conditional logic, a feature found in just about every programming language in existence.

XSLT provides two mechanisms that support conditional logic: <xsl:if> and <xsl:choose>. These allow a stylesheet to produce different output depending on the results of a boolean expression, which must yield true or false as defined by the XPath specification.

3.1.1 <xsl:if>

The behavior of the <xsl:if> element is comparable to the following Java code:

    if (boolean-expression) {
        // do something
    }

In XSLT, the syntax is as follows:

    <xsl:if test="boolean-expression">
      <!-- Content: template -->
    </xsl:if>
Slide 60: The test attribute is required and must contain a boolean expression. If the result is true, the content of this element is instantiated; otherwise, it is skipped. The code in Example 3-1 illustrates several uses of <xsl:if> and related XPath expressions. Code that is highlighted will be discussed in the next several paragraphs.

Example 3-1. <xsl:if> examples

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html"/>

      <!--******************************************************
         ** "/" template
         ***************************************************-->
      <xsl:template match="/">
        <html>
          <body>
            <h1>Conditional Processing Examples</h1>
            <xsl:apply-templates select="presidents"/>
          </body>
        </html>
      </xsl:template>

      <!--******************************************************
         ** "presidents" template
         ***************************************************-->
      <xsl:template match="presidents">
        <h3>
          List of
          <xsl:value-of select="count(president)"/>
          Presidents
        </h3>
        <ul>
          <xsl:for-each select="president">
            <li>
              <!-- display every other row in bold -->
              <xsl:if test="(position() mod 2) = 0">
                <xsl:attribute name="style">
                  <xsl:text>font-weight: bold;</xsl:text>
                </xsl:attribute>
              </xsl:if>
              <xsl:apply-templates select="name"/>
              <!-- display some text after the last element -->
              <xsl:if test="position() = last()">
                <xsl:text> (current president)</xsl:text>
              </xsl:if>
            </li>
          </xsl:for-each>
        </ul>
      </xsl:template>

      <!--******************************************************
         ** "name" template
         ***************************************************-->
      <xsl:template match="name">
        <xsl:value-of select="last"/>
        <xsl:text>, </xsl:text>
        <xsl:value-of select="first"/>
        <xsl:if test="middle">
Slide 61:

          <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>
          <xsl:value-of select="middle"/>
        </xsl:if>
      </xsl:template>
    </xsl:stylesheet>

The first thing the match="presidents" template outputs is a heading that displays the number of presidents:

    List of
    <xsl:value-of select="count(president)"/>
    Presidents

The count() function is an XPath node set function and returns the number of nodes in a node set. In this case, the node set is the list of <president> elements that are direct children of the <presidents> element, so the number of presidents in the XML file is displayed.

The next block of code does the bulk of the work in this stylesheet, outputting each president as a list item using a loop:

    <xsl:for-each select="president">
      <li>
        <!-- display every other row in bold -->
        <xsl:if test="(position() mod 2) = 0">
          <xsl:attribute name="style">
            <xsl:text>font-weight: bold;</xsl:text>
          </xsl:attribute>
        </xsl:if>

In this example, the <xsl:for-each> loop first selects all <president> elements that are immediate children of the <presidents> element. As the loop iterates over this node set, the position() function returns an integer representing the current node position within the current node list, beginning with index 1. The mod operator computes the remainder following a truncating division, just as Java and ECMAScript do for their % operator. The XPath expression (position() mod 2) = 0 will return true for even numbers; therefore the style attribute will be added to the <li> tag for every other president, making that list item bold.

This template continues as follows:

        <xsl:apply-templates select="name"/>
        <!-- display some text after the last element -->
        <xsl:if test="position() = last()">
          <xsl:text> (current president)</xsl:text>
        </xsl:if>
      </li>
    </xsl:for-each>

The last() function returns an integer indicating the size of the current context; in this case, it returns the number of presidents.
When the position is equal to this count, the additional text (current president) is appended to the result tree. Java programmers should note that XPath uses a single = character for comparisons instead of ==, as Java does. A portion of the HTML for our list ends up looking like this:

    <li>Washington, George</li>
    <li style="font-weight: bold;">Adams, John</li>
    <li>Jefferson, Thomas</li>
    <li style="font-weight: bold;">Madison, James</li>
    <li>Monroe, James</li>
    <li style="font-weight: bold;">Adams, John&nbsp;Quincy</li>
    <li>Jackson, Andrew</li>
    ...remaining HTML omitted
Slide 62:

    <li>Bush, George (current president)</li>

The name output has been improved from the previous chapter and now uses <xsl:if> to determine if the middle name is present:

    <xsl:template match="name">
      <xsl:value-of select="last"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="first"/>
      <xsl:if test="middle">
        <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>
        <xsl:value-of select="middle"/>
      </xsl:if>
    </xsl:template>

In this case, <xsl:if test="middle"> checks for the existence of a node set rather than for a boolean value. If any <middle> elements are found, the content of <xsl:if> is instantiated. The test does not have to be this simplistic; any of the XPath location paths from the previous chapter would work here as well. As written here, if any <middle> elements are found, the first one is printed. Later, in Example 3-7, <xsl:for-each> will be used to print all middle names for presidents, such as George Herbert Walker Bush.

Checking for the existence of an attribute is very similar to checking for the existence of an element. For example:

    <xsl:if test="@someAttribute">
      ...execute this code if "someAttribute" is present
    </xsl:if>

Unlike most programming languages, <xsl:if> does not have a corresponding else or otherwise clause. This is only a minor inconvenience[1] because the <xsl:choose> element provides this functionality.

[1] <xsl:choose> requires a lot of typing.

3.1.2 <xsl:choose>, <xsl:when>, and <xsl:otherwise>

The XSLT equivalent of Java's switch statement is <xsl:choose>, which is virtually identical[2] in terms of functionality. <xsl:choose> must contain one or more <xsl:when> elements followed by an optional <xsl:otherwise> element. Example 3-2 illustrates how to use this feature. This example also uses <xsl:variable>, which will be covered in the next section.

[2] Java's switch statement only works with char, byte, short, or int.

Example 3-2. <xsl:choose>

    <xsl:template match="presidents">
      <h3>Color Coded by Political Party</h3>
      <ul>
        <xsl:for-each select="president">
          <xsl:variable name="color">
            <!-- define the color value based on political party -->
            <xsl:choose>
              <xsl:when test="party = 'Democratic'">
                <xsl:text>blue</xsl:text>
              </xsl:when>
              <xsl:when test="party = 'Republican'">
Slide 63:
                <xsl:text>green</xsl:text>
              </xsl:when>
              <xsl:when test="party = 'Democratic Republican'">
                <xsl:text>purple</xsl:text>
              </xsl:when>
              <xsl:when test="party = 'Federalist'">
                <xsl:text>brown</xsl:text>
              </xsl:when>
              <xsl:when test="party = 'Whig'">
                <xsl:text>black</xsl:text>
              </xsl:when>
              <!-- never executed in this example -->
              <xsl:otherwise>
                <xsl:text>red</xsl:text>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:variable>
          <li>
            <font color="{$color}">
              <!-- show the party name -->
              <xsl:apply-templates select="name"/>
              <xsl:text> - </xsl:text>
              <xsl:value-of select="party"/>
            </font>
          </li>
        </xsl:for-each>
      </ul>
    </xsl:template>

In this example, the list of presidents is displayed in order along with the political party of each president. The <xsl:when> elements test for each possible party, setting the value of a variable. This variable, color, is then used in a font tag to set the current color to something different for each party. The <xsl:otherwise> element is never executed because all of the political parties are listed in the <xsl:when> elements. If a new president affiliated with some other political party is ever elected, then none of the <xsl:when> conditions would be true, and the font color would be red.

One difference between the XSLT approach and a pure Java approach is that XSLT does not require break statements between <xsl:when> elements. In XSLT, the <xsl:when> elements are evaluated in the order in which they appear, and the content of the first one whose test expression is true is instantiated. All others are skipped. If no <xsl:when> elements match, then <xsl:otherwise>, if present, is instantiated.
Since <xsl:if> has no corresponding <xsl:else>, <xsl:choose> can be used to mimic the desired functionality as shown here:

    <xsl:choose>
      <xsl:when test="condition">
        <!-- if condition -->
      </xsl:when>
      <xsl:otherwise>
        <!-- else condition -->
      </xsl:otherwise>
    </xsl:choose>

As with other parts of XSLT, the XML syntax forces a lot more typing than Java programmers are accustomed to, but the mechanics of if/else are faithfully preserved.
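The first-match behavior of <xsl:choose> can be checked from Java with the standard JAXP transformation API. The following is only a minimal sketch: the class name ChooseDemo, the cut-down input document, and the output words brown, other, and red are assumptions chosen for illustration, not part of the book's examples. For the Federalist president both <xsl:when> tests are true, yet only the first branch contributes output, and the president without a <party> element falls through to <xsl:otherwise>.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ChooseDemo {
    // Hypothetical, cut-down input: the third president has no <party> element.
    static final String XML =
          "<presidents>"
        + "<president><party>Federalist</party></president>"
        + "<president><party>Whig</party></president>"
        + "<president/>"
        + "</presidents>";

    // Text-output stylesheet: one word per president, separated by commas.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='presidents/president'>"
        + "<xsl:choose>"
        // Both tests below are true for a Federalist; only the first one is used.
        + "<xsl:when test=\"party = 'Federalist'\">brown</xsl:when>"
        + "<xsl:when test='party'>other</xsl:when>"
        + "<xsl:otherwise>red</xsl:otherwise>"
        + "</xsl:choose>"
        + "<xsl:if test='position() != last()'>,</xsl:if>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet above to the given XML and returns the text result. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML)); // prints "brown,other,red"
    }
}
```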
Slide 64: 3.2 Parameters and Variables

As in other programming languages, it is often desirable to set up a variable whose value is reused in several places throughout a stylesheet. If the title of a book is displayed repeatedly, then it makes sense to store that title in a variable rather than scan through the XML data and locate the title repeatedly. It can also be beneficial to set up a variable once and pass it as a parameter to one or more templates. These templates often use <xsl:if> or <xsl:choose> to produce different content depending on the value of the parameter that was passed.

3.2.1 <xsl:variable>

Variables in XSLT are defined with the <xsl:variable> element and can be global or local. A global variable is defined at the "top-level" of a stylesheet, which means that it is defined outside of any templates as a direct child of the <xsl:stylesheet> element. Top-level variables are visible throughout the entire stylesheet, even in templates that occur before the variable declaration.

The other place to define a variable is inside of a template. These variables are visible only to elements that follow the <xsl:variable> declaration within that template and to their descendants. The code in Example 3-2 showed this form of <xsl:variable> as a mechanism to define the font color.

3.2.1.1 Defining variables

Variables can be defined in one of three ways:

    <xsl:variable name="homePage">index.html</xsl:variable>
    <xsl:variable name="lastPresident" select="president[position() = last()]/name"/>
    <xsl:variable name="empty"/>

In the first example, the content of <xsl:variable> specifies the variable value. In the simple example listed here, the text index.html is assigned to the homePage variable. More complex content is certainly possible, as shown earlier in Example 3-2.

The second way to define a variable relies on the select attribute. The value is an XPath expression, so in this case we are selecting the name of the last president in the list.
Finally, a variable without a select attribute or content is bound to an empty string. The example shown in item 3 is equivalent to:

    <xsl:variable name="empty" select="''"/>

3.2.1.2 Using variables

To use a variable, refer to the variable name with a $ character. In the following example, an XPath location path is used to select the name of the last president. This text is then stored in the lastPresident variable:

    <xsl:variable name="lastPresident" select="president[position() = last()]/name"/>

Later in the same stylesheet, the lastPresident variable can be displayed using the following fragment of code:

    <xsl:value-of select="$lastPresident"/>

Since the select attribute of <xsl:value-of> expects to see an XPath expression, $lastPresident is treated as something dynamic, rather than as static text. To use a variable within an HTML
Slide 65: attribute value, however, you must use the attribute value template (AVT) syntax, placing braces around the variable reference:

    <a href="{$homePage}">Click here to return to the home page...</a>

Without the braces, the variable would be misinterpreted as literal text rather than treated dynamically.

The primary limitation of variables is that they cannot be changed. It is impossible, for example, to use a variable as a counter in an <xsl:for-each> loop. This can be frustrating to programmers accustomed to variables that can be changed, but can often be overcome with some ingenuity. It usually comes down to passing a parameter to a template instead of using a global variable and then recursively calling the template again with an incremented parameter value. An example of this technique will be presented shortly.

Another XSLT trick involves combining the variable initialization with <xsl:choose>. Since variables cannot be changed, you cannot first declare a variable and then assign its value later on. The workaround is to place the variable definition as a child of <xsl:variable>, perhaps using <xsl:choose> as follows:

    <xsl:variable name="midName">
      <xsl:choose>
        <xsl:when test="middleName">
          <xsl:value-of select="middleName"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:text> </xsl:text>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

This code defines a variable called midName. If the <middleName> element is present, its value is assigned to midName. Otherwise, a blank space is assigned.

3.2.2 <xsl:call-template> and Named Templates

Up until this point, all of the templates have been tightly coupled to the actual data in the XML source. For example, the following template matches an <employee> element; therefore, <employee> must be contained within your XML data:

    <xsl:template match="employee">
      ...content, perhaps display the name and SSN for the employee
    </xsl:template>

But in many cases, you may wish to use this template for types of elements other than <employee>.
In addition to <employee> elements, you may want to use this same code to output information for a <programmer> or <manager> element. In these circumstances, <xsl:call-template> can be used to explicitly invoke a template by name, rather than matching a pattern in the XML data. The template will have the following form:

    <xsl:template name="formatSSN">
      ...content
    </xsl:template>

This template will be used to support the following XML data, in which both <manager> and <programmer> elements have ssn attributes. Using a single named template avoids the necessity to write one template for <manager> and another for <programmer>. We will see an example XSLT stylesheet when we discuss parameters.

    <?xml version="1.0" encoding="UTF-8"?>
Slide 66:

    <team>
      <manager ssn="230568737">
        <name>Aidan Burke</name>
      </manager>
      <programmer ssn="393776766">
        <name>Jennifer Burke</name>
      </programmer>
      <programmer ssn="993885777">
        <name>Bill Tellam</name>
      </programmer>
    </team>

3.2.3 <xsl:param> and <xsl:with-param>

It is difficult to use named templates without parameters, and parameters can also be used for regular templates. Parameters allow the same template to take on different behavior depending on data the caller provides, resulting in more reusable code fragments. In the case of a named template, parameters allow data such as a social security number to be passed into the template. Example 3-3 contains a complete stylesheet that demonstrates how to pass the ssn parameter into a named template.

Example 3-3. namedTemplate.xslt

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html"/>

      <xsl:template match="/">
        <html>
          <body>
            <h3>Team Members</h3>
            <ul>
              <xsl:for-each select="team/manager|team/programmer">
                <xsl:sort select="name"/>
                <li>
                  <xsl:value-of select="name"/>
                  <xsl:text>, ssn = </xsl:text>
                  <xsl:call-template name="formatSSN">
                    <xsl:with-param name="ssn" select="@ssn"/>
                  </xsl:call-template>
                </li>
              </xsl:for-each>
            </ul>
          </body>
        </html>
      </xsl:template>

      <!-- a named template that formats a 9 digit SSN by inserting '-' characters -->
      <xsl:template name="formatSSN">
        <xsl:param name="ssn"/>
        <xsl:value-of select="substring($ssn, 1, 3)"/>
        <xsl:text>-</xsl:text>
        <xsl:value-of select="substring($ssn, 4, 2)"/>
        <xsl:text>-</xsl:text>
        <xsl:value-of select="substring($ssn, 6)"/>
      </xsl:template>
    </xsl:stylesheet>
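Java programmers may trip over the substring() calls in formatSSN: XPath counts characters from 1 and takes a length as its third argument, whereas String.substring counts from 0 and takes an end index. The sketch below, with the hypothetical class name SubstringDemo, evaluates the same three XPath calls through javax.xml.xpath and compares them with a plain Java equivalent. Binding $ssn through a variable resolver is just one way to supply the value outside a stylesheet and is an assumption of this sketch.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class SubstringDemo {
    // The three substring() calls from formatSSN, joined with concat().
    static final String EXPR =
        "concat(substring($ssn, 1, 3), '-', substring($ssn, 4, 2), '-', substring($ssn, 6))";

    /** Formats the SSN with XPath: positions are 1-based, third argument is a length. */
    static String formatWithXPath(String ssn) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Resolve the $ssn variable to the method argument.
            xpath.setXPathVariableResolver(
                    name -> "ssn".equals(name.getLocalPart()) ? ssn : null);
            // An empty document serves as a context node; the expression ignores it.
            Document empty = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            return xpath.evaluate(EXPR, empty);
        } catch (XPathExpressionException | ParserConfigurationException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    /** The same formatting in plain Java: indexes are 0-based, second argument is an end index. */
    static String formatWithJava(String ssn) {
        return ssn.substring(0, 3) + "-" + ssn.substring(3, 5) + "-" + ssn.substring(5);
    }

    public static void main(String[] args) {
        System.out.println(formatWithXPath("230568737")); // prints "230-56-8737"
        System.out.println(formatWithJava("230568737"));  // prints "230-56-8737"
    }
}
```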
Slide 67: This stylesheet displays the managers and programmers in a list, sorted by name. The <xsl:for-each> element selects the union of team/manager and team/programmer, so all of the managers and programmers are listed. The pipe operator (|) computes the union of its two operands: <xsl:for-each select="team/manager|team/programmer"> For each manager or programmer, the content of the <name> element is printed, followed by the value of the ssn attribute, which is passed as a parameter to the formatSSN template. Passing one or more parameters is accomplished by adding <xsl:with-param> as a child of <xsl:call-template>. To pass additional parameters, simply list additional <xsl:with-param> elements, all as children of <xsl:call-template>. At the receiving end, <xsl:param> is used as follows: <xsl:template name="formatSSN"> <xsl:param name="ssn"/> ... In this case, the value of the ssn parameter defaults to an empty string if it is not passed. In order to specify a default value for a parameter, use the select attribute. In the following example, the zeros are in apostrophes in order to treat the default value as a string rather than as an XPath expression: <xsl:param name="ssn" select="'000000000'"/> Within the formatSSN template, you can see that the substring( ) function selects portions of the social security number string. More details on substring( ) and other string-formatting functions are discussed later in this chapter. 3.2.4 Incrementing Variables Unfortunately, there is no standard way to increment a variable in XSLT. Once a variable has been defined, it cannot be changed. This is comparable to a final field in Java. In some circumstances, however, recursion combined with template parameters can achieve similar results. The XML shown in Example 3-4 will be used to illustrate one such approach. Example 3-4. 
familyTree.xml <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="familyTree.xslt"?> <person name="Otto"> <person name="Sandra"> <person name="Jeremy"> <person name="Eliana"/> </person> <person name="Eric"> <person name="Aidan"/> </person> <person name="Philip"> <person name="Alex"/> <person name="Andy"/> </person> </person> </person> As you can see, the XML is structured recursively. Each <person> element can contain any number of <person> children, which in turn can contain additional <person> children. This is
Slide 68: certainly a simplified family tree, but this recursive pattern does occur in many XML documents. When displaying this family tree, it is desirable to indent the text according to the ancestry. Otto would be at the root, Sandra would be indented by one space, and her children would be indented by an additional space. This gives a visual indication of the relationships between the people. For example: Otto Sandra Jeremy Eliana Eric Aidan Philip Alex Andy The XSLT stylesheet that produces this output is shown in Example 3-5. Example 3-5. familyTree.xslt <?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="html"/> <!-- processing begins here --> <xsl:template match="/"> <html> <body> <!-- select the top level person --> <xsl:apply-templates select="person"> <xsl:with-param name="level" select="'0'"/> </xsl:apply-templates> </body> </html> </xsl:template> <!-- Output information for a person and recursively select all children. --> <xsl:template match="person"> <xsl:param name="level"/> <!-- indent according to the level --> <div style="text-indent:{$level}em"> <xsl:value-of select="@name"/> </div> <!-- recursively select children, incrementing the level --> <xsl:apply-templates select="person"> <xsl:with-param name="level" select="$level + 1"/> </xsl:apply-templates> </xsl:template> </xsl:stylesheet> As usual, this stylesheet begins by matching the document root and outputting a basic HTML document. It then selects the root <person> element, passing level=0 as the parameter to the template that matches person: <xsl:apply-templates select="person">
Slide 69: <xsl:with-param name="level" select="'0'"/> </xsl:apply-templates> The person template uses an HTML <div> tag to display each person's name on a new line and specifies a text indent in ems. In Cascading Style Sheets, one em is equal to the font size of the current element. Finally, the person template is invoked recursively, passing in $level + 1 as the parameter. Although this does not increment an existing variable, it does pass a new local variable to the template with a larger value than before. Other than tricks with recursive processing, there is really no way to increment the values of variables in XSLT. 3.2.5 Template Modes The final variation on templates is that of the mode. This feature is similar to parameters but a little simpler, sometimes resulting in cleaner code. Modes make it possible for multiple templates to match the same pattern, each using a different mode of operation. One template may display data in verbose mode, while another may display the same data in abbreviated mode. There are no predefined modes; you make them up. The mode attribute looks like this: <xsl:template match="name" mode="verbose"> ...display the full name </xsl:template> <xsl:template match="name" mode="abbreviated"> ...omit the middle name </xsl:template> In order to instantiate the appropriate template, a mode attribute must be added to <xsl:apply-templates> as follows: <xsl:apply-templates select="president/name" mode="verbose"/> If the mode attribute is omitted, then the processor searches for a matching template that does not have a mode. In the code shown here, both templates have modes, so you must include a mode on <xsl:apply-templates> in order for one of your templates to be instantiated. A complete stylesheet is shown in Example 3-6. In this example, the name of a president may occur inside either a table or a list. Instead of passing a parameter to the president template, two modes of operation are defined. 
In table mode, the template displays the name as a row in a table. In list mode, the name is displayed as an HTML list item. Example 3-6. Template modes <?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="html"/> <!-- Demonstrates how to use template modes --> <xsl:template match="/"> <html> <body> <h2>Presidents in an HTML Table</h2> <table border="1"> <tr> <th>Last Name</th>
Slide 70: <th>First Name</th> </tr> <xsl:apply-templates select="//president" mode="table"/> </table> <h2>Presidents in an Unordered List</h2> <ul> <xsl:apply-templates select="//president" mode="list"/> </ul> </body> </html> </xsl:template> <!-- Display a president's name as a table row --> <xsl:template match="president" mode="table"> <tr> <td> <xsl:value-of select="name/last"/> </td> <td> <xsl:value-of select="name/first"/> </td> </tr> </xsl:template> <!-- Display a president's name as a list item --> <xsl:template match="president" mode="list"> <li> <xsl:value-of select="name/last"/> <xsl:text>, </xsl:text> <xsl:value-of select="name/first"/> </li> </xsl:template> </xsl:stylesheet> 3.2.6 <xsl:template> Syntax Summary Sorting through all of the possible variations of <xsl:template> is a seemingly difficult task, but we have really only covered three attributes:
match: Specifies the node in the XML data that a template applies to
name: Defines an arbitrary name for a template, independent of specific XML data
mode: Similar to method overloading in Java, allowing multiple versions of a template that match the same pattern
The only attribute we have not discussed in detail is priority, which is used to resolve conflicts when more than one template matches. The XSLT specification defines a very specific set of
Slide 71: steps for processors to follow when more than one template rule matches.[3] From a code maintenance perspective, it is a good idea to avoid conflicting template rules within a stylesheet. When combining multiple stylesheets, however, you may find yourself with conflicting template rules. In these cases, specifying a higher numeric priority for one of the conflicting templates can resolve the problem. Table 3-1 provides a few summarized examples of the various forms of <xsl:template>. [3] See section 5.5 of the XSLT specification at http://www.w3.org/TR/xslt.
Table 3-1. Summary of common template syntax
Template example: <xsl:template match="president"> ... </xsl:template>
Notes: Matches president nodes in the source XML document
Template example: <xsl:template name="formatName"> <xsl:param name="style"/> ... </xsl:template>
Notes: Defines a named template; used in conjunction with <xsl:call-template> and <xsl:with-param>
Template example: <xsl:template match="customer" mode="myModeName"> ... </xsl:template>
Notes: Matches customer nodes when <xsl:apply-templates> also uses mode="myModeName"
3.3 Combining Multiple Stylesheets Through template parameters, named templates, and template modes, we have seen how to create more reusable fragments of code that begin to resemble function calls. By combining multiple stylesheets, one can begin to develop libraries of reusable XSLT templates that can dramatically increase productivity. Productivity gains occur because programmers are not writing the same code over and over for each stylesheet. Reusable code is placed into a single stylesheet and imported or included into other stylesheets. Another advantage of this technique is maintainability. XSLT syntax can get ugly, and modularizing code into small fragments can greatly enhance readability. For example, we have seen several examples related to the list of presidents so far. 
Since we almost always want to display the name of a president or vice president, name-formatting templates should be broken out into a separate stylesheet. Example 3-7 shows a stylesheet designed for reuse by other stylesheets. Example 3-7. nameFormatting.xslt <?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="html"/> <!-- Show a name formatted like: "Burke, Eric Matthew" --> <xsl:template match="name" mode="lastFirstMiddle"> <xsl:value-of select="last"/> <xsl:text>, </xsl:text>
Slide 72: <xsl:value-of select="first"/> <xsl:for-each select="middle"> <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text> <xsl:value-of select="."/> </xsl:for-each> </xsl:template> <!-- Show a name formatted like: "Eric Matthew Burke" --> <xsl:template match="name" mode="firstMiddleLast"> <xsl:value-of select="first"/> <xsl:for-each select="middle"> <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text> <xsl:value-of select="."/> </xsl:for-each> <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text> <xsl:value-of select="last"/> </xsl:template> </xsl:stylesheet> The code in Example 3-7 uses template modes to determine which template is instantiated. Adding additional templates would be simple, and those changes would be available to any stylesheet that included or imported this one. This stylesheet was designed to be reused by other stylesheets, so it does not include a template that matches the root node. For large web sites, the ability to import or include stylesheets is crucial. It almost goes without saying that every web page on a large site will contain the same navigation bar, footer, and perhaps a common heading region. Standalone stylesheet fragments included by other stylesheets should generate all of these reusable elements. This allows you to modify something like the copyright notice on your page footer in one place, and those changes are reflected across the entire web site without any programming changes. 3.3.1 <xsl:include> The <xsl:include> element allows one stylesheet to include another. It is only allowed as a top-level element, meaning that <xsl:include> elements are siblings to <xsl:template> elements in the stylesheet structure. The syntax of <xsl:include> is: <xsl:include href="uri-reference"/> When a stylesheet includes another, the included stylesheet is effectively inserted in place of the <xsl:include> element. Actually, the children of its <xsl:stylesheet> element are inserted into the including document. 
It is possible to include many other stylesheets and for those stylesheets to include others. Inclusion is a relatively simple mechanism because the resulting stylesheet behaves exactly as if you had typed all included elements into the including stylesheet. This can result in problems when two conflicting template rules are included, so you must be careful to plan ahead to avoid any conflicts. When a conflict occurs, the XSLT processor should report an error and halt. 3.3.2 <xsl:import> Importing (rather than including) a stylesheet adds some intelligence to the process. When conflicts occur, the importing stylesheet takes precedence over any imported stylesheets. Unlike <xsl:include>, <xsl:import> elements must occur before any other element children of <xsl:stylesheet>, as shown here:
Slide 73: <?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <!-- xsl:import must occur before any other top-level elements --> <xsl:import href="pageElements.xslt"/> <xsl:import href="globalConstants.xslt"/> <xsl:output method="html"/> <xsl:template match="/"> <html> ... </html> </xsl:template> <!-- but xsl:include can occur anywhere, provided it is a top-level element --> <xsl:include href="nameFormatting.xslt"/> </xsl:stylesheet> For the purposes of most web sites, the most common usage pattern is for each page to import or include common stylesheet fragments, such as templates to produce page headers, footers, and other reusable elements on a web site. Once a stylesheet has been included or imported, its templates can be used as if they were in the current stylesheet. The key reason to use <xsl:import> instead of <xsl:include> is to avoid conflicts. If your stylesheet already has a template that matches pageHeader, you will not be able to include pageElements.xslt if it also has that template. On the other hand, you can use <xsl:import>. In this case, your own pageHeader template will take priority over the imported pageHeader. Changing all <xsl:import> elements to <xsl:include> will help identify any naming conflicts you did not know about. 3.4 Formatting Text and Numbers XSLT and XPath define a small set of functions to manipulate text and numbers. These allow you to concatenate strings, extract substrings, determine the length of a string, and perform other similar tasks. While these features do not approach the capabilities offered by a programming language like Java, they do allow for some of the most common string manipulation tasks. 3.4.1 Number Formatting The format-number( ) function is provided by XSLT to convert numbers such as 123 into formatted numbers such as $123.00. The function takes the following form: string format-number(number, string, string?) 
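Since XSLT 1.0 defines the format string in terms of Java's java.text.DecimalFormat pattern syntax, a pattern can be tried out in plain Java before it goes into a stylesheet. The following is a minimal sketch, not an XSLT processor: the class name FormatNumberSketch and its format( ) helper are hypothetical, and US symbols are assumed so that the result does not depend on the default locale of the machine.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper for experimenting with format-number() patterns in plain Java.
final class FormatNumberSketch {
    static String format(double value, String pattern) {
        // Pin the symbols to US conventions ('.' decimal point, ',' grouping).
        DecimalFormat df = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
        return df.format(value);
    }

    public static void main(String[] args) {
        double[] samples = {0, 0.9, 0.919, 10, 1000, 12345.12345};
        for (double sample : samples) {
            // For example, 0.919 is shown as $0.92 and 1000 as $1,000.00
            System.out.println(sample + " -> " + format(sample, "$#,##0.00"));
        }
        // A trailing % multiplies the value by 100: prints 10%
        System.out.println(format(0.1, "0%"));
    }
}
```

An XSLT processor is free to implement format-number( ) differently internally, so treat this only as a quick way to explore pattern syntax.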
The first parameter is the number to format, the second is a format string, and the third (optional) is the name of an <xsl:decimal-format> element. We will cover only the first two parameters in this book. Interestingly enough, the behavior of the format-number( ) function is defined by the JDK 1.1.x version of the java.text.DecimalFormat class. For complete information on the syntax of the second argument, refer to the JavaDocs for JDK 1.1.x. Outputting currencies is a common use for the format-number( ) function. The pattern $#,##0.00 can properly format a number into just about any U.S. currency. Table 3-2 demonstrates several possible inputs and results for this pattern. Table 3-2. Formatting currencies using $#,##0.00
Slide 74:
Number         Result
0              $0.00
0.9            $0.90
0.919          $0.92
10             $10.00
1000           $1,000.00
12345.12345    $12,345.12
The XSLT code to utilize this function may look something like this: <xsl:value-of select="format-number(amt,'$#,##0.00')"/> It is assumed that amt is some element in the XML data,[4] such as <amt>1000</amt>. The # and 0 characters are placeholders for digits and behave exactly as java.text.DecimalFormat specifies. Basically, 0 is a placeholder that always displays a digit, padding with zeros where necessary, while # displays a digit only when one is needed, so insignificant zeros are omitted. [4] The XSLT specification does not define what happens if the XML data does not contain a valid number. Besides currencies, another common format is percentages. To output a percentage, end the format pattern with a % character. The following XSLT code shows a few examples: <!-- outputs 0% --> <xsl:value-of select="format-number(0,'0%')"/> <!-- outputs 10% --> <xsl:value-of select="format-number(0.1,'0%')"/> <!-- outputs 100% --> <xsl:value-of select="format-number(1,'0%')"/> As before, the first parameter to the format-number( ) function is the actual number to be formatted, and the second parameter is the pattern. The 0 in the pattern indicates that at least one digit should always be displayed. The % character also has the side effect of multiplying the value by 100 so it is displayed as a percentage. Consequently, 0.15 is displayed as 15%, and 1 is displayed as 100%. To test more patterns, the XML data shown in Example 3-8 can be used. This works in conjunction with numberFormatting.xslt to display every combination of format and number listed in the XML data. Example 3-8. numberFormatting.xml <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="numberFormatting.xslt"?> <numberFormatting> <formatSamples> <!-- add more <format> elements to test more combinations --> <format>$#,##0.00</format> <format>#.#</format> <format>0.#</format> <format>0.0</format>
Slide 75: <format>0%</format> <format>0.0#</format> </formatSamples> <numberSamples> <!-- add more <number> elements to test more combinations --> <number>-10</number> <number>-1</number> <number>0</number> <number>0.000123</number> <number>0.1</number> <number>0.9</number> <number>0.91</number> <number>0.919</number> <number>1</number> <number>10</number> <number>100</number> <number>1000</number> <number>10000</number> <number>12345.12345</number> <number>55555.55555</number> </numberSamples> </numberFormatting> The stylesheet, numberFormatting.xslt, is shown in Example 3-9. Comments in the code explain what happens at each step. To test new patterns and numbers, just edit the XML data and apply the transformation again. Since the XML file references the stylesheet with <?xml-stylesheet?>, you can simply load the XML into an XSLT-compliant web browser and click on the Reload button to see changes as they are made. Example 3-9. numberFormatting.xslt <?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="html"/> <xsl:template match="/"> <html> <body> <!-- loop over each of the sample formats --> <xsl:for-each select="numberFormatting/formatSamples/format"> <h2> <!-- show the format as a heading --> <xsl:value-of select="."/> </h2> <table border="1" cellpadding="2" cellspacing="0"> <tr> <th>Number</th> <th>Result</th> </tr> <!-- pass the format as a parameter to the template that shows each number --> <xsl:apply-templates select="/numberFormatting/numberSamples/number"> <xsl:with-param name="fmt" select="."/> </xsl:apply-templates> </table> </xsl:for-each>
Slide 76: </body> </html> </xsl:template> <!-- output the number followed by the result of the format-number function --> <xsl:template match="number"> <xsl:param name="fmt"/> <tr> <td align="right"> <xsl:value-of select="."/> </td> <td align="right"> <!-- the first param is a dot, representing the text content of the <number> element --> <xsl:value-of select="format-number(.,$fmt)"/> </td> </tr> </xsl:template> </xsl:stylesheet> This stylesheet first loops over the list of <format> elements: <xsl:for-each select="numberFormatting/formatSamples/format"> Within the loop, all of the <number> elements are selected. This means that every format is applied to every number: <xsl:apply-templates select="/numberFormatting/numberSamples/number"> 3.4.2 Text Formatting Several text-formatting functions are defined by the XPath specification, allowing code in an XSLT stylesheet to perform such operations as concatenating two or more strings, extracting a substring, and computing the length of a string. Unlike strings in Java, all strings in XSLT and XPath are indexed from position 1 instead of position 0. Let's suppose that a stylesheet defines the following variables: <xsl:variable name="firstName" select="'Eric'"/> <xsl:variable name="lastName" select="'Burke'"/> <xsl:variable name="middleName" select="'Matthew'"/> <xsl:variable name="fullName" select="concat($firstName, ' ', $middleName, ' ', $lastName)"/> In the first three variables, apostrophes are used to indicate that the values are strings. Without the apostrophes, the XSLT processor would treat these as XPath expressions and attempt to select nodes from the XML input data. The fourth variable, fullName, demonstrates how the concat( ) function is used to concatenate two or more strings together. The function simply takes a comma-separated list of strings as arguments and returns the concatenated results. In this case, the value for fullName is "Eric Matthew Burke". 
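The difference between XPath's 1-based indexing and Java's 0-based indexing is easy to see by evaluating XPath expressions directly from Java with the standard javax.xml.xpath API. The following is a minimal sketch: the class name XPathStringSketch and its eval( ) helper are hypothetical, string literals stand in for the variables defined above, and a dummy document is parsed only because the API needs a context node.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper for evaluating stand-alone XPath 1.0 string expressions.
final class XPathStringSketch {
    static String eval(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // The dummy document only supplies the context node that evaluate() requires.
            return xpath.evaluate(expression, new InputSource(new StringReader("<dummy/>")));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Prints Eric Matthew Burke
        System.out.println(eval("concat('Eric', ' ', 'Matthew', ' ', 'Burke')"));
        // XPath positions start at 1, so this prints M
        System.out.println(eval("substring('Matthew', 1, 1)"));
        // The Java equivalent starts at 0 and takes an end index; also prints M
        System.out.println("Matthew".substring(0, 1));
    }
}
```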
Table 3-3 provides additional examples of string functions. The variables in this table are the same ones from the previous example. In the first column, the return type of the function is listed first, followed by the function name and the list of parameters. The second and third columns provide an example usage and the output from that example. Table 3-3. String function examples
Slide 77:
Function syntax                          Example                             Output
string concat(string,string,string*)     concat($firstName, ' ', $lastName)  Eric Burke
boolean starts-with(string,string)       starts-with($firstName, 'Er')       true
boolean contains(string,string)          contains($fullName, 'Smith')        false
string substring-before(string,string)   substring-before($fullName, ' ')    Eric
string substring-after(string,string)    substring-after($fullName, ' ')     Matthew Burke
string substring(string,number,number?)  substring($middleName,1,1)          M
number string-length(string?)            string-length($fullName)            18
string normalize-space(string?)          normalize-space(' testing ')        testing
string translate(string,string,string)   translate('test','aeiou','AEIOU')   tEst
All string comparisons, such as starts-with( ) and contains( ), are case-sensitive. There is no concept of case-insensitive comparison in XSLT. One potential workaround is to convert both strings to upper- or lowercase, and then perform the comparison. Converting a string to upper- or lowercase is not directly supported by a function in the current implementation of XSLT, but the translate( ) function can be used to perform the task. The following XSLT snippet converts a string from lower- to uppercase: translate($text, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') In the substring-before( ) and substring-after( ) functions, the second argument contains a delimiter string. This delimiter does not have to be a single character, and an empty string is returned if the delimiter is not found. 
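The translate( ) workaround and the delimiter behavior can be checked from Java with the standard javax.xml.xpath API. The following is a minimal sketch under a few assumptions: the class name TranslateSketch and the helpers eval( ) and upper( ) are hypothetical, only the 26 ASCII letters are converted, and string literals are used in place of stylesheet variables.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper showing a case-insensitive comparison built on translate().
final class TranslateSketch {
    static final String LOWER = "'abcdefghijklmnopqrstuvwxyz'";
    static final String UPPER = "'ABCDEFGHIJKLMNOPQRSTUVWXYZ'";

    static String eval(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // The dummy document only supplies the context node that evaluate() requires.
            return xpath.evaluate(expression, new InputSource(new StringReader("<dummy/>")));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath expression: " + expression, e);
        }
    }

    // Wraps an XPath string expression in a translate() call that uppercases ASCII letters.
    static String upper(String expression) {
        return "translate(" + expression + ", " + LOWER + ", " + UPPER + ")";
    }

    public static void main(String[] args) {
        // Prints BURKE
        System.out.println(eval(upper("'Burke'")));
        // Uppercasing both sides makes contains() ignore case: prints true
        System.out.println(eval("contains(" + upper("'Eric Matthew Burke'") + ", " + upper("'burke'") + ")"));
        // A multi-character delimiter is allowed: prints Burke
        System.out.println(eval("substring-after('Eric Matthew Burke', 'Matthew ')"));
        // A delimiter that is not found yields an empty string: prints true
        System.out.println(eval("substring-before('Eric Matthew Burke', '/')").isEmpty());
    }
}
```

Characters outside the two alphabet strings, such as accented letters, pass through translate( ) unchanged, so this approach is only suitable for ASCII text. The calls to substring-before( ) and substring-after( ) in main( ) show both the multi-character delimiter and the empty result.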
These functions could be used to parse formatted data such as dates: <date>06/25/1999</date> The XSLT used to extract the month, day, and year looks like this: <xsl:variable name="dateStr" select="//date"/> <xsl:variable name="dayYear" select="substring-after($dateStr, '/')"/> Month: <xsl:value-of select="substring-before($dateStr, '/')"/> <br/> Day: <xsl:value-of select="substring-before($dayYear, '/')"/> <br/> Year: <xsl:value-of select="substring-after($dayYear, '/')"/> In the first line of code, the dateStr variable is initialized to contain the full date. The next line then creates the dayYear variable, which contains everything after the first / character -- at this point, dateStr=06/25/1999 and dayYear=25/1999. In Java, this is slightly easier because you simply create an instance of the StringTokenizer class and iterate through the tokens or use the lastIndexOf( ) method of java.lang.String to locate the second /. With XSLT, the options are somewhat more limited. The remaining lines continue chopping up the variables into substrings, again delimiting on the / character. The output is as follows: Month: 06 Day: 25
Slide 78: Year: 1999 Another form of the substring( ) function takes one or two number arguments, indicating the starting index and the optional length of the substring. If the second number is omitted, the substring continues until the end of the input string. The starting index always begins at position 1, so substring("abcde",2,3) returns bcd, and substring("abcde",2) returns bcde. 3.5 Schema Evolution Looking beyond HTML generation, a key use for XSLT is transforming one form of XML into another form. In many cases, these are not radical transformations, but minor enhancements such as adding new attributes, changing the order of elements, or removing unused data. If you have only a handful of XML files to transform, it is a lot easier to simply edit the XML directly rather than going through the trouble of writing a stylesheet. But in cases where a large collection of XML documents exist, a single XSLT stylesheet can perform transformations on an entire library of XML files in a single pass. For B2B applications, schema evolution is useful when different customers require the same data, but in different formats. 3.5.1 An Example XML File Let's suppose that you wrote a logging API for your Java programs. Log files are written in XML and are formatted as shown in Example 3-10. Example 3-10. Log file before transformation <?xml version="1.0" encoding="UTF-8"?> <log> <message text="input parameter was null"> <type>ERROR</type> <when> <year>2000</year> <month>01</month> <day>15</day> <hour>03</hour> <minute>12</minute> <second>18</second> </when> <where> <class>com.foobar.util.StringUtil</class> <method>reverse(String)</method> </where> </message> <message text="cannot read config file"> <type>WARNING</type> <when> <year>2000</year> <month>01</month> <day>15</day> <hour>06</hour> <minute>35</minute> <second>44</second> </when> <where> <class>com.foobar.servlet.MainServlet</class> <method>init( )</method> </where>
Slide 79: </message> <!-- more messages ... --> </log> As you can see from this example, the file format is quite verbose. Of particular concern is how the date and time are written. Since log files can be quite large, it would be a good idea to select a more concise format for this information. Additionally, the text is stored as an attribute on the <message> element, and the type is stored as a child element. It would make more sense to list the type as an attribute and the message as an element. For example: <message type="WARNING"> <text>This is the text of a message. Multi-line messages are easier when an element is used instead of an attribute.</text> ...remainder omitted 3.5.2 The Identity Transformation Whenever writing a schema evolution stylesheet, it is a good idea to start with an identity transformation. This is a very simple template that simply takes the original XML document and "transforms" it into a new document with the same elements and attributes as the original document. Example 3-11 shows a stylesheet that contains an identity transformation template. Example 3-11. identityTransformation.xslt <?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:template match="@*|node( )"> <xsl:copy> <xsl:apply-templates select="@*|node( )"/> </xsl:copy> </xsl:template> </xsl:stylesheet> Amazingly, it takes only a single template to perform the identity transformation, regardless of the complexity of the XML data. Our stylesheet encodes the result using UTF-8 and indents lines, regardless of the original XML format. In XPath, node( ) is a node test that matches all child nodes of the current context. This is fine, but it omits the attributes of the current context. 
For this reason, @* must be unioned with node( ) as follows: <xsl:template match="@*|node( )"> Translated into English, this means that the template will match any attribute or any child node of the current context. Since node( ) includes elements, comments, processing instructions, and even text, this template will match anything that can occur in the XML document. Inside of our template, we use <xsl:copy> . As you can probably guess, this instructs the XSLT processor to simply copy the current node to the result tree. To continue processing, <xsl:apply-templates> then selects all attributes or children of the current context using the following code: <xsl:apply-templates select="@*|node( )"/> 3.5.3 Transforming Elements and Attributes Once you have typed in the identity transformation and tested it, it is time to begin adding additional templates that actually perform the schema evolution. In XSLT, it is possible for two or more templates to match a pattern in the XML data. In these cases, the more specific template is
Slide 80: instantiated. Without going into a great deal of technical detail, an explicit match such as <xsl:template match="when"> takes precedence over the identity transformation template, which is essentially a wildcard pattern that matches any attribute or node. To modify specific elements and attributes, simply add more specific templates to the existing identity transformation stylesheet. In the log file example, a key problem is the quantity of XML data written for each <when> element. Instead of representing the date and time using a series of child elements, it would be much more concise to use the following syntax: <timestamp time="06:35:44" day="15" month="01" year="2000"/> The following template will perform the necessary transformation: <xsl:template match="when"> <!-- change 'when' into 'timestamp', and change its child elements into attributes --> <timestamp time="{hour}:{minute}:{second}" year="{year}" month="{month}" day="{day}"/> </xsl:template> This template can be added to the identity transformation stylesheet and will take precedence whenever a <when> element is encountered. Instead of using <xsl:copy>, this template produces a new <timestamp> element. AVTs are then used to specify attributes for this element, effectively converting element values into attribute values. The AVT syntax {hour} is equivalent to selecting the <hour> child of the <when> element. You may notice that XSLT processors do not necessarily preserve the order of attributes. This is not important because the relative ordering of attributes is meaningless in XML, and you cannot force the order of XML attributes. The next thing to tackle is the <message> element. As mentioned earlier, we would like to convert the text attribute to an element, and the <type> element to an attribute. Just like before, add a new template that matches the <message> element, which will take precedence over the identity transformation. Comments in the code explain what happens at each step. 
<!-- locate <message> elements --> <xsl:template match="message"> <!-- copy the current node, but not its attributes --> <xsl:copy> <!-- change the <type> element to an attribute --> <xsl:attribute name="type"> <xsl:value-of select="type"/> </xsl:attribute> <!-- change the text attribute to a child node --> <xsl:element name="text"> <xsl:value-of select="@text"/> </xsl:element> <!-- since the select attribute is not present, xsl:apply-templates processes all children of the current node. (not attributes or processing instructions!) --> <xsl:apply-templates/> </xsl:copy> </xsl:template> This almost completes the stylesheet. <xsl:copy> simply copies the <message> element to the result tree but does not copy any of its attributes or children. We can explicitly add new attributes
Slide 81: using <xsl:attribute> and explicitly create new child elements using <xsl:element>. <xsl:apply-templates> then tells the processor to continue the transformation process for the children of <message>. One problem right now is that the <type> element has been converted into an attribute but has not been removed from the document. The identity transformation still copies the <type> element to the result tree without modification. To fix this, simply add an empty template as follows: <xsl:template match="type"/> The complete schema evolution stylesheet simply contains the previous templates. Without duplicating all of the code, here is its overall structure: <?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <!-- the identity transformation --> <xsl:template match="@*|node( )"> ... </xsl:template> <!-- locate <message> elements --> <xsl:template match="message"> ... </xsl:template> <!-- locate <when> elements --> <xsl:template match="when"> ... </xsl:template> <!-- suppress the <type> element --> <xsl:template match="type"/> </xsl:stylesheet> 3.5.4 The Result File Now that the stylesheet is complete, it can be applied to all of the existing XML log files using a simple shell script or batch file. The resulting XML file is shown in Example 3-12. Example 3-12. Result of the transformation <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="schemaChange.xslt"?> <log> <message type="ERROR"> <text>input parameter was null</text> <timestamp time="03:12:18" day="15" month="01" year="2000"/> <where> <class>com.foobar.util.StringUtil</class> <method>reverse(String)</method> </where> </message> <message type="WARNING"> <text>cannot read config file</text>
Slide 82: <timestamp time="06:35:44" day="15" month="01" year="2000"/> <where> <class>com.foobar.servlet.MainServlet</class> <method>init( )</method> </where> </message> <message type="ERROR"> <text>negative duration is not allowed</text> <timestamp time="10:01:49" day="17" month="01" year="2000"/> <where> <class>com.foobar.util.DateUtil</class> <method>getWeek(int)</method> </where> </message> </log> 3.6 Ant Documentation Stylesheet Apache's Ant has taken the Java development community by storm, supplementing traditional Java IDEs and outright replacing Makefiles on most Java development projects. Ant is a build tool, similar to the make utility, only it uses XML files instead of Makefiles. In addition to a portable build file based on XML, Ant itself is written in Java and has few platform-specific dependencies. Finally, since Ant can reuse the same running instance of the Java Virtual Machine for nearly every step of the build process, it is blazingly fast. Ant can be downloaded from http://jakarta.apache.org and is open source software. 3.6.1 Ant Basics Ant is driven by an XML build file, which consists of one project. This project contains one or more targets, and targets can have dependencies on one another. The project and targets are represented as <project> and <target> in the XML build file; <project> must be the document root element. It is common to have a "prepare" target that builds the output directories and a "compile" target that depends on the "prepare" target. If you tell Ant to execute the "compile" target, it first checks to see that the "prepare" target has created the necessary directories. 
The structure of an Ant build file looks like this: <?xml version="1.0"?> <project name="SampleProject" default="compile" basedir="."> <!-- global properties --> <property name="srcdir" value="src"/> <property name="builddir" value="build"/> <target name="prepare" description="Creates the output directories"> ...tasks </target> <target name="compile" depends="prepare"> ...tasks </target> <target name="distribute" depends="compile"> ...tasks </target> </project>
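The way one target pulls in the targets it depends on can be sketched in Java. This is only an illustration of depth-first dependency resolution, not Ant's actual implementation; the class name TargetOrderSketch, the executionOrder method, and the map-based model of a build file are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a build file: target name -> names it depends on.
final class TargetOrderSketch {

    // Returns the targets in an order where every dependency
    // appears before the target that needs it.
    static List<String> executionOrder(Map<String, List<String>> targets, String goal) {
        Set<String> done = new LinkedHashSet<>();
        visit(targets, goal, done, new LinkedHashSet<>());
        return new ArrayList<>(done);
    }

    private static void visit(Map<String, List<String>> targets, String name,
                              Set<String> done, Set<String> inProgress) {
        if (done.contains(name)) {
            return; // already scheduled by an earlier dependency
        }
        if (!targets.containsKey(name)) {
            throw new IllegalArgumentException("unknown target: " + name);
        }
        if (!inProgress.add(name)) {
            throw new IllegalStateException("circular dependency at: " + name);
        }
        for (String dependency : targets.get(name)) {
            visit(targets, dependency, done, inProgress);
        }
        inProgress.remove(name);
        done.add(name);
    }

    public static void main(String[] args) {
        Map<String, List<String>> targets = new LinkedHashMap<>();
        targets.put("prepare", List.of());
        targets.put("compile", List.of("prepare"));
        targets.put("distribute", List.of("compile"));
        // prints [prepare, compile, distribute]
        System.out.println(executionOrder(targets, "distribute"));
    }
}
```

Asking for "distribute" schedules "prepare" and "compile" first, which mirrors the prepare/compile relationship in the sample build file.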
Slide 83: For each target, Ant is smart enough to know if files have been modified and if it needs to do any work. For compilation, the timestamps of .class files are compared to timestamps of .java files. Through these dependencies, Ant can avoid unnecessary compilation and perform quite well. Although the targets shown here contain only single dependencies, it is possible for a target to depend on several other targets: <target name="X" depends="A,B,C"> Although Ant build files are much simpler than corresponding Makefiles, complex projects can introduce many dependencies that are difficult to visualize. It can be helpful to view the complete list of targets with dependencies displayed visually, such as in a hierarchical tree view. XSLT can be used to generate this sort of report. 3.6.2 Stylesheet Functionality Since the build file is XML, XSLT makes it easy to generate HTML web pages that summarize the targets and dependencies. Our stylesheet also shows a list of global properties and can easily be extended to display anything else contained in the build file. Although this stylesheet creates several useful HTML tables in its report, its most interesting feature is the ability to display a complete dependency graph of all Ant build targets. The output for this graph is shown in Example 3-13.
Example 3-13. Target dependencies
clean
    all (depends on clean, dist)
prepare
    tomcat (depends on prepare)
        j2ee (depends on tomcat)
            j2ee-dist (depends on j2ee)
        main (depends on tomcat, webapps)
            dist (depends on main, webapps)
                dist-zip (depends on dist)
                all (depends on clean, dist)
    webapps (depends on prepare)
        dist (depends on main, webapps)
            dist-zip (depends on dist)
            all (depends on clean, dist)
        main (depends on tomcat, webapps)
            dist (depends on main, webapps)
                dist-zip (depends on dist)
                all (depends on clean, dist)
targets
This is actually the output from the Ant build file included with Apache's Tomcat. 
The list of top-level targets is shown at the root level, and dependent targets are indented and listed next. The targets shown in parentheses list what each target depends on. This tree view is created by recursively analyzing the dependencies, which appear in the Ant build file as follows: <target name="all" depends="clean,dist"> Figure 3-1 shows a portion of the output in a web browser. A table listing all targets follows the dependency graph. The output concludes with a table of all global properties defined in the Ant build file. Figure 3-1. Antdoc sample output
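The recursive analysis behind this tree view can be approximated in Java. The sketch below is not the stylesheet's code: the class DependencyTreeSketch, its methods, and the representation of a build file as a map from target name to its raw depends value (null when the attribute is absent) are assumptions made for illustration. It compares whole target names rather than substrings, and it assumes the build file has no circular dependencies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: target name -> raw "depends" value, or null if absent.
final class DependencyTreeSketch {

    // Splits a comma-separated depends value into trimmed target names.
    static List<String> parseDepends(String depends) {
        List<String> names = new ArrayList<>();
        for (String token : depends.split(",")) {
            String name = token.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    // Appends one target, then (indented) every target that depends on it.
    static void appendTree(Map<String, String> targets, String name,
                           int level, StringBuilder out) {
        out.append("    ".repeat(level)).append(name);
        String depends = targets.get(name);
        if (depends != null) {
            out.append(" (depends on ")
               .append(String.join(", ", parseDepends(depends)))
               .append(")");
        }
        out.append('\n');
        for (Map.Entry<String, String> other : targets.entrySet()) {
            String otherDepends = other.getValue();
            // exact name comparison, so "docs" never matches "prepare.docs"
            if (otherDepends != null && parseDepends(otherDepends).contains(name)) {
                appendTree(targets, other.getKey(), level + 1, out);
            }
        }
    }

    // Starts from targets without a depends value, sorted by name.
    static String render(Map<String, String> targets) {
        StringBuilder out = new StringBuilder();
        targets.keySet().stream()
               .filter(name -> targets.get(name) == null)
               .sorted()
               .forEach(name -> appendTree(targets, name, 0, out));
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> targets = new LinkedHashMap<>();
        targets.put("prepare", null);
        targets.put("clean", null);
        targets.put("compile", "prepare");
        targets.put("all", "clean, compile");
        System.out.print(render(targets));
    }
}
```

A target that is reachable along several paths, such as "all" here, is printed once per path, which is the same repetition visible in the stylesheet's report.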
Slide 84: The comma-separated list of dependencies presents a challenge that is best handled through recursion. For each target in the build file, it is necessary to print a list of targets that depend on that target. It is possible to have many dependencies, so an Ant build file may contain a <target> that looks like this: <target name="docs" depends="clean, prepare.docs, compile"> In the first prototype of the Antdoc stylesheet, the algorithm to print the dependency graph uses simple substring operations to determine if another target depends on the current target. This turns out to be a problem because two unrelated targets might have similar names, so some Ant build files cause infinite recursion in the stylesheet. In the preceding example, the original prototype of Antdoc says that "docs" depends on itself because its list of dependencies contains the text prepare.docs. In the finished version of Antdoc, the list of target dependencies is cleaned up to remove spaces and commas. For example, "clean, prepare.docs, compile" is converted into "|clean|prepare.docs|compile|". By placing the pipe (|) character before and after every dependency, it becomes much easier to locate dependencies by searching for strings. 3.6.3 The Complete Example
Slide 85: The complete XSLT stylesheet is listed in Example 3-14. Comments within the code explain what happens in each step. To use this stylesheet, simply invoke your favorite XSLT processor at the command line, passing antdoc.xslt and your Ant build file as parameters. Example 3-14. antdoc.xslt <?xml version="1.0" encoding="UTF-8"?> <!-- ************************************************************* ** Antdoc v1.0 ** ** Written by Eric Burke (burke_e@ociweb.com) ** ** Uses XSLT to generate HTML summary reports of Ant build ** files. *********************************************************** --> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="xml" doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN" doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" indent="yes" encoding="UTF-8"/> <!-- global variable: the project name --> <xsl:variable name="projectName" select="/project/@name"/> <xsl:template match="/"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Ant Project Summary <xsl:value-of select="$projectName"/></title> </head> <body> <h1>Ant Project Summary</h1> <xsl:apply-templates select="project"/> </body> </html> </xsl:template> <!-- ************************************************************** ** "project" template ************************************************************ --> <xsl:template match="project"> <!-- show the project summary table, listing basic info such as name, default target, and base directory --> <table border="1" cellpadding="4" cellspacing="0"> <tr><th colspan="2">Project Summary</th></tr> <tr> <td>Project Name:</td> <td><xsl:value-of select="$projectName"/></td> </tr> <tr> <td>Default Target:</td> <td><xsl:value-of select="@default"/></td> </tr> <tr> <td>Base Directory:</td>
Slide 86: <td><xsl:value-of select="@basedir"/></td> </tr> </table> <!-- show all target dependencies as a tree --> <h3>Target Dependency Tree</h3> <xsl:apply-templates select="target[not(@depends)]" mode="tree"> <xsl:sort select="@name"/> </xsl:apply-templates> <p/> <!-- Show a table of all targets --> <table border="1" cellpadding="4" cellspacing="0"> <tr><th colspan="3">List of Targets</th></tr> <tr> <th>Name</th> <th>Dependencies</th> <th>Description</th> </tr> <xsl:apply-templates select="target" mode="tableRow"> <xsl:sort select="count(@description)" order="descending"/> <xsl:sort select="@name"/> </xsl:apply-templates> </table> <p/> <xsl:call-template name="globalProperties"/> </xsl:template> <!-- ************************************************************** ** Create a table of all global properties. ************************************************************ --> <xsl:template name="globalProperties"> <xsl:if test="property"> <table border="1" cellpadding="4" cellspacing="0"> <tr><th colspan="2">Global Properties</th></tr> <tr> <th>Name</th> <th>Value</th> </tr> <xsl:apply-templates select="property" mode="tableRow"> <xsl:sort select="@name"/> </xsl:apply-templates> </table> </xsl:if> </xsl:template> <!-- ************************************************************** ** Show an individual property in a table row. ************************************************************ --> <xsl:template match="property[@name]" mode="tableRow"> <tr> <td><xsl:value-of select="@name"/></td> <td> <xsl:choose> <xsl:when test="not(@value)">
Slide 87: <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text> </xsl:when> <xsl:otherwise> <xsl:value-of select="@value"/> </xsl:otherwise> </xsl:choose> </td> </tr> </xsl:template> <!-- ************************************************************** ** "target" template, mode=tableRow ** Print a target name and its list of dependencies in a ** table row. ************************************************************ --> <xsl:template match="target" mode="tableRow"> <tr valign="top"> <td><xsl:value-of select="@name"/></td> <td> <xsl:choose> <xsl:when test="@depends"> <xsl:call-template name="parseDepends"> <xsl:with-param name="depends" select="@depends"/> </xsl:call-template> </xsl:when> <xsl:otherwise>-</xsl:otherwise> </xsl:choose> </td> <td> <xsl:if test="@description"> <xsl:value-of select="@description"/> </xsl:if> <xsl:if test="not(@description)"> <xsl:text>-</xsl:text> </xsl:if> </td> </tr> </xsl:template> <!-- ************************************************************** ** "parseDepends" template ** Tokenizes and prints a comma separated list of dependencies. ** The first token is printed, and the remaining tokens are ** recursively passed to this template. ************************************************************ --> <xsl:template name="parseDepends"> <!-- this parameter contains the list of dependencies --> <xsl:param name="depends"/> <!-- grab everything before the first comma, or the entire string if there are no commas --> <xsl:variable name="firstToken"> <xsl:choose> <xsl:when test="contains($depends, ',')">
Slide 88: <xsl:value-of select="normalize-space(substring-before($depends, ','))"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="normalize-space($depends)"/> </xsl:otherwise> </xsl:choose> </xsl:variable> <xsl:variable name="remainingTokens" select="normalize-space(substring-after($depends, ','))"/> <!-- output the first dependency --> <xsl:value-of select="$firstToken"/> <!-- recursively invoke this template with the remainder of the comma separated list --> <xsl:if test="$remainingTokens"> <xsl:text>, </xsl:text> <xsl:call-template name="parseDepends"> <xsl:with-param name="depends" select="$remainingTokens"/> </xsl:call-template> </xsl:if> </xsl:template> <!-- ************************************************************** ** This template will begin a recursive process that forms a ** dependency graph of all targets. ************************************************************ --> <xsl:template match="target" mode="tree"> <xsl:param name="indentLevel" select="'0'"/> <xsl:variable name="curName" select="@name"/> <div style="text-indent: {$indentLevel}em;"> <xsl:value-of select="$curName"/> <!-- if the 'depends' attribute is present, show the list of dependencies --> <xsl:if test="@depends"> <xsl:text> (depends on </xsl:text> <xsl:call-template name="parseDepends"> <xsl:with-param name="depends" select="@depends"/> </xsl:call-template> <xsl:text>)</xsl:text> </xsl:if> </div> <!-- set up the indentation --> <xsl:variable name="nextLevel" select="$indentLevel+1"/> <!-- search all other <target> elements that have "depends" attributes --> <xsl:for-each select="../target[@depends]"> <!-- Take the comma-separated list of dependencies and "clean it up". See the comments for the "fixDependency"
Slide 89: template --> <xsl:variable name="correctedDependency"> <xsl:call-template name="fixDependency"> <xsl:with-param name="depends" select="@depends"/> </xsl:call-template> </xsl:variable> <!-- Now the dependency list is pipe (|) delimited, making it easier to reliably search for substrings. Recursively instantiate this template for all targets that depend on the current target --> <xsl:if test="contains($correctedDependency,concat('|',$curName,'|'))"> <xsl:apply-templates select="." mode="tree"> <xsl:with-param name="indentLevel" select="$nextLevel"/> </xsl:apply-templates> </xsl:if> </xsl:for-each> </xsl:template> <!-- ************************************************************** ** This template takes a comma-separated list of dependencies ** and converts all commas to pipe (|) characters. It also ** removes all spaces. For instance: ** ** Input: depends="a, b,c " ** Output: |a|b|c| ** ** The resulting text is much easier to parse with XSLT. ************************************************************ --> <xsl:template name="fixDependency"> <xsl:param name="depends"/> <!-- grab everything before the first comma, or the entire string if there are no commas --> <xsl:variable name="firstToken"> <xsl:choose> <xsl:when test="contains($depends, ',')"> <xsl:value-of select="normalize-space(substring-before($depends, ','))"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="normalize-space($depends)"/> </xsl:otherwise> </xsl:choose> </xsl:variable> <!-- define a variable that contains everything after the first comma --> <xsl:variable name="remainingTokens" select="normalize-space(substring-after($depends, ','))"/> <xsl:text>|</xsl:text> <xsl:value-of select="$firstToken"/> <xsl:choose> <xsl:when test="$remainingTokens">
Slide 90: <xsl:call-template name="fixDependency"> <xsl:with-param name="depends" select="$remainingTokens"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:text>|</xsl:text> </xsl:otherwise> </xsl:choose> </xsl:template> </xsl:stylesheet> 3.6.3.1 Specifying XHTML output One of the first things this stylesheet does is set the output method to "xml" because the resulting page will be XHTML instead of HTML. The doctype-public and doctype-system are required for valid XHTML and indicate the strict DTD in this case: <xsl:output method="xml" doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN" doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" indent="yes" encoding="UTF-8"/> The remaining XHTML requirement is to declare the namespace of the <html> element: <xsl:template match="/"> <html xmlns="http://www.w3.org/1999/xhtml"> ... </html> </xsl:template> Because of these XSLT elements, the result tree will contain the following XHTML: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> ... </html> 3.6.3.2 Creating the dependency graph The most interesting and difficult aspect of this stylesheet is its ability to display the complete dependency graph for all Ant build targets. The first step is to locate all of the targets that do not have any dependencies. As shown in Example 3-13, these targets are named clean, prepare, and targets for the Tomcat build file. They are selected by looking for <target> elements that do not have an attribute named depends: <!-- show all target dependencies as a tree --> <h3>Target Dependency Tree</h3> <xsl:apply-templates select="target[not(@depends)]" mode="tree"> <xsl:sort select="@name"/> </xsl:apply-templates> The [not(@depends)] predicate will refine the list of <target> elements to include only those that do not have an attribute named depends. 
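The same predicate can be tried outside a stylesheet. The following sketch uses the javax.xml.xpath package that ships with current JDKs (it is not something the stylesheet itself relies on) to evaluate target[not(@depends)] against a build file held in a string. The class name RootTargetsSketch and the sample build file in the test are made up for illustration, and the sort stands in for the <xsl:sort select="@name"/> instruction.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RootTargetsSketch {

    // Returns the names of all <target> elements that have no
    // depends attribute, sorted by name.
    static List<String> rootTargetNames(String buildXml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/project/target[not(@depends)]",
                    new InputSource(new StringReader(buildXml)),
                    XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            Collections.sort(names); // plays the role of xsl:sort
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }
}
```

Running it against a real build file is a quick way to check which targets the tree will start from before debugging the stylesheet itself.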
The <xsl:apply-templates> will instantiate the following template without any parameters: <xsl:template match="target" mode="tree"> <xsl:param name="indentLevel" select="'0'"/> <xsl:variable name="curName" select="@name"/>
Slide 91: If you refer to Example 3-14, you will see that this is the second-to-last template in the stylesheet. Since it is broken up into many pieces here, you may find it easier to refer to the original code as this description progresses. Since the indentLevel parameter is not specified, it defaults to '0', which makes sense for the top-level targets. As this template is instantiated recursively, the level of indentation increases. The curName variable is local to this template and contains the current Ant target name. Lines of text are indented using a style attribute: <div style="text-indent: {$indentLevel}em;"> CSS is used to indent everything contained within the <div> tag by the specified number of ems. (An em is approximately equal to the width of a lowercase letter "m" in the current font.) The value of the current target name is then printed using the appropriate indentation: <xsl:value-of select="$curName"/> If the current <target> element in the Ant build file has a depends attribute, its dependencies are printed next to the target name as part of the report. The parseDepends template handles this task. This template, also part of Example 3-14, is instantiated using <xsl:call-template>, as shown here: <xsl:if test="@depends"> <xsl:text> (depends on </xsl:text> <xsl:call-template name="parseDepends"> <xsl:with-param name="depends" select="@depends"/> </xsl:call-template> <xsl:text>)</xsl:text> </xsl:if> To continue with the dependency graph, the target template must instantiate itself recursively. Before doing this, the indentation must be increased. Since XSLT does not allow variables to be modified, a new variable is created: <xsl:variable name="nextLevel" select="$indentLevel+1"/> When the template is recursively instantiated, nextLevel will be passed as the value for the indentLevel parameter: <xsl:apply-templates select="." 
mode="tree"> <xsl:with-param name="indentLevel" select="$nextLevel"/> </xsl:apply-templates> The remainder of the template is not duplicated here, but is emphasized in Example 3-14. The basic algorithm is as follows:
• Use <xsl:for-each> to select all targets that have dependencies.
• Instantiate the "fixDependency" template to replace commas with | characters.
• Recursively instantiate the "target" template for all targets that depend on the current target.
3.6.3.3 Cleaning up dependency lists The final template in the Antdoc stylesheet is responsible for tokenizing a comma-separated list of dependencies, inserting pipe (|) characters between each dependency: <xsl:template name="fixDependency"> <xsl:param name="depends"/> The depends parameter may contain text such as "a, b, c". The template tokenizes this text, producing the following output:
Slide 92: |a|b|c| Since XSLT does not have an equivalent to Java's StringTokenizer class, recursion is required once again. The technique is to process the text before the first comma then recursively process everything after the comma. The following code assigns everything before the first comma to the firstToken variable: <xsl:variable name="firstToken"> <xsl:choose> <xsl:when test="contains($depends, ',')"> <xsl:value-of select="normalize-space(substring-before($depends, ','))"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="normalize-space($depends)"/> </xsl:otherwise> </xsl:choose> </xsl:variable> If the depends parameter contains a comma, the substring-before( ) function locates the text before the comma, and normalize-space( ) trims whitespace. If no commas are found, there must be only one dependency. Next, any text after the first comma is assigned to the remainingTokens variable. If there are no commas, the remainingTokens variable will contain an empty string: <xsl:variable name="remainingTokens" select="normalize-space(substring-after($depends, ','))"/> The template then outputs a pipe character followed by the value of the first token: <xsl:text>|</xsl:text> <xsl:value-of select="$firstToken"/> Next, if the remainingTokens variable is nonempty, the fixDependency template is instantiated recursively. Otherwise, another pipe character is output at the end: <xsl:choose> <xsl:when test="$remainingTokens"> <xsl:call-template name="fixDependency"> <xsl:with-param name="depends" select="$remainingTokens"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:text>|</xsl:text> </xsl:otherwise> </xsl:choose> Ideally, these descriptions will help clarify some of the more complex aspects of this stylesheet. The only way to really learn how this all works is to experiment, changing parts of the XSLT stylesheet and then viewing the results in a web browser. You should also make use of a command-line XSLT processor and view the results in a text editor. 
This is important because browsers may skip over tags they do not understand, so you might not see mistakes until you view the source. Chapter 4. Java-Based Web Technologies In a perfect world, a single web development technology would be inexpensive, easy to maintain, offer rapid response time, and be highly scalable. It would also be portable to any operating system or hardware platform and would adapt well to future requirement changes. It would
Slide 93: support access from wireless devices, standalone client applications, and web browsers, all with minimal changes to code. No perfect solution exists, nor is one likely to exist anytime soon. If it did, many of us would be out of work. A big part of software engineering is recognizing that tradeoffs are inevitable and knowing when to sacrifice one set of goals in order to deliver the maximum value to your customer or business. For example, far too many programmers focus on raw performance metrics without any consideration for ease of development or maintainability by nonexperts. These decisions are hard and are often subjective, based on individual experience and preferences. The goal of this chapter is to look at the highlights of several popular technologies for web application development using Java and see how each measures up to an XSLT-based approach. The focus is on architecture, which implies a high-level viewpoint without emphasis on specific implementation details. Although XSLT offers a good balance between performance, maintainability, and flexibility, it is not the right solution for all applications. It is hoped that the comparisons made here will help you decide if XSLT is the right choice for your web applications. 4.1 Traditional Approaches Before delving into more sophisticated options, let's step back and look at a few basic approaches to web development using Java. For small web applications or moderately dynamic web sites, these approaches may be sufficient. As you might suspect, however, none of these approaches hold up as well as XML and XSLT when your sites get more complex. 4.1.1 CGI Common Gateway Interface (CGI) is a protocol for interfacing external applications, which can be written in just about any language, with web servers. The most common language choices for CGI are C and Perl. This interface is accomplished in a number of ways, depending on the type of request. 
For example, parameters associated with an HTTP GET request are passed to the CGI script via the QUERY_STRING environment variable. HTTP POST data, on the other hand, is piped to the standard input stream of the CGI script. CGI always sends results back to the web server via its standard output. Ordinary CGI programs are invoked from the web server as external programs, which is the most notable difference when compared with servlets. With each request from the browser, the web server spawns a new process to run the CGI program. Aside from the obvious performance penalty, this also makes it difficult to maintain state information between requests. A web-based shopping cart is a perfect example of state information that must be preserved between requests. Figure 4-1 illustrates the CGI process. Figure 4-1. CGI process
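To make the data flow concrete, here is a minimal sketch of a CGI-style program that reads GET parameters from the QUERY_STRING environment variable and writes its response to standard output. Java is used only to stay consistent with the rest of the examples; the class name CgiSketch and the parseQuery helper are hypothetical, and the parsing is deliberately simplified (for example, a repeated parameter keeps only its last value).

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class CgiSketch {

    // Parses text such as "a=1&b=two+words" into name/value pairs.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String rawName = eq < 0 ? pair : pair.substring(0, eq);
            String rawValue = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(rawName, StandardCharsets.UTF_8),
                       URLDecoder.decode(rawValue, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        // The web server sets QUERY_STRING before launching the program.
        Map<String, String> params = parseQuery(System.getenv("QUERY_STRING"));
        // The response goes to standard output: headers, a blank line, the body.
        System.out.print("Content-Type: text/plain\r\n\r\n");
        params.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Because the program starts from scratch on every request, nothing parsed here survives to the next one, which is the state problem described above.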
Slide 94: FastCGI is an alternative to CGI with two notable differences. First, FastCGI processes do not exit with each request/response cycle. Second, the environment variable and pipe I/O mechanism of CGI has been eschewed in favor of TCP connections, allowing FastCGI programs to be distributed to different servers. The net result is that FastCGI eliminates the most vexing problems of CGI while making it easy to salvage existing CGI programs. Although technically possible, using Java for CGI programming is not generally a good idea. In fact, it is an awful idea! The Java Virtual Machine (JVM) would have to be launched with each and every request, which would be painfully slow. Any Java programmer knows that application startup time has never been one of the strengths of Java. Servlets had to address this issue first. What was needed was a new approach in which the JVM was loaded a single time and left running even when no requests came in. The term servlet engine referred to the JVM that hosted the servlets, often serving a dual role as an HTTP web server. 4.1.2 Servlets as CGI Replacements Sun's Java servlet API was originally released way back in 1997 when Java was mostly a client-side development language. Servlets were originally marketed and used as replacements for CGI programs. Developers were quick to adopt servlets because of their advantages over CGI. Since the servlet engine can run for as long as the web server runs, servlets can be loaded into memory once and kept around for subsequent requests. This is easy to accomplish in Java because servlets are really nothing more than Java classes. The JVM simply loads the servlet objects into memory, hanging on to the references for as long as the web application runs. The persistent nature of servlets results in two additional benefits, both of which push servlets well beyond the capabilities of basic CGI. First, state information can be preserved in memory for long periods of time. 
Even though the browser loses its connection to the web server after each request/response cycle, servlets can store objects in memory until the browser reconnects for the next page. Second, since Java has built-in threading capability, it is possible for numerous clients to share the same servlet instance. Creating additional threads is far more efficient than spawning additional external processes, making servlets very good performers. Early versions of the Java servlet API did not specify the mechanism for deployment (i.e., installation) onto servers. Although the servlet API was consistent, deployment onto different servlet engines was completely vendor specific. With Version 2.2 of the servlet API, however, proprietary servlet engines were dropped in favor of a generic servlet container specification. The idea of a container is to formalize the relationship between a servlet and the environment in which it resides. This made it possible to deploy the same servlet on any vendor's container without any changes. Along with the servlet container came the concept of a web application. A web application consists of a collection of servlets, static web pages, images, or any other resources that may be needed. The standard unit of deployment for web applications is the Web Application Archive (WAR) file, which is actually just a Java Archive (JAR) file that uses a standard directory structure and has a .war file extension. In fact, you use the jar command to create WAR files. Along with the WAR file comes a deployment descriptor, which is an XML configuration file that specifies all configuration aspects of a web application. The important details of WAR files and deployment descriptors will be outlined in Chapter 6.
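Since a WAR file is an ordinary JAR archive with a conventional layout, the JDK's own java.util.jar classes can write and read one. The sketch below builds a tiny archive in memory and lists its entries. The class WarLayoutSketch, the file contents, and the placeholder entry under WEB-INF/classes are invented for illustration; only the WEB-INF/web.xml location of the deployment descriptor and the WEB-INF/classes directory come from the servlet specification.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

final class WarLayoutSketch {

    // Builds a minimal WAR-shaped archive in memory.
    static byte[] buildWar() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream war = new JarOutputStream(bytes)) {
            addEntry(war, "index.html", "<html><body>Hello</body></html>");
            // placeholder deployment descriptor, not a complete one
            addEntry(war, "WEB-INF/web.xml", "<web-app/>");
            addEntry(war, "WEB-INF/classes/placeholder.txt",
                     "compiled servlet classes would go here");
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    private static void addEntry(JarOutputStream war, String name, String content)
            throws IOException {
        war.putNextEntry(new JarEntry(name));
        war.write(content.getBytes(StandardCharsets.UTF_8));
        war.closeEntry();
    }

    // Reads the archive back with the plain JAR API and lists its entries.
    static List<String> listEntries(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(archive))) {
            JarEntry entry;
            while ((entry = in.getNextJarEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }
}
```

In practice the jar command does the same job from the command line; the point here is only that nothing WAR-specific is needed to create or inspect the file.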
Slide 95: Servlets are simple to implement, portable, can be deployed to any servlet container in a consistent way, and offer high performance. Because of these advantages, servlets are the underlying technology for every other approach discussed in this chapter. When used in isolation, however, servlets do have limitations. These limitations manifest themselves as web applications grow increasingly complex and web pages become more sophisticated. The screen shot shown in Figure 4-2 shows a simple web page that lists television shows for the current day. In this first implementation, a servlet is used. It will be followed with a JavaServer Pages (JSP) implementation presented later in this chapter. Figure 4-2. ScheduleServlet output The Schedule Java class has a method called getTodaysShows( ) that returns an array of Show objects. The array is already sorted, which reduces the amount of work that the servlet has to do to generate this page. The Schedule and Show classes are used for all of the remaining examples in this chapter. Ideally, this will help demonstrate that no matter which approach you take, keeping business logic and database access code out of the servlet makes it easier to move to new technologies without rewriting all of your code. The code for ScheduleServlet.java is shown in Example 4-1. This is typical of a first-generation servlet, generating its output using a series of println( ) statements. Example 4-1. ScheduleServlet.java package chap4; import java.io.*; import java.text.SimpleDateFormat; import javax.servlet.*; import javax.servlet.http.*; public class ScheduleServlet extends HttpServlet {
Slide 96: public void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException, ServletException { SimpleDateFormat dateFmt = new SimpleDateFormat("hh:mm a"); Show[] shows = Schedule.getInstance().getTodaysShows( ); response.setContentType("text/html"); PrintWriter pw = response.getWriter( ); pw.println("<html><head><title>Today's Shows</title></head><body>"); pw.println("<h1>Today's Shows</h1>"); pw.println("<table border=\"1\" cellpadding=\"3\""); pw.println(" cellspacing=\"0\">"); pw.println("<tr><th>Channel</th><th>From</th>"); pw.println("<th>To</th><th>Title</th></tr>"); for (int i=0; i<shows.length; i++) { pw.println("<tr>"); pw.print("<td>"); pw.print(shows[i].getChannel( )); pw.println("</td>"); pw.print("<td>"); pw.print(dateFmt.format(shows[i].getStartTime( ))); pw.println("</td>"); pw.print("<td>"); pw.print(dateFmt.format(shows[i].getEndTime( ))); pw.println("</td>"); pw.print("<td>"); pw.print(shows[i].getTitle( )); pw.println("</td>"); pw.println("</tr>"); } pw.println("</table>"); pw.println("</body>"); pw.println("</html>"); } } If you are interested in the details of servlet coding, be sure to read Chapter 6. For now, focus on how the HTML is generated. All of those println( ) statements look innocuous enough in this short example, but a "real" web page will have thousands of println( ) statements, resulting in code that is quite difficult to maintain over the years. Generally, you will want to factor that code out into a series of methods or objects that generate fragments of the HTML. However, this approach is still tedious and error prone. The main problems are development scalability and future maintainability. The code becomes increasingly difficult to write as your pages get more complex, and it becomes very difficult to make changes to the HTML when new requirements arrive. Web content authors and graphic designers are all but locked out of the process since it takes a programmer to create and modify the code. 
Each minor change requires your programming staff to recompile, test, and deploy to the servlet container. Beyond the tedious nature of HTML generation, first-generation servlets tend to do too much. It is not clear where error handling, form processing, business logic, and HTML generation are
Slide 97: supposed to reside. Although we are able to leverage two helper classes to generate the list of shows, a more rigorous approach will be required for complex web applications. All of the remaining technologies presented in this chapter are designed to address one or more of these issues, which become increasingly important as web applications get more sophisticated. 4.1.3 JSP You have no doubt heard about JSP. This is a hot area in web development right now with some pretty hefty claims about productivity improvements. The argument is simple: instead of embedding HTML code into Java servlets, which requires a Java programmer, why not start out with static HTML? Then add special tags to this HTML that are dynamically expanded by the JSP engine, thus producing a dynamic web page. Example 4-2 contains a very simple example of JSP that produces exactly the same output as ScheduleServlet. Example 4-2. schedule.jsp <%@ page import="chap4.*,java.text.*" %> <%! SimpleDateFormat dateFmt = new SimpleDateFormat("hh:mm a"); %> <html> <head> <title>Today's Shows</title> </head> <body> <h1>Today's Shows</h1> <% Show[] shows = Schedule.getInstance().getTodaysShows( ); %> <table border="1" cellpadding="3" cellspacing="0"> <tr><th>Channel</th><th>From</th><th>To</th><th>Title</th></tr> <% for (int i=0; i<shows.length; i++) { %> <tr> <td><%= shows[i].getChannel( ) %></td> <td><%= dateFmt.format(shows[i].getStartTime( )) %> </td> <td><%= dateFmt.format(shows[i].getEndTime( )) %></td> <td><%= shows[i].getTitle( ) %></td> </tr> <% } %> </table> </body> </html> As schedule.jsp shows, most of the JSP is static HTML with dynamic content sprinkled in here and there using special JSP tags. When a client first requests a JSP, the entire page is translated into source code for a servlet. This generated servlet code is then compiled and loaded into memory for use by subsequent requests. 
During the translation process, JSP tags are replaced with dynamic content, so the end user only sees the HTML output as if the entire page was static. Runtime performance of JSP is comparable to hand-coded servlets because the static content in the JSP is generally replaced with a series of println( ) statements in the generated servlet code. The only major performance hit occurs for the first person to visit the JSP, because it will have to be translated and compiled. Most JSP containers provide options to precompile the JSP, so even this hit can be avoided. Debugging in JSP can be somewhat challenging. Since JSP pages are machine translated into Java classes, method signatures and class names are not always intuitive. When a programming error occurs, you are often faced with ugly stack traces that show up directly in the browser. You do have the option of specifying an error page to be displayed whenever an unexpected condition occurs. This gives the end user a more friendly error message, but does little to help you diagnose the problem.
Slide 98: Here is a portion of what Apache's Tomcat shows in the web browser when the closing curly brace (}) is accidentally omitted from the loop shown in the JSP example: A Servlet Exception Has Occurred org.apache.jasper.JasperException: Unable to compile class for JSP..\work\localhost\chap4\_0002fschedule_0002ejspschedule_jsp_2.java:104: 'catch' without 'try'. } catch (Throwable t) { ^ ..\work\localhost\chap4\_0002fschedule_0002ejspschedule_jsp_2.java:112: 'try' without 'catch' or 'finally'. } ^ ..\work\localhost\chap4\_0002fschedule_0002ejspschedule_jsp_2.java:112: '}' expected. } ^ 3 errors at org.apache.jasper.compiler.Compiler.compile(Compiler.java:294) at org.apache.jasper.servlet.JspServlet.doLoadJSP(JspServlet.java:478) ...remainder of stack trace omitted The remainder of the stack trace is not very helpful because it simply lists methods that are internal to Tomcat. _0002fschedule_0002ejspschedule_jsp_2 is the name of the Java servlet class that was generated. The line numbers refer to positions in this generated code, rather than in the JSP itself. Embedding HTML directly into servlets is not appealing because it requires a programmer to maintain. With JSP, you often embed Java code into HTML. Although the embedding is reversed, you still have not cleanly separated HTML generation and programming logic. Think about the problems you encounter when the validation logic in a JSP goes beyond a simple one-page example. Do you really want hundreds of lines of Java code sprinkled throughout your HTML, surrounded by those pretty <% %> tags? Unfortunately, far too many JSP pages have a substantial amount of Java code embedded directly in the HTML. The first few iterations of JSP did not offer bulletproof approaches for separating Java code from the HTML. Although JavaBeans tags were offered in an attempt to remove some Java code, the level of sophistication was quite limited.
These tags allow JSPs to interact with helper classes written according to Sun's JavaBeans API (http://java.sun.com/products/javabeans). Recent trends in the JSP specification have made substantial improvements. The big push right now is for custom tags,[1] which finally allow you to remove the Java code from your pages. A web page with custom tags may look like Example 4-3. [1] Technically, programmers create custom actions, which are invoked using custom JSP tags. Example 4-3. JSP with custom tags <%@ taglib uri="/my_taglib" prefix="abc" %> <html> <head> <title>JSP Tag Library Demonstration</title> </head> <body> <abc:standardHeader/> <abc:companyLogo/>
Slide 99: <h1>Recent Announcements</h1> <abc:announcements filter="recent"/> <h1>Job Openings</h1> <abc:jobOpenings department="hr"/> <abc:standardFooter/> </body> </html> As you can see, custom tags look like normal XML tags with a namespace prefix. Namespace prefixes are used to give XML tags unique names. Because you select the prefix for each tag library, you can use libraries from many different vendors without fear of naming conflicts. These tags are mapped to Java classes called tag handlers that are responsible for the actual work. In fact, the JSP specification does not limit the underlying implementation to Java, so other languages can be used if the JSP container supports it. Using the custom tag approach, programmers in your company can produce a set of approved tags for creating corporate logos, search boxes, navigation bars, and page footers. Nonprogrammers can focus on HTML layout, oblivious to the underlying tag handler code. The main drawback to this approach is the current lack of standard tags. Although several open source projects are underway to develop custom tag libraries, it is unlikely that you will be able to find an existing custom tag for every requirement. One persistent problem with a pure JSP approach is that of complex validation. Although JSP with custom tags can be an ideal approach for displaying pages, the approach falls apart when a JSP is used to validate the input from a complex HTML form. In this situation, it is almost inevitable that Java code -- perhaps a lot of it -- will creep into the page. This is where a hybrid approach (JSP and servlets), which will be covered in the next section, is desirable. Compared with an XML/XSLT approach, JSP requires a lot more effort to cleanly separate presentation from the underlying data and programming logic. For web sites that are mostly static, JSP can be easy for nonprogrammers to create, since they work directly in HTML.
When dynamic content becomes more prevalent, your options are to embed lots of Java code into the JSP, create custom tags, or perhaps write Java beans that output fragments of HTML. Embedding code into the JSP is not desirable because of the ugly syntax and maintenance difficulties. The other approaches do hide code from the JSP author, but some part of your web application (to be consistent) is still cranking out HTML from Java code, either in custom tags or JavaBeans components. This still raises serious questions about the ability to make quick changes to your HTML without recompiling and deploying your Java code. Another weakness of JSPs in comparison with XML and XSLT becomes obvious when you try to test your web application. With JSP, it is virtually impossible to test your code outside the bounds of a web browser and servlet container. In order to write a simple automated unit test against a JSP, you have to start a web server and invoke your JSPs via HTTP requests. With XML and XSLT, on the other hand, you can programmatically generate the XML data without a web browser or server. This XML can then be validated against a DTD or schema. You can also test the XSLT stylesheets using command-line tools without deploying to a servlet container or starting a web server. The result of the transformation can even be validated again with a DTD if you use XHTML instead of HTML. 4.1.4 Template Engines Before moving on, let's discuss template engines. A quick search on the Internet reveals that template engines are abundant, each claiming to be better than JSP for various reasons. For the most part, template engines have a lot in common with JSP, particularly if you restrict yourself to custom tags. There are some differences, however: • Template engines typically forbid you from embedding Java code into pages. Although JSP allows Java code along with HTML, it is not considered good form.
Slide 100: • Most template engines are not compiled, so they do not have the same problems that JSP has with error messages. They also start up faster on the first invocation, which can make development easier. The effect on end users is minimal. From a deployment perspective, you do not need a Java compiler on the web server as you do with JSP.
• Template engines come with an existing library of tags or simple scripting languages. JSP does not provide any standard tags, although numerous libraries are available from other vendors and open source projects. The JSP API is open, so you can create your own custom tags with a fair amount of effort. Template engines have their own unique mechanisms for integrating with underlying Java code.
• JSP has the backing of Sun and is pretty much available out of the box on any servlet container. The main benefit of a "standard" is the wide availability of documentation, knowledgeable people, and examples. There are many implementations of JSP to choose from.
4.1.5 The Hybrid Approach Since JSP now has custom tags, you can remove (hide, actually) all of the Java code when "rendering," or generating a page to send to the browser. When a complex HTML form is posted to the JSP, however, you still have problems. You must verify that all fields are present, verify that the data is within bounds, and clean up the data by checking for null values and trimming all strings. Validation is not particularly difficult, but it can be tedious and requires a lot of custom code. You do not want to embed that code directly into a JSP because of the debugging and maintenance issues. The solution is a hybrid approach, in which a servlet works in conjunction with a JSP. The servlet API has a nice class called RequestDispatcher that allows server-side forwarding and including. This is the normal mechanism for interaction between the servlet and JSP. Figure 4-3 illustrates this design at a high level. Figure 4-3.
Hybrid JSP/servlet approach This approach combines the best features of servlets with the best features of JSPs. The arrows indicate the flow of control whenever the browser issues a request. The job of the servlet is to intercept the request, validate that the form data is correct, and delegate control to an appropriate JSP. Delegation occurs via javax.servlet.RequestDispatcher, which is a standard part of the servlet API. The JSP simply renders the page, ideally using custom tags and no Java code mixed with the HTML. The main issue with this approach becomes evident when your web site begins to grow beyond a few pages. You must make a decision between one large servlet that intercepts all requests, a
Slide 101: separate servlet per page, or helper classes responsible for processing individual pages. This is not a difficult technological challenge, but rather a problem of organization and consistency. This is where web frameworks can lend a helping hand. 4.2 The Universal Design Despite the proliferation of APIs, frameworks, and template engines, most web application approaches seem to be consolidating around the idea of model-view-controller (MVC). Clean separation between data, presentation, and programming logic is a key goal of this design. Most web frameworks implement this pattern, and the hybrid approach of JSP and servlets follows it. XSLT implementations also use this pattern, which leads to the conclusion that model-view-controller is truly a universal approach to development on the web tier. 4.2.1 Web Frameworks A framework is a value-added class library that makes it easier to develop certain types of applications. For example, an imaging framework may contain APIs for reading, writing, and displaying several image formats. This makes it much easier to build applications because someone else already figured out how to structure your application. Servlet frameworks are no different. Now that servlets, JSP, and hybrid approaches have been available for a few years, common architectural patterns are emerging as "best practices." These include separation of Java code and HTML generation, using servlets in conjunction with JSP, and other variations. Once basic patterns and themes are understood, it becomes desirable to write common frameworks that automate the mundane tasks of building web applications. The most important tradeoff you make when selecting a framework is vendor lock-in versus open standards. At this time, there are no open standards for frameworks. Although there are numerous open source frameworks, none is backed by a standards organization or even Sun's Java Community Process.
The low-level servlet and JSP APIs are very well defined and widely implemented Java standard extensions. But a framework can offer much more sophisticated features such as enhanced error checking, database connection pooling, custom tag libraries, and other value-added features. As you add more framework-specific features, however, your flexibility to choose another framework or vendor quickly diminishes. One typical framework is Turbine, which is one of many different frameworks supported by Apache. Turbine is a large framework with many value-added features including:
• Database connection pooling, integration with object-to-relational mapping tools, and relational database abstractions
• Integration with numerous template engines
• Role-based security and access control lists
• Web browser detection
• Integration with JavaMail
This is only a short list of Turbine's features. At its core, however, the compelling reason to use a framework like Turbine is the underlying object model. The fundamental approach of Turbine is to cleanly separate validation logic, the servlet itself, and page rendering into distinctly different modules. In fact, Turbine uses a single servlet, so your validation and rendering logic have to go elsewhere. The approach is to define helper classes called actions, which are responsible for validation of incoming requests. Once an action has validated the inbound request, other classes such as Layout, Page, and Navigation are responsible for rendering a view back to the browser.
Slide 102: When compared to a pure XML/XSLT approach, frameworks have the advantage of value-added features. If you remove all of the non-web features, such as database connection pooling and object-to-relational mapping tools, you will see that the underlying model-view-controller architecture is very easy to implement. You should be wary of any framework that provides too much non-web-related functionality because many of these features should be placed on the application server instead of the web server anyway. The remainder of this chapter is devoted to showing you how to structure a complex web application without committing yourself to a specific framework. 4.2.2 Model-View-Controller Cleanly separating data and presentation logic is important. What exactly are the benefits? First and foremost, when data is completely isolated from the user interface, changes can be made to the visual appearance of an application without affecting the underlying data. This is particularly important in web applications that have to support multiple incompatible browsers or even WML, XHTML Basic, or HTML. It is much harder to adapt to new user interface requirements when data and presentation are mixed. Programming logic should also be separated from data and presentation logic. To a certain extent, programming logic must depend in part on both data and presentation. But you can generally isolate business logic, which depends on the data, and presentation logic, which depends on the user interface. Figure 4-4 illustrates these dependencies. Figure 4-4. Dependencies The arrows indicate dependencies. For example, if your underlying data changes, then the business logic will probably have to change. However, that does not always flow up and break your presentation logic. In general, if changes are sweeping, it is hard to avoid affecting upper layers, but minor changes can almost always be encapsulated. 
If the implementation of your business logic changes, however, there is no reason to change the underlying data. Likewise, you should be able to make changes to the presentation logic without breaking the business logic. Later in this chapter, we will see how Java, XML, and XSLT can be utilized to satisfy these dependencies. The dominant pattern in scalable web sites is model-view-controller. The MVC pattern originated with Smalltalk-80 as a way to develop graphical user interfaces in an object-oriented way. The basics are simple. GUI components represent the view and are responsible for displaying visual information to the user. The model contains application data. The controller is responsible for coordinating between the model and the view. It intercepts events from the view components, queries the model for its current state, makes modifications to the model, and notifies the view of changes to the model. Figure 4-5 illustrates the interaction between these three components.
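The three roles just described can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the Customer, CustomerView, and CustomerController classes are hypothetical names, and a plain string stands in for a real GUI component.

```java
import java.util.ArrayList;
import java.util.List;

/** View role: anything that wants to be told when a model changes. */
interface View {
    void modelChanged(Model source);
}

/** Model role: holds state and notifies registered views in a generic way. */
abstract class Model {
    private final List<View> views = new ArrayList<>();

    void addView(View view) {
        views.add(view);
    }

    /** Subclasses call this after their state has changed. */
    protected void fireChanged() {
        for (View view : views) {
            view.modelChanged(this);
        }
    }
}

/** Hypothetical concrete model holding application data. */
final class Customer extends Model {
    private String name = "";

    String getName() { return name; }

    void setName(String name) {
        this.name = name;
        fireChanged();
    }
}

/** Hypothetical concrete view; it knows about Customer specifically. */
final class CustomerView implements View {
    private String rendered = "";

    @Override
    public void modelChanged(Model source) {
        rendered = "Customer: " + ((Customer) source).getName();
    }

    String getRendered() { return rendered; }
}

/** Controller role: turns user input into modifications of the model. */
final class CustomerController {
    private final Customer model;

    CustomerController(Customer model) { this.model = model; }

    void nameEntered(String input) {
        model.setName(input.trim());
    }
}

final class MvcSketch {
    public static void main(String[] args) {
        Customer model = new Customer();
        CustomerView view = new CustomerView();
        model.addView(view);
        new CustomerController(model).nameEntered("  Ada ");
        System.out.println(view.getRendered());
    }
}
```

The abstract Model only knows the View interface, so additional views can be attached without changing the model.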
Slide 103: Figure 4-5. Model-view-controller As shown, the Model, View, and Controller are either abstract classes or interfaces. The concrete classes are application-specific, and the open arrows indicate the direction of association between the various classes. For example, the abstract Model sends notifications only to the abstract View, but ConcreteView knows about its ConcreteModel. This makes sense when you consider how hard it would be to create a specific view, such as a customer editor panel, without knowledge of a specific data model like Customer. Since the Model only knows about View instances in an abstract way, however, it can send generic notifications when it changes, allowing new views to be attached later. It is important to remember that this is just a pattern; specific implementations may vary somewhat and use different class names. One variation is to eliminate the explicit references from ConcreteView to ConcreteModel and from Model to View. In this approach, the Controller would take a more prominent role. A common theme in Java is to remove the explicit controller using data models and view components that send notifications to event listeners. Although typically thought of in terms of GUI applications, the MVC architecture is not limited to this domain. For web applications, it is commonly used in:
• The hybrid servlet + JSP approach
• Most servlet frameworks
• The XSLT approach
In the hybrid approach, the servlet is the controller and the JSP is the view. It is assumed that the data will be retrieved from a database or Enterprise JavaBeans (EJB) components, which act as the model. A good framework may make the distinction between model, view, and controller more explicit. Instead of using the servlet as a controller, a common pattern is to use a single servlet that delegates work to helper classes that act as controllers.
Each of these classes is equivalent to ConcreteController in Figure 4-5 and has knowledge of specific web pages and data. Although originally intended for Smalltalk GUIs, MVC has always been one of the most frequently used patterns in all sorts of GUIs, from Motif to Java. On the web, MVC is also prevalent, although a few mechanics are slightly different. In a web environment, we are restricted to the HTTP protocol, which is stateless. With each click of a hyperlink, the browser must establish a new connection to the web server. Once the page has been delivered, the connection is broken. It is impossible for the server to initiate a conversation with the client, so the server merely waits until the next request arrives. Implementing MVC in this stateless architecture results in looser coupling between the controller and the view. In a GUI environment, the controller immediately notifies the view of any changes to the underlying model. In a web environment, the controller must maintain state information as it waits for the browser to make another request. As each browser request arrives, it is the controller's job to validate the request and forward commands on to the model. The controller then sends the results back to the view.
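A web-tier controller of this kind can be sketched without any servlet API at all. In the example below, a Map stands in for the HTTP request parameters, and the ChannelRequestController class, the "channel" parameter, and the view names are all hypothetical; a real controller would also forward commands to the model before choosing a view.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical controller; a Map stands in for HTTP request parameters. */
final class ChannelRequestController {

    private ChannelRequestController() {}

    /** Validates one request and returns the name of the view to render. */
    static String handle(Map<String, String> params) {
        String channel = params.get("channel");
        if (channel == null || channel.trim().isEmpty()) {
            return "error"; // required field is missing
        }
        try {
            int number = Integer.parseInt(channel.trim());
            return number > 0 ? "schedule" : "error"; // out-of-bounds check
        } catch (NumberFormatException e) {
            return "error"; // field is not a number
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("channel", " 3 ");
        System.out.println(handle(params));
    }
}
```

Because each call to handle receives everything it needs, nothing here depends on a live connection to the browser, which mirrors the stateless request cycle described above.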
Slide 104: This may all sound academic and vague at this point. The next few sections will present much more detailed diagrams that show exactly how MVC is implemented for an XSLT-driven web site. 4.2.3 XSLT Implementation All of the approaches presented up to this point are, of course, building up to the XSLT approach. In many respects, the XSLT approach is simultaneously the most powerful and the easiest to understand. For a single web page, the XSLT approach is probably harder than a servlet or JSP to configure. Configuration of the XML parser and XSLT processor can be quite difficult, mostly due to CLASSPATH issues.[2] But as the complexity of a web application increases, the benefits of using XSLT become obvious. Figuring out how to tackle these complex web applications is the real goal of this chapter. [2] This can be a frustrating experience when the servlet container comes with an older XML parser that uses DOM or SAX Version 1. Most XSLT processors require Version 2 parsers. The XSLT approach maps fairly directly to the MVC pattern. The XML represents the model, the servlet represents the controller, and the XSLT produces HTML, which represents the view. The XSLT stylesheets may contain a minimal amount of logic, potentially blurring the line between view and controller. Figure 4-6 represents a conceptual view of how the XSLT approach maps to MVC. Figure 4-6. XSLT conceptual model One weakness common to every approach other than XSLT is the HTML-centric viewpoint. In every example presented thus far, it was assumed that we generated HTML. What happens when the requirement to support cellular phones arises? It is very likely that these devices will not use HTML. Instead, they will use WML, XHTML Basic, or some other technology that has not been invented yet. For now, consider that you would have to write brand new servlets or JSPs to support these devices when using traditional approaches. 
Any programming logic embedded into JSP pages would be duplicated or would have to be factored out into common helper classes. In a pure servlet approach, the hardcoded HTML generation logic would have to be completely rewritten. XSLT offers an elegant solution -- simply create a second stylesheet. Instead of transforming XML into HTML, this new stylesheet transforms XML into WML. You can even support different web browsers with the XSLT approach. Again, just write different stylesheets for browser-specific functions. Since XSLT stylesheets can import and include functionality from other stylesheets, much of the code can be shared and reused across a project. Regardless of what your XSLT will produce, start by producing the XML. For the schedule web application, the XML is dynamic and must be programmatically generated. JDOM code is shown in Example 4-4, which produces the XML necessary to create the schedule web page. Example 4-4. ScheduleJDOM.java package chap4;
Slide 105: import java.text.SimpleDateFormat; import org.jdom.*; import org.jdom.output.*; /** * Produces a JDOM Document for a tv schedule. */ public class ScheduleJDOM { private SimpleDateFormat dateFmt = new SimpleDateFormat("hh:mm a"); /** * Simple main( ) method for printing the XML document to System.out, * useful for testing. */ public static void main(String[] args) throws Exception { Document doc = new ScheduleJDOM().getTodaysShows( ); new XMLOutputter(" ", true, "UTF-8").output(doc, System.out); } /** * @return a new JDOM Document for all TV shows scheduled for today. */ public Document getTodaysShows( ) { Schedule sched = Schedule.getInstance( ); Show[] shows = sched.getTodaysShows( ); Element rootElem = new Element("schedule"); for (int i=0; i<shows.length; i++) { rootElem.addContent(createShowElement(shows[i])); } return new Document(rootElem); } /** * A helper method to convert a Show object into a JDOM Element. */ public Element createShowElement(Show show) { Element e = new Element("show"); e.addContent(new Element("channel").setText( Integer.toString(show.getChannel( )))); e.addContent(new Element("from").setText( this.dateFmt.format(show.getStartTime( )))); e.addContent(new Element("to").setText( this.dateFmt.format(show.getEndTime( )))); e.addContent(new Element("title").setText(show.getTitle( ))); return e; } } You might be wondering why this JDOM code is that much better than the servlet code, which also used Java to programmatically produce output. The difference is fundamental and important. With this JDOM example, println( ) statements are not used. Instead, a data structure representing the television schedule is created. By virtue of the JDOM API, the data structure is
Slide 106: guaranteed to produce well-formed XML. We could very easily add a DTD, writing a unit test that validates the integrity of the generated data structure. In addition to ensuring the integrity of the data, the JDOM code will typically be much smaller than the servlet or JSP code. In this basic web page, the servlet and JSP were quite small because the HTML did not contain any significant formatting or layout. In a real-world web page, however, the servlet and JSP will continue to grow in complexity as the HTML layout gets more sophisticated, while the JDOM code remains exactly the same. Although the XSLT stylesheet will get larger as the HTML gets more complex, this is arguably less of a problem because the presentation logic is completely separated from the underlying XML data. Once fully tested, the XSLT can be deployed to the web server without recompiling the Java code or restarting the servlet. The XML data produced by JDOM is shown in Example 4-5. Example 4-5. XML for schedule web page <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="schedule.xslt"?> <schedule> <show> <channel>2</channel> <from>06:00 AM</from> <to>06:30 AM</to> <title>Baseball</title> </show> <show> <channel>3</channel> <from>06:00 AM</from> <to>08:00 AM</to> <title>Stand up Comedy</title> </show> ...remaining XML omitted </schedule> The stylesheet that produces the exact same output as the JSP and servlet is listed in Example 4-6. Example 4-6. schedule.xslt <?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="html"/> <!-- ========== Produce the HTML Document ========== --> <xsl:template match="/"> <html> <head><title>Today's Shows</title></head> <body> <h1>Today's Shows</h1> <table cellpadding="3" border="1" cellspacing="0"> <tr> <th>Channel</th> <th>From</th> <th>To</th> <th>Title</th> </tr>
Slide 107: <!-- ===== select the shows ===== --> <xsl:apply-templates select="schedule/show"/> </table> </body> </html> </xsl:template> <!-- ======== Display each show as a row in the table ======== --> <xsl:template match="show"> <tr> <td><xsl:value-of select="channel"/></td> <td><xsl:value-of select="from"/></td> <td><xsl:value-of select="to"/></td> <td><xsl:value-of select="title"/></td> </tr> </xsl:template> </xsl:stylesheet> The remaining piece of the puzzle is to write a servlet that combines all of these pieces and delivers the result of the XSLT transformation to the client (see Chapter 6). In a nutshell, the servlet acts as a controller between the various components, doing very little of the actual work. The client request is intercepted by the servlet, which tells ScheduleJDOM to produce the XML data. This XML is then fed into an XSLT processor such as Xalan, along with schedule.xslt. Finally, the output is sent to the browser as HTML, XHTML, WML, or some other format. Another interesting option made possible by this architecture is allowing the client to request raw XML without any kind of XSLT transformation. This allows your web site to support nonbrowser clients that wish to extract meaningful business data in a portable way. We examined the weaknesses of other approaches, so it is only fair to take a critical look at the XSLT approach. First, XSLT is a new language that developers or web content authors have to learn. Although the syntax is strange, it can be argued that XSLT is easier to learn than a sophisticated programming language like Java. There is resistance on this front, however, which is typical of a new technology that is unfamiliar. The second potential weakness of the XSLT approach is runtime performance. There is a performance penalty associated with XSLT transformation. Fortunately, there are numerous optimizations that can be applied.
The most common involves the caching of stylesheets so they do not have to be parsed with each request. This and other techniques for optimization will be covered in later chapters. Since XSLT stylesheets are actually XML documents, any available XML editor will work for XSLT. But eventually we should see more and more specialized XSLT editors that hide some of the implementation details for nonprogrammers. As with first-generation Java GUI builders, these early tools may not generate stylesheets as cleanly as a handcoded effort. 4.2.4 Development and Maintenance Benefits of XSLT As mentioned earlier, testing JSPs can be difficult. Since they can be executed only within a JSP container, automated unit tests must start a web server and invoke the JSP via HTTP requests in order to test their output. The XSLT-based web approach does not suffer from this problem.
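As a minimal sketch of both points, stylesheet caching and testing outside a container, the following code runs a transformation with the standard JAXP classes from a plain main( ) method. The CachedStylesheets class, its method names, and the shortened sample XML and stylesheet are assumptions made for illustration; the book's own optimization techniques appear in later chapters and may differ.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper that compiles each stylesheet once and reuses it. */
final class CachedStylesheets {

    /** Shortened sample data, loosely modeled on the schedule XML. */
    static final String SAMPLE_XML =
        "<schedule><show><channel>2</channel><title>Baseball</title></show></schedule>";

    /** Shortened sample stylesheet that renders each show as a table row. */
    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/'><table>"
      + "<xsl:apply-templates select='schedule/show'/></table></xsl:template>"
      + "<xsl:template match='show'><tr><td><xsl:value-of select='channel'/></td>"
      + "<td><xsl:value-of select='title'/></td></tr></xsl:template>"
      + "</xsl:stylesheet>";

    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> cache = new HashMap<>();

    /** Returns the compiled stylesheet for a name, compiling it on first use only. */
    synchronized Templates get(String name, String xsltSource) {
        Templates templates = cache.get(name);
        if (templates == null) {
            try {
                templates = factory.newTemplates(
                    new StreamSource(new StringReader(xsltSource)));
            } catch (TransformerException e) {
                throw new IllegalArgumentException("Cannot compile stylesheet " + name, e);
            }
            cache.put(name, templates);
        }
        return templates;
    }

    /** Applies the cached stylesheet to XML text and returns the output as a string. */
    String transform(String name, String xsltSource, String xml) {
        try {
            StringWriter out = new StringWriter();
            get(name, xsltSource).newTransformer().transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        CachedStylesheets stylesheets = new CachedStylesheets();
        // No web server or servlet container is involved in this call.
        System.out.println(stylesheets.transform("schedule", SAMPLE_XSLT, SAMPLE_XML));
    }
}
```

A Templates object is the compiled form of a stylesheet in JAXP, and each call to newTransformer( ) produces a fresh Transformer from it, so the stylesheet source is parsed only once per name.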
Slide 108: Referring back to Figure 4-6, you can see that the data model in an XSLT web application is represented as XML. This XML is generated independently of the servlet container, so a unit test can simply create the XML and validate it against a DTD or XML Schema. Tools such as XML Spy make it easy to create XSLT stylesheets and test them interactively against sample XML files long before they are ever deployed to a servlet container. XML Spy is available from http://www.xmlspy.com. If you are looking for alternatives, a directory of XML tools can be found at http://www.xmlsoftware.com. The XSLT processor is another piece of the puzzle that is not tied to the servlet in any way. Because the processor is an independent component, additional unit tests can perform transformations by applying the XSLT stylesheets to the XML data, again without any interference from a web server or servlet container. If your stylesheets produce XHTML instead of HTML, the output can be easily validated against one of the W3C DTDs for XHTML. JUnit, an open source unit-testing tool, can be used for all of these tests. It can be downloaded from http://www.junit.org. 4.3 XSLT and EJB Now that the options for web tier development have been examined, let's look at how the web tier interacts with other tiers in large enterprise class systems. A typical EJB architecture involves a thin browser client, a servlet-driven web tier, and EJB on an application server tier. Figure 4-7 expands upon the conceptual XSLT model presented earlier. Figure 4-7. XSLT and EJB architecture This diagram is much closer to the true physical model of a multitier web application that uses XSLT. The arrows indicate the overall flow of a single request, originating with the client. This client is typically a web browser, but it could be a cell phone or some other device. The client request goes to a single servlet and is handed off to something called RequestHandler. 
In the pattern outlined here, you create numerous subclasses of RequestHandler. Each subclass is responsible for validation and presentation logic for a small set of related functions. One manageable strategy is to design one subclass of RequestHandler for each web page in the application. Another approach is to create fine-grained request handlers that handle one specific task, which can be beneficial if the same piece of functionality is invoked from many different screens in your application. The request handler interacts with the application server via EJB components. The normal pattern is to execute commands on session beans, which in turn get their data from entity beans. The internal behavior of the EJB layer is irrelevant to the web tier, however. Once the EJB
Slide 109: method call is complete, one or more "data objects" are returned to the web tier. From this point, the data object must be converted to XML.

The conversion to XML can be handled in a few different ways. One common approach is to write methods in the data objects themselves that know how to generate a fragment of XML, or perhaps an entire document. Another approach is to write an XML adapter class for each data object. Instead of embedding the XML generation code into the data object, the adapter class generates the XML. This approach has the advantage of keeping the data objects lightweight and clean, but it does result in additional classes to write. In either approach, it is preferable to return XML as a DOM or JDOM tree, rather than raw XML text. If the XML is returned as raw text, it will have to be parsed right back into memory by the XSLT processor. Returning the XML as a data structure allows the tree to be passed directly to the XSLT processor without the additional parsing step.

Yet another approach is to return XML directly from the EJB components, thus eliminating the intermediate data objects. Chapter 9 will examine this in detail, primarily from a performance perspective. The main drawback to consider is that XML tends to be very verbose. Sending large text XML files from the application server to the web server may be less efficient than sending serialized Java objects. You could compress the data, but that would add processor overhead for compression and decompression.

Regardless of how the XML is generated, the final step shown in Figure 4-7 is to pass the XML and stylesheet to the XSLT processor for transformation. The result tree is sent directly to the client, thus fulfilling the request. If the client is a browser, the XSLT stylesheet will probably transform the XML into HTML or XHTML. For a nonbrowser client, however, it is conceivable that the XML data is delivered directly without any XSLT transformation.
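The adapter approach described above can be sketched with the standard JAXP classes. This is only an illustration under stated assumptions: the Customer data object, the CustomerXmlAdapter class, and the small stylesheet are all hypothetical and do not come from the book. The point is that the DOM tree is handed to the processor through a DOMSource, so no XML text is ever parsed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical adapter that converts a data object into a DOM tree. */
public class CustomerXmlAdapter {

    /** Hypothetical data object, standing in for whatever the EJB tier returns. */
    public static class Customer {
        final String name;
        final String city;

        public Customer(String name, String city) {
            this.name = name;
            this.city = city;
        }
    }

    /** Illustrative stylesheet; a real application would load its own. */
    public static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/customer'>"
        + "<xsl:value-of select='name'/> lives in <xsl:value-of select='city'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Builds the XML as a DOM tree instead of raw text. */
    public static Document toDocument(Customer c) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElementNS(null, "customer");
            doc.appendChild(root);
            appendTextElement(doc, root, "name", c.name);
            appendTextElement(doc, root, "city", c.city);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static void appendTextElement(Document doc, Element parent,
                                          String tag, String text) {
        Element child = doc.createElementNS(null, tag);
        child.appendChild(doc.createTextNode(text));
        parent.appendChild(child);
    }

    /** Passes the DOM tree straight to the XSLT processor; no parsing step. */
    public static String render(Document doc, String xslt) {
        try {
            Transformer trans = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            trans.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = toDocument(new Customer("Ada", "London"));
        System.out.println(render(doc, STYLESHEET));
    }
}
```

Keeping the XML generation in the adapter leaves the Customer class free of any XML dependency, which is the tradeoff the text describes: cleaner data objects at the cost of one more class per object.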
4.3.1 Tradeoffs Scalability is a key motivation for a multitier EJB architecture. In such an architecture, each tier can execute on a different machine. Additional performance gains are possible when multiple servers are clustered on each tier. Another motivating factor is reliability. If one machine fails, a redundant machine can continue processing. When updates are made, new versions of software can be deployed to one machine at a time, preventing long outages. Security is improved by strictly regulating access to the data tier via EJB components. Yet another motivation for a distributed system is simplicity, although a basic EJB application is far more complex than a simple two-tier application. Yes, distributed systems are complex, but for highly complex applications this approach simplifies your work by dividing independent tasks across tiers. One group of programmers can work on the EJB components, while another works on the request handler classes on the web tier. Yet another group of designers can work on XML and XSLT, while your database expert focuses on the database. For simple applications, a multitier EJB approach is overkill and will likely harm performance. If your web site serves only a few hundred visitors per day, then eliminating EJB could be much faster because there is no additional application tier to hop through.[3] [3] Keep in mind that other benefits of EJB, such as security, will be lost. 4.4 Summary of Key Approaches If separation of HTML from Java code is a goal, then neither a pure servlet nor a pure JSP approach is desirable. Although a hybrid approach does allow a clean separation, you may have to create custom JSP tags to take full advantage of this capability. This approach does not support WML output unless you duplicate all of the HTML generation code. Even though the custom JSP tags hide the Java code from the page author, you still end up with Java code somewhere producing HTML programmatically.
Slide 110: Web frameworks typically build on the hybrid approach, including proprietary value-added features and conveniences. Frameworks have the advantage of defining a consistent way to structure the overall application, which is probably more important in terms of software maintenance than any value-added features. The primary disadvantage of frameworks is that you could be locked into a particular approach and vendor.

The XSLT approach achieves the maximum attainable separation of presentation from underlying data. It also supports multiple browsers and even WML targets. XSLT transformation does incur additional processing load on the web tier. This must be carefully weighed against benefits gained from the modular, clean design that XSLT offers.

Table 4-1 summarizes the strengths and weaknesses of different approaches to Web application development.

Table 4-1. Different web technologies

Pure servlet
  Strengths: Fastest runtime performance.
  Weaknesses: Changes to HTML require Java code changes. Hard to maintain complex pages. No separation of data, logic, and presentation.

Pure JSP
  Strengths: Best for pages that are mostly display-only, static HTML with small amounts of dynamic content. Fast runtime performance.
  Weaknesses: Does not enforce separation of Java code and HTML. Not good for validation of incoming requests. Requires deployment to web server for development and testing.

Hybrid servlet/JSP
  Strengths: Allows greater separation between Java code and HTML than "pure" servlet or JSP approaches. More modular design is easier to maintain for large projects. Fast runtime performance.
  Weaknesses: Still requires deployment to web server for testing and development. Does not force programmers to keep code out of JSPs. Cannot target multiple client device types as effectively as XSLT.

XSLT
  Strengths: Maximum separation between data, programming logic, and presentation. XML and XSLT can be developed and tested outside of the web server. Maximum modularity improves maintainability. Easy to target multiple client devices and languages via different XSLT stylesheets.
  Weaknesses: Slowest runtime performance.[4] For pages that are mostly static HTML, XSLT might be harder to write than JSP. Requires an extra step to generate XML.

[4] Once more browsers support XSLT transformation, the server load will be greatly reduced.

Chapter 5. XSLT Processing with Java

Since many of the XSLT processors are written in Java, they can be directly invoked from a Java application or servlet. Embedding the processor into a Java application is generally a matter of including one or two JAR files on the CLASSPATH and then invoking the appropriate methods. This chapter shows how to do this, along with a whole host of other programming techniques. When invoked from the command line, an XSLT processor such as Xalan expects the location of an XML file and an XSLT stylesheet to be passed as parameters. The two files are then parsed
Slide 111: into memory using an XML parser such as Xerces or Crimson, and the transformation is performed. But when the XSLT processor is invoked programmatically, you are not limited to using static files. Instead, you can send a precompiled stylesheet and a dynamically generated DOM tree directly to the processor, or even fire SAX events as processor input. A major goal is to eliminate the overhead of parsing, which can dramatically improve performance. This chapter is devoted to Java and XSLT programming techniques that work for both standalone applications as well as servlets, with a particular emphasis on Sun's Java API for XML Processing (JAXP) API. In Chapter 6, we will apply these techniques to servlets, taking into account issues such as concurrency, deployment, and performance. 5.1 A Simple Example Let's start with perhaps the simplest program that can be written. For this task, we will write a simple Java program that transforms a static XML data file into HTML using an XSLT stylesheet. The key benefit of beginning with a simple program is that it isolates problems with your development environment, particularly CLASSPATH issues, before you move on to more complex tasks. Two versions of our Java program will be written, one for Xalan and another for SAXON. A JAXP implementation will follow in the next section, showing how the same code can be utilized for many different processors. CLASSPATH Problems CLASSPATH problems are a common culprit when your code is not working, particularly with XML-related APIs. Since so many tools now use XML, it is very likely that a few different DOM and SAX implementations reside on your system. Before trying any of the examples in this chapter, you may want to verify that older parsers are not listed on your CLASSPATH. More subtle problems can occur if an older library resides in the Java 2 optional packages directory. 
Any JAR file found in the jre/lib/ext directory is automatically available to the JVM without being added to the CLASSPATH. You should look for files such as jaxp.jar and parser.jar, which could contain older, incompatible XML APIs. If you experience problems, remove all JAR files from the optional packages directory. Unfortunately, you will have to do some detective work to figure out where the JAR files came from. Although Java 2 Version 1.3 introduced enhanced JAR features that included versioning information, most of the JAR files you encounter probably will not utilize this capability. 5.1.1 The Design The design of this application is pretty simple. A single class contains a main( ) method that performs the transformation. The application requires two arguments: the XML file name followed by the XSLT file name. The results of the transformation are simply written to System.out. We will use the following XML data for our example: <?xml version="1.0" encoding="UTF-8"?> <message>Yep, it worked!</message>
Slide 112: The following XSLT stylesheet will be used. Its output method is text, and it simply prints out the contents of the <message> element. In this case, the text will be Yep, it worked!.

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="text" encoding="UTF-8"/>

      <!-- simply copy the message to the result tree -->
      <xsl:template match="/">
        <xsl:value-of select="message"/>
      </xsl:template>
    </xsl:stylesheet>

Since the filenames are passed as command-line parameters, the application can be used with other XML and XSLT files. You might want to try this out with one of the president examples from Chapters 2 and 3.

5.1.2 Xalan 1 Implementation

The complete code for the Xalan implementation is listed in Example 5-1. As comments in the code indicate, this code was developed and tested using Xalan 1.2.2, which is not the most recent XSLT processor from Apache. Fully qualified Java class names, such as org.apache.xalan.xslt.XSLTProcessor, are used for all Xalan-specific code. A Xalan 2 example is not shown here because Xalan 2 is compatible with Sun's JAXP. The JAXP version of this program works with Xalan 2, as well as any other JAXP compatible processor.

Example 5-1. SimpleXalan1.java

    package chap5;

    import java.io.*;
    import java.net.MalformedURLException;
    import java.net.URL;
    import org.xml.sax.SAXException;

    /**
     * A simple demo of Xalan 1. This code was originally written using
     * Xalan 1.2.2. It will not work with Xalan 2.
     */
    public class SimpleXalan1 {

        /**
         * Accept two command line arguments: the name of an XML file, and
         * the name of an XSLT stylesheet. The result of the transformation
         * is written to stdout.
         */
        public static void main(String[] args)
                throws MalformedURLException, SAXException {
            if (args.length != 2) {
Slide 113:
                System.err.println("Usage:");
                System.err.println(" java " + SimpleXalan1.class.getName( )
                        + " xmlFileName xsltFileName");
                System.exit(1);
            }

            String xmlFileName = args[0];
            String xsltFileName = args[1];

            String xmlSystemId = new File(xmlFileName).toURL().toExternalForm( );
            String xsltSystemId = new File(xsltFileName).toURL().toExternalForm( );

            org.apache.xalan.xslt.XSLTProcessor processor =
                    org.apache.xalan.xslt.XSLTProcessorFactory.getProcessor( );

            org.apache.xalan.xslt.XSLTInputSource xmlInputSource =
                    new org.apache.xalan.xslt.XSLTInputSource(xmlSystemId);
            org.apache.xalan.xslt.XSLTInputSource xsltInputSource =
                    new org.apache.xalan.xslt.XSLTInputSource(xsltSystemId);
            org.apache.xalan.xslt.XSLTResultTarget resultTree =
                    new org.apache.xalan.xslt.XSLTResultTarget(System.out);

            processor.process(xmlInputSource, xsltInputSource, resultTree);
        }
    }

The code begins with the usual list of imports and the class declaration, followed by a simple check to ensure that two command line arguments are provided. If all is OK, then the XML file name and XSLT file name are converted into system identifier values:

    String xmlSystemId = new File(xmlFileName).toURL().toExternalForm( );
    String xsltSystemId = new File(xsltFileName).toURL().toExternalForm( );

System identifiers are part of the XML specification and really mean the same thing as a Uniform Resource Identifier (URI). A Uniform Resource Locator (URL) is a specific type of URI and can be used for methods that require system identifiers as parameters. From a Java programming perspective, this means that a platform-specific filename such as C:/data/simple.xml needs to be converted to file:///C:/data/simple.xml before it can be used by most XML APIs. The code shown here does the conversion and will work on Unix, Windows, and other platforms supported by Java. Although you could try to manually prepend the filename with the literal string file:///, that may not result in portable code.
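As a small aside on the conversion just described, here is a minimal sketch of the same idea written against later Java releases, where File.toURL( ) is deprecated because it does not escape characters that are illegal in URLs and File.toURI( ) is the documented replacement. The class name SystemIdSketch and the file name are made up for illustration, and the exact string printed depends on the platform and the current directory.

```java
import java.io.File;

/** Hypothetical helper: converts a platform-specific file name to a system ID. */
public class SystemIdSketch {

    /**
     * Returns a file: URI for the given file name. Unlike toURL(),
     * toURI() escapes characters such as spaces.
     */
    public static String toSystemId(String fileName) {
        return new File(fileName).toURI().toString();
    }

    public static void main(String[] args) {
        // The printed value is system-dependent, just as with toURL().
        System.out.println(toSystemId("simple.xml"));
    }
}
```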
The documentation for java.io.File clearly states that its toURL( ) method generates a system-dependent URL, so the results will vary when the same code is executed on a non-Windows platform. In fact, on Windows the code actually produces a nonstandard URL (with a single slash), although it does work within Java programs: file:/C:/data/simple.xml. Now that we have system identifiers for our two input files, an instance of the XSLT processor is created: org.apache.xalan.xslt.XSLTProcessor processor = org.apache.xalan.xslt.XSLTProcessorFactory.getProcessor( );
Slide 114: XSLTProcessor is an interface, and XSLTProcessorFactory is a factory for creating new instances of classes that implement it. Because Xalan is open source software, it is easy enough to determine that XSLTEngineImpl is the class that implements the XSLTProcessor interface, although you should try to avoid code that depends on the specific implementation. The next few lines of code create XSLTInputSource objects, one for the XML file and another for the XSLT file:

    org.apache.xalan.xslt.XSLTInputSource xmlInputSource =
            new org.apache.xalan.xslt.XSLTInputSource(xmlSystemId);
    org.apache.xalan.xslt.XSLTInputSource xsltInputSource =
            new org.apache.xalan.xslt.XSLTInputSource(xsltSystemId);

XSLTInputSource is a subclass of org.xml.sax.InputSource, adding the ability to read directly from a DOM Node. XSLTInputSource has the ability to read XML or XSLT data from a system ID, java.io.InputStream, java.io.Reader, org.w3c.dom.Node, or an existing InputSource. As shown in the code, the source of the data is specified in the constructor. XSLTInputSource also has a no-arg constructor, along with get/set methods for each of the supported data source types.

An instance of XSLTResultTarget is created next, sending the result of the transformation to System.out:

    org.apache.xalan.xslt.XSLTResultTarget resultTree =
            new org.apache.xalan.xslt.XSLTResultTarget(System.out);

In a manner similar to XSLTInputSource, the XSLTResultTarget can also be wrapped around an instance of org.w3c.dom.Node, an OutputStream or Writer, a filename (not a system ID!), or an instance of org.xml.sax.DocumentHandler. The final line of code simply instructs the processor to perform the transformation:

    processor.process(xmlInputSource, xsltInputSource, resultTree);

5.1.3 SAXON Implementation

For comparison, a SAXON 5.5.1 implementation is presented in Example 5-2. As you scan through the code, you will notice the word "trax" appearing in the Java packages.
This is an indication that Version 5.5.1 of SAXON was moving towards something called Transformation API for XML (TrAX). More information on TrAX is coming up in the JAXP discussion. In a nutshell, TrAX provides a uniform API that should work with any XSLT processor.

Example 5-2. SimpleSaxon.java

    package chap5;

    import java.io.*;
    import java.net.MalformedURLException;
    import java.net.URL;
    import org.xml.sax.SAXException;

    /**
     * A simple demo of SAXON. This code was originally written using
     * SAXON 5.5.1.
     */
    public class SimpleSaxon {

        /**
         * Accept two command line arguments: the name of an XML file, and
Slide 115:
         * the name of an XSLT stylesheet. The result of the transformation
         * is written to stdout.
         */
        public static void main(String[] args)
                throws MalformedURLException, IOException, SAXException {
            if (args.length != 2) {
                System.err.println("Usage:");
                System.err.println(" java " + SimpleSaxon.class.getName( )
                        + " xmlFileName xsltFileName");
                System.exit(1);
            }

            String xmlFileName = args[0];
            String xsltFileName = args[1];

            String xmlSystemId = new File(xmlFileName).toURL().toExternalForm( );
            String xsltSystemId = new File(xsltFileName).toURL().toExternalForm( );

            com.icl.saxon.trax.Processor processor =
                    com.icl.saxon.trax.Processor.newInstance("xslt");

            // unlike Xalan, SAXON uses the SAX InputSource. Xalan
            // uses its own class, XSLTInputSource
            org.xml.sax.InputSource xmlInputSource =
                    new org.xml.sax.InputSource(xmlSystemId);
            org.xml.sax.InputSource xsltInputSource =
                    new org.xml.sax.InputSource(xsltSystemId);

            com.icl.saxon.trax.Result result =
                    new com.icl.saxon.trax.Result(System.out);

            // create a new compiled stylesheet
            com.icl.saxon.trax.Templates templates =
                    processor.process(xsltInputSource);

            // create a transformer that can be used for a single transformation
            com.icl.saxon.trax.Transformer trans = templates.newTransformer( );
            trans.transform(xmlInputSource, result);
        }
    }

The SAXON implementation starts exactly as the Xalan implementation does. Following the class declaration, the command-line parameters are validated and then converted to system IDs. The XML and XSLT system IDs are then wrapped in org.xml.sax.InputSource objects as follows:

    org.xml.sax.InputSource xmlInputSource =
            new org.xml.sax.InputSource(xmlSystemId);
    org.xml.sax.InputSource xsltInputSource =
            new org.xml.sax.InputSource(xsltSystemId);

This code is virtually indistinguishable from the Xalan code, except Xalan uses XSLTInputSource instead of InputSource. As mentioned before, XSLTInputSource is
Slide 116: merely a subclass of InputSource that adds support for reading from a DOM Node. SAXON also has the ability to read from a DOM node, although its approach is slightly different. Creating a Result object sets up the destination for the XSLT result tree, which is directed to System.out in this example:

    com.icl.saxon.trax.Result result =
            new com.icl.saxon.trax.Result(System.out);

The XSLT stylesheet is then compiled, resulting in an object that can be used repeatedly from many concurrent threads:

    com.icl.saxon.trax.Templates templates =
            processor.process(xsltInputSource);

In a typical XML and XSLT web site, the XML data is generated dynamically, but the same stylesheets are used repeatedly. For instance, stylesheets generating common headers, footers, and navigation bars will be used by many pages. To maximize performance, you will want to process the stylesheets once and reuse the instances for many clients at the same time. For this reason, the thread safety that Templates offers is critical.

An instance of the Transformer class is then created to perform the actual transformation. Unlike the stylesheet itself, the transformer cannot be shared by many clients and is not thread-safe. If this were a servlet implementation, the Transformer instance would have to be created with each invocation of doGet or doPost. In our example, the code is as follows:

    com.icl.saxon.trax.Transformer trans = templates.newTransformer( );
    trans.transform(xmlInputSource, result);

5.1.4 SAXON, Xalan, or TrAX?

As the previous examples show, SAXON and Xalan have many similarities. While similarities make learning the various APIs easy, they do not result in portable code. If you write code directly against either of these interfaces, you lock yourself into that particular implementation unless you want to rewrite your application. The other option is to write a facade around both processors, presenting a consistent interface that works with either processor behind the scenes.
The only problem with this approach is that as new processors are introduced, you must update the implementation of your facade. It would be very difficult for one individual or organization to keep up with the rapidly changing world of XSLT processors. But if the facade was an open standard and supported by a large enough user base, the people and organizations that write the XSLT processors would feel pressure to adhere to the common API, rather than the other way around. TrAX was initiated in early 2000 as an effort to define a consistent API to any XSLT processor. Since some of the key people behind TrAX were also responsible for implementing some of the major XSLT processors, it was quickly accepted that TrAX would be a de facto standard, much in the way that SAX is. 5.2 Introduction to JAXP 1.1 TrAX was a great idea, and the original work and concepts behind it were absorbed into JAXP Version 1.1. If you search for TrAX on the Web and get the feeling that the effort is waning, this is only because focus has shifted from TrAX to JAXP. Although the name has changed, the concept has not: JAXP provides a standard Java interface to many XSLT processors, allowing you to choose your favorite underlying implementation while retaining portability. First released in March 2000, Sun's JAXP 1.0 utilized XML 1.0, XML Namespaces 1.0, SAX 1.0, and DOM Level 1. JAXP is a standard extension to Java, meaning that Sun provides a
Slide 117: specification through its Java Community Process (JCP) as well as a reference implementation. JAXP 1.1 follows the same basic design philosophies of JAXP 1.0, adding support for DOM Level 2, SAX 2, and XSLT 1.0. A tool like JAXP is necessary because the XSLT specification defines only a transformation language; it says nothing about how to write a Java XSLT processor. Although they all perform the same basic tasks, every processor uses a different API and has its own set of programming conventions. JAXP is not an XML parser, nor is it an XSLT processor. Instead, it provides a common Java interface that masks differences between various implementations of the supported standards. When using JAXP, your code can avoid dependencies on specific vendor tools, allowing flexibility to upgrade to newer tools when they become available. The key to JAXP's design is the concept of plugability layers. These layers provide consistent Java interfaces to the underlying SAX, DOM, and XSLT implementations. In order to utilize one of these APIs, you must obtain a factory class without hardcoding Xalan or SAXON code into your application. This is accomplished via a lookup mechanism that relies on Java system properties. Since three separate plugability layers are used, you can use a DOM parser from one vendor, a SAX parser from another vendor, and yet another XSLT processor from someone else. In reality, you will probably need to use a DOM parser compatible with your XSLT processor if you try to transform the DOM tree directly. Figure 5-1 illustrates the high-level architecture of JAXP 1.1. Figure 5-1. JAXP 1.1 architecture As shown, application code does not deal directly with specific parser or processor implementations, such as SAXON or Xalan. Instead, you write code against abstract classes that JAXP provides. This level of indirection allows you to pick and choose among different implementations without even recompiling your application. 
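To make the three plugability layers concrete, the following minimal sketch asks each JAXP factory which implementation class it resolved to. It assumes a Java runtime that bundles JAXP; the class name PlugabilityLayersSketch is made up for illustration, and the class names printed depend entirely on which parser and processor are installed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

/** Hypothetical demo: reports the implementation behind each JAXP layer. */
public class PlugabilityLayersSketch {

    /** Returns the DOM, SAX, and XSLT factory implementation class names. */
    public static String[] implementationClasses() {
        return new String[] {
            DocumentBuilderFactory.newInstance().getClass().getName(),
            SAXParserFactory.newInstance().getClass().getName(),
            TransformerFactory.newInstance().getClass().getName()
        };
    }

    public static void main(String[] args) {
        String[] names = implementationClasses();
        System.out.println("DOM layer:  " + names[0]);
        System.out.println("SAX layer:  " + names[1]);
        System.out.println("XSLT layer: " + names[2]);
    }
}
```

The application code names only the abstract JAXP factories; nothing in it refers to Xalan, SAXON, Xerces, or Crimson.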
The main drawback to an API such as JAXP is the "least common denominator" effect, which is all too familiar to AWT programmers. In order to maximize portability, JAXP mostly provides functionality that all XSLT processors support. This means, for instance, that Xalan's custom XPath APIs are not included in JAXP. In order to use value-added features of a particular processor, you must revert to nonportable code, negating the benefits of a plugability layer. Fortunately, most common tasks are supported by JAXP, so reverting to implementation-specific code is the exception, not the rule. Although the JAXP specification does not define an XML parser or XSLT processor, reference implementations do include these tools. These reference implementations are open source Apache XML tools,[1] so complete source code is available. [1] Crimson and Xalan.
Slide 118: 5.2.1 JAXP 1.1 Implementation

You guessed it -- we will now reimplement the simple example using Sun's JAXP 1.1. Behind the scenes, this could use any JAXP 1.1-compliant XSLT processor; this code was developed and tested using Apache's Xalan 2 processor. Example 5-3 contains the complete source code.

Example 5-3. SimpleJaxp.java

    package chap5;

    import java.io.*;

    /**
     * A simple demo of JAXP 1.1
     */
    public class SimpleJaxp {

        /**
         * Accept two command line arguments: the name of an XML file, and
         * the name of an XSLT stylesheet. The result of the transformation
         * is written to stdout.
         */
        public static void main(String[] args)
                throws javax.xml.transform.TransformerException {
            if (args.length != 2) {
                System.err.println("Usage:");
                System.err.println(" java " + SimpleJaxp.class.getName( )
                        + " xmlFileName xsltFileName");
                System.exit(1);
            }

            File xmlFile = new File(args[0]);
            File xsltFile = new File(args[1]);

            javax.xml.transform.Source xmlSource =
                    new javax.xml.transform.stream.StreamSource(xmlFile);
            javax.xml.transform.Source xsltSource =
                    new javax.xml.transform.stream.StreamSource(xsltFile);
            javax.xml.transform.Result result =
                    new javax.xml.transform.stream.StreamResult(System.out);

            // create an instance of TransformerFactory
            javax.xml.transform.TransformerFactory transFact =
                    javax.xml.transform.TransformerFactory.newInstance( );

            javax.xml.transform.Transformer trans =
                    transFact.newTransformer(xsltSource);

            trans.transform(xmlSource, result);
        }
    }

As in the earlier examples, explicit package names are used in the code to point out which classes are parts of JAXP. In future examples, import statements will be favored because they result in less typing and more readable code. Our new program begins by declaring that it may throw TransformerException:

    public static void main(String[] args)
Slide 119:
            throws javax.xml.transform.TransformerException {

This is a general-purpose exception representing anything that might go wrong during the transformation process. In other processors, SAX-specific exceptions are typically propagated to the caller. In JAXP, TransformerException can be wrapped around any type of Exception object that various XSLT processors may throw.

Next, the command-line arguments are converted into File objects. In the SAXON and Xalan examples, we created a system ID for each of these files. Since JAXP can read directly from a File object, the extra conversion to a URI is not needed:

    File xmlFile = new File(args[0]);
    File xsltFile = new File(args[1]);

    javax.xml.transform.Source xmlSource =
            new javax.xml.transform.stream.StreamSource(xmlFile);
    javax.xml.transform.Source xsltSource =
            new javax.xml.transform.stream.StreamSource(xsltFile);

The Source interface is used to read both the XML file and the XSLT file. Unlike the SAX InputSource class or Xalan's XSLTInputSource class, Source is an interface that can have many implementations. In this simple example we are using StreamSource, which has the ability to read from a File object, an InputStream, a Reader, or a system ID. Later we will examine additional Source implementations that use SAX and DOM as input. Just like Source, Result is an interface that can have several implementations. In this example, a StreamResult sends the output of the transformations to System.out:

    javax.xml.transform.Result result =
            new javax.xml.transform.stream.StreamResult(System.out);

Next, an instance of TransformerFactory is created:

    javax.xml.transform.TransformerFactory transFact =
            javax.xml.transform.TransformerFactory.newInstance( );

The TransformerFactory is responsible for creating Transformer and Templates objects.
In our simple example, we create a Transformer object: javax.xml.transform.Transformer trans = transFact.newTransformer(xsltSource); Transformer objects are not thread-safe, although they can be used multiple times. For a simple example like this, we will not encounter any problems. In a threaded servlet environment, however, multiple users cannot concurrently access the same Transformer instance. JAXP also provides a Templates interface, which represents a stylesheet that can be accessed by many concurrent threads. The transformer instance is then used to perform the actual transformation: trans.transform(xmlSource, result); This applies the XSLT stylesheet to the XML data, sending the result to System.out. 5.2.2 XSLT Plugability Layer JAXP 1.1 defines a specific lookup procedure to locate an appropriate XSLT processor. This must be accomplished without hardcoding vendor-specific code into applications, so Java system properties and JAR file service providers are used. Within your code, first locate an instance of the TransformerFactory class as follows: javax.xml.transform.TransformerFactory transFact = javax.xml.transform.TransformerFactory.newInstance( );
Slide 120: Since TransformerFactory is abstract, its newInstance( ) factory method is used to instantiate an instance of a specific subclass. The algorithm for locating this subclass begins by looking at the javax.xml.transform.TransformerFactory system property. Let us suppose that com.foobar.AcmeTransformer is a new XSLT processor compliant with JAXP 1.1. To utilize this processor instead of JAXP's default processor, you can specify the system property on the command line[2] when you start your Java application:

    java -Djavax.xml.transform.TransformerFactory=com.foobar.AcmeTransformer MyApp

[2] System properties can also be specified in Ant build files.

Provided that JAXP is able to instantiate an instance of AcmeTransformer, this is the XSLT processor that will be used. Of course, AcmeTransformer must be a subclass of TransformerFactory for this to work, so it is up to vendors to offer support for JAXP.

If the system property is not specified, JAXP next looks for a property file named lib/jaxp.properties in the JRE directory. A property file consists of name=value pairs, and JAXP looks for a line like this:

    javax.xml.transform.TransformerFactory=com.foobar.AcmeTransformer

You can obtain the location of the JRE with the following code:

    String javaHomeDir = System.getProperty("java.home");

Some popular development tools change the value of java.home when they are installed, which could prevent JAXP from locating jaxp.properties. JBuilder, for instance, installs its own version of Java 2 that it uses by default.

The advantage of creating jaxp.properties in this directory is that you can use your preferred processor for all of your applications that use JAXP without having to specify the system property on the command line. You can still override this file with the -D command-line syntax, however.

If jaxp.properties is not found, JAXP uses the JAR file service provider mechanism to locate an appropriate subclass of TransformerFactory.
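The system property check and the jaxp.properties check described so far can be imitated in a few lines, which may help when diagnosing why a particular processor is or is not being picked up. This is only a sketch of the order of the checks, not JAXP's actual lookup code; the class FactoryLookupSketch and its lookup method are hypothetical, and the lib/jaxp.properties location is the one given in the text and may differ on other Java releases.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical diagnostic that mimics the first two lookup steps. */
public class FactoryLookupSketch {

    static final String KEY = "javax.xml.transform.TransformerFactory";

    /**
     * Returns the factory class name selected by the system property or,
     * failing that, by the given properties file. Returns null if neither
     * names a factory, in which case the later lookup steps would apply.
     */
    public static String lookup(Properties systemProps, File jaxpProperties) {
        String fromSystemProperty = systemProps.getProperty(KEY);
        if (fromSystemProperty != null) {
            return fromSystemProperty;
        }
        if (jaxpProperties.isFile()) {
            Properties fileProps = new Properties();
            try (InputStream in = new FileInputStream(jaxpProperties)) {
                fileProps.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return fileProps.getProperty(KEY);
        }
        return null;
    }

    public static void main(String[] args) {
        // Location as described in the text: lib/jaxp.properties under java.home
        File libDir = new File(System.getProperty("java.home"), "lib");
        File propsFile = new File(libDir, "jaxp.properties");
        String name = lookup(System.getProperties(), propsFile);
        System.out.println(name != null
                ? "Configured factory: " + name
                : "No factory configured by system property or " + propsFile);
    }
}
```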
The service provider mechanism is outlined in the JAR file specification from Sun and simply means that you must create a file in the META-INF/services directory of a JAR file. In JAXP, this file is called javax.xml.transform.TransformerFactory. It contains a single line that specifies the implementation of TransformerFactory: com.foobar.AcmeTransformer in our fictitious example. If you look inside of xalan.jar in JAXP 1.1, you will find this file. In order to utilize a different processor that follows the JAXP 1.1 convention, simply make sure its JAR file is located first on your CLASSPATH.

Finally, if JAXP cannot find an implementation class from any of the three locations, it uses its default implementation of TransformerFactory. To summarize, here are the steps that JAXP performs when attempting to locate a factory:

1. Use the value of the javax.xml.transform.TransformerFactory system property if it exists.
2. If JRE/lib/jaxp.properties exists, then look for a javax.xml.transform.TransformerFactory=ImplementationClass entry in that file.
Slide 121:
3. Use a JAR file service provider to look for a file called META-INF/services/javax.xml.transform.TransformerFactory in any JAR file on the CLASSPATH.
4. Use the default TransformerFactory instance.

The JAXP 1.1 plugability layers for SAX and DOM follow the exact same process as the XSLT layer, only they use the javax.xml.parsers.SAXParserFactory and javax.xml.parsers.DocumentBuilderFactory system properties respectively. It should be noted that JAXP 1.0 uses a much simpler algorithm where it checks only for the existence of the system property. If that property is not set, the default implementation is used.

5.2.3 The Transformer Class

As shown in Example 5-3, a Transformer object can be obtained from the TransformerFactory as follows:

    javax.xml.transform.TransformerFactory transFact =
            javax.xml.transform.TransformerFactory.newInstance( );
    javax.xml.transform.Transformer trans =
            transFact.newTransformer(xsltSource);

The Transformer instance is wrapped around an XSLT stylesheet and allows you to perform as many transformations as you wish. The main caveat is thread safety: multiple threads cannot use a single Transformer instance concurrently. For each transformation, invoke the transform method:

    abstract void transform(Source xmlSource, Result outputTarget)
            throws TransformerException

This method is abstract because the TransformerFactory returns a subclass of Transformer that does the actual work. The Source interface defines where the XML data comes from and the Result interface specifies where the transformation result is sent. The TransformerException will be thrown if anything goes wrong during the transformation process and may contain the location of the error and a reference to the original exception. The ability to properly report the location of the error is entirely dependent upon the quality of the underlying XSLT transformer implementation's error reporting.
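The error-reporting behavior just described might be exercised as in the sketch below. The class TransformWithErrors and the method transformOrDescribe are hypothetical names; the example uses the stylesheet-free identity Transformer only to stay short, and whether a location is available depends on the processor, as noted above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.SourceLocator;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TransformWithErrors {

    // Copies the XML text through a Transformer and returns the result,
    // or a description of the failure if the transformation throws.
    static String transformOrDescribe(String xml) {
        try {
            Transformer trans =
                    TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            trans.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            // the locator is null when the processor cannot report a position
            SourceLocator loc = ex.getLocator();
            String where = (loc == null)
                    ? "unknown location"
                    : "line " + loc.getLineNumber()
                            + ", column " + loc.getColumnNumber();
            // the wrapped exception, if any, is the original cause
            Throwable cause = ex.getException();
            return "FAILED at " + where + ": " + ex.getMessage()
                    + (cause == null ? "" : " (caused by " + cause + ")");
        }
    }

    public static void main(String[] args) {
        System.out.println(transformOrDescribe("<a><b>x</b></a>"));
        System.out.println(transformOrDescribe("<a>")); // not well-formed
    }
}
```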
We will talk about specific classes that implement the Source and Result interfaces later in this chapter. Aside from actually performing the transformation, the Transformer implementation allows you to set output properties and stylesheet parameters. In XSLT, a stylesheet parameter is declared and used as follows: <?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="html"/> <xsl:param name="image_dir" select="'images'"/> <xsl:template match="/"> <html> <body> <h1>Stylesheet Parameter Example</h1> <img src="{$image_dir}/sample.gif"/> </body> </html> </xsl:template> </xsl:stylesheet>
Slide 122: The <xsl:param> element declares the parameter name and an optional select attribute. This attribute specifies the default value if the stylesheet parameter is not provided. In this case, the string 'images' is the default value and is enclosed in apostrophes so it is treated as a string instead of an XPath expression. Later, the image_dir variable is referred to with the attribute value template syntax: {$image_dir}. Passing a variable for the location of your images is a common technique because your development environment might use a different directory name than your production web server. Another common use for a stylesheet parameter is to pass in data that a servlet generates dynamically, such as a unique ID for session tracking. From JAXP, pass this parameter via the Transformer instance. The code is simple enough: javax.xml.transform.Transformer trans = transFact.newTransformer(xsltSource); trans.setParameter("image_dir", "graphics"); You can set as many parameters as you like, and these parameters will be saved and reused for every transformation you make with this Transformer instance. If you wish to remove a parameter, you must call clearParameters( ), which clears all parameters for this Transformer instance. Parameters work similarly to a java.util.Map; if you set the same parameter twice, the second value overwrites the first value. Another use for the Transformer class is to get and set output properties through one of the following methods: void setOutputProperties(java.util.Properties props) void setOutputProperty(String name, String value) java.util.Properties getOutputProperties( ) String getOutputProperty(String name) As you can see, properties are specified as name/value pairs of Strings and can be set and retrieved individually or as a group. Unlike stylesheet parameters, you can un-set an individual property by simply passing in null for the value. 
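The parameter and output-property calls described above can be tried together in a small program. This is a sketch under stated assumptions: ParamDemo and render are illustrative names, the stylesheet is a shortened copy of the image_dir example, and the input document <dummy/> is a placeholder.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ParamDemo {

    // shortened version of the stylesheet parameter example
    static final String XSLT =
            "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='html'/>"
            + "<xsl:param name='image_dir' select=\"'images'\"/>"
            + "<xsl:template match='/'>"
            + "<html><body><img src='{$image_dir}/sample.gif'/></body></html>"
            + "</xsl:template></xsl:stylesheet>";

    // Pass null for imageDir to fall back to the stylesheet's default value.
    static String render(String imageDir) throws TransformerException {
        Transformer trans = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        if (imageDir != null) {
            trans.setParameter("image_dir", imageDir);
        }
        // output properties use the constants from OutputKeys
        trans.setOutputProperty(OutputKeys.INDENT, "no");
        StringWriter out = new StringWriter();
        trans.transform(new StreamSource(new StringReader("<dummy/>")),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(render(null));        // uses the default, images
        System.out.println(render("graphics"));  // overrides the default
    }
}
```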
The permitted property names are defined in the javax.xml.transform.OutputKeys class and are explained in Table 5-1.

Table 5-1. Constants defined in javax.xml.transform.OutputKeys

Constant | Meaning
CDATA_SECTION_ELEMENTS | Specifies a whitespace-separated list of element names whose content should be output as CDATA sections. See the XSLT specification from the W3C for examples.
DOCTYPE_PUBLIC | Only used if DOCTYPE_SYSTEM is also used, this instructs the processor to output a PUBLIC document type declaration. For example: <!DOCTYPE rootElem PUBLIC "public id" "system id">.
DOCTYPE_SYSTEM | Instructs the processor to output a document-type declaration. For example: <!DOCTYPE rootElem SYSTEM "system id">.
ENCODING | Specifies the character encoding of the result tree, such as UTF-8 or UTF-16.
INDENT | Specifies whether or not whitespace may be added to the result tree, making the output more readable. Acceptable values are yes and no. Although indentation makes the output more readable, it does make the file size larger, thus harming performance.
MEDIA_TYPE | The MIME type of the result tree.
METHOD | The output method, either xml, html, or text. Although other values are possible, such as xhtml, these are implementation-defined and may be rejected by your processor.
OMIT_XML_DECLARATION | Acceptable values are yes and no, specifying whether or not to include the XML declaration on the first line of the result tree.
STANDALONE | Acceptable values are yes and no, specifying whether or not the XML declaration indicates that the document is standalone. For example: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>.
VERSION | Specifies the version of the output method, typically 1.0 for XML output. This shows up in the XML declaration as follows: <?xml version="1.0" encoding="UTF-8"?>.

It is no coincidence that these output properties are the same as the properties you can set on the <xsl:output> element in your stylesheets. For example:

    <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

Using JAXP, you can either specify additional output properties or override those set in the stylesheet. To change the encoding, write this code:

    // this will take precedence over any encoding specified in the stylesheet
    trans.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

Keep in mind that this will, in addition to adding encoding="UTF-16" to the XML declaration, actually cause the processor to use that encoding in the result tree. For a value of UTF-16, this means that 16-bit Unicode characters will be generated, so you may have trouble viewing the result tree in many ASCII-only text editors.

5.2.4 JAXP XSLT Design

Now that we have seen some example code and have begun our exploration of the Transformer class, let's step back and look at the overall design of the XSLT plugability layer. JAXP support for XSLT is broken down into the packages listed in Table 5-2.

Table 5-2.
JAXP transformation packages

Package | Description
javax.xml.transform | Defines a general-purpose API for XML transformations without any dependencies on SAX or DOM. The Transformer class is obtained from the TransformerFactory class. The Transformer transforms from a Source to a Result.
javax.xml.transform.dom | Defines how transformations can be performed using DOM. Provides implementations of Source and Result: DOMSource and DOMResult.
javax.xml.transform.sax | Supports SAX2 transformations. Defines SAX versions of Source and Result: SAXSource and SAXResult. Also defines a subclass of TransformerFactory that allows SAX2 events to be fed into an XSLT processor.
javax.xml.transform.stream | Defines I/O stream implementations of Source and Result: StreamSource and StreamResult.

The heart of JAXP XSLT support lies in the javax.xml.transform package, which lays out the mechanics and overall process for any transformation that is performed. This package mostly consists of interfaces and abstract classes, except for OutputKeys and a few exception and error classes. Figure 5-2 presents a UML class diagram that shows all of the pieces in this important package.

Figure 5-2. javax.xml.transform class diagram

As you can see, this is a small package, indicative of the fact that JAXP is merely a wrapper around the tools that actually perform transformations. The entry point is TransformerFactory, which creates instances of Transformer, as we have already seen, as well as instances of the Templates interface. A Templates object represents a compiled stylesheet and will be covered in detail later in this chapter.[3] The advantage of compilation is performance: the same Templates object can be used over and over by many threads without reparsing the XSLT file.

[3] The exact definition of a "compiled" stylesheet is vague. XSLT processors are free to optimize cached stylesheets however they see fit.

The URIResolver is responsible for resolving URIs found within stylesheets and is generally something you will not need to deal with directly. It is used when a stylesheet imports or includes
Slide 125: another document, and the processor needs to figure out where to look for that document. For example: <xsl:import href="commonFooter.xslt"/> ErrorListener, as you may guess, is an interface that allows your code to register as a listener for error conditions. This interface defines the following three methods: void error(TransformerException ex) void fatalError(TransformerException ex) void warning(TransformerException ex) The TransformerException has the ability to wrap around another Exception or Throwable object and may return an instance of the SourceLocator class. If the underlying XSLT implementation does not provide a SourceLocator, null is returned. The SourceLocator interface defines methods to locate where a TransformerException originated. In the case of error() and warning(), the XSLT processor is required to continue processing the document until the end. For fatalError(), on the other hand, the XSLT processor is not required to continue. If you do not register an ErrorListener object, then all errors, fatal errors, and warnings are normally written to System.err. TransformerFactoryConfigurationError and TransformerConfigurationException round out the error-handling APIs for JAXP, indicating problems configuring the underlying XSLT processor implementation. The TransformerFactoryConfigurationError class is generally used when the implementation class cannot be found on the CLASSPATH or cannot be instantiated at all. TransformerConfigurationException simply indicates a "serious configuration error" according to its documentation. 5.3 Input and Output XSLT processors, like other XML tools, can read their input data from many different sources. In the most basic scenario, you will load a static stylesheet and XML document using the java.io.File class. More commonly, the XSLT stylesheet will come from a file, but the XML data will be generated dynamically as the result of a database query. 
In this case, it does not make sense to write the database query results to an XML file and then parse it into the XSLT processor. Instead, it is desirable to pipe the XML data directly into the processor using SAX or DOM. In fact, we will even see how to read non-XML data and transform it using XSLT.

5.3.1 System Identifiers, Files, and URLs

The simple examples presented earlier in this chapter introduced the concept of a system identifier. As mentioned before, system identifiers are nothing more than URIs and are used frequently by XML tools. For example, javax.xml.transform.Source, one of the key interfaces in JAXP, has the following API:

    public interface Source {
        String getSystemId( );
        void setSystemId(String systemId);
    }

The second method, setSystemId( ), is crucial. By providing a URI to the Source, the XSLT processor can resolve URIs encountered in XSLT stylesheets. This allows XSLT code like this to work:

    <xsl:import href="commonFooter.xslt"/>

When it comes to XSLT programming, you will use methods in java.io.File and java.net.URL to convert platform-specific file names into system IDs. These can then be used
Slide 126: as parameters to any methods that expect a system ID as a parameter. For example, you would write the following code to convert a platform-specific filename into a system ID:

    public static void main(String[] args) {
        // assume that the first command-line arg contains a file name
        // - on Windows, something like "C:\home\index.xml"
        // - on Unix, something like "/usr/home/index.xml"
        String fileName = args[0];
        File fileObject = new File(fileName);
        URL fileURL = fileObject.toURL( );
        String systemID = fileURL.toExternalForm( );
    }

This code was written on several lines for clarity; it can be consolidated as follows:

    String systemID = new File(fileName).toURL().toExternalForm( );

Converting from a system identifier back to a filename or a File object can be accomplished with this code:

    URL url = new URL(systemID);
    String fileName = url.getFile( );
    File fileObject = new File(fileName);

And once again, this code can be condensed into a single line as follows:

    File fileObject = new File((new URL(systemID)).getFile( ));

5.3.2 JAXP I/O Design

The Source and Result interfaces in javax.xml.transform provide the basis for all transformation input and output in JAXP 1.1. Regardless of whether a stylesheet is obtained via a URI, filename, or InputStream, its data is fed into JAXP via an implementation of the Source interface. The output is then sent to an implementation of the Result interface. The implementations provided by JAXP are shown in Figure 5-3.

Figure 5-3. Source and Result interfaces

As you can see, JAXP is not particular about where it gets its data or sends its results. Remember that two instances of Source are always specified: one for the XML data and another for the XSLT stylesheet.
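The file-name and system-identifier conversions from Section 5.3.1 can also be sketched with java.net.URI, which escapes characters such as spaces. This is an alternative to the File.toURL( ) calls shown in the text, not the book's own code; later Java releases deprecate File.toURL( ) because it does not perform this escaping. The class and method names are made up for illustration.

```java
import java.io.File;
import java.net.URI;

public class SystemIds {

    // File -> system identifier, e.g. "file:/usr/home/index.xml"
    static String toSystemId(File file) {
        return file.toURI().toString();
    }

    // system identifier -> File; only valid for file: URIs
    static File toFile(String systemId) {
        return new File(URI.create(systemId));
    }

    public static void main(String[] args) {
        // a relative name containing a space, resolved against the
        // current working directory
        File original = new File("my docs/index.xml").getAbsoluteFile();
        String systemId = toSystemId(original);
        System.out.println("System ID : " + systemId);
        System.out.println("Round trip: " + toFile(systemId));
    }
}
```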
Slide 127: 5.3.3 JAXP Stream I/O

As shown in Figure 5-3, StreamSource is one of the implementations of the Source interface. In addition to the system identifiers that Source provides, StreamSource allows input to be obtained from a File, an InputStream, or a Reader. The SimpleJaxp class in Example 5-3 showed how to use StreamSource to read from a File object. There are also four constructors that allow you to construct a StreamSource from either an InputStream or Reader. The complete list of constructors is shown here:

    public StreamSource( )
    public StreamSource(File f)
    public StreamSource(String systemId)
    public StreamSource(InputStream byteStream)
    public StreamSource(InputStream byteStream, String systemId)
    public StreamSource(Reader characterStream)
    public StreamSource(Reader characterStream, String systemId)

For the constructors that take InputStream and Reader as arguments, the first argument provides either the XML data or the XSLT stylesheet. The second argument, if present, is used to resolve relative URI references in the document. As mentioned before, your XSLT stylesheet may include the following code:

    <xsl:import href="commonFooter.xslt"/>

By providing a system identifier as a parameter to the StreamSource, you are telling the XSLT processor where to look for commonFooter.xslt. Without this parameter, you may encounter an error when the processor cannot resolve this URI. The simple fix is to call the setSystemId( ) method as follows:

    // construct a Source that reads from an InputStream
    Source mySrc = new StreamSource(anInputStream);
    // specify a system ID (a String) so the Source can resolve relative URLs
    // that are encountered in XSLT stylesheets
    mySrc.setSystemId(aSystemId);

The documentation for StreamSource also advises that InputStream is preferred to Reader because this allows the processor to properly handle the character encoding as specified in the XML declaration.
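To see the effect of setSystemId( ) on <xsl:import>, the following sketch writes two small stylesheets to a temporary directory and then reads the main one through an InputStream. Everything here is illustrative: the class name, the file main.xslt, and the content of commonFooter.xslt are assumptions made for the demonstration.

```java
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ImportViaSystemId {

    static final String NS =
            "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'";

    static String run() throws Exception {
        Path dir = Files.createTempDirectory("xslt");
        Path footer = dir.resolve("commonFooter.xslt");
        Path main = dir.resolve("main.xslt");

        // the imported stylesheet defines a named template
        Files.write(footer, ("<xsl:stylesheet version='1.0' " + NS + ">"
                + "<xsl:template name='footer'><p>footer</p></xsl:template>"
                + "</xsl:stylesheet>").getBytes(StandardCharsets.UTF_8));

        // the main stylesheet imports it with a relative href
        Files.write(main, ("<xsl:stylesheet version='1.0' " + NS + ">"
                + "<xsl:import href='commonFooter.xslt'/>"
                + "<xsl:template match='/'>"
                + "<body><xsl:call-template name='footer'/></body>"
                + "</xsl:template></xsl:stylesheet>")
                .getBytes(StandardCharsets.UTF_8));

        try (InputStream in = Files.newInputStream(main)) {
            Source xsltSource = new StreamSource(in);
            // the stream alone carries no location; the system ID tells
            // the processor where relative hrefs should be resolved
            xsltSource.setSystemId(main.toUri().toString());

            Transformer trans = TransformerFactory.newInstance()
                    .newTransformer(xsltSource);
            StringWriter out = new StringWriter();
            trans.transform(new StreamSource(new StringReader("<doc/>")),
                    new StreamResult(out));
            return out.toString();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

Commenting out the setSystemId( ) call is a simple way to observe how your processor behaves when it cannot locate the imported file.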
StreamResult is similar in functionality to StreamSource, although it is not necessary to resolve relative URIs. The available constructors are as follows:

    public StreamResult( )
    public StreamResult(File f)
    public StreamResult(String systemId)
    public StreamResult(OutputStream byteStream)
    public StreamResult(Writer characterStream)

Let's look at some of the other options for StreamSource and StreamResult. Example 5-4 is a modification of the SimpleJaxp program that was presented earlier. It downloads the XML specification from the W3C web site and stores it in a temporary file on your local disk. To download the file, construct a StreamSource with a system identifier as a parameter. The stylesheet is a simple one that merely performs an identity transformation, copying the unmodified XML data to the result tree. The result is then sent to a StreamResult using its File constructor.

Example 5-4. Streams.java

    package chap5;
    import java.io.*;
    import javax.xml.transform.*;
    import javax.xml.transform.stream.*;

    /**
     * A simple demo of JAXP 1.1 StreamSource and StreamResult. This
     * program downloads the XML specification from the W3C and prints
     * it to a temporary file.
     */
    public class Streams {

        // an identity copy stylesheet
        private static final String IDENTITY_XSLT =
            "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
            + " version='1.0'>"
            + "<xsl:template match='/'><xsl:copy-of select='.'/>"
            + "</xsl:template></xsl:stylesheet>";

        // the XML spec in XML format
        // (using an HTTP URL rather than a file URL)
        private static String xmlSystemId =
            "http://www.w3.org/TR/2000/REC-xml-20001006.xml";

        public static void main(String[] args) throws IOException,
                TransformerException {
            // show how to read from a system identifier and a Reader
            Source xmlSource = new StreamSource(xmlSystemId);
            Source xsltSource = new StreamSource(
                    new StringReader(IDENTITY_XSLT));

            // send the result to a file
            File resultFile = File.createTempFile("Streams", ".xml");
            Result result = new StreamResult(resultFile);

            System.out.println("Results will go to: "
                    + resultFile.getAbsolutePath( ));

            // get the factory
            TransformerFactory transFact = TransformerFactory.newInstance( );

            // get a transformer for this particular stylesheet
            Transformer trans = transFact.newTransformer(xsltSource);

            // do the transformation
            trans.transform(xmlSource, result);
        }
    }

The "identity copy" stylesheet simply matches "/", which is the document itself. It then uses <xsl:copy-of select='.'/> to select the document and copy it to the result tree. In this case, we coded our own stylesheet. You can also omit the XSLT stylesheet altogether as follows:

    // construct a Transformer without any XSLT stylesheet
    Transformer trans = transFact.newTransformer( );
Slide 129: In this case, the processor will provide its own stylesheet and do the same thing that our example does. This is useful when you need to use JAXP to convert a DOM tree to XML text for debugging purposes because the default Transformer will simply copy the XML data without any transformation.

5.3.4 JAXP DOM I/O

In many cases, the fastest form of transformation available is to feed an instance of org.w3c.dom.Document directly into JAXP. Although the transformation is fast, it does take time to generate the DOM; DOM is also memory intensive, and may not be the best choice for large documents. In most cases, the DOM data will be generated dynamically as the result of a database query or some other operation (see Chapter 1). Once the DOM is generated, simply wrap the Document object in a DOMSource as follows:

    org.w3c.dom.Document domDoc = createDomDocument( );
    Source xmlSource = new javax.xml.transform.dom.DOMSource(domDoc);

The remainder of the transformation looks identical to the file-based transformation shown in Example 5-4. JAXP needs only the alternate input Source object shown here to read from DOM.

5.3.5 JAXP SAX I/O

XSLT is designed to transform well-formed XML data into another format, typically HTML. But wouldn't it be nice if we could also use XSLT stylesheets to transform non-XML data into HTML? For example, most spreadsheets have the ability to export their data into Comma Separated Values (CSV) format, as shown here:

    Burke,Eric,M
    Burke,Jennifer,L
    Burke,Aidan,G

One approach is parsing the file into memory, using DOM to create an XML representation of the data, and then feeding that information into JAXP for transformation. This approach works but requires an intermediate programming step to convert the CSV file into a DOM tree. A better option is to write a custom SAX parser, feeding its output directly into JAXP. This avoids the overhead of constructing the DOM tree, offering better memory utilization and performance.
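Before moving on to SAX, the DOM-to-text debugging technique mentioned in Sections 5.3.3 and 5.3.4 can be sketched as follows. The createDomDocument( ) body is invented for this illustration (the text only names the method), and the class name DomToText is hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomToText {

    // stand-in for DOM data that would normally come from a database
    // query or some other dynamic source
    static Document createDomDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element person = doc.createElement("person");
        Element name = doc.createElement("name");
        name.setTextContent("Eric");
        person.appendChild(name);
        doc.appendChild(person);
        return doc;
    }

    // Serializes a DOM tree using a Transformer created without a stylesheet.
    static String toXml(Document doc) throws Exception {
        Transformer identity =
                TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(createDomDocument()));
    }
}
```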
5.3.5.1 The approach It turns out that writing a SAX parser is quite easy.[4] All a SAX parser does is read an XML file top to bottom and fire event notifications as various elements are encountered. In our custom parser, we will read the CSV file top to bottom, firing SAX events as we read the file. A program listening to those SAX events will not realize that the data file is CSV rather than XML; it sees only the events. Figure 5-4 illustrates the conceptual model. [4] Our examples use SAX 2. Figure 5-4. Custom SAX parser
Slide 130: In this model, the XSLT processor interprets the SAX events as XML data and uses a normal stylesheet to perform the transformation. The interesting aspect of this model is that we can easily write custom SAX parsers for other file formats, making XSLT a useful transformation language for just about any legacy application data.

In SAX, org.xml.sax.XMLReader is a standard interface that parsers must implement. It works in conjunction with org.xml.sax.ContentHandler, which is the interface that listens to SAX events. For this model to work, your XSLT processor must implement the ContentHandler interface so it can listen to the SAX events that the XMLReader generates. In the case of JAXP, javax.xml.transform.sax.TransformerHandler is used for this purpose.

Obtaining an instance of TransformerHandler requires a few extra programming steps. First, create a TransformerFactory as usual:

    TransformerFactory transFact = TransformerFactory.newInstance( );

As before, the TransformerFactory is the JAXP abstraction to some underlying XSLT processor. This underlying processor may not support SAX features, so you have to query it to determine if you can proceed:

    if (transFact.getFeature(SAXTransformerFactory.FEATURE)) {

If this returns false, you are out of luck. Otherwise, you can safely downcast to a SAXTransformerFactory and construct the TransformerHandler instance:

    SAXTransformerFactory saxTransFact =
            (SAXTransformerFactory) transFact;
    // create a ContentHandler, don't specify a stylesheet. Without
    // a stylesheet, raw XML is sent to the output.
    TransformerHandler transHand = saxTransFact.newTransformerHandler( );

In the code shown here, a stylesheet was not specified. JAXP defaults to the identity transformation stylesheet, which means that the SAX events will be "transformed" into raw XML output.
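The steps above can be combined into a small runnable sketch that fires a few SAX events by hand instead of using a real parser. The class name SaxEventsDemo and the emit method are hypothetical, and the element names simply mirror the CSV example; both the local name and the qualified name are supplied on each element, which is a choice made for this sketch.

```java
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

public class SaxEventsDemo {

    // Emits one <line> element with a <value> child per array entry and
    // returns the XML text produced by the identity TransformerHandler.
    static String emit(String[] values) throws Exception {
        TransformerFactory transFact = TransformerFactory.newInstance();
        if (!transFact.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException(
                    "this processor does not support SAX transformations");
        }
        SAXTransformerFactory saxTransFact =
                (SAXTransformerFactory) transFact;

        // no stylesheet: the events are copied to the result as raw XML
        TransformerHandler handler = saxTransFact.newTransformerHandler();
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        AttributesImpl noAttrs = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "line", "line", noAttrs);
        for (String value : values) {
            handler.startElement("", "value", "value", noAttrs);
            handler.characters(value.toCharArray(), 0, value.length());
            handler.endElement("", "value", "value");
        }
        handler.endElement("", "line", "line");
        handler.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(emit(new String[] {"Burke", "Eric", "M"}));
    }
}
```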
To specify a stylesheet that performs an actual transformation, pass a Source to the method as follows: Source xsltSource = new StreamSource(myXsltSystemId); TransformerHandler transHand = saxTransFact.newTransformerHandler( xsltSource); 5.3.5.2 Detailed CSV to SAX design Before delving into the complete example program, let's step back and look at a more detailed design diagram. The conceptual model is straightforward, but quite a few classes and interfaces come into play. Figure 5-5 shows the pieces necessary for SAX-based transformations. Figure 5-5. SAX and XSLT transformations
Slide 131: This diagram certainly appears to be more complex than previous approaches, but is similar in many ways. In previous approaches, we used the TransformerFactory to create instances of Transformer; in the SAX approach, we start with a subclass of TransformerFactory. Before any work can be done, you must verify that your particular implementation supports SAX-based transformations. The reference implementation of JAXP does support this, although other implementations are not required to do so. In the following code fragment, the getFeature method of TransformerFactory will return true if you can safely downcast to a SAXTransformerFactory instance:

    TransformerFactory transFact = TransformerFactory.newInstance( );
    if (transFact.getFeature(SAXTransformerFactory.FEATURE)) {
        // downcast is allowed
        SAXTransformerFactory saxTransFact =
                (SAXTransformerFactory) transFact;

If getFeature returns false, your only option is to look for an implementation that does support SAX-based transformations. Otherwise, you can proceed to create an instance of TransformerHandler:

    TransformerHandler transHand =
            saxTransFact.newTransformerHandler(myXsltSource);

This object now represents your XSLT stylesheet. As Figure 5-5 shows, TransformerHandler extends org.xml.sax.ContentHandler, so it knows how to listen to events from a SAX parser. The series of SAX events will provide the "fake XML" data, so the only remaining piece of the puzzle is to set the Result and tell the SAX parser to begin parsing. The TransformerHandler also provides a reference to a Transformer, which allows you to set output properties such as the character encoding, whether to indent the output, or any other attributes of <xsl:output>.

5.3.5.3 Writing the custom parser

Writing the actual SAX parser sounds harder than it really is. The process basically involves implementing the org.xml.sax.XMLReader interface, which provides numerous methods you can safely ignore for most applications.
For example, when parsing a CSV file, it is probably not
Slide 132: necessary to deal with namespaces or validation. The code for AbstractXMLReader.java is shown in Example 5-5. This is an abstract class that provides basic implementations of every method in the XMLReader interface except for the parse( ) method. This means that all you need to do to write a parser is create a subclass and override this single method.

Example 5-5. AbstractXMLReader.java

    package com.oreilly.javaxslt.util;

    import java.io.IOException;
    import java.util.*;
    import org.xml.sax.*;

    /**
     * An abstract class that implements the SAX2 XMLReader interface. The
     * intent of this class is to make it easy for subclasses to act as
     * SAX2 XMLReader implementations. This makes it possible, for example, for
     * them to emit SAX2 events that can be fed into an XSLT processor for
     * transformation.
     */
    public abstract class AbstractXMLReader implements org.xml.sax.XMLReader {
        private Map featureMap = new HashMap( );
        private Map propertyMap = new HashMap( );
        private EntityResolver entityResolver;
        private DTDHandler dtdHandler;
        private ContentHandler contentHandler;
        private ErrorHandler errorHandler;

        /**
         * The only abstract method in this class. Derived classes can parse
         * any source of data and emit SAX2 events to the ContentHandler.
         */
        public abstract void parse(InputSource input) throws IOException,
                SAXException;

        public boolean getFeature(String name)
                throws SAXNotRecognizedException, SAXNotSupportedException {
            Boolean featureValue = (Boolean) this.featureMap.get(name);
            return (featureValue == null) ? false
                    : featureValue.booleanValue( );
        }

        public void setFeature(String name, boolean value)
                throws SAXNotRecognizedException, SAXNotSupportedException {
            this.featureMap.put(name, new Boolean(value));
        }

        public Object getProperty(String name)
                throws SAXNotRecognizedException, SAXNotSupportedException {
            return this.propertyMap.get(name);
        }

        public void setProperty(String name, Object value)
                throws SAXNotRecognizedException, SAXNotSupportedException {
            this.propertyMap.put(name, value);
        }

        public void setEntityResolver(EntityResolver entityResolver) {
            this.entityResolver = entityResolver;
        }

        public EntityResolver getEntityResolver( ) {
            return this.entityResolver;
        }

        public void setDTDHandler(DTDHandler dtdHandler) {
            this.dtdHandler = dtdHandler;
        }

        public DTDHandler getDTDHandler( ) {
            return this.dtdHandler;
        }

        public void setContentHandler(ContentHandler contentHandler) {
            this.contentHandler = contentHandler;
        }

        public ContentHandler getContentHandler( ) {
            return this.contentHandler;
        }

        public void setErrorHandler(ErrorHandler errorHandler) {
            this.errorHandler = errorHandler;
        }

        public ErrorHandler getErrorHandler( ) {
            return this.errorHandler;
        }

        public void parse(String systemId) throws IOException, SAXException {
            parse(new InputSource(systemId));
        }
    }

Creating the subclass, CSVXMLReader, involves overriding the parse( ) method and actually scanning through the CSV file, emitting SAX events as elements in the file are encountered. While the SAX portion is very easy, parsing the CSV file is a little more challenging. To make this class as flexible as possible, it was designed to parse through any CSV file that a spreadsheet such as Microsoft Excel can export. For simple data, your CSV file might look like this:

    Burke,Eric,M
    Burke,Jennifer,L
    Burke,Aidan,G

The XML representation of this file is shown in Example 5-6. The only real drawback here is that CSV files are strictly positional, meaning that names are not assigned to each column of data. This means that the XML output merely contains a sequence of three <value> elements for each line, so your stylesheet will have to select items based on position.

Example 5-6. Example XML output from CSV parser

    <?xml version="1.0" encoding="UTF-8"?>
    <csvFile>
Slide 134: <line> <value>Burke</value> <value>Eric</value> <value>M</value> </line> <line> <value>Burke</value> <value>Jennifer</value> <value>L</value> </line> <line> <value>Burke</value> <value>Aidan</value> <value>G</value> </line> </csvFile> One enhancement would be to design the CSV parser so it could accept a list of meaningful column names as parameters, and these could be used in the XML that is generated. Another option would be to write an XSLT stylesheet that transformed this initial output into another form of XML that used meaningful column names. To keep the code example relatively manageable, these features were omitted from this implementation. But there are some complexities to the CSV file format that have to be considered. For example, fields that contain commas must be surrounded with quotes: "Consultant,Author,Teacher",Burke,Eric,M Teacher,Burke,Jennifer,L None,Burke,Aidan,G To further complicate matters, fields may also contain quotes ("). In this case, they are doubled up, much in the same way you use double backslash characters (\\) in Java to represent a single backslash. In the following example, the first column contains a single quote, so the entire field is quoted, and the single quote is doubled up: "test""quote",Teacher,Burke,Jennifer,L This would be interpreted as: test"quote,Teacher,Burke,Jennifer,L The code in Example 5-7 shows the complete implementation of the CSV parser. Example 5-7. CSVXMLReader.java package com.oreilly.javaxslt.util; import java.io.*; import java.net.URL; import org.xml.sax.*; import org.xml.sax.helpers.*; /** * A utility class that parses a Comma Separated Values (CSV) file * and outputs its contents using SAX2 events. The format of CSV that * this class reads is identical to the export format for Microsoft * Excel. For simple values, the CSV file may look like this: * <pre> * a,b,c * d,e,f * </pre>
Slide 135:
     * Quotes are used as delimiters when the values contain commas:
     * <pre>
     * a,"b,c",d
     * e,"f,g","h,i"
     * </pre>
     * And double quotes are used when the values contain quotes. This parser
     * is smart enough to trim spaces around commas, as well.
     *
     * @author Eric M. Burke
     */
    public class CSVXMLReader extends AbstractXMLReader {

        // an empty attribute for use with SAX
        private static final Attributes EMPTY_ATTR = new AttributesImpl( );

        /**
         * Parse a CSV file. SAX events are delivered to the ContentHandler
         * that was registered via <code>setContentHandler</code>.
         *
         * @param input the comma separated values file to parse.
         */
        public void parse(InputSource input) throws IOException, SAXException {
            // if no handler is registered to receive events, don't bother
            // to parse the CSV file
            ContentHandler ch = getContentHandler( );
            if (ch == null) {
                return;
            }

            // convert the InputSource into a BufferedReader
            BufferedReader br = null;
            if (input.getCharacterStream( ) != null) {
                br = new BufferedReader(input.getCharacterStream( ));
            } else if (input.getByteStream( ) != null) {
                br = new BufferedReader(new InputStreamReader(
                        input.getByteStream( )));
            } else if (input.getSystemId( ) != null) {
                java.net.URL url = new URL(input.getSystemId( ));
                br = new BufferedReader(new InputStreamReader(url.openStream( )));
            } else {
                throw new SAXException("Invalid InputSource object");
            }

            ch.startDocument( );

            // emit <csvFile>
            ch.startElement("","","csvFile",EMPTY_ATTR);

            // read each line of the file until EOF is reached
            String curLine = null;
            while ((curLine = br.readLine( )) != null) {
                curLine = curLine.trim( );
                if (curLine.length( ) > 0) {
                    // create the <line> element
Slide 136:
                    ch.startElement("","","line",EMPTY_ATTR);
                    // output data from this line
                    parseLine(curLine, ch);
                    // close the </line> element
                    ch.endElement("","","line");
                }
            }

            // emit </csvFile>
            ch.endElement("","","csvFile");
            ch.endDocument( );
        }

        // Break an individual line into tokens. This is a recursive function
        // that extracts the first token, then recursively parses the
        // remainder of the line.
        private void parseLine(String curLine, ContentHandler ch)
                throws IOException, SAXException {
            String firstToken = null;
            String remainderOfLine = null;
            int commaIndex = locateFirstDelimiter(curLine);
            if (commaIndex > -1) {
                firstToken = curLine.substring(0, commaIndex).trim( );
                remainderOfLine = curLine.substring(commaIndex+1).trim( );
            } else {
                // no commas, so the entire line is the token
                firstToken = curLine;
            }

            // remove redundant quotes
            firstToken = cleanupQuotes(firstToken);

            // emit the <value> element
            ch.startElement("","","value",EMPTY_ATTR);
            ch.characters(firstToken.toCharArray( ), 0, firstToken.length( ));
            ch.endElement("","","value");

            // recursively process the remainder of the line
            if (remainderOfLine != null) {
                parseLine(remainderOfLine, ch);
            }
        }

        // locate the position of the comma, taking into account that
        // a quoted token may contain ignorable commas.
        private int locateFirstDelimiter(String curLine) {
            if (curLine.startsWith("\"")) {
                boolean inQuote = true;
                int numChars = curLine.length( );
                for (int i=1; i<numChars; i++) {
                    char curChar = curLine.charAt(i);
                    if (curChar == '"') {
                        inQuote = !inQuote;
                    } else if (curChar == ',' && !inQuote) {
Slide 137:
                        return i;
                    }
                }
                return -1;
            } else {
                return curLine.indexOf(',');
            }
        }

        // remove quotes around a token, as well as pairs of quotes
        // within a token.
        private String cleanupQuotes(String token) {
            StringBuffer buf = new StringBuffer( );
            int length = token.length( );
            int curIndex = 0;

            if (token.startsWith("\"") && token.endsWith("\"")) {
                curIndex = 1;
                length--;
            }

            boolean oneQuoteFound = false;
            boolean twoQuotesFound = false;

            while (curIndex < length) {
                char curChar = token.charAt(curIndex);
                if (curChar == '"') {
                    twoQuotesFound = (oneQuoteFound) ? true : false;
                    oneQuoteFound = true;
                } else {
                    oneQuoteFound = false;
                    twoQuotesFound = false;
                }

                if (twoQuotesFound) {
                    twoQuotesFound = false;
                    oneQuoteFound = false;
                    curIndex++;
                    continue;
                }

                buf.append(curChar);
                curIndex++;
            }

            return buf.toString( );
        }
    }

CSVXMLReader is a subclass of AbstractXMLReader, so it must provide an implementation of the abstract parse method:

    public void parse(InputSource input) throws IOException, SAXException {
        // if no handler is registered to receive events, don't bother
        // to parse the CSV file
        ContentHandler ch = getContentHandler( );
        if (ch == null) {
Slide 138:
            return;
        }

The first thing this method does is check for the existence of a SAX ContentHandler. The base class, AbstractXMLReader, provides access to this object, which is responsible for listening to the SAX events. In our example, an instance of JAXP's TransformerHandler is used as the SAX ContentHandler implementation. If this handler is not registered, our parse method simply returns because nobody is registered to listen to the events. In a real SAX parser, the XML would be parsed anyway, which provides an opportunity to check for errors in the XML data. Choosing to return immediately was merely a performance optimization selected for this class.

The SAX InputSource parameter allows our custom parser to locate the CSV file. Since an InputSource has many options for reading its data, parsers must check each potential source in the order shown here:

    // convert the InputSource into a BufferedReader
    BufferedReader br = null;
    if (input.getCharacterStream( ) != null) {
        br = new BufferedReader(input.getCharacterStream( ));
    } else if (input.getByteStream( ) != null) {
        br = new BufferedReader(new InputStreamReader(
                input.getByteStream( )));
    } else if (input.getSystemId( ) != null) {
        java.net.URL url = new URL(input.getSystemId( ));
        br = new BufferedReader(new InputStreamReader(url.openStream( )));
    } else {
        throw new SAXException("Invalid InputSource object");
    }

Assuming that our InputSource was valid, we can now begin parsing the CSV file and emitting SAX events. The first step is to notify the ContentHandler that a new document has begun:

    ch.startDocument( );

    // emit <csvFile>
    ch.startElement("","","csvFile",EMPTY_ATTR);

The XSLT processor interprets this to mean the following:

    <?xml version="1.0" encoding="UTF-8"?>
    <csvFile>

Our parser simply ignores many SAX 2 features, particularly XML namespaces. This is why many values passed as parameters to the various ContentHandler methods simply contain empty strings.
The EMPTY_ATTR constant indicates that this XML element does not have any attributes.

The CSV file itself is very straightforward, so we merely loop over every line in the file, emitting SAX events as we read each line. The parseLine method is a private helper method that does the actual CSV parsing:

    // read each line of the file until EOF is reached
    String curLine = null;
    while ((curLine = br.readLine( )) != null) {
        curLine = curLine.trim( );
        if (curLine.length( ) > 0) {
            // create the <line> element
            ch.startElement("","","line",EMPTY_ATTR);
            parseLine(curLine, ch);
            ch.endElement("","","line");
        }
Slide 139:
    }

And finally, we must indicate that the parsing is complete:

    // emit </csvFile>
    ch.endElement("","","csvFile");
    ch.endDocument( );

The remaining methods in CSVXMLReader are not discussed in detail here because they are really just responsible for breaking down each line in the CSV file and checking for commas, quotes, and other mundane parsing tasks. One thing worth noting is the code that emits text, such as the following:

    <value>Some Text Here</value>

SAX parsers use the characters method on ContentHandler to represent text, which has this signature:

    public void characters(char[] ch, int start, int length)

Although this method could have been designed to take a String, using an array allows SAX parsers to preallocate a large character array and then reuse that buffer repeatedly. This is why an implementation of ContentHandler cannot simply assume that the entire ch array contains meaningful data. Instead, it must read only the specified number of characters beginning at the start position.

Our parser uses a relatively straightforward approach, simply converting a String to a character array and passing that as a parameter to the characters method:

    // emit the <value>text</value> element
    ch.startElement("","","value",EMPTY_ATTR);
    ch.characters(firstToken.toCharArray( ), 0, firstToken.length( ));
    ch.endElement("","","value");

5.3.5.4 Using the parser

To wrap things up, let's look at how you will actually use this CSV parser with an XSLT stylesheet. The code shown in Example 5-8 is a standalone Java application that allows you to perform XSLT transformations on CSV files. As the comments indicate, it requires the name of a CSV file as its first parameter and can optionally take the name of an XSLT stylesheet as its second parameter. All output is sent to System.out.

Example 5-8. 
SimpleCSVProcessor.java

    package com.oreilly.javaxslt.util;

    import java.io.*;
    import javax.xml.transform.*;
    import javax.xml.transform.sax.*;
    import javax.xml.transform.stream.*;
    import org.xml.sax.*;

    /**
     * Shows how to use the CSVXMLReader class. This is a command-line
     * utility that takes a CSV file and optionally an XSLT file as
     * command line parameters. A transformation is applied and the
     * output is sent to System.out.
     */
    public class SimpleCSVProcessor {

        public static void main(String[] args) throws Exception {
            if (args.length == 0) {
Slide 140:
                System.err.println("Usage: java "
                        + SimpleCSVProcessor.class.getName( )
                        + " <csvFile> [xsltFile]");
                System.err.println(" - csvFile is required");
                System.err.println(" - xsltFile is optional");
                System.exit(1);
            }

            String csvFileName = args[0];
            String xsltFileName = (args.length > 1) ? args[1] : null;

            TransformerFactory transFact = TransformerFactory.newInstance( );
            if (transFact.getFeature(SAXTransformerFactory.FEATURE)) {
                SAXTransformerFactory saxTransFact =
                        (SAXTransformerFactory) transFact;
                TransformerHandler transHand = null;
                if (xsltFileName == null) {
                    transHand = saxTransFact.newTransformerHandler( );
                } else {
                    transHand = saxTransFact.newTransformerHandler(
                            new StreamSource(new File(xsltFileName)));
                }

                // set the destination for the XSLT transformation
                transHand.setResult(new StreamResult(System.out));

                // hook the CSVXMLReader to the CSV file
                CSVXMLReader csvReader = new CSVXMLReader( );
                InputSource csvInputSrc = new InputSource(
                        new FileReader(csvFileName));

                // attach the XSLT processor to the CSVXMLReader
                csvReader.setContentHandler(transHand);
                csvReader.parse(csvInputSrc);
            } else {
                System.err.println("SAXTransformerFactory is not supported.");
                System.exit(1);
            }
        }
    }

As mentioned earlier in this chapter, the TransformerHandler is provided by JAXP and is an implementation of the org.xml.sax.ContentHandler interface. It is constructed by the SAXTransformerFactory as follows:

    TransformerHandler transHand = null;
    if (xsltFileName == null) {
        transHand = saxTransFact.newTransformerHandler( );
    } else {
        transHand = saxTransFact.newTransformerHandler(
                new StreamSource(new File(xsltFileName)));
    }

When the XSLT stylesheet is not specified, the transformer performs an identity transformation. This is useful when you just want to see the raw XML output without applying a stylesheet. You
Slide 141: will probably want to do this first to see how your XSLT will need to be written. If a stylesheet is provided, however, it is used for the transformation.

The custom parser is then constructed as follows:

    CSVXMLReader csvReader = new CSVXMLReader( );

The location of the CSV file is then converted into a SAX InputSource:

    InputSource csvInputSrc = new InputSource(
            new FileReader(csvFileName));

And finally, the XSLT processor is attached to our custom parser. This is accomplished by registering the TransformerHandler as the ContentHandler on csvReader. A single call to the parse method causes the parsing and transformation to occur:

    // attach the XSLT processor to the CSVXMLReader
    csvReader.setContentHandler(transHand);
    csvReader.parse(csvInputSrc);

For a simple test, assume that a list of presidents is available in CSV format:

    Washington,George,,
    Adams,John,,
    Jefferson,Thomas,,
    Madison,James,,
    etc...
    Bush,George,Herbert,Walker
    Clinton,William,Jefferson,
    Bush,George,W,

To see what the XML looks like, invoke the program as follows:

    java com.oreilly.javaxslt.util.SimpleCSVProcessor presidents.csv

This will parse the CSV file and apply the identity transformation stylesheet, sending the following output to the console:

    <?xml version="1.0" encoding="UTF-8"?>
    <csvFile>
      <line>
        <value>Washington</value>
        <value>George</value>
        <value/>
        <value/>
      </line>
      <line>
        etc...
    </csvFile>

Actually, the output is crammed onto a single long line, but it is broken up here to make it more readable. Any good XML editor application should provide a feature to pretty-print the XML as shown. In order to transform this into something useful, a stylesheet is required. The XSLT stylesheet shown in Example 5-9 takes any output from this program and converts it into an HTML table.

Example 5-9. csvToHTMLTable.xslt

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html"/>
Slide 142:
      <xsl:template match="/">
        <table border="1">
          <xsl:apply-templates select="csvFile/line"/>
        </table>
      </xsl:template>

      <xsl:template match="line">
        <tr>
          <xsl:apply-templates select="value"/>
        </tr>
      </xsl:template>

      <xsl:template match="value">
        <td>
          <!-- If a value is empty, print a non-breaking space
               so the HTML table looks OK -->
          <xsl:if test=".=''">
            <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>
          </xsl:if>
          <xsl:value-of select="."/>
        </td>
      </xsl:template>
    </xsl:stylesheet>

In order to apply this stylesheet, type the following command:

    java com.oreilly.javaxslt.util.SimpleCSVProcessor presidents.csv csvToHTMLTable.xslt

As before, the results are sent to System.out and contain code for an HTML table. This stylesheet will work with any CSV file parsed with SimpleCSVProcessor, not just presidents.csv. Now that the concept has been proved, you can add fancy formatting and custom output to the resulting HTML without altering any Java code -- just edit the stylesheet or write a new one.

5.3.5.5 Conclusion

Although writing a SAX parser and connecting it to JAXP does involve quite a few interrelated classes, the resulting application requires only two command-line arguments and will work with any CSV or XSLT file. What makes this example interesting is that the same approach will work with essentially any data source. The steps are broken down as follows:

1. Create a custom SAX parser by implementing org.xml.sax.XMLReader or extending com.oreilly.javaxslt.util.AbstractXMLReader.
2. In your parser, emit the appropriate SAX events as you read your data.
3. Modify SimpleCSVProcessor to utilize your custom parser instead of CSVXMLReader.

For example, you might want to write a custom parser that accepts a SQL statement as input rather than a CSV file. Your parser could then connect to a database, issue the query, and fire SAX events for each row in the ResultSet.
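As a rough illustration of those three steps with a different data source, the sketch below uses an in-memory list of rows as a stand-in for a database query. The class name RowsToSax, the method names, and the element names rows, row, and col are hypothetical and do not come from the book's code. For brevity the sketch drives the ContentHandler directly; a full version would wrap the emit logic in an XMLReader subclass, as CSVXMLReader does.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper: fires SAX events for tabular data held in memory.
final class RowsToSax {
    private static final Attributes EMPTY_ATTR = new AttributesImpl();

    // Emit one <row> per array and one <col> per string (step 2).
    static void emit(List<String[]> rows, ContentHandler ch) throws SAXException {
        ch.startDocument();
        ch.startElement("", "rows", "rows", EMPTY_ATTR);
        for (String[] row : rows) {
            ch.startElement("", "row", "row", EMPTY_ATTR);
            for (String col : row) {
                ch.startElement("", "col", "col", EMPTY_ATTR);
                ch.characters(col.toCharArray(), 0, col.length());
                ch.endElement("", "col", "col");
            }
            ch.endElement("", "row", "row");
        }
        ch.endElement("", "rows", "rows");
        ch.endDocument();
    }

    // Feed the events straight into JAXP (step 3). No stylesheet is given,
    // so the handler performs an identity transformation.
    static String toXml(List<String[]> rows) throws Exception {
        TransformerFactory transFact = TransformerFactory.newInstance();
        if (!transFact.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("SAXTransformerFactory is not supported");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) transFact;
        TransformerHandler transHand = stf.newTransformerHandler();
        StringWriter out = new StringWriter();
        transHand.setResult(new StreamResult(out));
        emit(rows, transHand);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        List<String[]> rows = Arrays.asList(
                new String[] {"Burke", "Eric"},
                new String[] {"Burke", "Jennifer"});
        System.out.println(toXml(rows));
    }
}
```

Swapping the list for a JDBC ResultSet would change only the loop inside emit; the JAXP wiring in toXml stays the same.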
A parser like this makes it very easy to extract data from any relational database without writing a lot of custom code. It also eliminates the intermediate step of JDOM or DOM production because the SAX events are fed directly into JAXP for transformation.

5.3.6 Feeding JDOM Output into JAXP
Slide 143: The DOM API is tedious to use, so many Java programmers opt for JDOM instead. The typical usage pattern is to generate XML dynamically using JDOM and then somehow transform that into a web page using XSLT. This presents a problem because JAXP does not provide any direct implementation of the javax.xml.transform.Source interface that integrates with JDOM.[5] There are at least three available options:

[5] As this is being written, members of the JDOM community are writing a JDOM implementation of javax.xml.transform.Source that will directly integrate with JAXP.

• Use org.jdom.output.SAXOutputter to pipe SAX 2 events from JDOM to JAXP.
• Use org.jdom.output.DOMOutputter to convert the JDOM tree to a DOM tree, and then use javax.xml.transform.dom.DOMSource to read the data into JAXP.
• Use org.jdom.output.XMLOutputter to serialize the JDOM tree to XML text, and then use javax.xml.transform.stream.StreamSource to parse the XML back into JAXP.

5.3.6.1 JDOM to SAX approach

The SAX approach is generally preferable to other approaches. Its primary advantage is that it does not require an intermediate transformation to convert the JDOM tree into a DOM tree or text. This offers the lowest memory utilization and potentially the fastest performance.

In support of SAX, JDOM offers the org.jdom.output.SAXOutputter class. The following code fragment demonstrates its usage:

    TransformerFactory transFact = TransformerFactory.newInstance( );
    if (transFact.getFeature(SAXTransformerFactory.FEATURE)) {
        SAXTransformerFactory stf = (SAXTransformerFactory) transFact;

        // the 'stylesheet' parameter is an instance of JAXP's
        // javax.xml.transform.Templates interface
        TransformerHandler transHand = stf.newTransformerHandler(stylesheet);

        // result is a Result instance
        transHand.setResult(result);
        SAXOutputter saxOut = new SAXOutputter(transHand);

        // the 'jdomDoc' parameter is an instance of JDOM's
        // org.jdom.Document class. 
It contains the XML data
        saxOut.output(jdomDoc);
    } else {
        System.err.println("SAXTransformerFactory is not supported");
    }

5.3.6.2 JDOM to DOM approach

The DOM approach is generally a little slower and will not work if JDOM uses a different DOM implementation than JAXP. JDOM, like JAXP, can utilize different DOM implementations behind the scenes. If JDOM refers to a different version of DOM than JAXP, you will encounter exceptions when you try to perform the transformation. Since JAXP uses Apache's Crimson parser by default, you can configure JDOM to use Crimson with the org.jdom.adapters.CrimsonDOMAdapter class. The following code shows how to convert a JDOM Document into a DOM Document:

    org.jdom.Document jdomDoc = createJDOMDocument( );

    // add data to the JDOM Document
    ...
Slide 144:
    // convert the JDOM Document into a DOM Document
    org.jdom.output.DOMOutputter domOut = new org.jdom.output.DOMOutputter(
            "org.jdom.adapters.CrimsonDOMAdapter");
    org.w3c.dom.Document domDoc = domOut.output(jdomDoc);

The statement that constructs the DOMOutputter deserves attention because it is likely to give you the most problems. When JDOM converts its internal object tree into a DOM object tree, it must use some underlying DOM implementation. In many respects, JDOM is similar to JAXP because it delegates many tasks to underlying implementation classes. The DOMOutputter constructors are overloaded as follows:

    // use the default adapter class
    public DOMOutputter( )

    // use the specified adapter class
    public DOMOutputter(String adapterClass)

The first constructor shown here will use JDOM's default DOM parser, which is not necessarily the same DOM parser that JAXP uses. The second constructor allows you to specify the name of an adapter class, which must implement the org.jdom.adapters.DOMAdapter interface. JDOM includes standard adapters for all of the widely used DOM implementations, or you could write your own adapter class.

5.3.6.3 JDOM to text approach

In the final approach listed earlier, you can utilize java.io.StringWriter and java.io.StringReader. First create the JDOM data as usual, then use org.jdom.output.XMLOutputter to convert the data into a String of XML:

    StringWriter sw = new StringWriter( );
    org.jdom.output.XMLOutputter xmlOut =
            new org.jdom.output.XMLOutputter("", false);
    xmlOut.output(jdomDoc, sw);

The parameters for XMLOutputter allow you to specify the amount of indentation for the output along with a boolean flag indicating whether or not linefeeds should be included in the output. In the code example, no spaces or linefeeds are specified in order to minimize the size of the XML that is produced.
Now that the StringWriter contains your XML, you can use a StringReader along with javax.xml.transform.stream.StreamSource to read the data into JAXP:

    StringReader sr = new StringReader(sw.toString( ));
    Source xmlSource = new javax.xml.transform.stream.StreamSource(sr);

The transformation can then proceed just as it did in Example 5-4. The main drawback to this approach is that the XML, once converted to text form, must then be parsed back in by JAXP before the transformation can be applied.

5.4 Stylesheet Compilation

XSLT is a programming language, expressed using XML syntax. This is not for the benefit of the computer, but rather for human interpretation. Before the stylesheet can be processed, it must be converted into some internal machine-readable format. This process should sound familiar, because it is the same process used for every high-level programming language. You, the programmer, work in terms of the high-level language, and an interpreter or compiler converts this language into some machine format that can be executed by the computer.

Interpreters analyze source code and translate it into machine code with each execution. In the case of XSLT, this requires that the stylesheet be read into memory using an XML parser, translated into machine format, and then applied to your XML data. Performance is the obvious problem, particularly when you consider that stylesheets rarely change. Typically, the stylesheets
Slide 145: are defined early on in the development process and remain static, while XML data is generated dynamically with each client request.

A better approach is to parse the XSLT stylesheet into memory once, compile it to machine format, and then preserve that machine representation in memory for repeated use. This is called stylesheet compilation and is no different in concept than the compilation of any programming language.

5.4.1 Templates API

Different XSLT processors implement stylesheet compilation differently, so JAXP includes the javax.xml.transform.Templates interface to provide consistency. This is a relatively simple interface with the following API:

    public interface Templates {
        java.util.Properties getOutputProperties( );
        javax.xml.transform.Transformer newTransformer( )
                throws TransformerConfigurationException;
    }

The getOutputProperties( ) method returns a clone of the properties associated with the <xsl:output> element, such as method="xml", indent="yes", and encoding="UTF-8". You might recall that java.util.Properties (a subclass of java.util.Hashtable) provides key/value mappings from property names to property values. Since a clone, or deep copy, is returned, you can safely modify the Properties instance and apply it to a future transformation without affecting the compiled stylesheet that the instance of Templates represents.

The newTransformer( ) method is more commonly used and allows you to obtain a new instance of a class that implements the Transformer interface. It is this Transformer object that actually allows you to perform XSLT transformations.
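The copy semantics of getOutputProperties( ) can be checked with a small experiment. The sketch below is only an illustration: the class name OutputPropsDemo, the inline stylesheet, and the method names are hypothetical, and the Templates instance is obtained through TransformerFactory.newTemplates. It changes the encoding in one returned Properties object and then confirms that a second call does not see the change.

```java
import java.io.StringReader;
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; the stylesheet is inlined to keep it self-contained.
final class OutputPropsDemo {
    static final String XSLT =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"xml\" indent=\"yes\" encoding=\"UTF-8\"/>"
            + "<xsl:template match=\"/\"><out/></xsl:template>"
            + "</xsl:stylesheet>";

    // Compile the inline stylesheet into a Templates instance.
    static Templates compile() throws Exception {
        return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSLT)));
    }

    // Returns true if editing one returned copy leaves later copies untouched.
    static boolean copyIsIndependent() throws Exception {
        Templates templates = compile();
        Properties first = templates.getOutputProperties();
        first.setProperty(OutputKeys.ENCODING, "ISO-8859-1");
        Properties second = templates.getOutputProperties();
        return !"ISO-8859-1".equals(second.getProperty(OutputKeys.ENCODING));
    }

    public static void main(String[] args) throws Exception {
        System.out.println("copy is independent: " + copyIsIndependent());
    }
}
```

A modified copy can then be handed to an individual Transformer with setOutputProperties, which affects that transformation only.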
Since the implementation of the Templates interface is hidden by JAXP, it must be created by the following method on javax.xml.transform.TransformerFactory:

    public Templates newTemplates(Source source)
            throws TransformerConfigurationException

As in earlier examples, the Source may obtain the XSLT stylesheet from one of many locations, including a filename, a system identifier, or even a DOM tree. Regardless of the original location, the XSLT processor is supposed to compile the stylesheet into an optimized internal representation.

Whether the stylesheet is actually compiled is up to the implementation, but a safe bet is that performance will continually improve over the next several years as these tools stabilize and vendors have time to apply optimizations.

Figure 5-6 illustrates the relationship between Templates and Transformer instances.

Figure 5-6. Relationship between Templates and Transformer
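The relationship can also be sketched in code: one Templates instance is compiled once, and a fresh Transformer is obtained from it for every transformation. The class name TemplatesReuseDemo, the inline stylesheet, and the <greeting> input documents are assumptions made for illustration only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: compile once, transform many times.
final class TemplatesReuseDemo {
    static final String XSLT =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:text>Hello, </xsl:text>"
            + "<xsl:value-of select=\"greeting/@name\"/>"
            + "<xsl:text>!</xsl:text>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Parse and compile the stylesheet a single time.
    static Templates compile() throws Exception {
        return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSLT)));
    }

    // Each call gets its own Transformer from the shared Templates.
    static String transform(Templates templates, String xml) throws Exception {
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Templates templates = compile();
        System.out.println(transform(templates, "<greeting name=\"World\"/>"));
        System.out.println(transform(templates, "<greeting name=\"XSLT\"/>"));
    }
}
```

The two transformations here run one after the other, but the same pattern applies when they run on separate threads, because each call creates its own Transformer.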
Slide 146: Thread safety is an important issue in any Java application, particularly in a web context where many users share the same stylesheet. As Figure 5-6 illustrates, an instance of Templates is thread-safe and represents a single stylesheet. During the transformation process, however, the XSLT processor must maintain state information and output properties specific to the current client. For this reason, a separate Transformer instance must be used for each concurrent transformation.

Transformer is an abstract class in JAXP, and implementations should be lightweight. This is an important goal because you will typically create many copies of Transformer, while the number of Templates is relatively small. Transformer instances are not thread-safe, primarily because they hold state information about the current transformation. Once the transformation is complete, however, these objects can be reused.

5.4.2 A Stylesheet Cache

XSLT transformations commonly occur on a shared web server with a large number of concurrent users, so it makes sense to use Templates whenever possible to optimize performance. Since each instance of Templates is thread-safe, it is desirable to maintain a single copy shared by many clients. This reduces the number of times your stylesheets have to be parsed into memory and compiled, as well as the overall memory footprint of your application.

The code shown in Example 5-10 illustrates a custom XSLT stylesheet cache that automates the mundane tasks associated with creating Templates instances and storing them in memory. This cache has the added benefit of checking the lastModified flag on the underlying file, so it will reload itself whenever the XSLT stylesheet is modified. This is highly useful in a web application development environment because you can make changes to the stylesheet and simply click Reload in your web browser to see the results of the latest edits.

Example 5-10. 
StylesheetCache.java

    package com.oreilly.javaxslt.util;

    import java.io.*;
Slide 147:
    import java.util.*;
    import javax.xml.transform.*;
    import javax.xml.transform.stream.*;

    /**
     * A utility class that caches XSLT stylesheets in memory.
     *
     */
    public class StylesheetCache {
        // map xslt file names to MapEntry instances
        // (MapEntry is defined below)
        private static Map cache = new HashMap( );

        /**
         * Flush all cached stylesheets from memory, emptying the cache.
         */
        public static synchronized void flushAll( ) {
            cache.clear( );
        }

        /**
         * Flush a specific cached stylesheet from memory.
         *
         * @param xsltFileName the file name of the stylesheet to remove.
         */
        public static synchronized void flush(String xsltFileName) {
            cache.remove(xsltFileName);
        }

        /**
         * Obtain a new Transformer instance for the specified XSLT file name.
         * A new entry will be added to the cache if this is the first request
         * for the specified file name.
         *
         * @param xsltFileName the file name of an XSLT stylesheet.
         * @return a transformation context for the given stylesheet.
         */
        public static synchronized Transformer newTransformer(String xsltFileName)
                throws TransformerConfigurationException {
            File xsltFile = new File(xsltFileName);

            // determine when the file was last modified on disk
            long xslLastModified = xsltFile.lastModified( );
            MapEntry entry = (MapEntry) cache.get(xsltFileName);

            if (entry != null) {
                // if the file has been modified more recently than the
                // cached stylesheet, remove the entry reference
                if (xslLastModified > entry.lastModified) {
                    entry = null;
                }
            }

            // create a new entry in the cache if necessary
Slide 148:
            if (entry == null) {
                Source xslSource = new StreamSource(xsltFile);

                TransformerFactory transFact = TransformerFactory.newInstance( );
                Templates templates = transFact.newTemplates(xslSource);

                entry = new MapEntry(xslLastModified, templates);
                cache.put(xsltFileName, entry);
            }

            return entry.templates.newTransformer( );
        }

        // prevent instantiation of this class
        private StylesheetCache( ) {
        }

        /**
         * This class represents a value in the cache Map.
         */
        static class MapEntry {
            long lastModified; // when the file was modified
            Templates templates;

            MapEntry(long lastModified, Templates templates) {
                this.lastModified = lastModified;
                this.templates = templates;
            }
        }
    }

Because this class is a singleton, it has a private constructor and uses only static methods. Furthermore, each method is declared as synchronized in an effort to avoid potential threading problems.

The heart of this class is the cache itself, which is implemented using java.util.Map:

    private static Map cache = new HashMap( );

Although HashMap is not thread-safe, the fact that all of our methods are synchronized basically eliminates any concurrency issues. Each entry in the map contains a key/value pair, mapping from an XSLT stylesheet filename to an instance of the MapEntry class. MapEntry is a nested class that keeps track of the compiled stylesheet along with when its file was last modified:

    static class MapEntry {
        long lastModified; // when the file was modified
        Templates templates;

        MapEntry(long lastModified, Templates templates) {
            this.lastModified = lastModified;
            this.templates = templates;
        }
    }

Removing entries from the cache is accomplished by one of two methods:

    public static synchronized void flushAll( ) {
        cache.clear( );
Slide 149:
    }

    public static synchronized void flush(String xsltFileName) {
        cache.remove(xsltFileName);
    }

The first method merely removes everything from the Map, while the second removes a single stylesheet. Whether you use these methods is up to you. The flushAll method, for instance, should probably be called from a servlet's destroy( ) method to ensure proper cleanup. If you have many servlets in a web application, each servlet may wish to flush specific stylesheets it uses via the flush(...) method. If the xsltFileName parameter is not found, the Map implementation silently ignores this request.

The majority of interaction with this class occurs via the newTransformer method, which has the following signature:

    public static synchronized Transformer newTransformer(String xsltFileName)
            throws TransformerConfigurationException {

The parameter, an XSLT stylesheet filename, was chosen to facilitate the "last modified" feature. We use the java.io.File class to determine when the file was last modified, which allows the cache to automatically reload itself as edits are made to the stylesheets. Had we used a system identifier or InputStream instead of a filename, the auto-reload feature could not have been implemented. Next, the File object is created and its lastModified flag is checked:

    File xsltFile = new File(xsltFileName);

    // determine when the file was last modified on disk
    long xslLastModified = xsltFile.lastModified( );

The compiled stylesheet, represented by an instance of MapEntry, is then retrieved from the Map. If the entry is found, its timestamp is compared against the current file's timestamp, thus allowing auto-reload:

    MapEntry entry = (MapEntry) cache.get(xsltFileName);

    if (entry != null) {
        // if the file has been modified more recently than the
        // cached stylesheet, remove the entry reference
        if (xslLastModified > entry.lastModified) {
            entry = null;
        }
    }

Next, we create a new entry in the cache if the entry object reference is still null. 
This is accomplished by wrapping a StreamSource around the File object, instantiating a TransformerFactory instance, and using that factory to create our Templates object. The Templates object is then stored in the cache so it can be reused by the next client of the cache:

    // create a new entry in the cache if necessary
    if (entry == null) {
        Source xslSource = new StreamSource(xsltFile);

        TransformerFactory transFact = TransformerFactory.newInstance( );
        Templates templates = transFact.newTemplates(xslSource);

        entry = new MapEntry(xslLastModified, templates);
        cache.put(xsltFileName, entry);
    }
Slide 150: Finally, a brand new Transformer is created and returned to the caller:

    return entry.templates.newTransformer( );

Returning a new Transformer is critical because, although the Templates object is thread-safe, the Transformer implementation is not. Each caller gets its own copy of Transformer so multiple clients do not collide with one another.

One potential improvement on this design could be to add a lastAccessed timestamp to each MapEntry object. Another thread could then execute every couple of hours to flush map entries from memory if they have not been accessed for a period of time. In most web applications, this will not be an issue, but if you have a large number of pages and some are seldom accessed, this could be a way to reduce the memory usage of the cache.

Another potential modification is to allow javax.xml.transform.Source objects to be passed as a parameter to the newTransformer method instead of a filename. However, this would make the auto-reload feature impossible to implement for all Source types.

Chapter 6. Servlet Basics and XSLT

XSLT and servlets are a natural fit. Java is a cross-platform programming language, XML provides portable data, and XSLT provides a way to transform that data without cluttering up your servlet code with HTML. Because your data can be transformed into many different formats, you can also achieve portability across a variety of browsers and other devices. Best of all, a clean separation between data, presentation, and programming logic allows changes to be made to the look and feel of a web site without digging into Java code. This makes it possible, for example, to sell highly customizable web applications. You can encourage your customers to modify the XSLT stylesheets to create custom page layouts and corporate logos, while preventing access to your internal Java business logic.

As discussed in previous chapters, an initial challenge faced with XSLT and servlets is the initial configuration. 
Getting started with a web application is typically harder than client-only applications because there are more pieces to assemble. With a Swing application, for instance, you can start with a single class that has a main( ) method. But with a web application, you must create an XML deployment descriptor in addition to the servlet, package everything up into a WAR file, and properly deploy to a servlet container. When errors occur, you see something like "HTTP 404 -- File not found," which is not particularly helpful.

The goal of this chapter is to introduce servlet syntax with particular emphasis on configuration and deployment issues. Once servlet syntax has been covered, integration with XSLT stylesheets and XML is covered, illustrated by the implementation of a basic web application. By the time you have worked through this material, you should have confidence to move on to the more complicated examples found in the remainder of this book.

6.1 Servlet Syntax

Servlet architecture was covered in Chapter 4, along with comparisons to many other approaches. The architecture of a system is a mile-high view, ignoring implementation details so you can focus on the big picture. We now need to dig into the low-level syntax issues to proceed with the really interesting examples in later chapters. For a complete discussion of servlets, check out Jason Hunter's Java Servlet Programming (O'Reilly). Be sure to look for the second edition because so much has changed in the servlet world since this book was first published.

6.1.1 Splash Screen Servlet Example

   