Apress.Beginning.XML.with.DOM.and.Ajax.From.Novice.to.Professional.Jun.2006 



 

 
 
Published:  October 24, 2007
 
Slide 2: Beginning XML with DOM and Ajax: From Novice to Professional, by Sas Jacobs
Slide 3: Beginning XML with DOM and Ajax: From Novice to Professional

Copyright © 2006 by Sas Jacobs

All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher.

ISBN-13 (pbk): 978-1-59059-676-0
ISBN-10 (pbk): 1-59059-676-5

Printed and bound in the United States of America 9 8 7 6 5 4 3 2 1

Trademarked names may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, we use the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.

Lead Editors: Charles Brown, Chris Mills
Technical Reviewer: Allan Kent
Editorial Board: Steve Anglin, Ewan Buckingham, Gary Cornell, Jason Gilmore, Jonathan Gennick, Jonathan Hassell, James Huddleston, Chris Mills, Matthew Moodie, Dominic Shakeshaft, Jim Sumser, Keir Thomas, Matt Wade
Project Manager: Beth Christmas
Copy Edit Manager: Nicole LeClerc
Copy Editor: Nicole Abramowitz
Assistant Production Director: Kari Brooks-Copony
Production Editor: Kelly Winquist
Compositor: Dina Quan
Proofreader: Dan Shaw
Indexer: Brenda Miller
Artist: Kinetic Publishing Services, LLC
Cover Designer: Kurt Krames
Manufacturing Director: Tom Debolski

Distributed to the book trade worldwide by Springer-Verlag New York, Inc., 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax 201-348-4505, e-mail orders-ny@springer-sbm.com, or visit http://www.springeronline.com.

For information on translations, please contact Apress directly at 2560 Ninth Street, Suite 219, Berkeley, CA 94710. Phone 510-549-5930, fax 510-549-5939, e-mail info@apress.com, or visit http://www.apress.com.

The information in this book is distributed on an “as is” basis, without warranty. Although every precaution has been taken in the preparation of this work, neither the author(s) nor Apress shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in this work.

The source code for this book is available to readers at http://www.apress.com in the Source Code section.
Slide 4: Contents at a Glance
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
CHAPTER 1 Introduction to XML
CHAPTER 2 Related XML Recommendations
CHAPTER 3 Web Vocabularies
CHAPTER 4 Client-Side XML
CHAPTER 5 Displaying XML Using CSS
CHAPTER 6 Introduction to XSLT
CHAPTER 7 Advanced Client-Side XSLT Techniques
CHAPTER 8 Scripting in the Browser
CHAPTER 9 The Ajax Approach to Browser Scripting
CHAPTER 10 Using Flash to Display XML
CHAPTER 11 Introduction to Server-Side XML
CHAPTER 12 Case Study: Using .NET for an XML Application
CHAPTER 13 Case Study: Using PHP for an XML Application
INDEX
Slide 6: Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction

CHAPTER 1 Introduction to XML
  What Is XML?
  A Brief History of XML
  The Goals of XML
  Understanding XML Syntax
  Well-Formed Documents
  Understanding the Difference Between Tags and Elements
  Viewing a Complete XML Document
  Understanding the Structure of an XML Document
  Naming Rules in XML
  Understanding the XML Document Prolog
  Understanding Sections Within the XML Document Element
  The XML Processing Model
  XML Processing Types
  DOM Parsing
  SAX Parsing
  Why Have Two Processing Models?
  Some XML Tools
  Summary

CHAPTER 2 Related XML Recommendations
  Understanding the Role of XML Namespaces
  Adding Namespaces to XML Documents
  Adding Default Namespaces
Slide 7: Contents (continued)

CHAPTER 2 Related XML Recommendations (continued)
  Defining XML Vocabularies
  The Document Type Definition
  XML Schema
  Comparing DTDs and Schemas
  Other Schema Types
  XML Vocabularies
  Displaying XML
  XML and CSS
  XSL
  XPath
  XPath Expressions
  Identifying Specific Nodes
  Including Calculations and Functions
  XPath Summary
  Linking with XML
  Simple Links
  Extended Links
  XPointer
  XML Links Summary
  Summary

CHAPTER 3 Web Vocabularies
  XHTML
  Separation of Presentation and Content
  XHTML Construction Rules
  XHTML Tools
  Well-Formed and Valid XHTML Documents
  XHTML Modularization
  MathML
  Presentation MathML
  Content MathML
  Scalable Vector Graphics
  Vector Graphic Shapes
  Images
  Text
  Putting It Together
  Web Services
  WSDL
  SOAP
Slide 8: Contents (continued)

CHAPTER 3 Web Vocabularies (continued)
  Other Web Vocabularies
  RSS and News Feeds
  VoiceXML
  SMIL
  Database Output Formats
  Summary

CHAPTER 4 Client-Side XML
  Why Use Client-Side XML?
  Working with XML Content Client-Side
  Styling Content in a Browser
  Manipulating XML Content in a Browser
  Working with XML in Flash
  Examining XML Support in Major Browsers
  Understanding the W3C DOM
  Understanding the XML Schema Definition Language
  Understanding XSLT
  Microsoft Internet Explorer
  Mozilla
  Opera
  Adobe (Formerly Macromedia) Flash
  Choosing Between Client and Server
  Using Client-Side XML
  Using Server-Side XML
  Summary

CHAPTER 5 Displaying XML Using CSS
  Introduction to CSS
  Why CSS?
  CSS Rules
  Styling XHTML Documents with CSS
  Styling XML Documents with CSS
  Attaching the Stylesheet
  Selectors
  Layout of XML with CSS
  Understanding the W3C Box Model
  Positioning in CSS
Slide 9: Contents (continued)

CHAPTER 5 Displaying XML Using CSS (continued)
  Displaying Tabular Data
  Working with Display Properties
  Working with Floating Elements
  Table Row Spans
  Linking Between Displayed XML Documents
  XLink in Netscape and Firefox
  Forcing Links Using the HTML Namespace
  Adding Images in XML Documents
  Adding Images with Netscape and Firefox
  Using CSS to Add an Image
  Using CSS to Add Content
  Working with Attribute Content
  Using Attributes in Selectors
  Using Attribute Values in Documents
  Summary

CHAPTER 6 Introduction to XSLT
  Browser Support for XSLT
  Using XSLT to Create Headers and Footers
  Understanding XHTML, XSLT, and Namespaces
  Creating the XSLT Stylesheet
  Understanding the Stylesheet
  Transforming the <body> Element
  Applying the Transformation
  Adding the Footer
  Transformation Without Change
  Creating a Table of Contents
  Selecting Each Planet with <xsl:for-each>
  Adding a New Planet
  Presenting XML with XSLT
  Moving from XHTML to XML
  Styling the XML with XSLT
  Removing Content with XSLT
  Understanding the Role of XPath in XSLT
  Including Images
  Importing Templates
  Including Templates
  Tools for XSLT Development
  Summary
Slide 10: Contents (continued)

CHAPTER 7 Advanced Client-Side XSLT Techniques
  Sorting Data Within an XML Document
  Sorting Dynamically with JavaScript
  Adding Extension Functions (Internet Explorer)
  Understanding More About Namespaces
  Adding Extension Functions to the Stylesheet
  Providing Support for Browsers Other Than IE
  Working with Named Templates
  Generating JavaScript with XSLT
  Understanding XSLT Parameters
  Understanding White Space and Modes
  Working Through the onelinehtml Template
  Finishing Off the Page
  Generating JavaScript in Mozilla
  XSLT Tips and Troubleshooting
  Dealing with White Space
  Using HTML Entities in XSLT
  Checking Browser Type
  Building on What Others Have Done
  Understanding the Best Uses for XSLT
  Summary

CHAPTER 8 Scripting in the Browser
  The W3C XML DOM
  Understanding Key DOM Interfaces
  Examining Extra Functionality in MSXML
  Browser Support for the W3C DOM
  Using the xDOM Wrapper
  xDOM Caveats
  Using JavaScript with the DOM
  Creating DOM Document Objects and Loading XML
  XSLT Manipulation
  Extracting Raw XML
  Manipulating the DOM
  Putting It into Practice
  Understanding the Application
  Examining the Code
  Dealing with Large XML Documents
  Summary
Slide 11: Contents (continued)

CHAPTER 9 The Ajax Approach to Browser Scripting
  Understanding Ajax
  Explaining the Role of Ajax Components
  Understanding the XMLHttpRequest Object
  Putting It Together
  Username Validation with the XMLHttpRequest Object
  Contacts Address Book Using an Ajax Approach
  Using Cross-Browser Libraries
  Sarissa
  Other Ajax Frameworks and Toolkits
  Backbase
  Bindows
  Dojo
  Interactive Website Framework
  qooxdoo
  Criticisms of Ajax
  Providing Visual Cues
  Updating the Interface
  Preloading Data
  Providing Links to State and Enabling the Back Button
  Ajax Best Practices and Design Principles
  Minimizing Server Traffic
  Using Standard Interface Methods
  Using Wrappers or Libraries
  Using Ajax Appropriately
  Summary

CHAPTER 10 Using Flash to Display XML
  The XML Class
  Loading an XML Document
  Understanding the XML Class
  Understanding the XMLNode Class
  Loading and Displaying XML Content in Flash
  Updating XML Content in Flash
  Sending XML Content from Flash
Slide 12: Contents (continued)

CHAPTER 10 Using Flash to Display XML (continued)
  Using the XMLConnector Component
  Loading an XML Document
  Data Binding
  Updating XML Content with Data Components
  Understanding Flash Security
  Summary

CHAPTER 11 Introduction to Server-Side XML
  Server-Side vs. Client-Side XML Processing
  Server-Side Languages
  .NET
  PHP
  Working Through Simple Examples
  The XML Document
  Transforming the XML
  Adding a New DVD
  Modifying an Existing DVD
  Deleting a DVD
  Summary

CHAPTER 12 Case Study: Using .NET for an XML Application
  Understanding the Application
  Setting Up the Environment
  Understanding the Components of the News Application
  Summary

CHAPTER 13 Case Study: Using PHP for an XML Application
  Understanding the Application
  Setting Up the Environment
  Understanding Components of the Weather Portal Application
  Summary

INDEX
Slide 14: About the Author

SAS JACOBS is a web developer who set up her own business, Anything Is Possible, in 1994, working in the areas of web development, IT training, and technical writing. The business works with large and small clients building web applications with .NET, Flash, XML, and databases. Sas has spoken at such conferences as Flashforward, webDU (previously known as MXDU), and FlashKit on topics related to XML and dynamic content in Flash. In her spare time, Sas is passionate about traveling, photography, running, and enjoying life.
Slide 16: About the Technical Reviewer

ALLAN KENT is a born-and-bred South African and still lives and works in Cape Town. He has been programming in various languages and on diverse platforms for more than 20 years. He is currently the head of technology at Saatchi & Saatchi Cape Town.
Slide 18: Acknowledgments

I want to thank everyone at Apress for their help, support, and advice during the writing of this book. Thanks also to my family, who has provided much support and love throughout the process.
Slide 20: Introduction This books aims to provide a “one-stop shop” for developers who want to learn how to build Extensible Markup Language (XML) web applications. It explains XML and its role in the web development world. The book also introduces specific XML vocabularies and related XML recommendations. I wrote the book for web developers at all levels. For those developers unfamiliar with XML applications, the book provides a great starting point and introduces some important client- and server-side techniques. More experienced developers can benefit from exposure to important coding techniques and understanding the workflow involved in creating XML applications. The book starts with an explanation of XML and introduces the different components of an XML document. It then shows some related recommendations, including Document Type Definitions (DTDs), XML schema, Cascading Style Sheets (CSS), Extensible Stylesheet Language Transformations (XSLT), XPath, XLink, and XPointer. I cover some common XML vocabularies, such as Extensible HyperText Markup Language (XHTML), Mathematical Markup Language (MathML), and Scalable Vector Graphics (SVG). The middle section of the book deals with client-side XML applications and shows how to display and transform XML documents with CSS and XSLT. This section also explores how the current web browsers support XML, and it covers how to use JavaScript to work with XML documents. In this section, I also provide an introduction to the Asynchronous JavaScript and XML (Ajax) approach. The book finishes by examining how to work with XML on the server. It covers two serverside languages: PHP 5 and .NET 2.0. The last chapters of the book deconstruct two XML applications: a News application and a Community Weather Portal application. The book includes lots of practical examples that developers can incorporate in their daily work. You can download the code samples from the Source Code area of the Apress web site at http://www.apress.com. 
I hope you find this book an invaluable reference to XML and that, through it, you see the incredible power and flexibility that XML offers to web developers.
Slide 22: CHAPTER 1 Introduction to XML This chapter introduces you to Extensible Markup Language (XML) and explains some of its basic concepts. It’s an ideal place to start if you’re completely new to XML. The concepts that I introduce here are covered in more detail later in the book. Web developers familiar with Extensible HyperText Markup Language (XHTML) are often unsure about its relationship with XML; it’s not always clear why they might need to learn about XML as well. Be assured that both technologies are important for developers. XML is a metalanguage used for writing other languages, called XML vocabularies. XHTML is one of those vocabularies, so when you understand XML, you’ll also understand the rules underpinning XHTML. XHTML is HTML that conforms to XML rules, and you’ll find out more about this shortly. XHTML has a number of limitations. It’s good at structuring and displaying information in web browsers, but its primary purpose is not to mark up data. XHTML can’t carry out advanced functions such as sorting and filtering content. You can’t create your own tags to describe the contents of an XHTML document. The fixed XHTML tags usually don’t bear any relationship to the type of content that they contain. For example, a paragraph tag is a generic container for any type of content. XML addresses all of the limitations evident in HTML. It provides more flexibility than XHTML, as it works in concert with other standards that assist with presentation, organization, transformation, and navigation. XML documents are self-describing; their document structures can use descriptive tags to identify the content that they mark up. I’ll cover these points in more detail within this chapter. I’ll explain more about XML and show why you might want to use it in your work. 
The chapter will cover:
• A definition and a short history of XML
• A discussion of how to write XML documents
• Information about the processing of XML content
When you finish this chapter, you should have a good understanding of XML and see where you might be able to use it in your work. I’ll start by explaining exactly what XML is and where it fits into the world of web development.
Slide 23: 2 CHAPTER 1 ■ INTRODUCTION TO XML What Is XML? The first and most important point about XML is that it’s not a language itself. Rather, it’s a metalanguage used for constructing other languages or vocabularies. XML describes the rules for how to create these vocabularies. Each language is likely to be different, but all use tags to mark up content. The choice of tag names and their structures are flexible, and it’s common for groups to agree on standard XML vocabularies so that they can share information. An example of an XML language is XHTML. XHTML describes a standard set of tags that you must use in a specific way. Each XHTML page contains two sections described by the <head> and <body> tags. Each of those sections can include only certain tags. For example, it’s not possible to include <meta> tags in the <body> section. Web developers around the world share the same standardized approach, and web browsers understand how to render XHTML tags. XML is a recommendation of the World Wide Web Consortium (W3C), making it a standard that is free to use. The W3C provides a more formal definition of XML in its glossary at http://www.w3.org/TR/DOM-Level-2-Core/glossary.html: Extensible Markup Language (XML) is an extremely simple dialect of SGML. The goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML. A Brief History of XML XML came into being in 1998 and is based on Standard Generalized Markup Language (SGML). SGML is an international standard that you can think of as a language for defining other languages that mark up documents. HTML was based on SGML. One of the key points about SGML is that it’s difficult to use. XML aims to be much easier. XML also owes much of its existence to HTML. 
HTML focused on the display of content; you couldn’t use it for more advanced features such as sorting and filtering. HTML wasn’t a very precise language, and it wasn’t case-sensitive. It was possible to write incorrect HTML content but for a browser to display the page correctly. XML addresses many of the shortcomings found in HTML. In 1999, HTML was rewritten using the XML language construction rules as XHTML. The rules for construction of an XHTML document are more precise than those for HTML. The strictness with which these rules are enforced depends on which Document Type Declaration (DOCTYPE) you assign to the XHTML page. I’ll explain more about DOCTYPEs in Chapter 3. Since 1998, it’s been clear that XML is a very powerful approach to managing information. XML documents allow for the sharing of data. A range of related W3C recommendations address the transformation, display, and navigation within XML documents. You’ll find out more about these recommendations in Chapter 2.
Slide 24: Let’s summarize the key points:
• XML isn’t a language; its rules are used to construct other languages.
• XML creates tag-based languages that mark up content.
• XHTML is one of the languages created by XML as a reformulation of HTML.
• XML is based on SGML.
The Goals of XML
After the complexity of SGML, the W3C was very clear about its goals for XML. You can view these goals at http://www.w3.org/TR/REC-xml/#sec-origin-goals:
1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.
A few things about these goals are worth noting. First, the W3C wants XML to be straightforward; in fact, several of the goals include the terms “easy” and “clear.” Second, the W3C has given XML two targets: humans and XML processors. An XML processor or parser is a software package that processes an XML document. Processors can identify the contents of an XML document; read, write, and change an existing document; or create a new one from scratch. The aim is to open up the market for XML processors by keeping them simple to develop. Stricter construction rules mean that less processing is required. This in turn means that the targets for XML documents can be portable devices, such as mobile phones and PDAs. By keeping documents human-readable, you can access data more readily, and you can build and debug applications more easily. The use of Unicode allows developers to create XML documents in a variety of languages.
Unfortunately, a necessary side effect is that XML documents can be verbose, and describing data using XML can be a longer process than using other methods.
Slide 25: UNICODE
XML supports the Unicode character set to enable multilanguage support. Unicode provides support for 2^31 characters. It includes every character you’re likely to need, as well as many that you’ll never see. You can use 8-bit Unicode Transformation Format (UTF-8) to encode Unicode characters so that the ASCII characters keep the same codes as they have in ASCII. Obviously, this provides good compatibility with older systems. Languages such as Japanese and Chinese are often stored in UTF-16, which encodes most of their characters in two bytes where UTF-8 needs three. You can find out more about Unicode at http://www.unicode.org.
Third, note the term XML document. This term is broader than the traditional view of a physical document. Some XML documents exist in physical form, but others are created as a stream of information following XML construction rules. Examples include web services and calls to databases where the content is returned in XML format. Now that you understand what XML is, let’s delve into the rules for constructing XML languages.
Understanding XML Syntax
XML languages use tags to mark up text. As a web developer, you’re probably familiar with the concept of marking up text:
<p>Here is an introduction to XML.</p>
The previous line is XHTML, but it’s also XML. In XHTML, you know that the <p> tag indicates a paragraph of text. All of the tags within XHTML have predefined meanings. XML allows you to construct your own tags, so you could rewrite the previous markup as:
<intro>Here is an introduction to XML.</intro>
In this example, the <intro> tag tells you the purpose of the text that it marks up. One big advantage of XML is that tags can describe their content; that’s why XML languages are often called self-describing. XML is flexible enough to allow for the creation of many different types of languages to describe data. The only constraint on XML vocabularies is that they be well-formed.
Well-Formed Documents
XML documents are well-formed if they meet the following criteria:
• The document contains one or more elements.
• The document contains a single document element, which may contain other elements.
• Each element closes correctly.
• Elements are case-sensitive.
• Attribute values are enclosed in quotation marks, and an attribute cannot appear without a value.
Slide 26: I’ll describe all of these criteria throughout this chapter, but it’s worthwhile highlighting some points now. XML languages are case-sensitive; this means that the tag <intro> is not the same as <Intro> or <INTRO>. In XML, these are three different tags. Prior to the days of XHTML, HTML was case-insensitive, so <body> and <BODY> were equivalent tags.
All XML tags need to have an equivalent closing tag written in the same case as the opening tag. So the <intro> tag must have a matching </intro> tag. If no content exists between the opening and closing tags, you can abbreviate it into a single tag, <intro/>. Again, contrast this with HTML, where it was possible to write a single <p> tag to add a paragraph break.
The order of tags is important in XML. Tags that are opened first must close last:
<chapter><intro>Here is an introduction to XML.</intro></chapter>
HTML pages had no such requirement in practice. Browsers would have accepted the following in HTML, although it is unacceptable in XML:
<p><strong>Paragraph text</p></strong>
In XML, attributes always use quotation marks around their values:
<intro type="chapter">
It doesn’t matter whether these are single or double quotation marks, but they must be present. This wasn’t a requirement in HTML. Similarly, some HTML attributes, such as the nowrap attribute in a <td> tag, didn’t need to contain an attribute name and value pair:
<td nowrap>A table cell</td>
This type of tag construction isn’t possible in XML. You must replace it with something like this:
<td nowrap="true">A table cell</td>
Understanding the Difference Between Tags and Elements
You may have noticed that I’ve used the terms tag and element when talking about XML documents. At first glance, they seem interchangeable, but there’s a difference between the terms. The term element describes opening and closing tags as well as any content. A tag is one part of an element.
Tags start with an opening angle bracket and end with a closing angle bracket. Elements usually contain both an opening and closing tag as well as the content between. The following line shows a complete element that contains the <intro> tag. <intro>Here is an introduction to XML.</intro> Now that you understand the construction rules, it’s time to look at a complete XML document.
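The well-formedness rules above can be tried out with any XML parser. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module (neither is used in the book itself); the helper name is_well_formed is hypothetical. It feeds the snippets from this section to the parser and reports which ones it accepts.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Snippets taken from, or modeled on, the examples in this section.
samples = {
    "nested correctly": "<chapter><intro>Here is an introduction to XML.</intro></chapter>",
    "overlapping tags": "<p><strong>Paragraph text</p></strong>",
    "case mismatch": "<intro>Here is an introduction to XML.</Intro>",
    "unquoted attribute": "<intro type=chapter>text</intro>",
    "attribute without a value": "<td nowrap>A table cell</td>",
    "quoted attribute": '<td nowrap="true">A table cell</td>',
    "empty-element shorthand": "<intro/>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the correctly nested, quoted, and shorthand snippets should be accepted; the parser rejects the rest before an application ever sees them.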
Slide 27: Viewing a Complete XML Document
A complete piece of XML is referred to as a document. It doesn’t matter whether you’re dealing with XML that marks up text, information requested from a server, or records received from a database; all of these are documents. Each XML document is made up of markup and character data. In general, the character data comprises the text between a start tag and an end tag, and everything else is markup. You can further divide markup into elements, attributes, text, entities, comments, character data (CDATA), and processing instructions. The following document illustrates the different parts of an XML document. You can download it, along with the other resource files, from the Source Code area of the Apress web site (http://www.apress.com). The document, called dvd.xml, describes the contents of a small DVD library:
<?xml version="1.0" encoding="UTF-8"?>
<!-- This XML document describes a DVD library -->
<library>
  <DVD id="1">
    <title>Breakfast at Tiffany's</title>
    <format>Movie</format>
    <genre>Classic</genre>
  </DVD>
  <DVD id="2">
    <title>Contact</title>
    <format>Movie</format>
    <genre>Science fiction</genre>
  </DVD>
  <DVD id="3">
    <title>Little Britain</title>
    <format>TV Series</format>
    <genre>Comedy</genre>
  </DVD>
</library>
I’ll walk you through each part of the document. The document starts with an XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
This declaration is optional and can contain a number of attributes, as you’ll see shortly.
Slide 28: This XML document also includes a comment describing its purpose:
<!-- This XML document describes a DVD library -->
I’ve added this comment as a guide for anyone reading the XML document. As with XHTML, developers normally use comments to add notations. The document or root element is called <library>. You’ll notice that all elements within the document appear between the opening and closing <library> tags. The document element contains a number of <DVD> elements, and each <DVD> element contains <title>, <format>, and <genre> elements. The <DVD> element also contains an id attribute:
<DVD id="1">
  <title>Breakfast at Tiffany's</title>
  <format>Movie</format>
  <genre>Classic</genre>
</DVD>
The <title>, <format>, and <genre> elements each contain text. You can understand the structure and the contents of this document easily by looking at the tag names. It’s obvious, even without the comment, that this document describes a list of DVDs. You can also easily infer the relationship between all of the elements from the document.
Understanding the Structure of an XML Document
Each XML document is divided into two parts: the prolog and the document or root element. The prolog appears at the top of the XML document and contains information about the document. It’s a little like the <head> section of an XHTML document. In the XML document example, the prolog includes an XML declaration and a comment. It can also include other elements, such as processing instructions or a Document Type Definition (DTD). You’ll find out more about these later in the “Processing Instructions” and “DTDs and XML Schemas” sections. Well-formed XML documents must have a single document element that may optionally include other content. Any content within an XML document must appear within the document or root element. In the example XML document, the document element is <library>, and it contains all of the other elements.
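To make the structure just described concrete, here is a small sketch that reads the DVD library and lists what it finds. It assumes Python's standard xml.etree.ElementTree module, which the book does not use, and the function name summarize is hypothetical. The document is inlined so the example stands alone.

```python
import xml.etree.ElementTree as ET

# The dvd.xml content from the text, inlined so the sketch is self-contained.
DVD_XML = """<?xml version="1.0" encoding="UTF-8"?>
<!-- This XML document describes a DVD library -->
<library>
  <DVD id="1">
    <title>Breakfast at Tiffany's</title>
    <format>Movie</format>
    <genre>Classic</genre>
  </DVD>
  <DVD id="2">
    <title>Contact</title>
    <format>Movie</format>
    <genre>Science fiction</genre>
  </DVD>
  <DVD id="3">
    <title>Little Britain</title>
    <format>TV Series</format>
    <genre>Comedy</genre>
  </DVD>
</library>
"""


def summarize(xml_text: str):
    """Return the document element's name and one tuple per <DVD> element."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    rows = []
    for dvd in root.findall("DVD"):
        rows.append(
            (
                dvd.get("id"),             # the id attribute
                dvd.findtext("title"),     # text of the <title> child
                dvd.findtext("format"),
                dvd.findtext("genre"),
            )
        )
    return root.tag, rows


root_tag, rows = summarize(DVD_XML)
print("document element:", root_tag)
for row in rows:
    print(row)
```

The prolog (declaration and comment) is consumed by the parser; what the application receives is the document element and everything nested inside it.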
You might wonder about the names that I’ve chosen for the elements within the XML document. You’re free to use any name for elements and attributes, providing that they conform to the rules for XML names. Figure 1-1 shows the structure of an XML document.
Slide 29: Figure 1-1. The structure of an XML document
Naming Rules in XML
Elements, attributes, and some other constructs have names within XML documents. A name is made up of a starting character followed by name characters. Don’t forget that XML names are case-sensitive. The starting character must be a letter or underscore; it can’t be a number. The name characters can include letters, digits, hyphens, underscores, and periods, but not spaces. Colons indicate namespaces in XML, so you shouldn’t include them within your names. You’ll learn more about namespaces in Chapter 2. To be sure that you’re using legal characters, it’s best to restrict yourself to the uppercase and lowercase letters of the Roman alphabet, digits, and the hyphen, underscore, and period, excluding the colon.
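The conservative advice above can be expressed as a simple check. This is only a sketch in Python under a deliberate simplification: it accepts ASCII names only, whereas the XML recommendation also permits many non-ASCII letters. The function name is_simple_xml_name is hypothetical.

```python
import re

# Conservative, ASCII-only pattern: a letter or underscore first, followed by
# letters, digits, underscores, periods, or hyphens. Colons are rejected on
# purpose because they are reserved for namespaces.
SIMPLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_simple_xml_name(name: str) -> bool:
    """Return True if name fits the simplified naming advice above."""
    return SIMPLE_NAME.fullmatch(name) is not None


for candidate in ["camelCaseElementName", "_private", "2ndTitle", "my element", "xsl:template"]:
    print(candidate, "->", is_simple_xml_name(candidate))
```

A name that passes this check is safe to use; a name that fails may still be legal XML, so treat the check as a style guard rather than a validator.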
Slide 30: If you’re authoring your own XML content as opposed to generating it automatically, it’s probably a good idea to adopt a standardized naming convention. You should also use descriptive names. I prefer to write in CamelCase and start with a lowercase letter, unless the element name is capitalized normally:
<camelCaseElementName>Here is an element name</camelCaseElementName>
I tend to avoid using underscore characters in my names because I think it makes them harder to read. The use of descriptive names makes it easier for humans to interpret the content. Imagine the difficulty you’d have with this:
<zyxtr>Some content</zyxtr>
Let’s summarize the rules for XML names:
• XML names cannot start with a number or punctuation other than an underscore.
• XML names cannot include spaces.
• Don’t include a colon in a name unless it indicates a namespace.
• XML names are case-sensitive.
I’ll describe the contents of an XML document in more detail. I’ll start by showing you the elements that can appear in the prolog.
Understanding the XML Document Prolog
The prolog of an XML document contains metainformation about the document rather than document content. It may contain the XML declaration, processing instructions, comments, and an embedded DTD or schema.
The XML Declaration
XML documents usually start with an XML declaration, although this is optional:
<?xml version="1.0" encoding="UTF-8"?>
It’s a good idea to include the declaration because it tells an application or a human to expect XML content within the document. It also provides processors with additional information about the document, such as the character-encoding type. If you include the XML declaration, it must appear on the first line of the XML document. Nothing can precede an XML declaration, not even white space. If you accidentally include white space before the declaration, XML processors won’t be able to parse the content of the XML document correctly and will generate an error message.
The XML declaration may also include attributes that provide information about the version, encoding, and whether the document is standalone: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
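The rule that nothing may precede the declaration is easy to confirm. The sketch below assumes Python's standard xml.etree.ElementTree module (not part of the book's toolset); the helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET

DECLARATION = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'


def parses(document: bytes) -> bool:
    """Return True if the parser accepts the document without an error."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Declaration at the very start of the document.
print(parses(DECLARATION + b"<library/>"))
# A single space before the declaration.
print(parses(b" " + DECLARATION + b"<library/>"))
# No declaration at all, which is allowed because the declaration is optional.
print(parses(b"<library/>"))
```

The first and third documents should parse, while the second should be rejected because of the leading space.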
Slide 31: At the time of writing, the current XML version is 1.1. However, many processors don’t recognize this version, so it’s best to stick with a version 1.0 declaration for backward compatibility. The encoding attribute describes the character set for the XML document. If you don’t include an encoding attribute, it’s assumed that the document uses UTF-8 encoding. The standalone attribute can have either the values yes or no. The value indicates whether external files are required to process the XML document correctly. Each of the attributes in the XML declaration is optional, but the order is important. If you choose to include an encoding attribute, it must appear after the version attribute. The standalone attribute must appear as the last attribute in the declaration.
Processing Instructions
The prolog can also include processing instructions (PIs) that pass information about the XML document to other applications. The XML processor doesn’t process PIs, but rather passes them on to the application unchanged. PIs start with the characters <? and finish with ?>. They usually appear in the prolog, although they can appear in other places within an XML document.
Note: An XML declaration also starts with the characters <?xml. Even though the XML declaration looks similar, it’s worth remembering that it’s quite different from a PI.
The following PI indicates a reference to an XSL stylesheet:
<?xml-stylesheet type="text/xsl" href="stylesheet.xsl"?>
The first item in a PI is a name, called the PI target. The preceding PI has the name xml-stylesheet. Names that start with xml are reserved for XML-specific PIs. The PI also has the text string type="text/xsl" href="stylesheet.xsl". Although this looks like two attributes, the content isn’t treated that way. You’ll see more examples of stylesheet PIs in Chapters 6 and 7.
Comments
Comments can appear almost anywhere in an XML document.
The example XML document included a comment in the prolog, so let’s look at comments with the other prolog contents. XML comments look the same as XHTML comments. They begin with the characters <!-- and end with -->: <!-- Here is a comment --> Comments don’t affect the processing of an XML document. They’re normally intended for human readers. If you add a comment, you must be aware of the following rules:
Slide 32:
• A comment may not contain a double hyphen (--) other than in its closing -->.
• A comment may not be included within tag names.
• A comment should not hide either the opening or closing tags in an element.
• An XML processor isn’t obliged to pass a comment to an application, although most do.
DTDs and XML Schemas
DTDs and XML schemas provide rules about which elements and attributes can appear within the XML document. In other words, they specify which elements and attributes are valid and which are required or optional. The prolog can include declarations about the XML document, a reference to an external DTD or schema, or both. I’ll explain more about DTDs and schemas in Chapter 2.
Understanding Sections Within the XML Document Element
The data within an XML document is stored within the document or root element. This element contains all other elements, attributes, text, and CDATA within the document and may also include entities and comments.
Elements
Elements serve many purposes in an XML document. They
• Mark up content
• Provide a description of the content they mark up
• Provide information about the order of data and its relative importance
• Show the relationships between data
Elements include a starting and ending tag as well as content. The content can be text, child elements, or both text and elements. The starting tag for an element can also contain attributes. You can position comments inside elements. In the earlier example, you saw the following structure within the <DVD> element:
<DVD id="1">
  <title>Breakfast at Tiffany's</title>
  <format>Movie</format>
  <genre>Classic</genre>
</DVD>
The opening <DVD> tag contains an id attribute and includes three other elements: <title>, <format>, and <genre>. Each of these elements contains text. You saw earlier that it’s necessary to open and close tags in the correct order. It would be wrong to write the following:
Slide 33:
<DVD id="1">
  <title>Breakfast at Tiffany's</title>
  <format>Movie</format>
  <genre>Classic</DVD>
</genre>
There are four types of elements:
• Empty elements
• Elements containing only text
• Elements containing only child elements
• Elements containing a mixture of child elements and text, or mixed elements
You’ll see how important it is to distinguish between these different types when I cover XML schemas in Chapter 2.
Empty Elements
If an element doesn’t contain any text, it’s an empty element, and you can write it in two different ways. The following code shows two equivalent examples:
<elementName></elementName>
<elementName/>
The tag in the second line uses the shortened form that adds a forward slash at the end before the closing angle bracket. The XHTML <br/> tag is another example of an empty element. Using the empty element syntax can save file size and improve legibility.
Elements Containing Only Text
Some elements only contain text content. You’ll recall from the previous example that the <title>, <format>, and <genre> elements contain only text:
<title>Breakfast at Tiffany's</title>
<format>Movie</format>
<genre>Classic</genre>
Elements Containing Other Elements
It’s possible for an element to contain only other elements. The container element is called the parent, while the elements contained inside are the child elements. The <DVD> element is an example of an element that contains child elements:
<DVD id="1">
  <title>Breakfast at Tiffany's</title>
  <format>Movie</format>
  <genre>Classic</genre>
</DVD>
The family analogy is often used when describing element structures in XML.
Slide 34: Mixed Elements
Mixed elements contain both text and child elements. The DVD example doesn’t include any of these types of elements, but the following code block shows a mixed element:
<mixedElement>This element contains both text and child elements
  <childElement>This element contains text</childElement>
  <emptyElement/>
</mixedElement>
To summarize, elements have the following requirements:
• Elements must contain starting and ending tags, unless there is no content, in which case you can use the shorthand form.
• The tag names must obey the XML naming rules.
• Elements must be nested correctly.
Attributes
Another way to provide information in XML documents is by using attributes within the opening tag of an element. Attributes normally provide additional information about the element that they modify. There is no limit to the number of attributes that can appear inside an element. Attributes consist of name and value pairs, with the value enclosed in either double or single quotation marks:
<elementName attributeName="attributeValue"/>
Attributes provide additional information about an element:
<p style="text-align:center;">Introduction to XML</p>
In this case, the data Introduction to XML is enclosed in a <p> element. This element tells a web browser to display the information in a separate paragraph. The style attribute provides additional information about how to display the data. Here, you’re telling the browser to center the text. Two common uses of attributes are to convey formatting information and to indicate the use of a specific format or encoding. For example, you could convey a date as
<Date Format="mmddyyyy">06081955</Date>
or indicate use of an International Organization for Standardization (ISO) date format using
<Date Code="ISO8601">1955-06-08</Date>
When an element contains an attribute, it’s said to be a complex type element. As you’ll see later, this is important when writing XML schema documents.
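The element types and attributes described above can be told apart programmatically. The following sketch assumes Python's standard xml.etree.ElementTree module, which the book does not use; the function name element_kind is hypothetical, and the sketch makes one simplifying assumption: text made up only of white space is treated as no text.

```python
import xml.etree.ElementTree as ET


def element_kind(element: ET.Element) -> str:
    """Classify an element as empty, text only, children only, or mixed."""
    has_children = len(element) > 0
    # Text can sit before the first child (element.text) or after any child
    # (that child's tail). Whitespace-only text is ignored by assumption.
    pieces = [element.text] + [child.tail for child in element]
    has_text = any(piece and piece.strip() for piece in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "children only"
    if has_text:
        return "text only"
    return "empty"


MIXED = """<mixedElement>This element contains both text and child elements
  <childElement>This element contains text</childElement>
  <emptyElement/>
</mixedElement>"""

for element in ET.fromstring(MIXED).iter():
    print(element.tag, "->", element_kind(element))

# Attributes arrive as a name-to-value mapping on the element.
date = ET.fromstring('<Date Code="ISO8601">1955-06-08</Date>')
print(date.attrib, date.text)
```

Running the classifier over the mixed example should report one mixed element, one text-only element, and one empty element.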
You can use either a pair of double or single quotes for different attributes within the same element: <elementName att1="value1" att2='value2'>Here is an element</elementName>
Slide 35: Make sure you don’t include one of each in a single attribute, or the document won’t be well formed.
Caution: Be careful when cutting and pasting attributes from a word-processing document into an XML document. Word processors often use smart quotes, which cause an error in an XML document.
You can also write an attribute as a nested child element. For example, you could rewrite the <DVD> element
<DVD id="1">
  <title>Breakfast at Tiffany's</title>
  <format>Movie</format>
  <genre>Classic</genre>
</DVD>
as
<DVD>
  <id>1</id>
  <title>Breakfast at Tiffany's</title>
  <format>Movie</format>
  <genre>Classic</genre>
</DVD>
There’s no clear rule about which is the better option. Both alternatives are acceptable. Let’s summarize the rules relating to attributes:
• An attribute is made up of a name/value pair.
• You must enclose the attribute value in single or double quotes.
• Attributes cannot contain an XML tag.
• Attribute names must follow the XML naming rules.
Text
All text within an XML document is contained inside opening and closing tags. Unless you mark the text as CDATA, it will be treated as if it were XML and processed accordingly. This means an opening angle bracket will be treated as if it were part of an XML tag. If you want to use reserved characters within text, you must rewrite them as character entities. For example, you can write the left angle bracket < as &lt;. You can also embed the reserved characters within CDATA.
Slide 36: CDATA Sections
CDATA allows you to mark blocks of text so that they’re not processed as XML. As I mentioned before, this is useful for text that contains reserved XML characters:
<title><![CDATA[ Why 9 is < 10 ]]></title>
This CDATA section starts with <![CDATA[ and ends with ]]>. The character data is contained between the opening and closing delimiters. Obviously, the string ]]> can’t appear within a CDATA section. You can use CDATA sections in XML documents for embedding code, such as JavaScript, and for adding content that doesn’t need processing. For example, an application that reads data from a database and marks it up in XML might embed all content in CDATA sections to avoid the need to process the reserved characters explicitly. I’ll show you an example of using CDATA with JavaScript in Chapter 3.
Entities
Character entities are symbols that represent a single character. In XHTML, character entities are used for special symbols such as an ampersand (&amp;) and a nonbreaking space (&nbsp;). You can use character entities to replace the reserved characters in XML documents. All tags start with a left angle bracket, so it isn’t possible to include this character in the text within an element:
<expression>10 < 25</expression>
If you try to process this element, the presence of the left angle bracket before the text 25 causes a processing error. Instead, you could replace this symbol with the entity &lt;:
<expression>10 &lt; 25</expression>
You need to consider the following reserved characters:
• <, which indicates the start of a tag name
• &, which indicates the first character of an entity
• xml, which is reserved for referring to parts of the XML language, such as xml-stylesheet
Table 1-1 summarizes the character entities that you need to use.
Table 1-1. Character Entities Used in XML Documents
Character    Entity
&            &amp;
'            &apos;
>            &gt;
<            &lt;
"            &quot;
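The two ways of protecting reserved characters, entities and CDATA sections, can be compared side by side. This is a hedged sketch using Python's standard library (xml.sax.saxutils.escape and xml.etree.ElementTree), which is an assumption of this example rather than anything the book relies on. The CDATA section uses the standard delimiters <![CDATA[ and ]]>.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "10 < 25 & rising"

# escape() replaces &, <, and > with the entities listed in Table 1-1.
escaped = escape(raw)
print(escaped)

# Once escaped, the text can sit safely inside an element; the parser turns
# the entities back into the original characters.
parsed_text = ET.fromstring(f"<expression>{escaped}</expression>").text
print(parsed_text)

# A CDATA section lets the reserved character appear literally.
cdata_text = ET.fromstring("<title><![CDATA[ Why 9 is < 10 ]]></title>").text
print(repr(cdata_text))

# Without either protection, the parser reports an error.
try:
    ET.fromstring("<expression>10 < 25</expression>")
    unprotected_ok = True
except ET.ParseError:
    unprotected_ok = False
print("unprotected text parses:", unprotected_ok)
```

Either approach delivers the same characters to the application; the difference lies only in how the document is written.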
Slide 37: Sometimes you can’t include a literal character in an XML document, perhaps because the character doesn’t exist on a keyboard or because it’s a graphic character. Instead, you can add these as character references using decimal or hexadecimal numbers. For example, you can encode the copyright symbol © as &#169; or &#xA9;. If the reference starts with &# and ends with a semicolon, it’s a character reference. The number between is the Unicode code for the character required. If the code is written as a hexadecimal, then it’s prefixed with the character x. You can also define your own entities. For example, you could define the reference &copyright; to mean Copyright 2006 Apress. Each time you want to include this text in the XML document, you could use the entity reference &copyright;. This makes the text easier to manage and update. Let’s move on to look at the processing of XML documents.
The XML Processing Model
The XML recommendation assumes that an XML document will be processed in a particular way. The model indicates that an XML processor passes the content and structure of the XML document to an application. XML processors are usually called XML parsers, as they parse the XML document; see Figure 1-2.
Figure 1-2. The XML document-processing model
Common XML processors include Microsoft XML Parser (MSXML), Apache Xerces2, and the Oracle XML parser. You can write an application that uses any of these parsers. Some XML parsers are also available as prepackaged software that installs automatically. Extensible Stylesheet Language Transformations (XSLT) processors used to display XML in a web browser fall into this category. MSXML contains both an XML parser and an XSLT processor, and is both an XML processor and an application. It installs automatically with Internet Explorer and other Microsoft software.
Slide 38: CHAPTER 1 ■ INTRODUCTION TO XML 17 XML Processing Types There are two categories of XML processing: tree-based and event-based. Many XML parsers, including later versions of MSXML, support both models. You’ll often hear tree-based parsers referred to as Document Object Model (DOM) parsers, while event-based parsers are referred to as Simple API for XML (SAX) parsers. Both are named after the specifications they support. The DOM is a W3C recommendation that provides an application programming interface (API) to an XML document. Any application can use this API to manipulate an XML document, read information, add new nodes, and edit the existing content. You can find out more about this recommendation at http://www.w3.org/TR/REC-DOM-Level-1/. SAX is not a W3C recommendation, but it does enjoy support from both large and small software companies. A SAX-based parser reads an XML document sequentially, firing off events as it reaches important parts of the document, such as the start or end of an element. You can find out more at http://www.saxproject.org/. DOM Parsing Figure 1-3 shows the dvd.xml document that you’ve been working with represented as a tree structure. Figure 1-3. The dvd.xml document shown as a tree structure Displaying the document in this way reinforces the relationship between the elements, as in a family tree. The <library> element is the parent of the <DVD> element and the grandparent of the <title>, <format>, and <genre> elements. The <DVD> elements are siblings and have the <library> element as a parent or ancestor. The <title>, <format>, and <genre> elements are descendants of the <library> element. DOM parsing allows access to these elements, their values, and all other parts of an XML document through either a programming language or a scripting language such as JavaScript. SAX Parsing A SAX-based parser presents an XML document as a string of events. 
You must write handlers for each event so that something suitable occurs when the event triggers the handler. This type of parsing works well with languages that have good event-handling properties. For instance, SAX parsing is used extensively with Java. It’s less suitable for the scripting languages often employed on the web, so I don’t cover it in detail here.
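The difference between the two models is easier to see side by side. The sketch below assumes Python’s standard library, using xml.dom.minidom as the tree-based parser and xml.sax as the event-based one; the class name TitleHandler and the shortened two-DVD document are made up for the illustration.

```python
import xml.sax
from xml.dom import minidom

DVD_XML = b"""<library>
  <DVD id="1"><title>Breakfast at Tiffany's</title><format>Movie</format><genre>Classic</genre></DVD>
  <DVD id="2"><title>Contact</title><format>Movie</format><genre>Science fiction</genre></DVD>
</library>"""

# Tree-based: the whole document is loaded, then you navigate it freely.
dom = minidom.parseString(DVD_XML)
dom_titles = [t.firstChild.data for t in dom.getElementsByTagName("title")]
first_dvd = dom.getElementsByTagName("DVD")[0]
parent_name = first_dvd.parentNode.tagName


# Event-based: the parser calls your handlers as it reads, and keeps nothing.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


handler = TitleHandler()
xml.sax.parseString(DVD_XML, handler)

print(parent_name, dom_titles)
print(handler.titles)
```

Both approaches recover the same titles. The DOM version can also walk from a <DVD> element back up to its parent, whereas the SAX handler has to store anything it wants to use later.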
Slide 39: Why Have Two Processing Models?

Both processing models offer advantages. DOM-based parsing provides full read-write access to an XML document, and you can traverse the document tree to access nodes within the document. It can also validate a document against a DTD or XML schema to determine that the document is valid. However, DOM-based parsing must read the full XML document into memory, so DOM parsing can be slow and memory-intensive when working with large XML documents.

It’s difficult to determine exactly what constitutes a large XML document, because processing time depends on computing power, memory, time available, and whether it’s working in a single-user environment or a multiuser environment such as a web server. As a rule, most systems cope with documents up to tens of megabytes in size, but you need to take care with files above this size.

The SAX-based model, on the other hand, is sequential in operation. Once a node has been processed, it is discarded and cannot be processed again. The whole document isn’t loaded into memory at once, so you can avoid problems associated with processing large XML documents. This method of processing puts the onus on you to store any information from the XML document that might be required later.

SAX is ideal, for example, as an intermediate routing product in a communications system. An incoming XML document is likely to consist of a small routing header and a larger document for delivery to the end point. Using SAX, a routing device can read the routing information and ignore the document, as the document is irrelevant to its delivery. A DOM-based parser, however, must parse the complete document to be able to deliver it to its ultimate destination.

Some XML Tools

Developers commonly want to know what tools are available for working with XML documents. There are so many tools available, both as freeware and for purchase, that it’s impossible to summarize them all here.
Your choice of tool is likely to be a matter of personal preference. In general, XML development tools fall into several categories: • Extensions to existing programmers’ IDEs • XML-specific IDEs • Individual tools Tools such as Microsoft Visual Studio (http://msdn.microsoft.com/vstudio/) fall into the first category. They have good XML support aimed specifically at developers. At the time of writing, the latest version is Visual Studio 2005 and includes the following features: • It helps you create and edit XML documents, including checking whether a document is well formed. • It offers XML schema support, including the ability to infer a schema from an instance document, validation of documents, and conversion from a DTD. • It offers XSLT support, including the ability to view the results of a transformation.
Slide 40: CHAPTER 1 ■ INTRODUCTION TO XML 19 The dedicated XML IDEs tend to cover similar ground and differ in the depth of their support and their user interfaces. Most of these tools have an XML editor, tools for creating DTDs and XML schemas, and support for XSLT development. Several such tools are available, including this small sample of common ones: • Altova’s XML Suite: http://www.altova.com/suite.html • TIBCO Software’s suite of XML tools: http://www.tibco.com/software/ business_integration/xml_tools.jsp • DataDirect Technologies’ Stylus Studio: http://www.stylusstudio.com/ Many of the suites mentioned include individual tools that you can use for editing XML documents. These include • Altova’s XMLSpy: http://www.altova.com/products_ide.html • Blast Radius’ XMetal: http://www.xmetal.com/index.x?products/xmetal/ • SyncRO Soft’s <oXygen/>: http://www.oxygenxml.com// There are many other excellent tools available that I haven’t mentioned here. You can find out more by searching the Internet or subscribing to mailing lists such as XML-DEV (http://xml.org/xml/xmldev.shtml). Summary In this chapter, you’ve been introduced to some of the basic concepts relating to XML. I’ve covered XML syntax in some detail, and I’ve shown you the benefits that XML provides for web developers. I’ve also shown you some of the tools that you can use to work with XML documents. In Chapter 2, I’ll show you some of the related XML recommendations. You’ll learn how to work with DTDs and XML schemas. You’ll also find a brief introduction to XSLT, XPath, XLinks, and XPointer.
Slide 42: CHAPTER 2 Related XML Recommendations In the previous chapter, you learned about XML documents and their rules for construction. XML is one in a set of related recommendations from the World Wide Web Consortium (W3C). In this chapter, I’ll show you some of the recommendations that you’re likely to encounter when working with XML applications. Specifically, I’ll discuss • The role of namespaces in XML • Defining XML vocabularies with Document Type Definitions (DTDs) and XML schemas • Displaying XML with XSLT • Navigating XML documents using XPath • Linking to XML documents with XLink and XPointer You can download the files referred to in this chapter from the Source Code area of the Apress web site (http://www.apress.com). Let’s start by looking at the importance of namespaces when working with XML documents. Understanding the Role of XML Namespaces XML documents allow you to create your own vocabularies of elements and attributes to describe data. As XML documents become more complex or draw content from other sources, it’s possible that you’ll want to use more than one vocabulary in the same document, and that the same element name will appear in both vocabularies with different meanings. For example, say you want to produce a furniture catalog that contains some embedded XHTML information: <?xml version="1.0" encoding="UTF-8"?> <catalog> <table> <size> <length>2.0</length> <width>0.9</width> 21
Slide 43: 22 CHAPTER 2 ■ RELATED XML RECOMMENDATIONS <height>1.2</height> </size> <description> <table> <tr> <td>This is a lovely table</td> <td>And this is a picture of it</td> </tr> </table> </description> </table> </catalog> In this XML document, the two elements called <table> have completely different meanings. Namespaces allow you to show which elements belong to which vocabulary. You can identify each vocabulary with a unique prefix that you then apply to elements in the XML document: <?xml version="1.0" encoding="UTF-8"?> <cat:catalog> <cat:table> <cat:size> <cat:length>2.0</cat:length> <cat:width>0.9</cat:width> <cat:height>1.2</cat:height> </cat:size> <cat:description> <xhtml:table> <xhtml:tr> <xhtml:td>This is a lovely table</xhtml:td> <xhtml:td>And this is a picture of it</xhtml:td> </xhtml:tr> </xhtml:table> </cat:description> </cat:table> </cat:catalog> The prefix you choose isn’t significant, although you can follow some conventions. In the previous example, the first prefix, cat, refers to catalog items. You could equally call this dog or catalog. The second prefix, xhtml, refers to XHTML elements within the document. This is an example of a namespace convention. Namespaces use Uniform Resource Identifiers (URIs) to identify each vocabulary. In the case of the previous XHTML content, the W3C controls the URI because it controls the XHTML standard. However, you can associate the cat prefix with any URI under your control. It’s important to note that the URI doesn’t have to point to an actual document or directory. The only requirement is that it’s unique in the XML document. However, many processors, including XML schema, XHTML, and XSLT processors, use the URI to indicate
Slide 44: CHAPTER 2 ■ RELATED XML RECOMMENDATIONS 23 that they must process certain parts of the document. Therefore, you must use the correct URI for these applications. You can find the W3C’s “Namespaces in XML” recommendation at http://www.w3.org/TR/ REC-xml-names/. Adding Namespaces to XML Documents You reference a namespace by adding it as an attribute of any node that contains elements belonging to the namespace. Frequently, you add the namespace to the document element, because it contains all other elements. In the previous XML document, you could rewrite the opening element as follows: <cat:catalog xmlns:cat="http://www.apress.com/ns/furniture" xmlns:xhtml="http://www.w3.org/1999/xhtml"> This determines that the cat namespace refers to the URI http://www.apress.com/ns/ furniture. The cat namespace can precede any element name, providing it is separated by a colon: <cat:catalog> Adding Default Namespaces Quite often, a large portion of an XML document belongs to a single XML vocabulary. In this case, you can define a default namespace instead of repeating the namespace prefix for each element. You can use the xmlns keyword to define a default namespace. If you do this, you don’t need to assign a prefix to elements within this namespace. For example, you can set the catalog namespace as the default namespace: <catalog xmlns="http://www.apress.com/ns/furniture" xmlns:xhtml="http://www.w3.org/1999/xhtml"> Because this is now the default namespace, you don’t need to use a prefix in front of element names from this namespace. You can define a default namespace at any point in the document. When you do this, the default applies to the element containing the namespace declaration and any descendants. The declaration overrides any earlier default declarations. 
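A namespace-aware parser makes the point that the URI, not the prefix, identifies the vocabulary. The following sketch assumes Python’s standard xml.etree.ElementTree module, which reports each element name in the form {URI}localname; the documents are cut-down versions of the furniture catalog.

```python
import xml.etree.ElementTree as ET

PREFIXED = """<cat:catalog xmlns:cat="http://www.apress.com/ns/furniture"
             xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <cat:table>
    <cat:description>
      <xhtml:table><xhtml:tr><xhtml:td>This is a lovely table</xhtml:td></xhtml:tr></xhtml:table>
    </cat:description>
  </cat:table>
</cat:catalog>"""

# The same furniture vocabulary, this time declared as the default namespace.
DEFAULTED = """<catalog xmlns="http://www.apress.com/ns/furniture"><table/></catalog>"""

root = ET.fromstring(PREFIXED)
tables = [el.tag for el in root.iter() if el.tag.endswith("}table")]
for tag in tables:
    print(tag)

default_table = ET.fromstring(DEFAULTED)[0].tag
print(default_table == tables[0])
```

The two <table> elements come back with different URIs, so an application can tell them apart. The prefixed and default-namespace spellings of the furniture table produce exactly the same name.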
The following XML document shows how to use multiple default namespaces: <?xml version="1.0" encoding="UTF-8"?> <catalog xmlns="http://www.apress.com/ns/furniture" > <table> <size> <length>2.0</length> <width>0.9</width> <height>1.2</height> </size> <description> <table xmlns="http://www.w3.org/1999/xhtml"> <tr>
Slide 45:
          <td>This is a lovely table</td>
          <td>And this is a picture of it</td>
        </tr>
      </table>
    </description>
  </table>
</catalog>

The default catalog namespace applies to all elements except those contained within the second <table> element. Because you added the namespace declaration, the following elements use the XHTML namespace as the default:

<table xmlns="http://www.w3.org/1999/xhtml">
  <tr>
    <td>This is a lovely table</td>
    <td>And this is a picture of it</td>
  </tr>
</table>

A final point on namespaces is their use with attributes. A default namespace declaration doesn’t apply to attributes: an attribute without a prefix isn’t in any namespace and is simply interpreted as part of its containing element. You only need to qualify an attribute with a prefix when it’s defined in a different vocabulary from its containing element.

You’ll see the importance of namespaces as I show you how to define XML vocabularies using DTDs and XML schemas.

Defining XML Vocabularies

Languages based on XML are called vocabularies, and you can define them using a DTD, XML schema, or some other schema language. Many industry groups have come together to define their own XML vocabularies.

If you want to use an XML vocabulary, you need to know the rules for its construction. The rules ensure that you can generate valid XML documents that match the language construction criteria. Knowing the rules also allows XML processors to check that the XML document conforms. This process is called validation, and processors that do this are called validating parsers. Chapter 1 provides information about how XML documents are processed.

You can share the rules for XML vocabularies by writing a schema. This is a formal description that people or validating parsers can use. If you’re using an XML document for a one-off application, it’s probably overkill to document the vocabulary. The real benefit comes when you want to share the language with other people or applications so that either can check that the document is constructed correctly.
There are two common types of schemas: the DTD and the XML schema. The W3C defines and controls both of these. In fact, the DTD is part of the XML recommendation itself. I’ll use the DVD library example from Chapter 1:
Slide 46: CHAPTER 2 ■ RELATED XML RECOMMENDATIONS 25 <?xml version="1.0" encoding="UTF-8"?> <!-- This XML document describes a DVD library --> <library> <DVD id="1"> <title>Breakfast at Tiffany's</title> <format>Movie</format> <genre>Classic</genre> </DVD> <DVD id="2"> <title>Contact</title> <format>Movie</format> <genre>Science fiction</genre> </DVD> <DVD id="3"> <title>Little Britain</title> <format>TV Series</format> <genre>Comedy</genre> </DVD> </library> Let’s start by looking at how you could construct a DTD to describe this vocabulary. The Document Type Definition A DTD describes the structure of a document. Among other things, it indicates how many times an element can appear, whether it’s optional, and whether it contains attributes. Validating parsers can check an XML document against its DTD to see if it’s valid. If it isn’t, the parser will report an error. An XML document that complies with a DTD is called a document instance of that DTD. This book isn’t intended as a complete reference to DTDs, but it includes enough information so you can understand how to construct a DTD. The following DTD defines the DVD library document: <?xml version="1.0" encoding="UTF-8"?> <!ELEMENT library (DVD+)> <!ELEMENT DVD (title, format, genre)> <!ELEMENT title (#PCDATA)> <!ELEMENT format (#PCDATA)> <!ELEMENT genre (#PCDATA)> <!ATTLIST DVD id CDATA #REQUIRED> You’ll find the document saved as dvd.dtd with your resources. I’ve called the XML document that refers to this DTD dvd_dtd.xml, but the name isn’t significant. This DTD shows two types of declarations: one for declaring elements and the other for attributes. You can also add entity and notation declarations. Notation declarations are uncommon, so I’ll cover only entity declarations later in the “Entity Declarations” section.
Slide 47: Element Type Declarations

An element type declaration gives information about an element. The declaration starts with the !ELEMENT text and lists the element name and contents. The content can be a data type or other elements listed in the DTD:

<!ELEMENT elementName (elementContents)>

Empty elements show the keyword EMPTY, written without parentheses:

<!ELEMENT elementName EMPTY>

In the sample DTD, the <DVD> element contains three other elements: <title>, <format>, and <genre>:

<!ELEMENT DVD (title, format, genre)>

The order of these elements dictates the order in which they should appear within an XML document instance.

Parsed Character Data (PCDATA) indicates that the element’s content is text, and that an XML parser should parse this text to resolve character and entity references. The <title>, <format>, and <genre> declarations define their content type as PCDATA:

<!ELEMENT title (#PCDATA)>
<!ELEMENT format (#PCDATA)>
<!ELEMENT genre (#PCDATA)>

You can use several modifiers to provide more information about child elements. Table 2-1 summarizes these modifiers.

Table 2-1. Symbols Used in Element Declarations Within DTDs
Symbol   Explanation
,        Specifies the order of child elements.
+        Signifies that an element must appear at least once (i.e., one or more times).
|        Allows a choice between a group of elements.
()       Marks content as a group.
*        Specifies that the element is optional and can appear any number of times (i.e., zero or more times).
?        Specifies that the element is optional, but if it’s present, it can appear only once (i.e., zero or one times).

No symbol indicates that an element must appear exactly once.

The declaration for the <library> element follows the <DVD> child with a + sign, which indicates that <DVD> must appear at least once, but can appear more often:

<!ELEMENT library (DVD+)>
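The symbols in Table 2-1 behave much like the operators in a regular expression, and that analogy gives a quick way to experiment with content models. The sketch below is an illustration only, written in Python with the standard re module; it isn’t how a validating parser is implemented, it handles element-only content models (not #PCDATA or EMPTY), and the function names content_model_to_regex and children_match are invented here.

```python
import re


def content_model_to_regex(model):
    """Turn an element-only content model such as '(title, format, genre)' into a regex."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group ending in ';' so that a following
    # +, * or ? applies to the whole name rather than its last letter.
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + ";)",
        compact,
    )
    # A comma only states order, which regex concatenation already expresses.
    return re.compile(pattern.replace(",", ""))


def children_match(model, child_names):
    """Return True if the sequence of child element names satisfies the model."""
    sequence = "".join(name + ";" for name in child_names)
    return content_model_to_regex(model).fullmatch(sequence) is not None


print(children_match("(DVD+)", ["DVD", "DVD", "DVD"]))
print(children_match("(DVD+)", []))
print(children_match("(title, format, genre)", ["title", "format", "genre"]))
print(children_match("(title, format, genre)", ["format", "title", "genre"]))
print(children_match("(book | magazine)*", []))
```

An empty <library> fails the (DVD+) model, and swapping <title> and <format> fails the <DVD> model because the comma fixes the order of the children.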
Slide 48: Attribute List Declarations

Attribute declarations, which appear after element declarations, are a little more complicated. You can indicate that an element has attributes by including an attribute list declaration:

<!ATTLIST DVD id CDATA #REQUIRED>

In this line, the element <DVD> has a required attribute called id that contains CDATA.

■ Note Setting a required attribute doesn’t affect any of the other element declarations within the DTD. It would be entirely possible to include another child element, also called id, within this element.

The most common type of attribute is CDATA, but you can declare other types as well:
• ID: a unique identifier
• IDREF: the ID of another element
• IDREFS: a list of IDs from other elements
• NMTOKEN: a name token (a string made up only of XML name characters)
• NMTOKENS: a list of name tokens
• ENTITY: an entity name
• ENTITIES: a list of entity names
• Enumeration: a list of specified values, written in parentheses and separated by pipe characters

The keyword #REQUIRED indicates that you must include this attribute. You could also use the word #IMPLIED to indicate an optional attribute. Using the word #FIXED implies that you can only use a single value for the attribute. If the XML document doesn’t include the attribute, the validating parser will insert the fixed value. Using a value other than the fixed value generates a parser error.

If you need to specify a choice of values for an attribute, you can use the pipe character (|):

<!ATTLIST product color (red|green|blue) "red">

This line indicates that the <product> element has a color attribute with possible values of red, green, or blue and a default value of red.

Entity Declarations

In Chapter 1, you saw how to use the built-in entity types, and I mentioned that you can define your own entities to represent fixed data. For example, you could assign the entity reference &copyright; to the text Copyright 2006 Apress. You’d use the following line to define this as an entity in the DTD:

<!ENTITY copyright "Copyright 2006 Apress">
Slide 49: This is a simple internal entity declaration. You can also reference an external entity and use it to include larger amounts of content in your XML document. This is similar to using a server-side include file in an XHTML document.

The following XML document refers to several entities:

<book>
  <content>
    &tableOfContents;
    &chapter1;
    &chapter2;
    &chapter3;
    &appendixA;
    &index;
  </content>
</book>

This XML document takes its content from several entities, each representing an external XML document. The DTD needs to include a declaration for each of the entities. For example, you might define the tableOfContents entity as follows:

<!ENTITY tableOfContents SYSTEM "entities/TOC.xml">

Associating a DTD with an XML Document

So far, you’ve seen how to construct a DTD, but you haven’t yet seen how to associate it with an XML document. You can either embed the DTD in the XML document or add a reference to an external DTD.

You can reference an external DTD from the XML document in the prolog:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library SYSTEM "dvd.dtd">

You can also embed a DTD within the prolog of the XML document:

<?xml version="1.0" encoding="UTF-8"?>
<!-- This XML document describes a DVD library -->
<!DOCTYPE library [
  <!ELEMENT library (DVD+)>
  <!ELEMENT DVD (title, format, genre)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT format (#PCDATA)>
  <!ELEMENT genre (#PCDATA)>
  <!ATTLIST DVD id CDATA #REQUIRED>
]>
<library>
  ...
</library>

You can find this example saved as dvd_embedded_dtd.xml within your resources files.
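You can watch an embedded DTD at work with any parser that reads the internal subset. The sketch below assumes Python’s standard xml.etree.ElementTree module, which is built on a non-validating parser: it expands internal entities declared in the DTD but doesn’t enforce the element and attribute declarations. The <notice> element is added here purely to give the &copyright; entity somewhere to appear.

```python
import xml.etree.ElementTree as ET

EMBEDDED = """<?xml version="1.0"?>
<!DOCTYPE library [
  <!ELEMENT library (DVD+, notice)>
  <!ELEMENT DVD (title, format, genre)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT format (#PCDATA)>
  <!ELEMENT genre (#PCDATA)>
  <!ELEMENT notice (#PCDATA)>
  <!ATTLIST DVD id CDATA #REQUIRED>
  <!ENTITY copyright "Copyright 2006 Apress">
]>
<library>
  <DVD id="1"><title>Contact</title><format>Movie</format><genre>Science fiction</genre></DVD>
  <notice>&copyright;</notice>
</library>"""

# The internal entity is replaced by its declared text during parsing.
root = ET.fromstring(EMBEDDED)
notice = root.find("notice").text
print(notice)

# Removing the required id attribute and the <genre> element makes the
# document invalid against the DTD, but it is still well formed, so a
# non-validating parser accepts it without complaint.
INVALID = EMBEDDED.replace(' id="1"', "").replace("<genre>Science fiction</genre>", "")
still_parses = ET.fromstring(INVALID) is not None
print(still_parses)
```

To have the rules in the DTD enforced, you need a validating parser; well-formedness checking alone won’t catch the missing attribute.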
Slide 50: CHAPTER 2 ■ RELATED XML RECOMMENDATIONS 29 It’s possible to have both an internal and external DTD. The internal DTD takes precedence if a conflict exists between element or attribute definitions. It’s probably more common to use an external DTD. This method allows a single DTD to validate multiple XML documents and makes maintenance of the DTD and document instances easier. You can then use an embedded DTD if you need to override the external DTD. This approach works much the same way as using embedded Cascading Style Sheets (CSS) declarations to override external stylesheets. If you’re creating a one-off document that needs a DTD, it may be easier to use embedded element and attribute declarations. Even if you don’t want to define the elements and attributes, you might want to define entities. ■ Note If you include a reference to an external DTD that includes entities, you must change the standalone attribute in the XML declaration to no: <?xml version="1.0" encoding="UTF-8" standalone="no"?> Let’s turn to the other commonly used XML validation language, XML schema. XML Schema XML schemas share many similarities with DTDs; for instance, you use both to specify the structure of XML documents. You can find out more about XML schemas by reading the W3C primer at http://www.w3.org/TR/xmlschema-0/. DTDs and XML schemas also have many differences. First, the XML schema language is a vocabulary of XML. XML schemas are more powerful than DTDs and include concepts such as data typing and inheritance. Unfortunately, they’re also much more complicated to construct compared with DTDs. A further disadvantage is that XML schemas offer no equivalent of a DTD entity declaration. One important aspect of XML schemas is that a schema processor validates one element at a time in the XML document. This allows different elements to be validated against different schemas and makes it possible to examine the validity of each element. 
A document is valid if each element within the document is valid against its appropriate schema. A side effect of this element-level validation is that XML schemas don’t provide a way to specify which is the document element. So, providing the elements are valid, the document will be valid, regardless of the fact that a document element may not be included. Let’s start by looking at the schema that describes the dvd.xml document: <?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="library"> <xs:complexType> <xs:sequence> <xs:element name="DVD" minOccurs="0" maxOccurs="unbounded"> <xs:complexType>
Slide 51: 30 CHAPTER 2 ■ RELATED XML RECOMMENDATIONS <xs:sequence> <xs:element name="title" type="xs:string"/> <xs:element name="format" type="xs:string"/> <xs:element name="genre" type="xs:string"/> </xs:sequence> <xs:attribute name="id" type="xs:integer" use="required"/> </xs:complexType> </xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:schema> Straight away, you can see some big differences between this schema and the previous DTD. The most obvious difference is that the schema is tag-based and uses a namespace. By using XML to create the schema vocabulary, you can take advantage of standard XML creation tools. The XML schema also includes data types for both the elements and attribute. For example, the id attribute uses the type xs:integer. Let’s work through this schema document. The schema starts with a standard XML declaration. The document element is called schema, and it includes a reference to the XML schema namespace http://www.w3.org/2001/XMLSchema: <?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> By convention, this namespace is usually associated with the prefixes xsd or xs. This example uses the xs prefix. This schema uses Russian doll notation, where element declarations are positioned at the appropriate position in the document. In other words, the element declarations nest to indicate the relative position of elements. It’s possible to organize schema documents differently. The first element defined is the document element <library>. It has global scope because it’s the child of the <xs:schema> element. This means that the element definition is available for use anywhere within the XML schema. You might reuse the element declaration at different places within the schema document. Global elements can also be the document element of a valid document instance. The definition includes the following: <xs:element name="library"> <xs:complexType> <xs:sequence>
Slide 52: These statements define the element as a complex type element and indicate that it contains child elements in some order (<xs:sequence>). Complex type elements contain other elements or at least one attribute. Because the <library> element contains the remaining elements in the document, you must declare it as a complex type element. I’ll show you an example of declaring simple type elements shortly.

You’ve declared that the <library> element contains a sequence of child elements by using <xs:sequence>. This seems a little strange, given that it only contains a single element that may be repeated. You could also select one element from a choice of elements using <xs:choice>, or you could select all elements in any order using <xs:all>.

The <library> element contains a single <DVD> child element that can appear any number of times. You specify this using

<xs:element name="DVD" minOccurs="0" maxOccurs="unbounded">

Note that minOccurs="0" also allows an empty <library> element, which is looser than the DVD+ rule in the DTD. To require at least one <DVD> element, set minOccurs to 1 or leave it out, because 1 is the default. If the element must occur exactly once, omit both the minOccurs and maxOccurs attributes.

The <DVD> element contains child elements, so it’s a complex type element containing other elements, also in a sequence:

<xs:element name="DVD" minOccurs="0" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>

The child elements are simple type elements because they contain only text. If they included an attribute, they would automatically be complex type elements, but the only attribute in the document is included in the <DVD> element. Define simple type elements by specifying their name and data type:

<xs:element name="title" type="xs:string"/>
<xs:element name="format" type="xs:string"/>
<xs:element name="genre" type="xs:string"/>

The XML schema recommendation lists 44 built-in simple data types, including string, integer, float, decimal, date, time, ID, and Boolean. You can find out more about these types at http://www.w3.org/TR/xmlschema-2/. You can also define your own complex data types.
The <DVD> element also includes an attribute id that is defined after the child element sequence. Attributes always have a simple type and are optional unless otherwise specified:

<xs:attribute name="id" type="xs:integer" use="required"/>

It’s also possible to add constraints to the attribute value to restrict the range of possible values. Figure 2-1 shows the XML document and schema side by side in Altova XMLSpy.
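Because a schema is itself an XML document, you can read it with the same tools you use for any other XML. The sketch below assumes Python’s standard xml.etree.ElementTree module and simply lists the declarations in the schema shown earlier; it inspects the schema rather than validating anything, and the fallback values "1" and "optional" are the defaults that apply when the attributes are omitted.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="DVD" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="format" type="xs:string"/>
              <xs:element name="genre" type="xs:string"/>
            </xs:sequence>
            <xs:attribute name="id" type="xs:integer" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(SCHEMA)

# One row per element declaration: name, type, minOccurs, maxOccurs.
elements = [
    (el.get("name"), el.get("type"), el.get("minOccurs", "1"), el.get("maxOccurs", "1"))
    for el in schema.iter(XS + "element")
]
# One row per attribute declaration: name, type, use.
attributes = [
    (at.get("name"), at.get("type"), at.get("use", "optional"))
    for at in schema.iter(XS + "attribute")
]

for row in elements + attributes:
    print(row)
```

The <library> and <DVD> rows show no type because their complex types are declared inline rather than by name.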
Slide 53: 32 CHAPTER 2 ■ RELATED XML RECOMMENDATIONS Figure 2-1. The XML document and related schema An Alternative Layout In the previous example, only the <library> element was declared as a child of the <xs:schema> element, so this is the only element available globally. If you want to be able to use other elements globally, you can change the way they’re declared by using the ref attribute. The following code shows the schema document reworked to make the <DVD> element global: <?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="library"> <xs:complexType> <xs:sequence> <xs:element ref="DVD" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="DVD"> <xs:complexType> <xs:sequence>
Slide 54: CHAPTER 2 ■ RELATED XML RECOMMENDATIONS 33 <xs:element name="title" type="xs:string"/> <xs:element name="format" type="xs:string"/> <xs:element name="genre" type="xs:string"/> </xs:sequence> <xs:attribute name="id" type="xs:integer" use="required"/> </xs:complexType> </xs:element> </xs:schema> You can find this document saved as dvd_global.xsd with your resources. The changes are relatively small. Instead of the complete <DVD> declaration being included within the <library> declaration, it is now a child of the <xs:schema> element. This means that any other definition can access the declaration using the ref keyword. The changed lines appear in bold in the code listing. You can see both the XML document and alternative schema within Figure 2-2. Figure 2-2. The XML document and alternative related schema Creating schema documents with this structure is useful if the same element appears in more than one place. The XML schema has no concept of the document element of an instance document, so you can include more than one global element. The downside is that a validating parser could accept either element as the document element.
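A short script can show which declarations are global and check that every ref points at one of them. This is a sketch only, assuming Python’s standard xml.etree.ElementTree module and a schema with no target namespace, so that each ref value is a plain element name.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

GLOBAL_SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="DVD" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="DVD">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="format" type="xs:string"/>
        <xs:element name="genre" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:integer" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

root = ET.fromstring(GLOBAL_SCHEMA)

# Global elements are the direct children of <xs:schema>.
global_names = [el.get("name") for el in root.findall(XS + "element")]

# Every ref must name one of those global declarations.
refs = [el.get("ref") for el in root.iter(XS + "element") if el.get("ref") is not None]
unresolved = [ref for ref in refs if ref not in global_names]

print(global_names)
print(refs, unresolved)
```

Both <library> and <DVD> are listed as global, which is why a validating parser could accept either one as the document element.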
Slide 55: Defining Data Types

The sample XML schema uses only the built-in simple data types included in the XML schema recommendation. You can also define your own data types. For example, if an attribute can only have a value of yes or no, it might be useful to define a custom data type to reflect this:

<xs:simpleType name="YesNoType">
  <xs:restriction base="xs:string">
    <xs:enumeration value="no"/>
    <xs:enumeration value="yes"/>
  </xs:restriction>
</xs:simpleType>

These declarations create a simple type with the name YesNoType. The type is based on the xs:string data type and has two possible values: yes and no. Once defined, declarations can then access the data type in the same way as the built-in data types:

<xs:attribute name="availableForLoan" type="YesNoType" use="optional"/>

If you want to make this data type available to other schemas, you can include the schema in much the same way as you’d use server-side include files in a web site. You could save the data type in a schema document and use the <xs:include> statement. The data type definition is saved in the file customDataType.xsd. You can include it by using the following statement in your schema document:

<xs:include schemaLocation="customDataType.xsd"/>

You can find the files customDataType.xsd and dvd_include.xsd with the resource file downloads.

■ Note An included schema is sometimes referred to as an architectural schema, as its aim is to provide building blocks for the document schemas against which documents will be validated.

Schema Structures

You’ve seen three different approaches for creating schemas: declaring all elements and attributes within a single element (Russian doll), defining global elements that other declarations reference with the ref attribute, and defining named data types. In general, if you’re creating a schema specific to a document, the Russian doll approach works well.
If you’re creating a schema that you might use for several different document instances, it may be more flexible to use global definitions for at least some of your elements. If you always want an element to be referenced by the same name, then define it as an element. Where there’s a chance that elements with different names might be of the same structure, define a data type. For example, say you have a document that contains an address that you use for multiple purposes, such as a postal address, a street address, and a delivery address. One approach would be to reuse an <address> element throughout the document. However, if you want to
Slide 56: use the same element structure with different element names, it would be more appropriate to define a global address data type and use it for <postalAddress>, <streetAddress>, and <deliveryAddress> elements.

Schemas and Namespaces

The subject of XML schemas is so complex that it could take up an entire book. For now, let's discuss the relationship between schemas and namespaces. When defining a schema, it's possible to define the namespace within which an instance document must reside. You do this by using the targetNamespace attribute of the <xs:schema> element. If you do this, any reference to these elements within the schema must also use this namespace. It avoids complications if you define this as the default namespace of the XML schema. An example follows:

    <xs:schema targetNamespace="http://www.apress.com/schemas"
      xmlns="http://www.apress.com/schemas"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      elementFormDefault="qualified"
      attributeFormDefault="unqualified">

The example also sets the elementFormDefault attribute to qualified and the attributeFormDefault to unqualified. These attributes determine whether locally declared elements and attributes are namespace-qualified. A locally declared element is one declared inside a complex type element. Setting the elementFormDefault attribute to qualified means that the local elements in the instance document must be qualified, either with a prefix or through a default namespace declaration. Setting attributeFormDefault to unqualified means that locally declared attributes appear without a namespace prefix in the instance document, which is the usual convention in XML.

Assigning a Schema to a Document

Once you create a schema document, you need to reference it from the instance document so that a validating XML parser can validate the document. You can do this with either the schemaLocation or noNamespaceSchemaLocation attribute. Use the latter if the schema has no target namespace.
These attributes are part of a W3C-controlled namespace known as the XML Schema Instance namespace. This is normally referred to with the prefix xsi. You need to declare this namespace within the document instance. The schema document is not within a namespace, so use the noNamespaceSchemaLocation attribute in the example document element:

    <library xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="dvd.xsd">

You can find the completed document saved as dvd_schema.xml with your code download files. Note the syntax of the xsi:noNamespaceSchemaLocation attribute. In this case, the document uses a local reference to the schema document, but it could have used a fully qualified URI to find the schema document on the Internet.
Slide 57: If you use the schemaLocation attribute, the value is made up of a namespace URI followed by a URI that is the physical location of the XML schema document for that namespace. You can rewrite the document element to reference a namespace:

    <library xmlns="http://www.apress.com/schemas"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.apress.com/schemas
        http://www.apress.com/schemas/dvd.xsd">

You can use either a local reference or a fully qualified URI, as shown in the preceding example. It's worth noting that the value of the xsi:schemaLocation attribute can be any number of pairs of URIs, with the first part being the URI of a namespace and the second being the location of the associated XML schema. This allows you to associate several XML schema documents with one document instance.

Schemas and Entity Declarations

One of the advantages of using DTDs is that they provide a way to define custom entity references. As mentioned, these are not available when you use an XML schema to declare XML vocabularies. If you need to include entity references when using an XML schema, you can also include a DTD in your document instance. The XML schema is used for validation while the DTD declares entity references:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE library [
      <!ENTITY copyright "Copyright 2006 Apress">
    ]>
    <library xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="dvd.xsd">

Comparing DTDs and Schemas

You've seen how DTDs and XML schemas specify the rules for an XML vocabulary. While both types of documents serve the same purpose, there are some differences between them. A comparison of the two follows:
• DTDs and XML schemas both allow you to define the structure of an XML document so you can check it with a validating parser.
• DTDs allow you to define entities; you can't do this within XML schemas.
• XML schemas allow you to assign data types to character data; DTDs don’t. • XML schemas allow you to define custom data types; you can’t do this within DTDs. • XML schemas support the derivation of one data type from another; you can’t derive data types in DTDs. • XML schemas support namespaces; DTDs don’t support namespaces.
Slide 58: CHAPTER 2 ■ RELATED XML RECOMMENDATIONS 37 • XML schemas allow for modular development by providing <xsd:include> and <xsd:import>; DTDs don’t offer similar functionality. • XML schemas use XML markup syntax so you can create and modify them with standard XML processing tools; DTDs don’t follow XML vocabulary construction rules. • DTDs use a concise syntax that results in smaller documents; XML schemas use less concise syntax and usually create larger documents. • The XML schema language is newer than the DTD specification and has addressed some of DTDs’ weaknesses. DTDs and XML schemas are two of the many available schema languages. In some circumstances, it can be useful to consider alternative types of schemas. Other Schema Types Both DTDs and XML schemas are examples of closed schema languages. In other words, they forbid anything that the schema doesn’t allow explicitly. The XML schema language offers some extensibility, but it’s still fundamentally a closed language. Other schema languages are open, allowing additional content that the schema doesn’t forbid explicitly. You can use these languages either as an alternative to DTDs or XML schemas, or as an addition. Their processing occurs after the processing of the closed schema. You may wish to use an alternative schema type if you wish to impose a constraint that isn’t possible using a DTD or XML schema. For example, a tax system may have the following rule: “If the value of gender is male, then there must not be a MaternityPay element.” An application often includes such business rules, but a different schema type might allow you to represent the constraint more easily. 
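To give a feel for how a rule-based schema language might express the tax rule above, here is a minimal sketch in Schematron. The <employee> element name and the assumption that gender is a child element are mine, not part of any real tax vocabulary, and the sketch uses the ISO Schematron namespace rather than the older one in use when this chapter was written.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <!-- Applies to every (hypothetical) employee element whose gender is male -->
    <sch:rule context="employee[gender = 'male']">
      <!-- The assertion fails, and the message is reported, if a MaternityPay child exists -->
      <sch:assert test="not(MaternityPay)">
        An employee whose gender is male must not have a MaternityPay element.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Both the context and test attributes hold XPath expressions, which is why this kind of cross-element constraint is easier to state here than in a DTD or XML schema.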
Examples of these alternative schema languages include
• Schematron: http://www.ascc.net/xml/resource/schematron/schematron.html
• REgular LAnguage for XML Next Generation (RELAX NG): http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=relax-ng
• XML-Data Reduced (XDR): http://www.ltg.ed.ac.uk/~ht/XMLData-Reduced.htm

Schematron uses XSLT and XPath, so you can embed Schematron declarations in an XML schema document to expand its scope. I'll explain more about XSLT and XPath in this chapter's "Understanding XSLT" and "XPath" sections. There are currently many different XML vocabularies in use. The next section introduces you to some popular vocabularies.

XML Vocabularies

In this chapter, you've seen how to define an XML vocabulary using a DTD or XML schema. Many XML vocabularies have become industry standards, so before defining your own language, it might be worthwhile to see what vocabularies already exist. You've already seen some XML vocabularies such as XHTML and XML schema, and I'll show you more in Chapter 3. Table 2-2 lists some common XML vocabularies.
Slide 59: Table 2-2. Common XML Vocabularies

XML Language | Use | Reference
Architecture Description Markup Language (ADML) | Provides interoperability of architecture information | http://www.opengroup.org/architecture/adml/adml_home.htm
Chemical Markup Language (CML) | Covers macromolecular sequences to inorganic molecules and quantum chemistry | http://www.xml-cml.org/
Common Picture eXchange environment (CPXe) | Enables the transmission of digital pictures, orders, and commerce information | http://www.i3a.org/i_cpxe.html
Electronic Business XML (ebXML) | Allows enterprises to conduct business using the Internet | http://www.ebxml.org/
Flexible Image Transport System Markup Language (FITSML) | XML specification for astronomical data, such as images, spectra, tables, and sky atlases | http://www.service-architecture.com/xml/articles/nasa.html
Open Building Information Exchange (oBIX) | Enables enterprise applications to communicate with mechanical and electrical systems in buildings | http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=obix
Mathematical Markup Language (MathML) | Describes mathematics | http://www.w3.org/Math/
Meat and Poultry XML (mpXML) | Used for exchanging business information within the meat and poultry supply-and-marketing chain | http://www.mpxml.org/about/
Market Data Definition Language (MDDL) | Enables sharing of stock market information | http://www.mddl.org/default.asp
Synchronized Multimedia Integration Language (SMIL) | Coordinates the display of multimedia on web sites | http://smw.internet.com/smil/smilhome.html
Scalable Vector Graphics (SVG) | Describes vector shapes | http://www.w3.org/TR/SVG/
eXtensible Business Reporting Language (XBRL) | Enables electronic communication of business and financial data | http://www.xbrl.org/Home/

Now that you've seen some examples of XML vocabularies, it's time to discover how to display the content within XML documents.
Displaying XML At some stage, you’re likely to need to display the contents of an XML document visually. You might need to see the contents in a web browser or print them out. In the DVD example, you also might want to refine the display so that you see just a list of the titles. You might even want to sort the document by alphabetical order of titles or by genre. In this section, I’ll introduce the XML document display technologies: CSS and XSLT.
Slide 60: CHAPTER 2 ■ RELATED XML RECOMMENDATIONS 39 XML and CSS You can use CSS with XML in exactly the same way that you do with XHTML. This means that if you know how to work with CSS already, you can use the same techniques with XML. I’ll discuss CSS and XML in more detail in Chapter 5; this section just covers some of the main points. To display an XML document with CSS, you need to assign a style to each XML element name just as you would with XHTML. In XML, one difference is that the stylesheet is associated with an XML document using a processing instruction placed immediately after the XML declaration: <?xml-stylesheet type="text/css" href="style.css"?> In XHTML pages, the text that you wish to style is character data. With XML, that might not be the case. For example, the content might consist of numeric data that a human can’t easily interpret visually. When working in CSS, it’s not easy to add explanatory text when rendering the XML document. This limitation might not be important when you’re working with documents that contain only text, but it might be a big consideration when you’re working with other types of content. Another limitation of CSS is that it mostly renders elements in the order in which they appear in the XML document. It’s beyond the scope of CSS to reorder, sort, or filter the content in any way. When displaying XML, you may need more flexibility in determining how the data should be displayed. You can achieve this by using XSL. XSL Extensible Stylesheet Language (XSL) is divided into two parts: XSL Transformations (XSLT) and XSL Formatting Objects (XSL-FO). The former transforms the source XML document tree into a results tree, perhaps as an XHTML document. The latter applies formatting, usually for printed output. Figure 2-3 shows how these two processes relate. Figure 2-3. Applying a transformation and formatting to an XML document Once the XSLT processor reads the XML document into memory, it’s known as the source tree. 
The processor transforms nodes in the source tree using templates in a stylesheet. This process produces result nodes, which together form a result tree. The result tree is also an XML document, although you can convert it to produce other types of output. The conversion process is known as serialization. As I mentioned earlier, the
Slide 61: 40 CHAPTER 2 ■ RELATED XML RECOMMENDATIONS result tree will usually be serialized as XHTML. You can also produce printed output from the result tree with XSL-FO. Nowadays, when someone refers to XSL, they’re usually referring to XSLT. You can use XSL-FO to produce a printed output, a PDF file, or perhaps an aural layout. Understanding XSLT I’ll delve into XSLT in much more detail in Chapters 6 and 7, but here I’ll work through a simple example so you can see the power of XSLT. You’ll see how to use XSLT to convert your DVD document into an XHTML page that includes CSS styling. This process is different from styling the XML content directly with CSS, which I’ll cover in Chapter 5. Earlier, you saw that CSS styles the source document using a push model, where the structure of the input defines the structure of the output. XSLT allows both a push model and a pull model, where the structure of the stylesheet defines the structure of the output. In this example, you’ll see how to use both. You’ll use the source document to define the display order, but the stylesheet will provide the structuring information. You’ll create a list of all DVDs to display in a table on an XHTML page, and you’ll add a little CSS styling to improve the appearance. You can find the files used in the example saved as dvd_XSLT.xml and dvdtoHTML.xsl. They are saved within this chapter’s ZIP file in the Source Code area of the Apress web site (http://www.apress.com). Figure 2-4 shows the web page produced by the XSLT stylesheet. Figure 2-4. The transformed dvd.xml document shown in Internet Explorer The web page is created by applying the following stylesheet to the source XML document: <?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="html" version="4.0"/> <xsl:template match="/"> <html> <head> <title>DVD Library Listing</title> <link rel="stylesheet" type="text/css" href="style.css"/> </head> <body>
Slide 62:
          <table width="40%">
            <tr>
              <th>Title</th>
              <th>Format</th>
              <th>Genre</th>
            </tr>
            <xsl:for-each select="/library/DVD">
              <xsl:sort select="genre"/>
              <tr>
                <td><xsl:value-of select="title"/></td>
                <td><xsl:value-of select="format"/></td>
                <td><xsl:value-of select="genre"/></td>
              </tr>
            </xsl:for-each>
          </table>
        </body>
      </html>
    </xsl:template>
    </xsl:stylesheet>

The stylesheet starts with an XML declaration. It uses the xsl prefix to denote the XSLT namespace, which is declared in the document element, <xsl:stylesheet>. You're also required to declare the version of XSLT that you're using, in this case 1.0:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

Next, the stylesheet declares the output type, in this case HTML 4.0:

    <xsl:output method="html" version="4.0"/>

You could also choose the output method xml or text. If you choose the output type xml, you can generate well-formed XML or XHTML. The output type text is useful if you want to create a comma-delimited file for import into a spreadsheet or database.

The next section of the stylesheet uses a template to generate the <html>, <head>, and opening <body> tags. I left out the DOCTYPE declaration to simplify the example:

    <xsl:template match="/">
      <html>
        <head>
          <title>DVD Library Listing</title>
          <link rel="stylesheet" type="text/css" href="style.css"/>
        </head>
        <body>
          <table width="40%">
            <tr>
              <th>Title</th>
              <th>Format</th>
              <th>Genre</th>
            </tr>
Slide 63: The first line specifies what nodes in the source tree the template matches. It uses an XPath expression to determine the node. You'll find out more about XPath a little later in the chapter. In this case, you're matching the root node, which is indicated by a slash (/).

■ Note Technically, the root node isn't the same as the root element. The root node is at a higher level in the document and has the root element as a child. This allows the stylesheet to access information in the prolog and epilog, as well as information in elements.

The template specifies what should happen when the XSLT processor encounters the root. In this case, the result tree includes the HTML tags indicated within the template. It should generate the following output:

    <html>
      <head>
        <title>DVD Library Listing</title>
        <link rel="stylesheet" type="text/css" href="style.css"/>
      </head>
      <body>
        <table width="40%">
          <tr>
            <th>Title</th>
            <th>Format</th>
            <th>Genre</th>
          </tr>

The result tree sets up the HTML document and adds a link to an external CSS stylesheet called style.css. The closing </table> and </body> tags appear after the other content that you include.

The next section within the stylesheet includes each <DVD> element as a row in the table. It stays inside the same template rather than defining a new one. Because there are multiple <DVD> elements, it's appropriate to use an xsl:for-each statement:

    <xsl:for-each select="/library/DVD">
      <xsl:sort select="genre"/>
      <tr>
        <td><xsl:value-of select="title"/></td>
        <td><xsl:value-of select="format"/></td>
        <td><xsl:value-of select="genre"/></td>
      </tr>
    </xsl:for-each>

The xsl:for-each statement finds the <DVD> nodes using the XPath expression /library/DVD. In other words, start with the root node, locate the <library> element, and move to its <DVD> children. This statement retrieves all of the <DVD> nodes in the XML document.
Slide 64: CHAPTER 2 ■ RELATED XML RECOMMENDATIONS 43 The next statement dictates the sorting for the group of nodes using the xsl:sort statement. In this case, the stylesheet sorts in order of the genre. Because the template refers to the /library/DVD path, it’s appropriate to use a relative path to specify the <genre> node. Within the xsl:for-each statement, the xsl:value-of element selects a specific element for inclusion in the table cell. The stylesheet repeats the statement three times—one for each of the <title>, <format>, and <genre> elements. This transformation results in the following results tree: <html> <head> <title>DVD Library Listing</title> <link rel="stylesheet" type="text/css" href="style.css" /> </head> <body> <table width="40%"> <tr> <th>Title</th> <th>Format</th> <th>Genre</th> </tr> <tr> <td>Breakfast at Tiffany's</td> <td>Movie</td> <td>Classic</td> </tr> <tr> <td>Little Britain</td> <td>TV Series</td> <td>Comedy</td> </tr> <tr> <td>Contact</td> <td>Movie</td> <td>Science fiction</td> </tr> The remaining section of the stylesheet adds the closing </table>,</body>, and </html> tags: </table> </body> </html> </xsl:template> </xsl:stylesheet> If you want to see some of the power of XSLT, you can modify the stylesheet to change the sort order. You can also filter the content to display specific records; you’ll see this in Chapters 6 and 7.
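If you'd like to try the changes just mentioned, the following variation is one possible sketch. It assumes the same /library/DVD source structure as the example, filters the list with a predicate so that only DVDs with a format of Movie appear, and sorts by title in descending order instead of by genre. The page title and the reduced set of columns are my own choices for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" version="4.0"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>DVD Movies</title>
      </head>
      <body>
        <table width="40%">
          <tr>
            <th>Title</th>
            <th>Genre</th>
          </tr>
          <!-- The predicate keeps only DVDs whose format is Movie -->
          <xsl:for-each select="/library/DVD[format='Movie']">
            <!-- Sort by title, Z to A, instead of by genre -->
            <xsl:sort select="title" order="descending"/>
            <tr>
              <td><xsl:value-of select="title"/></td>
              <td><xsl:value-of select="genre"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

With the three DVDs shown in the result tree above, this version should list Contact first and Breakfast at Tiffany's second, and leave out Little Britain because its format is TV Series.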
Slide 65: 44 CHAPTER 2 ■ RELATED XML RECOMMENDATIONS XSLT Summary This section shows some of the functionality of XSLT, and you should remember these key points: • CSS applies styles to an XML document based on the current structure of the document tree. This is called a push model. • XSLT can transform a source XML document into any well-formed XML document that can be serialized as XML, HTML, or text. • XSLT stylesheets can produce a result tree in a different order from the source tree. • XSLT can add text and markup during the transformation. • XSLT is template-based, making it mainly a declarative language. • XSLT makes extensive use of XPath to locate nodes in the source tree. I’ve mentioned XPath during this discussion of XSLT, so it’s worthwhile exploring it in a little more detail. XPath You saw that the XSLT stylesheet relied heavily on the use of XPath to locate specific parts of the source XML document tree. Other recommendations, such as XPointer, also rely on the XPath specification, so it’s useful to have an understanding of the basics. One important thing to realize is that XPath doesn’t use XML rules to construct expressions. You use XPath by writing expressions that work with the XML document tree. Applying an XPath expression to a document returns one of the following: • A single node • A group of nodes • A Boolean value • A floating point number • A string XPath expressions can’t address the XML declaration in a document because it isn’t part of the document tree. They also don’t address embedded DTD declarations or blocks of CDATA. XPath treats an XML document as a hierarchical tree made up of nodes. Each tree contains • Element nodes • Attribute nodes • Text nodes
Slide 66:
• Processing instructions
• Comments
• Namespaces

The root node is the starting point for the XML document tree, and there's only one root node in an XML document. The document element is itself a node in the tree, and it's a child of the root node. Other children of the root node include processing instructions and comments outside of the document element. You write XPath expressions to locate specific nodes in the tree.

XPath Expressions

XPath expressions use an axis name and two colon characters (::) to identify nodes in the XML document:

    /axis::nodetest[predicate]

XPath expressions include location paths that you read from left to right to identify the different parts of an XML document. The expression separates each step in the path with a slash (/):

    /axis::nodetest[predicate]/axis::nodetest[predicate]

These paths indicate how nodes relate to each other and their context. The starting point of the path provides the context for the node. Starting the path with a slash means that the root node provides the context. The processor evaluates XPath expressions without this leading character against the current node. The axis or axes used in the path describe these relationships. The nodetest identifies the node to select. It may optionally include one or more predicates that filter the selection.

The following expression refers to any <DVD> descendants of the root node. The root node provides the context. The descendant axis specifies that the expression should select <DVD> elements at any depth below the context node, not just its children:

    /descendant::DVD

XPath recognizes the following axes:
• ancestor
• ancestor-or-self
• child
• descendant
• descendant-or-self
• following
• following-sibling
• preceding
Slide 67:
• preceding-sibling
• parent
• self

XPath 1.0 also defines the attribute and namespace axes; Table 2-3 uses the attribute axis. The axis names are self-explanatory; it's beyond the scope of this book to go into them in too much detail. It's worth mentioning, however, that you can write a shortened form of XPath expressions for the child, attribute, parent, and self axes. Table 2-3 provides some examples of the long and short forms of expressions.

Table 2-3. Examples of Long and Short Forms of XPath Expressions

Long Form | Abbreviation
child::DVD | DVD
DVD/attribute::id | DVD/@id
self::node() | .
parent::node() | ..

You saw the use of abbreviated XPath expressions in the previous section on XSLT. For example, you could refer to the <DVD> nodes using /library/DVD. When you want to refer to a child node, use title rather than child::title.

Identifying Specific Nodes

XPath allows you to navigate to a specific node within a collection by referring to its position:

    /library/DVD[2]

This expression refers to the second <DVD> node within the <library> node. You also can apply a filter within the expression. The predicate follows the node test directly, with no slash between them:

    /library/DVD[genre='Comedy']

The preceding expression finds the <DVD> nodes with a child <genre> node containing Comedy.

Including Calculations and Functions

XPath expressions can include mathematical operations, and you can use the + (addition), - (subtraction), * (multiplication), div (division), and mod (modulus) operators. Obviously, you can't use the / symbol for division because it's included in the location path. These expressions might be useful if you want to carry out calculations during a stylesheet transformation. You can also include functions within XPath expressions. These include node set, string, Boolean, and number functions. Again, it's beyond the scope of this book to explore these in detail, but it's useful to know that they exist. If you want to find out more about the XPath recommendation, visit http://www.w3.org/TR/1999/REC-xpath-19991116.
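One way to experiment with these expressions is to place them in the select attributes of a small XSLT stylesheet that writes plain text. The following sketch assumes a source document with the /library/DVD structure and id attributes used in this chapter; the particular selection of expressions is mine, chosen to touch each feature described above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Long form and abbreviation select the same id attribute -->
    <xsl:value-of select="/child::library/child::DVD[1]/attribute::id"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="/library/DVD[1]/@id"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Position: title of the second DVD -->
    <xsl:value-of select="/library/DVD[2]/title"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Filter combined with a node-set function -->
    <xsl:value-of select="count(/library/DVD[genre='Comedy'])"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Arithmetic: division and modulus are written as the words div and mod -->
    <xsl:value-of select="count(/library/DVD) div 2"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="count(/library/DVD) mod 2"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Each xsl:value-of writes the string value of its expression, and each xsl:text element adds a line break, so the output is one result per line. The first two lines should always match, because the abbreviated path is only a shorter spelling of the long form.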
Slide 68: XPath Summary

The following list summarizes the main points to consider when working with XPath expressions:
• You can use XPath in XSLT stylesheets and XPointers to specify a location in an XML tree.
• XPath expressions identify the location using an axis name, a node test, and, optionally, a predicate. The expressions read from left to right with each point in the path separated by a forward slash (/).
• You can abbreviate some XPath expressions to use a shortened form.
• You can include mathematical operators and functions within an XPath expression if you want to perform calculations during a transformation.

You saw earlier that XPath expressions specify locations in XSLT stylesheets. These expressions can also be used in XPointers, which allow an XLink to point to a specific location within a document. Before we see this, let's look at XLinks.

Linking with XML

XLinks provide a powerful alternative to traditional XHTML links. XHTML links allow you to link from a source to a destination point, in one direction. XLinks allow you to
• Create two-way links
• Create links between external documents
• Change the behavior of links so that they trigger when a page loads
• Specify how the linked content displays

You can find out more about the W3C XLink recommendation at http://www.w3.org/TR/2001/REC-xlink-20010627/. The XPointer recommendation is split into the element() scheme (http://www.w3.org/TR/2003/REC-xptr-element-20030325/), the framework (http://www.w3.org/TR/2003/REC-xptr-framework-20030325/), and the xmlns() scheme (http://www.w3.org/TR/2003/REC-xptr-xmlns-20030325/). At the time of writing, a fourth recommendation is in development: the xpointer() scheme (http://www.w3.org/TR/2002/WD-xptr-xpointer-20021219/). This recommendation adds advanced functionality to XPointer, including the ability to address strings, points, and ranges within an XML document. Currently, XML tools offer very limited support for XLink and XPointer.
However, the recommendations are important and their usage is likely to be extended in the future, so it’s worthwhile having an understanding of how they fit into the XML framework. Let’s start by looking at the two different types of XLink that you can create: simple and extended.
Slide 69: Simple Links

A simple link connects a single source to a single target, much like an XHTML link. Before you can include an XLink, the XML document that includes the XLink must also include a reference to the XLink namespace. You can do this in the document element as follows:

    <?xml version="1.0"?>
    <library xmlns:xlink="http://www.w3.org/1999/xlink">

By convention, developers use xlink as the prefix for this namespace. In XHTML, the <a> element indicates a link. Web browsers understand the meaning of this element and display the link accordingly. In XML, you can add a link to any element within the XML document. Let's look at an example of a simple link:

    <elementName xlink:type="simple"
      xlink:href="http://www.apress.com"
      xlink:title="Apress"
      xlink:show="replace"
      xlink:actuate="onRequest">
      Here is a linked element
    </elementName>

This XLink provides a link to http://www.apress.com. It includes an xlink:type attribute indicating that it's a simple link. It uses the attribute xlink:href to provide the address of the link. The link has a title that is intended to be read by humans.

The XLink includes an xlink:show behavior of replace, which indicates that the linked resource should replace the current document. You could also specify xlink:show="new", which is akin to the XHTML target="_blank". Other values include embed, other, and none. Choosing embed is similar to embedding an image in an XHTML page: the target resource replaces the link definition in the source. A value of other leaves the link action up to the implementation and indicates that it should look for other information in the link to determine its behavior. The value none also leaves the behavior up to the implementation, but with no hints in the link.

The xlink:actuate attribute determines when the link opens. In this example, using onRequest indicates that the document will await user action before activating the link.
The attribute could also use values of onLoad, other, or none. Setting the attribute value to onLoad causes the link to be followed immediately after the resource loads. You could use this value with xlink:show="embed" to create a display from a set of linked source documents. The values other and none have the same meanings as in the xlink:show attribute. The preceding example creates a link that’s very similar to a traditional XHTML link, with some additional capabilities. An extended XLink offers much more powerful capabilities.
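To illustrate the combinations described above, here is a small sketch of a document that contains two simple links with different behaviors. The <review> and <moreInfo> element names, the reviews/tiffanys.xml file, and the example.com address are all hypothetical; only the xlink attributes and their values come from the XLink recommendation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<library xmlns:xlink="http://www.w3.org/1999/xlink">
  <DVD id="1">
    <title>Breakfast at Tiffany's</title>
    <!-- Hypothetical element: asks for the review to be embedded as soon as the document loads -->
    <review xlink:type="simple"
            xlink:href="reviews/tiffanys.xml"
            xlink:title="Review of Breakfast at Tiffany's"
            xlink:show="embed"
            xlink:actuate="onLoad"/>
    <!-- Hypothetical element: asks for a new window, opened only when the user requests it -->
    <moreInfo xlink:type="simple"
              xlink:href="http://www.example.com/dvds/1"
              xlink:title="More information"
              xlink:show="new"
              xlink:actuate="onRequest">More about this DVD</moreInfo>
  </DVD>
</library>
```

Keep in mind that these attributes only state the intended behavior. Whether anything happens depends on the application that processes the document, and tool support for XLink remains limited.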
Slide 70: CHAPTER 2 ■ RELATED XML RECOMMENDATIONS 49 Extended Links Extended links provide much more complex linking abilities. You can • Link more than two resources • Create a link between resources outside of the source (out-of-line linking) • Separate the direction of the link from the definition of the resources being linked Currently, no web browser supports extended XLinks, so I’ll give you a brief introduction only. To use extended links, you must use more than one element and several attributes. Let’s start by looking at how you could link more than two resources. Linking More Than Two Resources Web developers often create links that effectively move from a single point to multiple destinations. You can see this in the following analogy. Consider a web site for DVD movies. Any page providing information about a single DVD might contain references to other pages about the actors or the director. For example, if you’re looking at The Lord of the Rings: The Fellowship of the Ring, you might want to see other films starring Sir Ian McKellen. The link from this page goes to multiple destinations, each referring to a film including the actor. In XHTML, you could write several links to the other films starring Sir Ian McKellen. In XML, you can use a single extended link. XLink doesn’t define the presentation of these links. You could use an XSLT stylesheet to display them as a list of XHTML links or a drop-down list. Out-of-Line Linking When you use XHTML links and simple XLinks, you define the link at its source point. With an extended XLink, you can define both the source and destination from an unrelated point. You don’t need to include the link in either the source or the destination document. This could be useful if you need to add links from documents where you don’t have write permission. You can effectively build your own links to other people’s documents. Out-of-line links are likely to be useful to build up a set of information resources. 
You can also update links more easily because they’re stored in a single location. Separating the Direction of the Link from the Resource Definitions In an extended link, the xlink:type="locator" attribute identifies elements participating in the link. Elements with the xlink:type of arc define the connections. This construction allows you to traverse links in both directions, rather than having the fixed source and target present in the simple link. Returning to the DVD example, you can define extended XLinks that can be followed either way. You can use the link to find out which actors appeared in a film. You can also follow a link from the actors to the films they’ve appeared in or see which other actors appeared in the same film. All you need to do is build a “link database” containing a list of all the linked resources and the definitions of a set of arcs to be followed. A simple example follows:
Slide 71:

    <allFilms xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
      <film xlink:type="locator" xlink:href="fellowshipofthering.xml" xlink:label="fellowship"/>
      <actor1 xlink:type="locator" xlink:href="ianmckellen.xml" xlink:label="actor1"/>
      <actor2 xlink:type="locator" xlink:href="elijahwood.xml" xlink:label="actor2"/>
      <arcName xlink:type="arc" xlink:from="fellowship" xlink:to="actor1"/>
      <arcName xlink:type="arc" xlink:from="fellowship" xlink:to="actor2"/>
      <arcName xlink:type="arc" xlink:from="actor1" xlink:to="actor2"/>
      <arcName xlink:type="arc" xlink:from="actor2" xlink:to="actor1"/>
    </allFilms>

So far, you’ve seen XLinks that link to a complete resource. Now it’s time to discuss the role of XPointers, which allow you to link to a specific section within an XML document.

XPointer

In the preceding section, all of the link examples referred to complete documents. However, you may want the source or destination to be a point within a document or a part of a document. You can achieve this using XPointers. In a way, this is similar to using an anchor within an XHTML link:

    <a href="movies.htm#fellowshipofthering">

When someone clicks this link, the document loads and positions the screen at the named anchor fellowshipofthering. If you use an XPointer, you don’t need to mark part of the document with a named anchor. Instead, you can use the following construction:

    <xlink:simple xmlns:xlink="http://www.w3.org/1999/xlink"
      xlink:href="movies.xml#xpointer(/library/DVD/title[5])"
      xlink:title="Fellowship of the Ring"
      xlink:show="replace"
      xlink:actuate="onRequest"/>

The XPointer appears at the end of the xlink:href attribute and uses the keyword #xpointer. It includes an XPath expression to identify the destination for the link. In this case, you’re linking to the fifth <title> node within the <DVD> node in the <library> node. Because you don’t need to add a named anchor to the destination document, you can be more flexible when creating out-of-line extended links.
XPointer also allows you to specify a range of locations to view a small part of a large document. You can use the xlink:show="embed" attribute with an XPointer to embed a specific fragment of one XML document within another. You can do this without altering any of the source documents. I’m sure you can see how much more flexibility this approach to linking offers.
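Because no browser follows extended links, it can help to see how a program might read the allFilms link database shown earlier in this section. The following is a minimal sketch in Python rather than anything defined by XLink itself. It assumes the xlink prefix is declared on the <allFilms> element (the parser needs that declaration), and the names LINKBASE and destinations are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# The link database from this section, with the xlink namespace declared
# on the root element (an assumption needed for namespace-aware parsing).
LINKBASE = """
<allFilms xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
  <film xlink:type="locator" xlink:href="fellowshipofthering.xml" xlink:label="fellowship"/>
  <actor1 xlink:type="locator" xlink:href="ianmckellen.xml" xlink:label="actor1"/>
  <actor2 xlink:type="locator" xlink:href="elijahwood.xml" xlink:label="actor2"/>
  <arcName xlink:type="arc" xlink:from="fellowship" xlink:to="actor1"/>
  <arcName xlink:type="arc" xlink:from="fellowship" xlink:to="actor2"/>
  <arcName xlink:type="arc" xlink:from="actor1" xlink:to="actor2"/>
  <arcName xlink:type="arc" xlink:from="actor2" xlink:to="actor1"/>
</allFilms>
"""


def xlink(name):
    """Return the namespace-qualified attribute name ElementTree uses."""
    return "{%s}%s" % (XLINK, name)


def destinations(linkbase_xml, start_label):
    """List the resources reachable in one step from the labeled resource."""
    root = ET.fromstring(linkbase_xml)
    locators = {}   # label -> href
    arcs = []       # (from label, to label) pairs, in document order
    for element in root:
        kind = element.get(xlink("type"))
        if kind == "locator":
            locators[element.get(xlink("label"))] = element.get(xlink("href"))
        elif kind == "arc":
            arcs.append((element.get(xlink("from")), element.get(xlink("to"))))
    return [locators[to] for source, to in arcs if source == start_label]


print(destinations(LINKBASE, "fellowship"))
print(destinations(LINKBASE, "actor1"))
```

The locators only name the resources, while the arcs decide which way you can travel between them. That is why the same database answers both "who appeared in this film?" and "which other actor appeared alongside this one?".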
Slide 72: XML Links Summary

XLink and XPointer combine to provide powerful linking opportunities, which, unfortunately, aren’t yet supported in web browsers. In this section, I’ve only scratched the surface of what’s possible. The following list summarizes the main points about XML links:

• In XHTML, a link has a fixed behavior. You click on the link to arrive at a destination. In XML, you can specify additional behaviors.
• XHTML links have a single source and a single destination. XML links can have multiple destinations.
• XHTML links always link from the anchor (<a>) element to a destination. XML links can be bidirectional from any element.
• XHTML links use source and destination points that are embedded in documents. XML links can be completely separate from either end point.
• To link to a specific location in the destination point, XHTML links require the inclusion of a named anchor in the destination document. XLinks can use an XPointer containing an XPath expression instead and don’t need to modify the destination document in any way.
• In XHTML, a named anchor refers to a single point in the destination document. In XML, you can use XPointers to refer to a portion of the document.

Summary

In this chapter, I’ve covered some of the related XML recommendations from the W3C, including the role of namespaces, the use of DTDs and XML schemas in specifying XML vocabularies, and the application of XSLT in transforming XML documents for different purposes. I’ve also provided a brief introduction to XPath and shown you some of the main points about XLinks and XPointers. In Chapter 3, I’ll show you some web-specific XML vocabularies and examine XHTML, Mathematical Markup Language (MathML), Scalable Vector Graphics (SVG), and web services in detail.
Slide 74: CHAPTER 3 Web Vocabularies As XML grows in popularity, the number of XML vocabularies used within various industry and community sectors increases. These groups use XML to store database information, exchange structured information, and even describe concepts. XML is a mechanism for storing data. When first applied to the web, XML addressed many of the shortcomings associated with HTML. Although you can view any XML document on the web, some vocabularies were created specifically for this medium. In this chapter, I’ll focus on web vocabularies such as • XHTML • Mathematical Markup Language (MathML) • Scalable Vector Graphics (SVG) • Web services (WSDL and SOAP) You can use these vocabularies in web browsers and other web-enabled devices. You can download the files referred to in this chapter from the Source Code area of the Apress web site (http://www.apress.com). Let’s start with a closer look at XHTML. XHTML XHTML is probably the most widespread web vocabulary of all; web developers have been using it for several years. XHTML enjoys support in modern web browsers such as Internet Explorer 6 for Windows, Mozilla Firefox 1.x for Windows, and Safari 1.x for Macintosh. The W3C states that XHTML is HTML reformulated in XML. XHTML 1.0 is nothing other than HTML 4.01 in XML syntax. It’s an XML-compliant version of HTML. XHTML is a great starting point for a discussion of XML vocabularies. XHTML provides a number of benefits compared with HTML. First, XHTML separates presentation from content. In XHTML, content is made up of data as well as the structural elements that organize that data. HTML was concerned with both information and its display, whereas its replacement, XHTML, is concerned with both information and the way it’s structured. XHTML also uses much stricter construction rules compared with HTML, as XHTML web pages must be well formed. You learned about well-formed documents in Chapter 1. 53
Slide 75: 54 CHAPTER 3 ■ WEB VOCABULARIES Because XHTML is based on XML, you can use XML-specific tools and technologies to create modular documents. Throughout the chapter, I’ll show you how to merge other vocabularies into XHTML. Let’s begin by looking more closely at the benefits of XHTML. Separation of Presentation and Content The separation of content from presentation is perhaps the single most important concept in web development today. This fundamental principle underpins most modern web specifications. Content refers to the basic data and structures that make up a document. Within XHTML, this includes elements such as headings, paragraphs, tables, and lists. Presentation determines how these structures appear within the viewing device and might include font faces, colors, borders, and other visual information. Cascading Style Sheets (CSS) control the presentation of a document. ■ Note When working with XML applications, you can separate the content into both data and data structures. In XML applications, an XML document supplies the data, while XSLT stylesheets provide the structure. You still apply styling through CSS stylesheets. It’s important to separate content from presentation because it allows you to repackage the content for different audiences. If you want to provide the same information to a web browser, a mobile phone, and a screen reader, the presentation layer must be different for each device. You can achieve this by excluding the presentation of information from web documents. Separating presentation from content has four major benefits: • Accessibility • Targeted presentation using stylesheets • Streamlined maintenance • Improved processing Let’s look at each of these benefits in more detail. Accessibility In recent times, the W3C has focused on making XHTML more accessible to people with disabilities. For example, people with visual impairments can use screen readers and voice browsers when working with XHTML documents. 
Documents that follow the XHTML construction rules often require little or no change before users can access them with a screen reader. Many countries have legislation requiring web sites to be accessible to people with disabilities. In the United States, Section 508 of the Rehabilitation Act of 1973 requires federal agencies to make their electronic information accessible to people with disabilities.
Slide 76: You can find out more about this regulation at http://www.usability.gov/accessibility/. The W3C Web Accessibility Initiative web site (http://www.w3.org/WAI/) provides information about how to make web sites accessible. The site includes quick tips for accessibility (http://www.w3.org/WAI/References/QuickTips/), as well as a list of tools to help you evaluate whether your site is currently accessible (http://www.w3.org/WAI/ER/existingtools.html). By separating the visual elements from the actual content of your page, you make the content instantly more accessible. Screen readers and other text-based browsers, such as Lynx for Unix and Linux, can interpret the flow of the document easily. Ultimately, users of your site will have a better experience.

Targeted Presentation

If you separate the presentation layer from your content, you’ll be able to target its appearance for specific devices. You can do this by storing all style information within a stylesheet and linking a specific stylesheet for each device that you want to support. Storing the style information in one place makes it easier to reuse stylesheets and maintain a consistent look. Several types of stylesheets exist, but the most popular are CSS and XSLT. I’ll explain these stylesheets in detail in Chapters 5 to 7.

Streamlined Site Maintenance

Storing the content and structure separately from the presentation layer makes it easier to maintain your web site. Pages no longer contain presentational elements mixed in with the XHTML structures and data. When working through long blocks of code, you only need to concern yourself with the structural elements because the presentation layer exists elsewhere. This streamlines the site maintenance process and speeds up workflow.

Improved Processing

Accessibility and targeted presentation were important concerns in HTML even before XHTML was introduced.
XHTML, however, directly addresses the need for an improved processing model. Because the rules for XML are so strict, processing XHTML documents becomes easier than processing its predecessor HTML. Software programs can perform XML-related tasks, such as designing XSLT stylesheets. See the WYSIWYG XSLT Designer by Stylus Studio (http://www.stylusstudio.com/xhtml.html) for one such example. Because the rules for constructing HTML were less strict than XHTML rules, it was possible for HTML pages to contain mistakes that didn’t affect their display. For example, you could leave out a closing </body> tag but still be able to view the page within a browser. In addition, some web browsers rendered elements slightly differently, so browser manufacturers started adding proprietary extensions to their browsers. Ultimately, this led to incompatible browsers and lack of compliance with the HTML specification. You can instruct more recent browser versions and software tools to discard XHTML documents that aren’t authored correctly and don’t use valid, well-formed XHTML. Modern browsers feature improved page processing because they don’t need to deal with malformed documents.
Slide 77: 56 CHAPTER 3 ■ WEB VOCABULARIES Cell phones and personal digital assistants (PDAs) are capable of viewing web documents using either Wireless Markup Language (WML) or XHTML Basic. WML is an XML vocabulary for Wireless Application Protocol (WAP)-enabled phones, and XHTML Basic is a cut-down version of XHTML that includes only basic markup and text. XHTML Basic was created using XHTML’s modularization framework, which I’ll discuss in more detail in the “XHTML Modularization” section. XHTML Construction Rules The rules for constructing XHTML pages are a little different compared with HTML pages. You must follow these rules in XHTML: • Include a DOCTYPE declaration specifying that the document is an XHTML document. • Optionally include an XML declaration. • Write all tags in lowercase. • Close all elements. • Enclose all attributes in quotation marks. • Write attributes in full (i.e., don’t minimize attributes). • Use the id attribute instead of name. • Nest all tags correctly. • Specify character encoding. • Specify language. You’ll see how these rules are applied in the following section, which covers DOCTYPE declarations. I’ll also work through some sample XHTML documents. DOCTYPE Declarations In any web vocabulary, you need to determine which elements and attributes are valid. In Chapter 2, you saw how you can do this using Document Type Definitions (DTD) and XML schemas. XHTML 1.0 allows for three different DOCTYPE declarations that determine which DTD to use. You can write the following XHTML documents: • Transitional • Strict • Frameset A DOCTYPE declaration tells a validator how to check your web page. It also instructs a web browser to render your page in standards-compliant mode. Using an outdated or incorrect DOCTYPE makes browsers operate in “Quirks” mode, where they assume that you’re writing old-style HTML.
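To make the role of the DOCTYPE declaration concrete, here is a minimal sketch of how a tool might read it and report which of the three XHTML 1.0 DTDs a page claims to follow. It uses Python's standard xml.dom.minidom module, which is my choice for illustration and not something XHTML prescribes. The function name document_type and the SAMPLE page are hypothetical. The three public identifiers are the ones used in the DOCTYPE declarations in this chapter.

```python
from xml.dom import minidom

# Public identifiers of the three XHTML 1.0 DTDs.
XHTML_DOCTYPES = {
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "XHTML 1.0 Transitional",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XHTML 1.0 Strict",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "XHTML 1.0 Frameset",
}


def document_type(markup):
    """Report the XHTML 1.0 document type named by the DOCTYPE, if any."""
    doctype = minidom.parseString(markup).doctype
    if doctype is None:
        return "no DOCTYPE"
    return XHTML_DOCTYPES.get(doctype.publicId, "unrecognized DOCTYPE")


# A tiny illustrative page; the parser reads the DOCTYPE but does not
# download the DTD it points to.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Mars Travel</title></head>"
    "<body><p>Hello</p></body></html>"
)

print(document_type(SAMPLE))
```

This only identifies the declared document type. Checking that the page really obeys that DTD is the job of a validator.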
Slide 78: XHTML 2.0

At the time of writing, the W3C had prepared a working draft of the XHTML 2.0 specification (http://www.w3.org/TR/xhtml2/). This vocabulary removes backward compatibility and all presentation elements in favor of stylesheets. It allows for more flexible organization using sections and headers, and it introduces separator and line elements, as well as navigation lists. It allows every element to act as a link and overhauls tables and forms.

The most recent XHTML specification, XHTML 1.1, became a recommendation in May 2001. It has only one document type to choose from: XHTML 1.1, which is very similar to XHTML 1.0 strict. Each of these four document types has a slightly different set of allowable elements. Choosing the right type of document should be the first step in building your XHTML page. I’ll explain each of these document types in more detail. The examples in this chapter show you how to create pages for an imaginary web site called “Mars Travel.” I’ll keep the examples simple so you can focus on the XHTML content.

Transitional XHTML Documents

You use the transitional document type for web sites that need to work in many different web browsers, because it supports the deprecated elements not allowed in the strict DTD. If you’re not ready or able to remove all presentation from your documents, you should use the transitional DTD. Let’s look at an example of some transitional markup:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Mars Travel</title>
      </head>
      <body bgcolor="#FFFFFF">
        <h1 align="center">Mars Travel<br />
          <i>Visits to a faraway place </i>
        </h1>
        <hr width="100%" />
        <h2 align="center">Your spacecraft</h2>
        <p align="center">
          Your spacecraft is the Mars Explorer, which provides the latest in
          passenger luxury and travel speed.
        </p>
        <hr width="100%" />
        <p align="center">XHTML 1.0 Transitional Document</p>
      </body>
    </html>
Slide 79: You can find this document saved as marstransitional.htm with your code download from the Source Code area of the Apress web site (http://www.apress.com). The document begins with an XML declaration:

    <?xml version="1.0" encoding="UTF-8"?>

XHTML documents don’t require an XML declaration, but it’s recommended that you include one. If you include the declaration, web browsers can check that the document is well formed. Immediately following the XML declaration, a DOCTYPE declaration tells the web browser exactly what kind of document you’re writing:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

The document root <html> contains a reference to the XHTML namespace:

    <html xmlns="http://www.w3.org/1999/xhtml">

The markup is well-formed XML, but it still contains some presentational information, in particular the align and bgcolor attributes. If you open the file in a web browser, you should see something like the screen shot shown in Figure 3-1.

Figure 3-1. The marstransitional.htm page displayed in Internet Explorer

The XHTML transitional DTD can be useful if you need to support older browsers. Otherwise, you should try to use the strict or XHTML 1.1 document types.
Slide 80: Strict XHTML Documents

Strict XHTML documents allow you to work with only structural tags, such as headings (<h1>, <h2>, <h3>, <h4>, <h5>, <h6>), paragraphs (<p>), and lists (<ul>, <ol>, <dl>). All of the presentational elements and attributes, such as align and bgcolor, are removed. The XHTML 1.1 specification has also completely removed presentational markup. In both strict and XHTML 1.1 document types, you should always use stylesheets to control how your document appears in various browsers. Let’s look at a sample of a strict XHTML document:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Mars Travel</title>
        <link href="styles.css" type="text/css" rel="stylesheet" />
      </head>
      <body>
        <h1>Mars Travel<br />
          <em>Visits to a faraway place </em>
        </h1>
        <hr />
        <h2>Your spacecraft</h2>
        <p class="centered">
          Your spacecraft is the Mars Explorer, which provides the latest in
          passenger luxury and travel speed.
        </p>
        <hr />
        <p class="footer">XHTML 1.0 Strict Document</p>
      </body>
    </html>

You can find this file saved as marsstrict.htm with your resources. The strict XHTML document doesn’t contain any presentational markup. Instead, it contains a link to a stylesheet called styles.css, which holds the presentational rules. It also replaces the presentational <i> element with the structural <em> element. If you view the file in a web browser, it will look much the same as the first XHTML document. The styles.css stylesheet contains the following style rules:

    h1 {
      font-weight: bold;
      font-size: 24px;
      text-align: center;
    }
    h2 {
      font-weight: bold;
      font-size: 20px;
      text-align: center;
    }
    hr {
      width: 100%;
    }
    .centered {
      text-align: center;
    }
    .footer {
      text-align: center;
    }

Slide 81: The rules restyle the <h1>, <h2>, and <hr> elements and create classes called centered and footer. I’ll explain CSS in more detail in Chapter 5. You can change the look of the web page easily by modifying the CSS. If you apply the same stylesheet to multiple pages, you can update all of those pages at once by changing that one stylesheet. Figure 3-2 shows the same web page with a modified stylesheet.

Figure 3-2. A revised presentation of the marsstrict.htm file

You can find these files saved as marsstrict2.htm and styles2.css. The revised stylesheet tells the browser to set the sizes and colors for the <h1> and <h2> elements. It also changes the font for the entire page and defines a color for the <hr> element. The two classes centered and footer inherit the default font and center the text. The footer class uses a smaller font size.
Slide 82: CHAPTER 3 ■ WEB VOCABULARIES 61 Frameset XHTML Documents XHTML allows you to write a third kind of document called a frameset document. You use frameset documents with web pages that use frames. Frames are no longer recommended for a variety of reasons, so I’ll discuss this topic only briefly. Use the following DOCTYPE declaration to reference a frameset DTD: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd"> ■ Note A frameset can include both transitional and strict documents. You can also include one frameset document within another, allowing you to have nested frames. XHTML 1.1 Documents XHTML version 1.1 is a modular version of the XHTML 1.0 strict document type. As it’s based on the strict document type, you can’t include any presentation elements or attributes; you need to declare these in a stylesheet. Frames, which are often presentational, have been moved to a separate “module” that is not enabled by default. XHTML is modular, which means that parts of the XHTML document have been divided into separate modules that you can add or remove. When I discuss XHTML Modularization later in the chapter, I’ll show you how to mix web vocabularies using different XHTML 1.1 modules. Take a look at this simple XHTML 1.1 document: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Mars Travel</title> <link href="styles.css" type="text/css" rel="stylesheet" /> </head> <body> <h1>Mars Travel<br /> <em>Visits to a faraway place </em> </h1> <hr /> <h2>Your spacecraft</h2> <p class="centered"> Your spacecraft is the Mars Explorer, which provides the latest in passenger luxury and travel speed. </p> <hr /> <p class="footer">XHTML 1.1 Document</p> </body> </html>
Slide 83: This document is saved as marsxhtm1-1.htm with your resources. As you can see, the XHTML 1.0 strict and XHTML 1.1 documents are almost identical. The major difference is the DOCTYPE declaration that specifies which DTD to use. Although most of the internal reorganization is invisible to you, web browsers can understand the modular structure much more easily. Viewing the document gives almost the same results as shown in Figure 3-1. You could modify the display by changing the stylesheet declarations, exactly as you did with the strict document. The next requirement for XHTML documents is that tags are written in lowercase.

Case Sensitivity

Unlike HTML, XHTML is a case-sensitive vocabulary. This means that you must write all elements and attributes in lowercase in order to make them valid. The rule applies to element and attribute names, not to the text within elements. In HTML, element names were conventionally written in uppercase. However, HTML isn’t case-sensitive, so any of the following was allowable:

    <HTML>
    <Html>
    <html>

In XHTML, however, the only allowable element construction is

    <html>

Likewise, you must specify attributes using lowercase names. In HTML, any of the following were allowable:

    <IMG SRC="images/flower.gif">
    <img src="images/flower.gif">
    <Img Src="images/flower.gif">

In XHTML, all element and attribute names must be lowercase:

    <img src="images/flower.gif">

XHTML is case-sensitive because it’s a requirement in XML. Case sensitivity is a major step in internationalization efforts. Although you can easily convert uppercase English characters to lowercase ones, or lowercase characters to uppercase, it’s not so easy in other languages. Often there are no equivalent uppercase or lowercase characters, and some case mapping depends on region. Case sensitivity is important in order to allow the specification to use other languages and character sets.
Closing Elements In HTML, you didn’t need to close some elements, including <img>, <br>, <hr>, and <input>. These elements didn’t mark up text, so they didn’t have a corresponding closing element. In XML, this type of element, referred to as an empty element, may contain attributes but doesn’t mark up text. You must close all elements for an XHTML document to be well formed.
Slide 84: In HTML, empty elements appeared like this:

    <IMG SRC="flower.gif">

In XHTML, empty elements can either appear with an immediate opening and closing tag, such as

    <img src="flower.gif"></img>

or in the short form, such as

    <img src="flower.gif"/>

In the short form, you add a forward slash (/) before the closing angle bracket (>). This tells the XML or XHTML parser that the element is empty. Although both forms are legal XHTML, very old browsers have problems reading opening and closing tags for elements that are empty. It’s much better to use the short form for empty elements. These browsers also may have difficulty with the forward slash character, so, if you’re targeting them, it’s also good practice to add a space before the character (<br />).

Attributes

In addition to using the proper case for attribute names, you also need to make sure that you write them correctly. In HTML, you could write attribute values without quotation marks. For example, the following was legal in HTML:

    <TD colspan=4>

HTML also allowed you to minimize attributes:

    <OPTION selected>An option</OPTION>

Neither of these options is acceptable in XHTML. All attributes must have a value, even if it’s blank, and you must enclose all values in matching quotation marks:

    <td colspan="4">
    <option selected='selected'>An option</option>

In the preceding <td> element, you add quotation marks around the attribute value 4. In the <option> element, you remove the minimization of the selected attribute and use single quotes around the attribute value. The value for the selected attribute is selected.

Names and IDs

In HTML, the name attribute identified an element within the document. Later versions also allowed the use of id to replace the name attribute. In HTML 4.0 and XHTML 1.0, you can use the name attribute, the id attribute, or both.
For example, you can identify the anchor element, <a>, with either attribute: <a name="Section1" /> <a id="Section1" /> <a name="Section1" id="Section1" />
Slide 85: In XHTML 1.1, however, the W3C permits only the id attribute:

    <a id="Section1" />

Again, older browsers expect you to use the name attribute. Because of this, some XHTML 1.1 pages don’t work in early browser versions.

Nesting Tags

Browsers didn’t enforce correct nesting in HTML, so writing something like the following didn’t cause an error:

    <H1><EM>A heading</H1></EM>

This doesn’t work in XHTML; you need to rewrite the code so the tags close in the correct order:

    <h1><em>A heading</em></h1>

Character Encoding

Specifying the document encoding is very important, and in some cases required, so that the document displays correctly within different web browsers. Document encoding defines a numeric value for each character. Different encoding schemes sometimes use these values in different ways. Most browsers and computers support ASCII encoding, which assigns the values 0 to 127 to the 128 most commonly used characters. These characters are compatible across different platforms. If you’re using characters with values higher than 127, you must specify the character set so that the browser knows which character to display for a given value. Within XHTML, you can specify the character set that your document is using in several ways, including

• Using the XML declaration
• Using the <meta> element
• Using external means

You can use any of these methods alone or in combination. Using all methods together gives the browser the best chance of detecting the document’s encoding, even if it doesn’t support one of the methods. Again, including encoding declarations may confuse some older browsers. Let’s look at each of the methods more closely.
Specifying encoding using the XML declaration is very easy, and you’ve seen it in the examples in Chapter 1: <?xml version="1.0" encoding="UTF-8"?> You can specify encoding in a <meta> tag by adding the following element to the <head> section of your XHTML document: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
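To see why the declared encoding matters, consider what an XML parser does with an accented character. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, which is my choice for illustration. The sample text and the names BODY, declared, undeclared, read_text, and encoded_size are hypothetical. The sketch saves the same paragraph as ISO-8859-1 bytes twice, once with an XML declaration naming that encoding and once with no declaration, in which case the parser assumes UTF-8.

```python
import xml.etree.ElementTree as ET

# A paragraph containing one non-ASCII character (e with an acute accent).
BODY = "<p>Caf\u00e9 on Mars</p>"

# The same text stored as ISO-8859-1 bytes, with and without a declaration.
declared = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + BODY).encode("iso-8859-1")
undeclared = BODY.encode("iso-8859-1")


def read_text(document_bytes):
    """Return the root element's text, or None if the bytes cannot be parsed."""
    try:
        return ET.fromstring(document_bytes).text
    except ET.ParseError:
        return None


def encoded_size(character, encoding):
    """Number of bytes one character occupies in the given encoding."""
    return len(character.encode(encoding))


print(read_text(declared))      # parsed using the declared encoding
print(read_text(undeclared))    # None: the byte for the accent is not valid UTF-8
print(encoded_size("\u00e9", "iso-8859-1"), encoded_size("\u00e9", "utf-8"))
```

With the declaration, the parser decodes the accented character correctly. Without it, the single ISO-8859-1 byte is rejected as malformed UTF-8, which is the parser-level equivalent of a browser displaying the wrong character.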
Slide 86: CHOOSING AN ENCODING

UTF-8 is a Unicode encoding in which the first 128 characters match ASCII, so documents using only simple ASCII characters are already valid UTF-8. The basic ASCII character set doesn’t include European characters that include accents, and the numeric values for each character may vary depending on the specified encoding. If you’re running an English version of Windows, your default encoding is compatible with ISO-8859-1. This encoding is supported widely, so declaring ISO-8859-1 for a file saved in that encoding allows European characters to display correctly. Encoding rules are often complex. XML supports UTF-8 and UTF-16 encoding by default. Both can represent the full Unicode character set, which includes many Chinese and Japanese characters, among others. UTF-16 uses two or four bytes for each character, whereas UTF-8 uses one byte for each ASCII character and two to four bytes for every other character. Simple text editors may not support encodings other than UTF-8 or ASCII. For more information about different encoding specifications, visit http://www.unicode.org/.

The <meta> element shown earlier tells the browser what type of content the document contains. It specifies text/html as the document type and UTF-8 as the encoding. If a document contains both the XML declaration and the <meta> element, the browser uses the encoding value in the XML declaration. Browsers that don’t support the XML declaration use the <meta> value. You can also use the HTTP header Content-Type to specify encoding on the web server. This approach provides the most reliable way to specify the encoding in an XHTML document. You can set the header using any server-side technology.

Specifying Language

HTML 4.0 and XHTML 1.0 allow you to specify the language for a document or element using the lang attribute. Web browsers can use this information to display elements in language-specific ways.
For example, hyphenation may change depending on the language in use. Additionally, screen readers may read the text using different voices, depending on the language specified. The following lang attribute specifies the U.S. version of English as the language for the document:

    <body lang="en-US">

You can find out more about which attribute values to use at http://www.w3.org/TR/REC-html40/struct/dirlang.html. XHTML 1.1 replaces the lang attribute with xml:lang. In addition to XHTML, many other web vocabularies use this attribute from the xml namespace. This makes XHTML much more compatible with other XML applications. If you want a quick refresher on namespaces, see the section, “Understanding the Role of XML Namespaces,” in Chapter 2.
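Because xml:lang belongs to the xml namespace, any XML-aware tool can read it without knowing anything about XHTML. Here is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and a made-up page. The names PAGE and languages are hypothetical. It reports the language in effect for each piece of text, with child elements inheriting the value from their ancestors unless they set their own.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml prefix to this namespace name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A made-up XHTML 1.1 style fragment with a document-wide language
# and one paragraph that overrides it.
PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">'
    "<body>"
    "<p>Welcome aboard</p>"
    '<p xml:lang="fr">Bienvenue</p>'
    "</body></html>"
)


def languages(element, inherited=None):
    """Return (text, language) pairs, inheriting xml:lang from ancestors."""
    lang = element.get(XML_LANG, inherited)
    found = []
    text = (element.text or "").strip()
    if text:
        found.append((text, lang))
    for child in element:
        found.extend(languages(child, lang))
    return found


print(languages(ET.fromstring(PAGE)))
```

A screen reader or hyphenation routine could use the same inherited value to decide how to treat each paragraph.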
Slide 87: 66 CHAPTER 3 ■ WEB VOCABULARIES XHTML Tools You can use three kinds of tools to edit your XHTML documents: • Simple text editors • XML editors • XHTML editors Each of these tool types offers different benefits. Let’s explore these types in more detail. Text Editors Because XHTML is a text-based format, you can create document markup in text editors, including Notepad on Windows, SimpleText on Macintosh, and Vim on Linux. These editors aren’t specifically designed to create XHTML or XML documents, so they have very few features that can assist with authoring. They can’t provide information about whether a document is well formed or valid, and they don’t provide any type of color-coding for the text. Although they have significant limitations, text editors are often useful because they exist on almost all computers and start up very quickly. The most useful text editors can display line numbers, which are invaluable for tracking down parser errors. XML Editors Many XML editors are designed to work specifically with XML documents. These editors offer many advantages over text editors, not the least of which is automatic color coding for elements within the document. Although not written specifically for XHTML, XML editors can still provide tag completion so your elements close automatically. In addition, XML editors allow you to check that your document is well formed and valid, based on its DTD or XML schema. 
Some popular XML editors include

• Altova’s XMLSpy: http://www.altova.com/products_ide.html
• Stylus Studio’s XML Editor: http://www.stylusstudio.com/xml/editor/
• Topologi’s Markup Editor: http://www.topologi.com/products/tme/index.html
• TIBCO’s Turbo XML: http://www.tibco.com/software/business_integration/turboxml.jsp
• SyncRO Soft’s <oXygen/>: http://www.oxygenxml.com/index.html/
• Blast Radius’ XMetal: http://www.xmetal.com/en_us/products/xmetal_author/index.x
• Wattle Software’s XMLwriter: http://www.xmlwriter.net/

Most of these products offer a trial version so that you can test whether they’ll suit your needs.
Slide 88: XHTML Editors

Editors written specifically for XHTML documents can provide the most features. These tools often come with XHTML document templates and can warn you about potential display problems. Most importantly, many XHTML editors allow you to design XHTML visually without needing to see the markup. This can be very useful when designing complex layouts. Some common XHTML editors include

• Adobe’s (formerly Macromedia) Dreamweaver: http://www.macromedia.com/software/dreamweaver/
• Microsoft’s FrontPage: http://www.microsoft.com/frontpage/
• W3C’s Amaya: http://www.w3.org/Amaya/
• Chami.com’s HTML-Kit: http://www.chami.com/html-kit/
• Adobe’s (formerly Macromedia) HomeSite: http://www.macromedia.com/software/homesite/
• Belus Technology’s XStandard: http://xstandard.com/?program=google1
• Bare Bones Software’s BBEdit: http://www.barebones.com/products/bbedit/index.shtml
• NewsGator Technologies’ TopStyle: http://www.bradsoft.com/topstyle/

Again, you can often download a trial version so you can test the software against your needs.

Well-Formed and Valid XHTML Documents

Even if you follow the XHTML construction rules, you need to make sure that the document is both well formed and valid. These concepts are critical regardless of which XML vocabulary you use. In Chapter 1, you learned that an XML document must be well formed before it can be processed by an XML parser. Well-formed means that

• The document contains one or more elements.
• The document contains a single document element, which may contain other elements.
• Each element closes correctly.
• Element names are case-sensitive, so opening and closing tags must match exactly.
• Every attribute has a value, and the value is enclosed in quotation marks.

A document is valid if, in addition to being well formed, it uses the correct elements and attributes for the specified vocabulary. In XHTML, the DOCTYPE declaration determines which DTD is used and hence, the validity of elements and attributes.
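To make the distinction concrete, here is a minimal Python sketch of a well-formedness check. It assumes the standard library's ElementTree parser, which reports whether a document is well formed but doesn't validate it against a DTD, so it covers only the first of the two concepts. The sample markup and the is_well_formed name are invented for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return (True, None) if markup parses as XML, else (False, message)."""
    try:
        ET.fromstring(markup)
        return True, None
    except ET.ParseError as error:
        return False, str(error)


# Hypothetical sample page; the second copy drops the closing </h1> tag.
GOOD = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Mars Travel</title></head>"
    "<body><h1>Mars Travel</h1></body></html>"
)
BAD = GOOD.replace("</h1>", "")

good_result = is_well_formed(GOOD)
bad_result = is_well_formed(BAD)
print(good_result)
print(bad_result)
```

Checking validity needs a validating parser or a service such as the online validators, because only they compare the elements and attributes against the DTD named in the DOCTYPE declaration.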
Slide 89: 68 CHAPTER 3 ■ WEB VOCABULARIES Validity is an important concept for web developers because creating valid documents guarantees that your web site is interoperable with virtually any XML application. A number of online tools can check XHTML documents for validity. Online Validators In addition to the tools I mentioned previously, several web sites offer free online validation services. You can use them to check that your document is valid against specific versions of the XHTML specification. Two popular online validators include • W3C Markup Validation Service: http://validator.w3.org/ • WDG HTML Validator: http://www.htmlhelp.com/tools/validator/ I’ll validate one of the XHTML documents that you saw previously to show you how the W3C Markup Validation Service works. You need to use the Validate by File Upload option to validate an offline file. Open the web site (http://validator.w3.org/) and click the Browse button to select your file. In Figure 3-3, I’m validating the file marsstrict.htm. Figure 3-3. Uploading a file for validation at the W3C Markup Validation Service Click the Check button to validate the document. After validating, you can see whether the document is valid. You also might see some other messages about the page, as shown in Figure 3-4.
Slide 90: CHAPTER 3 ■ WEB VOCABULARIES 69 Figure 3-4. The validation results In addition to errors, the W3C validator may return warnings. Often, these warnings refer to possible character encoding or DOCTYPE problems. The warnings normally offer suggestions that allow you to address the issues. If you’re able to validate your entire site, you can display the W3C XHTML logo on your web page. If your validation produces an error message, fix the error and validate the document again. Where you’re notified of multiple errors, it’s usually easier to revalidate after fixing each error, because a single error can often cause multiple errors later in the document. I’ll deliberately introduce errors into the marstransitional.htm page so you can see the effect on validation. I’ve left out the closing </h1> tag and introduced an <unknown> element. The document now reads like this: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Mars Travel</title> </head> <body bgcolor="#FFFFFF"> <unknown>Some text</unknown> <h1 align="center">Mars Travel<br /> <i>Visits to a faraway place </i>
Slide 91: 70 CHAPTER 3 ■ WEB VOCABULARIES <hr width="100%" /> <h2 align="center">Your spacecraft</h2> <p align="center"> Your spacecraft is the Mars Explorer, which provides the latest in passenger luxury and travel speed. </p> <hr width="100%" /> <p align="center">XHTML 1.0 Transitional Document</p> </body> </html> I’ve saved this document as marstransitionalerror.htm if you want to try validating it yourself. Figure 3-5 shows the effect of validating this document. Figure 3-5. Validation errors Validating a web site is an important step. The next section looks at some common practices that can cause validation errors. Validation Errors Unfortunately, the everyday practices of web professionals can cause validation errors. Some common issues involve
Slide 92:
• Including JavaScript in your page
• Embedding advertising information
• Including unsupported elements and attributes

In this section, I’ll show you some practical tips to address these issues. Many of these tips may be helpful when working with other web vocabularies.

Including JavaScript in Your Page

For validity, it’s best to store your JavaScript in a separate file and refer to it with the <script> element:

<script type="text/javascript" src="mars.js"></script>

If you can’t avoid embedding JavaScript in an XHTML document, place the JavaScript code within a <![CDATA[...]]> section so that the XML parser treats it as character data rather than markup. JavaScript can include characters, such as < and &, that otherwise cause the document to fail the well-formed test. Instead of using the following code

<script type="text/javascript">
<!--
function maxnumber(a, b) {
  if (a > b) return a;
  if (a < b) return b;
  if (a == b) return a;
}
-->
</script>

rewrite it like this:

<script type="text/javascript">
<![CDATA[
function maxnumber(a, b) {
  if (a > b) return a;
  if (a < b) return b;
  if (a == b) return a;
}
]]>
</script>
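The effect of the CDATA section can be checked with any XML parser. The following Python sketch is an illustration only: it wraps a one-line script (an invented example, not the book's mars.js) with and without a CDATA section and tests whether each version is well formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical script body; the < operator is what upsets an XML parser.
SCRIPT_BODY = "function isSmaller(a, b) { return a < b; }"


def wrap(script, use_cdata):
    """Place the script text inside a <script> element."""
    body = f"<![CDATA[{script}]]>" if use_cdata else script
    return f'<script type="text/javascript">{body}</script>'


def parses(markup):
    """Report whether the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


raw_ok = parses(wrap(SCRIPT_BODY, use_cdata=False))
cdata_ok = parses(wrap(SCRIPT_BODY, use_cdata=True))
# The parser hands back the script text unchanged from the CDATA section.
recovered = ET.fromstring(wrap(SCRIPT_BODY, use_cdata=True)).text

print(raw_ok, cdata_ok)
print(recovered)
```

The unwrapped version fails because the parser reads the < character as the start of a tag, while the wrapped version passes the script through as plain character data.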
Slide 93: 72 CHAPTER 3 ■ WEB VOCABULARIES Embedding Advertising Information Many web sites display advertising information on their pages. If the advertisement isn’t valid XHTML, you must make sure that you’re using the XHTML 1.0 transitional DTD. You can also add the advertiser information to the page using JavaScript. This ensures that the content displays in the browser, but at the same time, you can ensure that the XHTML page is valid. Make sure that you follow the preceding JavaScript guidelines. I’ll cover some advanced JavaScript techniques in Chapter 8. Including Unsupported Elements and Attributes In some cases, you may need to add invalid content to the XHTML page. Using unsupported elements isn’t good practice, because it ultimately limits your audience. However, there might be times when you want to add • Elements or attributes that existed in earlier versions of HTML • Elements or attributes that are specific to one browser • New elements or attributes The first two situations commonly occur when you’re trying to build a web site for a specific browser, or when you’re trying to convert an older web site to XHTML. You can add this kind of information in several ways. As I discussed in the previous section, you can add the content using JavaScript after the page loads. Another more complex option is to test for the browser type and version and return appropriate pages to the user. By maintaining templates on the web server, you can quickly transform your web page to support various browsers using XSLT. XHTML Modularization A primary goal of XML is to create a simple markup language that you can extend easily. XHTML 1.1 simplifies the process of extending the XHTML definition. You can add any vocabulary to XHTML through a process called modularization. Although XHTML modularization is complex, you can still enjoy the benefits. The W3C has released a working draft of a modularization that supports the MathML and SVG vocabularies. 
These two vocabularies are commonly embedded within XHTML and vice versa. You can find out more at http://www.w3.org/TR/XHTMLplusMathMLplusSVG/. You might need to limit rather than extend the XHTML specification. XHTML Basic provides a subset of the basic modules of XHTML for use on mobile devices; find out more at http://www.w3.org/TR/xhtml-basic/. Using these new vocabularies is very similar to using the other document types you’ve seen in this chapter. You need to follow the rules of the new document type and declare the appropriate DOCTYPE. The DOCTYPE declaration for XHTML plus MathML plus SVG is <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN" "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
Slide 94: The DOCTYPE declaration for XHTML Basic is

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN" "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">

I’ve introduced you to the basics of XHTML, examining it as a vocabulary of XML. Now let’s move on to examine some of the other popular web vocabularies, starting with MathML and SVG.

MathML

Mathematical Markup Language (MathML) is a popular XML vocabulary that describes mathematical notation. It was developed to include mathematical expressions on web pages. MathML is an XML vocabulary, so it must be well formed and valid according to the specification. You can find out more about MathML at http://www.w3.org/Math/.

While the W3C MathML group was developing the specification, the group realized it actually had two distinct goals. It needed a vocabulary that could represent both how mathematical equations are displayed and what they mean. The group divided MathML into two types of encoding: presentation and content. Presentation MathML conveys the notation and structure of mathematical formulas, while Content MathML communicates meaning without being concerned about notation. You can use either or both of these encodings, depending on your task, but be aware that each has some web browser limitations.

Firefox supports Presentation MathML, as MathML is part of Mozilla’s layout engine. The derived browsers Netscape, Galeon, and K-Meleon also include Presentation MathML, as does the W3C browser Amaya. Internet Explorer 6 supports MathML using plugins such as the free MathPlayer (http://www.dessci.com/en/products/mathplayer/) and techexplorer (http://www.integretechpub.com/techexplorer/). You can’t use MathML within Opera.

Presentation MathML

Presentation MathML provides control over the display of mathematical notation in a web page. Thirty presentation elements and around 50 attributes allow you to encode mathematical formulas.
Presentation MathML tries to map each notational construct to an element. To start, Presentation MathML divides a formula into horizontal rows using <mrow> elements. This basic element is used as a wrapper. Rows may contain other nested rows. Each <mrow> element usually has a combination of mathematical numbers (<mn>), mathematical identifiers (<mi>), and mathematical operators (<mo>). This example represents 10 + (x ✕ y)⁴:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN" "http://www.w3.org/TR/MathML2/dtd/mathml2.dtd">
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mn>10</mn>
    <mo>+</mo>
    <msup>
Slide 95:
      <mfenced>
        <mrow>
          <mi>x</mi>
          <mo>*</mo>
          <mi>y</mi>
        </mrow>
      </mfenced>
      <mn>4</mn>
    </msup>
  </mrow>
</math>

In the preceding document, you start with an XML declaration, adding the DOCTYPE declaration for MathML and including the <math> document element. The document includes a default namespace for the MathML vocabulary:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN" "http://www.w3.org/TR/MathML2/dtd/mathml2.dtd">
<math xmlns="http://www.w3.org/1998/Math/MathML">

Next, the document includes an <mrow> element, which represents the horizontal row of the equation. The row begins with the number 10 and includes the addition operator +:

<mrow>
  <mn>10</mn>
  <mo>+</mo>

It then includes an <msup>, or mathematical superscript, section. This section allows the display of exponents, and the <mn> element before the closing </msup> tag indicates that the contents are raised to the power of 4. The <msup> element includes an <mfenced> element, which corresponds to the use of parentheses in a mathematical equation. Within the parentheses, the equation multiplies x by y:

<msup>
  <mfenced>
    <mrow>
      <mi>x</mi>
      <mo>*</mo>
      <mi>y</mi>
    </mrow>
  </mfenced>
  <mn>4</mn>
</msup>
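One way to see how the presentation elements nest is to walk the tree and print a one-line text form of the formula. The Python sketch below is only an illustration: it handles just the five elements used in this example, assumes the default parentheses for <mfenced>, and leaves out the XML and DOCTYPE declarations for brevity. The linearize function is a made-up helper, not part of any MathML toolkit.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Condensed copy of the example formula, without the declarations.
PRESENTATION = f"""
<math xmlns="{MATHML_NS}">
  <mrow>
    <mn>10</mn><mo>+</mo>
    <msup>
      <mfenced><mrow><mi>x</mi><mo>*</mo><mi>y</mi></mrow></mfenced>
      <mn>4</mn>
    </msup>
  </mrow>
</math>
"""


def local_name(element):
    """Strip the {namespace} prefix that ElementTree adds to tag names."""
    return element.tag.split("}", 1)[-1]


def linearize(element):
    """Turn a small subset of Presentation MathML into plain text."""
    name = local_name(element)
    if name in ("mn", "mi", "mo"):
        return (element.text or "").strip()
    children = [linearize(child) for child in element]
    if name == "msup":
        base, exponent = children  # <msup> takes exactly two children
        return f"{base}^{exponent}"
    if name == "mfenced":
        return "(" + ", ".join(children) + ")"
    return "".join(children)  # <math> and <mrow> simply group content


text = linearize(ET.fromstring(PRESENTATION))
print(text)
```

A browser without MathML support does something cruder: it ignores the element names entirely and shows only the text nodes, so the parentheses and the exponent position are lost.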
Slide 96: CHAPTER 3 ■ WEB VOCABULARIES 75 You’ll find this document saved as mathml_presentation.mml with the code download resources. I also could have saved it with a .xml file extension. Figure 3-6 shows the effect of opening this document in Firefox 1.5. Figure 3-6. A Presentation MathML document displayed in Firefox 1.5 ■ Note Firefox may prompt you to install some additional fonts from http://www.mozilla.org/ projects/mathml/fonts/. Installing these fonts ensures that Firefox can render all mathematical symbols in your MathML document correctly. If you try to view this document in a browser that doesn’t support MathML, such as Opera 8.5, you’ll see something similar to the image shown in Figure 3-7. Figure 3-7. A Presentation MathML document displayed in Opera 8.51 Notice that the browser doesn’t render the markup correctly. It doesn’t insert the parentheses or raise the exponent. Essentially, it ignores all of the MathML elements and displays only the text within the XML document. You can find a slightly more advanced example in the file quadratic_equation_ presentation.mml. You need to install the Firefox MathML-enabled fonts in order to see the square root sign rendered correctly, as shown in Figure 3-8.
Slide 97: Figure 3-8. Firefox showing a more complicated MathML page

Content MathML

Content MathML allows you to be very explicit about the order of operations and primary equation representation. Content markup has around 100 elements and 12 attributes. Content MathML documents begin in the same way as Presentation MathML documents. They also contain <mrow> elements to separate the lines of the equation. However, Content MathML documents don’t use the <mo> element for mathematical operators. Instead, they use the <apply> element and specific operator and function elements. This becomes clearer when you look at the same example written in Content MathML:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN" "http://www.w3.org/TR/MathML2/dtd/mathml2.dtd">
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <apply>
      <plus/>
      <cn>10</cn>
      <apply>
        <power/>
        <apply>
          <times/>
          <ci>x</ci>
          <ci>y</ci>
        </apply>
        <cn>4</cn>
      </apply>
    </apply>
  </mrow>
</math>

You can find the document saved as mathml_content.mml with your resources. Let’s walk through the example.
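Because Content MathML records operations rather than layout, a program can evaluate the markup directly. The Python sketch below is a toy evaluator written for this example, not a real MathML engine: it supports only <plus/>, <times/>, and <power/>, uses a condensed version of the document with the literal 10 marked up as a number (<cn>), and takes the values of x and y from a dictionary that is assumed for illustration.

```python
import math
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Condensed version of the example, without the declarations.
CONTENT = f"""
<math xmlns="{MATHML_NS}">
  <mrow>
    <apply><plus/>
      <cn>10</cn>
      <apply><power/>
        <apply><times/><ci>x</ci><ci>y</ci></apply>
        <cn>4</cn>
      </apply>
    </apply>
  </mrow>
</math>
"""

# Only the three operators that appear in the example are supported.
OPERATORS = {
    "plus": lambda args: sum(args),
    "times": lambda args: math.prod(args),
    "power": lambda args: args[0] ** args[1],
}


def local_name(element):
    """Strip the {namespace} prefix that ElementTree adds to tag names."""
    return element.tag.split("}", 1)[-1]


def evaluate(element, variables):
    """Recursively evaluate a small subset of Content MathML."""
    name = local_name(element)
    if name == "cn":  # numeric literal
        return float(element.text.strip())
    if name == "ci":  # identifier, looked up in the supplied values
        return variables[element.text.strip()]
    if name == "apply":  # first child is the operator, the rest are operands
        operator, *operands = list(element)
        values = [evaluate(operand, variables) for operand in operands]
        return OPERATORS[local_name(operator)](values)
    if name in ("math", "mrow"):  # wrappers with a single child here
        (only_child,) = list(element)
        return evaluate(only_child, variables)
    raise ValueError(f"unsupported element: {name}")


result = evaluate(ET.fromstring(CONTENT), {"x": 2, "y": 3})
print(result)
```

The nesting of the <apply> elements fixes the order of evaluation, which is why the content form needs no equivalent of parentheses.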
Slide 98: The document starts with an XML declaration, a DTD reference, and the document root, including the MathML namespace. Then, like the Presentation MathML example, you include an <mrow> element:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN" "http://www.w3.org/TR/MathML2/dtd/mathml2.dtd">
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>

From here on, the similarity ends. The example uses the <apply> element with <plus/> to include the addition operator with the value 10:

<apply>
  <plus/>
  <cn>10</cn>

Another <apply> element surrounds the <power/> element, and the value of 4 is indicated immediately before the corresponding closing tag:

<cn>4</cn>

The x ✕ y section is contained within a third <apply> block that uses the <times/> element to indicate multiplication:

<apply>
  <times/>
  <ci>x</ci>
  <ci>y</ci>
</apply>

The differences are obvious. Instead of <mi> and <mn> elements, the vocabulary uses <ci> for identifiers and <cn> for numbers. There is no need for the <mfenced> element because you can be specific about the order of operations by using the <apply> element. In the preceding example, all of the operators use prefix notation. In prefix notation, you indicate the operation first and then follow that by the operand(s). Some MathML functions use prefix notation, and some don’t. For a complete listing, see http://www.w3.org/TR/MathML2/appendixf.html.

You can’t view this document in the web browser because that’s not the purpose of Content MathML. Instead, it’s supposed to be processed by a MathML engine, which may also perform the calculation. Most web browsers simply ignore all of the elements and only display the text, as you saw in the earlier Opera example.

Scalable Vector Graphics

SVG was developed so that designers could represent two-dimensional graphics using an XML vocabulary.
Just as MathML provides a detailed model to represent mathematical notation, SVG allows for the display of graphics with a high level of detail and accuracy. Again, because SVG is an XML vocabulary, it must follow the rules of XML. You can find out more about SVG at http://www.w3.org/Graphics/SVG/.
Slide 99: 78 CHAPTER 3 ■ WEB VOCABULARIES SVG has wide acceptance and support with many available viewers and editors. Both Firefox 1.5 and Opera 8 support SVG in some form, as does Amaya. For other browsers, you need to use plugins such as Adobe’s SVG Viewer to view SVG documents. You can download the Adobe SVG Viewer plugin from http://www.adobe.com/svg/. You can find the current SVG specification version 1.1 at http://www.w3.org/TR/SVG11/. The SVG 1.2 specification is currently under development. You can break down SVG into three parts: • Vector graphic shapes • Images • Text Let’s look at each of these in more detail. Vector Graphic Shapes Vector graphics allow you to describe an image by listing the shapes involved. In a way, they provide instructions for creating the shapes. This is in contrast to bitmap or raster graphics, which describe the image one pixel at a time. Because you store vector graphics as a set of instructions, these images are often much smaller than their raster-based counterparts. In SVG, you can represent vector graphics using either basic shape commands or by specifying a list of points called a path. You can also group objects and make complex objects out of more simple ones. To get an idea about how you can work with shapes, let’s look at an SVG document that describes a basic rectangle: <?xml version="1.0"?> <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"> <svg width="12cm" height="4cm" viewBox="0 0 1200 400" xmlns="http://www.w3.org/2000/svg"> <desc>A simple rectangle with a red border</desc> <rect x="10" y="10" width="200" height="200" fill="none" stroke="red" stroke-width="10"/> </svg> This file is saved as svg_rectangle.svg. Opening it in an SVG viewer or SVG native browser shows something similar to the image in Figure 3-9.
Slide 100: CHAPTER 3 ■ WEB VOCABULARIES 79 Figure 3-9. A simple SVG document displayed in Opera 8.51 The document starts with an XML and DOCTYPE declaration and includes a document element called <svg>. Notice that the document element includes a reference to the SVG namespace, as well as attributes determining the size: <?xml version="1.0"?> <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"> <svg width="12cm" height="4cm" viewBox="0 0 1200 400" xmlns="http://www.w3.org/2000/svg"> In addition to creating basic shapes, SVG allows you to add complex fill patterns and other effects, as you can see in this example: <?xml version="1.0"?> <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"> <svg width="12cm" height="4cm" viewBox="0 0 1200 400" xmlns="http://www.w3.org/2000/svg"> <desc>A simple rectangle with a red border and a gradient fill</desc> <g> <defs> <linearGradient id="RedGradient" gradientUnits="objectBoundingBox"> <stop offset="0%" stop-color="#F00" /> <stop offset="100%" stop-color="#FFF" /> </linearGradient> </defs> <rect x="10" y="10" width="200" height="200" fill="url(#RedGradient)" stroke="red" stroke-width="10"/> </g> </svg>
Slide 101: 80 CHAPTER 3 ■ WEB VOCABULARIES I’ve saved this document as svg_rectangle_fill.svg. When viewed in an appropriate viewer, it appears as shown in Figure 3-10. Figure 3-10. A shape with a fill shown in Opera 8.51 This example creates a linear gradient in the <g> graphic object element called RedGradient: <linearGradient id="RedGradient" gradientUnits="objectBoundingBox"> <stop offset="0%" stop-color="#F00" /> <stop offset="100%" stop-color="#FFF" /> </linearGradient> The rectangle element then specifies that you should use the RedGradient fill element: <rect x="10" y="10" width="200" height="200" fill="url(#RedGradient)" stroke="red" stroke-width="10"/> The SVG 1.1 specification allows you to create the following basic shapes: <rect>, <circle>, <ellipse>, <line>, <polyline>, and <polygon>. Images You also can include raster graphics in an SVG page. You might need to do this if you want to include an image of a person or landscape, or any other photo-realistic image, that you can’t represent adequately as a vector drawing. Including images in SVG is very simple: <?xml version="1.0"?> <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"> <svg width="282px" height="187px" viewBox="0 0 282 187"
Slide 102: CHAPTER 3 ■ WEB VOCABULARIES 81 xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"> <desc>This SVG document contains lions.jpg</desc> <image x="0" y="0" width="282px" height="187px" xlink:href="lions.jpg"> <title>Two lions</title> </image> </svg> This file is saved as lions.svg. Figure 3-11 shows how it renders in Firefox. Figure 3-11. An SVG page showing an image of lions The markup is self-explanatory. You can control how the image is displayed by changing the attributes in the SVG document. It’s important to realize that the image isn’t converted to a vector graphic. Instead, it maintains its original raster format and is drawn to the SVG display. Text In addition to creating basic shapes and including images, SVG documents can represent text. This example creates text that has a color gradient outline: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-flat-20030114.dtd"> <svg width="20cm" height="4cm" viewBox="0 0 400 400" xmlns="http://www.w3.org/2000/svg"> <desc>This SVG document contains rainbow text</desc> <g> <defs> <linearGradient id="RedBlueGradient" gradientUnits="objectBoundingBox"> <stop offset="0%" stop-color="#F00" />
Slide 103: 82 CHAPTER 3 ■ WEB VOCABULARIES <stop offset="100%" stop-color="#00F" /> </linearGradient> </defs> <text x="-600" y="200" font-size="128" fill="white" stroke="url(#RedBlueGradient)" stroke-width="5"> SVG creates gradient text! </text> </g> </svg> This file appears as svg_gradienttext.svg with your resources. Figure 3-12 shows how it appears when open in an SVG viewer. Figure 3-12. Gradient text created with an SVG document The simple examples you’ve seen so far are only the beginning of what you can achieve with SVG. Let’s move on to a more complicated example involving animation. Putting It Together SVG allows you to create animations, and in the next example, I’ll create an animation for the imaginary “Mars Travel” web site. The completed file is saved as marstravel.svg with your resources. Note that you won’t be able to view the page with Mozilla unless you use a plugin. Mozilla’s native support doesn’t extend to SVG animations. The page starts with declarations: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-flat-20030114.dtd"> <svg width="16cm" height="9cm" viewBox="0 0 1000 600" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"> <desc>Mars Travel introduction</desc>
Slide 104: These declarations add the XML and DOCTYPE declarations and set the size of the drawing. I’ve used the <desc> element to add a description for the page. In the next step, I’ve added an image for the background. Make sure that you save the resource file, mars.jpg, in the same location as the svg file:

<image x="650" y="100" width="250" height="250" xlink:href="mars.jpg"/>

The first animation occurs in the next section of the SVG document:

<rect width="300" height="100" fill="rgb(200,200,200)" fill-opacity="0.25">
  <animate attributeName="y" attributeType="XML" from="500" to="-100" dur="4s" repeatCount="indefinite" fill="freeze" />
</rect>

The lines create a <rect> object and fill it with a light gray color. The fill-opacity is set to 0.25. This attribute accepts values between 0 (completely transparent) and 1 (completely opaque). The block also includes an <animate> element that modifies the y attribute from the value 500 to the value -100. Because y values increase down the page, this moves the block from the bottom of the drawing up past the top, and each repetition starts again from the bottom. The element specifies that the animation lasts for four seconds with the dur attribute and that it repeats indefinitely using repeatCount="indefinite". On an <animate> element, the fill attribute has nothing to do with the fill color: fill="freeze" tells the viewer to hold the final animated value once the animation ends. Because this animation repeats indefinitely, it never reaches that point.
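The timing attributes can be modeled with a few lines of arithmetic. The Python sketch below is a simplified model, not a description of any particular viewer: it assumes the SMIL-style behavior that SVG animation follows, with linear interpolation between the from and to values, a restart at every multiple of dur when the animation repeats, and, for a non-repeating animation with fill="freeze", the end value held after the animation finishes. The function name is made up for the example.

```python
def animated_value(elapsed, start, end, duration, repeat=True):
    """Return the animated attribute value `elapsed` seconds after the start.

    repeat=True models repeatCount="indefinite"; repeat=False models a
    single run whose final value is held, as with fill="freeze".
    """
    if elapsed < 0:
        raise ValueError("elapsed time must not be negative")
    if repeat:
        progress = (elapsed % duration) / duration  # restarts every cycle
    else:
        progress = min(elapsed / duration, 1.0)     # frozen at the end value
    return start + (end - start) * progress


# The first rectangle: y runs from 500 to -100 over four seconds, repeating.
for seconds in (0, 2, 4, 5):
    print(seconds, animated_value(seconds, 500, -100, 4))

# The same values run once and frozen: y stays at -100 after four seconds.
print(animated_value(10, 500, -100, 4, repeat=False))
```

At two seconds the rectangle is halfway (y = 200), and at four seconds the repeating version jumps back to y = 500 and starts over.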
In this example, I’ve made the effect more interesting by adding six more moving <rect> objects that cross one another: <rect width="300" height="400" fill="rgb(200,200,200)" fill-opacity="0.5"> <animate attributeName="y" attributeType="XML" from="600" to="-400" dur="14s" repeatCount="indefinite" fill="freeze" /> </rect> <rect width="300" height="14" fill="rgb(200,200,200)" fill-opacity="0.25"> <animate attributeName="y" attributeType="XML" from="600" to="-40" dur="3s" repeatCount="indefinite" fill="freeze" /> </rect> <rect width="300" height="4" fill="rgb(200,200,200)" fill-opacity="0.75"> <animate attributeName="y" attributeType="XML" from="500" to="-4" dur="2s" repeatCount="indefinite" fill="freeze" /> </rect> <rect width="300" height="300" fill="rgb(200,200,200)" fill-opacity="0.75"> <animate attributeName="y" attributeType="XML" from="-300" to="500" dur="8s" repeatCount="indefinite" fill="freeze" /> </rect> <rect width="300" height="14" fill="rgb(200,200,200)" fill-opacity="0.75"> <animate attributeName="y" attributeType="XML" from="-90" to="510" dur="3s" repeatCount="indefinite" fill="freeze" /> </rect> <rect width="300" height="4" fill="rgb(200,200,200)" fill-opacity="0.75"> <animate attributeName="y" attributeType="XML" from="-100" to="500" dur="2s" repeatCount="indefinite" fill="freeze" /> </rect>
Slide 105: The rectangles are partly transparent, so they produce some interesting effects as they overlap. If you test the document now, you’ll see something similar to the screen shot shown in Figure 3-13.

Figure 3-13. The SVG animation so far

The next block of code adds some text and vertical separators:

<!-- Default text -->
<text x="295" y="575" text-anchor="end">Scalable Vector Graphics</text>
<text x="295" y="590" text-anchor="end">by Mars Travel</text>
<!-- Separator -->
<line x1="300" y1="0" x2="300" y2="600" stroke-width="2" stroke="gray"/>
<line x1="305" y1="0" x2="305" y2="600" stroke-width="1" stroke="gray"/>

The <text> element has the attribute text-anchor set to end. This is the equivalent of aligning the text to the right. If the SVG viewer you’re using has right-to-left reading enabled, the SVG aligns the text to the left. In either case, it aligns it to the “end” of the area.

The following line animates the title of the site so that it flies in from the right side:

<text x="1000" y="200" font-size="32" font-style="italic" font-weight="bold" font-family="verdana" fill="#C65B2E">
  <animate attributeName="x" attributeType="XML" begin="0s" dur="2s" fill="freeze" from="1000" to="340"/>
  Mars Travel
</text>

The <text> element lists the text properties and also includes the <animate> element so that the text moves in from the right. It takes two seconds for the text to arrive at its final position.
Slide 106: CHAPTER 3 ■ WEB VOCABULARIES 85 The next code block adds some more text that enters after the “Mars Travel” text: <text x="1000" y="224" font-size="24" font-style="italic" font-weight="bold" font-family="verdana" fill="#C65B2E" > <animate attributeName="x" attributeType="XML" begin="2.5s" dur="2s" fill="freeze" from="1000" to="340" /> Out of this world! </text> Finally, the page completes with a <text> element and closing <svg> tag. The text is linked so that users can visit the rest of the web site: <a xlink:href="http://www.apress.com/"> <text x="750" y="467" fill="#C65B2E" font-weight="bold" font-family="verdana" font-size="24">ENTER >>></text> </a> </svg> This completes the SVG page. When you view it, you should see an animated version of the screen shot shown in Figure 3-14. Figure 3-14. The completed SVG animation
Slide 107: 86 CHAPTER 3 ■ WEB VOCABULARIES Figure 3-14 shows the page displayed in Internet Explorer; I can view the SVG file in this browser because I have the Adobe SVG Viewer plugin installed. You could also view the page using the native SVG support in Opera 8.5 or in any other browser that has an SVG plugin installed. You should probably provide an alternative image for viewers who don’t have this plugin or an appropriate browser. Even though this SVG introduction is graphically rich, it isn’t inaccessible to people with disabilities. As you’ve seen, SVG documents can include the <desc> element, which provides an accessible text-based description of the document. Let’s move on to two more XML vocabularies that you can use with Web services: WSDL and SOAP . Web Services Web services allow organizations to use the Internet to provide information to the public through XML documents. You can see examples of web services at Amazon and Google, where developers can interact with live information from the databases of both companies. You have a number of different choices for working with web services, but all deliver their content in an XML document. When someone receives this information, it’s called “consuming” a web service. In this section, you’ll briefly look at two of the XML vocabularies that impact the area: Web Services Description Language (WSDL) and Simple Object Access Protocol (SOAP). Both of these sections are more technical than the previous vocabularies that you’ve seen in this chapter. Let’s begin with WSDL. You won’t need to be able to write this language yourself, as it’s usually generated automatically. However, I’ll explain the WSDL file, as it’s useful to understand its structure. WSDL WSDL is an XML vocabulary that describes web services and how you can access them. A WSDL document lists the operations or functions that a web service can perform. 
A web programming language usually carries out these operations in an application that isn’t accessible to the consumer. The WSDL file describes the data types as well as the protocols used to address the web service. Microsoft, Ariba, and IBM jointly developed WSDL. They submitted the WSDL 1.1 specification to the W3C as a note. The W3C accepted the note, which you can see at http://www.w3.org/TR/wsdl. The W3C is currently working on the WSDL 2.0 recommendation. You can see the primer for the working draft at http://www.w3.org/TR/2004/WD-wsdl20-primer-20041221/. You normally don’t write the WSDL file yourself using XML tools. Instead, your web services toolkit usually generates the file automatically. However, understanding the structure of the WSDL document can be useful.
Slide 108: Understanding WSDL Document Structure WSDL files are stored in locations that are accessible via the web. Anyone consuming the web service accesses these files. For example, you can find the Google web search WSDL at http://api.google.com/GoogleSearch.wsdl. A WSDL document starts with an optional XML declaration and contains the <types>, <message>, <portType>, <binding>, and <service> elements. The following code block shows the file structure of a WSDL file: <?xml version="1.0" encoding="utf-8" ?> <definitions> <types> <!-- datatype definitions --> </types> <message> <!-- message definitions --> </message> <portType> <operation> <!-- operation definitions --> </operation> </portType> <binding> <!-- binding definitions --> </binding> <service> </service> </definitions> Table 3-1 explains each of the sections.
Table 3-1. The Major Elements Within a WSDL File
Element: Explanation
<definitions>: Provides the root element for the WSDL document and contains the other elements
<types>: Defines the data types used by the web service
<message>: Describes the messages used when the web service is consumed
<portType>: Combines messages to create the library of operations available from the web service
<operation>: Defines the operations that the web service can carry out
<binding>: Lists the communication protocols that a user can use to consume the web service and the implementation of the web service
<service>: Defines the address for invoking the web service, usually a URL to a SOAP service
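To make this structure more concrete, here is a minimal sketch that uses Python's standard xml.etree.ElementTree module to parse a skeleton like the one above and list its top-level sections. The skeleton string and the function name list_sections are assumptions made for illustration; a real WSDL file would also declare the WSDL namespace on <definitions>.

```python
import xml.etree.ElementTree as ET

# Skeleton modeled on the structure shown above (no namespaces, for brevity).
WSDL_SKELETON = """<definitions>
  <types><!-- datatype definitions --></types>
  <message><!-- message definitions --></message>
  <portType>
    <operation><!-- operation definitions --></operation>
  </portType>
  <binding><!-- binding definitions --></binding>
  <service></service>
</definitions>"""


def list_sections(wsdl_text):
    """Return the tag names of the direct children of <definitions>."""
    root = ET.fromstring(wsdl_text)
    return [child.tag for child in root]


print(list_sections(WSDL_SKELETON))
```

The list comes back in document order, which matches the order of the rows in Table 3-1 apart from the nested <operation> element.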
Slide 109: Defining Web Service Data Types When someone consumes a web service, the service receives the request, queries an application, and sends an XML document containing the results in response. In order to use the web service, the consumer must know how to phrase the request as well as the format for the returned information. It’s crucial to understand the data types used by the web service. The WSDL document defines the data types for both the inputs to and the outputs from the web service. These might equate to the data types listed in the XML schema recommendation, or they could be more complicated, user-defined data types. If you’re only using W3C built-in simple data types, the WSDL file doesn’t include the <types> element. The XML schema namespace appears in the <definitions> element and references data types in the <message> elements: xmlns:xsd="http://www.w3.org/2001/XMLSchema" Custom data type definitions appear in the <types> element. The WSDL file can use XML schema declarations or any alternative schema system for defining these data types: <types> <schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"> <!-- schema declarations here --> </schema> </types> Mapping Data Types to Messages A consumer calls the web service and provides inputs. These inputs map to <message> elements. Each message has <part> elements that refer to each of the inputs received: <message name="mName"> <part name="mInputName" type="mInputNameType"/> </message> The types referred to in the <message> elements must come from one of the schema namespaces within the document. If the type refers to the simple built-in data types from the XML schema recommendation, the element includes a reference to the XML schema namespace: <message name="mName"> <part name="mNameIO" type="xsd:string"/> </message> Listing Web Service Operations The most important element in the WSDL document is the <portType> element.
This element defines all of the operations that are available through the web service. The <portType> element is like a library of all of the available operations.
Slide 110: The <portType> element contains <operation> elements that have <input> and <output> elements. Inputs pass to an application for processing. The outputs are the responses received from the application that are passed to the consumer: <portType name="ptName"> <operation name="oName"> <input message="oNameRequest"/> <output message="oNameResponse"/> </operation> </portType> The <message> elements define the inputs and outputs. They are normally prefixed with the current document’s namespace. A web service can carry out four types of operations. The most common is the request-response type. In this type, the web service receives a request from a consumer and supplies a response. A web service can also carry out a one-way operation, where a message is received but no response is returned. In this case, the operation has only an <input> element. The other options are solicit-response and notification. In a solicit-response operation, the web service sends a message and then receives a response. It is the reverse of a request-response operation, so the operation has an <output> element followed by an <input> element. You can also specify a <fault> element. The final option is notification, the reverse of a one-way operation, where the service sends a message and the operation has only an <output> element. Mapping to a Protocol The <portType> element contains all of the operations for a web service. Bindings specify which transport protocol each portType uses. Transport protocols include HTTP POST, HTTP GET, and SOAP. You can specify more than one transport protocol for each portType. Each binding has a name and a type attribute that refers to a portType. If you’re using SOAP 1.1, WSDL 1.1 includes details specific to SOAP. The binding specifies a <soap:binding> element, which indicates that the binding will use SOAP. This element requires style and transport attributes. The style attribute can take values of rpc or document. Document style specifies an XML document call style.
Both the request and response messages are XML documents. The rpc style uses a wrapper element for both the request and response XML documents. The transport attribute indicates how to transport the SOAP messages. It uses values such as http://schemas.xmlsoap.org/soap/http and http://schemas.xmlsoap.org/soap/smtp. The following example specifies a SOAP 1.1 transport mechanism over HTTP using an rpc interaction: <binding name="bName" type="bType"> <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http"/> <!-- declarations--> </binding>
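The four operation types described earlier differ only in which of the <input> and <output> elements appear, and in what order. As a rough sketch of that rule, the following Python function classifies an <operation> element. The function name operation_kind is hypothetical, and the example ignores namespaces for brevity.

```python
import xml.etree.ElementTree as ET

# Maps the order of <input>/<output> children to the operation type.
KINDS = {
    ("input", "output"): "request-response",
    ("input",): "one-way",
    ("output", "input"): "solicit-response",
    ("output",): "notification",
}


def operation_kind(operation_xml):
    """Classify a WSDL 1.1 <operation> by the order of its children."""
    operation = ET.fromstring(operation_xml)
    # Ignore <fault> and other children; only input/output order matters.
    order = tuple(
        child.tag for child in operation if child.tag in ("input", "output")
    )
    return KINDS.get(order, "unknown")


print(operation_kind(
    '<operation name="oName">'
    '<input message="oNameRequest"/><output message="oNameResponse"/>'
    '</operation>'
))
```

A toolkit that reads real WSDL files would match namespace-qualified element names instead of the bare names used here.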
Slide 111: The web service binds each operation using the following format. The operation name corresponds with the operation defined earlier in the <portType> element. The soapAction attribute of the <soap:operation> element shows the destination URI including a folder, if necessary: <operation name="oName"> <soap:operation soapAction="URI"/> <input> <soap:body use="literal"/> </input> <output> <soap:body use="literal"/> </output> </operation> You can also specify an optional SOAP encoding for each operation. Specifying Processing Software The <service> element shows where to process the requested operation. The service has a name attribute and a child <port> element. The binding attribute of the <port> element names the binding to use. The <port> element also has a name attribute. If you’re using SOAP, the <soap:address> element specifies the location of the processing application: <service name="sName"> <port binding="bName" name="pName"> <soap:address location="URI"/> </port> </service> The file can also include a <documentation> element as a child of <service> to provide a human-readable description of the service. Viewing a Sample WSDL Document The concepts behind a WSDL file are easier to understand with an example. The following example shows a simple fictitious WSDL document: <?xml version="1.0" encoding="utf-8" ?> <definitions name="Author" targetNamespace="http://www.apress.com/wsdl/Authors.wsdl" xmlns:tns="http://www.apress.com/wsdl/Authors.wsdl" xmlns="http://schemas.xmlsoap.org/wsdl/" xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <message name="getAuthorRequest"> <part name="book" type="xsd:string"/> </message> <message name="getAuthorResponse"> <part name="author" type="xsd:string"/> </message> <portType name="authorRequest">
Slide 112: <operation name="getAuthor"> <input message="tns:getAuthorRequest"/> <output message="tns:getAuthorResponse"/> </operation> </portType> <binding name="authorSOAPBinding" type="tns:authorRequest"> <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http"/> <operation name="getAuthor"> <soap:operation soapAction="http://www.apress.com/getAuthor"/> <input> <soap:body use="literal"/> </input> <output> <soap:body use="literal"/> </output> </operation> </binding> <service name="authorSOAPService"> <port binding="tns:authorSOAPBinding" name="Author_Port"> <soap:address location="http://www.apress.com:8080/soap/servlet/rpcrouter/"/> </port> </service> </definitions> Notice that this WSDL file contains a number of namespaces: <?xml version="1.0" encoding="utf-8" ?> <definitions name="Author" targetNamespace="http://www.apress.com/wsdl/Authors.wsdl" xmlns:tns="http://www.apress.com/wsdl/Authors.wsdl" xmlns="http://schemas.xmlsoap.org/wsdl/" xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> The targetNamespace in the document element allows the document to reference itself. It uses a prefix of tns for the namespace. The document element includes the default WSDL namespace http://schemas.xmlsoap.org/wsdl/ as well as a reference to the XML schema namespace http://www.w3.org/2001/XMLSchema. The soap prefix maps to http://schemas.xmlsoap.org/wsdl/soap/, the namespace for the SOAP binding elements. The WSDL document includes two <message> elements: one request and one response. The data types are the built-in xsd:string types: <message name="getAuthorRequest"> <part name="book" type="xsd:string"/> </message> <message name="getAuthorResponse"> <part name="author" type="xsd:string"/> </message>
Slide 113: The <portType> contains a single operation called getAuthor. The getAuthor operation has both input and output messages, which correspond to the string <message> elements: <portType name="authorRequest"> <operation name="getAuthor"> <input message="tns:getAuthorRequest"/> <output message="tns:getAuthorResponse"/> </operation> </portType> The binding specifies the SOAP 1.1 protocol over HTTP using the rpc style: <binding name="authorSOAPBinding" type="tns:authorRequest"> <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http"/> <operation name="getAuthor"> <soap:operation soapAction="http://www.apress.com/getAuthor"/> <input> <soap:body use="literal"/> </input> <output> <soap:body use="literal"/> </output> </operation> </binding> The application addressed by the web service is located at http://www.apress.com:8080/soap/servlet/rpcrouter/: <service name="authorSOAPService"> <port binding="tns:authorSOAPBinding" name="Author_Port"> <soap:address location="http://www.apress.com:8080/soap/servlet/rpcrouter/"/> </port> </service> You’re not likely to have to write WSDL documents yourself, but understanding how they work can be useful. You can see an example of a more complicated WSDL file at http://soap.amazon.com/schemas2/AmazonWebServices.wsdl. The next section explains the SOAP protocol, one of the most popular ways to consume a web service. SOAP SOAP is another XML vocabulary that works with web services. You can send SOAP messages using HTTP and even email. When consuming a SOAP web service, the consumer sends a SOAP message to a receiver, who acts upon it in some way. For example, the SOAP message could contain a method name for a remote procedure call. The receiver could run the method on a web application and return the results to the sender.
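Because a WSDL file is ordinary XML, a consumer's toolkit can read the operations and the service address straight out of it. The following sketch shows one way this might look with Python's standard xml.etree.ElementTree module. It uses a shortened copy of the fictitious Author document with the <binding> element left out, and it assumes the soap prefix is bound to http://schemas.xmlsoap.org/wsdl/soap/. The function name describe is hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefixes used in the search paths below.
NS = {
    "wsdl": "http://schemas.xmlsoap.org/wsdl/",
    "soap": "http://schemas.xmlsoap.org/wsdl/soap/",
}

# Shortened copy of the fictitious Author WSDL (binding omitted).
AUTHOR_WSDL = """<definitions name="Author"
    targetNamespace="http://www.apress.com/wsdl/Authors.wsdl"
    xmlns:tns="http://www.apress.com/wsdl/Authors.wsdl"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <message name="getAuthorRequest">
    <part name="book" type="xsd:string"/>
  </message>
  <message name="getAuthorResponse">
    <part name="author" type="xsd:string"/>
  </message>
  <portType name="authorRequest">
    <operation name="getAuthor">
      <input message="tns:getAuthorRequest"/>
      <output message="tns:getAuthorResponse"/>
    </operation>
  </portType>
  <service name="authorSOAPService">
    <port binding="tns:authorSOAPBinding" name="Author_Port">
      <soap:address
          location="http://www.apress.com:8080/soap/servlet/rpcrouter/"/>
    </port>
  </service>
</definitions>"""


def describe(wsdl_text):
    """Return (operation names, service address) from a WSDL 1.1 document."""
    root = ET.fromstring(wsdl_text)
    operations = [
        operation.get("name")
        for operation in root.findall("wsdl:portType/wsdl:operation", NS)
    ]
    address = root.find("wsdl:service/wsdl:port/soap:address", NS)
    return operations, address.get("location")


print(describe(AUTHOR_WSDL))
```

The search paths need the namespace map because every element in the document belongs to either the default WSDL namespace or the SOAP binding namespace.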
Slide 114: In the simplest situation, the SOAP message involves a message between two points: the sender and the receiver. The number of messages could increase if the receiver has to send back another SOAP message to clarify the original request. A further SOAP message would then be required to respond to the clarification request. You also can send a SOAP message via an intermediary who acts before sending the message to the receiver. The SOAP 1.2 primer is available on the W3C web site at http://www.w3.org/TR/2003/REC-soap12-part0-20030624/. You also can see the messaging framework at http://www.w3.org/TR/2003/REC-soap12-part1-20030624/ and the adjuncts at http://www.w3.org/TR/2003/REC-soap12-part2-20030624/. The “SOAP Version 1.2 Specification Assertions and Test Collection” document is available at http://www.w3.org/TR/2003/REC-soap12-testcollection-20030624/. Creating a SOAP Message SOAP messages are XML documents that conform to the SOAP schema. Because SOAP is an XML vocabulary, a SOAP document must be well formed. A SOAP message can optionally include an XML declaration, but it can’t contain a DTD or processing instructions. The document element of a SOAP message is the <Envelope> element. It encloses all other elements in the message and must contain a reference to the soap-envelope namespace: <?xml version='1.0' ?> <env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope"> This namespace refers to the SOAP 1.2 specification. If the SOAP processor receiving the message expects a SOAP 1.1 message, it generates an error. You should match the namespace and SOAP version. For SOAP 1.1, use <env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"> Each SOAP message is different. It includes the parameters that are required for the operation. You can include a schema for the SOAP message so that you ensure that the contents are valid.
A schema allows both the sender and the receiver to understand the format for the request and response. You can see the schema for a SOAP 1.2 message at http://www.w3.org/2003/05/soap-envelope/. Understanding the Contents of a SOAP Message SOAP messages have the following format: • The root <Envelope> element identifies the message as a SOAP message. • The <Body> element contains the content for the end destination. • The <Header> and <Fault> elements are optional. The following code shows the structure of a SOAP message: <?xml version='1.0' ?> <env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope"> <env:Header> <!-- Optional header information --> </env:Header>
Slide 115: <env:Body> <!-- Body information --> <env:Fault> <!-- Optional fault information --> </env:Fault> </env:Body> </env:Envelope> Explaining SOAP Headers The SOAP <Header> element includes information additional to that required by the SOAP receiver. It’s optional, but if it’s present, it must be the first child of the <Envelope> element. The header often includes machine-generated information such as dates and times and unique session identifiers. Any child element within a <Header> element must be qualified with a namespace. You can include the mustUnderstand attribute in a header to require that the receiver be able to interpret the header: <env:Header> <e:Element xmlns:e="http://www.apress.com" env:mustUnderstand="true"> <!--Element content--> </e:Element> </env:Header> You can also use a value of 1: <e:Element xmlns:e="http://www.apress.com" env:mustUnderstand="1"> The processor can only process the message if it understands all elements where the value of the mustUnderstand attribute is true. If it doesn’t, it returns an error message and ignores the rest of the SOAP message. A SOAP message may pass through other points on the way to its final destination. The intermediate points may need to act on some of the headers in the message. In SOAP 1.1, you use the actor attribute to address the element to an intermediary; SOAP 1.2 renames this attribute to role: <env:Header> <e:Element xmlns:e="http://www.apress.com" env:mustUnderstand="true" env:role="http://www.apress.com/wsxml/"/> </env:Header> Understanding the SOAP Body The <Body> element contains the information intended for the final destination. Any information contained in this element is mandatory. Child elements of the <Body> element can include a namespace declaration.
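Returning briefly to the header rules above, the mustUnderstand check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the header names Session and Trace, the function name unsupported_headers, and the idea of passing in a set of understood names are all hypothetical and don't come from any particular SOAP toolkit.

```python
import xml.etree.ElementTree as ET

ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace

# Hypothetical message with one mandatory and one optional header block.
MESSAGE = """<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Header>
    <e:Session xmlns:e="http://www.apress.com"
               env:mustUnderstand="true">abc123</e:Session>
    <e:Trace xmlns:e="http://www.apress.com">optional</e:Trace>
  </env:Header>
  <env:Body/>
</env:Envelope>"""


def unsupported_headers(message_text, understood):
    """Return the mandatory header blocks that the receiver can't interpret."""
    root = ET.fromstring(message_text)
    header = root.find(f"{{{ENV}}}Header")
    if header is None:  # the <Header> element is optional
        return []
    missing = []
    for block in header:
        flag = block.get(f"{{{ENV}}}mustUnderstand", "false")
        if flag in ("true", "1") and block.tag not in understood:
            missing.append(block.tag)
    return missing


print(unsupported_headers(MESSAGE, understood=set()))
```

A receiver that gets a non-empty list back would return a fault rather than process the body. ElementTree reports each tag in {namespace}name form, so the set of understood headers needs to use the same form.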
Slide 116: The information contained in the body must be well formed and must conform to the WSDL for the web service. In other words, the information must reference the operations set out in the WSDL. The following code shows a sample <Body> element: <env:Body> <b:getAuthor xmlns:b="http://www.apress.com/bookauthor"> <b:book>Beginning XML with DOM and Ajax</b:book> </b:getAuthor> </env:Body> In this fictitious example, the <Body> element makes a getAuthor request. This request takes one parameter <book>. In the example, you request the author details for the book Beginning XML with DOM and Ajax. The namespace http://www.apress.com/bookauthor qualifies the getAuthor request. The body of the returned information might look something like this: <env:Body> <b:getAuthorResponse xmlns:b="http://www.apress.com/bookauthor"> <b:Author>Sas Jacobs</b:Author> </b:getAuthorResponse> </env:Body> Examining the Fault Element The optional <Fault> element provides information on faults that occurred when the message was processed. If present, it must contain two elements: <Code> and <Reason>. It can also contain an optional <Detail> element: <env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope"> <env:Body> <env:Fault> <env:Code> <env:Value>Value here</env:Value> </env:Code> <env:Reason> <env:Text xml:lang="en-US">Error reason here</env:Text> </env:Reason> </env:Fault> </env:Body> </env:Envelope> If there is a fault, the web service sends a fault message instead of a response. A SOAP processor can’t return both a response and a fault.
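The request, response, and fault messages above can be tied together in a short sketch. The following Python code builds the fictitious getAuthor request and reads a reply that contains either a response or a fault. The function names build_request and read_reply are hypothetical, and no message is actually sent anywhere; a real consumer would normally rely on a SOAP toolkit.

```python
import xml.etree.ElementTree as ET

ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
BOOK = "http://www.apress.com/bookauthor"        # namespace from the example


def build_request(title):
    """Build a SOAP 1.2 envelope carrying a getAuthor request."""
    envelope = ET.Element(f"{{{ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{ENV}}}Body")
    call = ET.SubElement(body, f"{{{BOOK}}}getAuthor")
    ET.SubElement(call, f"{{{BOOK}}}book").text = title
    return ET.tostring(envelope, encoding="unicode")


def read_reply(reply_text):
    """Return the author name, or raise RuntimeError if the reply is a fault."""
    body = ET.fromstring(reply_text).find(f"{{{ENV}}}Body")
    fault = body.find(f"{{{ENV}}}Fault")
    if fault is not None:
        # A reply holds either a response or a fault, never both.
        raise RuntimeError(fault.findtext(f"{{{ENV}}}Reason/{{{ENV}}}Text"))
    return body.findtext(f"{{{BOOK}}}getAuthorResponse/{{{BOOK}}}Author")


REPLY = (
    f'<env:Envelope xmlns:env="{ENV}"><env:Body>'
    f'<b:getAuthorResponse xmlns:b="{BOOK}">'
    '<b:Author>Sas Jacobs</b:Author>'
    '</b:getAuthorResponse></env:Body></env:Envelope>'
)

print(build_request("Beginning XML with DOM and Ajax"))
print(read_reply(REPLY))
```

ElementTree chooses its own namespace prefixes when it serializes the request, so the output uses ns0 and ns1 rather than env and b. The message means the same thing because only the namespace URIs matter.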
Slide 117: Explaining SOAP Encoding You can include an optional encodingStyle attribute in your SOAP message. For SOAP 1.2, use the following: <?xml version='1.0' ?> <env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope" xmlns:enc="http://www.w3.org/2003/05/soap-encoding" env:encodingStyle="http://www.w3.org/2003/05/soap-encoding"> You use the following format for SOAP 1.1: <?xml version='1.0' ?> <env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/" xmlns:enc="http://schemas.xmlsoap.org/soap/encoding/" env:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"> The namespaces include definitions for the data types that you can use with SOAP encoding. Let’s summarize: • The WSDL vocabulary describes web services and their operations. • WSDL isn’t a W3C recommendation; rather, it was developed by Microsoft, Ariba, and IBM. • A WSDL file is usually generated automatically rather than being written by a human. • SOAP is an XML vocabulary that allows someone to consume a web service. • There are different versions of SOAP. At the time of writing, the latest is version 1.2. • SOAP messages request and receive information from web services. We’ll finish this chapter by looking at some of the other web XML vocabularies. Other Web Vocabularies I’ve given you a brief introduction to some of the most popular web vocabularies: XHTML, MathML, SVG, WSDL, and SOAP. These vocabularies are only the tip of the iceberg, and new vocabularies appear regularly. In this section, I’ll list some additional web vocabularies and provide a brief description of their use. RSS and News Feeds Really Simple Syndication or RDF Site Summary (RSS), commonly used in news feeds, is like a web service that works specifically with news. Companies such as The Associated Press (AP) and United Press International (UPI) make international stories available via RSS.
You can find news feeds for each of them at http://www.newsisfree.com/syndicate.php. Smaller web sites can also provide news in this way. There are many different versions of the RSS specification. The current version is RSS 3, and you can find out more about it at http://www.rss3.org/main.html.
Slide 118: VoiceXML VoiceXML is a W3C recommendation designed to represent aural communications on the web. VoiceXML includes support for voice-synthesizing software, digitized audio, and command-and-response conversations, among others. The VoiceXML vocabulary is surprisingly easy to understand: <?xml version="1.0"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"> <form> <field name="gender"> <prompt>Are you female or male?</prompt> <grammar src="gender.grxml" type="application/srgs+xml"/> </field> <block> <submit next="gender.asp"/> </block> </form> </vxml> Using grammar documents to specify the expected responses to a user’s input, you can quickly create verbal forms to interact with users. You can find out more about VoiceXML at http://www.w3.org/Voice/. SMIL SMIL (Synchronized Multimedia Integration Language) is an XML vocabulary for authoring interactive multimedia presentations. SMIL, pronounced “smile,” is a W3C recommendation. You can find out more at http://www.w3.org/AudioVideo/. Like VoiceXML, SMIL is a relatively easy vocabulary to understand. It allows you to describe the layout of items on the screen, as well as the timing and synchronization of items in the presentation. SMIL documents can support the following media types: images, video, audio, animation, text, and textstream. You need a SMIL player or Internet Explorer 6 for Windows to be able to view your presentations. Database Output Formats Although database formats aren’t explicitly web vocabularies, you may encounter them in your development. Some popular formats include • Microsoft Access • Microsoft SQL Server • Oracle XML DB • IBM Informix • IBM DB2 Universal Database • Sybase
Slide 119: 98 CHAPTER 3 ■ WEB VOCABULARIES Each of these formats is different, but stylesheets are available that can handle the conversion from one type of database to another. Most of these databases have the ability to export their data directly as XML. Additionally, some tools can extract the information and format it as XML. I’ll show you examples of using XML with databases in Chapters 12 and 13. Summary This chapter presented an introduction to several XML vocabularies. I examined XHTML, the primary vocabulary in use on the web today. I also discussed SVG, MathML, and vocabularies involved with web services, along with some other, less well-known vocabularies. In the chapters that follow, you’ll learn how to use some of the common vocabularies of XML, and learn how they work together to create XML applications.
Slide 120: CHAPTER 4 Client-Side XML In Chapters 1, 2, and 3, you looked at XML and saw its application in some specific web vocabularies. The next section of the book deals with XML on the client-side—in the web browser and desktop environment. XML is well supported in the major web browsers, and most browsers have adopted the World Wide Web Consortium (W3C) standards in their implementations of XML. In this chapter, I’ll show you the different ways that you can use XML in web browsers. I’ll also talk about Adobe (formerly Macromedia) Flash and finish with a summary of the different client/server architectures that may apply in XML applications. Why Use Client-Side XML? To start with, it’s important to understand why you might want to work with XML on the client side. There are two reasons: • To reduce the amount of traffic between the server and client • To pass on more of the page-processing responsibility to the client Let’s examine the first reason. If you reduce the amount of data flowing between the client and server, you’ll provide for a better user experience. By removing some of the client/server communication, the browsing experience is faster, as the users aren’t waiting for server responses. Client-side XML also allows users to download XML in the background or as an asynchronous task. If the data has been loaded already, the users won’t perceive a lag when interacting with the page. A second advantage of using XML on the client is that the server can pass on more of the page-processing responsibility. This reduces the web server load and should also enhance the user experience. For example, an XML application could use a stylesheet to display an XML document in the browser rather than using a server-side page to extract and format content. 
Before getting started with client-side processing, it’s important to add one caution: Browser support is inconsistent in areas such as Extensible Stylesheet Language Transformations (XSLT), so be aware of this when designing client-side XML solutions. So how can you work with XML on the client?
Slide 121: 100 CHAPTER 4 ■ CLIENT-SIDE XML Working with XML Content Client-Side As you’ve seen in the previous chapters, XML is a language for marking up data. On the client, XML applications are likely to adopt one of the following approaches: • Display XML content in the browser using Cascading Style Sheets (CSS) and XSLT stylesheets. • Manipulate XML documents in the browser using Document Object Model (DOM), XSLT, and scripting languages such as JavaScript and VBScript. • Display and manipulate XML documents using Flash and ActionScript. I’ll examine each of these approaches in turn. Styling Content in a Browser The purpose of an XML document is to mark up information. Stylesheets separate the content of an XML document from its layout. XSLT and CSS play slightly different roles in this process. XSLT uses one XML document to generate another. It transforms a source XML tree into a destination XML tree. In the case of a web browser, the XSLT stylesheet uses elements in the XML document to generate XHTML. The XSLT stylesheet creates the XHTML elements by matching specific parts of the original XML document. CSS adds styling to the transformed elements. Although CSS can style an XML document directly, it can’t transform the document, as you’ll see in Chapter 5. While XSLT can also add styling, that’s not its main function. Figure 4-1 illustrates this relationship. Figure 4-1. The process of styling with an XML document By applying different CSS stylesheets to the same transformed XML document, you can repackage the content for a range of purposes. For example, you can use one stylesheet for a web browser display and another for a mobile phone. This gives the most flexibility to the presentation layer.
Slide 122: CHAPTER 4 ■ CLIENT-SIDE XML 101 Manipulating XML Content in a Browser Client-side code can use XML documents as a data source. JavaScript allows you to work with client-side XML to generate dynamic XHTML content. This provides an alternative to writing server-side pages that access external content, or storing large amounts of data within the client-side code in arrays. Using XML as an external data source allows you to keep the content separate from the presentation layer. It also allows you to update the data without reloading the web page. As an example, an XML document could provide information about the structure of a web site, and the application could use it to build a dynamic navigation system. Figure 4-2 shows how the server and client might work together to generate such a menu system. Figure 4-2. Client and server involvement in the manipulation of XML content
Slide 123: 102 CHAPTER 4 ■ CLIENT-SIDE XML I’ll explain this process: 1. When a browser requests a web page, a server-side scripting language such as Visual C# .NET (C#), Visual Basic .NET (VB .NET), or PHP can generate XHTML content. 2. The server-side logic can also include embedded XML content (a data island) within the XHTML page. 3. The XHTML page, including the XML data, is returned to the browser. 4. After loading, client-side code can access the XML within the data island and use it to generate dynamic XHTML content. 5. At the same time, you can load an XSLT stylesheet in the background. 6. When the user chooses an option from the dynamic menu, the page returns the appropriate XML data. The XSLT stylesheet can transform the XML content into XHTML and display it in the browser. You’ll learn more about this approach in the section, “Transforming XML into XHTML.” Flash movies offer an alternative to XHTML pages, as they can run either in a web browser or as standalone desktop applications. Working with XML in Flash Flash includes a range of tools for working with XML content. It doesn’t provide support for XSLT transformations, but it does include a scripting language, ActionScript, that provides similar XML functionality to that provided by JavaScript. Flash also contains tools for styling content. Versions of Flash from MX 2004 upward include user-interface (UI) components that you can bind directly to XML content. Further advantages of Flash are that it’s not tied to a web browser, and it runs in a variety of devices. Flash can generate standalone content that runs independently, and Flash Lite 2.0 for mobile phones allows for the inclusion of XML content. Flash includes a number of prebuilt components. Some of these components work with data such as XML documents. Other UI components provide functionality similar to that within XHTML forms. Figure 4-3 shows how Flash might work with XML content. You’ll learn more about Flash and XML in Chapter 10. 
The diagram shows that Flash can work with XML content in two different ways. In both approaches, Flash receives an XML document and parses it into a document tree (step 1). Flash can then display the content within a Flash movie (step 4) using ActionScript. As an alternative, Flash can bind the XML content to prebuilt components (step 2). At this point, Flash can optionally format the data as part of the binding process (step 3) before displaying it within a Flash movie (step 4). Steven Webster’s article, “Choosing Between XML, Web Services, and Remoting for Rich Internet Applications” at http://www.macromedia.com/devnet/flash/articles/ria_dataservices.html, provides good coverage of working with XML in Flash. I’ll also talk about Flash in more detail in Chapter 10.
Slide 124: CHAPTER 4 ■ CLIENT-SIDE XML 103 Figure 4-3. Working with XML content in Flash Now that you understand the ways in which you can work with XML on the client, it’s time to look at XML support in the most common web browsers. Examining XML Support in Major Browsers XML support can include the display of raw XML and conformity with • The W3C DOM • XML Schema Definition (XSD) Language • XSLT Before discussing browser support, let’s have a quick refresher about these concepts and look at some pertinent points. Understanding the W3C DOM A DOM represents a document as a series of related objects. The HTML DOM provides an application programming interface (API) for addressing parts of a web document. If you’ve worked with JavaScript, you may have used the HTML DOM to access specific elements within an XHTML document. For example, you can find the title of an XHTML document with document.title or count the number of images on a page using document.images.length. If you’ve created DHTML, you’ve addressed the issue of browser incompatibility. The W3C has released a recommendation that provides for three different levels of DOM support, numbered 1 to 3, respectively. The higher the DOM level, the larger the feature set that is supported. The W3C refers to the early Netscape Navigator 3 and Microsoft Internet Explorer (IE) 3 DOMs as Level 0. You can find out more at http://www.w3.org/DOM/. DOM is also separated into different sections: Core, XML, and HTML. The HTML DOM extends some of the Core functionality. Because it extends this functionality, it’s compatible with earlier DOM implementations.
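The W3C DOM interfaces are defined independently of any one programming language, so the document.title and document.images.length lookups mentioned above have close equivalents outside the browser. As a small sketch, the following Python code uses the standard xml.dom.minidom module to read the title and count the images in a tiny XHTML-like document. The sample page, the image file names, and the function name summarize are made up for illustration.

```python
from xml.dom.minidom import parseString

# A made-up page; the image file names are placeholders.
PAGE = """<html>
  <head><title>Mars Travel</title></head>
  <body>
    <img src="mars.png" alt="Mars"/>
    <img src="rocket.png" alt="Rocket"/>
  </body>
</html>"""


def summarize(page_text):
    """Return the page title and the number of <img> elements."""
    document = parseString(page_text)
    # getElementsByTagName returns a NodeList, as in browser scripting.
    title_element = document.getElementsByTagName("title")[0]
    title = title_element.firstChild.data  # text node inside <title>
    image_count = document.getElementsByTagName("img").length
    return title, image_count


print(summarize(PAGE))
```

The shortcut properties document.title and document.images belong to the HTML DOM, so this sketch falls back on the Core method getElementsByTagName, which minidom does provide.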
Slide 125: 104 CHAPTER 4 ■ CLIENT-SIDE XML The W3C DOM treats data as a tree of nodes, where each node has properties and methods. While DOM theoretically has a wider scope than XML documents, most of the implementations have been concerned with XML and XHTML. The recommendation is platform- and programming-language-independent. This means that, once you’ve learned one implementation, you’ll be able to apply the same constructs with different languages. Rather than go into detail in this short section, I’ll examine DOM scripting fully in Chapter 8. In that chapter, I’ll use JavaScript to manipulate DOM, and you’ll work through several examples. Understanding the XML Schema Definition Language Schemas specify the rules for creating valid documents within a given XML vocabulary. XML schemas are one class of schema developed by the W3C. XML schemas address some of the shortcomings in Document Type Definitions (DTDs). One area addressed is the ability of the XML schema language to define complex relationships and data types within an XML document. Understanding XSLT XSLT is an XML vocabulary that is concerned with transforming one XML document tree into another. I’ll look at this topic in more detail in Chapters 6 and 7. The sections that follow will look at XML support in these major web browsers: • Microsoft Internet Explorer 6 • Mozilla Firefox 1.5 • Netscape 8 • Opera 8.5 These are the current browser versions at the time of writing. I’ll cover the display of raw XML in each browser and the XML parser used by each browser, and I’ll show you how the browser determines XML content. I’ll also look at any XML functionality specific to the browser. Note that the forthcoming release of Opera 9 includes support for XSLT, which isn’t present in the current version. Microsoft Internet Explorer Microsoft included XML support in early releases of the IE browser with MSXML, formerly known as the Microsoft XML Parser. MSXML is available as a DLL, in different versions. 
Examining the MSXML Parser Internet Explorer has included MSXML since version 4 of the browser. The parser provides a fairly complete implementation of most of the major W3C XML standards. In general, the more recent versions of IE provide better compliance with standards.
Slide 126: CHAPTER 4 ■ CLIENT-SIDE XML 105 MSXML provides support for DOM, XML schema, and XSLT. MSXML also supports other proprietary and non-W3C standards, such as Simple API for XML (SAX). MSXML is not a validating parser. If an XML document specifies a schema or DTD, IE isn’t able to validate the document instance. For more details on MSXML, visit http:// msdn.microsoft.com/xml/ and browse to the MSXML SDK documentation. W3C DOM Support Microsoft has supported DOM Level 1 since MSXML version 2.0. Version 1.0 supported a Microsoft derivative of DOM, which is very similar to, but not fully compliant with, the W3C DOM Level 1. W3C XSD MSXML began to support the W3C XSD recommendation from version 4. MSXML 3 supported XML-Data Reduced (XDR) schemas, but this approach has since been deprecated. MSXML 6 removes support for XDR altogether. XSLT IE 6 offers support for XSLT 1.0 and XPath 1.0. At the time of writing, XSLT 2.0 is a candidate recommendation from the W3C, along with XPath 2.0. MSXML Versions IE 6 ships with version 3 of MSXML, but you can also download the component separately to upgrade to a later version. You may also have a later version if you’ve installed other software that requires its MSXML. At the time of writing, the most recent version is MSXML 6, and it ships with SQL Server 2005. The state of different versions is a little confusing. It seems that Windows Vista will include MSXML 6 when released, so presumably MSXML 6 will also be distributed with IE 7. MSXML 5 was included with Microsoft Office 2003 and wasn’t available as a separate download. MSXML 6 includes support for • XML 1.0 (DOM and SAX2 APIs) • XML schema 1.0 • XPath 1.0 • XSLT 1.0 The most common versions of MSXML are likely to be 4 and 3 at the time of writing, so this book will focus on using them. You can use JavaScript to determine which parser is installed. You’ll find out more about this in Chapter 8. 
Table 4-1 shows the versions of MSXML that shipped with the various versions of IE.
Slide 127: Table 4-1. IE and MSXML Versions

Internet Explorer Version        MSXML Version
4.0                              1.0
4.01 Service Pack 1 (SP1)        2.0
5.0                              2.0a
5.0b                             2.0b
5.01                             2.5a
5.01 SP1                         2.5 SP1
5.5                              2.5 SP1
6                                3

At the time of writing, no versions of Internet Explorer ship with MSXML 4.0 or higher.

Viewing Raw XML in IE
When IE opens an XML document, it checks first for a stylesheet processing instruction. If IE finds a stylesheet, it applies the stylesheet to transform the document. If no such processing instruction exists, IE displays the raw data in a collapsible tree structure, using its own default stylesheet. To show you how IE displays raw XML, I’ll use the dvd.xml file from Chapter 1. You can find this with the resources available for download from the Source Code area of the Apress web site (http://www.apress.com). The document follows:

<?xml version="1.0" encoding="UTF-8"?>
<!-- This XML document describes a DVD library -->
<library>
  <DVD id="1">
    <title>Breakfast at Tiffany's</title>
    <format>Movie</format>
    <genre>Classic</genre>
  </DVD>
  <DVD id="2">
    <title>Contact</title>
    <format>Movie</format>
    <genre>Science fiction</genre>
  </DVD>
  <DVD id="3">
    <title>Little Britain</title>
    <format>TV Series</format>
    <genre>Comedy</genre>
  </DVD>
</library>
Slide 128: Figure 4-4 shows how this XML appears when opened in IE.
Figure 4-4. The dvd.xml document displayed in Internet Explorer
The document displays in a tree view complete with + and - signs that you can click to open and close branches of the tree. MSXML includes a default stylesheet that IE applies when no processing instruction exists in the XML document.
■ Tip Choose View ➤ Source from the menu to see the raw source of the XML file.
You can see the default MSXML stylesheet by entering the following addresses into the browser:
• For MSXML 4, use the address res://msxml.dll/defaultss.xsl.
• For MSXML 3, use the address res://msxml3.dll/defaultss.xsl.
• For MSXML 2, use the address res://msxml2.dll/defaultss.xsl.
Figure 4-5 shows the MSXML 4 default stylesheet.
Slide 129: Figure 4-5. The default stylesheet for MSXML 4
The content might be a little hard to understand, but it provides an elegant way of formatting raw XML data. If you want to use this stylesheet in your own applications, you can’t save it directly from the browser. Instead, you can copy the content and remove the + and - signs.
Determining XML Content
IE takes into account different factors to determine whether it’s dealing with an XML document. If the file is loaded from the local file system, IE looks first at the file extension to see if it’s a known type. Failing this, it looks for an <?xml?> declaration at the top of the file. When the file is loaded from a remote server using HTTP or FTP, the browser looks to the Multipurpose Internet Mail Extensions (MIME) content type sent by the server to determine the file type. If it’s unable to do this, it looks for an <?xml?> declaration in the document. When IE determines that the document is of the type XML based on the declaration, it still displays the appropriate MIME type in the document properties box. Once IE determines that it’s dealing with XML content, it parses the document and checks that it is well formed. If the document isn’t well formed, IE displays an error message, as shown in Figure 4-6.
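The decision order described above can be sketched in plain JavaScript. This is only a model of the prose, not IE's actual code: the function names, the shape of the doc object, and the lists of known extensions and MIME types are all assumptions made for illustration.

```javascript
// Hypothetical model of the IE 6 decision order described in the text.
// The tables below are assumed examples, not IE's real lists.
const KNOWN_EXTENSIONS = new Map([
  ["xml", "xml"], ["xsl", "xml"], ["htm", "html"], ["html", "html"], ["txt", "text"],
]);
const KNOWN_MIME_TYPES = new Map([
  ["text/xml", "xml"], ["application/xml", "xml"],
  ["text/html", "html"], ["text/plain", "text"],
]);

// Looks for an XML declaration at the top of the content. Leading white space
// is tolerated here because the chapter notes that IE is not strict about it.
function hasXmlDeclaration(content) {
  return /^\s*<\?xml\s/.test(content);
}

// doc: { source: "local" | "remote", extension, mimeType, content }
function detectType(doc) {
  // Local files are judged by extension first, remote files by MIME type first.
  const table = doc.source === "local" ? KNOWN_EXTENSIONS : KNOWN_MIME_TYPES;
  const key = doc.source === "local" ? doc.extension : doc.mimeType;
  const known = key ? table.get(key.toLowerCase()) : undefined;
  if (known) {
    return known;
  }
  // Only when the first test fails does the content itself get inspected.
  return hasXmlDeclaration(doc.content || "") ? "xml" : "unknown";
}

console.log(detectType({ source: "local", extension: "dat", content: '<?xml version="1.0"?><a/>' }));
```

The sketch makes one point from the text explicit: the declaration is a fallback, so a remote file served with a recognized non-XML content type is not treated as XML even if it starts with a declaration.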
Slide 130: CHAPTER 4 ■ CLIENT-SIDE XML 109 Figure 4-6. Internet Explorer 6 showing an error message Using Proprietary XML Functionality in IE IE includes the following proprietary features: • XML data islands • XML data binding • XMLHTTP object I’ll discuss these in a little more detail. XML Data Islands JavaScript allows you to manipulate XML on the client side. IE includes proprietary functionality that loads XML into script-accessible variables when the page first loads. Microsoft calls this functionality XML data islands, as they are islands of data within a sea of XHTML. Be aware that MSXML no longer supports this technology. You can include content within an XHTML page by using the proprietary <xml> element. You can either add the content inline <xml id="dvd1"> <DVD id="1"> <title>Breakfast at Tiffany's</title> <format>Movie</format> <genre>Classic</genre> </DVD> </xml>
Slide 131: 110 CHAPTER 4 ■ CLIENT-SIDE XML or by referencing a URL <xml id="dvd" src="dvd.xml"/> You can then use JavaScript to access the data by using the XML DOM. The resource file dvd_island.htm includes XML data islands: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head></head> <body> <p>This page contains XML data islands</p> <p> <a href="JavaScript: alert(document.all.dvd1.XMLDocument.xml)">View DVD 1</a> <br /> <a href="JavaScript: alert(document.all.dvd2.XMLDocument.xml)">View DVD 2</a> <br /> <a href="JavaScript: alert(document.all.dvd3.XMLDocument.xml)">View DVD 3</a> </p> <xml id="dvd1"> <DVD id="1"> <title>Breakfast at Tiffany's</title> <format>Movie</format> <genre>Classic</genre> </DVD> </xml> <xml id="dvd2"> <DVD id="2"> <title>Contact</title> <format>Movie</format> <genre>Science fiction</genre> </DVD> </xml> <xml id="dvd3"> <DVD id="3"> <title>Little Britain</title> <format>TV Series</format> <genre>Comedy</genre> </DVD> </xml> </body> </html>
Slide 132: Each data island has a unique id, and you can use JavaScript and the XML DOM to access the XML content:
<a href="JavaScript: alert(document.all.dvd1.XMLDocument.xml)">View DVD 1</a>
Figure 4-7 shows what happens when you click this link in IE.
Figure 4-7. XML data island content displayed in IE
XML Data Binding
IE allows you to bind XML data islands to Dynamic HTML (DHTML) elements. After binding, you can view or even update the data.
XML HTTP Object
The XMLHTTP object has been included with MSXML since version 1. The object requests data over HTTP. MSXML 6 no longer supports XMLHTTP10. The following JavaScript code shows how easy it is to retrieve data from the server:
// Create the request object through ActiveX (IE only)
var oXMLHTTP = new ActiveXObject("Microsoft.XMLHTTP");
// The third argument, false, makes the request synchronous
oXMLHTTP.open("GET", "http://www.microsoft.com/", false);
oXMLHTTP.setRequestHeader("Content-type", "text/html");
oXMLHTTP.send();
// Display the body of the response as text
alert(oXMLHTTP.responseText);
You can find this file saved as xmlHTTP.htm with your resources, if you want to test it yourself. After IE 5 implemented this functionality, other browser creators followed suit. Similar functionality is available within Mozilla 1.0+, Safari, and Opera 8+. You’ll learn more about this in Chapter 9. Now that you’ve seen how IE works with XML, it’s time to look at support in Firefox and Netscape.
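Because the other browsers expose similar functionality without ActiveX, scripts commonly choose the request object at run time. The following is a minimal sketch of that idea, assuming the native XMLHttpRequest constructor that Mozilla, Safari, and Opera provide. The createRequest function and its env parameter are hypothetical names; env stands in for the browser's global object so that the logic can also be exercised outside a browser.

```javascript
// Sketch: return a request object for whichever mechanism the environment offers.
// In a web page you would pass `window` as env.
function createRequest(env) {
  if (env.XMLHttpRequest) {
    // Native object, as provided by Mozilla, Safari, and Opera
    return new env.XMLHttpRequest();
  }
  if (env.ActiveXObject) {
    // IE 6 route, using the same ProgID as the example in the text
    return new env.ActiveXObject("Microsoft.XMLHTTP");
  }
  // No known mechanism is available
  return null;
}

// Stand-in constructors (illustrative only) to show which branch is taken
function FakeNative() { this.kind = "native"; }
function FakeActiveX(progId) { this.kind = "activex"; this.progId = progId; }

console.log(createRequest({ XMLHttpRequest: FakeNative }).kind);
console.log(createRequest({ ActiveXObject: FakeActiveX }).progId);
```

Checking for the native object first means the ActiveX branch runs only where nothing else is available. The calling code should still handle a null result for clients that support neither mechanism.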
Slide 133: 112 CHAPTER 4 ■ CLIENT-SIDE XML Mozilla Mozilla is the basis for both the Netscape and Firefox browsers, so the XML functionality discussed in this section applies to both browsers. Examining the Expat Parser Most of Mozilla’s XML functionality is based around a core XML parser called Expat. Expat is tightly integrated with the Mozilla engine, so all Mozilla versions ship with this parser. The parser supports XSLT stylesheets, namespaces, simple XLinks, Scalable Vector Graphics (SVG), and Mathematical Markup Language (MathML). Expat 1.2 is also available for separate download from http://www.jclark.com/xml/ expat.html. At the time of writing, Expat 2.0 is in development and can be downloaded from http://expat.sourceforge.net/. W3C DOM Support Mozilla provides complete support for the W3C XML DOM to Level 2, with additional support for some DOM Level 3 elements. Unlike IE, Mozilla’s DOM support is built into the browser, making it very easy to work with a DOM representation of an XML document using JavaScript. Because DOM is a standardized interface, once you create the DOM objects, you can use the same code to manipulate them, regardless of browser. You’ll discover more about this in Chapter 8. W3C XSD Expat is a nonvalidating parser, so Mozilla cannot validate an XML document using an XML schema or DTD. XSLT Mozilla can perform XSLT transformations in much the same way as IE. It relies on a module called TransforMiiX, which you can also use as a standalone processor. Viewing Raw XML in Mozilla Both Netscape 8 and Firefox 1.5 add formatting to display raw XML content in much the same way as IE. Figure 4-8 shows an XML document opened within Firefox. Determining XML Content Mozilla is more particular than IE in determining what is and isn’t XML. Regardless of the source of the document, Mozilla tries to use the MIME type to determine content type. On platforms with no native MIME support, such as Windows, it uses the file extension. 
Unlike IE, Mozilla doesn’t look at the content of the file in making the determination. Mozilla treats unknown file types as text/plain, even though they may contain XML content. Mozilla checks that XML documents are well formed, and it displays an error in the browser if this isn’t the case.
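As a rough illustration of this stricter behavior, the sketch below models the rules just described: use the MIME type if one is available, otherwise map the file extension, and never inspect the content. The function names and both lookup tables are assumptions for illustration and are not taken from the Mozilla source.

```javascript
// Hypothetical model of Mozilla's content-type decision as described in the text.
const XML_MIME_TYPES = new Set(["text/xml", "application/xml"]); // assumed list
const EXTENSION_TO_MIME = new Map([                              // assumed mapping
  ["xml", "text/xml"], ["xsl", "text/xml"],
  ["htm", "text/html"], ["html", "text/html"],
]);

// doc: { mimeType, extension, content }
function mozillaContentType(doc) {
  let type = doc.mimeType;
  if (!type && doc.extension) {
    // Fallback for platforms with no native MIME support
    type = EXTENSION_TO_MIME.get(doc.extension.toLowerCase());
  }
  // Unknown types become text/plain; doc.content is deliberately never read
  return type || "text/plain";
}

function mozillaTreatsAsXml(doc) {
  return XML_MIME_TYPES.has(mozillaContentType(doc));
}

console.log(mozillaContentType({ extension: "dat", content: '<?xml version="1.0"?><a/>' }));
```

Under this model, a file with an unrecognized extension is handled as plain text even though it begins with an XML declaration.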
Slide 134: CHAPTER 4 ■ CLIENT-SIDE XML 113 Mozilla also generates an error when it detects white space above the XML declaration. This is the correct behavior according to the specification. However, IE is not as strict about enforcing this requirement. Figure 4-8. Raw XML content displayed in Firefox 1.5 Using Proprietary Functionality in Mozilla Mozilla adheres to W3C recommendations and as such, it doesn’t have much proprietary functionality. Like IE, though, it does have native support for XMLHTTP and data islands. Mozilla also supports XML Binding Language (XBL) and XML User Interface Language (XUL). The Mozilla XML Extras project includes support for Simple Object Access Protocol (SOAP), Web Services Description Language (WSDL), MathML, Resource Description Framework (RDF), and SVG. In the future, Mozilla plans to provide full XLink and XPointer support. Let’s look a little more closely at XBL and XUL.
Slide 135: 114 CHAPTER 4 ■ CLIENT-SIDE XML XUL XUL (pronounced zool and rhymes with cool) is a proprietary language created by Mozilla that describes Mozilla user interfaces. You can use XUL to create interfaces containing elements such as form controls, toolbars, and menus. The advantage is that it provides a simple way to define user interface widgets. You might use XUL to add functionality to Mozilla or to create complete applications such as Firefox and Thunderbird. XUL is beyond the scope of this book, but you can find a great introduction to it at http://developer.mozilla.org/en/docs/XUL_Tutorial. XBL XBL works with XUL to describe the behavior of XUL widgets. Again, Mozilla developed XBL and submitted it as a note to the W3C. It provides similar functionality to IE XML data binding, combined with IE DHTML behaviors. You can find out more about XBL at http:// developer.mozilla.org/en/docs/XUL_Tutorial:Introduction_to_XBL. Native SVG Support Chapter 3 introduced you to SVG. The latest version of Firefox, 1.5, includes native SVG for most of the SVG 1.1 recommendation. It doesn’t include support for filters, SVG-defined fonts, and declarative animations. Netscape 8 doesn’t offer SVG support. Opera Opera has supported XML since version 4, but it doesn’t yet have the same level of support offered by the other major browsers. At the time of writing, the next release, 9.0, plans to increase XML support. Examining the Expat Parser Like Mozilla, Opera also makes use of the Expat open source parser. W3C DOM Support Opera 8 has full support of XML DOM 2. XSLT Opera 8.5 has no support for XSLT stylesheets, though it’s planned for the forthcoming release of Opera 9. You must apply XSLT stylesheet transformations on the server side if you’re targeting Opera. Viewing Raw XML in Opera Opera ignores the XML tags within a document and displays only the content from the elements, in accordance with the recommendation. 
Figure 4-9 shows how the XML document, dvd.xml, displays in Opera. Opera treats all elements as inline and renders all text in the same font.
Slide 136: CHAPTER 4 ■ CLIENT-SIDE XML 115 Figure 4-9. Raw XML content displayed in Opera 8.5 You can see the content within the XML document by choosing View ➤ Source. Determining XML Content Opera uses the content type followed by the file extension to determine whether a file contains XML content. In addition, Opera looks at the first line of the file for an XML declaration. Opera also checks whether an XML document is well formed. As with the other browsers, Opera generates a parser error if it loads a document that is not well formed. However, unlike the other browsers, Opera displays the part of the XML file that it successfully parsed prior to reaching the error. Using Proprietary Functionality in Opera Opera doesn’t offer much in terms of proprietary XML tools. However, it offers native support for some XML vocabularies: native SVG 1.1 Tiny and native WML. Native SVG 1.1 Tiny Support Opera has native support for SVG 1.1 Tiny, a subset of the SVG recommendation suitable for cell phones. This means that Opera natively supports SVG opacity, font handling, and animation. Native WML Support WML is a vocabulary of XML used to mark up documents for display in mobile phone-based browsers. Opera supports most of WML 1.3 and WML 2.0, and Opera is the only major browser to offer support of WML natively. Adobe (Formerly Macromedia) Flash Flash provides another option for the display and manipulation of XML content. Since version 5, Flash has been able to parse XML documents into a tree. Flash uses an internal XML class that is similar to, but not fully compliant with, the W3C DOM. One advantage of Flash movies is that they can display in a web browser or within standalone applications. You can find out more about Flash and XML in Chapter 10. The Le@rning Federation project provides a good example of using XML with Flash. This project is an initiative of the governments of Australia, the Australian states, and New Zealand. 
You can find out more about the project at http://www.thelearningfederation.edu.au/.
Slide 137: 116 CHAPTER 4 ■ CLIENT-SIDE XML The aim of the project is to provide online content for students and teachers through learning objects. A high proportion of the learning objects available use Flash and XML for portability and platform independence. You can find examples of learning objects at http:// www.thelearningfederation.edu.au/tlf2/showMe.asp?nodeID=242#groups. Figure 4-10 shows one learning object. Figure 4-10. A Flash movie displaying XML content Now that I’ve covered the range of client-side options available for working with XML data, let’s examine when client-side processing is appropriate. Choosing Between Client and Server It’s important to decide whether an XML application should use client-side XML, server-side XML, or some combination of the two types of processing. So far, you’ve seen several clients that can work with XML content. In Chapters 5 to 10, you’ll look at client-side communication in more detail. Chapters 11 to 13 will examine server-side applications. In this section, I’ll cover different approaches for client-side and server-side interactions in XML applications.
Slide 138: CHAPTER 4 ■ CLIENT-SIDE XML 117 Using Client-Side XML At the beginning of this chapter, you saw that the main benefits of working with XML on the client were a reduction of traffic between server and client, and a reduction in server-side load. Let’s examine these concepts more closely with an example. Suppose you need to display a list of properties that are for sale on a web site. Using XHTML and server-side processing, you could • Load a list of the property addresses and allow users to drill down to view the details of each property on a separate page • List all details of every property in a list on a single page The second approach isn’t practical. If you need to display a large number of properties, the page will be very long and will take a long time to download. You will also have a hard time locating information. In the first approach, viewing the details of a new property requests information from the server, which reloads the interface to display those details. Even if you need only a small amount of information, you’ll still need to refresh the page and load additional content from the server each time. Separating the content from the interface saves server traffic and download times each time you want to view another property. One solution is to use XML on the client side. The server downloads the interface once, when you first load the page. Each time you request further property details, you can download the new content to the client, transform and style the XML into the desired format, and insert the styled content into the cached interface. The only problem with this approach is that the application can only run in a client that has the appropriate level of XML support. If the content is served within a web browser, you need to be careful, because the level of support differs greatly between the major players. For example, Opera versions 8 and below don’t support XSLT. 
Using Server-Side XML One solution might be to process the XML on the server instead. Using server-side processing can avoid any of the specific browser issues. However, as discussed, this means users place more load on the server with more frequent trips to request information. Unless you’re dealing with a particularly data-intensive application, this isn’t likely to overshadow the advantages of the server-side approach. I’ll discuss this in more detail in Chapters 11 to 13, where you’ll see some approaches to using server-side XML. There are three broad approaches to using XML in web browser applications: • Using XML on the server side only and sending XHTML to the web browser • Transforming the XML into XHTML for delivery to the browser • Serving XML to the web browser and manipulating it with client-side scripting I’ll look at each of these approaches in the following sections. I’ll examine Flash as a special case in Chapter 10.
Slide 139: Using XML Within a Dynamic Web Page
In this approach, the application processes XML using a server-side scripting language, such as C#, VB .NET, PHP, or JavaServer Pages (JSP), and presents the end result to the browser as XHTML. The browser can then style the content. Server-side languages that provide DOM or SAX support allow the application to process XML content easily.
Transforming XML into XHTML
The second approach is to generate XML and use XSLT to transform it into XHTML for presentation on the browser. You can apply the XSLT stylesheet transformation on either the server or client, depending on the browser capabilities. If the browser has XSLT support, the transformation occurs there; otherwise, it takes place on the server. Once generated, the application can style the XHTML in the browser using CSS. Figure 4-11 shows the workflow involved in this approach.
Figure 4-11. The process of transforming XML into XHTML
This architecture involves the following steps:
1. Generate XML on the server.
2. Transform the XML content into XHTML on either the server or client.
3. Style the XHTML with CSS.
I’ll explain each step in a little more detail.
Slide 140: CHAPTER 4 ■ CLIENT-SIDE XML 119 Generating XML on the Server The first step is much like building a dynamic web page, except that instead of generating XHTML, the application generates XML. The structure of the XML depends on the data source and the application. Transforming the XML Content into XHTML In the second stage, the application determines where the transformation should take place and transforms the data. The result of the transformation is an XHTML document that contains CSS references. If the client has the capability to transform the data, it should apply the stylesheet at that point to reduce the load on the server. However, this determination must be made on the server, so that you can apply a server-side transformation if necessary. If you’re using XSLT to access a small amount of content from a larger XML document, the overhead of sending the XML to the browser may be more than the time saved in client-side processing. It may make more sense to transform the content on the server and deliver XHTML to the browser. Another alternative is to combine both server-side and client-side transformations. The server-side transform selects the content and delivers XML to the client. The client then performs another transformation to generate the final XHTML. Styling the XHTML with CSS Once the browser receives the XHTML content, it is styled with CSS either through a linked external stylesheet or through embedded or inline CSS declarations. The result is a styled XHTML page. Advantages and Disadvantages Transforming XML into XHTML is a useful approach because it offers the following advantages compared with traditional XHTML-based dynamic web pages: • The application separates the data, layout, and styling of pages quite rigidly. • Separating styling provides more manageability for web applications. This type of architecture can be easily adapted to a server farm environment. 
• The application can target different platforms with the same server-side code. For example, the same content can be presented on web and mobile-phone browsers by applying a different XSLT stylesheet for each device. • The same application can be used for multiple purposes. For example, stylesheets could transform application-specific XML into a format suitable for sharing with business partners. They could then “browse” the transformed XML with a corporate system, allowing both parties to interact without making major changes to either system. Bear in mind that if you apply XSLT transformations on the server side, the server must carry out additional processing. Through this process, you may lose gains arising from reduced server traffic.
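The server-side decision about where to run the transformation can be sketched as follows. The support table reflects only the browser survey earlier in this chapter (IE 6, Firefox 1.5, and Netscape 8 support XSLT; Opera 8.5 doesn't). The browser keys, the function name, and the selectsSmallSubset option are hypothetical, and a real application would also need its own way to identify the client.

```javascript
// XSLT support per browser, as reported in this chapter's survey.
// The keys are made-up identifiers for illustration.
const XSLT_SUPPORT = new Map([
  ["ie6", true],
  ["firefox1.5", true],
  ["netscape8", true],
  ["opera8.5", false],
]);

// Decide on the server where the stylesheet should be applied.
function chooseTransformLocation(browserKey, options = {}) {
  // When the stylesheet extracts a small part of a large document, sending
  // the whole XML file may cost more than it saves, so transform on the server.
  if (options.selectsSmallSubset) {
    return "server";
  }
  // Unknown clients fall back to the server, which is the safe default.
  return XSLT_SUPPORT.get(browserKey) === true ? "client" : "server";
}

console.log(chooseTransformLocation("firefox1.5")); // "client"
console.log(chooseTransformLocation("opera8.5"));   // "server"
```

Defaulting to the server for unrecognized clients trades some server load for the guarantee that every visitor receives usable XHTML.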
Slide 141: You can implement this type of architecture either by building your own framework or by relying on existing tools. Some of the existing tools include
• Apache AxKit: http://www.axkit.org/
• Apache Cocoon Project: http://cocoon.apache.org/
• PolarLake Integration Suite: http://www.polarlake.com/en/html/products/integration/index.shtml
• Visual Net Server: http://www.visualnetserver.com/
In addition, web servers such as Adobe (formerly Macromedia) ColdFusion (http://www.macromedia.com/software/coldfusion/) and Microsoft Internet Information Services (IIS) (http://www.microsoft.com/WindowsServer2003/iis/default.mspx) offer good XML application tools.
Serving XML to Client-Side Code
In this approach, the browser receives the XML content as data embedded within the client-side code. You can use this approach to build dynamic pages that don’t have to make a round-trip to the server for additional processing. The application makes XML data available to client-side code by
• Loading XML into a DOM variable using the browser’s proprietary DOM load method.
• Using the XMLHTTP Request objects in IE, Mozilla, and Opera. This option is the core technology behind an approach called Asynchronous JavaScript and XML (AJAX) that you’ll learn about in Chapter 9.
• Using XML-aware client-side development tools such as Flash.
• Working with XML data islands.
Serving XML directly to the client reduces the number of round-trips to the server. Without XML, the application would have to make a call to the server each time to request new content, which has the potential to slow down the user experience.
Summary
In this chapter, you’ve examined the XML support available in current versions of the major browsers. You’ve seen the different ways that you can process XML in a web browser, including some advanced functionality offered by IE. I’ve also shown you three different approaches to using XML in web applications.
Chapters 5 to 10 examine how to implement the areas that you’ve examined in this chapter. Chapter 5 looks at styling XML documents with CSS, and Chapters 6 and 7 cover XSLT in detail. Chapter 8 looks more closely at scripting in the browser, while Chapter 9 examines one browser scripting approach, called Ajax. In Chapter 10, I’ll introduce you to Flash as an alternative method for working with XML.
Slide 142: CHAPTER 5 Displaying XML Using CSS
You’re probably familiar with Cascading Style Sheets (CSS) and using CSS declarations to style your XHTML pages. As you’ve already seen, stylesheets are very helpful for separating the content of an XHTML page from its presentation. They also allow you to be more efficient in managing web sites, because you can update styles across multiple pages by editing a single stylesheet. In this chapter, you’ll learn about CSS and see how you can use it to style XML documents. I’ll start with an introduction to CSS and show you how it styles XHTML documents. This will help to clarify the terms and roles of CSS and show you what’s possible. You’ll then work through examples that style XML documents with CSS. This process will show you some of the limitations and the special considerations when styling with CSS. I’ll discuss issues such as adding links, including images, adding content before or after elements, and displaying attribute content. All of these areas require special CSS techniques. CSS styling of XML provides some special challenges. With XHTML, a web browser understands the meaning of each of the elements and can display them accordingly. For example, a web browser understands how to render an <a> or <table> tag when it appears in an XHTML page. If the same tag appears in an XML document, there is no intrinsic meaning, so a browser cannot make any assumptions about how to render the element. This chapter will
• Summarize how CSS works with XHTML
• Style XML documents with CSS
• Use CSS selectors with XML
• Discuss the CSS box model and the positioning schemes
• Lay out tabular XML data with CSS
• Link XML documents
• Add images to XML documents
• Add text to XML documents from the stylesheet
• Use attribute values from XML documents
Within the chapter, I’ll mention which browsers support each approach. I tested these examples with Internet Explorer (IE) 6, Netscape 8, Firefox 1.5, Amaya 9.1, and Opera 8.51.
Therefore, when I mention that something isn’t supported in a web browser, I’m referring to
Slide 143: these versions. I’ve also noted support in the Macintosh IE and Safari web browsers where possible. As with the previous chapters, you can download the resources for this chapter from the Source Code area of the Apress web site (http://www.apress.com). Let’s start with a quick recap of CSS.
Introduction to CSS
Since the early days of printing, stylesheets have provided instructions about which font family and size to use when printing a document. You can use CSS to provide styling information for web documents. A CSS stylesheet is effectively a text document saved with the .css extension.
Why CSS?
When you include presentation elements within an XHTML page, the content can easily get lost within the style or presentation rules. The following benefits arise from separating the content from the style and using a stylesheet to indicate how a document can be presented visually:
• A single stylesheet can alter the appearance of multiple pages, meaning that you don’t need to edit each individual page to make changes.
• Different stylesheets offer alternative views of the same content.
• The content is simpler to author and interpret because it doesn’t include presentation information.
• Web pages load more quickly because a stylesheet is downloaded once and cached. You can then reuse it throughout the site. The pages themselves are smaller because they no longer contain styling information.
A CSS document contains style rules that apply to the elements of a target document, indicating how the content of those elements should be rendered in a web browser.
CSS Rules
CSS is based on rules that govern how the content of an element or set of elements should be displayed. You’ll see how to specify which elements to style a little later when I discuss the CSS selectors. Here’s an example of a CSS rule:
h1 {color:#2B57A1;}
The rule is split into two parts: the selector (h1) and the declaration (color:#2B57A1).
The selector shows which element or elements the declaration should apply to, while the declaration determines how the element(s) should be styled. In this example, all <h1> elements have been specified, but selectors can be more sophisticated, as you’ll see later. The declaration has two components: a property and a value, separated by a colon. The property is the visual property that you want to change within the selected element(s).
Slide 144: In this example, I’ve set the color property, which sets the foreground or text color of the heading. The value of the property is #2B57A1, a blue color. The declaration ends with a semicolon.

■ Tip A CSS declaration can consist of several property-value pairs, and each property-value pair within a rule must be separated with a semicolon. If you forget the semicolon, property-value pairs that appear afterwards will be ignored. While you don’t have to add a semicolon at the end of a single declaration, it’s good practice in case you want to add more declarations afterwards.

CSS supports a system of inheritance. Many properties that you set on an element, such as color and font-family, also apply to its child elements unless another rule overrides them. If you set a rule specifying the color for the <body>, child elements such as <p>, <h1>, <h2>, and <h3> inherit that color. The exception here is links, which web browsers usually style with their own default colors. You may have to include a separate rule for the <a> element.

This is one of the reasons for the name cascading stylesheets. The CSS declarations flow down the element tree. Another reason for the name is that you can use rules from several stylesheets by importing one into another or importing multiple stylesheets into the same XHTML file. In addition, the rules apply in a cascading order. In general, an inline declaration overrides a declaration embedded in the <head> section of a page, which overrides an external stylesheet when the selectors are equally specific and the embedded rules come after the link to the stylesheet.

The following example shows a single rule containing multiple declarations.
The rule also lists several selectors, so it applies to several elements at the same time:

h1, h2, h3 {color:#2B57A1; font-family:Verdana, Arial, sans-serif; font-weight:bold;}

Commas separate the element names in the selector:

h1, h2, h3

Here, semicolons separate several declarations for these elements, and all declarations appear between curly braces:

{color:#2B57A1; font-family:Verdana, Arial, sans-serif; font-weight:bold;}

If you want the <h3> element to appear in italics as well, you can add an additional rule:

h3 {font-style:italic;}

By declaring the common properties together, you can avoid repeating all the other property-value pairs when declaring the <h3> element individually. Both rules select <h3> with equal specificity, so where their declarations conflict, the rule that appears later in the stylesheet takes precedence. For example, if you add a font-weight:normal declaration to the later rule for <h3>, it will override the bold declaration in the preceding rule.

You can find a list of CSS2 properties at http://www.w3.org/TR/REC-CSS2/propidx.html. Many web sites explain how these properties are applied within stylesheets.
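The grouping, inheritance, and override behavior described above can be sketched in one short stylesheet. Treat it as an illustration only: the body and a rules, and the colors they use, are assumptions added to show inheritance and are not part of the chapter’s stylesheet.

```css
/* Assumed rule, added for illustration: color and font-family are
   inherited by child elements such as p, h1, h2, and h3. */
body {
  color: #000000;
  font-family: Arial, Helvetica, sans-serif;
}

/* Assumed rule, added for illustration: browsers usually apply their
   own link colors, so links get a separate rule. */
a {
  color: #2B57A1;
}

/* Common declarations shared by all three heading levels. */
h1, h2, h3 {
  color: #2B57A1;
  font-family: Verdana, Arial, sans-serif;
  font-weight: bold;
}

/* Appears later with an equally specific selector, so font-weight: normal
   replaces bold for h3 only. The color and font-family declarations from
   the grouped rule still apply to h3. */
h3 {
  font-style: italic;
  font-weight: normal;
}
```

If the h3 rule were moved above the grouped rule, the bold declaration would win instead, because the two selectors are equally specific and the later declaration takes precedence.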
Slide 145: CSS VERSIONS

At the time of writing, there are two CSS recommendations: CSS1 and CSS2. The CSS2.1 specification is at the working-draft stage. This revision adds requested features and corrects errors in the CSS2 specification. CSS3, also under development, takes a modularized approach to CSS; its modules are at various stages of development. The CSS1 features are mostly supported by IE 6, Netscape 6+, and Opera 6+ on Windows, and by IE 5+, Netscape 6+, and Opera 5+ on Macintosh. Support for CSS2 is patchier, as you’ll see throughout this chapter, even though CSS2 became a World Wide Web Consortium (W3C) recommendation in May 1998.

Styling XHTML Documents with CSS

As you saw in Chapter 3, XHTML is the reformulation of HTML using XML syntax. XHTML version 1.1 is modular, meaning that web-enabled devices can choose to support modules of XHTML, such as the tables or forms module. This makes it easier to create sites for new devices, such as phones and Internet-enabled refrigerators. Because Chapter 3 covered how to construct XHTML, I’ll start here by constructing a CSS stylesheet for an XHTML page. Figure 5-1 shows the page that you’ll create.

Figure 5-1. The XHTML page that you’ll create
Slide 146: Without the stylesheet, the document looks entirely different, as Figure 5-2 shows.

Figure 5-2. The XHTML page without CSS styling

As a precursor to constructing a CSS stylesheet for an XHTML document, you need to remove all styling from that document. What remains should be only content and structural tags. You’ll then use CSS to position the elements instead of relying on tables. The style declarations are stored in an external stylesheet that links to the XHTML document with the <link> element. You could also include the style rules inside the XHTML document using a <style> element within the <head> element, or by adding a style attribute to each element. However, storing the declarations in a single external document makes it easier to maintain and apply the style rules.

The file styledXHTMLpage.htm, which appears below, contains the styled content:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>XHTML Example</title>
  <link rel="Stylesheet" href="styledXHTML.css" type="text/css" media="screen" />
</head>
<body>
  <div class="header">Sample XHTML and CSS Layouts</div>
  <div class="contents">
Slide 147:
    <div class="sideBarHead">Side bar</div>
    <div class="item">Side 1</div>
    <div class="item">Side 2</div>
    <div class="item">Side 3</div>
    <div class="item">Side 4</div>
    <div class="item">Side 5</div>
    <div class="item">Side 6</div>
  </div>
  <div class="navigation">
    <div class="sideBarHead">Navigation</div>
    <div class="item">Link 1</div>
    <div class="item">Link 2</div>
    <div class="item">Link 3</div>
    <div class="item">Link 4</div>
    <div class="item">Link 5</div>
    <div class="item">Link 6</div>
  </div>
  <div class="page">
    <div class="title">Sample Text</div>
    <div class="credit">by Apress</div>
    <table>
      <tr>
        <td rowspan="2">Cell spans<br />two rows</td>
        <td>Cell 1</td>
        <td>Cell 2</td>
        <td>Cell 3</td>
      </tr>
      <tr>
        <td>Cell 1</td>
        <td>Cell 2</td>
        <td>Cell 3</td>
      </tr>
    </table>
    <div class="pullQuote">
      This text is the remnants of a passage from Cicero's de Finibus Bonorum et Malorum, written in 45 BC.
    </div>
    <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exercitation ulliam corper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem veleum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel willum lunombro dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.
    </p>
  </div>
</body>
</html>
Slide 148: The document uses the stylesheet styledXHTML.css, which you can find with the downloaded resources. If you’re not familiar with some of the content in this example, don’t worry. I’ll cover it in depth in the “Layout of XML with CSS” section later in this chapter. I’ll also show you how to choose which elements to style.

The styledXHTML.css stylesheet follows:

body, p, td {
  color: #000000;
  background-color: #FFFFFF;
  font-family: Arial, Helvetica, sans-serif;
}
table, td {
  padding: 10px;
  border-style: solid;
  border-width: 2px;
}
table {background-color: #CCCCCC;}
td {background-color: #FFFFFF;}
p {padding-bottom: 20px;}
.header {
  position: absolute;
  top: 0px;
  bottom: auto;
  left: 0px;
  z-index: 100;
  width: 100%;
  height: 60px;
  padding-top: 10px;
  padding-left: 20px;
  font-size: 26px;
  font-family: Arial, Helvetica, sans-serif;
  color: #FFFFFF;
  background-color: #2B57A1;
}
.contents, .navigation {
  width: 100px;
  height: 500px;
  font-size: 14px;
  font-family: Arial, Helvetica, sans-serif;
  color: #FFFFFF;
  background-color: #7299D9;
  padding: 10px;
}
.contents {
  position: absolute;
  left: 0px;
  top: 60px;
}
Slide 149:
.navigation {
  position: absolute;
  right: 0px;
  top: 60px;
  padding-left: 10px;
}
.sideBarHead {
  font-size: 12px;
  font-weight: bold;
  padding-top: 15px;
  padding-bottom: 10px;
}
.item {
  font-size: 12px;
  padding-left: 10px;
}
.page {
  width: auto;
  background-color: #FFFFFF;
  padding-top: 75px;
  padding-left: 10px;
  padding-right: 10px;
  padding-bottom: 10px;
  margin-left: 120px;
  margin-right: 120px;
}
.title {font-size: 22px;}
.credit {
  font-size: 12px;
  font-style: italic;
  color: #999999;
  padding-bottom: 15px;
}
.pullQuote {
  float: right;
  width: 20%;
  background-color: #FFFFFF;
  font-style: italic;
  border: solid 2px #2B57A1;
  padding: 10px;
  margin: 10px;
}

You can see from the range of declarations that it’s possible to style XHTML elements in many different ways. The stylesheet governs the positioning of elements, padding, borders, fonts, and colors. There are a few things to note before moving on. The example uses CSS positioning instead of tables for the header and sidebars.
Slide 150: Separating the content of the document from the layout rules makes the page easier to edit. You should only use tables for presenting tabular data. The XHTML document includes structural markup, such as a rowspan attribute within a table cell. It also places each block within the document in its own <div> element. A <div> element is a handy container for content within a document.

The most important point from the exercise relates to the role played by the web browser. While XHTML is HTML reformulated in XML syntax, there is a difference between XHTML and other XML vocabularies. A web browser already understands XHTML elements and knows how they should be rendered. For example, when a web browser comes across a <table> element, it understands how to represent the <tr> and <td> tags. It knows that the rowspan attribute indicates how many rows a table cell should span.

Other XML vocabularies don’t offer the same advantages. Unless you’re working with a specialized viewer, a web browser or other processor can’t derive the display meaning attached to each element. You must be a lot more careful when constructing stylesheets for XML documents.

Styling XML Documents with CSS

You’ve looked at a styling example with XHTML, so now let’s see what happens when you display content from an XML vocabulary that is unfamiliar to the web browser. For instance, you might want to display a custom XML document created from a database, or you could be dealing with a vocabulary that is specific to one of your trading partners. The web browser can’t display the content without help. One option is to control the display with CSS. Because XML elements represent content without any attached presentation cues, you must address the following questions:
• How can you control layout without the use of tables?
• How can you link the CSS stylesheet to the XML document?
• How can you present tabular data in XML?
• How do you include links to other documents?
• How can you display images in your XML documents?

As you style the document, some other issues will arise, including:
• The extent to which you can reorder the elements so that they are presented in a different sequence from their order in the original XML document
• Whether you can add content that isn’t in the original XML document, such as headers and other fixed text elements
• The display of attribute content, since many XML files store important data in attributes that you may wish to view

You can see that styling XML documents with CSS raises many issues. Let’s start by attaching a CSS stylesheet to an XML document, so you can see how to render XML in a web browser.
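To make a few of these questions concrete, here is a minimal sketch of a stylesheet for an unfamiliar vocabulary. Everything specific in it is an assumption made for illustration: the element names catalog, item, name, and price, the currency attribute, and the "Product list" heading text do not come from the chapter. It relies on CSS2 generated content (:before, :after, and attr()), for which browser support was uneven, in line with the chapter’s remark that CSS2 support is patchy.

```css
/* Hypothetical vocabulary: <catalog> contains <item> elements, each with
   <name> and <price currency="..."> children. */

/* The browser attaches no display meaning to these elements, so the
   stylesheet states explicitly which ones form blocks and which stay inline. */
catalog {
  display: block;
  font-family: Arial, Helvetica, sans-serif;
}
item {
  display: block;
  padding: 10px;
  border: solid 2px #2B57A1;
}
name {
  display: block;
  font-weight: bold;
}
price {
  display: inline;
}

/* Fixed text that isn't in the XML document, added as generated content. */
catalog:before {
  display: block;
  content: "Product list";
  font-size: 22px;
}

/* Attribute content is not rendered by default; attr() exposes the value
   of the currency attribute after the price text. */
price:after {
  content: " " attr(currency);
}
```

The sketch covers block layout without tables, added fixed text, and the display of attribute values. It doesn’t attempt reordering, links, or images.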

   