Slide 1: XML in a Nutshell
Slide 2: Outline
• XML Basics
• Displaying XML with CSS
• Transforming XML with XSLT
• Serving XML to Web Users
• Resources
• Tips & Advice
Slide 3: Documents
• XML is expressed as “documents”, whether an entire book or a database record
• Must haves:
– At least one element
– Only one “root” element
• Should haves:
– An XML declaration; e.g., <?xml version="1.0"?>
– Namespace declarations
• Can haves:
– One or more properly nested elements
Slide 4: Elements
• Must have a name; e.g., <title>
• Names must follow rules: no spaces; letters, digits, hyphens, underscores, and periods are allowed; must start with a letter or underscore; are case sensitive
• Must have a beginning and end; <title></title> or <title/>
• May wrap text data; e.g., <title>Hamlet</title>
• May have attributes, whose values must be quoted; e.g., <title level="main">Hamlet</title>
• May contain other “child” elements; e.g., <title level="main">Hamlet <subtitle>Prince of Denmark</subtitle></title>
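As a rough illustration of the element rules above, this Python sketch parses the slide's <title> example with the standard library's xml.etree.ElementTree and reads back its name, attribute, text, and child. The variable names are illustrative and not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The slide's example element, written with straight quotes as XML requires.
xml_text = '<title level="main">Hamlet <subtitle>Prince of Denmark</subtitle></title>'

title = ET.fromstring(xml_text)

name = title.tag                      # the element's name
level = title.get("level")            # the quoted attribute value
text = title.text                     # text that appears before the first child
children = [child.tag for child in title]
subtitle_text = title.find("subtitle").text

print(name, level, repr(text), children, subtitle_text)
```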
Slide 5: Element Relationships
• Every XML document must have only one “root” element
• All other elements must be contained within the root
• An element contained within another element is called a “child” of the container element
• An element that contains another element is called the “parent” of the contained element
• Two elements that share the same parent are called “siblings”
Slide 6: The Tree
<?xml version="1.0"?>
<book>                                        <!-- Root element -->
  <author>                                    <!-- Parent of <lastname> -->
    <lastname>Tennant</lastname>              <!-- Child of <author> -->
    <firstname>Roy</firstname>
  </author>
  <title>The Great American Novel</title>
  <chapter number="1">
    <chaptitle>It Was Dark and Stormy</chaptitle>
    <p>It was a dark and stormy night.</p>    <!-- Siblings -->
    <p>An owl hooted.</p>
  </chapter>
</book>
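The tree on this slide can be walked programmatically. The sketch below is one possible way to do it with Python's standard xml.etree.ElementTree; the outline helper is a hypothetical name introduced only for illustration. It prints each element indented by its depth, then lists the children of <author>, which are siblings of one another.

```python
import xml.etree.ElementTree as ET

BOOK = """<?xml version="1.0"?>
<book>
  <author>
    <lastname>Tennant</lastname>
    <firstname>Roy</firstname>
  </author>
  <title>The Great American Novel</title>
  <chapter number="1">
    <chaptitle>It Was Dark and Stormy</chaptitle>
    <p>It was a dark and stormy night.</p>
    <p>An owl hooted.</p>
  </chapter>
</book>"""

def outline(element, depth=0):
    """Return (depth, tag) pairs for the element and all of its descendants."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows

root = ET.fromstring(BOOK)
for depth, tag in outline(root):
    print("  " * depth + tag)

# Siblings are simply the children of the same parent.
author = root.find("author")
siblings = [child.tag for child in author]
print(siblings)
```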
Slide 7: Comments & Processing Instructions
• You can embed comments in your XML just like in HTML:
<!-- Whatever is here (whether text or markup) will be ignored on processing -->
• A processing instruction passes information to the application that processes the XML document:
<?xml-stylesheet type="text/css" href="style2.css"?>
Slide 8: Well-Formed XML
• Follows general tagging rules:
– All tags begin and end
• But can be minimized if empty: <br/> instead of <br></br>
– All tags are case sensitive
– All tags must be properly nested:
• <author> <firstname>Mark</firstname> <lastname>Twain</lastname> </author>
– All attribute values are quoted:
• <subject scheme="LCSH">Music</subject>
• Has identification & declaration tags
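One way to see these rules in action is to hand small samples to a parser and note which ones it rejects. The following is a minimal sketch using Python's standard xml.etree.ElementTree; is_well_formed and the sample strings are illustrative names and data, not part of the original slides.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

samples = {
    "properly nested": "<author><firstname>Mark</firstname><lastname>Twain</lastname></author>",
    "minimized empty tag": "<p>line one<br/>line two</p>",
    "overlapping tags": "<author><firstname>Mark</author></firstname>",
    "case mismatch": "<Title>Hamlet</title>",
    "unquoted attribute": "<subject scheme=LCSH>Music</subject>",
    "unclosed tag": "<p>An owl hooted.",
}

for label, text in samples.items():
    print(label, is_well_formed(text))
```

Only the first two samples pass; each of the others breaks one of the tagging rules listed on this slide.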
Slide 9: Valid XML
• Uses only specific tags and rules as codified by one of:
– A document type definition (DTD)
– A schema definition
• Only the tags listed by the schema or DTD can be used
• Software can take a DTD or schema and verify that a document adheres to the rules
• Editing software can prevent an author from using anything except allowed tags
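Python's standard library parsers do not validate against a DTD or schema, so the sketch below is only a simplified stand-in for the idea that "only the listed tags can be used". It checks element names against a hypothetical allowed vocabulary (ALLOWED_TAGS and unexpected_tags are made-up names). Real validation also checks element order, nesting, and attributes, which this does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary standing in for the element list of a DTD or schema.
ALLOWED_TAGS = {"book", "author", "lastname", "firstname", "title"}

def unexpected_tags(xml_text, allowed=ALLOWED_TAGS):
    """Return the sorted element names that are not in the allowed vocabulary."""
    root = ET.fromstring(xml_text)
    found = {element.tag for element in root.iter()}
    return sorted(found - set(allowed))

print(unexpected_tags("<book><title>Hamlet</title></book>"))
print(unexpected_tags("<book><title>Hamlet</title><blink>no</blink></book>"))
```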
Slide 10: Namespaces
• A method to keep metadata elements from different schemas from colliding
• Example: the tag <name> may have a very different meaning in different standards
• A namespace declaration specifies from which specification a set of tags is drawn
<mets xmlns="http://www.loc.gov/METS/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd">
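To make the <name> collision concrete, here is a small sketch that puts two different <name> elements in one document and tells them apart by namespace. The example.org namespace URIs and the sample record are hypothetical. The code uses Python's standard xml.etree.ElementTree, which reports each element as {namespace-uri}localname.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies that both define a <name> element.
DOC = """<record xmlns:person="http://example.org/person"
        xmlns:product="http://example.org/product">
  <person:name>Roy Tennant</person:name>
  <product:name>The Great American Novel</product:name>
</record>"""

root = ET.fromstring(DOC)

# ElementTree expands each prefix to its namespace URI: {uri}localname.
tags = [child.tag for child in root]

# Lookups name the namespace explicitly, so the two <name> elements never collide.
ns = {
    "person": "http://example.org/person",
    "product": "http://example.org/product",
}
person_name = root.find("person:name", ns).text
product_name = root.find("product:name", ns).text

print(tags)
print(person_name, "|", product_name)
```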
Slide 11: Character Encoding
• XML is Unicode; parsers assume UTF-8 or UTF-16 unless the XML declaration names another encoding
• However, you can output XML into other character encodings (e.g., ISO-8859-1, also known as Latin-1)
• Use <![CDATA[ ]]> to wrap any special characters you don’t want to be treated as markup (e.g., < and &)
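The sketch below illustrates both points with Python's standard xml.etree.ElementTree: a CDATA section comes back from the parser as plain text, and a document can be written out in a non-Unicode encoding. The element names and sample text are made up for the example.

```python
import xml.etree.ElementTree as ET

# CDATA: the parser hands back the wrapped characters as plain text.
cdata_doc = "<formula><![CDATA[if a < b && b < c]]></formula>"
cdata_text = ET.fromstring(cdata_doc).text

# The same text written with entity references instead of CDATA.
escaped_doc = "<formula>if a &lt; b &amp;&amp; b &lt; c</formula>"
escaped_text = ET.fromstring(escaped_doc).text

# Output in another encoding: the XML declaration names the encoding,
# and the accented letter is stored as a single Latin-1 byte.
word = ET.Element("word")
word.text = "café"
latin1_bytes = ET.tostring(word, encoding="iso-8859-1")
round_trip = ET.fromstring(latin1_bytes).text

print(cdata_text == escaped_text)
print(latin1_bytes)
```

CDATA and entity references are two spellings of the same content; the application receives identical text either way.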
Slide 12: Displaying XML: CSS
• A modern web browser (e.g., MSIE, Mozilla) and a cascading style sheet (CSS) may be used to view XML as if it were HTML
• A style must be defined for every XML tag, or the browser displays it in a default mode
• All display characteristics of each element must be explicitly defined
• Elements are displayed in the order they are encountered in the XML
• No reordering of elements or other processing is possible
Slide 13: Displaying XML with CSS
• Must put a processing instruction at the top of your XML file (but below the XML declaration):
<?xml-stylesheet type="text/css" href="style.css"?>
• Must specify all display characteristics of all tags, or they will be displayed in default mode (whatever the browser wants)
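If the XML is generated by a program rather than typed by hand, the stylesheet processing instruction can be added in code. This is a minimal sketch using Python's standard xml.dom.minidom; the sample <book> document is hypothetical, and style.css is the file name from the slide.

```python
from xml.dom import minidom

doc = minidom.parseString("<book><title>The Great American Novel</title></book>")

# Build the xml-stylesheet processing instruction shown on the slide.
pi = doc.createProcessingInstruction(
    "xml-stylesheet", 'type="text/css" href="style.css"'
)

# Insert it before the root element, so it follows the XML declaration.
doc.insertBefore(pi, doc.documentElement)

output = doc.toxml()
print(output)
```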
Slide 14: CSS Demonstration
[Diagram: XML Doc, Cascading Stylesheet (CSS), Web Server]
Slide 15: Transforming XML: XSLT
• Extensible Stylesheet Language Transformations (XSLT)
• A markup language and programming syntax for processing XML
• Is most often used to:
– Transform XML to HTML for delivery to standard web clients
– Transform XML from one set of XML tags to another
– Transform XML into another syntax/system
Slide 16: XSLT Primer
• XSLT is based on the process of matching templates to nodes of the XML tree
• Working down from the top, XSLT tries to match segments of code to:
– The root element
– Any child node
– And on down through the document
• You can specify different processing for each element if you wish
Slide 17: XSLT Processing Model
[Diagram: XML Doc → XML Parser → Source Tree → Transformation (driven by the XSLT Stylesheet) → Result Tree → Formatting → Formatted Output]
From Professional XSL, Wrox Publishers
Slide 18: Nodes and XPath
• An XML document is a collection of nodes that can be identified, selected, and acted upon using an XPath statement
• Examples of nodes: root, element, attribute, text
• Sample statement:
//article[@name='test'] = Select all <article> elements anywhere in the document that have a name attribute with the value 'test'
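The sample statement can be tried out with Python's standard xml.etree.ElementTree, which supports a subset of XPath. The sketch below uses a hypothetical <journal> document; only the XPath expression itself comes from the slide. ElementTree paths are relative to an element, so the expression is written with a leading dot.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; only the XPath expression comes from the slide.
DOC = """<journal>
  <article name="test"><title>First</title></article>
  <article name="other"><title>Second</title></article>
  <section>
    <article name="test"><title>Third</title></article>
  </section>
</journal>"""

root = ET.fromstring(DOC)

# Match <article> elements at any depth whose name attribute equals 'test'.
matches = root.findall(".//article[@name='test']")
titles = [article.find("title").text for article in matches]

print(titles)
```

The nested article inside <section> is selected as well, because // matches at any depth rather than only directly under the root.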
Slide 19: Templates
• An XSLT stylesheet is a collection of templates that act against specified nodes in the XML source tree
• For example, this template will be executed when a <para> element is encountered:
<xsl:template match="para">
  <p><xsl:value-of select="."/></p>
</xsl:template>
Slide 20: Calling Templates
• A template can call other templates
• By default (tree processing):
<xsl:apply-templates/>
[processes all children of the current node]
• Explicitly:
<xsl:apply-templates select="title"/>
[processes all <title> children of the current node]
<xsl:call-template name="title"/>
[processes the named template, regardless of the source tree]
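The template-matching model can be mimicked in a few lines of ordinary code. What follows is a Python analogy, not XSLT: Python's standard library has no XSLT processor, so a dictionary of functions stands in for the stylesheet. All function names here are hypothetical, and the sketch ignores most of what a real processor does (match priorities, modes, attributes, and so on).

```python
import xml.etree.ElementTree as ET
from html import escape

def apply_templates(node, templates):
    """Rough analogue of <xsl:apply-templates/>: handle text and children in order."""
    parts = [escape(node.text or "")]
    for child in node:
        template = templates.get(child.tag, descend)
        parts.append(template(child, templates))
        parts.append(escape(child.tail or ""))
    return "".join(parts)

def descend(node, templates):
    # Stand-in for XSLT's built-in rule: emit no markup, keep processing children.
    return apply_templates(node, templates)

def para(node, templates):
    # Like <xsl:template match="para"> with <xsl:value-of select="."/>:
    # output the element's full text content inside <p>.
    return "<p>" + escape("".join(node.itertext())) + "</p>"

def title(node, templates):
    # A template that hands its children back to the matcher.
    return "<h1>" + apply_templates(node, templates) + "</h1>"

TEMPLATES = {"para": para, "title": title}

def transform(xml_text, templates=TEMPLATES):
    """Parse the source, then start matching at the root element."""
    root = ET.fromstring(xml_text)
    return templates.get(root.tag, descend)(root, templates)

result = transform(
    "<doc><title>Hamlet</title><para>To be, or <em>not</em> to be.</para></doc>"
)
print(result)
```

Elements with no entry in the dictionary (here <doc>) fall through to descend, which is why their children are still processed even though no template names them.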
Slide 21: XSLT Structures
• Decision:
– Choose: when you want an “otherwise” (default) condition
– If: when you don’t need a default condition
• Looping:
– For-each: processes each selected node in turn
Slide 22: XSLT Primer: Doing HTML
• Typical way to begin:
<xsl:template match="/">
  <html>
    <head>
      <title><xsl:value-of select="title"/></title>
      <link type="text/css" rel="stylesheet" href="xslt.css" />
    </head>
    <body>
      <xsl:apply-templates/>
    </body>
  </html>
</xsl:template>
• Then, templates for each element appear below
Slide 23: XSLT Demonstration
[Diagram: XML Doc, XSLT Stylesheet, XML Processor (xsltproc), CGI script, XHTML representation, Cascading Stylesheet (CSS), Web Server]
Slide 24: XML vs. Databases
(a simplistic formula)
• If your information is…
– Tightly structured
– Fixed field length
– Massive numbers of individual items
• You need a database
• If your information is…
– Loosely structured
– Variable field length
– Massive record size
• You need XML
Slide 25: Serving XML to Web Users
• Basic requirements: an XML doc and a web server
• Additional requirements for simple method:
– A CSS Stylesheet
• Additional requirements for complex, powerful method:
– An XSLT stylesheet
– An XML parser
– XML web publishing software or an in-house CGI or Java program to join the pieces
– A CSS stylesheet (optional) to control how it looks in a browser
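The "simple method" (an XML doc, a CSS stylesheet, and a web server) can be sketched in a few lines. The example below is an assumption-laden illustration, not the setup used in the demonstrations: it uses a Python WSGI application in place of the CGI script mentioned in the slides, and the paths, sample document, and CSS rules are all hypothetical.

```python
# Hypothetical sample document; a real deployment would read it from disk.
XML_DOC = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<?xml-stylesheet type="text/css" href="style.css"?>\n'
    "<book><title>The Great American Novel</title></book>\n"
)

# Hypothetical stylesheet: every element needs explicit display rules.
CSS = "book { display: block; } title { display: block; font-weight: bold; }\n"

ROUTES = {
    "/book.xml": ("application/xml; charset=utf-8", XML_DOC),
    "/style.css": ("text/css; charset=utf-8", CSS),
}

def app(environ, start_response):
    """Minimal WSGI application: serve the XML document and its stylesheet."""
    route = ROUTES.get(environ.get("PATH_INFO", ""))
    if route is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found\n"]
    content_type, text = route
    body = text.encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", content_type), ("Content-Length", str(len(body)))],
    )
    return [body]

# To try it locally:
#   from wsgiref.simple_server import make_server
#   make_server("localhost", 8000, app).serve_forever()
```

Sending the document with an XML content type is what lets the browser apply the stylesheet named in the processing instruction.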
Slide 26: XML Web Publishing Software
• Software used to add XML serving capability to a web server
• Makes it easy to join XML documents with XSLT to output HTML for standard web browsers
• A couple of examples, both free…
Slide 27: Requires a Java servlet container such as Tomcat (free) or Resin (commercial)
Slide 28: Requires mod_perl
Slide 29: http://texts.cdlib.org/escholarship/
Slide 30: XML & XSLT Resources
• Eric Morgan’s “Getting Started with XML” is a good place to begin
• Many good web sites exist, and Google searches can often answer specific questions you may have
• Be sure to join the XML4Lib discussion
Slide 31: Tips and Advice
• Begin transitioning to XML now:
– XHTML and CSS for web files, XML for static documents with long-term worth
– Get your hands dirty on a simple XML project
• Do not rely on browser support of XML
• DTDs? We don’t need no stinkin’ DTDs!
• Buy my book! (just kidding…)
Slide 32: Contact Information
Roy Tennant
California Digital Library
roy.tennant@ucop.edu
http://roytennant.com/
510-987-0476