Notes in XML

I had some time, so I converted my PERL and TEX notes from a weird XML-esque format that went through a PHP preprocessor to proper, validating XML, which is converted to static XHTML using an XSL stylesheet.

PERL notes: XML source, XHTML output
TEX notes: XML source, XHTML output

The notes DTD and XSL stylesheet. For validation, you also need the XHTML DTD and entity files:

xhtml1-strict.dtd
xhtml-lat1.ent
xhtml-special.ent
xhtml-symbol.ent

For XSL transformation, I use xsltproc from the excellent libxslt. It's blazing fast, written in C, and has minimal dependencies.

To validate the XML:

xmllint --noout --valid perl.xml

To convert it to XHTML:

xsltproc notes.xsl perl.xml > perl.html
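
The two steps above can be chained so that a document is only transformed once it validates. The following is a minimal sketch in Python, not a description of how these notes are actually built: the helper names build_commands and rebuild are hypothetical, as is the convention of naming each notes file by a short stem such as "perl". It uses only the xmllint and xsltproc invocations shown above, plus xsltproc's -o option to name the output file instead of redirecting stdout.

```python
import subprocess


def build_commands(stem, stylesheet="notes.xsl"):
    """Return the validate and transform commands for one notes file.

    `stem` is a hypothetical short name such as "perl", which maps to
    perl.xml (source) and perl.html (output).
    """
    source = f"{stem}.xml"
    output = f"{stem}.html"
    return [
        # Validate against the DTD named in the document's DOCTYPE.
        ["xmllint", "--noout", "--valid", source],
        # Transform to XHTML; -o writes the result to a file, not stdout.
        ["xsltproc", "-o", output, stylesheet, source],
    ]


def rebuild(stem):
    """Run validation, then transformation, stopping at the first failure."""
    for command in build_commands(stem):
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a document that fails validation is never transformed.
        subprocess.run(command, check=True)
```

Calling rebuild("perl") would then run both commands in order, assuming xmllint and xsltproc are on the PATH and perl.xml and notes.xsl are in the current directory.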

All in all, the whole XML thing is pretty cool. XSLT is a little weird (a functional language described through XML), but fun, at least until you try to do something non-trivial with it.

XML Glossary

("Things I learned")

XML
(eXtensible Markup Language) XML is a restricted subset of SGML. It defines a way to create human-readable, human-editable, machine-parsable data files.
DTD
(Document Type Definition) Describes the format of XML content, such as what children and attributes an element can have. DTDs can be used to validate XML data.
XSL
(eXtensible Stylesheet Language) Describes how to convert XML data into other, possibly XML, data. For example: converting a description of CDs in a custom XML document type to a playlist in XHTML. Stylesheets written in XSL are valid XML.
XPath
A language used within XSL which provides functions (like string manipulation), boolean logic and comparisons, and paths to elements in XML data. XPath is a separate W3C specification, and its expressions are not written in XML.
element
An element is a name in angle brackets. It has a starting tag, an ending tag, attributes (optional) and children (optional) such as text, comments and other elements. For example: <p>...</p>. In XML, all elements must have a closing tag. This is one of the differences between XML and SGML. Empty elements can be opened and closed in the same tag. For example: <br/>
attribute
A parameter of an element, given in the starting tag. For example: <p class="...">
entity
A name that expands to a special character or symbol. For example: &amp;
SAX
(Simple API for XML) Loading an XML file by generating events according to what is being parsed. This can be used to process an XML file in a single pass very quickly and using very little memory, assuming the structure of the data makes it possible.
DOM
(Document Object Model) Loading an XML file into memory in the form of a tree structure. This allows for much more complex transformations than are possible with SAX, at the expense of having to load all the data into memory at the same time.
XHTML
HTML with restrictions that make it valid XML. Mainly structural things like correct nesting of elements (e.g. <b><i></b></i> is illegal) and closing of all tags (like <br/>, <hr/>, and <img/>), although a few religious choices snuck in there as well (like all <img> elements having a mandatory alt="..." attribute).*

[*] Actually, the alt="" attribute in <img> and <area> elements was made mandatory in the jump from HTML 3.2 to HTML 4.0. Reference: HTML 4.0 Appendix A.1.8. Thanks to Dylan for pointing out it wasn't XHTML that standardized this.
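
The SAX and DOM entries in the glossary can be compared side by side in code. The sketch below uses Python's standard xml.sax and xml.dom.minidom modules on a tiny CD collection; the document, its element and attribute names (cds, cd, genre, title) and the helper names titles_sax and titles_dom are all invented for illustration. Both functions return the titles of CDs of one genre: the SAX version reacts to events and keeps only a little state, while the DOM version loads the whole tree first and then walks it. The &amp; entity in the sample arrives already expanded in both cases.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical CD collection, invented for this example.
CDS = b"""<?xml version="1.0"?>
<cds>
  <cd genre="jazz"><title>Blue Notes</title></cd>
  <cd genre="rock"><title>Salt &amp; Pepper</title></cd>
  <cd genre="jazz"><title>Late Set</title></cd>
</cds>
"""


class TitleHandler(xml.sax.ContentHandler):
    """Collect the titles of CDs of one genre as parse events arrive."""

    def __init__(self, genre):
        super().__init__()
        self.genre = genre
        self.titles = []
        self._wanted = False  # inside a <cd> of the requested genre?
        self._text = None     # text chunks of the current <title>, if any

    def startElement(self, name, attrs):
        if name == "cd":
            self._wanted = attrs.get("genre") == self.genre
        elif name == "title" and self._wanted:
            self._text = []

    def characters(self, content):
        # Text may be delivered in several chunks, so accumulate it.
        if self._text is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "title" and self._text is not None:
            self.titles.append("".join(self._text))
            self._text = None
        elif name == "cd":
            self._wanted = False


def titles_sax(data, genre):
    """Single pass over the data; only the matching titles are kept."""
    handler = TitleHandler(genre)
    xml.sax.parseString(data, handler)
    return handler.titles


def titles_dom(data, genre):
    """Build the whole tree in memory, then query it."""
    document = minidom.parseString(data)
    titles = []
    for cd in document.getElementsByTagName("cd"):
        if cd.getAttribute("genre") == genre:
            title = cd.getElementsByTagName("title")[0]
            # Join the text node children of the <title> element.
            titles.append("".join(
                node.data for node in title.childNodes
                if node.nodeType == node.TEXT_NODE))
    return titles
```

For a file this small the two approaches are interchangeable. The difference only matters when the data is too large to hold in memory, or when a transformation needs to look backwards and forwards in the tree.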


copyright © 2004, 2005 Emil Mikulic
$Date: 2006/04/22 07:07:12 $