Notes in XML
I had some time, so I converted my PERL and TEX notes from a weird
XML-esque format that went through a PHP preprocessor to proper, validating
XML, converted to static XHTML using an XSL stylesheet.
PERL notes:
XML source,
XHTML output
TEX notes:
XML source,
XHTML output
The notes DTD and XSL stylesheet. For validation, you also
need the XHTML DTD and entity files:
xhtml1-strict.dtd
xhtml-lat1.ent
xhtml-special.ent
xhtml-symbol.ent
For XSL transformation, I use xsltproc from the excellent
libxslt. It's blazing
fast, written in C, and has minimal dependencies.
To validate the XML:
xmllint --noout --valid perl.xml
To convert it to XHTML:
xsltproc notes.xsl perl.xml > perl.html
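If you have more than one set of notes, the two steps above can be wrapped in a small script. This is only a sketch in Python, not what I actually use: the function names are made up for illustration, and it assumes xmllint and xsltproc are on your PATH and that each set of notes lives in a file called <name>.xml next to notes.xsl.

```python
import subprocess


def build_commands(name, stylesheet="notes.xsl"):
    """Return the validate and transform commands for <name>.xml."""
    source = f"{name}.xml"
    validate = ["xmllint", "--noout", "--valid", source]
    transform = ["xsltproc", stylesheet, source]
    return validate, transform


def convert(name, stylesheet="notes.xsl"):
    """Validate <name>.xml, then write the XHTML to <name>.html."""
    validate, transform = build_commands(name, stylesheet)
    # check=True raises CalledProcessError on a non-zero exit status,
    # so an invalid document never reaches the transform step.
    subprocess.run(validate, check=True)
    # Capture the output first and only then open the target file,
    # so a failed transform does not leave a truncated .html behind.
    result = subprocess.run(transform, check=True, capture_output=True)
    with open(f"{name}.html", "wb") as out:
        out.write(result.stdout)
```

Calling convert("perl") would then run the same two commands shown above and write perl.html.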
All in all, the whole XML thing is pretty cool. XSLT is a little weird
(a functional language described through XML), but fun, at least until
you try to do something non-trivial with it.
XML Glossary
("Things I learned")
- XML
-
(eXtensible Markup Language) XML is a restricted subset of SGML. It
defines a way to create human-readable, human-editable,
machine-parsable data files.
- DTD
-
(Document Type Definition) Describes the format of XML content, such
as what children and attributes an element can have. DTDs can be used
to validate XML data.
- XSL
-
(eXtensible Stylesheet Language) Describes how to convert XML data
into other, possibly XML, data. For example: converting a description
of CDs in a custom XML document type to a playlist in XHTML.
Stylesheets written in XSL are themselves well-formed XML.
- XPath
-
An expression language used by XSL which provides functions (like
string manipulation), boolean logic and comparisons, and paths to
elements in XML data. XPath expressions are not written in XML syntax.
- element
-
An element is a name in angle brackets. It has a starting tag, an
ending tag, attributes (optional) and children (optional) such as
text, comments and other elements. For
example: <p>...</p>. In XML, all elements must
have a closing tag. This is one of the differences between XML and
SGML. Empty elements can be opened and closed in the same tag. For
example: <br/>
- attribute
-
A parameter of an element, given in the starting tag. For example:
<p class="...">
- entity
-
A name that expands to a special character or symbol. For example:
&amp;
- SAX
-
(Simple API for XML) Reading an XML file as a stream and generating
events (such as the start or end of an element) as each piece is
parsed. This can be used to process an XML file in a single pass very
quickly and using very little memory, assuming the structure of the
data makes it possible.
- DOM
-
(Document Object Model) Loading an XML file into memory in the
form of a tree structure. This allows for much more complex
transformations than are possible with SAX, at the expense of having
to load all the data into memory at the same time.
- XHTML
-
HTML with restrictions that make it valid XML. Mainly structural
things like correct nesting of elements (e.g.
<b><i></b></i> is illegal) and
closing of all tags (like <br/>, <hr/>,
and <img/>) although a few religious choices snuck in
there as well (like all <img> elements having a mandatory
alt="..." attribute)*
[*]
Actually, the alt="" attribute in <img> and
<area> elements was made mandatory in the jump from HTML
3.2 to HTML 4.0. Reference: HTML
4.0 Appendix A.1.8. Thanks to Dylan for pointing out it wasn't
XHTML that standardized this.
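To make the SAX and DOM entries in the glossary a bit more concrete, here is a rough sketch using the parsers from Python's standard library (xml.sax and xml.dom.minidom). The sample document and the TopicCollector class are invented for illustration and have nothing to do with the real notes DTD. It also shows an attribute, an empty element and the &amp; entity in action.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample data, not taken from the actual notes.
SAMPLE = b"""<notes>
  <note topic="perl">chomp &amp; chop</note>
  <note topic="tex">boxes and glue</note>
  <br/>
</notes>"""


class TopicCollector(xml.sax.ContentHandler):
    """SAX: reacts to events as the parser streams through the input."""

    def __init__(self):
        super().__init__()
        self.topics = []
        self.text = []

    def startElement(self, name, attrs):
        # Called once per starting tag; attrs holds its attributes.
        if name == "note":
            self.topics.append(attrs.get("topic"))

    def characters(self, content):
        # Text may arrive in several chunks; entities arrive expanded.
        self.text.append(content)


handler = TopicCollector()
xml.sax.parseString(SAMPLE, handler)
sax_topics = handler.topics
sax_text = "".join(handler.text)

# DOM: the whole document is loaded into a tree that can be
# walked in any order, as often as needed.
doc = minidom.parseString(SAMPLE)
notes = doc.getElementsByTagName("note")
dom_topics = [n.getAttribute("topic") for n in notes]
first_text = "".join(
    child.data for child in notes[0].childNodes
    if child.nodeType == child.TEXT_NODE
)
last_topic = notes[-1].getAttribute("topic")  # random access into the tree
br_is_empty = not doc.getElementsByTagName("br")[0].hasChildNodes()
```

The SAX half never holds more than the current event in memory, which is why it suits single-pass jobs. The DOM half keeps everything around, so jumping straight to the last note (or reordering the notes) is easy.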
copyright © 2004, 2005 Emil Mikulic
$Date: 2006/04/22 07:07:12 $