Moving to XML
Author and presenter: Simon Brooke, Technical
Director, Weft Technology
Ltd.
The full text of this presentation is online at
<URL:http://www.weft.co.uk/library/inet2001/xml/>
Changes to the presentation since your handouts
were printed are highlighted like this.
Simon Brooke
Weft Technology Ltd, 21
Main Street, Auchencairn, Scotland.
What we're going to do today
-
What is XML
-
A brief chautauqua on language
-
What is XML
-
A bit about the other bits
-
XML in your context
-
Anatomy of an XML system
-
Specifying
-
Creating
-
Transforming
-
Communicating
How we're going to get there
-
The morning, mostly talking
-
This afternoon, mostly doing it. We have an awful lot to
get in!
-
As I write this, I'm expecting we're going to be able to use
the terminal room in the afternoon, but I don't have
confirmation of this. I'm designing this course to be
hands-on. If it can't be, we may have a tricky afternoon!
Breaks
-
Coffee 10:30 I hope to be up to the
beginning of A bit about the other bits
-
Lunch 12:30 I hope to be up to the
beginning of Worked Example: a meeting arranger system
-
Coffee 3:30 I hope to be up to the end of
Exercise period [ii]
-
We're supposed to end at 5:30.
Nearest WCs, how to get to coffee?
Before we start: what do you know
We've got a lot of ground to cover... Just so I can know which
bits to concentrate on and which bits to skip, what do you
know about...
|      | nothing | a little | can do it | expert |
| HTML |         |          |           |        |
| Java |         |          |           |        |
Before we start: Namespace
-
A context in which there are things with names
-
Each thing in the namespace has a different name
-
You can address a thing in the namespace just by using its
name
-
Examples
- This is a powerful concept, and I'm going to use it a lot -
but it has special meaning in XML. Be clear about when I'm
talking about an 'XML namespace' and when I'm just talking
about a namespace!
[get participants to write their names on bits of paper. Make
sure there are not two in the class with the same name. If
there are, get them to add something to their names to
disambiguate]
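For the programmers in the room, the general idea can be sketched as a map from names to things. This is only an illustration of the plain (non-XML) sense of 'namespace'; the class name NamespaceSketch and the seat descriptions are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A namespace modelled as a map: every name refers to at most one thing. */
public class NamespaceSketch {
    private final Map<String, String> things = new LinkedHashMap<>();

    /** Binds a name to a thing; returns false if the name is already taken. */
    public boolean register(String name, String thing) {
        return things.putIfAbsent(name, thing) == null;
    }

    /** Addresses a thing just by using its name; null if nothing has that name. */
    public String lookup(String name) {
        return things.get(name);
    }

    public static void main(String[] args) {
        NamespaceSketch classroom = new NamespaceSketch();
        System.out.println(classroom.register("Simon", "participant in seat 1"));
        // Same name again: refused, so the second Simon must disambiguate.
        System.out.println(classroom.register("Simon", "participant in seat 7"));
        System.out.println(classroom.register("Simon B.", "participant in seat 7"));
        System.out.println(classroom.lookup("Simon B."));
    }
}
```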
A brief chautauqua on language
Words
We can recognise words as belonging to a language because we
know them... (sometimes, we can recognise words as belonging
to a language even when we don't know them, because they
sound right).
Sentences
-
Colourless green ideas sleep furiously.
-
Development state with join material and.
Language
-
In any given language, we can easily recognise what is a
well formed component of the language.
-
And what is not...
Words
-
Fish
-
Peske
-
Choiremheir
-
Chautauqua
-
Gkprtwcv
-
P7ajo
Although language families have rules about what can be in a
word and what can't, it's much harder to tell whether a word
is valid or not, unless we know which language we're looking
at.
Sentences
-
This is not a pipe.
-
Ceci n'est pas une pipe.
-
Chagco vet nici yan toube.
-
GGGGGG #000007 cabala.
Meta-Language [1]
-
Within a family of languages, we can recognise what is a
well-formed component of some language
-
or might be...
-
and what certainly isn't.
Meta-Language [2]
-
In Indo-European languages,
-
a word has at least one vowel
-
no word has more than four consonants in succession
-
a sentence is a succession of words
-
a sentence starts with a capital letter and ends with a
period
-
There is an (implicit) meta-grammar.
HTML [1]
-
<address>
-
A valid HTML tag (in HTML 4.0 Transitional)
-
<cotton>
HTML [2]
-
HTML is a language (albeit a simple one).
- It's a markup language, and I hope it's one you're all
familiar with.
-
we can know at once whether a tag is a valid HTML tag or
not...
-
and what it means...
-
and how it should be used...
Well Formedness
-
When we know what the language is we can parse ill-formed
forms:
-
because we can predict what the missing bits are
-
and where they should be:
-
I have been there, and I
have done that.
What is XML
-
Key features
-
Differences from HTML
-
Differences from SGML
-
A bit about the other bits
-
Reality check
Key Features
A universal, application-independent framework for the
communication of semantically rich structured information between
software agents.
-
A language for describing other languages
-
Which describe the structure of a document
-
Not the visual appearance (CSS, XSL)
-
Written in plain
Unicode text (a universal character set which supersedes ASCII, usually encoded as UTF-8 or UTF-16)
Differences from HTML
-
A Metalanguage: In a word, extensible.
-
HTML can be (and has been)
reimplemented as an XML dialect
-
Also, strictly parsed.
Extensible: what does this mean for you?
-
Allows you to define new markup.
-
Describing structure, not appearance.
-
Makes it easier for programs to extract
information from your documents.
Extensible: a simple example [1]
<?xml version="1.0"?>
<!DOCTYPE meeting PUBLIC "-//WEFT//DTD MEETING 0.1//EN"
  "meeting.dtd">
<meeting id="June Board Meeting">
  <venue>
    28 Forth Street, Edinburgh
  </venue>
  <invitees>
    <attendee attendance="required"
              meeting-role="convenor">
      <name>
        Simon Brooke
      </name>
      <position>
        Technical Director
      </position>
    </attendee>
    <attendee attendance="required">
      <name>Angela Stormont</name>
      <position>
        Communications Director
      </position>
    </attendee>
  </invitees>
</meeting>
Extensible: a simple example [2]
-
What does this do?
-
For the user directly, very little.
-
For the user's program, it allows it to isolate items of
structured information and handle them in intelligent ways
to help the user.
-
But only if the user's program understands the special
markup you have defined.
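To make 'isolate items of structured information' concrete, here is a minimal sketch of a program that understands the meeting markup well enough to list the attendees who are required. It uses the standard Java DOM parser; the class and method names are invented for this example, and the DOCTYPE line is left out on purpose so that the parser does not go looking for meeting.dtd.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class MeetingReader {
    /** The meeting example from the previous slide, minus its DOCTYPE. */
    public static final String MEETING =
        "<meeting id=\"June Board Meeting\">"
      + "<venue>28 Forth Street, Edinburgh</venue>"
      + "<invitees>"
      + "<attendee attendance=\"required\" meeting-role=\"convenor\">"
      + "<name>Simon Brooke</name><position>Technical Director</position>"
      + "</attendee>"
      + "<attendee attendance=\"required\">"
      + "<name>Angela Stormont</name><position>Communications Director</position>"
      + "</attendee>"
      + "</invitees></meeting>";

    /** Returns the names of all attendees whose attendance is required. */
    public static List<String> requiredAttendees(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> names = new ArrayList<>();
            NodeList attendees = doc.getElementsByTagName("attendee");
            for (int i = 0; i < attendees.getLength(); i++) {
                Element attendee = (Element) attendees.item(i);
                if ("required".equals(attendee.getAttribute("attendance"))) {
                    Element name = (Element) attendee.getElementsByTagName("name").item(0);
                    names.add(name.getTextContent().trim());
                }
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not read meeting document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(requiredAttendees(MEETING));
    }
}
```

The program knows nothing about fonts or layout; it only relies on the element and attribute names, which is exactly why it stops working if the markup is something it doesn't understand.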
Strictly parsed: what does this mean for you? [1]
Documents which are not well-formed will not be handled by an
XML application. At all.
-
Tags and attributes are case-sensitive;
-
End tags cannot be omitted - every <p>
must have a </p>.
-
Tags must be correctly nested:
<b><i>This won't
work</b></i>
-
Empty tags (those which don't enclose any content) must be
marked with a trailing slash like this:
<xx/>
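If you want to see these rules being enforced, a small check like the following will do. It is a sketch using the XML parser that ships with Java; the class name WellFormedCheck and the test strings are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    /** Returns true if the parser accepts the text as well-formed XML. */
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser refused the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<p>A paragraph</p>",               // fine
            "<p>A paragraph",                   // end tag omitted
            "<P>A paragraph</p>",               // case mismatch
            "<b><i>This won't work</b></i>",    // badly nested
            "<xx/>"                             // empty tag, correctly marked
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```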
Strictly parsed: what does this mean for you? [2]
-
Most Web designers are sloppy.
-
More than ninety percent of all commercially authored Web
pages do not conform to any standard and are not valid
HTML.
-
Few if any of the commercially available WYSIWYG tools
generate valid HTML.
-
Web authors switching to XML will need to adopt much more
rigorous technical discipline.
Differences from SGML
-
Like HTML, simpler!
- I used to say 'much simpler', but now I'm not too sure...
-
Like HTML, optimised for delivery over restricted-bandwidth
links.
-
Unlike HTML, a true subset of SGML.
-
All valid XML documents are valid SGML documents.
-
SGML tools (conforming to ISO 8879) will work with XML.
-
Organisations with an existing commitment to SGML will
find the transition to XML much simpler.
A bit about the other bits
- XML is a language for describing other languages
- Most of these are application specific
- Some are very general
- XLink: a
vocabulary for linking between XML documents
- XPath and
XPointer: vocabularies
for describing positions inside XML documents
- XSL-T: a
vocabulary for transforming XML documents
- XML
Schema: a vocabulary for describing vocabularies
- SMIL
(Synchronized Multimedia Integration Language): a
vocabulary for integrating and synchronising multimedia
presentations
- SOAP
(Simple Object Access Protocol): a vocabulary for
exchanging computation requests between heterogeneous
agents in a network.
- All of these key standards are looked after by W3C
A bit about the other bits [ii]: XLink
- In HTML, you need to use a special element (the A or Anchor tag)
to be the start of a link
-
In XML any element can be the start of a link
- Currently, Mozilla/Netscape 6 is the
only 'mainstream' browser which partly supports this
- W3C's Amaya 4
also partly supports it
- Several other demo and prototype implementations
- No mainstream browser fully supports XLink
A bit about the other bits [iii]: XPath and XPointer
- In HTML, you need to use a special element (the A or Anchor
tag) to be the target of a link
- In XML a link can target any element in the target document
- Several demo and prototype implementations
- No mainstream browser fully supports this
- You've heard this before somewhere, haven't you?
A bit about the other bits [iv]: XSL-T
- 'eXtensible Stylesheet Language - Transformations'
-
A language for manipulating document structure
-
Maps any XML dialect into any other (or even to plain text)
-
Declarative, pattern matching language, conceptually like
Prolog
-
Extremely powerful, unquestionably useful.
-
But not really a stylesheet language
Digression: Visual Appearance and Stylesheets [i]
-
XML documents are not necessarily or primarily intended to
be viewed by people, but when they are...
-
The visual appearance of a document should be controlled by
stylesheets.
-
The appearance of this one is.
-
In XML as in HTML you don't have to use stylesheets.
-
If you don't, you will get a plain, simple appearance.
If people are interested, you can open the stylesheet for
this presentation, slideshow.css, in a text editor.
Visual Appearance and Stylesheets [ii]
-
A special stylesheet language, XSL, was conceived to
support the new features of XML.
-
Two parts:
-
XSL-T, the Transformation language
- I've described this above
-
XSL-FO, formatting objects
-
A comprehensive language for describing the fine detail of
document presentation.
-
Produces prolix, semantically impoverished markup.
-
Not supported by any client yet.
- Really a stylesheet language...
-
Of doubtful value.
Visual Appearance and Stylesheets [iii]: Status of XSL
-
XSL-T was adopted
on 16 November 1999 as a W3C recommendation.
-
XSL-FO reached
W3C Candidate Recommendation status on 21 November 2000.
-
Microsoft IE5 implemented a proprietary 'XSL' which
is based on an older draft of XSL; newer IE5s are migrating
towards the standard
Visual Appearance and Stylesheets [iv]: XSL Summary
-
Transformation language of unquestionable merit, greatly
aids separating content from presentation.
-
Designed primarily to transform XML to XSL-FO, but can
transform to any other XML dialect (including XHTML).
-
Recommendation:
-
use XSL-T to map XML into XHTML for presentation to
users, decorate with CSS
-
use XSL-T to map XML to other XML dialects as needed
for communication with other organisations
-
ignore XSL-FO for now, except if
- You need pixel-perfect presentation of your
documents and
- You work in an environment (e.g. an Intranet) where
you control the client.
Visual Appearance and Stylesheets [v]: What about CSS?
-
You can continue to use existing CSS1 and CSS2 stylesheets.
-
Probably.
-
Depending on what individual client vendors decide to
support...
-
all (roughly) support CSS2.
-
This presentation is not about stylesheets.
A bit about the other bits [v]: XML Schemas
- A vocabulary for defining vocabularies or 'dialects'
- A bit late arriving
- Replace DTDs, inherited from SGML
Digression: Dialects of XML
-
What is a DTD?
-
What about Schemas?
-
Do I have to use a DTD or Schema?
-
What DTDs and Schemas are available?
-
Who will write DTDs and Schemas?
What is a Document Type Definition?
-
Essentially, a dictionary for the language you are using.
-
Every Web author has heard of one
-
Every good Web author has seen one
-
Very few Web authors have written one
What about Schemas? (Schemata?)
-
Schemas are a new, alternate way to specify
XML languages
-
Officially adopted by W3C on
2nd May 2001 - so still very new
-
Recommendation: Let someone else take the
grief of getting the bugs out of it - stick with DTDs for now.
More about Schemas [i]: benefits
-
The schema language is itself an XML language, so schemas can
be parsed with standard XML tools
-
You can specify rules for the content of elements and
attributes with much finer granularity than with DTDs
- You can specify that an attribute must be a number
- You can specify minimum and maximum values for an attribute
- You can specify regular expression patterns the
attribute must match
More about Schemas [ii]: examples
- An attribute representing someone's age
- An attribute representing a UK bank sorting code
(e.g. 68-59-13)
- An attribute representing a UK grid reference (e.g. NX7951)
The pattern specification seems to have
changed at some stage in the drafting process. The examples given
in Learning XML
don't work with Daniel Potter's tutorial
applet. Treat all tutorials with care and refer back to
the formal specification!
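To show the kind of checking these three examples call for, here is the same logic written directly in Java rather than in schema syntax (given the warning above, I'd rather not guess at the exact schema notation here). Everything in it is an assumption for illustration: the age limits of 0 to 150, the idea that a grid reference is two capital letters followed by four digits as in NX7951, and the class name FacetSketch.

```java
import java.util.regex.Pattern;

public class FacetSketch {
    // Three pairs of digits separated by hyphens, e.g. 68-59-13.
    private static final Pattern SORT_CODE = Pattern.compile("\\d{2}-\\d{2}-\\d{2}");
    // Two capital letters and four digits, e.g. NX7951 (assumed format).
    private static final Pattern GRID_REF = Pattern.compile("[A-Z]{2}\\d{4}");

    /** A number, with a minimum and a maximum value (limits are assumptions). */
    public static boolean isAge(String value) {
        try {
            int age = Integer.parseInt(value);
            return age >= 0 && age <= 150;
        } catch (NumberFormatException e) {
            return false; // not a number at all
        }
    }

    /** The whole value must match the pattern, not just part of it. */
    public static boolean isSortCode(String value) {
        return SORT_CODE.matcher(value).matches();
    }

    public static boolean isGridReference(String value) {
        return GRID_REF.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAge("42") + " " + isAge("forty-two"));
        System.out.println(isSortCode("68-59-13") + " " + isSortCode("685913"));
        System.out.println(isGridReference("NX7951") + " " + isGridReference("nx7951"));
    }
}
```

With a DTD you would have to write this sort of check in every application; the point of the schema facets is that the rule lives with the vocabulary instead.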
More about Schemas [iii]: conversion
- A Schema holds a superset of the information in a DTD
- You can convert a DTD to a schema with a Perl
script
- You should be able to convert a schema to a DTD using
XSL-T
- But you might lose some information
Do I have to use a DTD or Schema?
-
As with HTML, you don't have to specify a DTD.
-
Even if you define new markup...
-
... but client programs won't know how to interpret your
new markup unless you also define a DTD or Schema.
-
As with HTML, you should specify one.
What DTDs and Schemas are available?
-
All the XML extensions discussed in this presentation are
defined as DTDs or Schemas (mostly DTDs).
-
Thousands of SGML DTDs are available which can relatively
easily be converted.
-
There are already many hundreds of XML DTDs available, and the number
is growing fast.
Some repositories:
Who will write DTDs and Schemas? [i]
-
Very specialised documents, technically demanding to write.
-
For most purposes, suitable examples are available.
-
Most XML users will never write one.
Who will write DTDs and Schemas? [ii]
-
Large organisations with special documentation requirements
may write DTDs and/or Schemas.
-
Communities of organisations which wish to exchange data
will probably write DTDs and/or Schemas.
-
Corporations which sell application programs will
probably write DTDs and/or Schemas.
-
Corporations which sell WYSIWYG Web authoring tools
will certainly write DTDs and/or Schemas.
-
In future, there will be much less distinction between
a word processor and a Web authoring tool.
-
Communities of interest with special technical needs
will certainly write DTDs and/or Schemas.
A bit about the other bits [vi]: SOAP
- Simple Object Access Protocol
- A vocabulary for communicating with software agents in a
heterogeneous network
- Not actually very simple...
- But this is an inherently difficult area
- Software toolkits (such as Apache SOAP) will make this easier
to deploy
XML in your context
-
Applications which will benefit greatly from XML
-
Applications which will benefit little from XML
-
XML in action: Content syndication
Applications which benefit greatly from XML
-
Applications exchanging structured data with other software agents.
-
Accounting systems exchanging orders, invoices,
payments...
-
Engineering systems exchanging specifications,
dimensions...
-
Diary systems exchanging bookings, events, meetings,
holidays...
-
Technical documentation applications, or applications
involving special notation (e.g., mathematics, music).
-
Applications requiring highly detailed illustrations.
-
Multimedia applications.
At present, only where the audience is controlled
Applications which will benefit little from XML
-
Simple publishing of text, with or without simple graphics.
XML in action: content syndication
- What is content syndication
- History of Syndication
- Standards for Syndication
- Offering Syndication
- Incorporating Syndication
- Aggregation
What is content syndication
- Making headlines from one web site available to others
- Automatically
- A dramatically successful public application of XML
History of Syndication
- In the beginning was the ripper
- 1997: ScriptingNews starts promoting XML-based syndication
- 1999: My Netscape and Rich Site Summary 0.90
- 1999: ScriptingNews elements integrated by Netscape into RSS
0.91
- 2001: Netscape abandon Rich Site Summary
Standards for Syndication
- Rich Site Summary 0.91
- Netscape, now abandoned
- Very, very simple
- Still useful
- Rich Site Summary 1.0
- Invent your own
Offering Syndication
- Provide a URL on your site from which an RSS document can be
pulled
- Example pulled from a flat file (static, compiled
periodically)
[Wired news]
- Example pulled from a Servlet (dynamic)
[PRES]
- You can do this with CGI, or any other server side content
technology
- Very easy to set up.
Incorporating Syndication [i]
- Periodically request RSS from donor sites and transform to HTML
- Example sites
Incorporating Syndication [ii]: Sample code
<!-- sidebar sections: show title and top eight entries -->
<xsl:template match="rss">
  <h2>
    <xsl:apply-templates select="channel/title" />
  </h2>
  <xsl:for-each select="channel/item">
    <xsl:if test="9 > position()">
      <p>
        <a>
          <xsl:attribute name="href"><xsl:value-of
            select="link"/>
          </xsl:attribute>
          <xsl:apply-templates select="title" />
        </a>
      </p>
    </xsl:if>
  </xsl:for-each>
</xsl:template>
Sample XSL code
Moreover Internet Europe headlines, processed with
this XSL, 22nd May 2001
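For anyone who wants to try this at home, the template can be run from Java with the standard transformation API. The sketch below wraps the template in an xsl:stylesheet element with an html output method (both are my additions for the example) and feeds it a made-up ten-item feed; the feed title, headlines and example.org links are invented, and a real application would fetch the RSS from the donor site instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HeadlineSidebar {
    /** The sidebar template, wrapped so that it is a complete stylesheet. */
    private static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\"/>"
      + "<xsl:template match=\"rss\">"
      + "<h2><xsl:apply-templates select=\"channel/title\"/></h2>"
      + "<xsl:for-each select=\"channel/item\">"
      + "<xsl:if test=\"9 > position()\">"
      + "<p><a><xsl:attribute name=\"href\"><xsl:value-of select=\"link\"/>"
      + "</xsl:attribute><xsl:apply-templates select=\"title\"/></a></p>"
      + "</xsl:if></xsl:for-each></xsl:template></xsl:stylesheet>";

    /** Builds a made-up feed with the given number of items. */
    public static String sampleFeed(int items) {
        StringBuilder rss = new StringBuilder(
            "<rss version=\"0.91\"><channel><title>Example feed</title>");
        for (int i = 1; i <= items; i++) {
            rss.append("<item><title>Headline ").append(i).append("</title>")
               .append("<link>http://example.org/story").append(i)
               .append("</link></item>");
        }
        return rss.append("</channel></rss>").toString();
    }

    /** Transforms an RSS document into the HTML sidebar fragment. */
    public static String toHtml(String rssXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(rssXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Ten items go in; only the first eight should come out.
        System.out.println(toHtml(sampleFeed(10)));
    }
}
```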
Aggregation
- If you can collect headlines from multiple sources, you can
search the collection with predetermined patterns, and offer
personalised aggregations of news to users.
-
O'Reilly's Meerkat
-
Start of something big.
Worked Example: a meeting arranger system
-
We all go to meetings...
-
We all know what a hassle it is arranging them...
-
Wouldn't it be nice if the machines could do it for us?
-
Here's how!
Creating an example document (quite easy)
-
Start by typing what you want into your favourite text
editor.
-
Invent sensible looking markup as you go along.
-
Don't be too casual about this
-
this is a data design exercise,
-
you need to think about not only what you need for this
document,
-
but what you might need for others.
-
you need to think about all the possible uses of your
document.
-
Here's one I did earlier.
This is a good opportunity for a whiteboard and some
interaction! If possible, get the participants to do an
example for themselves.
Creating the DTD and/or Schema (hard, but we'll use a trick)
-
DTDs and Schemas are precise, technical documents. How are we
going to make them?
-
Pass our example page to the DTDGenerator
- Pass the results of that through the
DTD2Schema script
(requires PERL)
-
Tidy up the results with your text editor
-
Here's a DTD and
a schema I did earlier.
Again, if possible, get the participants to actually do this.
Viewing it: creating a style-sheet (harder)
-
Two approaches to stylesheets:
-
CSS1:
-
just establishes visual styles for the actual elements in your
document
-
XSL:
-
much more complex, but allows on-the-fly transformation
of the document to present particular features
(Of course, you can just do without altogether)
-
Here's one just for the
agenda.
Using it: applications
Now we need to write applications which will:
-
allow us to generate these documents
-
not very hard, there are Java components around which
semi-automate creating a form-driven special-purpose
editor from a DTD...
-
allow our diary programs to automatically handle these
documents
-
much harder, but XML parser libraries are available for
most modern programming languages which you can build
on.
-
We probably won't get that far today.
Specifying
- The Structure of an XML document
- Exercise period [i]
The Structure of an XML document
- Overall structure
- Processing Instructions
- XML Namespaces
- Elements
- Attributes
- When to use which
Overall Structure
- Prolog
- The XML declaration
-
<?xml version="1.0"?>
- declares that this is XML
- optional in XML 1.0, but every document should start with one
- The Document Type Declaration
<!DOCTYPE meeting PUBLIC
"-//WEFT//DTD MEETING 0.1//EN" "meeting.dtd">
- says what dialect of XML this is
- optional
- Processing instructions
- Comments
- Root element
- Just an element, like any other
- Just exactly one.
Processing Instructions
- Special instructions for particular applications
- Syntactically, delimited by
<? and
?>
<?xml version="1.0"?> is a processing
instruction
- a special one
- The tag-part identifies the particular application this PI is
intended for
xml means 'any XML parser'
- The rest of the content is application specific
XML Namespaces
- Warning: Special use of the term!
- Allow multiple XML dialects to be used in one document
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
- xmlns means 'this is an XML namespace declaration'
- the rest means that names starting with xsl: belong to the
namespace defined as http://www.w3.org/1999/XSL/Transform
- Note that the URL doesn't actually point to anything interesting,
it's just a marker!
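One way to convince yourself that the prefix is only shorthand and the URL is what counts: parse two documents which use different prefixes for the same namespace and compare what the parser reports. This is a sketch using the standard Java parser; the class name and the prefix t are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NamespaceLookup {
    /** Returns "{namespace URI}local-name" for the root element of the XML. */
    public static String qualifiedRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // namespace processing is off by default
            Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            return "{" + root.getNamespaceURI() + "}" + root.getLocalName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String usual = "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"/>";
        String unusual = "<t:stylesheet version=\"1.0\""
            + " xmlns:t=\"http://www.w3.org/1999/XSL/Transform\"/>";
        System.out.println(qualifiedRoot(usual));
        System.out.println(qualifiedRoot(unusual));
        // Different prefixes, same namespace: the two names are the same name.
        System.out.println(qualifiedRoot(usual).equals(qualifiedRoot(unusual)));
    }
}
```

Nothing is ever fetched from the URL; the parser just compares the strings.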
Elements [i]
Syntactically, an element is what is delimited by its tags.
- An opening tag comprises a left angle bracket
<, the name of the element, optionally some
attribute-value pairs, and a closing angle bracket
>
<meeting id="June Board Meeting">
- A closing tag comprises a left angle bracket
<, a slash /, the name of the
element, and a closing angle bracket >
- An empty tag comprises a left angle bracket
<, the name of the element, optionally some
attribute-value pairs, a slash /, and a closing
angle bracket >; it is just shorthand for an
opening tag immediately followed by the closing tag with nothing
in between.
Elements [ii]
- An element is a primary structural unit in the XML markup
- May allow child elements of particular kinds
- Or just text (PCDATA)
- Or neither (empty tags)
- An element may have many child elements with the same name
Attributes
- An attribute belongs to a particular element type
- Has a name which is a string of characters
- Has a value which is a string of
characters
- Syntactically
- name and value are separated by an equals sign
=
- value is delimited by quotation marks
"
- An element may have only one attribute with any given name
When to use which
- When you may have a value which is a complex data item,
use an element
- example: agenda containing agenda items
- When you may have many values of the same type, use an element.
- When you may have a long simple text value, use an element
- example: title of an agenda item
- When you always have just one short simple text value, use an
attribute
- example: proposer of an agenda item
<meeting id="June Board Meeting">
  <agenda>
    <item proposer="Simon Brooke">
      <title>
        Adoption of new project management
        procedures manual
      </title>
    </item>
    <item proposer="Angela Stormont">
      <title>
        Transfer of shares
      </title>
    </item>
  </agenda>
</meeting>
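The choice shows up in the code that reads the document, too. In this rough sketch (standard Java DOM; the class name AgendaReader is invented) the short, single-valued proposer comes back from one attribute lookup, while the longer title has to be found as a child element and have its line breaks tidied up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class AgendaReader {
    /** The agenda example above, written as a Java string. */
    public static final String MEETING =
        "<meeting id=\"June Board Meeting\"><agenda>"
      + "<item proposer=\"Simon Brooke\"><title>\n"
      + "Adoption of new project management\nprocedures manual\n</title></item>"
      + "<item proposer=\"Angela Stormont\"><title>\nTransfer of shares\n</title></item>"
      + "</agenda></meeting>";

    /** Returns one "proposer: title" line per agenda item. */
    public static List<String> agendaLines(String xml) {
        try {
            NodeList items = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getElementsByTagName("item");
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                // Attribute: a single short value, fetched by name.
                String proposer = item.getAttribute("proposer");
                // Element: find the child, then collapse its whitespace.
                Element title = (Element) item.getElementsByTagName("title").item(0);
                String text = title.getTextContent().trim().replaceAll("\\s+", " ");
                lines.add(proposer + ": " + text);
            }
            return lines;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String line : agendaLines(MEETING)) {
            System.out.println(line);
        }
    }
}
```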
Exercise period [i]
-
In groups, produce a DTD for an XML dialect to describe
meetings
-
You may use the DTD generator at <URL:http://www.pault.com/Xmltube/dtdgen.html>
-
You should think about your meetings database as you do so
and have some idea of how your XML DTD relates to your
database design.
Creating
-
Building XML applications: tools and technologies
-
Constructing the document
-
Exercise period[ii]
Building XML applications: tools and technologies
-
Languages for XML applications
-
Tools, components and toolkits
-
What we will be using today
Why Java?
-
Portable
-
Reasonably readable
-
Very well supported with XML toolkits and components
-
I like it...
Other languages for building XML applications
Tools, components and toolkits
-
Parsers
-
Transformation engines
-
APIs
-
Where to find XML tools
Transformation engines
Apply XSL stylesheets to transform a
document from one representation to another.
- XML to XML
- XML to HTML
- XML to text
What we will be using today
- Apache Xalan
- XSL processor contributed to the Apache Foundation by IBM
closely related to IBM's LotusXSL processor
- Apache Xerces
- XML parser contributed to the Apache Foundation by IBM; based on
IBM's XML4J parser
- SAX
- Simple API for XML, by David Megginson and others
- DOM
- The W3C Document Object Model API
- W3C Jigsaw
- HTTP Server and Servlet Server developed by W3C
- Jacquard
- A toolkit of useful bits for sticking it all together. By me. Not
necessarily the best but it's what I know and use.
Constructing the document
-
Writing text to the output stream
-
Using the DOM
The Document Object Model
-
Standardised interface for working with XML documents
-
A W3C standard
-
Many DOM implementations
The DOM: what is a Document?
A document is just a document.
The DOM: what is an Element?
-
A 'tag'
-
With 'attributes'
-
And 'contents'
-
other elements which are children of this element
-
text nodes
The DOM: what is a Text?
-
just text
-
No tag
-
No attributes
-
No enclosing angle brackets
Create a document object
DocumentImpl doc = new DocumentImpl();
Add a root ('content') element
doc.appendChild( new ElementImpl( doc, "eventsdiary"));
Element content = doc.getDocumentElement();
-
Every Document must have exactly one 'content' element
-
If you attempt to add another child to a document which
already has a child, that's an error.
Add further elements recursively as required
Enumeration e = events.elements();
while ( e.hasMoreElements())
content.appendChild( new
EventElement( doc, ( Context) e.nextElement()));
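The snippets above use parser-specific implementation classes. If you would rather stay with the standard interfaces only, much the same three steps can be sketched like this. Treat it as an illustration, not as how the course code works: the class name DiaryBuilder is invented, the events come from a plain array instead of a database, and only the actor and description are filled in.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DiaryBuilder {
    /** Builds an eventsdiary document; each event is {actor, description}. */
    public static Document build(String date, String[][] events) {
        try {
            // 1. Create a document object.
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            // 2. Add the root ('content') element; there can be only one.
            Element content = doc.createElement("eventsdiary");
            content.setAttribute("date", date);
            doc.appendChild(content);
            // 3. Add further elements as required.
            for (String[] details : events) {
                Element event = doc.createElement("event");
                event.setAttribute("actor", details[0]);
                Element description = doc.createElement("description");
                description.appendChild(doc.createTextNode(details[1]));
                event.appendChild(description);
                content.appendChild(event);
            }
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes the document out as XML text, without the XML declaration. */
    public static String serialise(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[][] events = { { "simon", "Lecture, Java XML, all day" } };
        System.out.println(serialise(build("Jul 18, 2000", events)));
    }
}
```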
Let's see that again [i] the source
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Enumeration;
import org.apache.xerces.dom.DocumentImpl;
import org.apache.xerces.dom.ElementImpl;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
// DocumentGeneratorImpl, ServiceContext, DataStoreException, Contexts and
// Context are Jacquard classes; their imports are omitted here. The DOM
// implementation classes are assumed to be the Xerces ones.

public class DayView extends DocumentGeneratorImpl
{
    /** generate a document containing all the details of this session
     *  in this context */
    public Document generate( ServiceContext context)
        throws DataStoreException, SQLException
    {
        DocumentImpl doc = new DocumentImpl();
        String day = context.getValueAsString( "day");
        uk.co.weft.dbutil.Calendar when = new uk.co.weft.dbutil.Calendar();

        if ( day == null) // default to today
            day = new java.sql.Date( when.getTime().getTime()).toString();
        else // valueOf() rejects anything which is not a yyyy-mm-dd date
            when.setTime( java.sql.Date.valueOf( day));

        // the root element, carrying the date as an attribute
        doc.appendChild( new ElementImpl( doc, "eventsdiary"));
        Element content = doc.getDocumentElement();
        content.setAttribute( "date", when.toString());

        // one row per event on the chosen day
        String q = "select EVENT.Actor, EVENT.Event, " +
            "CATEGORY.Description as Type, " +
            "LOCATION.Description as Location, " +
            "EVENT.EventDate, EVENT.StartTime, EVENT.EndTime, " +
            "EVENT.Description " +
            "from EVENT, CATEGORY, LOCATION " +
            "where EVENT.EventDate = '" + day + "' " +
            "and EVENT.Location = LOCATION.Location " +
            "and EVENT.Category = CATEGORY.Category " +
            "order by EventDate, StartTime";

        Connection c = context.getConnection();
        Statement s = c.createStatement();
        ResultSet r = s.executeQuery( q);

        // wrap each row as a Context, and each Context as an event element
        Contexts events = new Contexts( r);
        Enumeration e = events.elements();
        while ( e.hasMoreElements())
            content.appendChild( new
                EventElement( doc, ( Context) e.nextElement()));

        s.close();
        context.releaseConnection( c);
        return doc;
    }
}
Let's see that again [ii]: the event element
The event element is a simple wrapper round a context
element:
/** an XML element representing a single event. This uses
 *  ContextElement which knows how to construct a DOM element node by
 *  taking values out of a context, so all we need to do is tell it
 *  which value names to treat as attributes and which as children */
class EventElement extends ContextElement
{
    /** The name of this particular element type */
    protected static String name = "event";

    public EventElement( DocumentImpl doc, Context source)
    {
        super( doc, name, source);
    }

    /** return a String array of the names of my properties to output
     *  as attributes */
    protected String[] getAttrNames()
    {
        String[] attrNames =
            { "event", "type", "location", "starttime", "endtime", "actor"};
        return attrNames;
    }

    /** return a String array of the names of my properties to output
     *  as children */
    protected String[] getChildNames()
    {
        String[] childNames = { "description"};
        return childNames;
    }
}
Let's see that again: [iii] the context element
-
A class which makes a simple element out of a namespace.
Often useful
-
Not part of DOM or SAX - part of my own Jacquard
toolkit
-
There's no particular reason to use Jacquard
-
ContextElement
-
Takes a 'context' (just a namespace)
-
A name to be used as an element name
-
A list of names which are to be used as attributes
-
A list of names which are to be used as child (text)
elements
-
Constructs an element node to that specification
Let's see that again: [iv] the output
<?xml version="1.0"?>
<eventsdiary
date="Jul 18, 2000">
<event
actor="simon"
endtime="5:30:00 PM"
event="19"
location="Yokohama, Japan"
starttime="9:00:00 AM"
type="Otherwise unavailable">
<description>
Lecture, Java XML, all day
</description>
</event>
</eventsdiary>
Should be online
here (login required). HTML formatted view here
Exercise period [ii]
We may skip this one if time's short or the group is
struggling!
-
In groups: Try to write a Java application or Servlet which
produces at least part of an XML document to your meeting
DTD from your database
Transforming
- Beginning XSL-T
- Exercise period [iii]
Beginning XSL-T [i] The 'stylesheet'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Basic XSL stylesheet for day view of events diary. -->
<xsl:output indent="yes" method="html"
doctype-public="-//W3C//DTD HTML 4.0 Transitional//EN"/>
<xsl:template match="eventsdiary">
<html>
<head>
<title>
Diary for <xsl:value-of select="@date" />
</title>
<link rel="StyleSheet" href="/styles/jacquard.css" type="text/css"
media="screen"/>
</head>
<body>
<h1>
Diary for <xsl:value-of select="@date" />
</h1>
<table>
<tr>
<th rowspan="2">
Who
</th>
<th rowspan="2">
Where
</th>
<th colspan="2">
When
</th>
<th rowspan="2">
What
</th>
<th rowspan="2">
Details
</th>
<th rowspan="2">
<a href="event">Add</a>
</th>
</tr>
<tr>
<th>
Starts
</th>
<th>
Ends
</th>
</tr>
<xsl:apply-templates select="event" />
</table>
</body>
</html>
</xsl:template>
<xsl:template match="event">
<tr>
<td>
<xsl:value-of select="@actor"/>
</td>
<td>
<xsl:value-of select="@location"/>
</td>
<td>
<xsl:value-of select="@starttime"/>
</td>
<td>
<xsl:value-of select="@endtime"/>
</td>
<td>
<xsl:value-of select="@type"/>
</td>
<td>
<xsl:value-of select="description"/>
</td>
<td>
<a>
<xsl:attribute name="href">event?event=<xsl:value-of
select="@event"/>
</xsl:attribute>
Edit
</a>
</td>
</tr>
</xsl:template>
</xsl:stylesheet>
Beginning XSL-T [ii] The 'stylesheet' tag
<?xml version="1.0"?>
-
This says this stylesheet is written in XML; it should be
the first line of every XML document
-
Yes, XSL is a dialect of XML
-
version="1.0" says it's version 1.0 of XML
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
-
Every XSL-T 'stylesheet' starts with this
-
xsl:stylesheet says it's a stylesheet
-
version="1.0" says it's version 1.0 of XSL
-
xmlns:xsl says that names
which start with 'xsl:' belong to the namespace identified by the
URL http://www.w3.org/1999/XSL/Transform
Beginning XSL-T [iii] comments
<!-- Basic XSL stylesheet for day view of events diary. -->
-
Comments in XSL text are just like any other XML (or SGML)
comments
-
Start with
<!-- (a following space is conventional, but not required)
-
End with
--> (likewise; but the comment text itself must not contain --)
-
Because they're comments, they don't appear in the output
-
To create comments in the output, use
xsl:comment
-
<xsl:comment>text of
comment</xsl:comment>
-
will produce
<!-- text of comment
-->
Beginning XSL-T [iv] output specifier
<xsl:output indent="yes" method="html"
doctype-public="-//W3C//DTD HTML 4.0 Transitional//EN"/>
-
The output specifier is not required
-
If it exists it must appear at top level
-
as a child of the
xsl:stylesheet element
-
indent="yes" says we want the output neatly
indented to show structure
-
method="html" says we want the output to have
html syntax
-
might have been "xml" or "text"
-
doctype-public says include a DOCTYPE
declaration of this DTD
-
There are a number of other possible
attributes.
Beginning XSL-T [v] declaring a template
<xsl:template match="eventsdiary">
This template matches every instance of the element
eventsdiary which is found in the document being processed.
As eventsdiary is the root element of the
document type we're interested in, there will only be one.
<html>
<head>
<title>
As you can see, what is in the template is just the HTML
markup that will be output (if we were outputting XML, it
would be XML, of course)...
Diary for <xsl:value-of select="@date" />
with scattered among it special xsl tags which cause things
to be spliced into the output. This one says 'use the value
of the date attribute of the current element'
</title>
</head>
<body>
<h1>
Diary for <xsl:value-of select="@date" />
</h1>
<table>
<tr>
<th rowspan="2">
Who
</th>
<th rowspan="2">
Where
</th>
<th colspan="2">
When
</th>
<th rowspan="2">
What
</th>
<th rowspan="2">
Details
</th>
<th rowspan="2">
<a href="event">Add</a>
</th>
</tr>
<tr>
<th>
Starts
</th>
<th>
Ends
</th>
</tr>
<xsl:apply-templates select="event" />
This is the important one. It says "apply the templates in
this stylesheet to all the instances of event elements which
are children of the current node".
</table>
</body>
</html>
</xsl:template>
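Here is a cut-down version of the same idea that you can run from Java, as a sketch only. It uses the JDK's bundled processor; the class DiaryDemo, the second template for event and the 'what' attribute are my assumptions, since the real event element may look quite different.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DiaryDemo
{
    /** one template for the root element, one for each event (hypothetical) */
    public static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='eventsdiary'>"
        + "<div><h1>Diary for <xsl:value-of select='@date'/></h1>"
        + "<ul><xsl:apply-templates select='event'/></ul></div>"
        + "</xsl:template>"
        + "<xsl:template match='event'>"
        + "<li><xsl:value-of select='@what'/></li>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** an invented sample document */
    public static final String DIARY =
        "<eventsdiary date='2001-06-05'>"
        + "<event what='Lunch'/><event what='Workshop'/>"
        + "</eventsdiary>";

    public static String transform( String stylesheet, String xml)
    {
        try
        {
            Transformer t = TransformerFactory.newInstance().newTransformer(
                new StreamSource( new StringReader( stylesheet)));
            StringWriter out = new StringWriter();
            t.transform( new StreamSource( new StringReader( xml)),
                         new StreamResult( out));
            return out.toString();
        }
        catch ( Exception e)
        {
            throw new IllegalStateException( e);
        }
    }

    public static void main( String[] args)
    {
        System.out.println( transform( STYLESHEET, DIARY));
    }
}
```

The value-of splices the date into the heading, and apply-templates hands each event child to the second template in document order.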
Beginning XSL-T [vi] other useful bits
<xsl:template match="section[ @slot='main']">
This template will match only section elements
which have an attribute named slot whose value
is main
<p>
<xsl:call-template name="toc"/>
paste in the output of the named template called toc.
</p>
<xsl:apply-templates select="section">
<xsl:sort select="title"/>
Apply templates in this stylesheet to sections
which are children of this section, sorted
alphabetically by their title sub-element
</xsl:apply-templates>
</xsl:template>
<xsl:template name="toc">
This is the named template which was called earlier.
Most templates are not named: they are applied automatically
if their patterns match an element
<xsl:for-each select="section">
for-each iterates over matching elements in turn
<xsl:sort select="title"/>
<a>
<xsl:attribute name="href">#<xsl:value-of select="title"/>
</xsl:attribute>
xsl:attribute allows us to construct the value of an
attribute of the enclosing tag
<xsl:value-of select="title"/>
</a> |
</xsl:for-each>
</xsl:template>
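The named template, for-each, sort and attribute pieces can be tried together in a few lines. This is a sketch under my own assumptions: the class TocDemo and the two sample sections are invented, and the JDK's bundled processor stands in for whichever one you use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TocDemo
{
    public static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match=\"section[@slot='main']\">"
        + "<p><xsl:call-template name='toc'/></p>"
        + "</xsl:template>"
        + "<xsl:template name='toc'>"
        + "<xsl:for-each select='section'>"
        + "<xsl:sort select='title'/>"
        + "<a><xsl:attribute name='href'>#<xsl:value-of select='title'/>"
        + "</xsl:attribute><xsl:value-of select='title'/></a>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** sub-sections deliberately out of alphabetical order */
    public static final String DOC =
        "<section slot='main'>"
        + "<section><title>Zebra</title></section>"
        + "<section><title>Apple</title></section>"
        + "</section>";

    public static String transform( String stylesheet, String xml)
    {
        try
        {
            Transformer t = TransformerFactory.newInstance().newTransformer(
                new StreamSource( new StringReader( stylesheet)));
            StringWriter out = new StringWriter();
            t.transform( new StreamSource( new StringReader( xml)),
                         new StreamResult( out));
            return out.toString();
        }
        catch ( Exception e)
        {
            throw new IllegalStateException( e);
        }
    }

    public static void main( String[] args)
    {
        System.out.println( transform( STYLESHEET, DOC));
    }
}
```

The links come out sorted by title, each with an href built by xsl:attribute.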
XSL-T elements: reprise
-
xsl:output
-
allows us to define how we want the output to be formatted
-
xsl:template
-
defines what should be output for elements matching a given
pattern
-
xsl:apply-templates
-
applies the templates to the elements which match its
pattern
-
xsl:call-template
-
calls a template with a particular name, overriding the
pattern-matching system
-
xsl:for-each
-
produces output iteratively, overriding the
pattern-matching system
-
xsl:sort
-
orders the result of its enclosing element (an
xsl:apply-templates or an xsl:for-each)
-
xsl:value-of
-
produces the value of the thing matched by its pattern
-
xsl:attribute
-
outputs an attribute for the output element which encloses
it
There are a few more XSL elements, but these will do most
things for you.
Beginning XSL-T [vii]: Patterns
-
*
-
matches any element
-
foo
-
matches any element whose type is
foo
-
foo | bar
-
matches any element whose type is
foo or
bar
-
foo/bar
-
matches any
bar element with a
foo parent
-
foo//bar
-
matches any
bar element with a
foo ancestor
-
foo[ @bar='baz']
-
matches any
foo element which has a
bar attribute which has the value baz
-
foo[1]
-
matches any
foo element which is the first
foo child of its parent
-
foo[ position() = 1]
-
matches any
foo element which is the first
foo child of its parent (the same as foo[1])
-
*[ position() < 5]
-
matches any element which is the first, second, third or
fourth element child of its parent
-
text()
-
matches any text node.
This is just the basics. The full definition is in the XSL-T specification.
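Patterns share their syntax with XPath expressions, so one rough way to experiment with the predicates is to evaluate them as XPath selections from Java. This sketch uses the JDK's javax.xml.xpath; the class PatternDemo and the sample document are invented, and putting // in front of each pattern to try it everywhere in the document is only an approximation of what an XSL-T processor does when it matches templates.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class PatternDemo
{
    /** a made-up document: three foo elements, one of them nested */
    public static final String DOC =
        "<root><bar/><foo id='a'/><foo id='b'/>"
        + "<group><foo id='c'/></group></root>";

    /** count the nodes selected by an XPath expression */
    public static int count( String xml, String expression)
    {
        try
        {
            Double n = ( Double) XPathFactory.newInstance().newXPath().evaluate(
                "count(" + expression + ")",
                new InputSource( new StringReader( xml)),
                XPathConstants.NUMBER);
            return n.intValue();
        }
        catch ( Exception e)
        {
            throw new IllegalStateException( e);
        }
    }

    public static void main( String[] args)
    {
        String[] expressions = {
            "//foo", "//foo[1]", "//foo[position() = 1]", "//*[1]",
            "//foo[@id='b']", "//group/foo", "//root//foo" };
        for ( String e : expressions)
            System.out.println( e + " selects " + count( DOC, e) + " node(s)");
    }
}
```

Note that //foo[1] and //foo[position() = 1] select the same two elements (a and c): a counts as the first foo child of root even though a bar comes before it. By contrast //*[1] selects root, bar and c.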
XSL-T: A deceptively simple language
-
Not many elements
-
Simple to learn all of them
-
Very subtle in use
-
The power is in the patterns
Exercise period [iii]
-
In groups: Write an XSL-T stylesheet which produces an HTML
agenda for your group's Meeting DTD.
-
Everyone together: negotiate and agree a new, common DTD
which you can use to communicate meeting information
between your groups
-
In groups: Write an XSL-T stylesheet which produces a
document conforming to the common DTD from a document
conforming to the group's DTD.
Communicating
-
Just a bit about transport
-
XML Parsers
-
Parsing XML into the Database
- Parsing: Simple worked example
-
Exercise period [iv]
Just a bit about transport
- XML is about the content of communication, not how it's sent...
- But how do you send XML information?
- HTTP GET to get information from a known place
- HTTP POST or PUT to send information to a known place
- Special purpose listener daemons with special
purpose protocols
- eMail
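As an illustration of the HTTP POST option, here is a minimal sketch that sends an XML document to a URL using only classes shipped with the JDK. Everything specific in it is an assumption for the demonstration: the class PostXmlDemo, the /inbox path, and the throwaway listener (the JDK's built-in com.sun.net.httpserver) which stands in for whatever real service you would be sending to.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

public class PostXmlDemo
{
    /** POST an XML document to a known place; return the HTTP status */
    public static int postXml( String url, String xml) throws IOException
    {
        HttpURLConnection conn =
            ( HttpURLConnection) URI.create( url).toURL().openConnection();
        conn.setRequestMethod( "POST");
        conn.setDoOutput( true);
        conn.setRequestProperty( "Content-Type",
                                 "application/xml; charset=UTF-8");
        try ( OutputStream out = conn.getOutputStream())
        {
            out.write( xml.getBytes( StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    /** read a stream to its end */
    static byte[] readAll( InputStream in) throws IOException
    {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[ 4096];
        for ( int n = in.read( chunk); n >= 0; n = in.read( chunk))
            buf.write( chunk, 0, n);
        return buf.toByteArray();
    }

    /** start a throwaway local listener, POST the document to it, and
     *  return what the listener received */
    public static String roundTrip( String xml)
    {
        AtomicReference<String> received = new AtomicReference<>();
        HttpServer server = null;
        try
        {
            server = HttpServer.create( new InetSocketAddress(
                InetAddress.getByName( "127.0.0.1"), 0), 0);
            server.createContext( "/inbox", exchange -> {
                received.set( new String(
                    readAll( exchange.getRequestBody()),
                    StandardCharsets.UTF_8));
                exchange.sendResponseHeaders( 204, -1); // no response body
                exchange.close();
            });
            server.start();
            int status = postXml( "http://127.0.0.1:"
                + server.getAddress().getPort() + "/inbox", xml);
            if ( status != 204)
                throw new IOException( "unexpected status " + status);
            return received.get();
        }
        catch ( IOException e)
        {
            throw new IllegalStateException( e);
        }
        finally
        {
            if ( server != null) server.stop( 0);
        }
    }

    public static void main( String[] args)
    {
        System.out.println( roundTrip( "<workshop title='Parsing XML'/>"));
    }
}
```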
Parsers
-
read a document from some source,
-
construct a representation of that document in the machine
-
or provide the hooks to allow you to do so
Parsing is quite compute-intensive - don't do it if you don't
have to!
More about parsers [i] types
-
Event-based parsers
-
You register handlers for parsing events you are
interested in
-
The parser calls these handlers when it sees the events
-
Useful if you only want some of the information out of
the document
-
Useful if the document might use more memory than you have
available
-
Quite a lot of work to set up.
-
Document parsers
-
Usually built on event-based parsers
-
Parse the whole document and provide you with a handle
on an internal representation of it
-
Usually a DOM document object
-
Useful if you want all the information out of the
document
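The two styles can be compared side by side with the parsers bundled in the JDK (JAXP), which I'm using here in place of a direct dependency on Xerces. This is only a sketch: the class ParserTypesDemo and the sample document are assumptions. The event-based version registers a handler and picks out just the attendee names as they go past; the document version builds the whole DOM tree and then asks it questions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParserTypesDemo
{
    public static final String XML =
        "<workshop tutor='Simon Brooke'>"
        + "<attendee name='Jon Smith'/>"
        + "<attendee name='Jane Doe'/>"
        + "</workshop>";

    /** event-based: the parser calls our handler for each start tag */
    public static List<String> namesViaSax( String xml)
    {
        final List<String> names = new ArrayList<>();
        try
        {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource( new StringReader( xml)),
                new DefaultHandler()
                {
                    @Override
                    public void startElement( String uri, String localName,
                                              String qName, Attributes atts)
                    {
                        if ( "attendee".equals( qName))
                            names.add( atts.getValue( "name"));
                    }
                });
        }
        catch ( Exception e)
        {
            throw new IllegalStateException( e);
        }
        return names;
    }

    /** document-based: parse everything, then interrogate the tree */
    public static List<String> namesViaDom( String xml)
    {
        List<String> names = new ArrayList<>();
        try
        {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse( new InputSource( new StringReader( xml)));
            NodeList attendees = doc.getElementsByTagName( "attendee");
            for ( int i = 0; i < attendees.getLength(); i++)
                names.add( ( ( Element) attendees.item( i))
                           .getAttribute( "name"));
        }
        catch ( Exception e)
        {
            throw new IllegalStateException( e);
        }
        return names;
    }

    public static void main( String[] args)
    {
        System.out.println( "SAX: " + namesViaSax( XML));
        System.out.println( "DOM: " + namesViaDom( XML));
    }
}
```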
More about parsers [ii] types
-
Validating parsers
-
Read the DTD (or schema)
-
Read the document
-
If the document isn't valid according to the DTD,
report this
-
Good if you're making sure your document conforms to
the dialect standard
-
Non validating parsers
-
Don't read the DTD (or schema)
-
Read the document
-
Will still throw an error if the document has bad
syntax
-
Good if you just want to parse XML quickly
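A small sketch of the difference, again using the JDK's bundled JAXP parser. The class ValidationDemo, the inline DTD and the 'lecturer' element are all made up for the purpose. The same documents are parsed with validation switched on and off: validity problems are reported to an error handler, while bad syntax is a fatal error either way.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ValidationDemo
{
    /** a tiny hypothetical DTD, carried inside the document itself */
    public static final String DTD =
        "<!DOCTYPE workshop ["
        + "<!ELEMENT workshop (attendee*)>"
        + "<!ELEMENT attendee EMPTY>"
        + "<!ATTLIST attendee name CDATA #REQUIRED>"
        + "]>";

    /** returns "ok", "invalid" or "not well-formed" */
    public static String check( String xml, boolean validating)
    {
        final int[] errors = { 0};
        try
        {
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
            factory.setValidating( validating);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler( new ErrorHandler()
            {
                public void warning( SAXParseException e) { }
                // validity errors arrive here; count them and carry on
                public void error( SAXParseException e) { errors[ 0]++; }
                // syntax errors arrive here; give up
                public void fatalError( SAXParseException e)
                    throws SAXException { throw e; }
            });
            builder.parse( new InputSource( new StringReader( xml)));
        }
        catch ( SAXException e)
        {
            return "not well-formed";
        }
        catch ( IOException | ParserConfigurationException e)
        {
            throw new IllegalStateException( e);
        }
        return errors[ 0] == 0 ? "ok" : "invalid";
    }

    public static void main( String[] args)
    {
        String good = DTD + "<workshop><attendee name='Jane Doe'/></workshop>";
        String invalid = DTD + "<workshop><attendee/><lecturer/></workshop>";
        String broken = DTD + "<workshop><attendee></workshop>";
        for ( String doc : new String[] { good, invalid, broken})
            System.out.println( "validating: " + check( doc, true)
                + ", non-validating: " + check( doc, false));
    }
}
```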
Parsing from XML into the database
- Walk recursively down the document tree
- identifying the elements we want to store
- for each one, see if it's already there (tricky!)
- if not, store it.
Identifying the data to store
- The attributes of an element are a namespace
- So are the fields of a table
- If you have one table for every element type
- and one field in that table for every attribute
that element can have
- It's relatively easy
- The real world isn't often like that
- the overall structures of XML documents and relational databases are
quite different
- most serious databases have been around a long time,
we can't just design them to fit our DTD
- most DTDs are agreed between large numbers of
organisations, we can't just design them to fit our
database
- but it may be coerced with a little help from XSL...
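For the easy case (one table per element type, one field per attribute) the mapping can be sketched in a few lines of plain DOM code. This is not how the uk.co.weft classes used later do it; the class ElementToRowDemo and its methods are hypothetical, and a real system should check the element and attribute names against the known schema before building SQL from them.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ElementToRowDemo
{
    /** parse a string and return its document element */
    public static Element parse( String xml)
    {
        try
        {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse( new InputSource( new StringReader( xml)))
                .getDocumentElement();
        }
        catch ( Exception e)
        {
            throw new IllegalStateException( e);
        }
    }

    /** treat the attributes of an element as a namespace of field values;
     *  a TreeMap keeps the names in a predictable (alphabetical) order */
    public static Map<String, String> attributesOf( Element elt)
    {
        Map<String, String> fields = new TreeMap<>();
        NamedNodeMap attrs = elt.getAttributes();
        for ( int i = 0; i < attrs.getLength(); i++)
        {
            Node attr = attrs.item( i);
            fields.put( attr.getNodeName(), attr.getNodeValue());
        }
        return fields;
    }

    /** build a parameterised insert statement: table named after the
     *  element type, one column per attribute. The values themselves
     *  would be bound to the ? placeholders with a PreparedStatement */
    public static String insertStatementFor( Element elt)
    {
        Map<String, String> fields = attributesOf( elt);
        String table = elt.getTagName().toUpperCase( Locale.ROOT);
        String columns = String.join( ", ", fields.keySet());
        String marks = String.join( ", ",
            Collections.nCopies( fields.size(), "?"));
        return "insert into " + table + " ( " + columns
            + ") values ( " + marks + ")";
    }

    public static void main( String[] args)
    {
        Element elt = parse(
            "<attendee name='Jane Doe' age='42' sex='F' country='US'/>");
        System.out.println( insertStatementFor( elt));
        System.out.println( attributesOf( elt));
    }
}
```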
Other things to bear in mind
- Text nodes - what do you do with them?
- Context - what was the key value of that meeting we
just stored?
Parsing: very simple worked example
- Sample XML document
- Sample Java class
Sample XML document
<?xml version="1.0"?>
<workshop tutor="Simon Brooke"
title="Parsing XML" venue="small">
<attendee name="Jon Smith" age="37"
sex="M" country="UK" />
<attendee name="Jane Doe" age="42"
sex="F" country="US" />
</workshop>
Those who were here yesterday will probably
recognise this from the 'WORKSHOP' database. I'm using this
because I can't predict what your 'MEETING' databases will look
like.
Sample Java class
import java.io.*; // to read things from the user
import java.sql.*; // to talk to the database
import uk.co.weft.domutil.*; // things to convert elements to namespaces
import uk.co.weft.dbutil.*; // things to store namespaces in databases
import org.w3c.dom.*; // interrogates a DOM tree...
import org.apache.xerces.dom.*; // using Apache's DOM implementation
import org.apache.xalan.xslt.*; // Apache's XSL processor
import org.apache.xerces.parsers.DOMParser;
// and Apache's XML parser
public class ParseExample
{
static Context connectionContext = new Context();
// a context to hold database
// connection details
/** walk down a document tree looking for nodes we recognise */
public static void walk( Node node)
throws SQLException, DataStoreException
{
if ( node.getNodeType() == Node.ELEMENT_NODE)
{
Element elt = ( Element) node;
System.out.println( "Considering element of type " +
elt.getTagName());
if ( elt.getTagName().equals( "workshop"))
handleWorkshop( elt);
else
{
NodeList children = elt.getChildNodes();
for ( int i = 0; i < children.getLength(); i++)
walk( children.item( i));
// recurse down through the children
}
}
}
/** handle a workshop element; extract its attribute (and
* actually, its text-only child) values, and store them in the
* database. Then look for attendees.*/
protected static void handleWorkshop( Element elt)
throws SQLException, DataStoreException
{
Object key = null;
Context c = ( Context)connectionContext.clone();
// construct a new namespace with just
// the database connection details in
// it
ContextElement.populateContext( elt, c);
// fill it with values from the element
TableDescriptor workshopDescriptor =
TableDescriptor.getDescriptor( "WORKSHOP", "Workshop", c);
// get a descriptor on the WORKSHOP table
Contexts rows = workshopDescriptor.match( c);
// try to match that against what's
// already in the table
if ( rows != null && rows.size() > 0)
{ // there was a match
key = ( ( Context)rows.get( 0)).getValueAsInteger( "Workshop");
// get its primary key value
System.out.println( "Found workshop " + key.toString());
}
else
{
key = workshopDescriptor.store( c);
// store it and get its primary key value
System.out.println( "Created workshop " + key.toString());
}
NodeList children = elt.getChildNodes();
for ( int i = 0; i < children.getLength(); i++)
{ // look through the children for my attendees
Node child = children.item( i);
if ( child.getNodeType() == Node.ELEMENT_NODE &&
( ( Element) child).getTagName().equals( "attendee"))
{
handleAttendee( ( Element)child, key);
}
}
}
/** handle an attendee element by finding or storing it in the
* database, and fixing up the link table */
protected static void handleAttendee( Element elt, Object workshopKey)
throws SQLException, DataStoreException
{
Object attendeeKey = null;
Context c = ( Context)connectionContext.clone();
// construct a new namespace with just
// the database connection details in
// it
ContextElement.populateContext( elt, c);
// fill it with values from the element
TableDescriptor attendeeDescriptor =
TableDescriptor.getDescriptor( "ATTENDEE", "Attendee", c);
// get a descriptor on the ATTENDEE table
Contexts rows = attendeeDescriptor.match( c);
// try to match that against what's
// already in the table
if ( rows != null && rows.size() > 0)
{ // there was a match
attendeeKey =
( ( Context)rows.get( 0)).getValueAsInteger( "Attendee");
// get its primary key value
System.out.println( "Found attendee " +
attendeeKey.toString());
}
else
{
attendeeKey = attendeeDescriptor.store( c);
// store it and get its primary key value
System.out.println( "Created attendee " +
attendeeKey.toString());
}
String q = "insert into ATTENDANCE ( Attendee, Workshop) values ("
+ attendeeKey.toString() + ", " + workshopKey.toString() + ")";
Connection conn = c.getConnection();
Statement s = conn.createStatement();
// set up a database connection
s.executeUpdate( q); // run the statement
System.out.println( "Inserted link into link table");
s.close(); // close it...
c.releaseConnection( conn);
// and release it back into the pool
}
/** prompt the user for input; if we get any, return it */
protected static String maybeGetFromUser( BufferedReader in, String prompt,
String val) throws IOException
{
System.out.print( prompt + " ] ");
String s = in.readLine();
if ( s != null && s.trim().length() > 0)
val = s.trim(); // keep the default unless something was typed
return val;
}
/** start me up... */
public static void main(String args[])
{
BufferedReader in = new
BufferedReader( new InputStreamReader( System.in));
// get from the user the name of the
// database driver to use
try
{
Class.forName(
maybeGetFromUser( in, "Database Driver",
"sun.jdbc.odbc.JdbcOdbcDriver"));
// get from the user the details
// needed to connect to the database
connectionContext.put( "db_url",
maybeGetFromUser( in, "Database URL",
"jdbc:odbc:workshop"));
connectionContext.put( "db_username",
maybeGetFromUser( in, "Database Username",
"nobody"));
connectionContext.put( "db_password",
maybeGetFromUser( in, "Database Password",
"doesntmatter"));
DOMParser p = new DOMParser();
p.parse( maybeGetFromUser( in, "URL of XML to handle",
"file:workshop.xml"));
walk( p.getDocument().getDocumentElement());
System.exit( 0); // all satisfactory
}
catch ( Exception e)
{
System.out.println( "Failed: " + e.getClass().getName() +
": " +e.getMessage());
System.exit( 1); // whoops
}
}
}
Exercise period [iv]
- In your groups
- Write an XSL-T stylesheet that converts back from the common DTD
to the group's DTD
- Adapt the above Java class to store (at least part
of) documents in your group's DTD into your database
References
XML
- news:comp.text.xml
- Newsgroup for XML - recommended
FAQs, Directories and Resources
- Extensible Markup Language (XML): http://www.oasis-open.org/cover/xml.html
- A useful and authoritative overview of the technology; another good place to start.
- Frequently Asked Questions about the Extensible Markup Language: http://www.ucc.ie/xml/
- The most superior FAQ. Everyone seriously interested in XML should start here.
- SCHEMA.NET: The XML Schema Site: http://www.schema.net/
- Cafe con Leche XML News, and Resources: http://metalab.unc.edu/xml/index.html
- DEVELOPERLIFE.COM brought to you by Nazmul Idris.: http://developerlife.com/
- xmlTree - The leading directory of XML content on the Web: http://www.xmltree.com/
News
- Welcome to XMLNews.org: http://www.xmlnews.org/
- Mulberry Technologies, Inc.: XSL-List -- Open Forum on XSL: http://www.mulberrytech.com/xsl/xsl-list/
- XMLephant: News: http://www.xmlephant.com/pages/News/
- XML.ORG - A good XML Portal: http://www.xml.org/
- XML.com - Another good XML portal: http://www.xml.com/pub
Standards
- Authoritative sources of standards documents, mostly from the World Wide Web Consortium (W3C)
Core standards
- The Annotated XML Specification: http://www.xml.com/axml/testaxml.htm
- The standard, annotated with the personal comments of one of its editors -- very revealing!
- Extensible Markup Language (XML) 1.0: http://www.w3.org/TR/1998/REC-xml-19980210
- XML Linking Language (XLink): http://www.w3.org/TR/WD-xlink#addressing
Resource Description Framework
- W3C Resource Description Framework: http://www.w3.org/RDF/
- java tutorial help resource only at gamelan.com: http://www.gamelan.com/journal/techfocus/090199_rdf1.html
- UKOLN: DC-dot, A Dublin Core Generator: http://www.ukoln.ac.uk/metadata/dcdot/
- Dublin Core Metadata Initiative / Documents / Proposed Recommendations / Dublin Core Element Set, Version 1.1: http://purl.org/DC/documents/rec-dces-19990702.htm
- Dublin Core Metadata Initiative: http://purl.org/dc/index.htm
- UKOLN Metadata Resources - DC: http://www.ukoln.ac.uk/metadata/resources/dc/
- Welcome to XMLNews.org: http://www.xmlnews.org/
XSL
- XSL Transformations (XSLT) Specification: http://www.w3.org/TR/WD-xslt
DocBook
- The nwalsh.com Home Page - XSL DocBook Stylesheets: http://nwalsh.com/docbook/xsl/
WML
- WAP WAP Binary XML (WBXML) Encoding Specification: http://www.w3.org/TR/wbxml/
- Welcome to WAP School: http://www.refsnesdata.no/wap/default.asp
- Nokia WAP Developer Forum: Nokia WAP Toolkit: http://www.forum.nokia.com/wapforum/main/1,6668,1_1_3_2,00.html
RSS: Rich Site Summary
Tutorials
- My Netscape Network: http://my.netscape.com/publish/
- Using RSS News Feeds - Webreference.com: http://www.webreference.com/perl/tutorial/8/
Feed Directories
- Webfeeds: http://www.stirbitch.com/cgi-bin/agg/sources.pl
- Moreover... Top stories: http://w.moreover.com/
- StartsHere Channel List: http://theweb.startshere.net/channels.phtml
- Open Directory - Computers: Internet: WWW: Web Portals: Netscape Netcenter: My Netscape Network: http://dmoz.org/Computers/Internet/WWW/Web_Portals/Netscape_Netcenter/My_Netscape_Network/
- Internet Alchemy : Internet Alchemy : RSSMaker: http://internetalchemy.org/rss/index.phtml
- xmlTree - The leading directory of XML content on the Web: http://www.xmltree.com/rss/index.htm
- XML.COM - Standards List Sorted by Date: http://www.xml.com/xml/pub/standate/
- W3C Scalable Vector Graphics (SVG): http://www.w3.org/Graphics/SVG/
- VML - the Vector Markup Language: http://www.w3.org/TR/1998/NOTE-VML-19980513
- Vector (infinitely zoomable) graphics for the Web, with implications especially for maps and technical diagrams.
- News Industry Text Format: http://www.nitf.org/
- Meta Content Framework Using XML: http://www.w3.org/TR/NOTE-MCF-XML/
- 'Content about content' - i.e. information for search and indexing engines and other software agents which must make some sense of the document.
- Audio, Video, and Synchronized Multimedia: http://www.w3.org/AudioVideo/
- The SMIL standard. I believe SMIL has implications not just for the Web, but for all sorts of presentation media including digital television.
- XHTML 1.0: The Extensible HyperText Markup Language: http://www.w3.org/TR/WD-html-in-xml/
- Backwards compatibility: implementing HTML in XML. Only very well written HTML is going to work!
- XML Catalog proposal: http://www.ccil.org/~cowan/XML/XCatalog.html
- XHTML 1.0: The Extensible HyperText Markup Language: http://www.w3.org/TR/xhtml1/
- Template Resolution in XML/HTML: http://www-uk.hpl.hp.com/people/ak/doc/trix.html
- eXtensible Server Pages (XSP) Layer 1: http://java.apache.org/cocoon/xsp/WD-xsp.html
- Workflow Management Coalition: http://www.aiim.org/wfmc/mainframe.htm
- DSML.ORG: The Standards Effort to Link Directories with XML: http://www.dsml.org/
Tutorials
- Info for Newcomers to XML at XMLINFO: http://www.xmlinfo.com/newcomers/
- Producing HTML tables with XSLT: http://www.cogsci.ed.ac.uk/~dmck/xslt-tutorial.html
- A Tutorial in XML and XSL Authoring: http://pdbeam.uwaterloo.ca/~rlander/XML_Tutorial/
- Java & XML: 1 + 1 > 2: http://www.sun.com.au/sjug/pres/xml/JavaAndXML/seminar.html#Slide3
- The WDVL: XML Tutorials: http://www.wdvl.com/Authoring/Languages/XML/Tutorials/
- Generally Markup: XML Resources: http://pdbeam.uwaterloo.ca/~rlander/XML_Tutorial/
- developerWorks : XML : Education: http://www.software.ibm.com/developer/education/xmlintro/xmlintro.html
- SGML/XML: Using Elements and Attributes: http://www.oasis-open.org/cover/elementsAndAttrs.html
- Welcome to XML School: http://www.refsnesdata.no/xml/
- Practical XML : An introduction to XML and XSL stylesheets: http://www.kst.com/articles/2000/January/practical_xml1/index.php
- Crane Softwrights Ltd. - Training: http://www.CraneSoftwrights.com/training/index.htm#ptux-dl
- developerWorks : XML : Education: http://www-4.ibm.com/software/developer/education/xmlintro/xmlintro.html
- RSS Tutorial: http://my.netscape.com/publish/help/mnn20/quickstart.html#rsssyntax
- XML DTD Tutorial: http://www.xml101.com/dtd/
Software resources
Editors
- Editing SGML with Emacs and PSGML - Table of Contents: http://rainbow.ldeo.columbia.edu/documentation/programs/psgml/psgml_toc.html#SEC2
- A GNU Emacs mode for SGML files: http://www.lysator.liu.se/projects/about_psgml.html
- This is what I use and recommend (I personally use XEmacs
rather than GNU Emacs)
- SoftQuad XMetaL: http://www.softquad.com/index_main.html
- Mulberry Technologies -- tdtd Emacs Major Mode for SGML and XML DTDs: http://www.mulberrytech.com/tdtd/
- Download Morphon XML Editor 1.0b41: http://www.lunatech.com/products/morphon-xml-editor/download/
Browsers
- Jumbo: http://ala.vsms.nottingham.ac.uk/vsms/java/jumbo/
- Doczilla: http://www.doczilla.com/download/index.html
- XML Viewer : another alphaWorks technology: http://www.alphaworks.ibm.com/tech/xmlviewer
- InDelv: http://www.indelv.com/
XML to HTML on the fly
- IBM XML Web Site, Education - Accessing XML on the Client: http://www.software.ibm.com/xml/education/client/client.html
- Apache Cocoon: http://xml.apache.org/cocoon/
- Apache is the world's most widely used Web server. This is the Apache project's server-side XML to HTML conversion strategy, important for serving XML documents while many browsers are still unable to interpret it. Implemented as a Java Servlet, may work with other Servlet enabled Web servers (but then does anyone serious use anything other than Apache anyway?)
XML Database integration
- DB2XML A tool for transforming relational databases into XML documents: http://www.informatik.fh-wiesbaden.de/~turau/DB2XML/index.html
- Tamino - The Information Server for Electronic Business, Software AG: http://www.softwareag.com/tamino/
- A database which claims to store XML directly. Whether this means that it's really an object-oriented database underneath I'm not sure.
- ODBC2XML: Merging ODBC data into XML documents: http://members.xoom.com/_XOOM/gvaughan/odbc2xml.htm
- pgxml homepage: http://www.morinel.demon.nl/pgxml/
- My favourite database engine, Postgres,
- XML Lightweight Extractor : another alphaWorks technology: http://alphaworks.ibm.com/tech/xle
Conversion tools and filters
- RTF2XML: http://www.xmeta.com/omlette/
- Tool for converting RTF to XML, written in Omnimark
- OmniMark Technologies Corporation: http://www.omnimark.com/
- A programming language for manipulating data streams, useful in writing conversion filters from other formats into XML.
Quick ways to produce DTDs
- DTDGenerator Frontend: http://www.pault.com/Xmltube/dtdgen.html
- DB2XML A tool for transforming relational databases into XML documents: http://www.informatik.fh-wiesbaden.de/~turau/DB2XML/index.html
- schematron: http://www.ascc.net/xml/resource/schematron/schematron.html
- Widely recommended as a very powerful and elegant solution, knows about schemas as well as DTDs.
- XMLschema.com: http://apps.xmlschema.com/
Structured Search tools
- Downloading sgrep: http://www.cs.helsinki.fi/~jjaakkol/sgrep/download.html
- Probably the most powerful simple tool for manipulating SGML and XML documents
Software collections and directories
- xml.apache.org: http://xml.apache.org/
- XMLSOFTWARE.COM: The XML Software Site: http://www.xmlsoftware.com/
- This (commercial) site tries to keep track of XML related software tools which are available. Likely not to effectively index open source tools in the longer term.
- Free XML software: http://www.stud.ifi.uio.no/~larsga/linker/XMLtools.html#SC_XSL
- IBM Developers: XML : Overview: http://www.ibm.com/developer/xml/
- eXtensible Server Pages (XSP) Layer 1: http://java.apache.org/cocoon/xsp/WD-xsp.html
- OpenXML: http://www.openxml.org/
- Major open source project to provide XML tools in Java
- PHP3: Manual: XML Parser Functions: http://www.php.net/manual/ref.xml.php3
- PHP is a server-side scripting language -- probably the best of the open source ones available. This manual section shows how the PHP project intends to handle XML at the server side, and is thus an alternative to Apache's Cocoon technology.
- XML Authority Product Overview: http://www.extensibility.com/xml_authority/xml_ath_specs.htm
- eidon products - Solutions for Structured Documents: http://www.eidon-products.com/
- Dynamic XML for Java : another alphaWorks technology: http://www.alphaworks.ibm.com/tech/dynamicxmlforjava
- XML Products Evaluation Form: http://www.bluestone.com/scripts/SaApps/SaCGI.exe/XMLevaluate.class
- XML Script - XML tools for E-commerce: http://www.xmlscript.org/
- SAX: The Simple API for XML: http://www.megginson.com/SAX/
- Activated Intelligence Rocks Your Java World!: http://www.activated.com/
- W4F, the World Wide Web Wrapper Factory: Welcome: http://db.cis.upenn.edu/W4F/
- JDOM: Who We Are: http://www.jdom.org/credits/index.html
Commentary and background
- XML, Java, and the future of the Web: ftp://sunsite.unc.edu/pub/sun-info/standards/xml/why/xmlapps.html
- Scientific American: Feature Article: XML and the Second Generation Web: May 1999: http://www.scientificamerican.com/1999/0599issue/0599bosak.html
- An extremely clear and well written article
- DevEdge Online - Metadata: http://developer.netscape.com/tech/metadata/index.html
- Netscape's official take on metadata.
- XML.COM - XML support in IE5: http://www.xml.com/xml/pub/1999/03/ie5/first-x.html
- XML.com sets out to be a newsletter on XML and related developments. Its contributors are in general exceptionally well informed. In this article Tim Bray (who works closely with Netscape) reviews Microsoft IE5's XML compatibility.
- CNET News.com - Taking sides on XML: http://www.news.com/News/Item/0,4,37072,00.html
- XML Namespaces: http://www.jclark.com/xml/xmlns.htm
- The Last Page: XML's Achilles Heel (Web Techniques, June 1999): http://www.webtechniques.com/archives/1999/06/lastpage/
XML EDI and e-Commerce stuff
- A number of competing proposals are being developed to do automatic business-to-business transfer of invoices, orders, et cetera...
- CNET.com - News - Services & Consulting - Big-name chemical firms join business e-commerce trend: http://news.cnet.com/news/0-1008-200-1579569.html?tag=st
Collaborative initiatives
- The OBI Consortium: http://www.openbuy.org/
- A solid business community consortium
- Welcome to RosettaNet: http://www.rosettanet.org/
- Probably the most incompetent and unprofessional Web site I've ever seen. This organisation claims to be the hub of EDI in XML development, but their Web site gives no comfort whatever regarding their competence.
- Biztalk - Letting computers speak the language of business: http://www.biztalk.org/
- Microsoft's tame e-Commerce consortium.
- FpML.org: http://www.fpml.org/
- JP Morgan - PriceWaterhouseCoopers initiative, apparently mainly aimed at financial services.
- Electronic Business XML (ebXML) Home Page: http://www.ebXML.org/
Suppliers
- DEDIOUX - Dynamic EDI Objects Using XML: http://www.americancoders.com/OpenBusinessObjects
- ariba.com - welcome: http://www.ariba.com/
- Welcome To OpenLink Software: http://www.openlinksw.com/virtuoso/
Stories
- XML Applications Stand Up To EDI: http://www.techweb.com/wire/story/TWB19990416S0002
- XML Applications Stand Up To EDI: http://www.techweb.com/se/directlink.cgi?INW19990419S0014
- News story about Dell Computer's XML
- CNET News.com - IBM links business software, e-commerce: http://www.news.com/News/Item/0,4,35128,00.html
- News story about IBM's XML e-Commerce
WAP/WML
- WAP WAP Binary XML (WBXML) Encoding Specification: http://www.w3.org/TR/wbxml/
- wml-tools: http://www.pwot.co.uk/wml/
- www.kannel.org: http://www.kannel.org/
- XML Icon Gallery.: http://www.iol.ie/~alank/xml/icons.htm