iso8211 0-1-0 by Weft

XML Representation of Nautical Chart Data


Packages
uk.co.weft.iso8211 A package to parse ISO8211 data and return a DOM tree.
uk.co.weft.iso8211.values Values which may be found in an ISO8211 file.

 


Author: Simon Brooke, scaffie ltd, Auchencairn, Scotland.

Abstract

This document gives a brief overview of XML and the core related technologies as relevant to the representation of hydrographic (marine chart) data, in particular to the existing IHO S-57 standard. It also briefly describes existing initiatives to develop XML representations of S-57.

About the Author

Simon Brooke has been involved in software development for twenty years. Over the past eight years he has been developing systems which have dynamically generated marked up documents from a variety of data sources. He has contributed to the Apache Foundation's Cocoon 2 XML publishing framework and has led tutorials on Java and XML on behalf of the Internet Society in Malaysia, Japan, the United States, Sweden and Switzerland as well as the United Kingdom. He is a technical judge for the international ThinkQuest competitions.

Introduction

The XML revolution has exploded through the field of data representation standards over the past five years. Hundreds of data representation standards, from geographic information through meteorological observations to electronic data interchange, are being recast into XML-compliant formats. This rush is being driven by genuine, solid benefits of the XML framework, including scrutability, validation, and transformability. One area which has not yet joined this trend, but has much to gain from it, is hydrographic information. In this paper I shall attempt to set out the core features of the XML framework as they apply to hydrographic data and show that it has strengths which merit serious consideration.

What is XML

Extensible Markup Language, XML, is a meta-format for the textual representation of semantically rich structured data. Syntactically, every XML document is a tree - an acyclic directed graph in which every node other than the root has a single predecessor - of labeled nodes. Each node may have attributes, descendant nodes, and textual content. The XML framework provides two different mechanisms - Document Type Definitions (DTDs), inherited from SGML, and Schemas - for defining grammars which in turn define particular application languages. These mechanisms declare what node types (labels) a conforming document may have, what attributes each type of node may have, what values are acceptable for each attribute, what child nodes a type of node may have (and what it must have), and so on.

XML is not one standard but a family of standards. The core standards - those which define XML and the core related technologies themselves - are managed by the World Wide Web Consortium ('W3C') and the Organization for the Advancement of Structured Information Standards ('OASIS'). Beyond these, thousands of standard definitions applying the XML framework to particular domain data have been developed; these standards typically belong to the relevant domain standards organisations.

Data in XML, while structured, is human-readable. An example might be:

<buoy type="south cardinal" name="Whitehaven South Shoal">
  <position datum="WGS 84">
    <longitude direction="west" degrees="4" minutes="14.32"/>
    <latitude hemisphere="north" degrees="54" minutes="21.65"/>
  </position>
  <light>
    <sector start="0" end="360" colour="white">
      <characteristic repeat="10s">
        <flash duration="very quick" number="6"/>
        <flash duration="long"/>
      </characteristic>
    </sector>
  </light>
</buoy>

Historically, XML is a direct descendant of the Standard Generalised Markup Language ('SGML'). It is thus the culmination of decades of research and experience in the development of structured data formats.

Fundamental XML technologies

A number of technologies support the XML family of standards. The benefit of the XML standard itself is that by providing a common syntax for a wide range of data-description languages, it is possible to apply common, well tested software components to any of the derived standards, thus providing significant savings in the cost and effort required to develop new software applications.

Parsers and Schemas

For any data format, you need a parser. XML, being a meta-format, allows parsers to be written which can parse any XML format, and such parsers are freely available. Moreover, because the formal definition of a data format in XML - a Schema - is itself an XML document, the same parser can parse it too and, having parsed it, can verify that the document being parsed conforms to the schema. There are two consequences.

The first is that to parse a new XML data format you do not have to write, test and debug a new parser. You can adopt a proven reliable parser component off the shelf. Furthermore, as many different XML parsers conform to the same APIs, if the parser you first chose proves not to suit you, you can replace it with another without having to rewrite a single line of code.

The second consequence is that you do not have to write and test a custom data validation suite to guarantee formal conformance for your new data format. The standard XML parser can act as your data validation tool. Obviously this does not guarantee the data is accurate, merely that it conforms to the standard.
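As a minimal sketch of the parser acting as a validation tool, the following Java uses the standard javax.xml.validation API. The schema shown is a hypothetical fragment invented for illustration (it requires only that a buoy element carries a type and a name); it is not part of any S-57 schema, and the class name BuoyValidation is likewise an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class BuoyValidation {
    // Hypothetical schema: a buoy element must have 'type' and 'name' attributes.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='buoy'>"
      + "<xs:complexType>"
      + "<xs:attribute name='type' type='xs:string' use='required'/>"
      + "<xs:attribute name='name' type='xs:string' use='required'/>"
      + "</xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    /** Returns true if the document is well formed and conforms to SCHEMA. */
    static boolean conforms(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // The schema is itself XML, so it is read by the same machinery.
            Schema schema = factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Thrown when the document is malformed or violates the schema.
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that no format-specific validation code is written here: changing the data format means changing the schema text, not the Java.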

XSL-T, XPath and XML Query

XSL-T, XPath and XML Query are a group of closely related standards aimed at subsetting and transforming XML documents. The Extensible Stylesheet Language - Transformations, XSL-T, is a language for describing transformations from one XML language to another. In principle any XML language can be translated into any other using an XSL-T transform, although, of course, information which does not exist in the source document cannot be included in the target document. XSL-T is a remarkably simple language, with very few operations, but is extremely powerful in use. In concept it is a recursive pattern matching language, quite similar to Prolog. XSL-T is itself an XML language and can consequently be operated on with the same standard XML tools as any other.

The XML Path Language, XPath, is a language which makes it possible to address and select individual nodes within an XML document, for example the node representing the Whitehaven South Shoal Buoy.
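To illustrate, the sketch below uses the standard javax.xml.xpath API to select a buoy by name and return its type. The enclosing chart element and the second buoy are assumptions made for the example; only the buoy element and its attributes come from the sample given earlier.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class BuoyLookup {
    // Hypothetical chart fragment; the <chart> wrapper is an assumption.
    static final String CHART =
        "<chart>"
      + "<buoy type='south cardinal' name='Whitehaven South Shoal'/>"
      + "<buoy type='north cardinal' name='Example North'/>"
      + "</chart>";

    /**
     * Returns the type of the named buoy, or the empty string if no such
     * buoy exists. Names containing an apostrophe are not handled by this
     * simple string concatenation.
     */
    static String typeOf(String chartXml, String buoyName) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expression = "/chart/buoy[@name='" + buoyName + "']/@type";
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(chartXml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The expression reads as a path from the root: the chart, then any buoy whose name attribute matches, then that buoy's type attribute.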

The XML Query language is a proposed standard still in development, but closely linked to both XPath and XSL-T. When completed it aims to provide a language which enables the user to query and subset an XML document, for example to find all the buoys with a very quick flash group, or to find every tenth node on each coastline. In the meantime, while XML Query is still in progress, it is possible to do exactly this sort of thing with XSL-T.
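A sketch of such subsetting with XSL-T, run from Java through the standard javax.xml.transform API, might look like this. The stylesheet copies only those buoys having a flash whose duration is 'very quick'; the chart wrapper element and the class name are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class QuickFlashSubset {
    // Stylesheet: keep only buoys with a 'very quick' flash somewhere below them.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/chart'>"
      + "<chart><xsl:copy-of select=\"buoy[.//flash/@duration='very quick']\"/></chart>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies STYLESHEET to the given chart document and returns the result. */
    static String subset(String chartXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(chartXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The Java is generic: a different query needs only a different stylesheet, which is itself an XML document.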

DOM

The Document Object Model, DOM, is a platform- and language-neutral API for manipulating the representations of XML documents in computer memory, with language bindings for a number of object oriented languages. Although the development of the DOM was largely driven by the needs of the World Wide Web and 'Dynamic HTML', what works for one member of the XML family of specifications works for all, and the DOM can be used to manipulate representations of any XML language.
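By way of illustration, the following sketch uses the Java binding of the DOM (the org.w3c.dom interfaces shipped with the JDK) to parse a document into memory and alter it there. The class and method names are hypothetical; the sector element and its colour attribute follow the buoy example given earlier.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomWalk {
    /** Parses an XML string into an in-memory DOM tree. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sets the colour of every sector in the tree; returns how many were changed. */
    static int recolourSectors(Document doc, String colour) {
        NodeList sectors = doc.getElementsByTagName("sector");
        for (int i = 0; i < sectors.getLength(); i++) {
            ((Element) sectors.item(i)).setAttribute("colour", colour);
        }
        return sectors.getLength();
    }
}
```

Nothing in this code is specific to chart data; the same calls would work on a tree built from any XML language.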

SVG

Scalable Vector Graphics, SVG, is exactly what it says - an XML language for describing vector graphics. As such, an SVG document can be the target - the output - of an XSL-T transformation. It can equally be the input of an XML transformation, which can extract a subset of the graphical information. A scalable vector graphics document can be rendered - rasterised - at potentially any scale, allowing for very considerable zooming. Of course, to render a chart of the world with enough information to be able to zoom in to fully detailed harbour plans would require infeasibly large amounts of data and be infeasibly slow in operation, so in practice one would subset the chart data using XSL-T to provide a series of different SVG documents, each optimised for viewing at a particular range of scales.
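As a rough sketch of SVG as the output of an XSL-T transformation, the stylesheet below draws one circle per buoy. It assumes, purely for illustration, that each buoy already carries projected drawing coordinates in hypothetical x and y attributes; a real stylesheet would have to project positions and draw the proper chart symbols.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ChartToSvg {
    // Stylesheet: emit an SVG document with one circle per buoy.
    // The x and y attributes on buoy are hypothetical pre-projected coordinates.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/chart'>"
      + "<svg xmlns='http://www.w3.org/2000/svg' width='100' height='100'>"
      + "<xsl:for-each select='buoy'>"
      + "<circle cx='{@x}' cy='{@y}' r='3'/>"
      + "</xsl:for-each>"
      + "</svg>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Transforms a chart document into an SVG document. */
    static String render(String chartXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(chartXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The resulting document could then be handed to any SVG renderer, or itself transformed again to produce a reduced version for a smaller scale.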

Availability of software components

One of the consequences - and one of the great benefits - of the whole XML project is the development and availability of a wide range of software components which act on XML documents generally. Because the syntax is consistent, a parser which can parse one XML document in one XML language can, if correctly written, parse any XML document in any language. Similarly, an XSL stylesheet processor is a universal component, which can process any XSL stylesheet transforming any XML language into any other. An SVG renderer can render any SVG document, in principle at any scale.

Furthermore, while proprietary commercial components are available in the categories discussed, very high quality open source components are also available. The Apache Software Foundation, for example, has two different XML parsers: Xerces, deriving from code contributed by IBM, and Crimson, from code contributed by Sun Microsystems. It also has a family of XSL stylesheet processors, Xalan, and an SVG toolkit, Batik, which includes a range of SVG tools including a renderer and a browser.

Because these are universal components, very widely used, they get a very great deal of testing and bugs are found and resolved very quickly. Because they are open source, it is possible (if desired) to modify the code, perhaps to add special features. The Apache license under which these components are available allows them to be used at no cost in commercial products.

What is S-57

S-57 is a standard, developed by the International Hydrographic Organization ('IHO'), describing a data format for the transfer of digital hydrographic data. The standard is based on the ISO 8211 specification for a data descriptive file for information interchange. It defines two cognate data representations, 'ASCII' and 'binary', which encode the same underlying data structure. This data structure is, like an XML document, inherently a tree. However, the number of levels in the tree is finite: each file comprises records, each record fields, each field sub-fields. That's the limit; levels cannot be added arbitrarily as they can with XML.

There are two corollaries. The first is that while any arbitrary ISO 8211 data can be converted to XML, the reverse is not the case. The second is that whereas it is easy to isolate in an XML formatted file all the information relevant to any single unit at any level of granularity (for example the characteristic pattern of the light on a particular buoy, or all the data about that buoy, or all the data about all the buoys in a channel), the same is not true of an ISO 8211 file.

By contrast with XML representations, S-57 data is dense and inscrutable to the naked eye. This is, of course, not very important in itself, because these files are of little use unless interpreted by machine. In file formats generally, however, there is a trade-off between scrutability and compactness, and the ISO 8211 format is not particularly compact. While record and attribute labels are typically four-letter mnemonics, and some data is stored as bitfields, no compression of data is used.

The inscrutability of S-57 data has some real downsides. The ISO 8211 document is extremely complex, with some apparent ambiguities. This makes writing a robust parser hard. Furthermore, as this standard is not widely used, parsers are not as thoroughly tested as they might be for a more widely used format.

Advantages of XML representation of S-57 data

Availability of commodity software components

The market for electronic chart display systems is large, but the total market for rasterising of vector data is far larger. In electronic chart applications, the penalty for inaccurate rendering may be very large: lives and vessels may be lost, and pollutants may be spilled in sensitive areas.

S-57 data is a format unto itself. To parse it requires special purpose parsers to be written; to query or subset it requires special purpose tools to be developed; to render it again requires special software. Errors in that software are serious. Yet parsers, subsetters and renderers are already available for XML based languages. Furthermore, precisely because these are not special purpose components, they are widely used and extensively tested. Because they conform to open public standards, if one component proves unsuitable another can be substituted.

For the developer of devices using S-57 data this ready availability of tested off-the-shelf software components offers lower cost development and faster time to market. To the user of such systems, it means greater reliability.

Availability of related data

But there's more to interoperability than just this. XML is being used to represent all sorts of data, some of which is relevant input data to the development of datasets for hydrographic charts. Initiatives such as OpenGIS and the UK Ordnance Survey's Digital National Framework - both using Geography Markup Language, GML - offer the potential to automatically extract, using a suitable XSL transform, certain features from land surveys for incorporation into charts.

Disadvantages

XML is very substantially more prolix than the existing S-57 format. Consequently, an XML file will be substantially larger than the equivalent S-57 file. However, even this disadvantage can be addressed with the use of data compression algorithms such as bzip2.
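The effect of compression on repetitive XML can be gauged with a small experiment such as the sketch below. It uses gzip from the Java standard library (java.util.zip) as a stand-in, since bzip2 is not part of the standard library; the sample chart data is synthetic and the class name is hypothetical, so the ratio obtained says nothing definite about real S-57 data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class ChartCompression {
    /** Builds a synthetic, highly repetitive chart document with the given number of buoys. */
    static String sampleChart(int buoys) {
        StringBuilder xml = new StringBuilder("<chart>");
        for (int i = 0; i < buoys; i++) {
            xml.append("<buoy type='south cardinal' name='Buoy ").append(i).append("'>")
               .append("<position datum='WGS 84'/></buoy>");
        }
        return xml.append("</chart>").toString();
    }

    /** Returns the size in bytes of the text after gzip compression (not bzip2). */
    static int gzippedSize(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the gzip stream (done by try-with-resources) flushes the trailer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buffer.size();
    }
}
```

Because element and attribute names recur throughout an XML document, general purpose compressors tend to remove much of the overhead that the markup introduces.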

Current work

Development of an XML Schema for S-57 data

Dr. Raphael Malyankar of the Department of Computer Science and Engineering at Arizona State University is working on the development of a partial XML schema for S-57 data as part of a research project in co-operation with the U.S. National Science Foundation and U.S. Coast Guard. This work is at an early stage.

Development of S-57 to XML and S-57 to SVG components

The present author has done preliminary work on the development of an open source S-57 to XML converter. The general design is to write, in Java, an ISO8211 parser which builds a DOM tree, and then pass this DOM tree to a DOM Printer to emit XML. In the long term, a SAX based parser would be better for this purpose than a DOM based one, as it could process files larger than the available memory, but it would also be substantially harder to write.
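The second half of that design, building a DOM tree from parsed records and printing it as XML, can be sketched as follows. This is not the API of the uk.co.weft.iso8211 package: the nested maps standing in for parsed records, the element names file, record, field and subfield, and the class name are all assumptions made for illustration. The identity Transformer from javax.xml.transform plays the part of the DOM printer.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class RecordsToXml {
    /**
     * Builds a DOM tree from already-parsed records. Each record is a map from
     * field tag to a map of subfield label to value (a hypothetical stand-in
     * for whatever structure a real ISO 8211 parser would produce).
     */
    static Document toDom(List<Map<String, Map<String, String>>> records) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element file = doc.createElement("file");
        doc.appendChild(file);
        for (Map<String, Map<String, String>> record : records) {
            Element recordEl = doc.createElement("record");
            file.appendChild(recordEl);
            for (Map.Entry<String, Map<String, String>> field : record.entrySet()) {
                Element fieldEl = doc.createElement("field");
                fieldEl.setAttribute("tag", field.getKey());
                recordEl.appendChild(fieldEl);
                for (Map.Entry<String, String> sub : field.getValue().entrySet()) {
                    Element subEl = doc.createElement("subfield");
                    subEl.setAttribute("label", sub.getKey());
                    subEl.setTextContent(sub.getValue());
                    fieldEl.appendChild(subEl);
                }
            }
        }
        return doc;
    }

    /** Serialises a DOM tree to XML text using an identity transform. */
    static String print(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The three fixed levels of the ISO 8211 structure map directly onto three levels of nested elements. Because the whole tree is held in memory before printing, this approach shares the memory limitation noted above for DOM based processing.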

The initial generator will (obviously) generate an ad-hoc XML vocabulary closely following the structure of edition 3.1 of the S-57 standard. This project will aim to track any evolving work on XML schemas for hydrographic data and provide means - probably XSL stylesheets - to convert the generator output to fit the schema. It is of course highly likely that the development of the generator will provide useful input into the development of the schema. The generator will be open source, with all code available for any interested parties to download, alter and experiment with. The author would be happy to accept any offers of co-operation with this project, subject to the final codebase remaining open and free for anyone to use.

Once the generator is complete the author intends to work on an XSL stylesheet transforming the generated XML into an SVG representation conforming as closely as possible to UKHO Chart 5011, which in turn is based on 'Chart Specifications of the IHO'. Clearly this stylesheet is in itself a very considerable piece of work, and, again, the author is happy to collaborate with any other parties interested in developing an open, reference implementation.

It is desirable that there be at least one open source S-57 to XML converter in order that the whole hydrographic community can share and can benefit from bug-reports and corrections to it, and that it can serve as a basis for comparison for alternative, perhaps proprietary, implementations. The same argument applies equally to the XML to SVG transform. It is probably appropriate, if the IHO is prepared to take on this role, for copyright in the open reference implementations to rest with the IHO.

Conclusion

This paper has attempted to demonstrate the relevance and utility of XML to the digital representation of hydrographic data. Remarkably little work has thus far been done on preparing a suitable XML representation, but initiatives are now emerging. While XML would be a suitable base data format for future hydrographic file formats, it is not necessary to develop a new, XML based hydrographic file format to gain some of the benefits of the XML project. An ISO 8211 parser which parses to a DOM structure, which can then be processed or printed as XML, represents a useful interim step, and this is currently in development.

