Analyzing XML schemas with the Schema Infoset Model
Easily perform complex queries on your schemas with this model
Level: Intermediate
Shane Curcuru (shane_curcuru@us.ibm.com)
Advisory Software Engineer, IBM
July 2002
As the use of schemas grows, so does the need for tools to manipulate them. The new Schema Infoset Model provides a complete model of schemas themselves, including their concrete representations as well as the abstract relationships within a schema or a set of schemas. This article shows how to use the library to query the model of a schema for detailed information about it; the same model can also be used to update the schema to fix any problems found and to write the schema back out.
Note: This article assumes you have a basic knowledge of schema documents; there are a number of links to schema documentation and a tutorial in Resources.
Although there are a number of parsers and tools that use schemas to validate or analyze XML documents, tools that allow querying and advanced manipulation of schema documents themselves are still being built. The Schema Infoset Model (also known as org.eclipse.xsd.*, or just "the library") provides a rich API library that models schemas -- both their concrete representations (perhaps in a schema.xsd file) and the abstract concepts in a schema as defined by the specification. As anyone who has read the schema specs knows, they're quite detailed, and this model strives to expose all the details within any schema. This in turn lets you manage your schema collection efficiently and enables higher-level schema tools -- perhaps schema-aware parsers and transformers.
Schema Infoset Model UML diagrams
The library includes various UML diagrams for the actual library classes, which give a quick overview of the relationships and attributes of common schema components:
Abstract Schema Component relationships
Abstract Schema Component attributes
Schema Library class listing
These diagrams are included in the library's documentation, along with several other UML diagrams for both the abstract and concrete class trees.
For an interface listing of the library showing all the schema objects modeled, please see Schema Infoset Model UML diagrams. The library also includes the UML diagrams used in building the library interfaces themselves; these diagrams show the relationships between the library objects, which very closely mimic the concepts in the schema specifications.
Example: Analyzing your schemas
In this example, you'll check your schema for integer-derived types that fail to specify restrictions. This could be useful for ensuring that all order quantities in purchase orders have been bounded. Here, the schemas must be very specific, so you want to require that all simple types that derive from integers include both a minimum and a maximum facet (minInclusive or minExclusive, and maxInclusive or maxExclusive). However, if these facets are inherited from a type which this type derives from, that is still sufficient.
While you can use XSLT or XPath to query a schema's concrete representation in an .xsd file or inside some other .xml content, it is much more difficult to discover the type derivations and interrelationships that schema components actually have. Since the Schema Infoset Model library models both the concrete representation and the abstract concept of the schema, it can easily be used to collect details about its components, even when the schema may have deep type hierarchies or be defined in multiple schema files.
In this simple schema, you will find some types that meet the criteria of having max/min facets, and some that do not. (You can find the full schema in FindTypesMissingFacets.xsd included in the zip file.)
Listing 1. Sample schema
Loading schemas into the library
The library can read and write schema objects from a variety of sources. I'll show it using the org.eclipse.emf ResourceSet framework to easily load sets of schemas; you can also build and emit schemas directly from or to a DOM object that you manage yourself. The library provides a custom XSDResourceSet implementation that can intelligently and automatically load sets of schemas related by includes, imports, and redefines. The abstract relationship between related schemas is also modeled in the library.
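To make the idea of related schemas concrete, here is a minimal sketch of the first step any such loader has to take: finding the include, import, and redefine references in a schema document. It deliberately uses only the JDK's DOM parser rather than the library, and the class name SchemaReferenceLister, the method listReferences, and the file names in the example are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper built on the JDK DOM API; not part of org.eclipse.xsd. */
final class SchemaReferenceLister {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    /** Returns "kind -> schemaLocation" for each top-level include, import, and redefine. */
    static List<String> listReferences(String schemaText) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match on the XSD namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schemaText)));
            List<String> references = new ArrayList<>();
            for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() != Node.ELEMENT_NODE || !XSD_NS.equals(n.getNamespaceURI())) {
                    continue; // skip text, comments, and foreign elements
                }
                String kind = n.getLocalName();
                if (kind.equals("include") || kind.equals("import") || kind.equals("redefine")) {
                    // schemaLocation is optional on import, so this may be an empty string.
                    references.add(kind + " -> " + ((Element) n).getAttribute("schemaLocation"));
                }
            }
            return references;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse schema text", ex);
        }
    }

    public static void main(String[] args) {
        // Hypothetical schema text; common.xsd and other.xsd are made-up file names.
        String schema = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:include schemaLocation='common.xsd'/>"
                + "<xs:import namespace='urn:example:other' schemaLocation='other.xsd'/>"
                + "</xs:schema>";
        listReferences(schema).forEach(System.out::println);
    }
}
```

A real loader would then resolve each location against the including document, load it, and repeat, while guarding against cycles. That recursive bookkeeping is the part the library's resource set takes off your hands.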
Convenient schema querying
Now that you have an XSDSchema object, you need to query it to find any types that are missing max/min facets. First, you'll use some convenient library methods to quickly find all of its simpleTypeDefinitions that derive from the built-in integer type. Since the library provides a complete model of the abstract meaning of a schema, this turns out to be very straightforward. You can query the XSDSchema for its getTypeDefinitions() listing, and then filter for XSDSimpleTypeDefinitions that actually inherit from the base integer type.
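The filtering step boils down to following each type's chain of base types until it either reaches the built-in integer type or leaves the integer family. The sketch below illustrates that walk by hand, using only the JDK's DOM parser on a single schema document; it is not the library's API, and IntegerDerivedTypeFinder and its methods are hypothetical names. It assumes that user-defined type names are unique by local name and that every derivation is a restriction with a base attribute.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper built on the JDK DOM API; not part of org.eclipse.xsd. */
final class IntegerDerivedTypeFinder {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    // xs:integer and the built-in types derived from it in XML Schema Part 2.
    static final Set<String> BUILT_IN_INTEGERS = new HashSet<>(Arrays.asList(
            "integer", "nonPositiveInteger", "negativeInteger", "long", "int", "short", "byte",
            "nonNegativeInteger", "unsignedLong", "unsignedInt", "unsignedShort", "unsignedByte",
            "positiveInteger"));

    /** Returns the names of named simple types whose restriction chain reaches xs:integer. */
    static List<String> findIntegerDerivedTypes(String schemaText) {
        Map<String, Element> restrictions = new LinkedHashMap<>(); // type name -> its restriction
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            NodeList types = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schemaText)))
                    .getElementsByTagNameNS(XSD_NS, "simpleType");
            for (int i = 0; i < types.getLength(); i++) {
                Element type = (Element) types.item(i);
                NodeList r = type.getElementsByTagNameNS(XSD_NS, "restriction");
                // Keep only named types whose own (direct child) derivation is a restriction.
                if (type.hasAttribute("name") && r.getLength() > 0
                        && r.item(0).getParentNode().isSameNode(type)) {
                    restrictions.put(type.getAttribute("name"), (Element) r.item(0));
                }
            }
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse schema text", ex);
        }
        List<String> result = new ArrayList<>();
        for (String name : restrictions.keySet()) {
            if (derivesFromInteger(name, restrictions)) {
                result.add(name);
            }
        }
        return result;
    }

    /** Follows base attributes upward until a built-in type or an unknown type is reached. */
    static boolean derivesFromInteger(String name, Map<String, Element> restrictions) {
        Set<String> visited = new HashSet<>(); // guards against circular definitions
        Element restriction = restrictions.get(name);
        while (restriction != null && visited.add(name)) {
            String base = restriction.getAttribute("base");
            int colon = base.indexOf(':');
            String prefix = colon < 0 ? null : base.substring(0, colon);
            String local = base.substring(colon + 1);
            if (XSD_NS.equals(restriction.lookupNamespaceURI(prefix))) {
                return BUILT_IN_INTEGERS.contains(local); // reached a built-in type
            }
            name = local; // assumption: user types are unique by local name
            restriction = restrictions.get(name);
        }
        return false;
    }
}
```

Calling IntegerDerivedTypeFinder.findIntegerDerivedTypes with the text of a schema returns the matching type names in document order. Types defined in included or imported files, or derived by list or union, are out of reach for this sketch, and that is where a full abstract model earns its keep.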
The schema components model
Every component defined in the W3C schema specifications is modeled in detail in the library. Now that you have a list of all XSDSimpleTypeDefinitions that derive from an integer, you can query this list for ones that are missing either their max or min facets, and produce a report. Note that the library can conveniently group the effective max/minExclusive or max/minInclusive facets together for quick searching; it also provides detailed access to each type, including the actual lexical values if needed.
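The notion of an effective facet can be illustrated with a small in-memory sketch: a type is considered bounded if it, or any type it derives from, declares the facet. Everything here is hypothetical, including the EffectiveFacetCheck and SimpleType classes and the type names used in main; the library's own classes are far richer. The sketch also ignores bounds that built-in types such as positiveInteger already carry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model; it only illustrates how effective facets are inherited. */
final class EffectiveFacetCheck {
    /** A simple type reduced to its base type name and the facets it declares itself. */
    static final class SimpleType {
        final String name;
        final String baseName; // null when the base is a built-in type
        final Set<String> declaredFacets;

        SimpleType(String name, String baseName, String... declaredFacets) {
            this.name = name;
            this.baseName = baseName;
            this.declaredFacets = new HashSet<>(Arrays.asList(declaredFacets));
        }
    }

    /** True if the type, or any type it derives from, declares either facet. */
    static boolean hasEffective(SimpleType type, Map<String, SimpleType> byName,
                                String inclusive, String exclusive) {
        Set<String> visited = new HashSet<>(); // guards against circular definitions
        for (SimpleType t = type; t != null && visited.add(t.name); t = byName.get(t.baseName)) {
            if (t.declaredFacets.contains(inclusive) || t.declaredFacets.contains(exclusive)) {
                return true;
            }
        }
        return false;
    }

    /** Produces one report line per type that lacks an effective min or max facet. */
    static List<String> report(List<SimpleType> types) {
        Map<String, SimpleType> byName = new HashMap<>();
        for (SimpleType t : types) {
            byName.put(t.name, t);
        }
        List<String> lines = new ArrayList<>();
        for (SimpleType t : types) {
            boolean hasMin = hasEffective(t, byName, "minInclusive", "minExclusive");
            boolean hasMax = hasEffective(t, byName, "maxInclusive", "maxExclusive");
            if (!hasMin || !hasMax) {
                String missing = !hasMin && !hasMax ? "a min and a max" : !hasMin ? "a min" : "a max";
                lines.add(t.name + " lacks " + missing + " facet");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<SimpleType> types = Arrays.asList(
                new SimpleType("Quantity", null, "minInclusive", "maxInclusive"),
                new SimpleType("SmallQuantity", "Quantity"), // inherits both bounds
                new SimpleType("OpenCount", null, "minExclusive"),
                new SimpleType("Unbounded", null));
        report(types).forEach(System.out::println);
    }
}
```

In this example SmallQuantity passes because it inherits both bounds from Quantity, while OpenCount and Unbounded are reported. This mirrors the rule stated earlier that inherited facets are sufficient.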
Your report: Types missing max/min facets
With just a little bit of code, you've discovered some fairly detailed information about the schema. If you download the sample code and run it against the provided schema file, you should see a report listing the types that are missing max or min facets.
Conclusion
Although this is a contrived example, it does show how the library's detailed representation of a schema makes it easy to find exactly the parts of a schema you need. The library provides setter methods for the properties of schema components, so it is easy to update your sample to automatically fix any found types by adding any missing facets. And since the library models the concrete representation of the schema as well, you can write your updated schema back out to an .xsd file.
Sample code
A sample program, XSDFindTypesMissingFacets.java, shows the example in this article. It uses a schema document FindTypesMissingFacets.xsd which has a number of types with and without max/min facets.
You can download the sample program and the following sample .java files in a zip file.
Copies of several other sample .java files normally shipped with the Schema Infoset Model are also attached. These include:
XSDSchemaQueryTools.java showcases a number of other ways to perform advanced queries on schema objects.
XSDSchemaBuildingTools.java provides convenience methods for building schemas programmatically.
XSDPrototypicalSchema.java uses the library to build the ever-popular schema primer PurchaseOrder sample.
This content was adapted from an article on IBM developerWorks at http://www.ibm.com/developerWorks/.
About the author
Shane Curcuru has been a developer and quality engineer at Lotus and IBM for 12 years and is a member of the Apache Software Foundation. He has worked on such diverse projects as Lotus 1-2-3, Lotus eSuite, Apache's Xalan-J XSLT processor, and a variety of XML Schema tools. Questions about this article or about automated testing can be sent to him at shane_curcuru@us.ibm.com.