The XML C library for Gnome

Catalog support

Table of Contents:

  1. General overview
  2. The definitions
  3. Using catalogs
  4. Some examples
  5. How to tune catalog usage
  6. How to debug catalog processing
  7. How to create and maintain catalogs
  8. The implementor corner quick review of the API
  9. Other resources

General overview

What is a catalog? Basically it's a lookup mechanism used when an entity (a file or a remote resource) references another entity. The catalog lookup is inserted between the moment the reference is recognized by the software (XML parser, stylesheet processing, or even images referenced for inclusion in a rendering) and the moment when loading of that resource actually starts.

It is basically used for three things:

  • mapping from "logical" names, the public identifiers, to a more concrete name usable for download (a URI). For example it can associate the logical name

    "-//OASIS//DTD DocBook XML V4.1.2//EN"

    of the DocBook 4.1.2 XML DTD with the actual URL where it can be downloaded

    http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd

  • remapping from a given URL to another one, like an HTTP indirection saying that

    "http://www.oasis-open.org/committes/tr.xsl"

    should really be looked up at

    "http://www.oasis-open.org/committes/entity/stylesheets/base/tr.xsl"

  • providing a local cache mechanism that allows loading the entities associated with public identifiers or remote resources from local copies. This is a really important feature for any significant deployment of XML or SGML, since it avoids the hazards and delays associated with fetching remote resources.

The definitions

Libxml, as of 2.4.3, implements two kinds of catalogs:

  • the older SGML catalogs. The official spec is SGML Open Technical Resolution TR9401:1997, but the format is better understood by reading the SP Catalog page from James Clark. This is relatively old and not the preferred mode of operation of libxml.
  • XML Catalogs, which are far more flexible, more recent, use an XML syntax and should scale considerably better. This is the default option of libxml.

Using catalogs

In a normal environment libxml will by default check for the presence of a catalog in /etc/xml/catalog and, assuming it has been correctly populated, the processing is completely transparent to the document user. To take a concrete example, suppose you are authoring a DocBook document that starts with the following DOCTYPE definition:

<?xml version='1.0'?>
<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
          "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd">

When validating the document with libxml, the catalog will be automatically consulted to look up the public identifier "-//Norman Walsh//DTD DocBk XML V3.1.4//EN" and the system identifier "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd". If these entities have been installed on your system and the catalogs actually point to them, libxml will fetch them from the local disk.

Note: Do not actually use this DOCTYPE; it refers to a really old version, but it is fine as an example.

Libxml will check the catalog each time it is requested to load an entity; this includes DTDs, external parsed entities, stylesheets, etc. If your system is correctly configured, the whole authoring phase and all processing should use only local files, while your document stays portable because it uses the canonical public and system IDs, which reference the remote document.

Some examples:

Here are a couple of fragments from XML Catalogs used in libxml early regression tests in test/catalogs:

<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC 
   "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
   "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
   uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/>
...

This is the beginning of a catalog for DocBook 4.1.2. XML Catalogs are written in XML, and there is a specific namespace for catalog elements, "urn:oasis:names:tc:entity:xmlns:xml:catalog". The first entry in this catalog is a public mapping, which associates a Public Identifier with a URI.
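To make the public mapping concrete, here is a minimal Python sketch that parses a catalog fragment like the one above and looks up a Public Identifier. It is an illustration only, not libxml's implementation: the function name resolve_public is hypothetical, and the sketch compares identifiers as exact strings without the whitespace normalization a real resolver performs.

```python
import xml.etree.ElementTree as ET

# Namespace used by XML Catalog elements (taken from the fragment above).
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

CATALOG_XML = """<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
   uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/>
</catalog>
"""


def resolve_public(catalog_text, public_id):
    """Return the uri of the first <public> entry matching public_id, or None.

    Hypothetical helper: exact string comparison only.
    """
    root = ET.fromstring(catalog_text)
    for entry in root.iter("{%s}public" % CATALOG_NS):
        if entry.get("publicId") == public_id:
            return entry.get("uri")
    return None


if __name__ == "__main__":
    print(resolve_public(CATALOG_XML, "-//OASIS//DTD DocBook XML V4.1.2//EN"))
```

An identifier that is not listed in the catalog yields None, which corresponds to the resolver falling back to the system identifier given in the document.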

...
    <rewriteSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                   rewritePrefix="file:///usr/share/xml/docbook/"/>
...

A rewriteSystem is a very powerful instruction: it says that any URI starting with a given prefix should be looked up at another URI, constructed by replacing that prefix with a new one. In effect this acts like a cache system for a full area of the Web. In practice it is extremely useful with a file prefix if you have installed a copy of those resources on your local system.
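The prefix replacement can be sketched in a few lines of Python. This is a simplified illustration, not libxml's code: rewrite_system and RULES are hypothetical names, and the rule list simply mirrors the rewriteSystem entry shown above. When several prefixes match, the sketch prefers the longest one, as the XML Catalogs specification requires.

```python
def rewrite_system(system_id, rules):
    """Apply rewriteSystem-style rules to a system identifier.

    rules is a list of (systemIdStartString, rewritePrefix) pairs.
    Returns the rewritten URI, or None when no rule matches.
    """
    best = None
    for start, prefix in rules:
        if system_id.startswith(start):
            # Keep the rule with the longest matching start string.
            if best is None or len(start) > len(best[0]):
                best = (start, prefix)
    if best is None:
        return None
    start, prefix = best
    # Replace the matched prefix and keep the rest of the identifier.
    return prefix + system_id[len(start):]


# Mirrors the rewriteSystem entry from the catalog fragment above.
RULES = [
    ("http://www.oasis-open.org/docbook/", "file:///usr/share/xml/docbook/"),
]

if __name__ == "__main__":
    print(rewrite_system(
        "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd", RULES))
```

With the rule above, the remote DocBook DTD URL is turned into a file URI under /usr/share/xml/docbook/, while identifiers outside that prefix are left unresolved by this entry.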

...
<delegatePublic publicIdStartString="-//OASIS//DTD XML Catalog //"
                catalog="file:///usr/share/xml/docbook.xml"/>
<delegatePublic publicIdStartString="-//OASIS//ENTITIES DocBook XML"
                catalog="file:///usr/share/xml/docbook.xml"/>
<delegatePublic publicIdStartString="-//OASIS//DTD DocBook XML"
                catalog="file:///usr/share/xml/docbook.xml"/>
<delegateSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                catalog="file:///usr/share/xml/docbook.xml"/>
<delegateURI uriStartString="http://www.oasis-open.org/docbook/"
                catalog="file:///usr/share/xml/docbook.xml"/>
...

Delegation is the core feature that allows building a tree of catalogs, which is easier to maintain than a single catalog. Based on Public Identifier, System Identifier or URI prefixes, it instructs the catalog software to look up entries in another resource. The set of entries presented should be sufficient to redirect the resolution of all DocBook references to the specific catalog in /usr/share/xml/docbook.xml. This one in turn could delegate all references for DocBook 4.1.2 to a specific catalog installed at the same time as the DocBook resources on the local machine.
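The delegation step can be sketched as follows, under loud assumptions: the delegated catalogs are modelled as in-memory dictionaries instead of files, the names delegates_for, resolve_public_delegated, DELEGATES and CATALOGS are hypothetical, and the local DTD path in CATALOGS is invented for illustration. The sketch only shows the selection logic: collect the catalogs whose prefix matches, try the longest prefix first, and stop at the first catalog that knows the identifier.

```python
def delegates_for(public_id, delegate_entries):
    """Return the delegated catalogs to consult, longest prefix first."""
    matches = [(start, catalog) for start, catalog in delegate_entries
               if public_id.startswith(start)]
    matches.sort(key=lambda m: len(m[0]), reverse=True)
    ordered = []
    for _, catalog in matches:
        if catalog not in ordered:  # consult each catalog only once
            ordered.append(catalog)
    return ordered


def resolve_public_delegated(public_id, delegate_entries, catalogs):
    """Look up public_id in the delegated catalogs only, or return None."""
    for catalog_uri in delegates_for(public_id, delegate_entries):
        mapping = catalogs.get(catalog_uri, {})
        if public_id in mapping:
            return mapping[public_id]
    return None


# delegatePublic entries copied from the fragment above.
DELEGATES = [
    ("-//OASIS//DTD XML Catalog //", "file:///usr/share/xml/docbook.xml"),
    ("-//OASIS//ENTITIES DocBook XML", "file:///usr/share/xml/docbook.xml"),
    ("-//OASIS//DTD DocBook XML", "file:///usr/share/xml/docbook.xml"),
]

# Hypothetical content of the delegated catalog (the local path is invented).
CATALOGS = {
    "file:///usr/share/xml/docbook.xml": {
        "-//OASIS//DTD DocBook XML V4.1.2//EN":
            "file:///usr/share/xml/docbook/xml/4.1.2/docbookx.dtd",
    },
}

if __name__ == "__main__":
    print(resolve_public_delegated(
        "-//OASIS//DTD DocBook XML V4.1.2//EN", DELEGATES, CATALOGS))
```

Keeping the top-level catalog down to a handful of delegate entries is what makes the tree easy to maintain: each package can install and update its own catalog without touching the others.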

How to tune catalog usage:

The user can change the default catalog behaviour by redirecting queries to their own set of catalogs. This can be done by setting the XML_CATALOG_FILES environment variable to a list of catalogs; an empty one should deactivate loading of the default /etc/xml/catalog catalog.

How to debug catalog processing:

Setting the XML_DEBUG_CATALOG environment variable will make libxml output debugging information for each catalog operation, for example:

orchis:~/XML -> xmllint --memory --noout test/ent2
warning: failed to load external entity "title.xml"
orchis:~/XML -> export XML_DEBUG_CATALOG=
orchis:~/XML -> xmllint --memory --noout test/ent2
Failed to parse catalog /etc/xml/catalog
Failed to parse catalog /etc/xml/catalog
warning: failed to load external entity "title.xml"
Catalogs cleanup
orchis:~/XML -> 

The test/ent2 file references an entity; running the parser from memory makes the base URI unavailable, so the "title.xml" entity cannot be loaded. Setting the debug environment variable reveals that an attempt is made to load /etc/xml/catalog, but since it is not present the resolution fails.

But the most advanced way to debug XML catalog processing is to use the xmlcatalog command shipped with libxml2. It can load catalogs and make resolution queries to show what is going on. This is also used for the regression tests:

orchis:~/XML -> ./xmlcatalog test/catalogs/docbook.xml \
                   "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
orchis:~/XML -> 

For debugging what is going on, adding one -v flag increases the verbosity level to indicate the processing done (adding a second flag also indicates which elements are recognized at parsing):

orchis:~/XML -> ./xmlcatalog -v test/catalogs/docbook.xml \
                   "-//OASIS//DTD DocBook XML V4.1.2//EN"
Parsing catalog test/catalogs/docbook.xml's content
Found public match -//OASIS//DTD DocBook XML V4.1.2//EN
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
Catalogs cleanup
orchis:~/XML -> 

A shell interface is also available to debug and process multiple queries (and for regression tests):

orchis:~/XML -> ./xmlcatalog -shell test/catalogs/docbook.xml \
                   "-//OASIS//DTD DocBook XML V4.1.2//EN"
> help   
Commands available:
public PublicID: make a PUBLIC identifier lookup
system SystemID: make a SYSTEM identifier lookup
resolve PublicID SystemID: do a full resolver lookup
add 'type' 'orig' 'replace' : add an entry
del 'values' : remove values
dump: print the current catalog state
debug: increase the verbosity level
quiet: decrease the verbosity level
exit:  quit the shell
> public "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
> quit
orchis:~/XML -> 

This should be sufficient for most debugging purposes; it was actually used heavily to debug the XML Catalog implementation itself.

How to create and maintain catalogs:

Basically XML Catalogs are XML files, so you can either use XML tools to manage them or use xmlcatalog for this. The basic step is to create a catalog; the -create option provides this.