What are XML namespaces for?

This is something that I always find a bit hard to explain to others: Why do XML namespaces exist? When should we use them and when should we not? What are the common pitfalls when working with namespaces in XML?

Also, how do they relate to XML schemas? Should XSD schemas always be associated with a namespace?


From the W3C recommendation...

XML namespaces provide a simple method for qualifying element and attribute names used in Extensible Markup Language documents by associating them with namespaces identified by URI references.

They're for allowing multiple markup languages to be combined, without having to worry about conflicts of element and attribute names.

For example, look at any bit of XSLT code, and then think what would happen if you didn't use namespaces and were trying to write an XSLT where the output has to contain "template", "for-each", etc, elements. Syntax errors, is what.

I'll leave the advice and pitfalls to others with more experience than I.

Namespaces are used to disambiguate names that you use within the document. They also let you bind a short prefix to a namespace, which you can then use to refer to elements or attributes defined elsewhere. The namespace itself is identified by a URI, which often (though not necessarily) points to the location that defines the elements and attributes you use in the document. There is a lot more to know, but that is the heart of it. There is a lot more information here.

It's nearly the same as asking "why do we use packages for Java/C#?":

  • reusability: You can reuse a set of tags/attributes you define across different types of XML documents.
  • modularity: If you need to add some "aspect" to your XML, adding a namespace to your XML document is simpler than changing your whole XML schema definition.
  • Avoid polluting the "main" namespace: You don't force your parser to work with a huge schema definition; just use the namespace you need.

For example: XML Namespaces by Example

In my own words: if you must use an XML format defined by an external company (for example) and you need to put some information into the document under a name that the format already uses, you need a namespace. Example:

<sampleDoc>
<header title="Hello world!">
<items>
<item name="Volvo" color="Blue"/>
</items>
</header>
</sampleDoc>

and you want to merge into this document some data that has the same name but a different meaning (and so a different value too), you should use a namespace:

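<!-- The prefix must be declared before use; this URI is a placeholder. -->
<sampleDoc xmlns:my_unique_namespace="http://example.com/my_unique_namespace">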
<header title="Hello world!">
<items>
<item name="Volvo" color="White" my_unique_namespace:color="#FFFFFF"/>
</items>
</header>
</sampleDoc>

Of course, you could rename the attribute instead, for example to "my_unique_color". But in another document there could again be an attribute with the same name. So, if you have a unique namespace (your web domain, for example), you can always use the same element and/or attribute names without any problems.
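To make the difference concrete, here is a minimal sketch of reading both attributes with Python's standard xml.etree.ElementTree. The namespace URI and the short prefix "my" are placeholders I made up for illustration; any URI you control would do.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, standing in for "our web domain".
MY_NS = "http://example.com/my_unique_namespace"

doc = f"""<sampleDoc xmlns:my="{MY_NS}">
<header title="Hello world!">
<items>
<item name="Volvo" color="White" my:color="#FFFFFF"/>
</items>
</header>
</sampleDoc>"""

item = ET.fromstring(doc).find("./header/items/item")

# The unprefixed attribute is in no namespace.
plain_color = item.get("color")

# The prefixed attribute is stored under its expanded name "{uri}local".
my_color = item.get(f"{{{MY_NS}}}color")

print(plain_color, my_color)
```

The two attributes coexist on the same element because, after parsing, their names really are different: one is just "color", the other is the pair of namespace URI and "color".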

Think of them as surnames for element types. If you've got two friends, both called Bob, and you are talking about one of them, somebody might ask which Bob you are talking about. Just saying "Bob" isn't very helpful, so you say "Bob Smith", or "Bob Jones".

It's the same with element types. Sometimes a short name isn't enough, because different people can pick the same name. So you include a URI as a "surname", to distinguish between the different Bobs out there.

XML is a super-language, meaning that it is the basis for any XML-based language (makes sense, right?). Think of XML as a pen that can write any sentence, in any language. It all depends on the writer, and preferably the language should be known to the reader.

An XML namespace is basically the name of the language, much like "English" or "עברית". It helps the recipient of the XML document to parse it and extract the information within.

Let's say that I have a furniture factory and you have a furniture store. Your storage application and my supply application are completely unrelated, but when they communicate through XML messages, the messages should be understandable and easily parsed by both sides.

Therefore, both systems need to know the schema, which defines the language syntax and the agreed restrictions. Think of the schema as the dictionary and grammar textbook. The schema is the document that both systems should know, that whoever writes the parsing code in each system must know, and it includes the declaration of the namespace.

Each namespace is named by a URI, which is often (though not necessarily) the location of the schema document that defines it.

Of course, not every XML document needs a namespace, especially when it is not used to convey information to a remote system, for example when you serialize objects into XML to persist them in your database.

The biggest pitfall, IMHO, is humans interpreting documents by eye, e.g. when developing code to process an XML doc. It is too easy to focus on the literal text of the document rather than on the infoset that results from parsing it.

e.g. the following nodes

<a xmlns="uri:foo"/>
<foo:a xmlns:foo="uri:foo"/>
<bar:a xmlns:bar="uri:foo"/>

are all semantically identical - yet very different to the naive eye.

The 1st example leads to a very common mistake when writing XPath expressions: missing the fact that "a" is in a namespace, so //a yields no matches (or, worse still, matches nodes in a different namespace!).

The 3rd example exposes another misunderstanding: the belief that the prefix text is semantically significant. When querying documents with XPath I can declare any prefix I like for matching, as long as its URI matches the one used in the document.
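Both points can be checked quickly. This is only a sketch using Python's standard xml.etree.ElementTree, which supports a limited XPath subset; the wrapper element "root" and the prefix "x" are my own additions for the demonstration.

```python
import xml.etree.ElementTree as ET

docs = [
    '<a xmlns="uri:foo"/>',
    '<foo:a xmlns:foo="uri:foo"/>',
    '<bar:a xmlns:bar="uri:foo"/>',
]

# After parsing, each element carries its expanded name "{uri}local";
# the prefix used in the source text is gone.
tags = [ET.fromstring(d).tag for d in docs]
print(tags)  # ['{uri:foo}a', '{uri:foo}a', '{uri:foo}a']

# A namespaced <a> wrapped in a no-namespace root element.
root = ET.fromstring('<root><a xmlns="uri:foo"/></root>')

# An unprefixed name in the path only matches elements in no namespace.
unqualified = root.findall(".//a")

# Any prefix works in the query as long as it maps to the same URI.
qualified = root.findall(".//x:a", {"x": "uri:foo"})

print(len(unqualified), len(qualified))  # 0 1
```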

We use namespaces because people keep wanting to use the same words to mean different things in their own private Idaho. Usually, you can determine from context what a person means. In a personnel database, the XML is personnel records. In a vehicle registry database, the XML is vehicle registry records.

Both keep a tag named "location", but the tag means different things to each and contains different fields.

Now, that's cool: but what if you need or want to store XML from both in the same database? Or, more interestingly, what if both databases want to store XML chunks from some other, common database (e.g. an Accounts database)?

XML namespaces associate a URI with each XML tag, so that the tag name effectively has a URI in front of it that is part of the tag name (of course, actual XML documents use a shorthand to do this). By carefully choosing the URI, it's easy to be confident that the tag names won't collide: it's as if the two location tags were named entirely differently, so there's no confusion. As a bonus, the two entirely different location tags can include stuff from the accounts database, and explicitly state that they are talking about the same thing.

The thing that makes all this useful is XPath.

With the above, you can start to write XPath expressions that say things like: find me any accounts:account overdue sections anywhere in this XML. Or: find me any accounts:warning message items anywhere in this particular chunk of XML, where the warning message is a descendant (however deep) of either a personnel:payment node or a vehicle:status node.

That XPath expression might be used somewhere in an XSLT document, whose job it is to convert the XML into XHTML or XPDF, for display.
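Here is a rough sketch of that second query in Python's standard xml.etree.ElementTree. Everything concrete in it is assumed for illustration: the "urn:example:..." namespace URIs, the log layout, and the message texts are all made up, and ElementTree only supports a subset of XPath, so the "either/or" is done with two searches rather than one expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the prefixes here are the query's own
# and need not match the prefixes used inside the document.
NS = {
    "personnel": "urn:example:personnel",
    "vehicle": "urn:example:vehicle",
    "accounts": "urn:example:accounts",
}

LOG = """
<log xmlns:p="urn:example:personnel"
     xmlns:v="urn:example:vehicle"
     xmlns:acc="urn:example:accounts">
  <p:payment>
    <p:location>Payroll office</p:location>
    <acc:warning>Account overdue</acc:warning>
  </p:payment>
  <v:status>
    <v:location>Depot 7</v:location>
    <v:detail><acc:warning>Fee unpaid</acc:warning></v:detail>
  </v:status>
  <warning>Disk almost full</warning>
</log>
"""

root = ET.fromstring(LOG)

# accounts:warning at any depth below a personnel:payment or vehicle:status.
warnings = [
    w.text
    for parent_path in (".//personnel:payment", ".//vehicle:status")
    for parent in root.findall(parent_path, NS)
    for w in parent.findall(".//accounts:warning", NS)
]
print(warnings)  # ['Account overdue', 'Fee unpaid']

# The two "location" tags never collide: this finds only the personnel one.
locations = [e.text for e in root.findall(".//personnel:location", NS)]
print(locations)  # ['Payroll office']
```

Note that the unprefixed "warning" element from some other system is never picked up, which is the whole point.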

What's the payoff? Why do it? Because you can search the XML logfile, pull out all the accounts overdue messages wherever they appear, without confusing them with "message" tags produced by other systems, convert 'em to XHTML, and display them in bold red via a CSS style: all without writing a scrap of procedural code.

Why do XML namespaces exist?

Because, back in 1997, some very influential persons in the W3C wanted them, and would not take no for an answer. Even when it was demonstrated, I dare say conclusively, that there were better ways to solve the "problem" they thought they had, they still wielded their influence to have their desires written up into a W3C Recommendation.

The biggest whopper in the by now extensive mythology surrounding XML Namespaces is that there is technical merit to them. (This is the downstream effect of a Recommendation simply existing and thus occupying mindspace - "gee, there's gotta be a (good) reason!" - as opposed to a forgettable footnote somewhere.)

Much pain, no gain.

When should we use them and when should we not?

You should never use them if you can help it. Unfortunately, the relentless promotion of this BAD[*] device by interested parties has fostered a clusterf*ck of specs today that make it practically impossible not to have to contend with XML namespaces at some point or another. So, even if you eschew XML namespaces yourself, you will find namespace-encrusted crud coming at you from all directions, or worse, toolsets that simply refuse to work unless you feed them such crud.

What are the common pitfalls when working with namespaces in XML?

One very common pitfall is in using XPath expressions with documents where a namespace has been "defaulted": the namespace will have to be explicit in the expressions. Another issue is using them "correctly" when constructing documents: they create problems out of thin air.

Also, how do they relate to XML schemas? Should XSD schemas always be associated with a namespace?

There is no necessary relation, except that the XSD Schema spec was developed at a time when just about everyone on the committee had the XML Namespaces bit in their teeth. So they worked it in as deeply as they could. It's possible, nevertheless, to use XSD schemas without namespaces, but it's a steep uphill slog as just about every toolset supporting XSD schemas assumes that you will be "wanting" to use namespaces.

[*] BAD = Broken As Designed

UPDATE: An old essay on this non-solution to a non-problem.