Should I use elements or attributes in XML?

I am learning about XML attributes from W3Schools.

The author mentions the following (emphasis mine):

XML Elements vs. Attributes

<person sex="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>

<person>
<sex>female</sex>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>

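In the first example, sex is an attribute. In the last, sex is an element. Both examples provide the same information.

There are no rules about when to use attributes and when to use elements.

Avoid XML Attributes?

Some of the problems with using attributes are:

  • Attributes cannot contain multiple values (elements can)
  • Attributes cannot contain tree structures (elements can)
  • Attributes are not easily expandable (for future changes)

Attributes are difficult to read and maintain. Use elements for data. Use attributes for information that is not relevant to the data.

So is the author's view a well-known one, or a best practice in XML?

Should attributes in XML be avoided?

W3Schools also mentions the following (emphasis mine):

XML Attributes for Metadata

Sometimes ID references are assigned to elements. These IDs can be used to identify XML elements in much the same way as the id attribute in HTML. This example demonstrates this: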

<messages>
<note id="501">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<note id="502">
<to>Jani</to>
<from>Tove</from>
<heading>Re: Reminder</heading>
<body>I will not</body>
</note>
</messages>

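The ID above is just an identifier, used to tell the different notes apart. It is not a part of the note itself.

What I'm trying to say here is that metadata (data about data) should be stored as attributes, and the data itself should be stored as elements.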


The author's points are correct (except that attributes may contain a list of values). The question is whether or not you care about his points.

It's up to you.

Usage of attributes or elements is usually decided by the data you are trying to model.

For instance, if a certain entity is PART of the data, then it is advisable to make it an element. For example the name of the employee is an essential part of the employee data.

Now if you want to convey METADATA about data (something that provides additional information about the data but is not really part of it), then it is better to make it an attribute. For instance, let's say each employee has a GUID needed for back-end processing; then making it an attribute is better. (A GUID does not convey really useful information to someone looking at the XML, but it might be necessary for other purposes.)

There is no rule as such that says something should be an attribute or an element.

It's not necessary to AVOID attributes at all costs. Sometimes they are easier to model than elements. It really depends on the data you are trying to represent.

It all depends on what the XML is used for. When it's mostly interop between software and machines, such as Web services, it's easier to go all-elements, if only for the sake of consistency (some frameworks, e.g. WCF, also prefer it that way). If it is targeted for human consumption, i.e. primarily created and/or read by people, then judicious use of attributes can improve readability quite a lot; XHTML is a reasonable example of that, as are XSLT and XML Schema.

I usually work on the basis that attributes are metadata - that is, data about the data. One thing I do avoid is putting lists in attributes. e.g.

attribute="1 2 3 7 20"

Otherwise you have an extra level of parsing to extract each item. If XML provides the structure and tools for lists, then why impose another yourself?
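To make that extra level of parsing concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`item`, `values`, `value`) are made up for illustration; only the list `1 2 3 7 20` comes from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: the same list stored two ways.
AS_ATTRIBUTE = '<item values="1 2 3 7 20"/>'
AS_ELEMENTS = (
    "<item>"
    "<value>1</value><value>2</value><value>3</value>"
    "<value>7</value><value>20</value>"
    "</item>"
)


def values_from_attribute(xml_text):
    item = ET.fromstring(xml_text)
    # Second parsing layer: the XML parser hands back one string, so the
    # list format (space separated here) is our own private convention.
    return [int(token) for token in item.get("values").split()]


def values_from_elements(xml_text):
    item = ET.fromstring(xml_text)
    # The XML parser has already separated the entries for us.
    return [int(child.text) for child in item.findall("value")]


print(values_from_attribute(AS_ATTRIBUTE))
print(values_from_elements(AS_ELEMENTS))
```

Both functions return the same list; the difference is that the separator in the attribute version is something every consumer has to know about and reimplement.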

One scenario where you may want to code in preference for attributes is for processing speed via a SAX parser. Using a SAX parser you will get a single element callback containing the element name and the list of attributes. If you had used multiple elements instead, you would get multiple callbacks (one for each element). How much of a burden or time sink this is remains up for debate, of course, but it is perhaps worth considering.
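A rough way to see the difference in callback counts, sketched with Python's standard `xml.sax` module. The `CallbackCounter` class and the two sample documents are illustrative assumptions, and this counts callbacks only; it says nothing about actual timings.

```python
import xml.sax


class CallbackCounter(xml.sax.ContentHandler):
    """Counts how many element callbacks the SAX parser makes."""

    def __init__(self):
        super().__init__()
        self.start_calls = 0
        self.end_calls = 0

    def startElement(self, name, attrs):
        # All attributes of the element arrive together in `attrs`.
        self.start_calls += 1

    def endElement(self, name):
        self.end_calls += 1


def count_callbacks(xml_text):
    handler = CallbackCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler


ATTRIBUTE_STYLE = '<person name="John" age="23" sex="m"/>'
ELEMENT_STYLE = "<person><name>John</name><age>23</age><sex>m</sex></person>"

by_attributes = count_callbacks(ATTRIBUTE_STYLE)
by_elements = count_callbacks(ELEMENT_STYLE)
print(by_attributes.start_calls, by_elements.start_calls)  # prints "1 4"
```

The element version also triggers `characters` callbacks for each text node, which are not counted here.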

You could probably see the issue in a semantic way.

If the data is tightly linked to the element, it should be an attribute.

For example, I would put the ID of an element in an attribute of that element.

But it's true that, while parsing a document, attributes can cause more headaches than elements.

It all depends on you and on how you design your schema.

It's because of that kind of rubbish that you should avoid w3schools. If anything, that's even worse than the appalling stuff they have about JavaScript.

As a general rule, I would suggest that content - that is, data which are expected to be consumed by an end user (whether that be a human reading, or a machine receiving information for processing) - is best contained within an element. Metadata - for example an ID associated with a piece of content but only of value for internal use rather than for display to the end user - should be in an attribute.

Not least important is that putting things in attributes makes for less verbose XML.

Compare

<person name="John" age="23" sex="m"/>

Against

<person>
<name>
John
</name>
<age>
<years>
23
</years>
</age>
<sex>
m
</sex>
</person>

Yes, that was a little biased and exaggerated, but you get the point.

Attributes model mapping. A set of attributes on an element isomorphizes directly onto a name/value map in which the values are text or any serializable value type. In C#, for instance, any Dictionary<string, string> object can be represented as an XML attribute list, and vice versa.

This is emphatically not the case with elements. While you can always transform a name/value map into a set of elements, the reverse is not the case, e.g.:

<map>
<key1>value</key1>
<key1>another value</key1>
<key2>a third value</key2>
</map>

If you transform this into a map, you'll lose two things: the multiple values associated with key1, and the fact that key1 appears before key2.
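Here is a small sketch of that loss in Python, using the standard `xml.etree.ElementTree` and a plain `dict` as the map. The helper names `child_pairs` and `attribute_map` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

MAP_AS_ELEMENTS = (
    "<map>"
    "<key1>value</key1>"
    "<key1>another value</key1>"
    "<key2>a third value</key2>"
    "</map>"
)
MAP_AS_ATTRIBUTES = '<map key1="value" key2="a third value"/>'


def child_pairs(xml_text):
    """Every (name, value) pair in document order, duplicates included."""
    return [(child.tag, child.text) for child in ET.fromstring(xml_text)]


def attribute_map(xml_text):
    """Attributes already form a name/value map: names are unique."""
    return dict(ET.fromstring(xml_text).attrib)


pairs = child_pairs(MAP_AS_ELEMENTS)
flattened = dict(pairs)  # the second key1 silently overwrites the first
print(len(pairs), len(flattened))  # prints "3 2"

# Attributes survive a round trip through a map unchanged.
rebuilt = ET.Element("map", attribute_map(MAP_AS_ATTRIBUTES))
round_trip = attribute_map(ET.tostring(rebuilt, encoding="unicode"))

# A repeated attribute name is not even well-formed XML.
try:
    ET.fromstring('<map key1="value" key1="another value"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
```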

The significance of this becomes a lot clearer if you look at DOM code that's used to update information in a format like this. For instance, it's trivial to write this:

foreach (string key in map.Keys)
{
mapElement.SetAttribute(key, map[key]);
}

That code is concise and unambiguous. Contrast it with, say:

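foreach (string key in map.Keys)
{
    // Look for an existing child element named after the key.
    XmlNode keyElement = mapElement.SelectSingleNode(key);
    if (keyElement == null)
    {
        // None yet: create one and attach it to the map element.
        keyElement = mapElement.OwnerDocument.CreateElement(key);
        mapElement.AppendChild(keyElement);
    }
    keyElement.InnerText = map[key];
}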

Here's another thing to keep in mind when deciding on an XML format: If I recall correctly, the values of "id" attributes must not be all numeric, they must meet the rules for names in XML. And of course the values must be unique. I have a project that must process files that don't meet these requirements (although they are clean XML in other respects), which made processing the files more convoluted.

You can't put a CDATA in an attribute. In my experience, sooner or later you are going to want to put single quotes, double quotes and/or entire XML documents into a "member", and if it's an attribute you're going to be cursing at the person who used attributes instead of elements.
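A short Python sketch of the difference, using the standard `xml.etree.ElementTree`. The `note`, `summary` and `body` names and the payload string are invented for the example. A serializer will escape quotes in an attribute for you, but hand-written XML has no CDATA escape hatch there.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload with both quote styles, markup and an ampersand.
PAYLOAD = """He said "it's <b>fine</b>" & left"""

note = ET.Element("note", {"summary": PAYLOAD})
body = ET.SubElement(note, "body")
body.text = PAYLOAD
serialized = ET.tostring(note, encoding="unicode")

# The attribute copy needs its double quotes escaped as &quot;,
# while the element text keeps them as they are.
print(serialized)

# Both copies still parse back to the original string.
reparsed = ET.fromstring(serialized)

# In element content, a CDATA section avoids escaping altogether.
cdata_doc = "<body><![CDATA[" + PAYLOAD + "]]></body>"
cdata_text = ET.fromstring(cdata_doc).text

# The same trick inside an attribute value is not well-formed XML.
try:
    ET.fromstring('<note summary="<![CDATA[x]]>"/>')
    cdata_in_attribute_ok = True
except ET.ParseError:
    cdata_in_attribute_ok = False
```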

Note: my experience with XML has mainly involved cleaning up other people's. These people seemed to follow the old adage "XML is like violence. If using it hasn't solved your problem, then you haven't used enough."

I used Google to search for this exact question. First I landed on this article, Principles of XML design - When to use elements versus attributes, though it felt too long for such a simple question. I then read through all the answers on this topic and didn't find a satisfactory summary, so I went back to that article. Here is a summary:

When do I use elements and when do I use attributes for presenting bits of information?

  • If the information in question could be itself marked up with elements, put it in an element.
  • If the information is suitable for attribute form, but could end up as multiple attributes of the same name on the same element, use child elements instead.
  • If the information is required to be in a standard DTD-like attribute type such as ID, IDREF, or ENTITY, use an attribute.
  • If the information should not be normalized for white space, use elements. (XML processors normalize attributes in ways that can change the raw text of the attribute value.)
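The last point about white space can be checked with a few lines of Python and the standard `xml.etree.ElementTree` parser. The `note` element and `text` attribute are placeholder names; the same string is stored once as an attribute and once as element content.

```python
import xml.etree.ElementTree as ET

# A value containing a literal newline and a literal tab.
RAW = "line one\n\tline two"

# Hypothetical document holding RAW in both places.
DOC = '<note text="' + RAW + '">' + RAW + "</note>"
note = ET.fromstring(DOC)

from_attribute = note.get("text")
from_element = note.text

# The parser replaces each white space character in the attribute
# value with a plain space; the element text comes back untouched.
print(repr(from_attribute))
print(repr(from_element))
```

If the line breaks matter (addresses, code, poetry), that alone is a reason to use an element.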

Principle of core content

If you consider the information in question to be part of the essential material that is being expressed or communicated in the XML, put it in an element. If you consider the information to be peripheral or incidental to the main communication, or purely intended to help applications process the main communication, use attributes.

Principle of structured information

If the information is expressed in a structured form, especially if the structure may be extensible, use elements. If the information is expressed as an atomic token, use attributes.

Principle of readability

If the information is intended to be read and understood by a person, use elements. If the information is most readily understood and digested by a machine, use attributes.

Principle of element/attribute binding

Use an element if you need its value to be modified by another attribute. [..] it is almost always a terrible idea to have one attribute modify another.

This is a short summary of the important bits from the article. If you wish to see examples and full description of every case, then refer to the original article.

This is an example where attributes are data about data.

Databases are named by their ID attribute.

The "type" attribute of the database denotes what is expected to be found inside the database tag.

<databases>

<database id='human_resources' type='mysql'>
<host>localhost</host>
<user>usrhr</user>
<pass>jobby</pass>
<name>consol_hr</name>
</database>

<database id='products' type='my_bespoke'>
<filename>/home/anthony/products.adb</filename>
</database>

</databases>
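A consumer of that file might use the attributes to decide how to read the elements. The sketch below uses Python's standard `xml.etree.ElementTree`; the `EXPECTED_FIELDS` table and the `load_databases` helper are assumptions made for illustration, inferred only from the two entries shown above.

```python
import xml.etree.ElementTree as ET

CONFIG = """<databases>
<database id='human_resources' type='mysql'>
<host>localhost</host>
<user>usrhr</user>
<pass>jobby</pass>
<name>consol_hr</name>
</database>
<database id='products' type='my_bespoke'>
<filename>/home/anthony/products.adb</filename>
</database>
</databases>"""

# Assumption: which child elements each database type carries.
EXPECTED_FIELDS = {
    "mysql": ["host", "user", "pass", "name"],
    "my_bespoke": ["filename"],
}


def load_databases(xml_text):
    """Return {id: (type, settings)} for every <database> entry."""
    databases = {}
    for db in ET.fromstring(xml_text).findall("database"):
        db_type = db.get("type")  # metadata: tells us what to expect inside
        settings = {
            field: db.findtext(field) for field in EXPECTED_FIELDS[db_type]
        }
        databases[db.get("id")] = (db_type, settings)  # metadata: the name
    return databases


databases = load_databases(CONFIG)
print(databases["products"])
```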

My 0.02 five years after the OP is the exact opposite. Let me explain.

  1. Use elements when you're grouping similar data, and attributes of that data.
  2. Don't use elements for everything.
  3. If the data repeats (1 to many), it's probably an element
  4. If the data never repeats, and only makes sense when correlated to something else, it's an attribute.
  5. If data doesn't have other attributes (i.e. a name), then it's an attribute
  6. Group like elements together to support collection parsing (i.e. /xml/character)
  7. Re-use similar element names to support parsing data
  8. Never, ever, use numbers in element names to show position (i.e. character1, character2). This practice makes the data very hard to parse (see #6): parsing code must look for /character1, /character2, etc., not simply /character.
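Points 6 to 8 are easiest to appreciate from the consumer's side. A minimal Python sketch with the standard `xml.etree.ElementTree`; both sample documents and the helper names are made up for the comparison.

```python
import re
import xml.etree.ElementTree as ET

UNIFORM = (
    "<book>"
    "<character name='Arthur Dent'/>"
    "<character name='Ford Prefect'/>"
    "</book>"
)
NUMBERED = (
    "<book>"
    "<character1 name='Arthur Dent'/>"
    "<character2 name='Ford Prefect'/>"
    "</book>"
)


def names_uniform(xml_text):
    # One path expression selects the whole collection.
    book = ET.fromstring(xml_text)
    return [c.get("name") for c in book.findall("character")]


def names_numbered(xml_text):
    # No single element name matches, so the consumer has to walk
    # every child and pattern-match the tag name itself.
    pattern = re.compile(r"character\d+$")
    book = ET.fromstring(xml_text)
    return [c.get("name") for c in book if pattern.match(c.tag)]


print(names_uniform(UNIFORM))
print(names_numbered(NUMBERED))
```

Both calls return the same names, but only the first works with an ordinary path query and keeps working when a third character is added.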

Considered another way:

  • Start by thinking of all your data as an attribute.
  • Logically group attributes into elements. If you know your data, you'll rarely need to convert an attribute to an element. You probably already know when an element (a collection, or repeated data) is necessary.
  • Group elements together logically.
  • When you run into the case that you need to expand, add new elements / attributes based on the logical structure and process above. Adding a new collection of child elements won't "break" your design, and will be easier to read over time.

For example, looking at a simple collection of books and major characters, the title won't ever have "children"; it's a simple value. Every character has a name and an age.

    <book title='Hitchhiker&apos;s Guide to the Galaxy' author='Douglas Adams'>
<character name='Zaphod Beeblebrox' age='100'/>
<character name='Arthur Dent' age='42'/>
<character name='Ford Prefect' age='182'/>
</book>


<book title='On the Road' author='Jack Kerouac'>
<character name='Dean Moriarty' age='30'/>
<character name='Old Bull Lee' age='42'/>
<character name='Sal Paradise' age='42'/>
</book>

You could argue that a book could have multiple authors. OK, just expand by adding new author elements (optionally remove the original @author). Sure, you've broken the original structure, but in practice it's pretty rare, and easy to work around. Any consumer of your original XML that assumed a single author will have to change anyway (they are likely changing their DB to move author from a column in the 'book' table to an 'author' table).

<book title='Hitchhiker&apos;s Guide to the Galaxy'>
<author name='Douglas Adams'/>
<author name='Some Other Guy'/>
<character name='Zaphod Beeblebrox' age='100'/>
<character name='Arthur Dent' age='42'/>
<character name='Ford Prefect' age='182'/>
</book>
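On the consumer side, the workaround can be as small as this Python sketch (standard `xml.etree.ElementTree`). The `authors` helper is hypothetical; it accepts both the original `author` attribute and the newer `author` child elements.

```python
import xml.etree.ElementTree as ET

OLD_STYLE = "<book title='On the Road' author='Jack Kerouac'/>"
NEW_STYLE = (
    "<book title='Hitchhiker&apos;s Guide to the Galaxy'>"
    "<author name='Douglas Adams'/>"
    "<author name='Some Other Guy'/>"
    "</book>"
)


def authors(xml_text):
    """Return the author names of a <book> in either layout."""
    book = ET.fromstring(xml_text)
    # Prefer the newer layout: zero or more <author name='...'/> children.
    names = [a.get("name") for a in book.findall("author")]
    # Fall back to the original single @author attribute.
    if not names and book.get("author") is not None:
        names = [book.get("author")]
    return names


print(authors(OLD_STYLE))
print(authors(NEW_STYLE))
```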