Java: Most efficient method to iterate over all elements in a org.w3c.dom.Document?

What is the most efficient way to iterate through all DOM elements in Java?

Something like this but for every single DOM elements on current org.w3c.dom.Document?

for(Node childNode = node.getFirstChild(); childNode!=null;){
Node nextChild = childNode.getNextSibling();
// Do something with childNode, including move or delete...
childNode = nextChild;
}
186228 次浏览

Basically you have two ways to iterate over all elements:

1. Using recursion (the most common way I think):

public static void main(String[] args) throws SAXException, IOException,
ParserConfigurationException, TransformerException {


DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document document = docBuilder.parse(new File("document.xml"));
doSomething(document.getDocumentElement());
}


public static void doSomething(Node node) {
// do something with the current node instead of System.out
System.out.println(node.getNodeName());


NodeList nodeList = node.getChildNodes();
for (int i = 0; i < nodeList.getLength(); i++) {
Node currentNode = nodeList.item(i);
if (currentNode.getNodeType() == Node.ELEMENT_NODE) {
//calls this method for all the children which is Element
doSomething(currentNode);
}
}
}

2. Avoiding recursion using getElementsByTagName() method with * as parameter:

public static void main(String[] args) throws SAXException, IOException,
ParserConfigurationException, TransformerException {


DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document document = docBuilder.parse(new File("document.xml"));


NodeList nodeList = document.getElementsByTagName("*");
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
if (node.getNodeType() == Node.ELEMENT_NODE) {
// do something with the current element
System.out.println(node.getNodeName());
}
}
}

I think these ways are both efficient.
Hope this helps.

for (int i = 0; i < nodeList.getLength(); i++)

change to

for (int i = 0, len = nodeList.getLength(); i < len; i++)

to be more efficient.

The second way of javanna answer may be the best as it tends to use a flatter, predictable memory model.

I also stumbled over this problem recently. Here is my solution. I wanted to avoid recursion, so I used a while loop.

Because of the adds and removes in arbitrary places on the list, I went with the LinkedList implementation.

/* traverses tree starting with given node */
private static List<Node> traverse(Node n)
{
return traverse(Arrays.asList(n));
}


/* traverses tree starting with given nodes */
private static List<Node> traverse(List<Node> nodes)
{
List<Node> open = new LinkedList<Node>(nodes);
List<Node> visited = new LinkedList<Node>();


ListIterator<Node> it = open.listIterator();
while (it.hasNext() || it.hasPrevious())
{
Node unvisited;
if (it.hasNext())
unvisited = it.next();
else
unvisited = it.previous();


it.remove();


List<Node> children = getChildren(unvisited);
for (Node child : children)
it.add(child);


visited.add(unvisited);
}


return visited;
}


private static List<Node> getChildren(Node n)
{
List<Node> children = asList(n.getChildNodes());
Iterator<Node> it = children.iterator();
while (it.hasNext())
if (it.next().getNodeType() != Node.ELEMENT_NODE)
it.remove();
return children;
}


private static List<Node> asList(NodeList nodes)
{
List<Node> list = new ArrayList<Node>(nodes.getLength());
for (int i = 0, l = nodes.getLength(); i < l; i++)
list.add(nodes.item(i));
return list;
}