XPath: how do I select nodes based on their text content and an attribute value?

Consider this XML:

<DocText>
<WithQuads>
<Page pageNumber="3">
<Word>
July
<Quad>
<P1 X="84" Y="711.25" />
<P2 X="102.062" Y="711.25" />
<P3 X="102.062" Y="723.658" />
<P4 X="84.0" Y="723.658" />
</Quad>
</Word>
<Word>
</Word>
<Word>
30,
<Quad>
<P1 X="104.812" Y="711.25" />
<P2 X="118.562" Y="711.25" />
<P3 X="118.562" Y="723.658" />
<P4 X="104.812" Y="723.658" />
</Quad>
</Word>
</Page>
</WithQuads>
</DocText>

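I want to find the nodes whose text is 'July' and whose Quad/P1/@X attribute is greater than 90. In this case, then, it should not return any match. However, if I use greater-than (>) or less-than (<), I get a match on the first Word element. If I use equals (=), I get no match.

So: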

//Word[text()='July' and //P1[@X < 90]]

returns true, as does

//Word[text()='July' and //P1[@X > 90]]

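How do I properly constrain this on the P1/@X attribute?

In addition, suppose I have multiple Page elements for different page numbers. How would I further constrain the search above to find nodes with text()='July', P1/@X < 90 and Page/@pageNumber=3?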


Generally I would consider the use of an unprefixed // as a bad smell in an XPath.

Try this:

/DocText/WithQuads/Page/Word[text()='July' and Quad/P1/@X > 90]

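Your problem is that //P1[@X < 90] is an absolute path: it goes back to the root of the document and looks for any P1 anywhere, so it is true for every Word as long as some P1 in the document has X below 90. Similarly, //P1[@X > 90] is true whenever any P1 in the document has X above 90, which is the case here because of the second Word. A relative path such as Quad/P1/@X only looks beneath the Word being tested.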
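To make the difference in scope concrete, here is a rough sketch in Python. The standard-library ElementTree module only implements a subset of XPath (no numeric comparisons or text() tests in predicates), so the two predicates are emulated in plain Python rather than evaluated as XPath. The helper names, the trimmed XML and the extra page 4 are assumptions made for illustration. The page filter, however, is a real ElementTree path step.

```python
import xml.etree.ElementTree as ET

# Trimmed variant of the question's XML: only the P1 corner of each Quad is
# kept, the text is written without surrounding whitespace, and a second page
# (an assumption, not in the question) is added to exercise the page filter.
XML = """<DocText><WithQuads>
<Page pageNumber="3">
<Word>July<Quad><P1 X="84" Y="711.25" /></Quad></Word>
<Word>30,<Quad><P1 X="104.812" Y="711.25" /></Quad></Word>
</Page>
<Page pageNumber="4">
<Word>July<Quad><P1 X="95" Y="711.25" /></Quad></Word>
</Page>
</WithQuads></DocText>"""

root = ET.fromstring(XML)

def words_on_page(page_number):
    # Attribute-equality predicates are part of ElementTree's XPath subset.
    return root.findall(f"WithQuads/Page[@pageNumber='{page_number}']/Word")

def gt90(x):
    return x > 90

def lt90(x):
    return x < 90

def document_wide(test):
    # Emulates //P1[@X ...]: true if ANY P1 anywhere in the document passes.
    return any(test(float(p1.get("X"))) for p1 in root.iter("P1"))

def relative(word, test):
    # Emulates Quad/P1/@X ...: only P1 elements under this Word's own Quad.
    return any(test(float(p1.get("X"))) for p1 in word.findall("Quad/P1"))

def july(words, p1_check):
    # Keep the Word elements whose leading text is exactly "July".
    return [w for w in words if w.text == "July" and p1_check(w)]

page3 = words_on_page(3)
broken_gt = july(page3, lambda w: document_wide(gt90))  # matches, wrongly
broken_lt = july(page3, lambda w: document_wide(lt90))  # matches
fixed_gt = july(page3, lambda w: relative(w, gt90))     # no match, as wanted
fixed_lt = july(page3, lambda w: relative(w, lt90))     # matches

print(len(broken_gt), len(broken_lt), len(fixed_gt), len(fixed_lt))
```

In real XPath the page constraint from the question would be an extra predicate on the Page step, for example /DocText/WithQuads/Page[@pageNumber='3']/Word[text()='July' and Quad/P1/@X < 90].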

Apart from the "//" issue, this XML is a very weird use of mixed content. The predicate text()='July' will match the element if any child text node is exactly equal to July, which isn't true in your example because of the surrounding whitespace. Depending on the exact definition of the source XML, I would go for [text()[normalize-space(.)='July'] and Quad/P1/@X > 90]
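A small sketch of the whitespace point, again in Python with the standard-library ElementTree. The text() and normalize-space() behaviour is approximated with ordinary Python here, not evaluated by an XPath engine, and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# First Word element from the question, trimmed to a single corner point.
word = ET.fromstring("""<Word>
July
<Quad><P1 X="84" Y="711.25" /></Quad>
</Word>""")

def child_text_nodes(elem):
    # ElementTree keeps the text before the first child in .text and the
    # text following each child in that child's .tail; together these
    # correspond to the element's child text nodes (text() in XPath).
    candidates = [elem.text] + [child.tail for child in elem]
    return [t for t in candidates if t]

def normalize_space(value):
    # Approximates XPath normalize-space(): trim, then collapse whitespace runs.
    return " ".join(value.split())

nodes = child_text_nodes(word)
exact_match = any(t == "July" for t in nodes)                        # text()='July'
normalized_match = any(normalize_space(t) == "July" for t in nodes)  # normalized test
own_x = float(word.find("Quad/P1").get("X"))

print(exact_match, normalized_match, own_x > 90)
```

With the line breaks around July, the exact comparison fails and only the normalized one succeeds. If the real source has no whitespace around the word, text()='July' on its own would be enough.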