What is the difference between CSS and XPath selectors? And which of the two is better in terms of performance for cross-browser testing?

I am using Selenium WebDriver 2.25.0 on a multilingual web application and am mostly testing the page content (for different languages such as Arabic, English, Russian, etc.).

For my application, which of the two is better performance-wise, while also working across all the browsers I need to support (i.e. Internet Explorer 7, 8, 9, Firefox, Chrome, etc.)?

CSS selectors perform far better than XPath selectors, and this is well documented in the Selenium community. Here are some reasons:

  • XPath engines are different in each browser, which makes XPath behavior inconsistent across them.
  • Internet Explorer does not have a native XPath engine, so Selenium injects its own XPath implementation to keep its API compatible. We therefore lose the advantage of the native browser features that WebDriver inherently promotes.
  • XPath expressions tend to become complex, which, in my opinion, makes them hard to read.

However, there are some situations where you need an XPath selector, for example, searching for a parent element or finding an element by its text (I wouldn't recommend the latter).
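A minimal sketch of those two cases (the element ids, button text, and URL below are made up for illustration; CSS has no parent selector, so both lookups need XPath):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://example.com')  # placeholder URL

# walk up to the parent of a known element
field_row = driver.find_element(By.XPATH, "//input[@id='email']/..")

# locate by visible text - handy, but fragile in a multilingual application
signup_button = driver.find_element(By.XPATH, "//button[text()='Sign Up']")

driver.quit()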

You can read the blog post from Simon here; he also recommends CSS over XPath.

If you are testing content, then do not use selectors that depend on the content of the elements; that would be a maintenance nightmare for every locale. Talk with the developers and reuse the techniques they used to externalize the text in the application, such as dictionaries or resource bundles. Here is my blog post that explains it in detail.
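A minimal sketch of that idea, assuming the application's strings are exported per locale as JSON (the file name, key, element id, and URL below are hypothetical): the test loads the bundle for the locale under test and asserts against it instead of hard-coding translations.

import json

from selenium import webdriver
from selenium.webdriver.common.by import By

locale = 'ru'

# hypothetical resource bundle exported by the application,
# e.g. {"welcome.header": "Добро пожаловать"}
with open('resources/messages_{}.json'.format(locale), encoding='utf-8') as f:
    messages = json.load(f)

driver = webdriver.Chrome()
driver.get('https://example.com/?lang=' + locale)  # placeholder URL

# locate by a language-neutral attribute and compare against the externalized text
header = driver.find_element(By.ID, 'welcome-header')
assert header.text == messages['welcome.header']

driver.quit()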

Thanks to parishodak, here is the link which provides the numbers proving that CSS performance is better.

I’m going to hold the opinion, unpopular in the SO selenium tag, that an XPath selector is preferable to a CSS selector in the long run.

This long post has two sections - first I'll put a back-of-the-napkin proof that the performance difference between the two is 0.1-0.3 milliseconds (yes, that's 100-300 microseconds), and then I'll share my opinion on why XPath is more powerful.


Performance difference

Let's first tackle "the elephant in the room" – that XPath is slower than CSS.

With current CPU power (read: anything x86 produced since 2013) - even on BrowserStack, Sauce Labs, and AWS VMs - and with the development of browsers (read: all the popular ones from the last five years), that is hardly the case.

Browser engines have developed, XPath support is uniform, and Internet Explorer is out of the picture (hopefully for most of us). The comparison in the other answer is cited all over the place, but it is very contextual - how many are running, or care about, automation against Internet Explorer 8?

If there is a difference, it is in a fraction of a millisecond.

Yet, most higher-level frameworks add at least 1 ms of overhead over the raw selenium call anyway (wrappers, handlers, state storing, etc.); my personal weapon of choice – Robot Framework – adds at least 2 ms, which I am more than happy to sacrifice for what it provides. A network round trip from an AWS US-East-1 to BrowserStack's hub is usually 11 milliseconds.

So with remote browsers, if there is a difference between XPath and CSS, it is overshadowed by everything else, in orders of magnitude.


The measurements

There are not that many public comparisons (I've really only seen the cited one), so here's a rough, single-case, deliberately simple one.

It locates an element with each of the two strategies a fixed number of times and compares the average time per find.

The target is BrowserStack's landing page and its "Sign Up" button; here is a screenshot of the HTML content at the time of writing this post:

[screenshot of the target element's HTML]

Here's the test code (Python):

from selenium import webdriver
import timeit


if __name__ == '__main__':
    xpath_locator = '//div[@class="button-section col-xs-12 row"]'
    css_locator = 'div.button-section.col-xs-12.row'

    repetitions = 1000

    driver = webdriver.Chrome()
    driver.get('https://www.browserstack.com/')

    css_time = timeit.timeit("driver.find_element_by_css_selector(css_locator)",
                             number=repetitions, globals=globals())
    xpath_time = timeit.timeit('driver.find_element_by_xpath(xpath_locator)',
                               number=repetitions, globals=globals())

    driver.quit()

    print("CSS total time {} repeats: {:.2f} s, per find: {:.2f} ms".
          format(repetitions, css_time, (css_time/repetitions)*1000))
    print("XPATH total time for {} repeats: {:.2f} s, per find: {:.2f} ms".
          format(repetitions, xpath_time, (xpath_time/repetitions)*1000))
For those not familiar with Python – it opens the page, and finds the element – first with the CSS locator, then with the XPath locator; the find operation is repeated 1,000 times. The output is the total time in seconds for the 1,000 repetitions, and average time for one find in milliseconds.

The locators are:

  • for XPath – "a div element having this exact class value, somewhere in the DOM";
  • the CSS is similar – "a div element with this class, somewhere in the DOM".

The locators are deliberately chosen not to be over-tuned; also, the class selector is often cited as the second-fastest CSS strategy after an id.

The environment – Chrome v66.0.3359.139, ChromeDriver v2.38, CPU: ULV Core M-5Y10 usually running at 1.5 GHz (yes, a "word-processing" one, not even a regular Core i7 beast).

Here's the output:

CSS total time 1000 repeats: 8.84 s, per find: 8.84 ms
XPath total time for 1000 repeats: 8.52 s, per find: 8.52 ms

Obviously, the per-find timings are pretty close; the difference is 0.32 milliseconds. Don't jump to "the XPath selector is faster" - sometimes it is, but sometimes it's CSS.


Let's try with another set of locators. This one is a tiny bit more complicated - an attribute having a substring (a common approach, at least for me, going after an element's class when a part of it bears functional meaning):

xpath_locator = '//div[contains(@class, "button-section")]'
css_locator = 'div[class~=button-section]'

The two locators are again nearly semantically the same - "find a div element having this substring in its class attribute".
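One nuance worth flagging (not part of the cited test): in CSS, ~= matches a whole whitespace-separated word of the attribute, while XPath's contains() matches any substring; *= is the closer CSS equivalent. For this particular class value they end up selecting the same element, but in general the behavior differs:

# word match: "button-section" must appear as a complete class name
css_word_locator = 'div[class~="button-section"]'

# substring match: closest to XPath's contains(@class, "button-section")
css_substring_locator = 'div[class*="button-section"]'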

Here are the results:

CSS total time 1000 repeats: 8.60 s, per find: 8.60 ms
XPath total time for 1000 repeats: 8.75 s, per find: 8.75 ms

A difference of 0.15 ms.


As an exercise, I also ran the same test as done in the blog linked in the comments/the other answer - the test page is public, and so is the testing code.

They are doing a couple of things in the code - clicking on a column to sort by it, then getting the values, and checking the UI sort is correct.

I'll cut that down to just the element lookups - after all, the locator is what is under test here, right?

The same code as above, with these changes in:

  • The URL is now http://the-internet.herokuapp.com/tables; there are two tests.

  • The locators for the first one - "Finding Elements By ID and Class" - are:

css_locator = '#table2 tbody .dues'
xpath_locator = "//table[@id='table2']//tr/td[contains(@class,'dues')]"

And here is the outcome:

CSS total time 1000 repeats: 8.24 s, per find: 8.24 ms
XPath total time for 1000 repeats: 8.45 s, per find: 8.45 ms

A difference of 0.2 milliseconds.

The "Finding Elements By Traversing":

css_locator = '#table1 tbody tr td:nth-of-type(4)'
xpath_locator = "//table[@id='table1']//tr/td[4]"

The result:

CSS total time 1000 repeats: 9.29 s, per find: 9.29 ms
XPath total time for 1000 repeats: 8.79 s, per find: 8.79 ms

This time it is 0.5 ms (in reverse, XPath turned out "faster" here).

So five years later (better browser engines), focusing only on locator performance (no actions like sorting in the UI, etc.), on the same testbed - there is practically no difference between CSS and XPath.


So, out of XPath and CSS, which of the two to choose for performance? The answer is simple – choose locating by id.

Long story short, if the id of an element is unique (as it is supposed to be according to the specification), its value plays an important role in the browser's internal representation of the DOM, and a lookup by it is therefore usually the fastest.

Yet, unique and constant (e.g. not auto-generated) ids are not always available, which brings us to "why XPath if there's CSS?"
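For illustration, here are the three strategies for the same element, assuming a stable, unique id such as signup-button exists (the id and URL are hypothetical); the first form is then both the fastest and the most readable:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://example.com')  # placeholder URL

# the same element, three ways
driver.find_element(By.ID, 'signup-button')
driver.find_element(By.CSS_SELECTOR, '#signup-button')
driver.find_element(By.XPATH, "//*[@id='signup-button']")

driver.quit()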


The XPath advantage

With the performance out of the picture, why do I think XPath is better? Simple – versatility, and power.

XPath is a language developed for working with XML documents; as such, it allows for much more powerful constructs than CSS.

For example, navigation in every direction in the tree—find an element, then go to its grandparent and search for a child of it having certain properties. It allows embedded boolean conditions—cond1 and not(cond2 or not(cond3 and cond4)); embedded selectors —"find a div having these children with these attributes, and then navigate according to it".

XPath allows searching based on a node's value (its text)—however frowned upon this practice is. It does come in handy especially in badly structured documents (no definite attributes to step on, like dynamic ids and classes - locate the element by its text content).
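A few illustrative XPath expressions in that spirit (the attribute values and structure here are invented, so treat them as sketches rather than locators for a real page):

# go up from a known node to an ancestor, then back down another branch
by_ancestor = "//span[@class='price']/ancestor::div[@class='product']//button[@class='buy']"

# embedded boolean conditions
by_conditions = "//div[@data-ready='true' and not(contains(@class, 'disabled') or @hidden)]"

# select a node by properties of its children
by_children = "//ul[li[@class='active'] and count(li) > 3]"

# by text content - frowned upon, but sometimes the only stable hook
by_text = "//button[normalize-space(.)='Sign Up']"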

Getting started with CSS is definitely easier - one can begin writing selectors in a matter of minutes; but after a couple of days of usage, the power and possibilities XPath offers quickly overtake CSS.

And purely subjective – a complex CSS expression is much harder to read than a complex XPath expression.

Outro ;)

Finally, and again very subjectively - which one should we choose?

IMO, there isn’t any right or wrong choice—they are different solutions to the same problem, and whatever is more suitable for the job should be picked.

Being "a fan" of XPath I'm not shy to use in my projects a mix of both - heck, sometimes it is much faster to just throw a CSS one, if I know it will do the work just fine.

The debate between cssSelector and XPath remains one of the most subjective debates in the Selenium community. What we already know so far can be summarized as:

  • People in favor of cssSelector say that it is more readable and faster (especially when running against Internet Explorer).
  • While those in favor of XPath tout its ability to traverse the page (which cssSelector cannot).
  • Traversing the DOM in older browsers like IE8 does not work with cssSelector but is fine with XPath.
  • XPath can walk up the DOM (e.g. from child to parent), whereas cssSelector can only traverse down the DOM (e.g. from parent to child)
  • However not being able to traverse the DOM with cssSelector in older browsers isn't necessarily a bad thing as it is more of an indicator that your page has poor design and could benefit from some helpful markup.
  • Ben Burton mentions you should use cssSelector because that's how applications are built. This makes the tests easier to write, talk about, and have others help maintain.
  • Adam Goucher says to adopt a more hybrid approach -- focusing first on IDs, then cssSelector, and leveraging XPath only when you need it (e.g. walking up the DOM) -- and that XPath will always be more powerful for advanced locators; a rough sketch of that ordering follows this list.
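A minimal sketch of that hybrid ordering (the ids, classes, and URL are invented for illustration; only the last lookup actually requires XPath, since it walks up from a child to its row):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://example.com')  # placeholder URL

# 1. prefer a stable, unique id when one exists
checkbox = driver.find_element(By.ID, 'row-42-select')

# 2. otherwise, a CSS selector
checkbox = driver.find_element(By.CSS_SELECTOR, 'tr.selected input[type=checkbox]')

# 3. reach for XPath for what CSS cannot express, e.g. going up to the parent row
row = driver.find_element(By.XPATH, "//input[@id='row-42-select']/ancestor::tr")

driver.quit()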

Dave Haeffner carried out a test on a page with two HTML data tables; one table was written without helpful attributes (ID and Class), the other with them. I have analyzed the test procedure and the outcome of this experiment in detail in the discussion Why should I ever use cssSelector selectors as opposed to XPath for automated testing?. While that experiment demonstrated that each locator strategy is reasonably equivalent across browsers, it didn't adequately paint the whole picture for us.

Dave Haeffner, in the other discussion Css Vs. X Path, Under a Microscope, mentioned that in an end-to-end test there were a lot of other variables at play: Sauce startup, browser startup, and latency to and from the application under test. The unfortunate takeaway from that experiment could be that one driver may be faster than the other (e.g. one browser's driver vs. another's), when in fact that wasn't the case at all. To get a real taste of what the performance difference between cssSelector and XPath is, we needed to dig deeper. We did that by running everything from a local machine while using a performance benchmarking utility. We also focused on a specific Selenium action rather than the entire test run, and ran things numerous times. I have analyzed that specific test procedure and the outcome of the experiment in detail as well. But the tests were still missing one aspect, i.e. more browser coverage (e.g., Internet Explorer 9 and 10) and testing against a larger and deeper page.

Dave Haeffner, in another discussion, Css Vs. X Path, Under a Microscope (Part 2), mentions that in order to make sure the required benchmarks are covered in the best possible way, we need an example that demonstrates a large and deep page.


Test SetUp

To demonstrate this detailed example, a Windows XP virtual machine was set up and Ruby (1.9.3) was installed. All the available browsers and their equivalent browser drivers for Selenium were also installed. For benchmarking, Ruby's standard-library benchmark module was used.


Test Code

require_relative 'base'
require 'benchmark'


class LargeDOM < Base


LOCATORS = {
nested_sibling_traversal: {
css: "div#siblings > div:nth-of-type(1) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3) > div:nth-of-type(3)",
xpath: "//div[@id='siblings']/div[1]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]/div[3]"
},
nested_sibling_traversal_by_class: {
css: "div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1 > div.item-1",
xpath: "//div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]/div[contains(@class, 'item-1')]"
},
table_header_id_and_class: {
css: "table#large-table thead .column-50",
xpath: "//table[@id='large-table']//thead//*[@class='column-50']"
},
table_header_id_class_and_direct_desc: {
css: "table#large-table > thead .column-50",
xpath: "//table[@id='large-table']/thead//*[@class='column-50']"
},
table_header_traversing: {
css: "table#large-table thead tr th:nth-of-type(50)",
xpath: "//table[@id='large-table']//thead//tr//th[50]"
},
table_header_traversing_and_direct_desc: {
css: "table#large-table > thead > tr > th:nth-of-type(50)",
xpath: "//table[@id='large-table']/thead/tr/th[50]"
},
table_cell_id_and_class: {
css: "table#large-table tbody .column-50",
xpath: "//table[@id='large-table']//tbody//*[@class='column-50']"
},
table_cell_id_class_and_direct_desc: {
css: "table#large-table > tbody .column-50",
xpath: "//table[@id='large-table']/tbody//*[@class='column-50']"
},
table_cell_traversing: {
css: "table#large-table tbody tr td:nth-of-type(50)",
xpath: "//table[@id='large-table']//tbody//tr//td[50]"
},
table_cell_traversing_and_direct_desc: {
css: "table#large-table > tbody > tr > td:nth-of-type(50)",
xpath: "//table[@id='large-table']/tbody/tr/td[50]"
}
}


attr_reader :driver

def initialize(driver)
  @driver = driver
  visit '/large'
  is_displayed?(id: 'siblings')
  super
end

# The benchmarking approach was borrowed from
# http://rubylearning.com/blog/2013/06/19/how-do-i-benchmark-ruby-code/
def benchmark
  Benchmark.bmbm(27) do |bm|
    LOCATORS.each do |example, data|
      data.each do |strategy, locator|
        bm.report(example.to_s + " using " + strategy.to_s) do
          begin
            ENV['iterations'].to_i.times do |count|
              find(strategy => locator)
            end
          rescue Selenium::WebDriver::Error::NoSuchElementError => error
            puts "( 0.0 )"
          end
        end
      end
    end
  end
end

end

Results

NOTE: The output is in seconds, and the results are for the total run time of 100 executions.

In Table Form:

[image: table of the benchmark results]

In Chart Form:

  • Chrome:

chart-chrome

  • Firefox:

chart-firefox

  • Internet Explorer 8:

chart-ie8

  • Internet Explorer 9:

chart-ie9

  • Internet Explorer 10:

chart-ie10

  • Opera:

chart-opera


Analyzing the Results

  • Chrome and Firefox are clearly tuned for faster cssSelector performance.
  • Internet Explorer 8 is a grab bag of cssSelector that won't work, an out of control XPath traversal that takes ~65 seconds, and a 38 second table traversal with no cssSelector result to compare it against.
  • In IE 9 and 10, XPath is faster overall. In Safari, it's a toss up, except for a couple of slower traversal runs with XPath. And across almost all browsers, the nested sibling traversal and table cell traversal done with XPath are an expensive operation.
  • These shouldn't be that surprising since the locators are brittle and inefficient and we need to avoid them.

Summary

  • Overall there are two circumstances where XPath is markedly slower than cssSelector. But they are easily avoidable.
  • The performance difference is slightly in favor of cssSelector for non-IE browsers and slightly in favor of XPath for IE browsers.

Trivia

You can perform the benchmarking on your own, using this library where Dave Haeffner wrapped up all the code.