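Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection

I'm trying to automate a very basic task on a website using Selenium and Chrome, but somehow the website detects when Chrome is driven by Selenium and blocks every request. I suspect that the website relies on an exposed DOM variable, like this one https://stackoverflow.com/a/41904453/648236, to detect Selenium-driven browsers.

My question is: is there a way to make the navigator.webdriver flag false? I am willing to go so far as to recompile the Selenium source after making modifications, but I cannot seem to find the NavigatorAutomationInformation source anywhere in the repository https://github.com/SeleniumHQ/selenium

Any help is much appreciated.

P.S.: I also tried the following from https://w3c.github.io/webdriver/#interface: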

Object.defineProperty(navigator, 'webdriver', {
get: () => false,
});

But it only updates the property after the initial page load. I think the site detects the variable before my script is executed.


First the update 1

execute_cdp_cmd(): With the availability of the execute_cdp_cmd(cmd, cmd_args) command, you can now easily execute Chrome DevTools Protocol commands using Selenium. Using this feature you can easily modify navigator.webdriver to prevent Selenium from getting detected.


Preventing Detection 2

To prevent a Selenium-driven WebDriver from getting detected, a niche approach would include any or all of the steps below:

  • Adding the argument --disable-blink-features=AutomationControlled

    from selenium import webdriver
    
    
    options = webdriver.ChromeOptions()
    options.add_argument('--disable-blink-features=AutomationControlled')
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get("https://www.website.com")
    

You can find a relevant detailed discussion in Selenium can't open a second page

  • Rotating the user-agent through the execute_cdp_cmd() command as follows:

    #Setting up Chrome/83.0.4103.53 as useragent
    driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
    
  • Change the value of the navigator.webdriver property to undefined

    driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
    
  • Exclude the collection of enable-automation switches

    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    
  • Turn-off useAutomationExtension

    options.add_experimental_option('useAutomationExtension', False)
    

Sample Code 3

Combining all the steps mentioned above, an effective code block would be:

from selenium import webdriver


options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
print(driver.execute_script("return navigator.userAgent;"))
driver.get('https://www.httpbin.org/headers')
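To check whether these settings had any effect, one option is to read the values back from the loaded page. The following is only a minimal sketch: read_automation_signals and looks_automated are hypothetical helper names (not part of Selenium), and the only assumption about driver is that it exposes Selenium's execute_script method.

```python
def read_automation_signals(driver):
    """Return the values a page script would see for two common signals.

    `driver` is assumed to be a Selenium WebDriver; anything exposing an
    `execute_script(script)` method works, which makes dry runs easy.
    """
    return {
        # Selenium maps a JavaScript undefined/null result to None.
        "webdriver": driver.execute_script("return navigator.webdriver;"),
        "user_agent": driver.execute_script("return navigator.userAgent;"),
    }


def looks_automated(signals):
    """True only when navigator.webdriver was reported as true."""
    return signals["webdriver"] is True
```

Call read_automation_signals(driver) after driver.get(...), since a patch applied with execute_script only affects the document that is loaded at that moment.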

History

As per the W3C Editor's Draft the current implementation strictly mentions:

The webdriver-active flag is set to true when the user agent is under remote control which is initially set to false.

Further,

Navigator includes NavigatorAutomationInformation;

It is to be noted that:

The NavigatorAutomationInformation interface should not be exposed on WorkerNavigator.

The NavigatorAutomationInformation interface is defined as:

interface mixin NavigatorAutomationInformation {
readonly attribute boolean webdriver;
};

which returns true if webdriver-active flag is set, false otherwise.

Finally, the navigator.webdriver defines a standard way for co-operating user agents to inform the document that it is controlled by WebDriver, so that alternate code paths can be triggered during automation.

Caution: Altering/tweaking the above mentioned parameters may block the navigation and get the WebDriver instance detected.


Update (6-Nov-2019)

As of the current implementation an ideal way to access a web page without getting detected would be to use the ChromeOptions() class to add a couple of arguments to:

  • Exclude the collection of enable-automation switches
  • Turn-off useAutomationExtension

through an instance of ChromeOptions as follows:

  • Java Example:

    System.setProperty("webdriver.chrome.driver", "C:\\Utility\\BrowserDrivers\\chromedriver.exe");
    ChromeOptions options = new ChromeOptions();
    options.setExperimentalOption("excludeSwitches", Collections.singletonList("enable-automation"));
    options.setExperimentalOption("useAutomationExtension", false);
    WebDriver driver =  new ChromeDriver(options);
    driver.get("https://www.google.com/");
    
  • Python Example

    from selenium import webdriver
    
    
    options = webdriver.ChromeOptions()
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=options, executable_path=r'C:\path\to\chromedriver.exe')
    driver.get("https://www.google.com/")
    
  • Ruby Example

    options = Selenium::WebDriver::Chrome::Options.new
    options.add_argument("--disable-blink-features=AutomationControlled")
    driver = Selenium::WebDriver.for :chrome, options: options
    

Legends

1: Applies to Selenium's Python clients only.

2: Applies to Selenium's Python clients only.

3: Applies to Selenium's Python clients only.

Before (in browser console window):

> navigator.webdriver
true

Change (in selenium):

// C#
var options = new ChromeOptions();
options.AddExcludedArguments(new List<string>() { "enable-automation" });


// Python
options.add_experimental_option("excludeSwitches", ['enable-automation'])

After (in browser console window):

> navigator.webdriver
undefined

This will not work for version ChromeDriver 79.0.3945.16 and above. See the release notes here
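Since the behaviour depends on the ChromeDriver version, a script can check the version before relying on excludeSwitches. This is a rough sketch built only on the cutoff quoted above; the helper names are hypothetical, and reading the raw string from driver.capabilities["chrome"]["chromedriverVersion"] is an assumption about what your ChromeDriver session reports.

```python
import re

# First ChromeDriver version for which this answer reports that the
# excludeSwitches trick no longer hides navigator.webdriver.
CUTOFF = (79, 0, 3945, 16)


def parse_chromedriver_version(raw):
    """Turn a string such as '79.0.3945.36 (abc123...)' into (79, 0, 3945, 36)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)\.(\d+)", raw.strip())
    if match is None:
        raise ValueError(f"unrecognised ChromeDriver version: {raw!r}")
    return tuple(int(part) for part in match.groups())


def exclude_switches_expected_to_work(raw_version):
    """True if the version is older than the cutoff quoted above."""
    return parse_chromedriver_version(raw_version) < CUTOFF
```

With a live session this could be called as exclude_switches_expected_to_work(driver.capabilities["chrome"]["chromedriverVersion"]), assuming that capability is present.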

Nowadays you can accomplish this with cdp command:

driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
"source": """
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
})
"""
})


driver.get(some_url)

By the way, you want to return undefined; false is a dead giveaway.
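As far as I know, Page.addScriptToEvaluateOnNewDocument returns an identifier that can later be passed to Page.removeScriptToEvaluateOnNewDocument, which is handy if you only want the patch for some navigations. A minimal sketch, where install_init_script and remove_init_script are hypothetical wrapper names and driver is assumed to expose Selenium's execute_cdp_cmd:

```python
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
)


def install_init_script(driver, source=HIDE_WEBDRIVER_JS):
    """Register `source` to run in every new document; return its identifier."""
    result = driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": source}
    )
    return result["identifier"]


def remove_init_script(driver, identifier):
    """Unregister a script previously added with install_init_script."""
    driver.execute_cdp_cmd(
        "Page.removeScriptToEvaluateOnNewDocument", {"identifier": identifier}
    )
```

Typical use would be identifier = install_init_script(driver) before driver.get(some_url), and remove_init_script(driver, identifier) once the patch is no longer wanted.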

I would like to add a Java alternative to the cdp command method mentioned by pguardiario

Map<String, Object> params = new HashMap<String, Object>();
params.put("source", "Object.defineProperty(navigator, 'webdriver', { get: () => undefined })");
driver.executeCdpCommand("Page.addScriptToEvaluateOnNewDocument", params);

In order for this to work you need to use the ChromiumDriver from the org.openqa.selenium.chromium.ChromiumDriver package. From what I can tell that package is not included in Selenium 3.141.59 so I used the Selenium 4 alpha.

Also, the excludeSwitches/useAutomationExtension experimental options do not seem to work for me anymore with ChromeDriver 79 and Chrome 79.

ChromeDriver:

Finally discovered the simple solution for this with a simple flag! :)

--disable-blink-features=AutomationControlled

navigator.webdriver=true will no longer show up with that flag set.

For a list of things you can disable, check them out here

This finally solved the problem for ChromeDriver and Chrome versions above v79.

ChromeOptions options = new ChromeOptions();
options.addArguments("--disable-blink-features");
options.addArguments("--disable-blink-features=AutomationControlled");
ChromeDriver driver = new ChromeDriver(options);
Map<String, Object> params = new HashMap<String, Object>();
params.put("source", "Object.defineProperty(navigator, 'webdriver', { get: () => undefined })");
driver.executeCdpCommand("Page.addScriptToEvaluateOnNewDocument", params);

Excluding the collection of enable-automation switches, as mentioned in the 6-Nov-2019 update of the top voted answer, doesn't work anymore as of April 2020. Instead I was getting the following error:

ERROR:broker_win.cc(55)] Error reading broker pipe: The pipe has been ended. (0x6D)

Here's what's working as of 6th April 2020 with Chrome 80.

Before (in the Chrome console window):

> navigator.webdriver
true

Python example:

options = webdriver.ChromeOptions()
options.add_argument("--disable-blink-features")
options.add_argument("--disable-blink-features=AutomationControlled")

After (in the Chrome console window):

> navigator.webdriver
undefined

If you use a Remote WebDriver, the code below will set navigator.webdriver to undefined.

Works for ChromeDriver 81.0.4044.122.

Python example:

from selenium import webdriver

options = webdriver.ChromeOptions()
# options.add_argument("--headless")
options.add_argument('--disable-gpu')
options.add_argument('--no-sandbox')
driver = webdriver.Remote(
    'localhost:9515', desired_capabilities=options.to_capabilities())
script = '''
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
})
'''
driver.execute_script(script)

Do not use a cdp command to change the webdriver value, as it leads to an inconsistency that can later be used to detect webdriver. Use the code below instead; it will remove any traces of webdriver.

options.add_argument("--disable-blink-features")
options.add_argument("--disable-blink-features=AutomationControlled")

As mentioned above (https://stackoverflow.com/a/60403652/2923098), the following option totally worked for me (in Java):

ChromeOptions options = new ChromeOptions();
options.addArguments("--incognito", "--disable-blink-features=AutomationControlled");

For those of you who've tried these tricks, please make sure to also check that the user-agent that you are using is the user agent that corresponds to the platform (mobile / desktop / tablet) your crawler is meant to emulate. It took me a while to realize that was my Achilles heel ;)
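A simple sanity check for this is to classify the user agent string before using it. The sketch below is a deliberately rough heuristic of my own (hypothetical function names, substring checks only) and will misclassify unusual user agents, so treat it as a starting point rather than a reliable detector.

```python
def device_class_from_user_agent(user_agent):
    """Rough guess: 'mobile', 'tablet' or 'desktop' (heuristic, not exhaustive)."""
    ua = user_agent.lower()
    # Android tablets usually omit the "Mobile" token that phones include.
    if "ipad" in ua or ("android" in ua and "mobile" not in ua):
        return "tablet"
    if "mobi" in ua or "iphone" in ua:
        return "mobile"
    return "desktop"


def user_agent_matches_target(user_agent, target):
    """True when the apparent device class of the user agent equals `target`."""
    return device_class_from_user_agent(user_agent) == target
```

For example, user_agent_matches_target(ua, "desktop") can be asserted right before passing ua to Network.setUserAgentOverride.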

Simple hack for python:

options = webdriver.ChromeOptions()
options.add_argument("--disable-blink-features=AutomationControlled")

Use --disable-blink-features=AutomationControlled to disable navigator.webdriver

Since this question is related to Selenium, a cross-browser solution for overriding navigator.webdriver would be useful. This could be done by patching the browser environment before any JS of the target page runs, but unfortunately no browser other than Chromium allows one to evaluate arbitrary JavaScript code after the document loads and before any other JS runs (Firefox comes close with its Remote Protocol).

Before patching, we need to check what the default browser environment looks like. Before changing a property, we can see its default definition with Object.getOwnPropertyDescriptor():

Object.getOwnPropertyDescriptor(navigator, 'webdriver');
// undefined

This quick test shows that the webdriver property is not defined on navigator itself. It's actually defined on Navigator.prototype:

Object.getOwnPropertyDescriptor(Navigator.prototype, 'webdriver');
// {set: undefined, enumerable: true, configurable: true, get: ƒ}

It's highly important to change the property on the object that owns it, otherwise the following can happen:

navigator.webdriver; // true if webdriver controlled, false otherwise
// this lazy patch is commonly found on the internet, it does not even set the right value
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
});
navigator.webdriver; // undefined
Object.getOwnPropertyDescriptor(Navigator.prototype, 'webdriver').get.apply(navigator);
// true

A less naive patch would first target the right object and use the right property definition, but digging deeper we can find more inconsistencies:

const defaultGetter = Object.getOwnPropertyDescriptor(Navigator.prototype, 'webdriver').get;
defaultGetter.toString();
// "function get webdriver() { [native code] }"
Object.defineProperty(Navigator.prototype, 'webdriver', {
set: undefined,
enumerable: true,
configurable: true,
get: () => false
});
const patchedGetter = Object.getOwnPropertyDescriptor(Navigator.prototype, 'webdriver').get;
patchedGetter.toString();
// "() => false"
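The two inconsistencies shown so far can be checked from Selenium as well, which helps when testing your own patch. This is only a sketch: find_patch_traces and TRACE_PROBE_JS are hypothetical names, and driver is assumed to expose Selenium's execute_script (which wraps the script in a function body, so the top-level return is fine).

```python
TRACE_PROBE_JS = """
const own = Object.getOwnPropertyDescriptor(navigator, 'webdriver');
const proto = Object.getOwnPropertyDescriptor(Navigator.prototype, 'webdriver');
return {
  ownPropertyOnInstance: own !== undefined,
  getterSource: proto && proto.get
    ? Function.prototype.toString.call(proto.get)
    : null
};
"""


def find_patch_traces(driver):
    """List the inconsistencies described above that are visible in the page."""
    info = driver.execute_script(TRACE_PROBE_JS)
    traces = []
    if info["ownPropertyOnInstance"]:
        traces.append("webdriver is defined on navigator, not Navigator.prototype")
    source = info["getterSource"]
    if source is not None and "[native code]" not in source:
        traces.append("getter on Navigator.prototype is not a native function")
    return traces
```

An empty list only means these two specific checks passed; it says nothing about other detection techniques.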

A perfect patch leaves no traces. Instead of replacing the getter function, it would be better to intercept the call to it and change the returned value. JavaScript has native support for that through the Proxy apply handler:

const defaultGetter = Object.getOwnPropertyDescriptor(Navigator.prototype, 'webdriver').get;
defaultGetter.apply(navigator); // true
defaultGetter.toString();
// "function get webdriver() { [native code] }"
Object.defineProperty(Navigator.prototype, 'webdriver', {
set: undefined,
enumerable: true,
configurable: true,
get: new Proxy(defaultGetter, { apply: (target, thisArg, args) => {
// emulate getter call validation
Reflect.apply(target, thisArg, args);
return false;
}})
});
const patchedGetter = Object.getOwnPropertyDescriptor(Navigator.prototype, 'webdriver').get;
patchedGetter.apply(navigator); // false
patchedGetter.toString();
// "function () { [native code] }"

The only inconsistency now is in the function name; unfortunately there is no way to override the function name shown in the native toString() representation. Even so, it can pass generic regular expressions that search for spoofed native browser functions by looking for { [native code] } at the end of their string representation. To remove this inconsistency you can patch Function.prototype.toString and make it return valid native string representations for all the native functions you patched.
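One possible shape for that Function.prototype.toString patch, combined with the Proxy getter, is sketched below as a script registered through CDP. This is my own untested-in-the-wild illustration, not a known-good recipe: STEALTH_JS and install_stealth_script are hypothetical names, the WeakMap bookkeeping is just one way to do it, and driver is assumed to expose Selenium's execute_cdp_cmd.

```python
STEALTH_JS = """
(() => {
  const nativeToString = Function.prototype.toString;
  const nativeGetter =
    Object.getOwnPropertyDescriptor(Navigator.prototype, 'webdriver').get;

  // Maps each patched function to the string its native original reports.
  const nativeStrings = new WeakMap();

  const patchedGetter = new Proxy(nativeGetter, {
    apply: (target, thisArg, args) => {
      // emulate getter call validation
      Reflect.apply(target, thisArg, args);
      return false;
    }
  });
  nativeStrings.set(patchedGetter, nativeToString.call(nativeGetter));

  // toString answers from the map for patched functions, natively otherwise.
  const patchedToString = new Proxy(nativeToString, {
    apply: (target, thisArg, args) =>
      nativeStrings.has(thisArg)
        ? nativeStrings.get(thisArg)
        : Reflect.apply(target, thisArg, args)
  });
  nativeStrings.set(patchedToString, nativeToString.call(nativeToString));

  Object.defineProperty(Navigator.prototype, 'webdriver', {
    set: undefined,
    enumerable: true,
    configurable: true,
    get: patchedGetter
  });
  Function.prototype.toString = patchedToString;
})();
"""


def install_stealth_script(driver):
    """Register STEALTH_JS so it runs before any page script in new documents."""
    return driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": STEALTH_JS}
    )
```

The script is wrapped in an immediately invoked function so that its helper variables do not leak into the page's global scope, which would itself be a trace.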

To sum up, in selenium it could be applied with:

chrome.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {'source': """
Object.defineProperty(Navigator.prototype, 'webdriver', {
set: undefined,
enumerable: true,
configurable: true,
get: new Proxy(
Object.getOwnPropertyDescriptor(Navigator.prototype, 'webdriver').get,
{ apply: (target, thisArg, args) => {
// emulate getter call validation
Reflect.apply(target, thisArg, args);
return false;
}}
)
});
"""})

The playwright project maintains forks of Firefox and WebKit to add features for browser automation. One of them is equivalent to Page.addScriptToEvaluateOnNewDocument; there is no Python implementation of the communication protocol, but it could be implemented from scratch.

Python

I tried most of the stuff mentioned in this post and I was still facing issues. What saved me for now is https://pypi.org/project/undetected-chromedriver

pip install undetected-chromedriver

import undetected_chromedriver.v2 as uc
from time import sleep
from random import randint

driver = uc.Chrome()
# driver.get() needs a fully qualified URL, including the scheme
driver.get('https://www.your_url.here')
driver.maximize_window()

# Pause for a random 3 to 9 seconds
sleep(randint(3, 9))

A bit slow, but I will take slow over non-working.

I guess anyone interested could go over the source code and see what provides the win there.