使用无头 Chrome 网络驱动程序运行 Selenium

所以我尝试了一些含硒的东西,我真的希望它能快一点。

所以我的想法是,运行它与无头铬将使我的脚本更快。

首先,这个假设是正确的,还是说我用一个无头驱动程序运行我的脚本并不重要?

无论如何,我仍然想让它工作,运行无头,但我不知道为什么不能,我尝试了不同的东西,大多数建议,它将工作在这里说,在十月更新

如何配置 ChromeDriver 通过 Selenium 在 Headless 模式下启动 Chrome 浏览器?

但是当我尝试这样做时,我得到了奇怪的控制台输出,但它似乎仍然不能工作。

感谢你的提示。

235061 次浏览

To run chrome-headless just add --headless via chrome_options.add_argument, i.e.:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
#chrome_options.add_argument("--disable-extensions")
#chrome_options.add_argument("--disable-gpu")
#chrome_options.add_argument("--no-sandbox") # linux only
chrome_options.add_argument("--headless")
# chrome_options.headless = True # also works
driver = webdriver.Chrome(options=chrome_options)
start_url = "https://duckgo.com"
driver.get(start_url)
print(driver.page_source.encode("utf-8"))
# b'<!DOCTYPE html><html xmlns="http://www....
driver.quit()

So my thought is that running it with headless chrome would make my script faster.

Try using chrome options like --disable-extensions or --disable-gpu and benchmark it, but I wouldn't count with much improvement.


References: headless-chrome

If you are using Linux environment, may be you have to add --no-sandbox as well and also specific window size settings. The --no-sandbox flag is no needed on Windows if you set user container properly.

Use --disable-gpu only on Windows. Other platforms no longer require it. The --disable-gpu flag is a temporary work around for a few bugs.

//Headless chrome browser and configure
WebDriverManager.chromedriver().setup();
ChromeOptions chromeOptions = new ChromeOptions();
chromeOptions.addArguments("--no-sandbox");
chromeOptions.addArguments("--headless");
chromeOptions.addArguments("disable-gpu");
//          chromeOptions.addArguments("window-size=1400,2100"); // Linux should be activate
driver = new ChromeDriver(chromeOptions);
from time import sleep


from selenium import webdriver
from selenium.webdriver.chrome.options import Options


chrome_options = Options()
chrome_options.add_argument("--headless")


driver = webdriver.Chrome(executable_path="./chromedriver", options=chrome_options)
url = "https://stackoverflow.com/questions/53657215/running-selenium-with-headless-chrome-webdriver"
driver.get(url)


sleep(5)


h1 = driver.find_element_by_xpath("//h1[@itemprop='name']").text
print(h1)

Then I run script on our local machine

➜ python script.py
Running Selenium with Headless Chrome Webdriver

It is working and it is with headless Chrome.

Todo (tested on headless server Debian Linux 9.4):

  1. Do this:

    # install chrome
    curl -sS -o - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
    echo "deb [arch=amd64]  http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list
    apt-get -y update
    apt-get -y install google-chrome-stable
    
    
    # install chrome driver
    wget https://chromedriver.storage.googleapis.com/77.0.3865.40/chromedriver_linux64.zip
    unzip chromedriver_linux64.zip
    mv chromedriver /usr/bin/chromedriver
    chown root:root /usr/bin/chromedriver
    chmod +x /usr/bin/chromedriver
    
  2. Install selenium:

    pip install selenium
    

    and run this Python code:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    options = Options()
    options.add_argument("no-sandbox")
    options.add_argument("headless")
    options.add_argument("start-maximized")
    options.add_argument("window-size=1900,1080");
    driver = webdriver.Chrome(chrome_options=options, executable_path="/usr/bin/chromedriver")
    driver.get("https://www.example.com")
    html = driver.page_source
    print(html)
    

Install & run containerized Chrome:

docker pull selenium/standalone-chrome
docker run --rm -d -p 4444:4444 --shm-size=2g selenium/standalone-chrome

Connect using webdriver.Remote:

driver = webdriver.Remote('http://localhost:4444/wd/hub', DesiredCapabilities.CHROME)
driver.set_window_size(1280, 1024)
driver.get('https://www.google.com')

Once you have selenium and web driver installed. Below worked for me with headless Chrome on linux cluster :

from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("--disable-extensions")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--no-sandbox")
options.add_experimental_option("prefs",{"download.default_directory":"/databricks/driver"})
driver = webdriver.Chrome(chrome_options=options)
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path=r"C:\Program
Files\Google\Chrome\Application\chromedriver.exe", options=chrome_options)

This is ok for me.