Open web in new tab Selenium + Python

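So, I am trying to open websites in new tabs inside my WebDriver. I want to do this because opening a new WebDriver for each website takes about 3.5 seconds using PhantomJS, and I want more speed...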

I'm using a multiprocess Python script, and I want to get some elements from each page, so the workflow is like this:

Open Browser


Loop through my array
For element in array -> Open website in new tab -> do my business -> close it

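But I can't find any way to achieve this.

Here's the code I'm using. It takes forever between websites, and I need it to be fast... Other tools are allowed, but I don't know many tools for scraping website content that is loaded with JavaScript (divs created when some event is triggered on load, etc.). That's why I need Selenium... BeautifulSoup can't be used for some of my pages.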

#!/usr/bin/env python
import multiprocessing, time, pika, json, traceback, logging, sys, os, itertools, urllib, urllib2, cStringIO, mysql.connector, shutil, hashlib, socket, urllib2, re
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from PIL import Image
from os import listdir
from os.path import isfile, join
from bs4 import BeautifulSoup
from pprint import pprint


def getPhantomData(parameters):
    try:
        # We create WebDriver
        browser = webdriver.Firefox()
        # Navigate to URL
        browser.get(parameters['target_url'])
        # Find all links by Selector
        links = browser.find_elements_by_css_selector(parameters['selector'])

        result = []
        for link in links:
            # Extract link attribute and append to our list
            result.append(link.get_attribute(parameters['attribute']))
        browser.close()
        browser.quit()
        return json.dumps({'data': result})
    except Exception, err:
        browser.close()
        browser.quit()
        print err


def callback(ch, method, properties, body):
    parameters = json.loads(body)
    message = getPhantomData(parameters)

    if message['data']:
        ch.basic_ack(delivery_tag=method.delivery_tag)
    else:
        ch.basic_reject(delivery_tag=method.delivery_tag, requeue=True)


def consume():
    credentials = pika.PlainCredentials('invitado', 'invitado')
    rabbit = pika.ConnectionParameters('localhost',5672,'/',credentials)
    connection = pika.BlockingConnection(rabbit)
    channel = connection.channel()

    # We connect to the channel
    channel.queue_declare(queue='com.stuff.images', durable=True)
    channel.basic_consume(callback,queue='com.stuff.images')

    print ' [*] Waiting for messages. To exit press CTRL^C'
    try:
        channel.start_consuming()
    except KeyboardInterrupt:
        pass


workers = 5
pool = multiprocessing.Pool(processes=workers)
for i in xrange(0, workers):
    pool.apply_async(consume)

try:
    while True:
        continue
except KeyboardInterrupt:
    print ' [*] Exiting...'
    pool.terminate()
    pool.join()

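Editor's note: This answer no longer works for new Selenium versions. Refer to this comment.


You can open and close a tab with the key combinations COMMAND + T and COMMAND + W (OSX). On other OSs you can use CONTROL + T / CONTROL + W.

In Selenium you can emulate this behavior. You will need to create one WebDriver and as many tabs as the tests you need.

Here is the code.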

from selenium import webdriver
from selenium.webdriver.common.keys import Keys


driver = webdriver.Firefox()
driver.get("http://www.google.com/")


#open tab
driver.find_element_by_tag_name('body').send_keys(Keys.COMMAND + 't')
# You can use (Keys.CONTROL + 't') on other OSs


# Load a page
driver.get('http://stackoverflow.com/')
# Make the tests...


# close the tab
# (Keys.CONTROL + 'w') on other OSs.
driver.find_element_by_tag_name('body').send_keys(Keys.COMMAND + 'w')




driver.close()

browser.execute_script('''window.open("http://bings.com","_blank");''')

where browser is the WebDriver.
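
A slightly fuller sketch of the same idea, assuming browser is any Selenium WebDriver instance. The URL is passed to the script as arguments[0] instead of being pasted into the JavaScript string, and the new tab is found by comparing the handle list before and after. The name open_url_in_new_tab is made up for this example and is not part of Selenium.

```python
def open_url_in_new_tab(browser, url):
    """Open `url` in a new tab via window.open and switch to that tab.

    `browser` is assumed to be a Selenium WebDriver. The function name is
    illustrative only; it is not a Selenium API.
    """
    handles_before = set(browser.window_handles)
    # Extra positional arguments to execute_script show up as arguments[0..n]
    browser.execute_script("window.open(arguments[0], '_blank');", url)
    new_handles = [h for h in browser.window_handles if h not in handles_before]
    if not new_handles:
        raise RuntimeError("no new tab appeared (blocked by a popup blocker?)")
    browser.switch_to.window(new_handles[0])
    return new_handles[0]
```

Passing the URL as an argument avoids quoting problems when the URL itself contains quotes.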

After struggling for so long, the method below worked for me:

driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 't')
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.TAB)


windows = driver.window_handles


time.sleep(3)
driver.switch_to.window(windows[1])

This is generic code adapted from another example:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys


driver = webdriver.Firefox()
driver.get("http://www.google.com/")


#open tab
# ... take the code from the options below


# Load a page
driver.get('http://bings.com')
# Make the tests...


# close the tab
driver.quit()

Possible ways are:

  1. Sending <CTRL> + <T> to one element

    #open tab
    driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 't')
    
  2. Sending <CTRL> + <T> via Action chains

    ActionChains(driver).key_down(Keys.CONTROL).send_keys('t').key_up(Keys.CONTROL).perform()
    
  3. Execute a javascript snippet

    driver.execute_script('''window.open("http://bings.com","_blank");''')
    

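    For this to work you need to make sure that the preferences browser.link.open_newwindow and browser.link.open_newwindow.restriction are set properly. The default values in recent versions are fine; otherwise you supposedly need: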

    fp = webdriver.FirefoxProfile()
    fp.set_preference("browser.link.open_newwindow", 3)
    fp.set_preference("browser.link.open_newwindow.restriction", 2)
    
    
    driver = webdriver.Firefox(firefox_profile=fp)
    

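    The problem is that those preferences are preset to other values and are frozen, at least in Selenium 3.4.0. When you use the profile to set them with the Java binding, an exception is raised, while the Python binding ignores the new values.

    In Java there is a way to set those preferences without specifying a profile object when talking to geckodriver, but it does not seem to be implemented yet in the Python binding: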

    FirefoxOptions options = new FirefoxOptions().setProfile(fp);
    options.addPreference("browser.link.open_newwindow", 3);
    options.addPreference("browser.link.open_newwindow.restriction", 2);
    FirefoxDriver driver = new FirefoxDriver(options);
    

The third option stopped working for Python in Selenium 3.4.0.

The first two options also seem to have stopped working in Selenium 3.4.0. They depend on sending a CTRL key event to an element. At first glance this looks like a problem with the CTRL key, but it fails because of the new multiprocess feature of Firefox. It might be that this new architecture imposes new ways of doing this, or it may be a temporary implementation problem. Either way, we can disable it via:

fp = webdriver.FirefoxProfile()
fp.set_preference("browser.tabs.remote.autostart", False)
fp.set_preference("browser.tabs.remote.autostart.1", False)
fp.set_preference("browser.tabs.remote.autostart.2", False)


driver = webdriver.Firefox(firefox_profile=fp)

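Then you can successfully use the first way.

In a discussion, Simon clearly mentioned that:

While the datatype used for storing the list of handles may be ordered by insertion, the order in which the WebDriver implementation iterates over the window handles to insert them has no requirement to be stable. The ordering is arbitrary.


Using Selenium v3.x, opening a website in a new tab through Python is much easier now. We have to induce a WebDriverWait for number_of_windows_to_be(2), then collect the window handles every time we open a new tab/window, and finally iterate through the window handles and switchTo().window(newly_opened) as required. Here is a solution where you can open http://www.google.co.in in the initial TAB and https://www.yahoo.com in the adjacent TAB: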

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get("http://www.google.co.in")
    print("Initial Page Title is : %s" %driver.title)
    windows_before  = driver.current_window_handle
    print("First Window Handle is : %s" %windows_before)
    driver.execute_script("window.open('https://www.yahoo.com')")
    WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2))
    windows_after = driver.window_handles
    new_window = [x for x in windows_after if x != windows_before][0]
    driver.switch_to.window(new_window)
    print("Page Title after Tab Switching is : %s" %driver.title)
    print("Second Window Handle is : %s" %new_window)
    
  • Console Output:

    Initial Page Title is : Google
    First Window Handle is : CDwindow-B2B3DE3A222B3DA5237840FA574AF780
    Page Title after Tab Switching is : Yahoo
    Second Window Handle is : CDwindow-D7DA7666A0008ED91991C623105A2EC4
    
  • Browser Snapshot: multiple__tabs


Outro

You can find a discussion in Best way to keep track and iterate through tabs and windows using WindowHandles using Selenium
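
Since the handle order is arbitrary, the same "compare before and after" idea can be written so that it works with any number of tabs already open. This is only a sketch: wait_for_new_handle is a made-up helper name, and the plain polling loop stands in for WebDriverWait so that the example has no Selenium imports of its own.

```python
import time


def wait_for_new_handle(driver, handles_before, timeout=10.0, poll=0.2):
    """Return the first handle in driver.window_handles not in handles_before.

    Polls until such a handle shows up or `timeout` seconds have passed, in
    which case TimeoutError is raised. `driver` is assumed to be a Selenium
    WebDriver; this loop is a stand-in for WebDriverWait.
    """
    known = set(handles_before)
    deadline = time.monotonic() + timeout
    while True:
        fresh = [h for h in driver.window_handles if h not in known]
        if fresh:
            return fresh[0]
        if time.monotonic() >= deadline:
            raise TimeoutError("no new window handle appeared")
        time.sleep(poll)


# Intended use (requires a real driver):
#   before = driver.window_handles
#   driver.execute_script("window.open('https://www.yahoo.com')")
#   driver.switch_to.window(wait_for_new_handle(driver, before))
```

Nothing here relies on the position of the new handle inside window_handles, which is the point of Simon's remark.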

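I tried for a very long time to duplicate tabs in Chrome using action_keys and send_keys on the body. The only thing that worked for me was an answer here. This is what my duplicate-tabs function ended up looking like. It is probably not the best, but it works fine for me.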

def duplicate_tabs(number, chromewebdriver):
    #Once on the page we want to open a bunch of tabs
    url = chromewebdriver.current_url
    for i in range(number):
        print('opened tab: '+str(i))
        chromewebdriver.execute_script("window.open('"+url+"', 'new_window"+str(i)+"')")

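It basically runs some JavaScript from inside Python, which is incredibly useful. Hope this helps somebody.

Note: I am using Ubuntu. It shouldn't make a difference, but if it doesn't work for you, this could be the reason.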

Strange: so many answers, and all of them use surrogates like JS and keyboard shortcuts instead of just using a Selenium feature:

def newTab(driver, url="about:blank"):
    wnd = driver.execute(selenium.webdriver.common.action_chains.Command.NEW_WINDOW)
    handle = wnd["value"]["handle"]
    driver.switch_to.window(handle)
    driver.get(url) # changes the handle
    return driver.current_window_handle

  • OS: Win 10
  • Python 3.8.1
  • Selenium == 3.141.0

from selenium import webdriver
import time


driver = webdriver.Firefox(executable_path=r'TO\Your\Path\geckodriver.exe')
driver.get('https://www.google.com/')


# Open a new window
driver.execute_script("window.open('');")
# Switch to the new window
driver.switch_to.window(driver.window_handles[1])
driver.get("http://stackoverflow.com")
time.sleep(3)


# Open a new window
driver.execute_script("window.open('');")
# Switch to the new window
driver.switch_to.window(driver.window_handles[2])
driver.get("https://www.reddit.com/")
time.sleep(3)
# close the active tab
driver.close()
time.sleep(3)


# Switch back to the first tab
driver.switch_to.window(driver.window_handles[0])
driver.get("https://bing.com")
time.sleep(3)


# Close the only tab, will also close the browser.
driver.close()

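Reference: Need help opening a new tab in Selenium

The other solutions do not work for chrome driver v83.

Instead, it works as follows, supposing there is only one tab open at the start: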

driver.execute_script("window.open('');")
driver.switch_to.window(driver.window_handles[1])
driver.get("https://www.example.com")

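If more than one tab is already open, you should first get the index of the most recently created tab and switch to that tab before calling the URL (credit to Tyler):

driver.execute_script("window.open('');")
driver.switch_to.window(driver.window_handles[len(driver.window_handles)-1])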
driver.get("https://www.example.com")

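Opening a new empty tab within the same window in the Chrome browser is not possible, to my knowledge, but you can open a new tab with a web link.

I have searched the net so far and found good working content on this question. Please try to follow the steps below.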

import selenium.webdriver as webdriver
from selenium.webdriver.common.keys import Keys


driver = webdriver.Chrome()
driver.get('https://www.google.com?q=python#q=python')
first_link = driver.find_element_by_class_name('l')


# Use: Keys.CONTROL + Keys.SHIFT + Keys.RETURN to open tab on top of the stack
first_link.send_keys(Keys.CONTROL + Keys.RETURN)


# Switch tab to the new tab, which we will assume is the next one on the right
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.TAB)


driver.quit()

I think this is the better solution so far.

Credits: https://gist.github.com/lrhache/7686903

tabs = {}


def new_tab():
    global browser
    hpos = browser.window_handles.index(browser.current_window_handle)
    browser.execute_script("window.open('');")
    browser.switch_to.window(browser.window_handles[hpos + 1])
    return(browser.current_window_handle)


def switch_tab(name):
    global tabs
    global browser
    if not name in tabs.keys():
        tabs[name] = {'window_handle': new_tab(), 'url': url+name}
        browser.get(tabs[name]['url'])
    else:
        browser.switch_to.window(tabs[name]['window_handle'])
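
The same bookkeeping can be done without module-level globals. The sketch below is one possible variant, not the original author's code: TabRegistry and base_url are names made up for the example, driver is assumed to be a Selenium WebDriver, and the new handle is found by comparing handle sets rather than by assuming it sits at position hpos + 1.

```python
class TabRegistry:
    """Keep one tab per name and reuse it on later visits.

    `driver` is assumed to be a Selenium WebDriver; `base_url` is a prefix to
    which the tab name is appended, mirroring `url + name` in the code above.
    Both the class and its attribute names are illustrative.
    """

    def __init__(self, driver, base_url):
        self.driver = driver
        self.base_url = base_url
        self.handles = {}  # name -> window handle

    def switch(self, name):
        """Switch to the tab for `name`, opening and loading it on first use."""
        if name in self.handles:
            self.driver.switch_to.window(self.handles[name])
            return self.handles[name]
        before = set(self.driver.window_handles)
        self.driver.execute_script("window.open('');")
        # Pick the handle that was not there before instead of relying on order
        new = [h for h in self.driver.window_handles if h not in before][0]
        self.driver.switch_to.window(new)
        self.driver.get(self.base_url + name)
        self.handles[name] = new
        return new
```

With a real driver, TabRegistry(driver, "https://example.com/").switch("news") would open the tab on the first call and only switch to it on later calls.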

I'd stick to ActionChains for this.

Here's a function which opens a new tab and switches to that tab:

import time
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys


def open_in_new_tab(driver, element, switch_to_new_tab=True):
    base_handle = driver.current_window_handle
    # Do some actions
    ActionChains(driver) \
        .move_to_element(element) \
        .key_down(Keys.COMMAND) \
        .click() \
        .key_up(Keys.COMMAND) \
        .perform()

    # Should you switch to the new tab?
    if switch_to_new_tab:
        new_handle = [x for x in driver.window_handles if x != base_handle]
        assert len(new_handle) == 1  # assume you are only opening one tab at a time

        # Switch to the new window
        driver.switch_to.window(new_handle[0])

        # I like to wait after switching to a new tab for the content to load
        # Do that either with time.sleep() or with WebDriverWait until a basic
        # element of the page appears (such as "body") -- reference for this is
        # provided below
        time.sleep(0.5)

        # NOTE: if you choose to switch to the window/tab, be sure to close
        # the newly opened window/tab after using it and that you switch back
        # to the original "base_handle" --> otherwise, you'll experience many
        # errors and a painful debugging experience...

Here's how you would apply the function:

# Remember your starting handle
base_handle = driver.current_window_handle


# Say we have a list of elements and each is a link:
links = driver.find_elements_by_css_selector('a[href]')


# Loop through the links and open each one in a new tab
for link in links:
    open_in_new_tab(driver, link, True)

    # Do something on this new page
    print(driver.current_url)

    # Once you're finished, close this tab and switch back to the original one
    driver.close()
    driver.switch_to.window(base_handle)

    # You're ready to continue to the next item in your loop

This is how you can wait for pages to load.
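
The link target is not reproduced here. As one possible stand-in for the fixed time.sleep(0.5), the sketch below polls document.readyState through execute_script. It assumes driver is a Selenium WebDriver; wait_until_loaded is a made-up helper name, and this is not necessarily the approach the linked answer describes.

```python
import time


def wait_until_loaded(driver, timeout=10.0, poll=0.1):
    """Poll document.readyState until it is 'complete' or the timeout expires.

    Returns True once the page reports 'complete' and False on timeout.
    Note that 'complete' only covers the initial load; content added later
    by JavaScript still needs an element-specific wait.
    """
    deadline = time.monotonic() + timeout
    while True:
        if driver.execute_script("return document.readyState") == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

Inside open_in_new_tab this call could replace time.sleep(0.5) right after driver.switch_to.window(new_handle[0]).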

As already mentioned several times, the following approaches are NOT working anymore:

driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 't')
ActionChains(driver).key_down(Keys.CONTROL).send_keys('t').key_up(Keys.CONTROL).perform()

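Moreover, driver.execute_script("window.open('');") works but is limited by the popup blocker. I process hundreds of tabs in parallel (web scraping using scrapy). However, the popup blocker became active after opening 20 new tabs with JavaScript's window.open(''), and that broke my crawler.

As a workaround, I declared one tab as "master", which opens the following helper.html: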

<!DOCTYPE html>
<html><body>
<a id="open_new_window" href="about:blank" target="_blank">open a new window</a>
</body></html>

Now my (simplified) crawler can open as many tabs as necessary by deliberately clicking on the link, which the popup blocker does not consider at all:

# master
master_handle = driver.current_window_handle
helper = os.path.join(os.path.dirname(os.path.abspath(__file__)), "helper.html")
driver.get(helper)


# open new tabs
for _ in range(100):
    window_handle = driver.window_handles          # current state
    driver.switch_to.window(master_handle)
    driver.find_element_by_id("open_new_window").click()
    window_handle = set(driver.window_handles).difference(window_handle).pop()
    print("new window handle:", window_handle)

Closing these windows via JavaScript's window.close() is no problem.
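
For cleanup at the end of a crawl, a helper along the following lines could close everything except the master tab. It is only a sketch: close_other_tabs is a made-up name, driver is assumed to be a Selenium WebDriver, and it uses driver.close() on each tab rather than the JavaScript window.close() mentioned above.

```python
def close_other_tabs(driver, keep_handle):
    """Close every window except `keep_handle`, then switch back to it.

    Uses driver.close() per tab, which is a different route from calling
    window.close() in JavaScript. Returns the number of tabs closed.
    """
    closed = 0
    # Iterate over a copy, because closing tabs changes window_handles
    for handle in list(driver.window_handles):
        if handle == keep_handle:
            continue
        driver.switch_to.window(handle)
        driver.close()
        closed += 1
    driver.switch_to.window(keep_handle)
    return closed
```

With the crawler above, close_other_tabs(driver, master_handle) would leave only the helper.html tab open.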

#Change the method of finding the element if needed
self.find_element_by_xpath(element).send_keys(Keys.CONTROL + Keys.ENTER)

This will find the element and open it in a new tab. self is just the name used for the WebDriver object.

Try this, it will work:

# Open a new Tab
driver.execute_script("window.open('');")


# Switch to the new window and open URL B
driver.switch_to.window(driver.window_handles[1])
driver.get(tab_url)

from selenium import webdriver
import time


driver = webdriver.Firefox()
driver.get('https://www.google.com')


driver.execute_script("window.open('');")
time.sleep(5)


driver.switch_to.window(driver.window_handles[1])
driver.get("https://facebook.com")
time.sleep(5)


driver.close()
time.sleep(5)


driver.switch_to.window(driver.window_handles[0])
driver.get("https://www.yahoo.com")
time.sleep(5)


#driver.close()

Reference: https://www.edureka.co/community/52772/close-active-current-without-closing-browser-selenium-python

You can use this to open a new tab:

driver.execute_script("window.open('http://google.com', 'new_window')")

Just for future reference, a simple way to do it is:

driver.switch_to.new_window()
t=driver.window_handles[-1]# Get the handle of new tab
driver.switch_to.window(t)
driver.get(target_url) # Now the target url is opened in new tab

This worked for me:

link = "https://www.google.com/"
driver.execute_script('''window.open("about:blank");''')  # Opening a blank new tab
driver.switch_to.window(driver.window_handles[1])  # Switching to newly opened tab
driver.get(link)

Version 4.0.0 of Selenium supports the following operations:

  • To open a new tab, try:

    driver.switch_to.new_window()

  • To switch to a specific tab (note that tabID starts from 0):

    driver.switch_to.window(driver.window_handles[tabID])
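
Put together, the workflow from the question (one browser, one new tab per URL, close it when done) might look like the sketch below. It assumes Selenium 4, where driver.switch_to.new_window('tab') opens a tab and switches to it. visit_each_in_new_tab and the handle_page callback are names made up for this example.

```python
def visit_each_in_new_tab(driver, urls, handle_page):
    """For each URL: open a new tab, load it, call handle_page(driver), close it.

    Assumes Selenium 4 (driver.switch_to.new_window). `handle_page` is a
    caller-supplied callback; its return values are collected in a list.
    """
    base = driver.current_window_handle
    results = []
    for url in urls:
        driver.switch_to.new_window('tab')  # opens the tab and switches to it
        try:
            driver.get(url)
            results.append(handle_page(driver))
        finally:
            # Always close the tab and return to the starting one,
            # even if loading or scraping raised an exception
            driver.close()
            driver.switch_to.window(base)
    return results
```

For example, visit_each_in_new_tab(driver, urls, lambda d: d.title) would return the page titles while reusing a single browser instance.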

It is enough to use this to open a new window (for example):

driver.find_element_by_link_text("Images").send_keys(Keys.CONTROL + Keys.RETURN)