Here Comes Beautiful Soup - 开卷题库

小开

You can use find_all as follows to find every a element that has an href attribute and print each one:

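# Python 2
# find_all requires BeautifulSoup 4 (the bs4 package). The older
# "from BeautifulSoup import BeautifulSoup" import is version 3,
# where the method is named findAll.
from bs4 import BeautifulSoup

html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>'''

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(html, "html.parser")

# href=True matches any <a> tag that has an href attribute, whatever its value.
for a in soup.find_all('a', href=True):
    print "Found the URL:", a['href']

# The output would be:
# Found the URL: some_url
# Found the URL: another_url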

# Python 3
from bs4 import BeautifulSoup

html = '''<a href="https://some_url.com">next</a>
<span class="class">
<a href="https://some_other_url.com">another_url</a></span>'''

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(html, "html.parser")

# href=True matches any <a> tag that has an href attribute, whatever its value.
for a in soup.find_all('a', href=True):
    print("Found the URL:", a['href'])

# The output would be:
# Found the URL: https://some_url.com
# Found the URL: https://some_other_url.com
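
To make the effect of href=True more concrete, here is a minimal sketch that compares it with a plain find_all('a'). The sample markup (including the named anchor without an href) is made up for illustration and is not from the examples above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second <a> is a named anchor with no href.
html = '''<a href="https://example.com/page">page</a>
<a name="top">top</a>'''

soup = BeautifulSoup(html, "html.parser")

# Without the filter, every <a> tag is returned.
all_anchors = soup.find_all("a")

# With href=True, only <a> tags that carry an href attribute are returned.
with_href = soup.find_all("a", href=True)

hrefs = [a["href"] for a in with_href]
print(len(all_anchors), len(with_href))
print(hrefs)
```

Filtering this way means a['href'] can be read inside the loop without risking a KeyError on anchors that have no href.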

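Note that if you are using an older version of BeautifulSoup (before version 4), this method is called findAll. In version 4, BeautifulSoup's method names were changed to be PEP 8 compliant, so you should use find_all instead.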
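
As a rough illustration of the naming change, the sketch below calls both spellings on a BeautifulSoup 4 object. It assumes the installed bs4 release still accepts the old findAll name as a compatibility alias. The warning filter is a precaution, since newer releases may flag the old name as deprecated.

```python
import warnings

from bs4 import BeautifulSoup

html = '<a href="some_url">next</a><a href="another_url">later</a>'
soup = BeautifulSoup(html, "html.parser")

# Preferred PEP 8 style name in BeautifulSoup 4.
new_style = [a["href"] for a in soup.find_all("a", href=True)]

# Old camelCase name, assumed here to still exist as an alias in bs4.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # newer releases may warn about the old name
    old_style = [a["href"] for a in soup.findAll("a", href=True)]

print(new_style == old_style)
```

New code should stick to find_all. The alias is mainly useful when porting scripts written for version 3.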

If you want every tag that has an href attribute, whatever its name, you can omit the name parameter:

href_tags = soup.find_all(href=True)
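
A self-contained sketch of that call, using made-up markup that mixes a link tag, an a tag, and a paragraph, might look like this:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with href on two different kinds of tag.
html = '''<link rel="stylesheet" href="style.css">
<a href="https://example.com/">home</a>
<p>no link here</p>'''

soup = BeautifulSoup(html, "html.parser")

# No tag name given: any tag that has an href attribute matches.
href_tags = soup.find_all(href=True)

# Collect (tag name, href value) pairs in document order.
pairs = [(tag.name, tag["href"]) for tag in href_tags]
for name, href in pairs:
    print(name, href)
```

Because the result can include tags other than a, checking tag.name is one way to tell stylesheet links apart from ordinary hyperlinks.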