Last time, our crawler successfully downloaded a page's source code. This time we'll continue from there and look at how to extract the specific elements we want.

First, let's review the basic structure of our BeautifulSoup script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = { 'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36',
}

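url = "URL of the page to scrape"
# Send the browser-like headers so the site treats the request as a normal visit
web_data = requests.get(url, headers=headers)
# Parse the downloaded HTML with the lxml parser
soup = BeautifulSoup(web_data.text, "lxml")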
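
Once the page is parsed, picking out elements usually comes down to calling `select()` with a CSS selector. Here is a minimal sketch of that step. The HTML string, the class names `item` and `title`, and the variable `sample_soup` are all made up for illustration and stand in for `web_data.text` and whatever structure your real page has. It also assumes the built-in `"html.parser"` instead of `"lxml"` so that it runs without extra dependencies beyond `bs4`.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for web_data.text (not from a real site)
SAMPLE_HTML = """
<html><body>
  <div class="item"><h2 class="title">First post</h2><a href="/post/1">read</a></div>
  <div class="item"><h2 class="title">Second post</h2><a href="/post/2">read</a></div>
</body></html>
"""

# "html.parser" ships with Python; swap in "lxml" if it is installed
sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() takes a CSS selector and returns a list of matching tags
titles = [tag.get_text(strip=True) for tag in sample_soup.select("div.item > h2.title")]

# Attributes such as href are read with .get()
links = [a.get("href") for a in sample_soup.select("div.item a")]

print(titles)
print(links)
```

On a real page you would find the selector by inspecting the element in the browser's developer tools and then adjusting it until `select()` returns only the tags you are after.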