requests
<p>requests: sends HTTP requests and receives the responses.</p>
<ol>
<li>If a browser can load a page but requests cannot, the last resort is to copy every header from the browser's request into the requests call.</li>
</ol>
<p>import requests
from bs4 import BeautifulSoup</p>
<p>r1 = requests.get(
    url='https://dig.chouti.com/',
    headers={  # Note: if a browser can load the page but requests cannot, the last resort is to copy all of the browser's request headers here
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'  # present the request as coming from Chrome
    }
)</p>
<p>soup = BeautifulSoup(r1.text,'html.parser')</p>
<p># find() returns a single Tag object
content_list = soup.find(name='div',id='content-list')</p>
<p># print(content_list)</p>
<p># find_all() returns a list: [Tag object, Tag object]</p>
<p>item_list = content_list.find_all(name='div',attrs={'class':'item'})
for item in item_list:
    # Each item holds one link whose text is the post title
    a = item.find(name='a',attrs={'class':'show-content color-chag'})
    print(a.text.strip())</p>
<p># print(a.text)</p>
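<p>Copying a browser's headers by hand is tedious, so a small helper can do it. The following is a minimal sketch, assuming the headers are pasted as plain "Name: value" lines from the browser's developer tools. The function name parse_raw_headers and the sample text are hypothetical and not part of requests.</p>

```python
def parse_raw_headers(raw: str) -> dict:
    """Turn header lines copied from browser dev tools into a dict for requests."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":authority"
        if not line or line.startswith(':'):
            continue
        # Split on the first colon only, so URLs in the value stay intact
        name, sep, value = line.partition(':')
        if not sep:
            continue  # not a "Name: value" line
        headers[name.strip()] = value.strip()
    return headers


# Hypothetical header text, as it might be pasted from the Network tab
RAW = """
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64)
Accept: text/html
Referer: https://dig.chouti.com/
"""

browser_headers = parse_raw_headers(RAW)
```

<p>The resulting dict can then be passed as headers=browser_headers to requests.get.</p>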
<ol>
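<li>Referer (the target page checks which page the request came from, to prevent hotlinking)
HTTP defines a request header field named Referer, which holds the URL of the page that linked to the page or file currently being requested. In other words, a site can use Referer to see which page a request came from; for a resource file, it can trace the address of the page that displays it.
With the source known, the site can act on it: when the request does not come from the site itself, it can block the request or return a designated page instead.</li>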
</ol>
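<p>To make the server side of this check concrete, here is a minimal sketch of how a site might decide whether a Referer is acceptable. It is an illustration only, not how any particular site implements it. The host names and the function name is_allowed_referer are hypothetical.</p>

```python
from urllib.parse import urlparse

# Hypothetical hosts that count as "this site"
ALLOWED_HOSTS = frozenset({"example.com", "www.example.com"})


def is_allowed_referer(referer, allowed_hosts=ALLOWED_HOSTS):
    """Return True when the Referer header points at one of our own hosts."""
    if not referer:
        return False  # this sketch treats a missing Referer as foreign
    host = urlparse(referer).hostname  # None when the value is not a URL
    return host in allowed_hosts
```

<p>A client that omits the header or sends a foreign one would be rejected by such a check, which is why the login request sets Referer to the login page.</p>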
<p>import re
import requests</p>
<p>r1 = requests.get(
    url='https://passport.lagou.com/login/login.html',
    headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
    }
)
X_Anti_Forge_Token = re.findall("X_Anti_Forge_Token = '(.*?)'", r1.text, re.S)[0]
X_Anti_Forge_Code = re.findall("X_Anti_Forge_Code = '(.*?)'", r1.text, re.S)[0]</p>
<p># print(X_Anti_Forge_Token, X_Anti_Forge_Code)</p>
<p># print(r1.text)</p>
<p># Submit the login form, sending back the tokens and cookies from the first response
r2 = requests.post(
    url='https://passport.lagou.com/login/login.json',
    headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        'X-Anit-Forge-Code':X_Anti_Forge_Code,
        'X-Anit-Forge-Token':X_Anti_Forge_Token,
        'Referer': 'https://passport.lagou.com/login/login.html',  # URL of the previous request
    },
    data={
        "isValidate": True,
        'username': '15131255089',
        'password': 'ab18d270d7126ea65915c50288c22c0d',
        'request_form_verifyCode': '',
        'submit': ''
    },
    cookies=r1.cookies.get_dict()
)
print(r2.text)</p>
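<p>Indexing the result of re.findall with [0] raises IndexError if the page no longer contains the token. A slightly more defensive sketch of the extraction step is shown here. The helper name extract_token and the sample HTML are hypothetical, and the real page layout may differ.</p>

```python
import re


def extract_token(name, html):
    """Return the quoted value assigned to `name` in the page, or None if absent."""
    pattern = re.escape(name) + r"\s*=\s*'(.*?)'"
    match = re.search(pattern, html, re.S)
    return match.group(1) if match else None


# Hypothetical page fragment, used only to exercise the helper
SAMPLE = """
<script>
    window.X_Anti_Forge_Token = 'token-123';
    window.X_Anti_Forge_Code = '456';
</script>
"""

token = extract_token("X_Anti_Forge_Token", SAMPLE)
code = extract_token("X_Anti_Forge_Code", SAMPLE)
```

<p>With r1.text in place of SAMPLE, a None result signals that the page has changed, before the login request is sent.</p>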
<ol>
<li>requests parameters</li>
</ol>
<p>url
headers
data
cookies
json
params
proxies
timeout</p>
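<p>The effect of several of these parameters can be inspected without sending anything, by preparing a request and looking at what would go on the wire. This is a minimal sketch using requests.Request and its prepare() method. The example.com URLs, the values, and the proxy address in the comment are placeholders.</p>

```python
import json

import requests

# params is encoded into the query string; cookies become a Cookie header
get_req = requests.Request(
    'GET',
    'https://example.com/search',
    params={'page': 2},
    headers={'User-Agent': 'demo-client'},
    cookies={'sid': 'abc'},
).prepare()

# data is sent as a form-encoded body
form_req = requests.Request(
    'POST', 'https://example.com/login', data={'username': 'alice'}
).prepare()

# json is serialized to a JSON body with a matching Content-Type
json_req = requests.Request(
    'POST', 'https://example.com/api', json={'username': 'alice'}
).prepare()

# proxies and timeout apply when the request is actually sent, for example:
# requests.get(url, proxies={'https': 'http://127.0.0.1:8080'}, timeout=5)
```

<p>Printing get_req.url, form_req.body, and json_req.headers shows how each argument changes the outgoing request.</p>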