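Contents
1. Customizing Request Headers
(1). Viewing Request Headers
(2). Setting Request Headers
2. Verifying Cookies
3. Maintaining a Session
4. SSL Certificate Verification
1. Customizing Request Headers
(1). Viewing Request Headers
(2). Setting Request Headers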
import requests
# Define the URL and the request headers
base_url = 'https://www.zhihu.com/signin'
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) '
                        'AppleWebKit/537.36 (KHTML, like Gecko) '
                        'Chrome/92.0.4515.131 Safari/537.36'}
# Send a GET request built from the URL and headers, and receive the server's response
response = requests.get(base_url, headers=header)
response.encoding = 'utf-8'
# Print the response body
print(response.text)
<!DOCTYPE html>
<html lang="zh" data-hairline="true" data-theme="light"> <head> <meta charset="utf-8" /> <title data-react-helmet="true">知乎 - 有问题,就会有答案</title> <meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1" /> <meta name="renderer" content="webkit" /> <meta name="force-rendering" content="webkit" /> <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" /> <meta name="google-site-verification" content="FTeR0c8arOPKh8c5DYh_9uu98_zJbaWw53J-Sch9MTg" /> ……</body>
</html>
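To see which headers requests would attach to a request, one option is to build the request without sending it. The sketch below assumes a placeholder URL (https://example.com/) and uses a Session only to merge the library's default headers with the custom ones; nothing goes over the network.

```python
import requests

# Placeholder URL for illustration only; the request is prepared but never sent.
url = 'https://example.com/'
custom_headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/92.0.4515.131 Safari/537.36'
}

# Without a custom header, requests identifies itself as "python-requests/<version>".
default_user_agent = requests.utils.default_headers()['User-Agent']
print('Default User-Agent:', default_user_agent)

# Build the request and let a Session merge its default headers with ours.
session = requests.Session()
prepared = session.prepare_request(
    requests.Request('GET', url, headers=custom_headers)
)

# prepared.headers holds the headers requests passes along
# (lower layers may still add more, such as Host).
for name, value in prepared.headers.items():
    print(f'{name}: {value}')
```

After a real request has been sent, response.request.headers exposes the same information for that request.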
2. Verifying Cookies
The first approach sets the Cookie field directly in the request headers.
import requests
headers = {
    # Set the Cookie field
    'Cookie': 'Paste the Cookie value shown after logging in to Baidu here',
    # Set the User-Agent field
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/53.0.2785.116 Safari/537.36',
}
response = requests.get('https://www.baidu.com/', headers=headers)
print(response.text)
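The second approach passes the cookies through the cookies parameter as a RequestsCookieJar object. Its implementation is as follows.
import requests
header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) '
                        'AppleWebKit/537.36 (KHTML, like Gecko) '
                        'Chrome/53.0.2785.116 Safari/537.36'}
# Prepare the Cookie string
cookie = 'Paste the Cookie value shown after logging in to Baidu here'
# Create a RequestsCookieJar object
jar_obj = requests.cookies.RequestsCookieJar()
# Split the Cookie string on semicolons and store each key and value in jar_obj
for temp in cookie.split(';'):
    key, value = temp.split('=', 1)
    # Strip the space that follows each semicolon in a browser Cookie string
    jar_obj.set(key.strip(), value)
response = requests.get('https://www.baidu.com/', headers=header, cookies=jar_obj)
print(response.text)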
Both snippets above produce the following output.
......
"userAttr":Number("")|| 0,
"username":"Itcast_001122",
"unametype":"2",
"userIsSkined":"off",
"userIsNewSkined":"off",
"userSkinName":"",
"userSkinOpacity":"70",
……
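Cookie strings copied from a browser usually separate pairs with a semicolon and a space, and may end with a trailing semicolon. The sketch below wraps the parsing step in a helper that tolerates both. The name cookie_string_to_jar and the sample cookie string are made up for illustration, and no request is sent.

```python
import requests


def cookie_string_to_jar(cookie_string):
    """Convert a 'k1=v1; k2=v2' Cookie string into a RequestsCookieJar."""
    jar = requests.cookies.RequestsCookieJar()
    for pair in cookie_string.split(';'):
        # Skip empty fragments (e.g. after a trailing ';') and pairs without '='.
        if '=' not in pair:
            continue
        # Split on the first '=' only, because values may contain '=' themselves.
        name, value = pair.split('=', 1)
        jar.set(name.strip(), value.strip())
    return jar


# Made-up cookie string for illustration; a real one comes from the browser.
sample = 'BAIDUID=abc123; token=dGVzdA==; '
jar = cookie_string_to_jar(sample)
print(jar.get_dict())
```

The resulting jar can be passed to requests.get() through the cookies parameter, in the same way as jar_obj above.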
3. Maintaining a Session
import requests
# Create a session
sess_obj = requests.Session()
sess_obj.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
response = sess_obj.get("http://httpbin.org/cookies")
print(response.text)
Running the program prints the following result.
{
  "cookies": {
    "sessioncookie": "123456789"
  }
}
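For comparison, the following code sends the same two requests with requests.get() directly instead of through a session.
import requests
requests.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
response = requests.get("http://httpbin.org/cookies")
print(response.text)
Running the program prints the following result; the cookie set by the first request is not carried over to the second.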
{"cookies": {}}
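Besides cookies returned by the server, a Session can also carry default headers and cookies that are set by hand. The sketch below prepares a request without sending it, to show what the session attaches. The User-Agent value my-crawler/0.1 is a hypothetical placeholder.

```python
import requests

# The with-block closes the session's pooled connections on exit.
with requests.Session() as session:
    # Headers set here are merged into every request made through the session.
    session.headers.update({'User-Agent': 'my-crawler/0.1'})  # hypothetical value
    # Store a cookie by hand; normally the server does this via Set-Cookie.
    session.cookies.set('sessioncookie', '123456789')

    # Prepare (but do not send) a request to see what the session attaches.
    request = requests.Request('GET', 'http://httpbin.org/cookies')
    prepared = session.prepare_request(request)
    cookie_header = prepared.headers.get('Cookie')
    user_agent = prepared.headers.get('User-Agent')

print(cookie_header)
print(user_agent)
```

Every request made through the same session gets these headers and cookies, which is why the Session example above keeps sessioncookie while the plain requests.get() calls do not.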
4. SSL Certificate Verification
import requests
base_url = 'https://data.stats.gov.cn/'
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) '
                        'AppleWebKit/537.36 (KHTML, like Gecko) '
                        'Chrome/90.0.4430.212 Safari/537.36'}
# Send a GET request
response = requests.get(base_url, headers=header)
print(response.status_code)
Running the code raises an SSLError exception, as shown below.
......
requests.exceptions.SSLError: HTTPSConnectionPool(host='data.stats.gov.cn', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1108)')))
One way to get past this error is to turn off SSL verification explicitly by setting the verify parameter to False when calling get(), as follows.
response = requests.get(base_url, headers=header, verify=False)
Running the code again no longer raises SSLError; instead, the console shows the following warning.
C:\Users\admin\AppData\Roaming\Python\Python38\site-packages\urllib3\connectionpool.py:981: InsecureRequestWarning: Unverified HTTPS request is being made to host 'data.stats.gov.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
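The warning recommends keeping certificate verification on. When a site's certificate chain is not trusted by default, one way to keep verification is to pass verify a path to a CA bundle file instead of False. The sketch below assumes a hypothetical path (/path/to/ca-bundle.pem) and a hypothetical helper name; it only configures a session and does not send a request.

```python
import requests

# Hypothetical path to a PEM file holding the CA certificate(s)
# that signed the server's certificate.
CA_BUNDLE_PATH = '/path/to/ca-bundle.pem'


def make_verified_session(ca_bundle_path):
    """Return a Session that verifies server certificates against ca_bundle_path."""
    session = requests.Session()
    # A string value for verify is treated as the path to a CA bundle.
    session.verify = ca_bundle_path
    return session


session = make_verified_session(CA_BUNDLE_PATH)
print('Session verifies against:', session.verify)

# For reference, this is the bundle requests uses when verify is left at True.
print('Default CA bundle:', requests.certs.where())
```

The same path can also be passed for a single call, for example requests.get(base_url, headers=header, verify=CA_BUNDLE_PATH).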
If you would rather not see this warning, you can suppress it as follows.
import urllib3
urllib3.disable_warnings()
Running the program again, the warning above no longer appears in the console.
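Called without arguments, urllib3.disable_warnings() silences every warning derived from urllib3's HTTPWarning. A narrower variant, sketched below, passes the warning category so that only InsecureRequestWarning is suppressed. The final check against warnings.filters is only there to show the effect.

```python
import warnings

import urllib3
from urllib3.exceptions import InsecureRequestWarning

# Silence only InsecureRequestWarning; other urllib3 warnings stay visible.
urllib3.disable_warnings(InsecureRequestWarning)

# disable_warnings() registers an "ignore" filter for the given category.
ignored = [
    entry for entry in warnings.filters
    if entry[0] == 'ignore' and entry[2] is InsecureRequestWarning
]
print('InsecureRequestWarning suppressed:', bool(ignored))
```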