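Contents
- Introduction
- Installation
- Basic Concepts
- RESTful API
- OAuth 2.0
- Cookies and Sessions
- First Try
- GET Requests
- POST Requests
- PUT Requests
- DELETE Requests
- HEAD Requests
- OPTIONS Requests
- Passing Query Parameters
- Response Content
- Custom Request Headers
- Passing Form Parameters
- Uploading Files
- Response Status Codes
- Response Headers
- Cookies
- Redirection and History
- Timeouts
- Errors and Exceptions
- Session Objects
- Request and Response Objects
- Prepared Requests
- SSL Certificate Verification
- Client-Side Certificates
- CA Certificates
- Body Content Workflow
- Keep-Alive
- Streaming Uploads
- Chunk-Encoded Requests
- POST Multiple Multipart-Encoded Files
- Event Hooks
- Custom Authentication
- Streaming Requests
- Proxies
- SOCKS
- HTTP Verbs
- Custom Verbs
- Link Headers
- Transport Adapters
- OAuth Authentication
- Downloading Images
- Disabling Parameter Escaping
- Converting to curl
- Wrapping Requests
- References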
Introduction
Requests is an elegant and simple HTTP library for Python, built for human beings.
Installation
pip install requests
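Basic Concepts
RESTful API
Each URL represents a resource, and the type of operation performed on that resource is expressed by the HTTP verb:
- GET (SELECT): retrieve one or more resources
- POST (CREATE): create a new resource
- PUT (UPDATE): update a resource and return the complete resource
- PATCH (UPDATE): update a resource and return only the changed attributes
- DELETE (DELETE): delete a resource
- HEAD: retrieve the resource's metadata (headers only, no body)
- OPTIONS: retrieve information about the resource, such as which attributes the client is allowed to change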
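Each verb above has a same-named function in Requests (requests.get, requests.post, requests.patch, ...), and they all wrap requests.request(method, url, ...). The sketch below builds, without sending, one prepared request per verb so you can inspect what would be sent; the URL is only an example endpoint and build is a hypothetical helper.

```python
import requests

URL = 'https://httpbin.org/anything'  # example endpoint; nothing is sent below


def build(method, **kwargs):
    """Hypothetical helper: build (but do not send) a request for an HTTP verb."""
    return requests.Request(method, URL, **kwargs).prepare()


for verb in ['GET', 'POST', 'PUT', 'PATCH', 'DELETE', 'HEAD', 'OPTIONS']:
    prepared = build(verb)
    print(prepared.method, prepared.url)

# A PATCH typically carries only the attributes that changed
patch = build('PATCH', json={'name': 'new-name'})
print(patch.headers['Content-Type'])  # application/json
```

Lower-case verbs are accepted too; the method name is upper-cased during preparation.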
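OAuth 2.0
A temporary (delegated) authorization mechanism: instead of the user's password, the client obtains a short-lived access token that grants limited access.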
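Once an OAuth 2.0 client has an access token, it usually sends it in an `Authorization: Bearer <token>` header. A minimal sketch using Requests' AuthBase; the class name BearerAuth, the URL and the token value are hypothetical, and obtaining the token (the OAuth 2.0 flow itself) is out of scope here.

```python
import requests
from requests.auth import AuthBase


class BearerAuth(AuthBase):
    """Hypothetical helper: attach an OAuth 2.0 access token to each request."""

    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # r is the PreparedRequest; add the header and hand it back
        r.headers['Authorization'] = f'Bearer {self.token}'
        return r


# Prepare the request without sending it, to inspect the header
prepared = requests.Request('GET', 'https://api.example.com/me',
                            auth=BearerAuth('hypothetical-token')).prepare()
print(prepared.headers['Authorization'])  # Bearer hypothetical-token
```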
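Cookies and Sessions
HTTP is stateless: every HTTP request is independent of the others.
To maintain state, the server keeps a Session and the client (browser) keeps Cookies.
The browser attaches its Cookies to every request, and the server identifies which user sent it by inspecting those Cookies.
A "session" is a series of actions with a clear beginning and end; for example, a phone call, from picking up the phone and dialing until hanging up, can be called a session.
On the Web, a Session is used to store user attributes and configuration information.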
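The division of labor described above can be illustrated with a toy, in-memory model: the "server" keeps a session table keyed by a random ID, and the "client" stores only that ID as a cookie and sends it back. Everything here (ToyServer, login, whoami, the sessionid name) is hypothetical, and no real HTTP is involved.

```python
import secrets


class ToyServer:
    """Hypothetical server that keeps per-user state in server-side sessions."""

    def __init__(self):
        self.sessions = {}  # session ID -> user data (this lives on the server)

    def login(self, username):
        session_id = secrets.token_hex(16)
        self.sessions[session_id] = {'user': username}
        # The server hands the ID to the client, like a Set-Cookie header would
        return {'sessionid': session_id}

    def whoami(self, cookies):
        # Each request is independent; the cookie is the only link to the state
        session = self.sessions.get(cookies.get('sessionid'))
        return session['user'] if session else None


server = ToyServer()
browser_cookies = server.login('alice')  # the client stores the cookie
print(server.whoami(browser_cookies))    # alice
print(server.whoami({}))                 # None: no cookie, no state
```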
First Try
A GET request that simulates a login with HTTP Basic authentication:
import requests

r = requests.get('https://api.github.com/user', auth=('user', 'pass'))  # simulated login; placeholder credentials
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
print(r.text)
print(r.json())
GET Requests
import requests

r = requests.get('https://api.github.com/events')
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
print(r.text)
print(r.json())
# 200
# application/json; charset=utf-8
# utf-8
# ...
POST Requests
import requests

r = requests.post('https://httpbin.org/post', data={'key': 'value'})
print(r.json())
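Uploading with a file stream (multipart/form-data):
import requests

with open('test.txt', 'rb') as f:
    files = {
        'file': f,
        'key0': (None, 'value0'),  # (None, value) sends a plain form field without a filename
        'key1': (None, 'value1'),
    }
    response = requests.post('http://httpbin.org/post', files=files)
print(response.json())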
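Or, using MultipartEncoder from requests-toolbelt (pip install requests-toolbelt):
import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder

with open('test.txt', 'rb') as f:
    m = MultipartEncoder(fields={
        'file': ('test.txt', f, 'text/plain'),
        'key0': 'value0',
        'key1': 'value1',
    })
    # The Content-Type header must carry the multipart boundary
    response = requests.post('http://httpbin.org/post', data=m, headers={'Content-Type': m.content_type})
print(response.json())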
PUT Requests
import requests

r = requests.put('https://httpbin.org/put', data={'key': 'value'})
print(r.json())
DELETE Requests
import requests

r = requests.delete('https://httpbin.org/delete')
print(r.json())
HEAD Requests
import requests

r = requests.head('https://httpbin.org/get')
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
print(r.text)  # empty: a HEAD response has no body
OPTIONS Requests
import requests

r = requests.options('https://httpbin.org/get')
print(r.status_code)
print(r.headers['content-type'])
print(r.headers.get('Allow'))  # methods the server allows, if it reports them
print(r.encoding)
print(r.text)
Passing Query Parameters
To send query parameters in the URL, e.g. http://httpbin.org/get?key=val, use the params argument:
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('https://httpbin.org/get', params=payload)
print(r.url)  # https://httpbin.org/get?key1=value1&key2=value2
print(r.json())

payload = {'key1': 'value1', 'key2': ['value2', 'value3']}  # a list value repeats the key
r = requests.get('https://httpbin.org/get', params=payload)
print(r.url)  # https://httpbin.org/get?key1=value1&key2=value2&key2=value3
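Values in params are percent-encoded for you (spaces become +), and keys whose value is None are left out of the query string. A sketch that prepares the request without sending it; the parameter names are illustrative:

```python
import requests

payload = {'q': 'a b&c', 'page': 2, 'debug': None}  # None values are dropped

# Prepare (without sending) to see the final URL
prepared = requests.Request('GET', 'https://httpbin.org/get', params=payload).prepare()
print(prepared.url)  # https://httpbin.org/get?q=a+b%26c&page=2
```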
Response Content
import requests

r = requests.get('https://api.github.com/events')
print(r.text)      # body decoded to str using r.encoding
print(r.json())    # body parsed as JSON
print(r.encoding)  # utf-8
Custom Request Headers
Pass a dict to the headers argument:
import requests

url = 'https://api.github.com/some/endpoint'
headers = {'user-agent': 'my-app/0.0.1'}
r = requests.get(url, headers=headers)
print(r.json())
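Custom headers are merged with the defaults Requests sends (such as Accept and User-Agent), and header names are case-insensitive. A sketch that prepares the request through a Session without sending it, so the merged headers can be inspected:

```python
import requests

url = 'https://api.github.com/some/endpoint'
s = requests.Session()
# Merge our header with the Session defaults; nothing is sent
prepared = s.prepare_request(requests.Request('GET', url, headers={'user-agent': 'my-app/0.0.1'}))

print(prepared.headers['User-Agent'])  # my-app/0.0.1 (lookup is case-insensitive)
print(prepared.headers['Accept'])      # */* (a Session default we did not override)
```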
Passing Form Parameters
Use the data argument to send form-encoded data:
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post('https://httpbin.org/post', data=payload)
print(r.json())

# A list of tuples or a dict of lists sends the same key several times
payload_tuples = [('key1', 'value1'), ('key1', 'value2')]
r = requests.post('https://httpbin.org/post', data=payload_tuples)
print(r.json())

payload_dict = {'key1': ['value1', 'value2']}
r = requests.post('https://httpbin.org/post', data=payload_dict)
print(r.json())
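To send JSON-encoded data, either serialize it yourself and pass it as data, or pass the object as json. Both send the same body, but only json= also sets the Content-Type: application/json header automatically:
import json
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
url = 'https://api.github.com/some/endpoint'
r = requests.post(url, data=json.dumps(payload))  # JSON body, no JSON Content-Type header
r = requests.post(url, json=payload)              # JSON body plus Content-Type: application/json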
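The difference between the two calls shows up when you prepare them without sending: the bodies decode to the same JSON, but only the json= version carries a Content-Type header. A sketch; body_bytes is a hypothetical helper that smooths over the body being str or bytes depending on the Requests version.

```python
import json

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
url = 'https://api.github.com/some/endpoint'

as_data = requests.Request('POST', url, data=json.dumps(payload)).prepare()
as_json = requests.Request('POST', url, json=payload).prepare()


def body_bytes(prepared):
    """Hypothetical helper: return the prepared body as bytes."""
    body = prepared.body
    return body.encode('utf-8') if isinstance(body, str) else body


print(json.loads(body_bytes(as_data)) == json.loads(body_bytes(as_json)))  # True
print(as_data.headers.get('Content-Type'))  # None
print(as_json.headers.get('Content-Type'))  # application/json
```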
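Uploading Files
import requests

with open('1.txt', mode='w') as f:  # create a small sample file
    f.write('123')

url = 'https://httpbin.org/post'
with open('1.txt', 'rb') as f:
    files = {'file': f}
    r = requests.post(url, files=files)
print(r.json())

with open('1.txt', 'rb') as f:
    files = {'file': ('1.txt', f, 'text/plain', {'Expires': '0'})}  # set filename, content_type and extra headers
    r = requests.post(url, files=files)
print(r.json())

files = {'file': ('1.csv', 'some,data,to,send\nanother,row,to,send\n')}  # send a string as a file
r = requests.post(url, files=files)
print(r.json())
- For large files, use requests-toolbelt, which streams the upload instead of building it in memory
- Open files in binary mode ('rb'): Requests may compute the Content-Length header from the file, and text mode can make that value wrong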
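To check what a multipart upload will look like, you can prepare the request and inspect its body without sending it. A sketch with an in-memory file (io.BytesIO) instead of a file on disk; the field names file and note are illustrative. A two-item tuple (None, value) produces a plain form field without a filename.

```python
import io

import requests

files = {
    'file': ('1.csv', io.BytesIO(b'some,data\n'), 'text/csv'),  # (filename, fileobj, content_type)
    'note': (None, 'hello'),                                    # plain field, no filename
}
prepared = requests.Request('POST', 'https://httpbin.org/post', files=files).prepare()

print(prepared.headers['Content-Type'])        # multipart/form-data; boundary=...
print(b'filename="1.csv"' in prepared.body)    # True
```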
Response Status Codes
import requests

r = requests.get('https://httpbin.org/get')
print(r.status_code)  # 200
print(r.status_code == requests.codes.ok)  # True
r.raise_for_status()  # does nothing for a successful response

bad_r = requests.get('https://httpbin.org/status/404')
print(bad_r.status_code)  # 404
print(bad_r.status_code == requests.codes.not_found)  # True
try:
    bad_r.raise_for_status()
except requests.exceptions.HTTPError as e:
    print(e)  # 404 Client Error: NOT FOUND for url: https://httpbin.org/status/404
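raise_for_status() can be exercised without a network by filling in a bare Response by hand. This is only a testing shortcut (check is a hypothetical helper and the reason 'Test' is made up), but it shows the message format for 4xx and 5xx codes and the named constants in requests.codes.

```python
import requests


def check(status_code, url='https://example.org/'):
    """Hypothetical helper: run raise_for_status() on a hand-built Response."""
    r = requests.Response()
    r.status_code = status_code
    r.url = url
    r.reason = 'Test'
    try:
        r.raise_for_status()
        return 'ok'
    except requests.exceptions.HTTPError as e:
        return str(e)


print(requests.codes.not_found, requests.codes.ok)  # 404 200
print(check(200))  # ok
print(check(404))  # 404 Client Error: Test for url: https://example.org/
print(check(503))  # 503 Server Error: Test for url: https://example.org/
```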
Response Headers
import requests

r = requests.get('https://api.github.com/events')
print(r.headers)  # {'Server': 'GitHub.com', 'Date': 'Mon, 05 Sep 2022 10:35:42 GMT', ...}
print(r.headers['content-type'])  # application/json; charset=utf-8
print(r.headers.get('content-type'))  # application/json; charset=utf-8
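r.headers is a requests.structures.CaseInsensitiveDict, which is why 'content-type' finds 'Content-Type'. A small sketch of its behavior on its own, with an illustrative header value:

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict({'Content-Type': 'application/json; charset=utf-8'})
print(headers['content-type'])      # application/json; charset=utf-8
print(headers.get('CONTENT-TYPE'))  # application/json; charset=utf-8
print('Content-type' in headers)    # True
print(list(headers))                # ['Content-Type'] (original casing is kept)
```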
Cookies
import requests

url = 'https://httpbin.org/cookies'
cookies = dict(cookies_are='working')
r = requests.get(url, cookies=cookies)
print(r.json())
print(r.cookies)  # cookies set by the response

# RequestsCookieJar works like a dict but also supports domains and paths
jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
r = requests.get(url, cookies=jar)
print(r.json())  # only tasty_cookie is sent, because only its path matches /cookies
print(r.cookies)
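A cookie's domain and path decide which requests carry it. The sketch below rebuilds the same jar as above, prepares requests without sending them, and reads the resulting Cookie header; cookie_header is a hypothetical helper name.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')


def cookie_header(url):
    """Hypothetical helper: the Cookie header a GET to url would carry."""
    prepared = requests.Request('GET', url, cookies=jar).prepare()
    return prepared.headers.get('Cookie')


print(cookie_header('https://httpbin.org/cookies'))    # tasty_cookie=yum
print(cookie_header('https://httpbin.org/elsewhere'))  # gross_cookie=blech
print(cookie_header('https://example.org/cookies'))    # None: the domain does not match
```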
Redirection and History
Use the history attribute of the response object to track redirects:
import requests

r = requests.get('http://github.com/')
print(r.url)  # 'https://github.com/'
print(r.status_code)  # 200
print(r.history)  # [<Response [301]>]

r = requests.get('http://github.com/', allow_redirects=False)  # disable redirect handling
print(r.status_code)  # 301
print(r.history)  # []

r = requests.head('http://github.com/', allow_redirects=True)  # HEAD only follows redirects when asked
print(r.url)  # 'https://github.com/'
print(r.history)  # [<Response [301]>]
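Timeouts
The timeout parameter sets how many seconds to wait for the server before giving up. It is not a limit on the total download time: a Timeout exception is raised if no bytes have been received on the underlying socket for timeout seconds.
import requests

try:
    requests.get('https://github.com/', timeout=0.001)
except requests.exceptions.Timeout as e:
    print(e)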
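Errors and Exceptions
A network problem, such as a DNS failure or a refused connection, raises ConnectionError.
If an HTTP request returns an unsuccessful status code, Response.raise_for_status() raises HTTPError.
A request that times out raises Timeout.
A request that exceeds the configured maximum number of redirects raises TooManyRedirects.
All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.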
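The exceptions above can be handled from most to least specific. The sketch below is a hypothetical safe_get helper that also passes timeout as a (connect, read) tuple, which sets the two timeouts separately; the numbers are illustrative. Because ConnectTimeout inherits from both ConnectionError and Timeout, it is handled by the first matching clause, here the Timeout one.

```python
import requests
from requests import exceptions as exc


def safe_get(url, timeout=(3.05, 27)):
    """Hypothetical helper: return (response, None) or (None, error description)."""
    try:
        r = requests.get(url, timeout=timeout)  # (connect timeout, read timeout)
        r.raise_for_status()                    # turn 4xx/5xx into HTTPError
        return r, None
    except exc.Timeout as e:            # also ConnectTimeout and ReadTimeout
        return None, f'timeout: {e}'
    except exc.ConnectionError as e:    # DNS failure, refused connection, ...
        return None, f'connection error: {e}'
    except exc.HTTPError as e:
        return None, f'bad status: {e}'
    except exc.TooManyRedirects as e:
        return None, f'too many redirects: {e}'
    except exc.RequestException as e:   # base class of everything above
        return None, f'other error: {type(e).__name__}'


# A URL without a scheme fails while preparing the request, before any network access
print(safe_get('not-a-url'))  # (None, 'other error: MissingSchema')
```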
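Session Objects
- A Session object persists certain parameters across requests, such as Cookies. If you make several requests to the same host, it reuses the underlying TCP connection, which can significantly improve performance.
- A Session object has all the methods of the main Requests API, and can also supply default data for its requests.
- Even when you use a Session object, method-level parameters are not persisted across requests. To add Cookies to the session manually, use Session.cookies.
- A Session object can be used as a context manager.
- Sometimes you do not want a particular Session-level parameter for one request; in that case, set that key to None in the method-level parameter.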
import requests

# A Session persists certain parameters across requests, such as Cookies
s = requests.Session()
s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('https://httpbin.org/cookies')
print(r.json())  # {'cookies': {'sessioncookie': '123456789'}}

# A Session has all the API methods and can also supply default data for requests
s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})
s.get('https://httpbin.org/headers', headers={'x-test2': 'true'})  # sends both x-test and x-test2

# Even with a Session, method-level parameters are not persisted across requests
s = requests.Session()
r = s.get('https://httpbin.org/cookies', cookies={'from-my': 'browser'})
print(r.json())  # {'cookies': {'from-my': 'browser'}}
r = s.get('https://httpbin.org/cookies')
print(r.json())  # {'cookies': {}}

# A Session can be used as a context manager
with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
Keeping a Session is like sending all your requests from a single open browser:
import requests

with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
    r = s.get('https://httpbin.org/cookies')
    print(r.json())
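Session.prepare_request() merges a Session's defaults into a request without sending it, which makes the merging rules listed above easy to inspect. A minimal sketch reusing the same values (nothing is sent over the network):

```python
import requests

s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})

url = 'https://httpbin.org/headers'
# Session defaults and method-level headers are merged
merged = s.prepare_request(requests.Request('GET', url, headers={'x-test2': 'true'}))
print(merged.headers['x-test'], merged.headers['x-test2'])  # true true
print(merged.headers['Authorization'])  # Basic dXNlcjpwYXNz

# Setting a key to None at method level drops the Session-level value
dropped = s.prepare_request(requests.Request('GET', url, headers={'x-test': None}))
print('x-test' in dropped.headers)  # False
```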
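Request and Response Objects
Calling requests.get() actually does two things:
- It constructs a Request object that is sent to the server to request a resource
- Once the server responds, it produces a Response object
The Response object contains all the information returned by the server, and also the Request object you originally created.
import requests

r = requests.get('https://en.wikipedia.org/wiki/Monty_Python')
print(r.headers)          # headers the server sent back
print(r.request.headers)  # headers that were sent to the server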
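Prepared Requests
Whenever you make a request, what is actually sent is a PreparedRequest object.
If you need to modify the body or headers before sending, refer to the original documentation.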
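The approach in the official documentation is to build a Request, prepare it, modify the resulting PreparedRequest, and then send it with Session.send(). A sketch under that assumption; build_prepared and the replacement body are hypothetical, and this version also updates Content-Length so it matches the new body. The actual send is left as a comment.

```python
import requests


def build_prepared(url, data, headers):
    """Hypothetical helper: prepare a request, then tweak it before sending."""
    s = requests.Session()
    req = requests.Request('POST', url, data=data, headers=headers)
    prepped = s.prepare_request(req)  # applies Session state such as cookies
    # Replace the body and keep Content-Length consistent with it
    prepped.body = b'No, I want exactly this as the body.'
    prepped.headers['Content-Length'] = str(len(prepped.body))
    del prepped.headers['Content-Type']
    return s, prepped


s, prepped = build_prepared('https://httpbin.org/post', {'key': 'value'}, {'X-Demo': '1'})
print(prepped.method, prepped.url)        # POST https://httpbin.org/post
print(prepped.headers['Content-Length'])  # 36
# To actually send it: resp = s.send(prepped, timeout=10)
```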
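SSL Certificate Verification
Like a browser, Requests verifies the SSL certificates of HTTPS requests; if a certificate cannot be verified, an SSLError is raised.
- The verify parameter can point to a CA bundle file, or a directory of trusted CA certificates. The trusted CA list can also be set with the REQUESTS_CA_BUNDLE environment variable; if REQUESTS_CA_BUNDLE is not set, CURL_CA_BUNDLE is used as a fallback.
- Setting verify to False disables SSL certificate verification. Requests will then accept any TLS certificate the server presents, ignoring hostname mismatches and expired certificates, which makes the application vulnerable to man-in-the-middle (MitM) attacks.
- verify defaults to True, and verification applies only to the host certificate.
import requests

r = requests.get('https://requestb.in')  # raises SSLError if the certificate cannot be verified
print(r.text)

r = requests.get('https://github.com')
print(r.text)

r = requests.get('https://github.com', verify='/path/to/certfile')
# The line above and the Session setting below are equivalent
s = requests.Session()
s.verify = '/path/to/certfile'

r = requests.get('https://kennethreitz.org', verify=False)  # verification disabled: insecure
print(r)  # <Response [200]>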
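Client-Side Certificates
You can specify a local certificate to use as a client-side certificate, either as a single file (containing both the private key and the certificate) or as a tuple of the two file paths:
import requests

requests.get('https://kennethreitz.org', cert=('/path/client.cert', '/path/client.key'))
# or
s = requests.Session()
s.cert = '/path/client.cert'
The private key for your local certificate must be unencrypted.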
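CA Certificates
Requests uses certificates from the certifi package, which allows its trusted certificates to be updated without updating Requests.
Before version 2.16, Requests bundled a set of root CAs from Mozilla, and the certificates were only updated when Requests itself was updated.
If certifi is not installed, using an older version of Requests can leave you with very outdated certificates.
For security reasons, we recommend updating certificates frequently!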
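Because the trusted CAs come from certifi, you can locate the bundle and pin it explicitly on a Session. A small sketch, assuming certifi is installed (it is a dependency of Requests); upgrading certifi, e.g. pip install --upgrade certifi, refreshes the trusted CAs without touching Requests.

```python
import os

import certifi
import requests

bundle = certifi.where()  # path to the CA bundle file shipped by certifi
print(bundle)
print(os.path.isfile(bundle))  # True

s = requests.Session()
s.verify = bundle  # explicit CA bundle for every request made through s
```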
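Body Content Workflow
- By default, the response body is downloaded immediately when you make a request. Setting stream=True on the request defers downloading the body until you access Response.content.
- With stream=True, the connection is not released back to the pool until all the data has been consumed or Response.close() is called. This can be inefficient, so using a context manager is recommended.
import requests

tarball_url = 'https://github.com/psf/requests/tarball/main'
r = requests.get(tarball_url, stream=True)  # only the headers are downloaded so far; the connection stays open
TOO_LONG = 1024
length = r.headers.get('content-length')  # may be absent, e.g. with chunked transfer encoding
if length is not None and int(length) < TOO_LONG:
    content = r.content
    ...
r.close()  # release the connection if the body was not fully consumed

with requests.get('https://httpbin.org/get', stream=True) as r:
    ...  # the connection is released when the block exits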
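Keep-Alive
All requests made within a Session use keep-alive automatically, and suitable connections are reused. Note that a connection is only returned to the pool for reuse once all of its body data has been read.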
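Streaming Uploads
Pass a file-like object as data to stream a large upload instead of reading it into memory first; open the file in binary mode:
import requests

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)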
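Chunk-Encoded Requests
Pass a generator (or any iterator without a known length) as data to send the body with chunked transfer encoding:
import requests

def gen():
    yield b'hi'
    yield b'there'

requests.post('http://some.url/chunked', data=gen())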
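Streaming uploads and chunk-encoded requests differ in how the body's length is announced. Preparing (not sending) both kinds of request shows it: a file-like object with a known size gets a Content-Length header, while a generator has no known length and falls back to Transfer-Encoding: chunked. The URL is the placeholder used in the examples above.

```python
import io

import requests

url = 'http://some.url/streamed'  # placeholder URL; nothing is sent


def gen():
    yield b'hi'
    yield b'there'


file_like = requests.Request('POST', url, data=io.BytesIO(b'hello')).prepare()
generator = requests.Request('POST', url, data=gen()).prepare()

print(file_like.headers.get('Content-Length'))     # 5
print(generator.headers.get('Transfer-Encoding'))  # chunked
```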
POST Multiple Multipart-Encoded Files
Use a list of (field name, file tuple) pairs to send several files under the same field:
import requests

url = 'https://httpbin.org/post'
with open('foo.png', 'rb') as foo, open('bar.png', 'rb') as bar:
    multiple_files = [
        ('images', ('foo.png', foo, 'image/png')),
        ('images', ('bar.png', bar, 'image/png')),
    ]
    r = requests.post(url, files=multiple_files)
print(r.text)
Event Hooks
Pass a function (or a list of functions) under the 'response' key of the hooks argument; it is called with the Response object, and a non-None return value replaces the response:
import requests

def print_url(r, *args, **kwargs):
    print(r.url)

def record_hook(r, *args, **kwargs):
    r.hook_called = True
    return r

r = requests.get('https://httpbin.org/', hooks={'response': print_url})
print(r)
# https://httpbin.org/
# <Response [200]>

r = requests.get('https://httpbin.org/', hooks={'response': [print_url, record_hook]})
print(r.hook_called)
# https://httpbin.org/
# True

s = requests.Session()
s.hooks['response'].append(print_url)
print(s.get('https://httpbin.org/'))
# https://httpbin.org/
# <Response [200]>
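A common use of the response hook is raising on error statuses for every request made through a Session. A minimal sketch; check_status is a hypothetical name, and the hand-built Response only exists to exercise the hook without network access.

```python
import requests


def check_status(r, *args, **kwargs):
    """Hypothetical response hook: turn 4xx/5xx responses into HTTPError."""
    r.raise_for_status()


s = requests.Session()
s.hooks['response'].append(check_status)  # applies to every response from s

# Testing shortcut: call the hook on a hand-built Response (nothing is sent)
fake = requests.Response()
fake.status_code = 500
fake.reason = 'INTERNAL SERVER ERROR'
fake.url = 'https://httpbin.org/status/500'
caught = None
try:
    check_status(fake)
except requests.exceptions.HTTPError as e:
    caught = str(e)
print(caught)  # 500 Server Error: INTERNAL SERVER ERROR for url: https://httpbin.org/status/500
```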
Custom Authentication
Subclass requests.auth.AuthBase and implement __call__, which receives the request and must return it:
import requests
from requests.auth import AuthBase


class PizzaAuth(AuthBase):
    """Attaches HTTP Pizza Authentication to the given Request object."""

    def __init__(self, username):
        # setup any auth-related data here
        self.username = username

    def __call__(self, r):
        # modify and return the request
        r.headers['X-Pizza'] = self.username
        return r


print(requests.get('http://pizzabin.org/admin', auth=PizzaAuth('kenneth')))
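Streaming Requests
Use stream=True together with Response.iter_lines() to process a response line by line as it arrives:
import json
import requests

r = requests.get('https://httpbin.org/stream/20', stream=True)
for line in r.iter_lines():
    if line:  # filter out keep-alive new lines
        decoded_line = line.decode('utf-8')
        print(json.loads(decoded_line))

# Or let iter_lines decode the lines, after making sure an encoding is set
r = requests.get('https://httpbin.org/stream/20', stream=True)
if r.encoding is None:
    r.encoding = 'utf-8'
for line in r.iter_lines(decode_unicode=True):
    if line:
        print(json.loads(line))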
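The second pattern above can be wrapped in a small generator for newline-delimited JSON streams. The sketch assumes one JSON object per line; iter_json_lines is a hypothetical helper, and the fake response (Response.raw set to an in-memory BytesIO) is only a testing shortcut standing in for a real stream=True request.

```python
import io
import json

import requests


def iter_json_lines(response):
    """Hypothetical helper: yield one decoded JSON object per non-empty line."""
    if response.encoding is None:
        response.encoding = 'utf-8'
    for line in response.iter_lines(decode_unicode=True):
        if line:  # skip keep-alive blank lines
            yield json.loads(line)


# Testing shortcut: a Response whose raw stream is an in-memory buffer
fake = requests.Response()
fake.raw = io.BytesIO(b'{"id": 0}\n\n{"id": 1}\n')
records = list(iter_json_lines(fake))
print(records)  # [{'id': 0}, {'id': 1}]
```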
Proxies
Configure proxies with the proxies argument:
import requests

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
requests.get('http://example.org', proxies=proxies)

# Or configure them once for a whole Session
# (values set on session.proxies may be overridden by environment-variable proxies)
session = requests.Session()
session.proxies.update(proxies)
session.get('http://example.org')
When proxies are not set per request as above, Requests uses the proxy configuration from the standard environment variables:
export HTTP_PROXY="http://10.10.1.10:3128"
export HTTPS_PROXY="http://10.10.1.10:1080"
export ALL_PROXY="socks5://10.10.1.10:3434"
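To see which proxy Requests would pick for a given URL, the sketch below calls select_proxy from requests.utils. That function is an internal helper rather than part of the documented public API, so treat this as an illustration that may change between versions. A scheme-plus-host key (here 'http://10.20.1.128') takes precedence over the plain scheme key.

```python
from requests.utils import select_proxy

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
    'http://10.20.1.128': 'http://10.10.1.10:5323',  # scheme + host specific entry
}
print(select_proxy('http://example.org', proxies))   # http://10.10.1.10:3128
print(select_proxy('https://example.org', proxies))  # http://10.10.1.10:1080
print(select_proxy('http://10.20.1.128/', proxies))  # http://10.10.1.10:5323
```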
SOCKS
HTTP Verbs
Custom Verbs
Link Headers
Transport Adapters
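A transport adapter decides how a Session talks to URLs with a given prefix. A common use, sketched below, is mounting an HTTPAdapter with automatic retries and custom connection-pool sizes; it assumes urllib3's Retry class (urllib3 is a dependency of Requests), and the retry count, backoff and status codes are illustrative values. By default urllib3 only retries idempotent methods such as GET, not POST.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504])
adapter = HTTPAdapter(max_retries=retry, pool_connections=10, pool_maxsize=10)

s = requests.Session()
s.mount('https://', adapter)  # every https:// URL on this Session uses the adapter
s.mount('http://', adapter)

print(s.get_adapter('https://httpbin.org/get') is adapter)  # True
print(adapter.max_retries.total)  # 3
```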
OAuth Authentication
Installation
pip install requests-oauthlib
Code
from requests_oauthlib import OAuth1Session

twitter = OAuth1Session('client_key',
                        client_secret='client_secret',
                        resource_owner_key='resource_owner_key',
                        resource_owner_secret='resource_owner_secret')
url = 'https://api.twitter.com/1/account/settings.json'
r = twitter.get(url)
Downloading Images
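One common way to download an image, or any binary file, is to stream it to disk with stream=True and iter_content(), following the body content workflow described earlier. The sketch is self-contained: it starts a throwaway local HTTP server that serves placeholder bytes, so no real image URL is assumed. FAKE_IMAGE, ImageHandler and download are hypothetical names.

```python
import http.server
import os
import tempfile
import threading

import requests

FAKE_IMAGE = b'\x89PNG\r\n\x1a\n' + b'\x00' * 4096  # placeholder bytes, not a real PNG


class ImageHandler(http.server.BaseHTTPRequestHandler):
    """Throwaway local server that serves FAKE_IMAGE for any GET."""

    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Type', 'image/png')
        self.send_header('Content-Length', str(len(FAKE_IMAGE)))
        self.end_headers()
        self.wfile.write(FAKE_IMAGE)

    def log_message(self, *args):  # keep the demo quiet
        pass


def download(url, path, session=None, chunk_size=1024):
    """Stream url to path chunk by chunk instead of loading it all into memory."""
    session = session or requests.Session()
    with session.get(url, stream=True, timeout=10) as r:
        r.raise_for_status()
        with open(path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return path


server = http.server.ThreadingHTTPServer(('127.0.0.1', 0), ImageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    local = requests.Session()
    local.trust_env = False  # ignore proxy environment variables for this local demo
    url = f'http://127.0.0.1:{server.server_address[1]}/cat.png'
    target = os.path.join(tempfile.mkdtemp(), 'cat.png')
    download(url, target, session=local)
    with open(target, 'rb') as f:
        downloaded = f.read()
finally:
    server.shutdown()
    server.server_close()
print(len(downloaded))  # 4104
```

For a real image, call download() with the image URL and a local file name; writing in chunks keeps memory use flat regardless of file size.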
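Disabling Parameter Escaping
By default Requests percent-encodes parameter values (a '%' becomes %25). Passing params as a pre-built string sends them as written:
import requests

params = {'username': 'abc', 'password': '%'}
params = '&'.join('{}={}'.format(k, v) for k, v in params.items())  # 'username=abc&password=%'
response = requests.get('https://httpbin.org/get', params=params)
print(response.json())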
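Preparing both variants without sending them shows the difference in the final URL. Keep in mind that a bare '%' is not a valid percent-escape, so some servers may reject it; bypass escaping only when the server really expects the raw value.

```python
import requests

params = {'username': 'abc', 'password': '%'}
url = 'https://httpbin.org/get'

encoded = requests.Request('GET', url, params=params).prepare()
raw = requests.Request('GET', url,
                       params='&'.join(f'{k}={v}' for k, v in params.items())).prepare()

print(encoded.url)  # https://httpbin.org/get?username=abc&password=%25
print(raw.url)      # https://httpbin.org/get?username=abc&password=%
```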
Converting to curl
Installation
pip install curlify
First Try
import curlify
import requests

response = requests.get("http://google.ru")
print(curlify.to_curl(response.request))
# curl -X 'GET' -H 'Accept: */*' -H 'Accept-Encoding: gzip, deflate' -H 'Connection: keep-alive' -H 'User-Agent: python-requests/2.18.4' 'http://www.google.ru/'

print(curlify.to_curl(response.request, compressed=True))
# curl -X 'GET' -H 'Accept: */*' -H 'Accept-Encoding: gzip, deflate' -H 'Connection: keep-alive' -H 'User-Agent: python-requests/2.18.4' --compressed 'http://www.google.ru/'