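Python HTTP Library: requests

Table of Contents

Introduction
Installation
Basic Concepts
- RESTful API
- OAuth 2.0
- Cookies and Sessions
Quick Start
GET Requests
POST Requests
PUT Requests
DELETE Requests
HEAD Requests
OPTIONS Requests
Passing Query Parameters
Response Content
Custom Request Headers
Passing Form Data
Uploading Files
Response Status Codes
Response Headers
Cookies
Redirection and History
Timeouts
Errors and Exceptions
Session Objects
Request and Response Objects
Prepared Requests
SSL Certificate Verification
Client-Side Certificates
CA Certificates
Body Content Workflow
Keep-Alive
Streaming Uploads
Chunk-Encoded Requests
POSTing Multiple Multipart-Encoded Files
Event Hooks
Custom Authentication
Streaming Requests
Proxies
SOCKS
HTTP Verbs
Custom Verbs
Link Headers
Transport Adapters
OAuth Authentication
Downloading Images
Disabling Parameter Escaping
Converting to curl
Wrapping
References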

Introduction

Requests is an elegant and simple HTTP library for Python, built for human beings.

Installation

pip install requests

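Basic Concepts

RESTful API

Each URL represents a resource, and the HTTP verb specifies the operation performed on that resource:

GET (SELECT): retrieve one or more resources
POST (CREATE): create a new resource
PUT (UPDATE): update a resource (the client supplies the complete resource)
PATCH (UPDATE): update a resource (the client supplies only the attributes that change)
DELETE (DELETE): delete a resource
HEAD: retrieve a resource's metadata
OPTIONS: retrieve information about a resource, such as which attributes the client may change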
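As a rough illustration of the verb list above, the sketch below builds (but never sends) one request per verb with requests.Request and prepare(). The URL https://api.example.com/users/1 and the payloads are hypothetical and exist only for this example.

```python
import requests

# Hypothetical resource URL, used only for illustration; nothing is sent.
URL = "https://api.example.com/users/1"

# One (verb, keyword arguments) pair per operation from the list above.
operations = [
    ("GET", {}),
    ("POST", {"json": {"name": "alice"}}),
    ("PUT", {"json": {"name": "alice", "age": 30}}),  # complete resource
    ("PATCH", {"json": {"age": 31}}),                 # changed attribute only
    ("DELETE", {}),
    ("HEAD", {}),
    ("OPTIONS", {}),
]

prepared = {}
for verb, kwargs in operations:
    # prepare() yields the PreparedRequest that would go over the wire.
    prepared[verb] = requests.Request(verb, URL, **kwargs).prepare()
    print(prepared[verb].method, prepared[verb].url, prepared[verb].body)
```

Each verb also has a shortcut function (requests.get, requests.post, requests.put, requests.patch, requests.delete, requests.head, requests.options) that prepares and sends the request in one call.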

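OAuth 2.0

An authorization mechanism that grants a third-party application temporary, limited access on a user's behalf.

Cookies and Sessions

HTTP is stateless: every HTTP request is independent of the others.

To keep state, the server stores a Session and the client (the browser) stores Cookies.

The browser attaches its Cookies to every request, and the server identifies the user from those Cookies.

A Session is a series of actions with a clear beginning and end. A phone call, from picking up the phone and dialing to hanging up, can be called one Session.

On the Web, a Session stores a user's attributes and configuration.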
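The interplay described above can be sketched with the standard library alone. This is a minimal toy model, not how any particular web framework implements sessions; the names SESSIONS, handle_request, and the cookie name session_id are all made up for the example.

```python
import uuid
from http.cookies import SimpleCookie

# Server-side session store: session id -> user attributes (in memory here).
SESSIONS = {}


def handle_request(cookie_header, username=None):
    """Return (session data, Set-Cookie value or None) for one request."""
    cookie = SimpleCookie(cookie_header or "")
    sid = cookie["session_id"].value if "session_id" in cookie else None
    if sid in SESSIONS:
        # Known cookie: the server recognizes the returning user.
        return SESSIONS[sid], None
    # No valid cookie: create a session and ask the client to store its id.
    sid = uuid.uuid4().hex
    SESSIONS[sid] = {"user": username}
    return SESSIONS[sid], "session_id=" + sid


# The first request carries no cookie, so the server issues one.
data, set_cookie = handle_request(None, username="alice")
# The client sends the cookie back on the next request.
data_again, second_set_cookie = handle_request(set_cookie)
print(data_again)
```

Only the opaque session id travels in the cookie; the user attributes stay on the server.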

Quick Start

A GET request that simulates a login:

import requests

r = requests.get('https://api.github.com/user', auth=('user', 'pass'))  # simulate a login with HTTP Basic Auth
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
print(r.text)
print(r.json())

GET Requests

import requests

r = requests.get('https://api.github.com/events')
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
print(r.text)
print(r.json())
# 200
# application/json; charset=utf-8
# utf-8
# ...

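POST Requests

import requests

r = requests.post('https://httpbin.org/post', data={'key': 'value'})
print(r.json())

Using a file stream (multipart upload):

import requests

with open('test.txt', 'rb') as f:
    files = {
        'file': f,
        'key0': (None, 'value0'),  # (None, value) sends a plain form field
        'key1': (None, 'value1'),
    }
    response = requests.post('http://httpbin.org/post', files=files)
print(response.json())

Or, with requests-toolbelt:

import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder

with open('test.txt', 'rb') as f:
    encoder = MultipartEncoder(fields={
        'file': ('test.txt', f, 'text/plain'),
        'key0': 'value0',
        'key1': 'value1',
    })
    # The multipart boundary is part of the Content-Type, so the header must be set explicitly
    response = requests.post('http://httpbin.org/post', data=encoder,
                             headers={'Content-Type': encoder.content_type})
print(response.json())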

PUT Requests

import requests

r = requests.put('https://httpbin.org/put', data={'key': 'value'})
print(r.json())

DELETE Requests

import requests

r = requests.delete('https://httpbin.org/delete')
print(r.json())

HEAD Requests

import requests

r = requests.head('https://httpbin.org/get')
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
print(r.text)  # empty: a HEAD response has no body

OPTIONS Requests

import requests

r = requests.options('https://httpbin.org/get')
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
print(r.text)

Passing Query Parameters

Query parameters are passed in the URL, for example http://httpbin.org/get?key=val

Use the params argument:

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('https://httpbin.org/get', params=payload)
print(r.url)  # https://httpbin.org/get?key1=value1&key2=value2
print(r.json())

payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
r = requests.get('https://httpbin.org/get', params=payload)
print(r.url)  # https://httpbin.org/get?key1=value1&key2=value2&key2=value3
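The encoding rules can also be checked without touching the network. The sketch below prepares a request and inspects the resulting URL; the extra keys skipped and q are made up for the example, to show that None values are dropped and spaces are encoded.

```python
import requests

# Build the request without sending it, then inspect the final URL.
payload = {
    "key1": "value1",
    "key2": ["value2", "value3"],  # a list repeats the key
    "skipped": None,               # None values are left out of the query string
    "q": "a b",                    # special characters are URL-encoded
}
prepared = requests.Request("GET", "https://httpbin.org/get", params=payload).prepare()
print(prepared.url)
# https://httpbin.org/get?key1=value1&key2=value2&key2=value3&q=a+b
```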

Response Content

import requests

r = requests.get('https://api.github.com/events')
print(r.text)
print(r.json())
print(r.encoding)  # utf-8

Custom Request Headers

Pass a dict to the headers argument to add HTTP headers to a request:

import requests

url = 'https://api.github.com/some/endpoint'
headers = {'user-agent': 'my-app/0.0.1'}
r = requests.get(url, headers=headers)
print(r.json())

Passing Form Data

Use the data argument:

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post('https://httpbin.org/post', data=payload)
print(r.json())

payload_tuples = [('key1', 'value1'), ('key1', 'value2')]
r = requests.post('https://httpbin.org/post', data=payload_tuples)
print(r.json())

payload_dict = {'key1': ['value1', 'value2']}
r = requests.post('https://httpbin.org/post', data=payload_dict)
print(r.json())

Passing JSON-encoded data. Both calls send the same body, but only json= also sets the Content-Type header to application/json:

import json
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
url = 'https://api.github.com/some/endpoint'
r = requests.post(url, data=json.dumps(payload))
r = requests.post(url, json=payload)
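How the three ways of passing a body differ can be seen by preparing the requests without sending them. The sketch below uses httpbin.org only as a placeholder URL; the variable names are chosen for this example.

```python
import json
import requests

payload = {"key1": "value1", "key2": "value2"}
url = "https://httpbin.org/post"  # placeholder; nothing is sent

as_form = requests.Request("POST", url, data=payload).prepare()
as_string = requests.Request("POST", url, data=json.dumps(payload)).prepare()
as_json = requests.Request("POST", url, json=payload).prepare()

for name, p in [("data=dict", as_form),
                ("data=json.dumps(...)", as_string),
                ("json=dict", as_json)]:
    # Show the Content-Type Requests chose and the encoded body.
    print(name, "->", p.headers.get("Content-Type"), p.body)
```

A dict passed as data is form-encoded, a string passed as data is sent as-is with no Content-Type added, and json= serializes the dict and labels it application/json.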

Uploading Files

import requests

with open('1.txt', mode='w') as f:
    f.write('123')

url = 'https://httpbin.org/post'
files = {'file': open('1.txt', 'rb')}
r = requests.post(url, files=files)
print(r.json())

files = {'file': ('1.txt', open('1.txt', 'rb'), 'text/plain', {'Expires': '0'})}  # set filename, content_type, and headers
r = requests.post(url, files=files)
print(r.json())

files = {'file': ('1.csv', 'some,data,to,send\nanother,row,to,send\n')}  # send a string as a file
r = requests.post(url, files=files)
print(r.json())

For large files, use requests-toolbelt, which can stream multipart uploads.
Opening files in binary mode is recommended.

Response Status Codes

import requests

r = requests.get('https://httpbin.org/get')
print(r.status_code)  # 200
print(r.status_code == requests.codes.ok)  # True
r.raise_for_status()  # does nothing for a successful status code

bad_r = requests.get('https://httpbin.org/status/404')
print(bad_r.status_code)  # 404
print(bad_r.status_code == requests.codes.not_found)  # True
try:
    bad_r.raise_for_status()
except requests.exceptions.HTTPError as e:
    print(e)  # 404 Client Error: NOT FOUND for url: https://httpbin.org/status/404

Response Headers

import requests

r = requests.get('https://api.github.com/events')
print(r.headers)  # {'Server': 'GitHub.com', 'Date': 'Mon, 05 Sep 2022 10:35:42 GMT', ...}
print(r.headers['content-type'])  # application/json; charset=utf-8
print(r.headers.get('content-type'))  # application/json; charset=utf-8
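The lowercase lookups above work because r.headers is a case-insensitive dictionary. A small offline sketch using requests.structures.CaseInsensitiveDict directly, with a sample header value copied from the output above:

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict({"Content-Type": "application/json; charset=utf-8"})

# Lookups ignore case, as HTTP header names do.
print(headers["content-type"])
print(headers.get("CONTENT-TYPE"))
# Missing headers behave like missing dict keys.
print(headers.get("x-missing"))
# Iteration keeps the casing the header was stored with.
print(list(headers))
```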

Cookies

import requests

url = 'https://httpbin.org/cookies'
cookies = dict(cookies_are='working')
r = requests.get(url, cookies=cookies)
print(r.json())
print(r.cookies)

jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
url = 'https://httpbin.org/cookies'
r = requests.get(url, cookies=jar)
print(r.json())
print(r.cookies)

Redirection and History

Use the history attribute of the Response object to track redirects:

import requests

r = requests.get('http://github.com/')
print(r.url)  # 'https://github.com/'
print(r.status_code)  # 200
print(r.history)  # [<Response [301]>]

r = requests.get('http://github.com/', allow_redirects=False)  # disable redirect handling
print(r.status_code)  # 301
print(r.history)  # []

r = requests.head('http://github.com/', allow_redirects=True)  # HEAD does not follow redirects unless asked to
print(r.url)  # 'https://github.com/'
print(r.history)  # [<Response [301]>]

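Timeouts

The timeout parameter sets how many seconds to wait for the server to respond before giving up:

import requests

try:
    requests.get('https://github.com/', timeout=0.001)
except requests.exceptions.Timeout as e:
    print(e)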
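timeout also accepts a (connect, read) tuple. The sketch below tries this against a local socket that accepts connections but never replies, so it needs no internet access. The listener, the port it gets, and the chosen timeout values are assumptions made for the demonstration.

```python
import socket
import requests

# A local socket that accepts TCP connections but never answers, standing in
# for a slow server (an assumption so the example runs offline).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
url = "http://127.0.0.1:%d/" % listener.getsockname()[1]

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local test
try:
    # timeout=(connect, read): up to 3.05 s to connect, 0.5 s to wait for data.
    session.get(url, timeout=(3.05, 0.5))
    outcome = "completed"
except requests.exceptions.ReadTimeout:
    outcome = "read timeout"
except requests.exceptions.ConnectTimeout:
    outcome = "connect timeout"
finally:
    session.close()
    listener.close()
print(outcome)
```

Both ConnectTimeout and ReadTimeout derive from requests.exceptions.Timeout, so catching Timeout covers either case.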

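Errors and Exceptions

A network problem, such as a DNS failure or a refused connection, raises ConnectionError.

If the HTTP request returned an unsuccessful status code, Response.raise_for_status() raises HTTPError.

A request that times out raises Timeout.

A request that exceeds the maximum number of redirections raises TooManyRedirects.

All of these exceptions inherit from requests.exceptions.RequestException.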
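The hierarchy above can be checked offline. The sketch below also shows one possible way to order except clauses from specific to general; safe_get is a hypothetical helper written for this example, not part of Requests.

```python
import requests
from requests import exceptions as exc

# Each exception listed above is a RequestException subclass.
hierarchy = {
    cls.__name__: issubclass(cls, exc.RequestException)
    for cls in (exc.ConnectionError, exc.HTTPError, exc.Timeout, exc.TooManyRedirects)
}
print(hierarchy)

# A hand-built Response (no network) shows raise_for_status() raising HTTPError.
response = requests.Response()
response.status_code = 404
response.url = "https://httpbin.org/status/404"
try:
    response.raise_for_status()
    caught = None
except exc.RequestException as error:
    caught = error
print(type(caught).__name__, caught)


def safe_get(url, timeout=5):
    """Hypothetical helper: return the response, or None on any Requests error."""
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()
        return r
    except exc.Timeout:
        print("timed out:", url)
    except exc.ConnectionError:
        print("network problem:", url)
    except exc.RequestException as error:
        print("request failed:", error)
    return None
```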

Session Objects

A Session object persists certain parameters, such as Cookies, across requests.
When several requests go to the same host, reusing the underlying TCP connection can improve performance significantly.
A Session object has all the API methods shown above and can also supply default data for requests.
Even when a Session is used, method-level parameters are not persisted across requests.
To add Cookies manually, use Session.cookies.
A Session object can be used as a context manager.
To omit a session-level value for a single request, set that key to None in the method-level parameter.

import requests

# A Session persists certain parameters, such as Cookies, across requests
s = requests.Session()
s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('https://httpbin.org/cookies')
print(r.json())  # {'cookies': {'sessioncookie': '123456789'}}

# A Session has all the API methods and can supply default data for requests
s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})
s.get('https://httpbin.org/headers', headers={'x-test2': 'true'})  # sends both x-test and x-test2

# Even with a Session, method-level parameters are not persisted across requests
s = requests.Session()
r = s.get('https://httpbin.org/cookies', cookies={'from-my': 'browser'})
print(r.json())  # {'cookies': {'from-my': 'browser'}}
r = s.get('https://httpbin.org/cookies')
print(r.json())  # {'cookies': {}}

# A Session can be used as a context manager
with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')

Keeping a Session alive is like making every request from a single open browser:

import requests

with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
    r = s.get('https://httpbin.org/cookies')
    print(r.json())
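The merging of session defaults with method-level values, including the None trick, can be observed offline through Session.prepare_request. The header names below follow the example above; the variable names are chosen for this sketch.

```python
import requests

s = requests.Session()
s.headers.update({"x-test": "true"})

# Method-level headers are merged with the session defaults.
merged = s.prepare_request(
    requests.Request("GET", "https://httpbin.org/headers", headers={"x-test2": "true"})
)
# Setting a key to None at method level drops the session default for this request only.
omitted = s.prepare_request(
    requests.Request("GET", "https://httpbin.org/headers", headers={"x-test": None})
)

print(merged.headers["x-test"], merged.headers["x-test2"])
print("x-test" in omitted.headers, s.headers["x-test"])
```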

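Request and Response Objects

Calling requests.get() actually does two things:

It constructs a Request object, which is sent to the server to request a resource.
Once the server responds, it generates a Response object.

The Response object holds all the information returned by the server and also contains the Request object that was originally created.

import requests

r = requests.get('https://en.wikipedia.org/wiki/Monty_Python')
print(r.headers)  # headers the server sent back
print(r.request.headers)  # headers that were sent to the server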

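Prepared Requests

However a request is issued, what is actually sent is a PreparedRequest.

If the body or headers need to be modified before the request is sent, see the Prepared Requests section of the original Requests documentation.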
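A minimal sketch of that workflow, assuming the usual pattern of Session.prepare_request followed by Session.send. The X-Trace header is a made-up name for illustration, and the send call is left commented out so the example runs offline.

```python
import requests

s = requests.Session()
req = requests.Request(
    "POST",
    "https://httpbin.org/post",
    data={"key": "value"},
    headers={"X-Trace": "demo"},  # hypothetical header, for illustration only
)
# prepare_request applies session-level state such as cookies and default headers.
prepped = s.prepare_request(req)

# Tweak the prepared request before it is sent.
prepped.headers["X-Trace"] = "edited"
del prepped.headers["Content-Type"]
print(prepped.method, prepped.url, prepped.body)

# Sending is a separate step; uncomment to perform the request.
# resp = s.send(prepped, timeout=5)
```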

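SSL Certificate Verification

Requests verifies SSL certificates for HTTPS requests, just like a web browser. If a certificate cannot be verified, it raises SSLError.

The verify parameter can point to a CA bundle (a certificate file or directory).
The list of trusted CAs can also be specified through the REQUESTS_CA_BUNDLE environment variable. If REQUESTS_CA_BUNDLE is not set, CURL_CA_BUNDLE is used as a fallback.
Setting verify to False disables SSL certificate verification. Requests will then accept any TLS certificate presented by the server and will ignore hostname mismatches and expired certificates, which exposes the application to man-in-the-middle (MitM) attacks.
verify defaults to True, and it applies only to host certificates.

import requests

r = requests.get('https://requestb.in')  # raises SSLError if the certificate cannot be verified
print(r.text)

r = requests.get('https://github.com')
print(r.text)

r = requests.get('https://github.com', verify='/path/to/certfile')
# The session-level setting below works in a similar way
s = requests.Session()
s.verify = '/path/to/certfile'

r = requests.get('https://kennethreitz.org', verify=False)
print(r)  # <Response [200]>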

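Client-Side Certificates

A local certificate can be used as the client-side certificate, given either as a single file (containing the private key and the certificate) or as a tuple of the two file paths:

import requests

requests.get('https://kennethreitz.org', cert=('/path/client.cert', '/path/client.key'))
# or
s = requests.Session()
s.cert = '/path/client.cert'

The private key of the local certificate must be unencrypted.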

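CA Certificates

Requests uses the certificates from the certifi package, which lets users update their trusted certificates without upgrading Requests.

Before version 2.16, Requests bundled a set of root CAs sourced from Mozilla, and the certificates were updated only with each Requests release.

When certifi was not installed, older versions of Requests could end up with severely out-of-date certificates.

For security reasons, update certificates frequently!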

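Body Content Workflow

By default, the response body is downloaded as soon as the request is made. With stream=True, the body is downloaded only when Response.content is accessed.
With stream=True, the connection is not released back to the pool until all the data has been consumed or Response.close() is called, which can make connection use inefficient. A context manager is recommended.

import requests

tarball_url = 'https://github.com/psf/requests/tarball/main'
r = requests.get(tarball_url, stream=True)  # only the headers have been downloaded; the connection stays open, so the body can be fetched conditionally

TOO_LONG = 1024
if int(r.headers['content-length']) < TOO_LONG:
    content = r.content
    ...

with requests.get('https://httpbin.org/get', stream=True) as r:
    ...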
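A common use of stream=True is saving a large body to disk without holding it all in memory. The sketch below is one way to do that with Response.iter_content; save_body and download are hypothetical helper names, and the chunk size and timeout are arbitrary choices.

```python
import requests


def save_body(response, path, chunk_size=8192):
    """Write a streamed response body to path chunk by chunk; return bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            written += len(chunk)
    return written


def download(url, path, timeout=10):
    """Hypothetical helper: stream url to path and always release the connection."""
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()
        return save_body(r, path)
```

For example, download('https://httpbin.org/image/png', 'image.png') would save a sample image, assuming the endpoint is reachable.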

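Keep-Alive

Requests made within a Session use keep-alive (persistent) connections, and suitable connections are reused automatically.

Streaming Uploads

Pass a file-like object as data to stream a large body without reading it into memory:

import requests

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)

Chunk-Encoded Requests

Pass a generator (or any iterator without a length) as data to send a chunk-encoded request:

import requests


def gen():
    # Yield bytes so the chunks can be written to the socket as-is
    yield b'hi'
    yield b'there'


requests.post('http://some.url/chunked', data=gen())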
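The difference between the two upload styles shows up in the headers Requests prepares. A small offline sketch, with io.BytesIO standing in for an open file and http://some.url/upload as a placeholder URL:

```python
import io
import requests

url = "http://some.url/upload"  # placeholder host; nothing is sent

# A file-like object with a known size gets a Content-Length header.
streamed = requests.Request("POST", url, data=io.BytesIO(b"hello world")).prepare()


def gen():
    yield b"hi"
    yield b"there"


# A generator has no known length, so the body is marked as chunked.
chunked = requests.Request("POST", url, data=gen()).prepare()

print(streamed.headers.get("Content-Length"), streamed.headers.get("Transfer-Encoding"))
print(chunked.headers.get("Content-Length"), chunked.headers.get("Transfer-Encoding"))
```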

POSTing Multiple Multipart-Encoded Files

import requests

url = 'https://httpbin.org/post'
multiple_files = [
    ('images', ('foo.png', open('foo.png', 'rb'), 'image/png')),
    ('images', ('bar.png', open('bar.png', 'rb'), 'image/png')),
]
r = requests.post(url, files=multiple_files)
print(r.text)

Event Hooks

import requests


def print_url(r, *args, **kwargs):
    print(r.url)


def record_hook(r, *args, **kwargs):
    r.hook_called = True
    return r


r = requests.get('https://httpbin.org/', hooks={'response': print_url})
print(r)
# https://httpbin.org/
# <Response [200]>

r = requests.get('https://httpbin.org/', hooks={'response': [print_url, record_hook]})
print(r.hook_called)
# https://httpbin.org/
# True

s = requests.Session()
s.hooks['response'].append(print_url)
print(s.get('https://httpbin.org/'))
# https://httpbin.org/
# <Response [200]>

Custom Authentication

import requests
from requests.auth import AuthBase


class PizzaAuth(AuthBase):
    """Attaches HTTP Pizza Authentication to the given Request object."""

    def __init__(self, username):
        # setup any auth-related data here
        self.username = username

    def __call__(self, r):
        # modify and return the request
        r.headers['X-Pizza'] = self.username
        return r


print(requests.get('http://pizzabin.org/admin', auth=PizzaAuth('kenneth')))

Streaming Requests

import json
import requests

r = requests.get('https://httpbin.org/stream/20', stream=True)
for line in r.iter_lines():
    if line:  # skip keep-alive blank lines
        decoded_line = line.decode('utf-8')
        print(json.loads(decoded_line))

r = requests.get('https://httpbin.org/stream/20', stream=True)
if r.encoding is None:
    r.encoding = 'utf-8'  # fallback encoding when the server does not provide one
for line in r.iter_lines(decode_unicode=True):
    if line:
        print(json.loads(line))

Proxies

Use the proxies parameter to configure proxies:

import requests

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
requests.get('http://example.org', proxies=proxies)

# Or configure them once for an entire Session
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
session = requests.Session()
session.proxies.update(proxies)
session.get('http://example.org')

When the proxy configuration is not overridden per request as shown above, Requests falls back to the proxy settings defined by the standard environment variables:

export HTTP_PROXY="http://10.10.1.10:3128"
export HTTPS_PROXY="http://10.10.1.10:1080"
export ALL_PROXY="socks5://10.10.1.10:3434"
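To see which proxies Requests would pick up from the environment, requests.utils.get_environ_proxies can be called directly. The sketch below sets the variables in-process with the sample addresses from above. It uses the lowercase names because they take precedence when both forms are set, and it clears no_proxy so the result does not depend on the surrounding shell.

```python
import os
import requests

# Sample proxy addresses from the exports above, set in-process for the demo.
os.environ["http_proxy"] = "http://10.10.1.10:3128"
os.environ["https_proxy"] = "http://10.10.1.10:1080"
for name in ("no_proxy", "NO_PROXY"):
    os.environ.pop(name, None)

# get_environ_proxies returns the environment proxies that apply to a URL.
env_proxies = requests.utils.get_environ_proxies("http://example.org")
print(env_proxies.get("http"), env_proxies.get("https"))

# A session with trust_env = False ignores these variables.
s = requests.Session()
s.trust_env = False
settings = s.merge_environment_settings("http://example.org", {}, None, None, None)
print(dict(settings["proxies"]))
```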

SOCKS

HTTP Verbs

Custom Verbs

Link Headers

Transport Adapters

OAuth Authentication

Installation

pip install requests-oauthlib

Code

from requests_oauthlib import OAuth1Session

twitter = OAuth1Session('client_key',
                        client_secret='client_secret',
                        resource_owner_key='resource_owner_key',
                        resource_owner_secret='resource_owner_secret')
url = 'https://api.twitter.com/1/account/settings.json'
r = twitter.get(url)

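Downloading Images

Disabling Parameter Escaping

Passing params as a pre-built string instead of a dict keeps characters such as % from being percent-encoded:

import requests

params = {
    'username': 'abc',
    'password': '%',
}
params = '&'.join('{}={}'.format(k, v) for k, v in params.items())
response = requests.get('https://httpbin.org/get', params=params)
print(response.json())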
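The effect can be compared offline by preparing both variants and looking at the URLs. This sketch reuses the values from the example above; note that Requests may still re-quote characters that are not valid in a URL at all, so the string form is not a fully raw pass-through.

```python
import requests

url = "https://httpbin.org/get"
params = {"username": "abc", "password": "%"}

# Dict params are URL-encoded: '%' becomes '%25'.
encoded = requests.Request("GET", url, params=params).prepare().url

# A pre-built query string skips that encoding step.
raw_query = "&".join("{}={}".format(k, v) for k, v in params.items())
unescaped = requests.Request("GET", url, params=raw_query).prepare().url

print(encoded)    # https://httpbin.org/get?username=abc&password=%25
print(unescaped)  # https://httpbin.org/get?username=abc&password=%
```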

Converting to curl

Installation

pip install curlify

Quick Start

import curlify
import requests

response = requests.get("http://google.ru")
print(curlify.to_curl(response.request))
# curl -X 'GET' -H 'Accept: */*' -H 'Accept-Encoding: gzip, deflate' -H 'Connection: keep-alive' -H 'User-Agent: python-requests/2.18.4' 'http://www.google.ru/'

print(curlify.to_curl(response.request, compressed=True))
# curl -X 'GET' -H 'Accept: */*' -H 'Accept-Encoding: gzip, deflate' -H 'Connection: keep-alive' -H 'User-Agent: python-requests/2.18.4' --compressed 'http://www.google.ru/'