Python HTTP Library: requests

2025/4/21 4:25:56  Source: https://blog.csdn.net/lly1122334/article/details/106781429

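Contents

  • Introduction
  • Installation
  • Basic Concepts
    • RESTful API
    • OAuth 2.0
    • Cookies and Sessions
  • Quick Start
  • GET Requests
  • POST Requests
  • PUT Requests
  • DELETE Requests
  • HEAD Requests
  • OPTIONS Requests
  • Passing Query Parameters
  • Response Content
  • Custom Request Headers
  • Passing Form Data
  • Uploading Files
  • Response Status Codes
  • Response Headers
  • Cookies
  • Redirection and History
  • Timeouts
  • Errors and Exceptions
  • Session Objects
  • Request and Response Objects
  • Prepared Requests
  • SSL Certificate Verification
  • Client-Side Certificates
  • CA Certificates
  • Body Content Workflow
  • Keep-Alive
  • Streaming Uploads
  • Chunk-Encoded Requests
  • POSTing Multiple Multipart-Encoded Files
  • Event Hooks
  • Custom Authentication
  • Streaming Requests
  • Proxies
  • SOCKS
  • HTTP Verbs
  • Custom Verbs
  • Link Headers
  • Transport Adapters
  • OAuth Authentication
  • Downloading Images
  • Disabling Parameter Escaping
  • Converting to curl
  • Wrapper
  • References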

Introduction

Requests is an elegant and simple HTTP library for Python, built for human beings.




Installation

pip install requests




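Basic Concepts

RESTful API

Each URL represents a resource, and the HTTP verb expresses the operation to perform on that resource:

  • GET (SELECT): retrieve one or more resources
  • POST (CREATE): create a new resource
  • PUT (UPDATE): update a resource and return the complete resource
  • PATCH (UPDATE): update a resource and return only the attributes that changed
  • DELETE (DELETE): delete a resource
  • HEAD: retrieve a resource's metadata
  • OPTIONS: retrieve information about a resource, such as which attributes the client may change



OAuth 2.0

A mechanism for granting temporary authorization.



Cookies and Sessions

HTTP is stateless: every HTTP request is independent of the others.

To keep state, the server stores a Session and the client (the browser) stores Cookies.

The browser attaches its Cookies to every request, and the server uses them to identify the user.

A Session is a series of actions with a clear beginning and end. A phone call, from picking up the phone and dialing to hanging up, can be called a Session.

On the Web, a Session stores a user's attributes and configuration.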
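As a minimal sketch of the client side of this idea, assuming the requests library is installed: a requests Session keeps a cookie jar, much like a browser, and attaches matching cookies to every request it prepares. Nothing is sent over the network here, and the cookie name and value are made up for illustration.

```python
import requests

# A Session owns a client-side cookie jar, much like a browser profile.
session = requests.Session()
session.cookies.set('sessionid', 'abc123', domain='httpbin.org', path='/')  # hypothetical cookie

# Preparing a request (without sending it) shows what would go over the wire.
request = requests.Request('GET', 'https://httpbin.org/cookies')
prepared = session.prepare_request(request)

cookie_header = prepared.headers.get('Cookie')
print(cookie_header)
```

The server would read this Cookie header to look up the matching server-side Session.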




Quick Start

A GET request that simulates a login:

import requests

r = requests.get('https://api.github.com/user', auth=('user', 'pass'))  # simulate a login with HTTP basic auth
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
print(r.text)
print(r.json())




GET Requests

import requests

r = requests.get('https://api.github.com/events')
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
print(r.text)
print(r.json())
# 200
# application/json; charset=utf-8
# utf-8
# ...




POST Requests

import requests

r = requests.post('https://httpbin.org/post', data={'key': 'value'})
print(r.json())

Using a file stream

import requests

files = {
    'file': open('test.txt', 'rb'),
    'key0': (None, 'value0'),
    'key1': (None, 'value1'),
}
response = requests.post('http://httpbin.org/post', files=files)
print(response.json())

import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder

data = {
    'file': ('test.txt', open('test.txt', 'rb'), 'text/plain'),
    'key0': 'value0',
    'key1': 'value1',
}
encoder = MultipartEncoder(fields=data)
# the multipart boundary is carried in the Content-Type header, so send it explicitly
response = requests.post('http://httpbin.org/post', data=encoder,
                         headers={'Content-Type': encoder.content_type})
print(response.json())




PUT Requests

import requests

r = requests.put('https://httpbin.org/put', data={'key': 'value'})
print(r.json())




DELETE Requests

import requests

r = requests.delete('https://httpbin.org/delete')
print(r.json())




HEAD Requests

import requests

r = requests.head('https://httpbin.org/get')
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
print(r.text)  # empty: the response to a HEAD request has no body




OPTIONS Requests

import requests

r = requests.options('https://httpbin.org/get')
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
print(r.text)




Passing Query Parameters

Query parameters are passed in the URL, for example http://httpbin.org/get?key=val

Use the params argument:

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('https://httpbin.org/get', params=payload)
print(r.url)  # https://httpbin.org/get?key1=value1&key2=value2
print(r.json())

# a list value repeats the key once per item
payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
r = requests.get('https://httpbin.org/get', params=payload)
print(r.url)  # https://httpbin.org/get?key1=value1&key2=value2&key2=value3




Response Content

import requests

r = requests.get('https://api.github.com/events')
print(r.text)
print(r.json())
print(r.encoding)  # utf-8




Custom Request Headers

import requests

url = 'https://api.github.com/some/endpoint'
headers = {'user-agent': 'my-app/0.0.1'}
r = requests.get(url, headers=headers)
print(r.json())




Passing Form Data

Use the data argument:

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post('https://httpbin.org/post', data=payload)
print(r.json())

# several values for one key, as a list of tuples
payload_tuples = [('key1', 'value1'), ('key1', 'value2')]
r = requests.post('https://httpbin.org/post', data=payload_tuples)
print(r.json())

# several values for one key, as a dict with a list value
payload_dict = {'key1': ['value1', 'value2']}
r = requests.post('https://httpbin.org/post', data=payload_dict)
print(r.json())

To send JSON-encoded data, either of the two calls below works. They send the same body, but only json= also sets the Content-Type header to application/json.

import json
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
url = 'https://api.github.com/some/endpoint'
r = requests.post(url, data=json.dumps(payload))
r = requests.post(url, json=payload)
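The two JSON calls above carry the same body but are not identical requests. The sketch below prepares both variants without sending them and compares what would go over the wire. It assumes only the requests library; the endpoint is the placeholder URL used above.

```python
import json

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
url = 'https://api.github.com/some/endpoint'  # placeholder endpoint from the text above

# Prepare both variants without sending them, then compare what would be sent.
via_data = requests.Request('POST', url, data=json.dumps(payload)).prepare()
via_json = requests.Request('POST', url, json=payload).prepare()

print(via_data.headers.get('Content-Type'))  # no Content-Type is set for a plain string body
print(via_json.headers.get('Content-Type'))
print(json.loads(via_data.body) == json.loads(via_json.body))
```

When a server insists on a JSON content type, json= is the more convenient form; with data=json.dumps(...) the header would have to be added by hand.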




Uploading Files

import requests

# create a small sample file to upload
with open('1.txt', mode='w') as f:
    f.write('123')

url = 'https://httpbin.org/post'

with open('1.txt', 'rb') as f:
    files = {'file': f}
    r = requests.post(url, files=files)
print(r.json())

# set the filename, content_type and headers explicitly
with open('1.txt', 'rb') as f:
    files = {'file': ('1.txt', f, 'text/plain', {'Expires': '0'})}
    r = requests.post(url, files=files)
print(r.json())

# send a string as a file
files = {'file': ('1.csv', 'some,data,to,send\nanother,row,to,send\n')}
r = requests.post(url, files=files)
print(r.json())

  • For large files, use requests-toolbelt
  • Opening files in binary mode is recommended




Response Status Codes

import requests

r = requests.get('https://httpbin.org/get')
print(r.status_code)  # 200
print(r.status_code == requests.codes.ok)  # True
r.raise_for_status()  # does nothing for a successful response

bad_r = requests.get('https://httpbin.org/status/404')
print(bad_r.status_code)  # 404
print(bad_r.status_code == requests.codes.not_found)  # True
try:
    bad_r.raise_for_status()
except requests.exceptions.HTTPError as e:
    print(e)  # 404 Client Error: NOT FOUND for url: https://httpbin.org/status/404




Response Headers

import requests

r = requests.get('https://api.github.com/events')
print(r.headers)  # {'Server': 'GitHub.com', 'Date': 'Mon, 05 Sep 2022 10:35:42 GMT', ...}
print(r.headers['content-type'])  # application/json; charset=utf-8
print(r.headers.get('content-type'))  # application/json; charset=utf-8




Cookies

import requests

url = 'https://httpbin.org/cookies'
cookies = dict(cookies_are='working')
r = requests.get(url, cookies=cookies)
print(r.json())
print(r.cookies)

jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
url = 'https://httpbin.org/cookies'
r = requests.get(url, cookies=jar)
print(r.json())
print(r.cookies)




Redirection and History

Use the history attribute of the response object to track redirects.

import requests

r = requests.get('http://github.com/')
print(r.url)  # https://github.com/
print(r.status_code)  # 200
print(r.history)  # [<Response [301]>]

r = requests.get('http://github.com/', allow_redirects=False)  # disable redirects
print(r.status_code)  # 301
print(r.history)  # []

r = requests.head('http://github.com/', allow_redirects=True)
print(r.url)  # https://github.com/
print(r.history)  # [<Response [301]>]




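Timeouts

The timeout parameter tells Requests to stop waiting for a response after the given number of seconds.

import requests

try:
    requests.get('https://github.com/', timeout=0.001)
except requests.exceptions.Timeout as e:
    print(e)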
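The timeout value may also be a (connect, read) tuple, so that the two phases get separate limits. The sketch below is an illustration only: it starts a deliberately slow server on localhost (SlowHandler is a hypothetical helper, not part of Requests) so that the read timeout fires without touching the network.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical local handler that waits one second before replying."""

    def do_GET(self):
        time.sleep(1.0)
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'ok')
        except OSError:
            pass  # the client has already given up and closed the socket

    def log_message(self, *args):
        pass  # keep the example quiet


server = HTTPServer(('127.0.0.1', 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:{}/'.format(server.server_port)

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local test

try:
    # (connect timeout, read timeout) in seconds
    session.get(url, timeout=(3.05, 0.2))
    outcome = 'completed'
except requests.exceptions.ReadTimeout:
    outcome = 'read timeout'
finally:
    server.shutdown()
    server.server_close()

print(outcome)
```

The connection succeeds well within 3.05 seconds, but the server takes longer than 0.2 seconds to answer, so the read phase times out.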




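Errors and Exceptions

A network problem, such as a DNS failure or a refused connection, raises ConnectionError.

If the HTTP request returns an unsuccessful status code, Response.raise_for_status() raises HTTPError.

If the request times out, Timeout is raised.

If the request exceeds the maximum number of redirects, TooManyRedirects is raised.

All of these exceptions inherit from requests.RequestException.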
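A minimal sketch of how these exceptions might be handled together. The fetch helper and its messages are hypothetical; the exception classes are the ones named above. Timeout is caught before ConnectionError because a connect timeout is a subclass of both.

```python
import requests


def fetch(url, timeout=5):
    """Hypothetical helper: return the response text, or None on any Requests error."""
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()  # turn 4xx/5xx status codes into HTTPError
        return r.text
    except requests.exceptions.Timeout:
        print('request timed out')
    except requests.exceptions.ConnectionError:
        print('network problem (DNS failure, refused connection, ...)')
    except requests.exceptions.HTTPError as e:
        print('unsuccessful status code:', e.response.status_code)
    except requests.exceptions.TooManyRedirects:
        print('too many redirects')
    except requests.exceptions.RequestException as e:
        print('other Requests error:', e)
    return None


# Every specific exception derives from RequestException.
specific = (
    requests.exceptions.ConnectionError,
    requests.exceptions.HTTPError,
    requests.exceptions.Timeout,
    requests.exceptions.TooManyRedirects,
)
all_derived = all(issubclass(exc, requests.RequestException) for exc in specific)
print(all_derived)
```

Catching requests.RequestException last acts as a safety net for errors that have no dedicated branch, such as a malformed URL.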




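Session Objects

  • A Session object persists certain parameters across requests, such as cookies
  • When several requests go to the same host, reusing the underlying TCP connection can improve performance significantly
  • A Session object has all the API methods shown above and can also provide default data for requests
  • Even when using a Session object, method-level parameters are not persisted across requests
  • To add cookies manually, use Session.cookies
  • A Session object can be used as a context manager
  • To omit a session-level value for a single request, set that key to None in the method-level parameter

import requests

# A Session object persists certain parameters across requests, such as cookies
s = requests.Session()
s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('https://httpbin.org/cookies')
print(r.json())  # {'cookies': {'sessioncookie': '123456789'}}

# A Session object has all the API methods and can also provide default data for requests
s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})
s.get('https://httpbin.org/headers', headers={'x-test2': 'true'})  # sends both x-test and x-test2

# Even when using a Session object, method-level parameters are not persisted across requests
s = requests.Session()
r = s.get('https://httpbin.org/cookies', cookies={'from-my': 'browser'})
print(r.json())  # {'cookies': {'from-my': 'browser'}}
r = s.get('https://httpbin.org/cookies')
print(r.json())  # {'cookies': {}}

# A Session object can be used as a context manager
with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')

Keeping one Session alive is like sending every request from a single open browser.

import requests

with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
    r = s.get('https://httpbin.org/cookies')
    print(r.json())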
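The last bullet in the list above (omitting a session-level value by passing None) can be sketched without a network call by preparing the request through the Session. This assumes only the requests library; the header name follows the x-test example used in this section.

```python
import requests

s = requests.Session()
s.headers.update({'x-test': 'true'})  # session-level default header

# A method-level value of None removes the session-level key for this request only.
req = requests.Request('GET', 'https://httpbin.org/headers', headers={'x-test': None})
prepared = s.prepare_request(req)

print('x-test' in prepared.headers)
print('x-test' in s.headers)
```

The header is dropped from this one prepared request, while the Session keeps its default for later requests.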




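Request and Response Objects

Calling requests.get() actually does two things:

  1. It constructs a Request object, which is sent to the server to request a resource
  2. Once the server responds, it generates a Response object

The Response object holds all the information returned by the server and also contains the Request object that was originally created.

import requests

r = requests.get('https://en.wikipedia.org/wiki/Monty_Python')
print(r.headers)  # headers the server sent back
print(r.request.headers)  # headers we sent to the server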




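Prepared Requests

However a request is made, what is actually sent is a PreparedRequest.

To modify the body or the headers before the request is sent, see the Advanced Usage section of the Requests documentation.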
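A minimal sketch of that workflow, assuming only the requests library: build a Request, let the Session prepare it, adjust the prepared request, and send it as a separate step. The header names and values are hypothetical.

```python
import requests

s = requests.Session()
s.headers.update({'x-test': 'true'})  # hypothetical session-level header

req = requests.Request('POST', 'https://httpbin.org/post', data={'key': 'value'})
prepped = s.prepare_request(req)  # applies session-level state such as headers and cookies

# Adjust the prepared request before it is sent.
prepped.headers['X-Custom'] = 'demo'  # hypothetical header
print(prepped.method, prepped.url)
print(prepped.body)

# Sending is a separate step (requires network access):
# resp = s.send(prepped, timeout=5)
# print(resp.status_code)
```

Calling Request.prepare() directly also works, but Session.prepare_request() is the variant that merges in the Session's own headers and cookies.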




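SSL Certificate Verification

Requests verifies SSL certificates for HTTPS requests, just like a web browser. If a certificate cannot be verified, an SSLError is raised.

  • The verify parameter can point to a CA certificate file
  • The list of trusted CAs can also be specified through the REQUESTS_CA_BUNDLE environment variable. If REQUESTS_CA_BUNDLE is not set, CURL_CA_BUNDLE is used as a fallback
  • Setting verify to False disables SSL certificate verification. Requests then accepts any TLS certificate presented by the server and ignores hostname mismatches and expired certificates, which leaves the application open to man-in-the-middle (MitM) attacks
  • verify defaults to True, and verification applies only to host certificates

import requests

r = requests.get('https://requestb.in')  # raises requests.exceptions.SSLError if the certificate cannot be verified
print(r.text)

r = requests.get('https://github.com')
print(r.text)

r = requests.get('https://github.com', verify='/path/to/certfile')
# the Session-level setting below is similar to the call above
s = requests.Session()
s.verify = '/path/to/certfile'

r = requests.get('https://kennethreitz.org', verify=False)
print(r)  # <Response [200]>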




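Client-Side Certificates

A local certificate can be used as the client-side certificate. It is given either as a single file (containing the private key and the certificate) or as a tuple of the two file paths.

import requests

requests.get('https://kennethreitz.org', cert=('/path/client.cert', '/path/client.key'))
# or
s = requests.Session()
s.cert = '/path/client.cert'

The private key of the local certificate must be unencrypted.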




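CA Certificates

Requests uses the certificates from the certifi package, which allows the trusted certificates to be updated without changing the version of Requests.

Before version 2.16, Requests bundled a set of root CAs sourced from Mozilla, and the certificates were updated with every Requests release.

When certifi was not installed, older versions of Requests ended up with very outdated certificates.

For the sake of security, update the certificates frequently!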




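Body Content Workflow

  • By default, the response body is downloaded immediately when a request is made. With stream=True, the download is deferred until Response.content is accessed
  • With stream=True, the connection is not released until all the data has been consumed or Response.close() is called. This can be inefficient, so using a context manager is recommended

import requests

tarball_url = 'https://github.com/psf/requests/tarball/main'
# only the response headers have been downloaded at this point; the connection
# stays open, so the content can be retrieved conditionally
r = requests.get(tarball_url, stream=True)

TOO_LONG = 1024
if int(r.headers['content-length']) < TOO_LONG:
    content = r.content
    ...

with requests.get('https://httpbin.org/get', stream=True) as r:
    ...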




Keep-Alive

Every request made within a Session uses a keep-alive connection, and suitable connections are reused automatically.




Streaming Uploads

import requests

# passing a file-like object as data streams it instead of reading it into memory
with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)




Chunk-Encoded Requests

import requests


def gen():
    # a generator without a known length is sent with chunked transfer encoding
    yield 'hi'
    yield 'there'


requests.post('http://some.url/chunked', data=gen())




POSTing Multiple Multipart-Encoded Files

import requests

url = 'https://httpbin.org/post'
# a list of (form_field_name, file_info) tuples; the field name may repeat
multiple_files = [
    ('images', ('foo.png', open('foo.png', 'rb'), 'image/png')),
    ('images', ('bar.png', open('bar.png', 'rb'), 'image/png')),
]
r = requests.post(url, files=multiple_files)
print(r.text)




Event Hooks

import requests


def print_url(r, *args, **kwargs):
    print(r.url)


def record_hook(r, *args, **kwargs):
    r.hook_called = True
    return r


r = requests.get('https://httpbin.org/', hooks={'response': print_url})
print(r)
# https://httpbin.org/
# <Response [200]>

r = requests.get('https://httpbin.org/', hooks={'response': [print_url, record_hook]})
print(r.hook_called)
# https://httpbin.org/
# True

# a hook added to a Session runs on every request made with it
s = requests.Session()
s.hooks['response'].append(print_url)
print(s.get('https://httpbin.org/'))
# https://httpbin.org/
# <Response [200]>




Custom Authentication

import requests
from requests.auth import AuthBase


class PizzaAuth(AuthBase):
    """Attaches HTTP Pizza Authentication to the given Request object."""

    def __init__(self, username):
        # setup any auth-related data here
        self.username = username

    def __call__(self, r):
        # modify and return the request
        r.headers['X-Pizza'] = self.username
        return r


print(requests.get('http://pizzabin.org/admin', auth=PizzaAuth('kenneth')))




Streaming Requests

import json

import requests

r = requests.get('https://httpbin.org/stream/20', stream=True)
for line in r.iter_lines():
    if line:  # skip empty keep-alive lines
        decoded_line = line.decode('utf-8')
        print(json.loads(decoded_line))

# alternatively, let Requests decode each line
r = requests.get('https://httpbin.org/stream/20', stream=True)
if r.encoding is None:
    r.encoding = 'utf-8'
for line in r.iter_lines(decode_unicode=True):
    if line:
        print(json.loads(line))




Proxies

Use the proxies parameter to configure proxies.

import requests

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
requests.get('http://example.org', proxies=proxies)

# or configure the proxies once for an entire Session
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
session = requests.Session()
session.proxies.update(proxies)
session.get('http://example.org')

When the proxy configuration is not overridden per request, Requests falls back to the following environment variables:

export HTTP_PROXY="http://10.10.1.10:3128"
export HTTPS_PROXY="http://10.10.1.10:1080"
export ALL_PROXY="socks5://10.10.1.10:3434"




SOCKS
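As a hedged sketch based on the official Requests documentation: SOCKS support is an optional extra installed with pip install requests[socks], after which a SOCKS proxy is configured like an HTTP one, using a socks5:// URL. The address and credentials below are placeholders, and the example does not send anything.

```python
import requests

# Placeholder proxy address and credentials; replace them with a real SOCKS proxy.
proxies = {
    'http': 'socks5://user:pass@10.10.1.10:3434',
    'https': 'socks5://user:pass@10.10.1.10:3434',
}

session = requests.Session()
session.proxies.update(proxies)

# A request through the proxy would then look like this (requires network access
# and the optional SOCKS dependency):
# session.get('https://httpbin.org/ip', timeout=5)
print(session.proxies['https'])
```

With the socks5 scheme, DNS resolution happens on the client; the socks5h scheme resolves hostnames on the proxy server instead.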




HTTP Verbs




Custom Verbs
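A minimal sketch, assuming only the requests library: a verb with no dedicated helper function can be passed by name to requests.request() or to a Request object. MKCOL (a WebDAV verb) and the URL are used purely as an illustration, and the request is prepared rather than sent.

```python
import requests

# 'MKCOL' is a WebDAV verb; the URL is a placeholder.
req = requests.Request('MKCOL', 'https://example.org/webdav/new-folder/')
prepared = req.prepare()
print(prepared.method)

# Sending it uses the generic entry point (requires network access):
# r = requests.request('MKCOL', 'https://example.org/webdav/new-folder/', timeout=5)
```

Whether the server accepts the verb depends entirely on the server.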




Link Headers
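Many APIs use the Link response header for pagination, and Requests parses it into the links attribute of the response. The sketch below fills in a Response object by hand so that it runs offline; in real use the response would come from requests.get(). The header value is a made-up example in the style of the GitHub API.

```python
import requests

# Build a Response by hand so the example runs offline.
r = requests.Response()
r.headers['link'] = (
    '<https://api.github.com/user/repos?page=2&per_page=10>; rel="next", '
    '<https://api.github.com/user/repos?page=6&per_page=10>; rel="last"'
)

# r.links is a dict keyed by the rel value of each link.
next_url = r.links['next']['url']
last_url = r.links['last']['url']
print(next_url)
print(last_url)
```

A pagination loop would typically keep requesting r.links['next']['url'] until the 'next' key is no longer present.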




Transport Adapters
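A transport adapter defines how a Session talks to a family of URLs. As a minimal sketch, assuming only the requests library, the example below mounts an HTTPAdapter with a retry limit on one URL prefix and checks which adapter the Session would pick. The prefix and retry count are arbitrary choices for illustration, and no request is sent.

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()

# Retry failed connections up to 3 times, for this one host only.
github_adapter = HTTPAdapter(max_retries=3)
s.mount('https://github.com/', github_adapter)

# The longest matching prefix wins when the Session picks an adapter.
chosen = s.get_adapter('https://github.com/psf/requests')
fallback = s.get_adapter('https://httpbin.org/get')
print(chosen is github_adapter)
print(fallback is github_adapter)
```

URLs outside the mounted prefix keep using the default adapter that every Session registers for https://.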




OAuth Authentication

Installation

pip install requests-oauthlib

Code

from requests_oauthlib import OAuth1Session

twitter = OAuth1Session('client_key',
                        client_secret='client_secret',
                        resource_owner_key='resource_owner_key',
                        resource_owner_secret='resource_owner_secret')
url = 'https://api.twitter.com/1/account/settings.json'
r = twitter.get(url)




Downloading Images
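One common approach, sketched here under the assumption that the image is fetched with stream=True and written to disk in chunks, keeps large files out of memory. The helper names save_response and download_image are hypothetical and not part of Requests.

```python
import requests


def save_response(response, path, chunk_size=8192):
    """Write a streamed response body to `path` in chunks; return the bytes written."""
    written = 0
    with open(path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            written += len(chunk)
    return written


def download_image(url, path, timeout=10):
    """Hypothetical helper: stream `url` to `path` without loading it all into memory."""
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()
        return save_response(r, path)
```

A call such as download_image('https://httpbin.org/image/png', 'image.png') would save the image and return its size in bytes; it needs network access.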




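Disabling Parameter Escaping

import requests

params = {
    'username': 'abc',
    'password': '%',
}
# join the parameters into a string by hand; a dict would be percent-encoded ('%' becomes '%25')
params = '&'.join('{}={}'.format(k, v) for k, v in params.items())
response = requests.get('https://httpbin.org/get', params=params)
print(response.json())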




Converting to curl

Installation

pip install curlify

Quick start

import curlify
import requests

response = requests.get("http://google.ru")
print(curlify.to_curl(response.request))
# curl -X 'GET' -H 'Accept: */*' -H 'Accept-Encoding: gzip, deflate' -H 'Connection: keep-alive' -H 'User-Agent: python-requests/2.18.4' 'http://www.google.ru/'

print(curlify.to_curl(response.request, compressed=True))
# curl -X 'GET' -H 'Accept: */*' -H 'Accept-Encoding: gzip, deflate' -H 'Connection: keep-alive' -H 'User-Agent: python-requests/2.18.4' --compressed 'http://www.google.ru/'




Wrapper




References

  1. Requests Documentation
  2. Requests-OAuth Documentation
  3. requests - Liao Xuefeng's official website
  4. RESTful API Design Guide - Ruan Yifeng's blog
  5. 52讲轻松搞定网络爬虫 (Web Scraping Made Easy in 52 Lessons)
  6. How to send a multipart/form-data with requests in python
  7. How to prevent python requests from percent encoding my URLs
