Handling Exceptions and Errors Effectively in Web Crawler Code: Strategies and Code Examples

2025/10/29 14:52:13 Source: https://blog.csdn.net/2401_87849308/article/details/144023109

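When developing a web crawler in Python, how exceptions and errors are handled largely determines how stable and reliable the crawler is. This article covers several common exception-handling strategies, with a code example for each, to help developers build more robust crawlers.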

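1. Catching exceptions

Catching exceptions is the most basic form of error handling: wrap the code that may raise in a try block and handle the failure in the except block. This keeps a single failed request from crashing the whole program and gives you a place to respond to the error.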

import requests

try:
    # A timeout keeps the request from hanging indefinitely
    response = requests.get('http://www.example.com', timeout=10)
    # process the response
    ...
except requests.RequestException as e:
    print('Request failed:', e)
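
Catching requests.RequestException covers every failure the library raises, but it is often useful to tell the cases apart. The following is a minimal sketch of one way to do that; the helper names describe_failure and safe_get are hypothetical and not part of requests.

```python
import requests


def describe_failure(exc):
    """Map a requests exception to a short, loggable category."""
    # Timeout is checked first: ConnectTimeout inherits from both
    # ConnectionError and Timeout.
    if isinstance(exc, requests.exceptions.Timeout):
        return "timeout"
    if isinstance(exc, requests.exceptions.ConnectionError):
        return "connection error"
    if isinstance(exc, requests.exceptions.HTTPError):
        return "bad HTTP status"
    return "other request error"


def safe_get(url, timeout=10):
    """Return the response body, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
        return response.text
    except requests.exceptions.RequestException as exc:
        print(f"{url}: {describe_failure(exc)} ({exc})")
        return None
```

Calling safe_get('http://www.example.com') returns the page text on success and None on any request failure, so the caller can decide whether to retry or skip the URL.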

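2. Fault-tolerance design

When the crawler hits an exception, it needs a fault-tolerance mechanism so that the rest of the run can continue.

2.1 Retry mechanism

On a network error or timeout, the crawler can retry the request. Set a maximum number of retries and an interval between attempts; if the data still cannot be fetched after that many attempts, skip the URL and move on to the next request.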

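import time

import requests

def fetch_url(url, max_retries=3):
    for i in range(max_retries):
        try:
            response = requests.get(url, timeout=10)
            if response.status_code == 200:
                return response.text
            # A non-200 status is also treated as a failed attempt
            print(f"Unexpected status {response.status_code}, retrying... ({i+1}/{max_retries})")
        except requests.exceptions.RequestException as e:
            print(f"Request failed: {e}, retrying... ({i+1}/{max_retries})")
        time.sleep(1)  # wait 1 second before retrying
    return None  # every attempt failed; the caller can skip this URL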
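
A fixed one-second pause works, but a common variation is to lengthen the wait after each failure. The sketch below shows exponential backoff with a little jitter, plus a loop that skips URLs that still fail. It is illustrative only: fetch_with_backoff, crawl, and their parameters are hypothetical names, and fetch stands for any callable that takes a URL and raises on failure.

```python
import random
import time


def fetch_with_backoff(fetch, url, max_retries=3, base_delay=1.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call fetch(url); after a failure wait base_delay * 2**attempt and retry.

    Returns the result of fetch(url), or None once every attempt has failed.
    """
    for attempt in range(max_retries):
        try:
            return fetch(url)
        except retry_on as exc:
            print(f"{url}: attempt {attempt + 1}/{max_retries} failed: {exc}")
            if attempt < max_retries - 1:
                # Exponential backoff with up to 0.1 s of random jitter
                delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
                sleep(delay)
    return None


def crawl(urls, fetch, **retry_options):
    """Fetch every URL, skipping those that still fail after all retries."""
    results, skipped = {}, []
    for url in urls:
        body = fetch_with_backoff(fetch, url, **retry_options)
        if body is None:
            skipped.append(url)  # give up on this URL and move on
        else:
            results[url] = body
    return results, skipped
```

With requests, fetch could be a small function that calls requests.get(url, timeout=10), then raise_for_status(), and returns response.text, with retry_on=(requests.RequestException,) so that only network-level failures trigger a retry.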

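3. Optimizing network requests

To improve crawl speed and success rate, tune how requests are sent, for example by reusing connections through a connection pool, setting request headers, and using proxies.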

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
# Retry up to 5 times on the listed server errors, backing off between attempts
retries = Retry(total=5, backoff_factor=0.1, status_forcelist=[500, 502, 503, 504])
session.mount('http://', HTTPAdapter(max_retries=retries))
session.mount('https://', HTTPAdapter(max_retries=retries))

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}
response = session.get('http://www.example.com', headers=headers, timeout=10)
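
The connection pool and proxy settings mentioned above can be configured on the same kind of Session object. This is a rough sketch under stated assumptions: build_session, the pool size of 20, the 'my-crawler/0.1' User-Agent, and any proxy address passed in are placeholders chosen for illustration, not values from the original article.

```python
import requests
from requests.adapters import HTTPAdapter


def build_session(proxy_url=None, pool_size=20):
    """Create a Session with a sized connection pool, default headers, and an optional proxy."""
    session = requests.Session()

    # One adapter shared by both schemes; pool_maxsize caps the number of
    # connections kept alive per host.
    adapter = HTTPAdapter(pool_connections=pool_size, pool_maxsize=pool_size)
    session.mount('http://', adapter)
    session.mount('https://', adapter)

    # Headers set on the session are sent with every request it makes.
    session.headers.update({'User-Agent': 'my-crawler/0.1'})  # placeholder value

    if proxy_url:
        # Route both HTTP and HTTPS traffic through the same proxy.
        session.proxies.update({'http': proxy_url, 'https': proxy_url})
    return session
```

A session built this way is used like any other, for example session.get(url, timeout=10). Reusing one session across requests is what lets the pool keep connections open between calls.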

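4. Logging

Use a logging tool such as Python's logging module to record errors and exceptions. The log makes it easy to review how a run went and helps with debugging and troubleshooting.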

import logging

logging.basicConfig(filename="error.log", level=logging.ERROR)
try:
    # code that may raise an exception
    ...
except Exception as e:
    logging.error("An exception occurred: %s", e)
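
For troubleshooting, it usually helps to record a timestamp and the full traceback rather than only the message. The sketch below shows one possible setup; build_logger, parse_price, the logger name, and the log file name are all hypothetical choices made for this example.

```python
import logging


def build_logger(name="crawler", logfile="crawler.log"):
    """Return a named logger that writes timestamped records to logfile."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.FileHandler(logfile, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


def parse_price(text, logger):
    """Convert a string such as '$100' to a float; log and return None on failure."""
    try:
        return float(text.strip().lstrip("$"))
    except ValueError:
        # logger.exception logs at ERROR level and appends the traceback
        logger.exception("Could not parse price from %r", text)
        return None
```

Using logger.exception inside the except block keeps the stack trace in the log file, which makes it easier to find the line that failed when reviewing a long crawl afterwards.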

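5. Adjusting XPath or CSS selectors dynamically

Design fallbacks for the different HTML structures a page may use, and wrap extraction in try-except so that a missing element does not stop the crawl.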

from bs4 import BeautifulSoup

html = "<div class='product'>Price: $100</div>"
soup = BeautifulSoup(html, "html.parser")
try:
    price = soup.find("span", class_="price").text
except AttributeError:
    # find() returned None because the page has no <span class="price">
    price = "N/A"
print(price)
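
The example above falls back to a default value. To try alternative selectors before giving up, one option is to keep an ordered list of candidates and use the first one that matches. This is only a sketch: extract_first and the PRICE_SELECTORS list are hypothetical, and the selectors would need to be adapted to the pages actually being crawled.

```python
from bs4 import BeautifulSoup

# Hypothetical candidate selectors, ordered from most to least specific
PRICE_SELECTORS = ["span.price", "div.product .price", "div.product"]


def extract_first(soup, selectors, default="N/A"):
    """Return the text of the first selector that matches, or default if none do."""
    for selector in selectors:
        node = soup.select_one(selector)  # returns None when nothing matches
        if node is not None:
            return node.get_text(strip=True)
    return default


html = "<div class='product'>Price: $100</div>"
soup = BeautifulSoup(html, "html.parser")
print(extract_first(soup, PRICE_SELECTORS))  # prints "Price: $100"
```

Because select_one returns None instead of raising, this version needs no try-except, and supporting a new page layout only requires adding a selector to the list.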