[Python Project Practice] Scraping Weather Data from the China Weather Network (中国天气网)

1. Introduction

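In everyday life we often need up-to-date weather data. The China Weather Network (www.weather.com.cn) offers a fairly rich set of weather data, and it places few restrictions on scraping, which makes it a beginner-friendly target.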

Code download: https://download.csdn.net/download/weixin_74773078/90274520

(For custom program development, message the author privately.)

2. Preparation

Before starting, install the following Python libraries:

requests: sends HTTP requests.
beautifulsoup4: parses HTML content.

Both can be installed with the following command:

pip install requests beautifulsoup4
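To confirm the installation worked, a short check with the standard library's importlib.metadata (Python 3.8 and later) can print the installed versions. This is a minimal sketch; it looks packages up by their distribution names as given to pip, which for beautifulsoup4 differs from the import name bs4.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip (not the names used in import statements)
for package in ("requests", "beautifulsoup4"):
    try:
        print(f"{package} {version(package)}")
    except PackageNotFoundError:
        print(f"{package} is not installed; run: pip install {package}")
```

If both lines show a version number, "import requests" and "from bs4 import BeautifulSoup" in the scraper below should work.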

3. Scraper Implementation

Below is the complete scraper, which fetches the forecast from the China Weather Network.

import requests
from bs4 import BeautifulSoup

# Request headers that imitate a desktop browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

# URL of the target city's weather page (Beijing as an example)
city_code = "101010100"  # city code for Beijing
url = f"http://www.weather.com.cn/weather/{city_code}.shtml"

# Send the HTTP GET request
response = requests.get(url, headers=headers)

# Check whether the request succeeded
if response.status_code == 200:
    print("Weather data fetched successfully")
    # Parse the HTML; passing raw bytes lets BeautifulSoup detect the page encoding
    soup = BeautifulSoup(response.content, 'html.parser')
    # Find the HTML element that holds the forecast
    weather_data = soup.find('ul', class_='t clearfix')
    if weather_data:
        # Extract each day's weather (one <li> per day)
        for day in weather_data.find_all('li'):
            # Date
            date = day.find('h1').text.strip() if day.find('h1') else "unknown date"
            # Weather condition
            weather = day.find('p', class_='wea').text.strip() if day.find('p', class_='wea') else "unknown weather"
            # Temperature: <span> holds the high, <i> holds the low
            temp = day.find('p', class_='tem')
            if temp:
                high_temp = temp.find('span').text.strip() if temp.find('span') else "unknown high"
                low_temp = temp.find('i').text.strip() if temp.find('i') else "unknown low"
            else:
                high_temp = "unknown high"
                low_temp = "unknown low"
            # Wind force
            wind = day.find('p', class_='win')
            wind_level = wind.find('i').text.strip() if wind and wind.find('i') else "unknown wind"
            # Print this day's weather
            print(f"Date: {date}")
            print(f"Weather: {weather}")
            print(f"Temperature: {low_temp} ~ {high_temp}")
            print(f"Wind: {wind_level}")
            print("-" * 30)
    else:
        print("Weather data not found")
else:
    print(f"Request failed, status code: {response.status_code}")

4. Code Walkthrough

4.1 Setting the Request Headers

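To make the request look like it comes from a browser, we set the User-Agent header. Without it, requests identifies itself as python-requests/<version>, which a site can easily recognize as a script; a browser-like User-Agent helps avoid being blocked by simple anti-scraping checks.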

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}
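For comparison, the sketch below prints the User-Agent that requests sends by default and shows a requests.Session that attaches the browser-like header to every request it makes, so the header does not have to be passed on each call. It uses only the requests library; the constant name BROWSER_UA is a hypothetical choice for this example.

```python
import requests

# The User-Agent that requests sends when no header is supplied
default_ua = requests.utils.default_user_agent()
print("Default User-Agent:", default_ua)

# The same browser-like header used in the article
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
)

# Headers set on a session are merged into every request it sends
session = requests.Session()
session.headers.update({"User-Agent": BROWSER_UA})
print("Session User-Agent:", session.headers["User-Agent"])
```

With this session, session.get(url) sends the same header as requests.get(url, headers=headers), and repeated requests to the same host can reuse the underlying connection.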

4.2 Sending the HTTP Request

Use requests.get() to send an HTTP GET request, passing the target URL and the request headers.

response = requests.get(url, headers=headers)
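The call above sets no timeout, so a stalled connection could block the script indefinitely. One way to harden it is sketched below: a session with a timeout on each request and automatic retries on transient server errors through urllib3's Retry. The helper names make_session and fetch_html are hypothetical, and the retry count, back-off factor, and status list are illustrative assumptions, not values required by weather.com.cn.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
)

def make_session(retries: int = 3) -> requests.Session:
    """Build a session with a browser-like header that retries transient errors (assumed policy)."""
    session = requests.Session()
    session.headers.update({"User-Agent": BROWSER_UA})
    retry = Retry(
        total=retries,
        backoff_factor=0.5,                     # exponential back-off between attempts
        status_forcelist=(500, 502, 503, 504),  # retry on these server errors
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session

def fetch_html(session: requests.Session, city_code: str, timeout: float = 10) -> bytes:
    """Download the forecast page for city_code and return the raw bytes."""
    url = f"http://www.weather.com.cn/weather/{city_code}.shtml"
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx response
    return response.content
```

Usage: html = fetch_html(make_session(), "101010100") returns the bytes that the main script passes to BeautifulSoup, but fails with an exception instead of hanging or silently continuing.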

4.3 Parsing the HTML

Parse the returned HTML with BeautifulSoup and locate the element that holds the forecast: the <ul> whose class attribute is "t clearfix".

soup = BeautifulSoup(response.content, 'html.parser')
weather_data = soup.find('ul', class_='t clearfix')
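One detail of find('ul', class_='t clearfix'): when the class_ value contains a space, BeautifulSoup compares it with the class attribute as a whole string, so the lookup would stop matching if the page listed the classes as "clearfix t". A CSS selector through select_one matches both classes in any order. The snippet demonstrates this on a small hand-written fragment; the fragment is an assumption modelled on the structure the article's code expects, not a copy of the live page.

```python
from bs4 import BeautifulSoup

# Hand-written fragment (assumed structure): same two classes, reversed order
html = """
<ul class="clearfix t">
  <li><h1>10日（今天）</h1></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

exact = soup.find("ul", class_="t clearfix")  # whole-string comparison of the class value
css = soup.select_one("ul.t.clearfix")        # both classes present, order ignored

print(exact)            # None
print(css is not None)  # True
```

On the current page the original find call works; the selector form is simply less sensitive to how the class list is written.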

4.4 Extracting the Weather Information

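Loop over the <li> tags (one per day) and extract the date, weather condition, temperature, and wind force. Each lookup falls back to a placeholder string when the tag is missing.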

for day in weather_data.find_all('li'):
    date = day.find('h1').text.strip() if day.find('h1') else "unknown date"
    weather = day.find('p', class_='wea').text.strip() if day.find('p', class_='wea') else "unknown weather"
    # <span> holds the high temperature, <i> the low
    temp = day.find('p', class_='tem')
    if temp:
        high_temp = temp.find('span').text.strip() if temp.find('span') else "unknown high"
        low_temp = temp.find('i').text.strip() if temp.find('i') else "unknown low"
    else:
        high_temp = "unknown high"
        low_temp = "unknown low"
    wind = day.find('p', class_='win')
    wind_level = wind.find('i').text.strip() if wind and wind.find('i') else "unknown wind"
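Each field above calls find twice: once to test for the tag and once to read it. A small helper can do each lookup once and return the day as a dictionary, which is easier to reuse than loose variables. This is a sketch: text_or and parse_day are hypothetical names, and the sample <li> is hand-written to contain the tags the article's code looks for (h1, p.wea, p.tem with span and i, p.win with i), not copied from the live page.

```python
from bs4 import BeautifulSoup

def text_or(tag, default):
    """Return the stripped text of tag, or default if the tag is missing."""
    return tag.text.strip() if tag is not None else default

def parse_day(li):
    """Extract one day's forecast from an <li> element into a dict."""
    temp = li.find("p", class_="tem")
    wind = li.find("p", class_="win")
    return {
        "date": text_or(li.find("h1"), "unknown date"),
        "weather": text_or(li.find("p", class_="wea"), "unknown weather"),
        # Same mapping as the article: <span> = high, <i> = low
        "high": text_or(temp.find("span") if temp is not None else None, "unknown high"),
        "low": text_or(temp.find("i") if temp is not None else None, "unknown low"),
        "wind": text_or(wind.find("i") if wind is not None else None, "unknown wind"),
    }

# Hand-written sample (assumed markup, not copied from the live page)
sample = """
<ul class="t clearfix">
  <li>
    <h1>10日（今天）</h1>
    <p class="wea">晴</p>
    <p class="tem"><span>22℃</span>/<i>10℃</i></p>
    <p class="win"><i>3-4级</i></p>
  </li>
</ul>
"""
soup = BeautifulSoup(sample, "html.parser")
days = [parse_day(li) for li in soup.find("ul", class_="t clearfix").find_all("li")]
print(days)
```

Returning dictionaries separates extraction from output, so the same records can be printed, saved, or compared without re-parsing the page.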

4.5 Printing the Weather Information

Print the extracted information for each day, followed by a separator line.

print(f"Date: {date}")
print(f"Weather: {weather}")
print(f"Temperature: {low_temp} ~ {high_temp}")
print(f"Wind: {wind_level}")
print("-" * 30)
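Printing inside the loop works, but building the text first makes the output easy to test or write to a file. The sketch below assumes each day is stored as a dictionary with the hypothetical keys date, weather, low, high, and wind, and renders the same five-line layout as the print calls above; the example record reuses values from the sample output in section 5.

```python
def format_day(day: dict) -> str:
    """Render one day's forecast in the same five-line layout as the print calls."""
    return "\n".join([
        f"Date: {day['date']}",
        f"Weather: {day['weather']}",
        f"Temperature: {day['low']} ~ {day['high']}",
        f"Wind: {day['wind']}",
        "-" * 30,
    ])

# Hypothetical record shaped like one parsed day
example = {"date": "10日（今天）", "weather": "晴", "low": "10℃", "high": "22℃", "wind": "3-4级"}
print(format_day(example))
```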

5. Output

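After running the code above, the program prints the forecast for the target city (Beijing here) for the next several days: date, weather condition, temperature, and wind force. The field values are printed exactly as they appear on the Chinese-language page, e.g. 今天/明天/后天 (today/tomorrow/the day after tomorrow), 晴 (sunny), 多云 (cloudy), 阴 (overcast), 微风 (light breeze), and 3-4级 (wind force 3 to 4). For example: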

Date: 10日（今天）
Weather: 晴
Temperature: 10℃ ~ 22℃
Wind: 3-4级
------------------------------
Date: 11日（明天）
Weather: 多云
Temperature: 12℃ ~ 24℃
Wind: 微风
------------------------------
Date: 12日（后天）
Weather: 阴
Temperature: 14℃ ~ 20℃
Wind: 3-4级
------------------------------
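The temperatures in this output are plain strings such as 10℃. To compute with them, for example each day's temperature range, the number has to be extracted first. A minimal sketch with a regular expression, assuming the values look like the sample above (an optional minus sign, digits, and an optional ℃); parse_celsius is a hypothetical helper that returns None for placeholders such as "unknown high".

```python
import re

def parse_celsius(value: str):
    """Return the integer in a temperature string like '10℃' or '-3℃', or None if there is none."""
    match = re.search(r"-?\d+", value)
    return int(match.group()) if match else None

# (date, low, high) values copied from the sample output above
days = [("10日（今天）", "10℃", "22℃"), ("11日（明天）", "12℃", "24℃"), ("12日（后天）", "14℃", "20℃")]
for date, low, high in days:
    lo, hi = parse_celsius(low), parse_celsius(high)
    if lo is not None and hi is not None:
        print(f"{date}: range {hi - lo}℃")
```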