Deep Learning Notes 24: Weather Prediction

2024/11/30 3:15:51  Source: https://blog.csdn.net/L1073203482/article/details/143972703
  • 🍨 This post is a study log from the 🔗365天深度学习训练营 (365-Day Deep Learning Training Camp)
  • 🍖 Original author: K同学啊 | tutoring and custom projects available

I. My Environment

1. Language: Python 3.9

2. IDE: PyCharm

3. Deep learning framework: TensorFlow 2.10.0

II. GPU Setup

Skip this step if you are running on CPU.

import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
from sklearn import metrics
from sklearn.preprocessing import MinMaxScaler

os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    gpu0 = gpus[0]  # if several GPUs are present, use only the first one
    tf.config.experimental.set_memory_growth(gpu0, True)  # allocate GPU memory on demand
    tf.config.set_visible_devices([gpu0], "GPU")

III. Importing the Data

data = pd.read_csv("data/weather.csv")
df = data.copy()
print(data.head())

Output:

         Date Location  MinTemp  ...  Temp3pm  RainToday  RainTomorrow
0  2008-12-01   Albury     13.4  ...     21.8         No            No
1  2008-12-02   Albury      7.4  ...     24.3         No            No
2  2008-12-03   Albury     12.9  ...     23.2         No            No
3  2008-12-04   Albury      9.2  ...     26.5         No            No
4  2008-12-05   Albury     17.5  ...     29.7         No            No

[5 rows x 23 columns]

print(data.describe())

Output:

             MinTemp        MaxTemp  ...        Temp9am       Temp3pm
count  143975.000000  144199.000000  ...  143693.000000  141851.00000
mean       12.194034      23.221348  ...      16.990631      21.68339
std         6.398495       7.119049  ...       6.488753       6.93665
min        -8.500000      -4.800000  ...      -7.200000      -5.40000
25%         7.600000      17.900000  ...      12.300000      16.60000
50%        12.000000      22.600000  ...      16.700000      21.10000
75%        16.900000      28.200000  ...      21.600000      26.40000
max        33.900000      48.100000  ...      40.200000      46.70000

[8 rows x 16 columns]

print(data.dtypes)

Output:

Date              object
Location          object
MinTemp          float64
MaxTemp          float64
Rainfall         float64
Evaporation      float64
Sunshine         float64
WindGustDir       object
WindGustSpeed    float64
WindDir9am        object
WindDir3pm        object
WindSpeed9am     float64
WindSpeed3pm     float64
Humidity9am      float64
Humidity3pm      float64
Pressure9am      float64
Pressure3pm      float64
Cloud9am         float64
Cloud3pm         float64
Temp9am          float64
Temp3pm          float64
RainToday         object
RainTomorrow      object
dtype: object

# Convert Date to datetime and split it into year, month, and day columns
data['Date'] = pd.to_datetime(data['Date'])
data['year'] = data['Date'].dt.year
data['Month'] = data['Date'].dt.month
data['day'] = data['Date'].dt.day
print(data.head())

Output:

        Date Location  MinTemp  MaxTemp  ...  RainTomorrow  year  Month day
0 2008-12-01   Albury     13.4     22.9  ...            No  2008     12   1
1 2008-12-02   Albury      7.4     25.1  ...            No  2008     12   2
2 2008-12-03   Albury     12.9     25.7  ...            No  2008     12   3
3 2008-12-04   Albury      9.2     28.0  ...            No  2008     12   4
4 2008-12-05   Albury     17.5     32.3  ...            No  2008     12   5

[5 rows x 26 columns]

data.drop('Date', axis=1, inplace=True)
print(data.columns)

Output:

Index(['Location', 'MinTemp', 'MaxTemp', 'Rainfall', 'Evaporation', 'Sunshine',
       'WindGustDir', 'WindGustSpeed', 'WindDir9am', 'WindDir3pm',
       'WindSpeed9am', 'WindSpeed3pm', 'Humidity9am', 'Humidity3pm',
       'Pressure9am', 'Pressure3pm', 'Cloud9am', 'Cloud3pm', 'Temp9am',
       'Temp3pm', 'RainToday', 'RainTomorrow', 'year', 'Month', 'day'],
      dtype='object')

IV. Data Analysis

plt.figure(figsize=(10, 8))
# corr() gives the pairwise correlation between columns; only numeric columns are used here
numeric_data = data.select_dtypes(include=[np.number])
ax = sns.heatmap(numeric_data.corr(), square=True, annot=True, fmt='.2f')
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)

Output: [figure: correlation heatmap of the numeric columns]
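
A ranked list can make a dense heatmap easier to read. The sketch below is one possible way to pull the most strongly correlated feature pairs out of a correlation matrix. The helper name `top_correlations` and the toy columns `a`, `b`, `c` are hypothetical and were introduced only for illustration; they are not part of the original notebook.

```python
import numpy as np
import pandas as pd


def top_correlations(frame: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n column pairs with the largest absolute Pearson correlation."""
    corr = frame.select_dtypes(include=[np.number]).corr().abs()
    # Keep only the strict upper triangle so that each pair appears once
    # and the diagonal (always 1.0) is excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(ascending=False).head(n)


# Toy data (made-up values, not taken from weather.csv).
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],    # exactly 2 * a, so perfectly correlated with "a"
    "c": [1.0, -1.0, 1.0, -1.0],
})
print(top_correlations(toy, n=1))
```

On the real data the same helper could presumably be called as `top_correlations(data)`, which would list pairs such as the 9am/3pm readings of the same quantity if they turn out to be strongly correlated.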

Whether it rains:

# Set the plot style and color palette
sns.set(style="whitegrid", palette="Set2")
# Create a layout with 1 row and 2 columns
fig, axes = plt.subplots(1, 2, figsize=(10, 4))  # enlarged figure size (10, 4)
# Title font style
title_font = {'fontsize': 14, 'fontweight': 'bold', 'color': 'darkblue'}
# First plot: RainTomorrow
sns.countplot(x='RainTomorrow', data=data, ax=axes[0], edgecolor='black')
axes[0].set_title('Rain Tomorrow', fontdict=title_font)  # title
axes[0].set_xlabel('Will it Rain Tomorrow?', fontsize=12)  # x-axis label
axes[0].set_ylabel('Count', fontsize=12)  # y-axis label
axes[0].tick_params(axis='x', labelsize=11)  # x-axis tick font size
axes[0].tick_params(axis='y', labelsize=11)  # y-axis tick font size
# Second plot: RainToday
sns.countplot(x='RainToday', data=data, ax=axes[1], edgecolor='black')
axes[1].set_title('Rain Today', fontdict=title_font)  # title
axes[1].set_xlabel('Did it Rain Today?', fontsize=12)  # x-axis label
axes[1].set_ylabel('Count', fontsize=12)  # y-axis label
axes[1].tick_params(axis='x', labelsize=11)  # x-axis tick font size
axes[1].tick_params(axis='y', labelsize=11)  # y-axis tick font size
sns.despine()  # remove the top and right spines
plt.tight_layout()  # adjust the layout so the two plots do not overlap
plt.savefig("02.png")
plt.show()

Output: [figure: count plots of RainTomorrow and RainToday, saved as 02.png]

x = pd.crosstab(data['RainTomorrow'], data['RainToday'])
print(x)

Output:

RainToday        No    Yes
RainTomorrow
No            92728  16858
Yes           16604  14597

# Divide each RainTomorrow row by its own total so that every row sums to 100%
y = x / x.transpose().sum().values.reshape(2, 1) * 100
print(y)

Output:

RainToday            No        Yes
RainTomorrow
No            84.616648  15.383352
Yes           53.216243  46.783757

y.plot(kind="bar", figsize=(4, 3), color=['#006666', '#d279a6']);
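
The percentage table is a row-wise normalization: each row conditions on RainTomorrow. Of the days that were followed by rain, about 46.8% were rainy themselves, against about 15.4% of the days that were followed by no rain. The sketch below reproduces the arithmetic from the counts printed above, using `DataFrame.div` instead of the transpose-and-reshape expression. On the raw data, `pd.crosstab(data['RainTomorrow'], data['RainToday'], normalize='index') * 100` should give the same table.

```python
import pandas as pd

# Counts copied from the crosstab output above.
counts = pd.DataFrame(
    {"No": [92728, 16604], "Yes": [16858, 14597]},
    index=pd.Index(["No", "Yes"], name="RainTomorrow"),
)
counts.columns.name = "RainToday"

# Divide every row by its own total, then convert to percent.
row_pct = counts.div(counts.sum(axis=1), axis=0) * 100
print(row_pct.round(2))
```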

Location and rainfall:

x = pd.crosstab(data['Location'], data['RainToday'])
# Percentage of rainy and non-rainy days for each location
y = x / x.transpose().sum().values.reshape((-1, 1)) * 100
# Sort the locations by their percentage of rainy days
y = y.sort_values(by='Yes', ascending=True)
color = ['#cc6699', '#006699', '#006666', '#862d86', '#ff9966']
y.Yes.plot(kind="barh", figsize=(15, 20), color=color)

Effect of humidity and pressure on rain:

data.columns
plt.figure(figsize=(8,6))
sns.scatterplot(data=data,x='Pressure9am',y='Pressure3pm',hue='RainTomorrow');
plt.savefig("04.png")
plt.show()

plt.figure(figsize=(8,6))
sns.scatterplot(data=data,x='Humidity9am',y='Humidity3pm',hue='RainTomorrow');
plt.savefig("05.png")
plt.show()

Effect of temperature on rain:

plt.figure(figsize=(8,6))
sns.scatterplot(x='MaxTemp', y='MinTemp',data=data, hue='RainTomorrow');
plt.savefig("06.png")
plt.show()

V. Data Preprocessing

# Percentage of missing values in each column
data.isnull().sum()/data.shape[0]*100

Output:

Location          0.000000
MinTemp           1.020899
MaxTemp           0.866905
Rainfall          2.241853
Evaporation      43.166506
Sunshine         48.009762
WindGustDir       7.098859
WindGustSpeed     7.055548
WindDir9am        7.263853
WindDir3pm        2.906641
WindSpeed9am      1.214767
WindSpeed3pm      2.105046
Humidity9am       1.824557
Humidity3pm       3.098446
Pressure9am      10.356799
Pressure3pm      10.331363
Cloud9am         38.421559
Cloud3pm         40.807095
Temp9am           1.214767
Temp3pm           2.481094
RainToday         2.241853
RainTomorrow      2.245978
year              0.000000
Month             0.000000
day               0.000000
dtype: float64
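
In the table above, Evaporation, Sunshine, Cloud9am, and Cloud3pm stand apart: each is missing in more than 38% of rows, while every other column stays below 11%. Such columns could also be picked out programmatically rather than typed by hand. A minimal sketch, assuming a cutoff of 30%; the cutoff, the helper name `high_missing_columns`, and the toy frame are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def high_missing_columns(frame: pd.DataFrame, threshold_pct: float) -> list:
    """Return the columns whose share of missing values exceeds threshold_pct."""
    # isnull().mean() is the same as isnull().sum() / len(frame)
    missing_pct = frame.isnull().mean() * 100
    return missing_pct[missing_pct > threshold_pct].index.tolist()


# Toy frame (made-up values) with 0%, 25%, and 75% missing.
toy = pd.DataFrame({
    "full": [1.0, 2.0, 3.0, 4.0],
    "some": [1.0, np.nan, 3.0, 4.0],
    "sparse": [np.nan, np.nan, np.nan, 4.0],
})
print(high_missing_columns(toy, threshold_pct=30))
```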
# Fill these sparse columns with values drawn at random from their observed entries
lst = ['Evaporation', 'Sunshine', 'Cloud9am', 'Cloud3pm']
for col in lst:
    fill_list = data[col].dropna()
    data[col] = data[col].fillna(pd.Series(np.random.choice(fill_list, size=len(data.index)), index=data.index))

s = (data.dtypes == "object")
object_cols = list(s[s].index)
object_cols

['Location','WindGustDir','WindDir9am','WindDir3pm','RainToday','RainTomorrow']

# data[i].mode()[0] returns the most frequent value (the mode)
# Assign the result back; inplace=True on a column selection may not update data in newer pandas
for i in object_cols:
    data[i] = data[i].fillna(data[i].mode()[0])

t = (data.dtypes == "float64")
num_cols = list(t[t].index)
num_cols

['MinTemp','MaxTemp','Rainfall','Evaporation','Sunshine','WindGustSpeed','WindSpeed9am','WindSpeed3pm','Humidity9am','Humidity3pm','Pressure9am','Pressure3pm','Cloud9am','Cloud3pm','Temp9am','Temp3pm']

# .median() returns the median
for i in num_cols:
    data[i] = data[i].fillna(data[i].median())

data.isnull().sum()
Location         0
MinTemp          0
MaxTemp          0
Rainfall         0
Evaporation      0
Sunshine         0
WindGustDir      0
WindGustSpeed    0
WindDir9am       0
WindDir3pm       0
WindSpeed9am     0
WindSpeed3pm     0
Humidity9am      0
Humidity3pm      0
Pressure9am      0
Pressure3pm      0
Cloud9am         0
Cloud3pm         0
Temp9am          0
Temp3pm          0
RainToday        0
RainTomorrow     0
year             0
Month            0
day              0
dtype: int64
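
The three fill strategies used in this section (random sampling from observed values, the mode for categorical columns, the median for numeric columns) can be seen side by side on a small example. This is a sketch on made-up data, not the original dataset. The seeded generator `np.random.default_rng(0)` is an assumption added so the example is reproducible; the code above uses the unseeded `np.random.choice`, so its fills differ from run to run.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so that this sketch is reproducible

toy = pd.DataFrame({
    "Sunshine": [8.0, np.nan, 10.0, np.nan, 6.0],   # filled by random sampling
    "WindDir": ["N", "N", None, "S", "N"],          # categorical: filled with the mode
    "MinTemp": [10.0, 12.0, np.nan, 30.0, 14.0],    # numeric: filled with the median
})

# 1) Random-sample imputation: draw replacements from the observed values.
observed = toy["Sunshine"].dropna().to_numpy()
draws = pd.Series(rng.choice(observed, size=len(toy)), index=toy.index)
toy["Sunshine"] = toy["Sunshine"].fillna(draws)

# 2) Mode imputation for a categorical column.
toy["WindDir"] = toy["WindDir"].fillna(toy["WindDir"].mode()[0])

# 3) Median imputation for a numeric column (the median of 10, 12, 14, 30 is 13).
toy["MinTemp"] = toy["MinTemp"].fillna(toy["MinTemp"].median())

print(toy)
```

Random sampling roughly preserves the shape of a column's distribution, whereas filling a column that is 40% empty with one median value would pile a large share of the rows onto a single number. That is a plausible reason for treating the four sparse columns separately.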

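VI. Building the Dataset

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Encode every categorical column as integers
label_encoder = LabelEncoder()
for i in object_cols:
    data[i] = label_encoder.fit_transform(data[i])

X = data.drop(['RainTomorrow', 'day'], axis=1).values
y = data['RainTomorrow'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=101)

# Fit the scaler on the training split only, then apply it to both splits
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)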
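
With its default range of 0 to 1, `MinMaxScaler` maps each feature to (x - min) / (max - min), where min and max come from the data passed to `fit`. The NumPy sketch below spells out that arithmetic on made-up numbers to show why the statistics are taken from the training split only. It is an illustration of the formula, not scikit-learn's implementation, and it leaves out the guard for constant columns.

```python
import numpy as np

X_train = np.array([[0.0, 10.0],
                    [5.0, 20.0],
                    [10.0, 40.0]])
X_test = np.array([[2.5, 25.0],
                   [12.0, 10.0]])   # 12.0 lies outside the training range

# "fit": learn the per-feature minimum and maximum from the training rows only
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)


def scale(X):
    """'transform': apply the training statistics to any split."""
    return (X - col_min) / (col_max - col_min)


X_train_scaled = scale(X_train)
X_test_scaled = scale(X_test)
print(X_train_scaled)
print(X_test_scaled)
```

Test values outside the training range land outside [0, 1], as the 12.0 entry does here. That is expected: the test split is treated like unseen data and does not influence the scaling.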

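VII. Predicting Whether It Will Rain

from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(Dense(units=24, activation='tanh'))
model.add(Dense(units=18, activation='tanh'))
model.add(Dense(units=23, activation='tanh'))
model.add(Dropout(0.5))
model.add(Dense(units=12, activation='tanh'))
model.add(Dropout(0.2))
model.add(Dense(units=1, activation='sigmoid'))  # probability of rain tomorrow

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=["accuracy"])

# Stop once val_loss has not improved by at least 0.001 for 25 epochs, then restore the best weights
early_stop = EarlyStopping(monitor='val_loss', mode='min', min_delta=0.001,
                           verbose=1, patience=25, restore_best_weights=True)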
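
The size of this network can be worked out by hand. A fully connected layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases, and Dropout layers add no parameters. The sketch below assumes an input width of 23, inferred from the 25 columns listed earlier minus 'RainTomorrow' and 'day'; the model itself does not declare an input shape, so that width is an inference rather than something stated in the code.

```python
# Layer widths: 23 input features (an inferred value), then the Dense layers defined above.
layer_sizes = [23, 24, 18, 23, 12, 1]


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer)       # [576, 450, 437, 288, 13]
print(sum(per_layer))  # 1764
```

If the inferred input width is right, `model.summary()` after training should report the same total.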

VIII. Training the Model

model.fit(x=X_train,y=y_train,validation_data=(X_test, y_test), verbose=1,callbacks=[early_stop],epochs=10,batch_size=32)

Epoch 1/10
3410/3410 [==============================] - 8s 2ms/step - loss: 0.4558 - accuracy: 0.8031 - val_loss: 0.3886 - val_accuracy: 0.8328
Epoch 2/10
3410/3410 [==============================] - 7s 2ms/step - loss: 0.3971 - accuracy: 0.8324 - val_loss: 0.3785 - val_accuracy: 0.8374
Epoch 3/10
3410/3410 [==============================] - 16s 5ms/step - loss: 0.3896 - accuracy: 0.8355 - val_loss: 0.3757 - val_accuracy: 0.8382
Epoch 4/10
3410/3410 [==============================] - 15s 5ms/step - loss: 0.3859 - accuracy: 0.8371 - val_loss: 0.3732 - val_accuracy: 0.8389
Epoch 5/10
3410/3410 [==============================] - 15s 5ms/step - loss: 0.3837 - accuracy: 0.8376 - val_loss: 0.3720 - val_accuracy: 0.8389
Epoch 6/10
3410/3410 [==============================] - 15s 4ms/step - loss: 0.3816 - accuracy: 0.8381 - val_loss: 0.3712 - val_accuracy: 0.8394
Epoch 7/10
3410/3410 [==============================] - 15s 5ms/step - loss: 0.3798 - accuracy: 0.8391 - val_loss: 0.3723 - val_accuracy: 0.8379
Epoch 8/10
3410/3410 [==============================] - 15s 4ms/step - loss: 0.3791 - accuracy: 0.8398 - val_loss: 0.3701 - val_accuracy: 0.8392
Epoch 9/10
3410/3410 [==============================] - 15s 5ms/step - loss: 0.3782 - accuracy: 0.8391 - val_loss: 0.3706 - val_accuracy: 0.8401
Epoch 10/10
3410/3410 [==============================] - 15s 4ms/step - loss: 0.3778 - accuracy: 0.8389 - val_loss: 0.3693 - val_accuracy: 0.8397
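
Two numbers in this log can be cross-checked against the outputs shown earlier. The step count of 3410 follows from the dataset size, the 75/25 split, and the batch size of 32. The validation accuracy of about 0.84 can be compared with a majority-class baseline that always predicts "No". The sketch below recovers the row count from the MinTemp figures and takes the baseline from the crosstab counts. That baseline covers only rows where both RainToday and RainTomorrow were recorded, so it is an approximate reference and not the exact figure for the test split.

```python
import math

# Row count recovered from earlier outputs:
# MinTemp has 143975 non-null values and 1.020899% missing.
n_rows = round(143975 / (1 - 1.020899 / 100))
n_test = int(0.25 * n_rows)                  # test_size=0.25
n_train = n_rows - n_test
steps_per_epoch = math.ceil(n_train / 32)    # batch_size=32
print(n_rows, n_train, steps_per_epoch)      # 145460 109095 3410

# Majority-class baseline from the RainTomorrow row totals of the crosstab.
no_total = 92728 + 16858
yes_total = 16604 + 14597
baseline = no_total / (no_total + yes_total)
print(round(baseline, 4))                    # 0.7784
```

Read this way, the network improves on the "always No" guess by roughly six percentage points of accuracy.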

IX. Visualizing the Results

acc = model.history.history['accuracy']
val_acc = model.history.history['val_accuracy']
loss = model.history.history['loss']
val_loss = model.history.history['val_loss']
epochs_range = range(10)
plt.figure(figsize=(14, 4))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy', color="#c94733")
plt.plot(epochs_range, val_acc, label='Validation Accuracy', color="#3fab47")
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.grid(False)
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss', color="#c94733")
plt.plot(epochs_range, val_loss, label='Validation Loss', color="#3fab47")
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.grid(False)
plt.show()
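
The same history can also be read numerically. The sketch below finds the best epoch by validation loss and by validation accuracy, using the values copied from the training log; with a live model the lists would presumably come from `model.history.history['val_loss']` and `model.history.history['val_accuracy']`.

```python
# Validation metrics copied from the training log (epochs 1 to 10).
val_loss = [0.3886, 0.3785, 0.3757, 0.3732, 0.3720,
            0.3712, 0.3723, 0.3701, 0.3706, 0.3693]
val_acc = [0.8328, 0.8374, 0.8382, 0.8389, 0.8389,
           0.8394, 0.8379, 0.8392, 0.8401, 0.8397]

# The log numbers epochs from 1, so add 1 to the list index.
best_loss_epoch = val_loss.index(min(val_loss)) + 1
best_acc_epoch = val_acc.index(max(val_acc)) + 1
print(best_loss_epoch, best_acc_epoch)   # 10 9
```

The lowest validation loss occurs in the final epoch, which suggests the model may not have converged after 10 epochs. It also means the EarlyStopping callback with patience=25 could not have triggered in a run this short.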

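X. Summary

This week I studied weather prediction. A large part of the work was EDA (Exploratory Data Analysis), which is useful for the following:

  • It reveals variable types, distributions and trends, missing values, and outliers.
  • Handling missing values: (i) drop columns with too many missing values, typically those with more than 50% missing; (ii) fill in the missing values. For discrete features, NaN is usually treated as a category of its own; for continuous features, the mean, the median, 0, or a machine learning model is usually used. The right method depends on the application.
  • Handling outliers (mainly for continuous features), for example with a Winsorizer.
  • Merging categories (mainly for discrete features). If a value has too few samples, it should be merged with other values, because a very small sample makes the data unstable and statistically meaningless and can lead to wrong conclusions. Because display space is limited, plots usually show only the several least frequent or most frequent values.
  • Dropping columns that take a single value.
  • Dropping columns in which the share of the most frequent value exceeds a threshold.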
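
Several of the steps listed above can be sketched in a few lines of pandas. The helpers below are illustrative only: their names, the 5%/95% clipping bounds, the `min_count` of 2, and the toy data are all assumptions. Quantile clipping stands in for a dedicated Winsorizer class.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(frame, max_missing=0.5):
    """Drop columns whose missing fraction exceeds max_missing."""
    return frame.loc[:, frame.isnull().mean() <= max_missing]


def winsorize(series, lower=0.05, upper=0.95):
    """Clip a numeric column to its [lower, upper] quantiles."""
    return series.clip(series.quantile(lower), series.quantile(upper))


def merge_rare(series, min_count=2, other="Other"):
    """Replace categories seen fewer than min_count times with one label."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    return series.where(~series.isin(rare), other)


def drop_constant_columns(frame):
    """Drop columns that contain a single distinct value."""
    return frame.loc[:, frame.nunique(dropna=False) > 1]


# Toy frame (made-up values).
toy = pd.DataFrame({
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0, np.nan, np.nan],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 100.0],   # 100.0 is an outlier
    "city": ["A", "A", "B", "B", "C", "D"],      # "C" and "D" are rare
    "constant": [7, 7, 7, 7, 7, 7],
})

cleaned = drop_constant_columns(drop_sparse_columns(toy)).copy()
cleaned["value"] = winsorize(cleaned["value"])
cleaned["city"] = merge_rare(cleaned["city"])
print(cleaned)
```

In this toy run the mostly empty and constant columns are removed, the outlier 100.0 is pulled down to the 95% quantile, and the two single-occurrence cities are merged into "Other".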
