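A training dashboard for a PyTorch model lets you monitor key metrics (such as loss, accuracy, and learning rate) while training is running. Common tools include TensorBoard, Weights & Biases (W&B), and Matplotlib. The sections below show how to build a training dashboard with each of them.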
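1. Using TensorBoard
TensorBoard is a training visualization tool that PyTorch supports directly through torch.utils.tensorboard. It can display scalars such as loss and accuracy while training is still in progress.
Install TensorBoard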
pip install tensorboard
Using TensorBoard in PyTorch
The following simple example shows how to record loss and accuracy during training.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.tensorboard import SummaryWriter
from torchvision import datasets, transforms

# Define the model: one linear layer over flattened 28x28 images
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28 * 28, 10)

    def forward(self, x):
        # Flatten each image to a vector before the linear layer
        return self.fc(x.view(x.size(0), -1))

# Initialize the model, loss function, and optimizer
model = SimpleModel()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Load the dataset
transform = transforms.Compose([transforms.ToTensor()])
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)

# Initialize TensorBoard
writer = SummaryWriter('runs/experiment_1')

# Training loop
for epoch in range(5):
    running_loss = 0.0
    correct = 0
    total = 0
    for i, (inputs, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # Accumulate the loss and the number of correct predictions
        running_loss += loss.item()
        predicted = outputs.argmax(dim=1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

        if i % 100 == 99:  # Log once every 100 batches
            avg_loss = running_loss / 100
            accuracy = 100 * correct / total
            print(f'Epoch {epoch + 1}, Batch {i + 1}: Loss={avg_loss:.4f}, Accuracy={accuracy:.2f}%')

            # Write loss and accuracy to TensorBoard
            step = epoch * len(train_loader) + i
            writer.add_scalar('Training Loss', avg_loss, step)
            writer.add_scalar('Training Accuracy', accuracy, step)

            # Reset the counters for the next 100-batch window
            running_loss = 0.0
            correct = 0
            total = 0

# Close the TensorBoard writer
writer.close()
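The third argument to add_scalar in the loop above, epoch * len(train_loader) + i, is the x-axis position of each point. A minimal sketch of why it is computed that way follows; the helper name global_step is hypothetical, and the batch count assumes the 60,000-image MNIST training set with batch_size=64 as in the example.

```python
import math


def global_step(epoch: int, batch_idx: int, batches_per_epoch: int) -> int:
    """Return a step index that keeps increasing across epochs.

    Using batch_idx alone would restart the x-axis at 0 every epoch,
    so points from different epochs would be drawn on top of each other.
    """
    return epoch * batches_per_epoch + batch_idx


# 60,000 images / 64 per batch, rounded up because the last batch is smaller
batches_per_epoch = math.ceil(60000 / 64)

# The first logged point (batch index 99) in epoch 0 and in epoch 1
first_log_epoch0 = global_step(0, 99, batches_per_epoch)
first_log_epoch1 = global_step(1, 99, batches_per_epoch)
```

With these numbers the first logged point of the second epoch lands one full epoch to the right of the first, which is what keeps the curve continuous.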
Launch TensorBoard
Run the following command in a terminal to start TensorBoard:
tensorboard --logdir=runs
Then open http://localhost:6006 in a browser to view the training dashboard.
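TensorBoard shows every subdirectory of the --logdir folder as a separate run, so giving each experiment its own directory under runs lets you compare curves side by side. The sketch below builds such a directory name; the helper make_run_dir and the name lr0.01_bs64 are hypothetical, chosen only for illustration.

```python
from datetime import datetime
from pathlib import Path


def make_run_dir(root="runs", name="experiment", now=None):
    """Build a log directory path of the form root/name_YYYYMMDD-HHMMSS."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d-%H%M%S")
    # as_posix() keeps forward slashes on every platform
    return (Path(root) / f"{name}_{stamp}").as_posix()


# A fixed timestamp is passed here only to make the result reproducible
run_dir = make_run_dir(name="lr0.01_bs64", now=datetime(2024, 1, 2, 3, 4, 5))

# The path would then replace the fixed name in the example:
# writer = SummaryWriter(run_dir)
```

Encoding hyperparameters in the name is one possible convention; any scheme works as long as each run writes to a distinct subdirectory.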
2. Using Weights & Biases (W&B)
Weights & Biases is an experiment-tracking tool that supports real-time monitoring of training and lets you share results with other team members.
Install W&B
pip install wandb
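Using W&B in PyTorch
The following is a simple example. It assumes you have already authenticated with W&B (for example with wandb login); otherwise wandb.init asks for credentials when it runs.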
import wandb
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# Initialize W&B and record the hyperparameters of this run
wandb.init(
    project="pytorch-dashboard-example",
    config={
        "learning_rate": 0.01,
        "epochs": 5,
        "batch_size": 64,
    },
)

# Define the model: one linear layer over flattened 28x28 images
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.fc(x.view(x.size(0), -1))

# Initialize the model, loss function, and optimizer
model = SimpleModel()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=wandb.config.learning_rate)

# Load the dataset
transform = transforms.Compose([transforms.ToTensor()])
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=wandb.config.batch_size, shuffle=True)

# Training loop
for epoch in range(wandb.config.epochs):
    running_loss = 0.0
    correct = 0
    total = 0
    for i, (inputs, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # Accumulate the loss and the number of correct predictions
        running_loss += loss.item()
        predicted = outputs.argmax(dim=1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

        if i % 100 == 99:  # Log once every 100 batches
            avg_loss = running_loss / 100
            accuracy = 100 * correct / total
            print(f'Epoch {epoch + 1}, Batch {i + 1}: Loss={avg_loss:.4f}, Accuracy={accuracy:.2f}%')

            # Send the metrics to W&B
            wandb.log({
                "Epoch": epoch + 1,
                "Batch": i + 1,
                "Training Loss": avg_loss,
                "Training Accuracy": accuracy,
            })

            # Reset the counters for the next 100-batch window
            running_loss = 0.0
            correct = 0
            total = 0

# Finish the run
wandb.finish()
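The running_loss, correct, and total bookkeeping is repeated in every example in this article. One way to factor it out is a small accumulator that returns a dictionary ready to pass to wandb.log. This is only a sketch: the class RunningMetrics is hypothetical and not part of W&B or PyTorch, and the numbers fed to it are made up.

```python
class RunningMetrics:
    """Accumulate loss and accuracy over a window of batches."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.loss_sum = 0.0
        self.batches = 0
        self.correct = 0
        self.samples = 0

    def update(self, batch_loss, batch_correct, batch_size):
        self.loss_sum += batch_loss
        self.batches += 1
        self.correct += batch_correct
        self.samples += batch_size

    def flush(self):
        """Return the window averages and start a new window."""
        metrics = {
            # Divide by the real batch count instead of a hard-coded 100
            "Training Loss": self.loss_sum / self.batches,
            "Training Accuracy": 100 * self.correct / self.samples,
        }
        self.reset()
        return metrics


# Two made-up batches of 64 samples each
metrics = RunningMetrics()
metrics.update(batch_loss=0.9, batch_correct=48, batch_size=64)
metrics.update(batch_loss=0.7, batch_correct=56, batch_size=64)
summary = metrics.flush()

# In the training loop this would become: wandb.log(summary)
```

The same dictionary could also be looped over to call writer.add_scalar for TensorBoard, so the accumulator does not tie the training loop to one tool.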
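View the W&B dashboard
After you run the code, wandb.init prints the URL of the run in the terminal, and you can follow the live training dashboard in the W&B web interface.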
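The introduction lists learning rate as a key metric, but the examples log only loss and accuracy. A minimal sketch of reading the current learning rate from optimizer.param_groups follows. The helper learning_rates and the tag format are hypothetical, and the StepLR scheduler is an assumption added here only so that the value changes between epochs; the examples in this article use a constant learning rate.

```python
import torch


def learning_rates(optimizer):
    """Return the current learning rate of every parameter group."""
    return {
        f"Learning Rate/group_{idx}": group["lr"]
        for idx, group in enumerate(optimizer.param_groups)
    }


# A single dummy parameter is enough to construct an optimizer
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=0.01)
# Assumption for illustration: halve the learning rate after every epoch
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

history = []
for epoch in range(3):
    history.append(learning_rates(optimizer))  # e.g. wandb.log(history[-1])
    optimizer.step()   # the parameter has no gradient, so nothing is updated
    scheduler.step()   # the scheduler is stepped after the optimizer
```

Each dictionary in history can be merged into the metrics passed to wandb.log, or written with writer.add_scalar when using TensorBoard.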
3. Using Matplotlib
If you need a simple local visualization tool, you can use Matplotlib to plot the metrics collected during training.
Install Matplotlib
pip install matplotlib
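Using Matplotlib in PyTorch
The following is a simple example. Unlike the two tools above, it collects the metrics in lists and draws the curves once training has finished rather than updating live.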
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# Define the model: one linear layer over flattened 28x28 images
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.fc(x.view(x.size(0), -1))

# Initialize the model, loss function, and optimizer
model = SimpleModel()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Load the dataset
transform = transforms.Compose([transforms.ToTensor()])
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)

# Lists that collect the metrics during training
losses = []
accuracies = []

# Training loop
for epoch in range(5):
    running_loss = 0.0
    correct = 0
    total = 0
    for i, (inputs, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # Accumulate the loss and the number of correct predictions
        running_loss += loss.item()
        predicted = outputs.argmax(dim=1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

        if i % 100 == 99:  # Record once every 100 batches
            avg_loss = running_loss / 100
            accuracy = 100 * correct / total
            print(f'Epoch {epoch + 1}, Batch {i + 1}: Loss={avg_loss:.4f}, Accuracy={accuracy:.2f}%')
            losses.append(avg_loss)
            accuracies.append(accuracy)

            # Reset the counters for the next 100-batch window
            running_loss = 0.0
            correct = 0
            total = 0

# Plot the loss and accuracy curves; each point covers 100 batches
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(losses, label='Training Loss')
plt.xlabel('Logging step (100 batches each)')
plt.ylabel('Loss')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(accuracies, label='Training Accuracy')
plt.xlabel('Logging step (100 batches each)')
plt.ylabel('Accuracy (%)')
plt.legend()

plt.show()
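Curves plotted from 100-batch averages can still look noisy. TensorBoard offers a smoothing slider for this; with Matplotlib you can smooth the list yourself before plotting. The sketch below uses a plain exponential moving average, which is in the same spirit as that slider but is not claimed to reproduce it exactly. The function name smooth and the sample values are made up for illustration.

```python
def smooth(values, weight=0.6):
    """Exponential moving average: higher weight gives a smoother curve."""
    smoothed = []
    last = None
    for value in values:
        # The first point is kept as is; later points blend with the history
        last = value if last is None else weight * last + (1 - weight) * value
        smoothed.append(last)
    return smoothed


# Made-up loss values standing in for the `losses` list from the example
raw_losses = [2.0, 1.0, 1.5, 0.5]
smoothed_losses = smooth(raw_losses, weight=0.5)

# In the plotting code this could be used as:
# plt.plot(smooth(losses), label='Training Loss (smoothed)')
```

Plotting the raw and smoothed curves together is a reasonable default, since smoothing alone can hide short spikes in the loss.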
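Summary
- TensorBoard: suited to local real-time monitoring; supports many kinds of visualization beyond scalar curves.
- Weights & Biases: suited to team collaboration and experiment tracking; stores runs in the cloud and makes them easy to share.
- Matplotlib: suited to simple local plots; needs no logging server or account, only the matplotlib package itself.
Choose the tool that fits your needs to build your training dashboard.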