👩‍💻 Lab 0: PyTorch Tutorial
In this tutorial, we will explore how to train a neural network with PyTorch.
Setup
We will first install a few packages that will be used in this tutorial:
!pip install torchprofile 1>/dev/null
We will then import a few libraries:
import random
from collections import OrderedDict, defaultdict
import numpy as np
import torch
from matplotlib import pyplot as plt
from torch import nn
from torch.optim import *
from torch.optim.lr_scheduler import *
from torch.utils.data import DataLoader
from torchprofile import profile_macs
from torchvision.datasets import *
from torchvision.transforms import *
from tqdm.auto import tqdm
To ensure reproducibility, we will fix the seeds of the random number generators:
random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
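Note that seeding alone does not make GPU training bit-exact. If you need stronger reproducibility, you can additionally restrict cuDNN to deterministic kernels (a minimal sketch, which may slow training down):
torch.backends.cudnn.deterministic = True  # use only deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False     # disable non-deterministic autotuning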
Data
In this tutorial, we will use CIFAR-10 as our target dataset. This dataset contains images from 10 classes; each image is of size 3x32x32, i.e., a 3-channel color image of 32x32 pixels.
transforms = {
    "train": Compose([
        RandomCrop(32, padding=4),
        RandomHorizontalFlip(),
        ToTensor(),
    ]),
    "test": ToTensor(),
}

dataset = {}
for split in ["train", "test"]:
    dataset[split] = CIFAR10(
        root="data/cifar10",
        train=(split == "train"),
        download=True,
        transform=transforms[split],
    )
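As a quick sanity check, we can print the number of images in each split; CIFAR-10 ships with 50,000 training and 10,000 test images:
for split in ["train", "test"]:
    print(f"[{split}] {len(dataset[split])} images")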
We can visualize a few images in the dataset and their corresponding class labels:
samples = [[] for _ in range(10)]
for image, label in dataset["test"]:
    if len(samples[label]) < 4:
        samples[label].append(image)

plt.figure(figsize=(20, 9))
for index in range(40):
    label = index % 10
    image = samples[label][index // 10]

    # Convert from CHW to HWC for visualization
    image = image.permute(1, 2, 0)

    # Convert from class index to class name
    label = dataset["test"].classes[label]

    # Visualize the image
    plt.subplot(4, 10, index + 1)
    plt.imshow(image)
    plt.title(label)
    plt.axis("off")
plt.show()
To train a neural network, we will need to feed data in batches. We create data loaders with a batch size of 512:
dataflow = {}
for split in ['train', 'test']:
    dataflow[split] = DataLoader(
        dataset[split],
        batch_size=512,
        shuffle=(split == 'train'),
        num_workers=0,
        pin_memory=True,
    )
We can print the data type and shape of a batch from the training data loader:
for inputs, targets in dataflow["train"]:
    print("[inputs] dtype: {}, shape: {}".format(inputs.dtype, inputs.shape))
    print("[targets] dtype: {}, shape: {}".format(targets.dtype, targets.shape))
    break
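With the settings above, the first training batch should print something like the following (the inputs are float32 image tensors, the targets int64 class indices):
[inputs] dtype: torch.float32, shape: torch.Size([512, 3, 32, 32])
[targets] dtype: torch.int64, shape: torch.Size([512])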
Model
In this tutorial, we will use a variant of VGG-11 (with fewer downsamples and a smaller classifier) as our model.
class VGG(nn.Module):
    ARCH = [64, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']

    def __init__(self) -> None:
        super().__init__()

        layers = []
        counts = defaultdict(int)

        def add(name: str, layer: nn.Module) -> None:
            layers.append((f"{name}{counts[name]}", layer))
            counts[name] += 1

        in_channels = 3
        for x in self.ARCH:
            if x != 'M':
                # conv-bn-relu
                add("conv", nn.Conv2d(in_channels, x, 3, padding=1, bias=False))
                add("bn", nn.BatchNorm2d(x))
                add("relu", nn.ReLU(True))
                in_channels = x
            else:
                # maxpool
                add("pool", nn.MaxPool2d(2))

        self.backbone = nn.Sequential(OrderedDict(layers))
        self.classifier = nn.Linear(512, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # backbone: [N, 3, 32, 32] => [N, 512, 2, 2]
        x = self.backbone(x)

        # avgpool: [N, 512, 2, 2] => [N, 512]
        x = x.mean([2, 3])

        # classifier: [N, 512] => [N, 10]
        x = self.classifier(x)
        return x

model = VGG().cuda()
Its backbone is composed of eight conv-bn-relu blocks interleaved with four maxpool layers, which downsample the feature map by 2^4 = 16 times:
print(model.backbone)
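We can sanity-check this 16x downsampling by passing a dummy input through the backbone (a quick sketch using a zero tensor):
model.eval()
with torch.inference_mode():
    # 32 / 2^4 = 2, so the output should be [1, 512, 2, 2]
    print(model.backbone(torch.zeros(1, 3, 32, 32).cuda()).shape)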
After the feature map is pooled, its classifier predicts the final output with a linear layer:
print(model.classifier)
As this course focuses on efficiency, we will then inspect its model size and (theoretical) computation cost.
- The model size can be estimated by the number of trainable parameters:
num_params = 0
for param in model.parameters():
    if param.requires_grad:
        num_params += param.numel()
print("#Params:", num_params)
- The computation cost can be estimated by the number of multiply–accumulate operations (MACs) using TorchProfile:
num_macs = profile_macs(model, torch.zeros(1, 3, 32, 32).cuda())
print("#MACs:", num_macs)
This model has 9.2M parameters and requires 606M MACs for inference. We will work together in the next few labs to improve its efficiency.
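For reference, assuming the weights are stored as 32-bit floats (4 bytes per parameter), the parameter count translates into an approximate storage size:
# Approximate storage size, assuming fp32 (4 bytes) per parameter
print("Size (MiB): {:.1f}".format(num_params * 4 / 2**20))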
Optimization
As we are working on a classification problem, we will apply cross entropy as our loss function to optimize the model:
criterion = nn.CrossEntropyLoss()
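Note that nn.CrossEntropyLoss expects raw (unnormalized) logits of shape [N, 10] and integer class indices of shape [N]; the softmax is applied internally. A tiny sketch with random inputs:
demo_logits = torch.randn(4, 10)           # 4 samples, 10 classes
demo_targets = torch.randint(0, 10, (4,))  # 4 ground-truth class indices
print(criterion(demo_logits, demo_targets))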
Optimization will be carried out using stochastic gradient descent (SGD) with momentum:
optimizer = SGD(
    model.parameters(),
    lr=0.4,
    momentum=0.9,
    weight_decay=5e-4,
)
The learning rate will be modulated using the following piecewise linear scheduler (adapted from this blog series), which ramps up linearly to its peak over the first 30% of training and then decays linearly to zero:
num_epochs = 20
steps_per_epoch = len(dataflow["train"])

# Define the piecewise linear scheduler
lr_lambda = lambda step: np.interp(
    [step / steps_per_epoch],
    [0, num_epochs * 0.3, num_epochs],
    [0, 1, 0],
)[0]

# Visualize the learning rate schedule
steps = np.arange(steps_per_epoch * num_epochs)
plt.plot(steps, [lr_lambda(step) * 0.4 for step in steps])
plt.xlabel("Number of Steps")
plt.ylabel("Learning Rate")
plt.grid(True)
plt.show()

scheduler = LambdaLR(optimizer, lr_lambda)
Training
We first define the training function that optimizes the model for one epoch (i.e., a pass over the training set):
def train(
    model: nn.Module,
    dataflow: DataLoader,
    criterion: nn.Module,
    optimizer: Optimizer,
    scheduler: LambdaLR,
) -> None:
    model.train()

    for inputs, targets in tqdm(dataflow, desc='train', leave=False):
        # Move the data from CPU to GPU
        inputs = inputs.cuda()
        targets = targets.cuda()

        # Reset the gradients (from the last iteration)
        optimizer.zero_grad()

        # Forward inference
        outputs = model(inputs)
        loss = criterion(outputs, targets)

        # Backward propagation
        loss.backward()

        # Update optimizer and LR scheduler
        optimizer.step()
        scheduler.step()
We then define the evaluation function that calculates the metric (i.e., accuracy in our case) on the test set:
@torch.inference_mode()
def evaluate(
    model: nn.Module,
    dataflow: DataLoader,
) -> float:
    model.eval()

    num_samples = 0
    num_correct = 0

    for inputs, targets in tqdm(dataflow, desc="eval", leave=False):
        # Move the data from CPU to GPU
        inputs = inputs.cuda()
        targets = targets.cuda()

        # Inference
        outputs = model(inputs)

        # Convert logits to class indices
        outputs = outputs.argmax(dim=1)

        # Update metrics
        num_samples += targets.size(0)
        num_correct += (outputs == targets).sum()

    return (num_correct / num_samples * 100).item()
With the training and evaluation functions in place, we can finally start training the model! This will take around 10 minutes.
for epoch_num in tqdm(range(1, num_epochs + 1)):
    train(model, dataflow["train"], criterion, optimizer, scheduler)
    metric = evaluate(model, dataflow["test"])
    print(f"epoch {epoch_num}:", metric)
If everything goes well, your trained model should achieve >92.5% accuracy!
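Since later labs will build on this model, you may want to save the trained weights to disk (a minimal sketch; the file name below is just an example):
# Save the trained weights (the path is just an example)
torch.save(model.state_dict(), "vgg_cifar10.pth")

# To restore them later:
# model.load_state_dict(torch.load("vgg_cifar10.pth"))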
Visualization
We can visualize the model’s predictions to see how it truly performs:
plt.figure(figsize=(20, 10))
for index in range(40):
    image, label = dataset["test"][index]

    # Model inference
    model.eval()
    with torch.inference_mode():
        pred = model(image.unsqueeze(dim=0).cuda())
        pred = pred.argmax(dim=1)

    # Convert from CHW to HWC for visualization
    image = image.permute(1, 2, 0)

    # Convert from class indices to class names
    pred = dataset["test"].classes[pred]
    label = dataset["test"].classes[label]

    # Visualize the image
    plt.subplot(4, 10, index + 1)
    plt.imshow(image)
    plt.title(f"pred: {pred}" + "\n" + f"label: {label}")
    plt.axis("off")
plt.show()