def plot_weight_distribution(model, bins=256, count_nonzero_only=False):
    fig, axes = plt.subplots(3, 3, figsize=(10, 6))
    axes = axes.ravel()
    plot_index = 0
    for name, param in model.named_parameters():
        if param.dim() > 1:
            ax = axes[plot_index]
            if count_nonzero_only:
                param_cpu = param.detach().view(-1).cpu()
                param_cpu = param_cpu[param_cpu != 0].view(-1)
                ax.hist(param_cpu, bins=bins, density=True,
                        color='blue', alpha=0.5)
            else:
                ax.hist(param.detach().view(-1).cpu(), bins=bins, density=True,
                        color='blue', alpha=0.5)
            ax.set_xlabel(name)
            ax.set_ylabel('density')
            plot_index += 1
    fig.suptitle('Histogram of Weights')
    fig.tight_layout()
    fig.subplots_adjust(top=0.925)
    plt.show()

plot_weight_distribution(model)
👩‍💻 Lab 1
Lab 1 Pruning
This Lab 1 on pruning consists of the following goals and contents. You can press the button below to run it directly in Colaboratory.
Goals
- Understand the basic concept of pruning.
- Implement and apply fine-grained pruning.
- Implement and apply channel pruning.
- Get a basic understanding of the performance improvement (e.g., speedup) from pruning.
- Understand the differences and tradeoffs between these pruning approaches.
Contents
This lab has two main sections: Fine-grained Pruning and Channel Pruning.
There are 9 questions in total:
- Fine-grained Pruning has 5 questions (Question 1-5).
- Channel Pruning has 3 questions (Question 6-8).
- Question 9 compares fine-grained pruning and channel pruning.
The setup part of the lab notebook can be found in the Colaboratory note; it is omitted from this post so we can focus on the lab content itself.
Let's Look at the Distribution of Weight Values
Before moving on to pruning, let's look at the distribution of the weight values in the dense model.
Question 1 (10 pts)
Look at the weight histograms above and answer the following questions.
Question 1.1 (5 pts)
What common characteristic do the weight distributions of the different layers share?
Your Answer:
They follow a normal distribution with mean 0 (for the backbone; the classifier is an exception).
Question 1.2 (5 pts)
How do these characteristics help pruning?
Your Answer:
Since many weights are (close to) zero, we can skip computing or storing them.
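To make this concrete, here is a tiny illustration (my addition, not part of the lab) of why zeroed-out weights can be skipped in storage: a sparse format keeps only the nonzero values and their indices.

import torch

# a weight matrix where most entries have been pruned to zero
w = torch.tensor([[0.0, 0.3, 0.0],
                  [0.0, 0.0, -0.7]])
w_sparse = w.to_sparse()   # COO format: stores only 2 values plus their indices
print(w_sparse.values())   # tensor([ 0.3000, -0.7000])
print(w_sparse.indices())  # tensor([[0, 1], [1, 2]])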
Fine-grained Pruning
In this section, we will implement and perform fine-grained pruning.
Fine-grained pruning removes the synapses with the lowest importance. After fine-grained pruning, the weight tensor \(W\) becomes sparse, which can be described by its sparsity:
\(\mathrm{sparsity} := \#\mathrm{Zeros} / \#W = 1 - \#\mathrm{Nonzeros} / \#W\)
where \(\#W\) is the number of elements in \(W\).
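As a small aside, a helper like the `get_sparsity()` used later in this lab can be written directly from this definition (a minimal sketch; the lab's setup code provides its own version):

def get_sparsity(tensor: torch.Tensor) -> float:
    # sparsity = #zeros / #elements = 1 - #nonzeros / #elements
    return 1 - float(tensor.count_nonzero()) / tensor.numel()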
In practice, given a target sparsity \(s\), the weight tensor \(W\) is multiplied with a binary mask \(M\) to disregard the removed weights:
\(v_{\mathrm{thr}} = \texttt{kthvalue}(Importance, \#W \cdot s)\)
\(M = Importance > v_{\mathrm{thr}}\)
\(W = W \cdot M\)
where \(Importance\) is an importance tensor with the same shape as \(W\), \(\texttt{kthvalue}(X, k)\) finds the \(k\)-th smallest value of tensor \(X\), and \(v_{\mathrm{thr}}\) is the threshold value.
Magnitude-based Pruning
A widely used importance criterion for fine-grained pruning is the magnitude of the weight values, i.e.,
\(Importance=|W|\)
This is known as magnitude-based pruning (see Learning both Weights and Connections for Efficient Neural Networks).
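To see the formulas above in action, here is a small worked example on a hypothetical 1-D tensor with target sparsity \(s = 0.5\) (my illustration, not part of the lab):

import torch

W = torch.tensor([0.1, -0.8, 0.05, 0.6])
importance = W.abs()                                  # Importance = |W|
num_zeros = round(W.numel() * 0.5)                    # #W * s = 2
v_thr = torch.kthvalue(importance, num_zeros).values  # 2nd smallest |w| = 0.1
M = importance > v_thr                                # [False, True, False, True]
W = W * M                                             # [0.0, -0.8, 0.0, 0.6]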
Question 2 (15 pts)
Please complete the following magnitude-based fine-grained pruning function.
Hint:
- In Step 1, calculate the number of zeros (`num_zeros`) after pruning. `num_zeros` must be an integer. You can use either `round()` or `int()` to convert a floating-point number into an integer; here we use `round()`.
- In Step 2, calculate the `importance` of the weight tensor. PyTorch provides the `torch.abs()`, `torch.Tensor.abs()`, and `torch.Tensor.abs_()` APIs.
- In Step 3, calculate the `threshold` so that all synapses with importance lower than the `threshold` are removed. PyTorch provides the `torch.kthvalue()`, `torch.Tensor.kthvalue()`, and `torch.topk()` APIs.
- In Step 4, calculate the pruning `mask` based on the `threshold`. In the `mask`, 1 indicates that a synapse is kept and 0 indicates that it is removed: `mask = importance > threshold`. PyTorch provides the `torch.gt()` API.
def fine_grained_prune(tensor: torch.Tensor, sparsity: float) -> torch.Tensor:
    """
    magnitude-based pruning for single tensor
    :param tensor: torch.(cuda.)Tensor, weight of conv/fc layer
    :param sparsity: float, pruning sparsity
        sparsity = #zeros / #elements = 1 - #nonzeros / #elements
    :return:
        torch.(cuda.)Tensor, mask for zeros
    """
    sparsity = min(max(0.0, sparsity), 1.0)
    if sparsity == 1.0:
        tensor.zero_()
        return torch.zeros_like(tensor)
    elif sparsity == 0.0:
        return torch.ones_like(tensor)

    num_elements = tensor.numel()

    ##################### YOUR CODE STARTS HERE #####################
    # Step 1: calculate the #zeros (please use round())
    num_zeros = round(num_elements * sparsity)
    # Step 2: calculate the importance of weight
    importance = torch.abs(tensor)
    # Step 3: calculate the pruning threshold
    threshold = torch.kthvalue(torch.flatten(importance), num_zeros)[0]
    # Step 4: get binary mask (1 for nonzeros, 0 for zeros)
    mask = importance > threshold
    ##################### YOUR CODE ENDS HERE #######################

    # Step 5: apply mask to prune the tensor
    tensor.mul_(mask)
    return mask
To verify the fine-grained pruning function defined above, let's apply it to a dummy tensor.
test_fine_grained_prune()
* Test fine_grained_prune()
target sparsity: 0.75
sparsity before pruning: 0.04
sparsity after pruning: 0.76
sparsity of pruning mask: 0.76
* Test passed.
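The `test_fine_grained_prune()` helper ships with the lab's setup code (omitted here); a minimal sketch of what it might check could look like this (hypothetical, assuming the `get_sparsity()` helper from the setup):

def test_fine_grained_prune_minimal(target_sparsity=0.75):
    tensor = torch.randn(5, 5)
    mask = fine_grained_prune(tensor, target_sparsity)
    # the achieved sparsity should be close to the target (round() causes
    # small deviations, e.g. 19/25 = 0.76 for a 5x5 tensor and target 0.75)
    assert abs(get_sparsity(tensor) - target_sparsity) < 0.1
    # the mask should be 1 exactly where nonzeros survived
    assert torch.equal(mask, tensor != 0)
    print('* Test passed.')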
Question 3 (5 pts)
The last cell plots the tensor before and after pruning. Nonzero values are shown in blue and zeros in gray. In the following code cell, please modify the value of `target_sparsity` so that only 10 nonzero values remain in the sparse tensor after pruning.
##################### YOUR CODE STARTS HERE #####################
# sparsity := #Zeros / #W = 1 - #Nonzeros / #W
# 1 - 10/25
target_sparsity = 0.6  # please modify the value of target_sparsity
##################### YOUR CODE ENDS HERE #####################

test_fine_grained_prune(target_sparsity=target_sparsity, target_nonzeros=10)
* Test fine_grained_prune()
target sparsity: 0.60
sparsity before pruning: 0.04
sparsity after pruning: 0.60
sparsity of pruning mask: 0.60
* Test passed.
Now let's wrap the fine-grained pruning function into a class that prunes the whole model. The `FineGrainedPruner` class has to keep a record of the pruning masks so that the masks can be applied whenever the model weights change, keeping the model always sparse.
class FineGrainedPruner:
    def __init__(self, model, sparsity_dict):
        self.masks = FineGrainedPruner.prune(model, sparsity_dict)

    @torch.no_grad()
    def apply(self, model):
        for name, param in model.named_parameters():
            if name in self.masks:
                param *= self.masks[name]

    @staticmethod
    @torch.no_grad()
    def prune(model, sparsity_dict):
        masks = dict()
        for name, param in model.named_parameters():
            if param.dim() > 1:  # we only prune conv and fc weights
                masks[name] = fine_grained_prune(param, sparsity_dict[name])
        return masks
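A quick usage sketch (names match this lab; `sparsity_dict` is defined later): construct the pruner once, which prunes in place and stores the masks, then re-apply the masks after every weight update to keep the model sparse.

pruner = FineGrainedPruner(model, sparsity_dict)  # prunes in place, stores masks
# ... later, after each optimizer.step() during finetuning:
pruner.apply(model)                               # re-zero the pruned weights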
Sensitivity Scan
Each layer contributes differently to the model performance, and determining the proper sparsity for each layer is difficult. A widely used approach is the sensitivity scan.
During a sensitivity scan, at each step we prune only one layer and observe the accuracy degradation. By scanning over various sparsities, we can draw the sensitivity curve (i.e., accuracy vs. sparsity) of that layer.
Below is an example figure of sensitivity curves. The x-axis is the sparsity, i.e., the ratio by which #parameters is reduced. The y-axis is the validation accuracy. (Figure 6 of Learning both Weights and Connections for Efficient Neural Networks)
The following code cell defines the sensitivity scan function, which returns the scanned sparsities and a list of accuracies as each weight tensor is pruned.
@torch.no_grad()
def sensitivity_scan(model, dataloader, scan_step=0.1, scan_start=0.4, scan_end=1.0, verbose=True):
    sparsities = np.arange(start=scan_start, stop=scan_end, step=scan_step)
    accuracies = []
    named_conv_weights = [(name, param) for (name, param) \
                          in model.named_parameters() if param.dim() > 1]
    for i_layer, (name, param) in enumerate(named_conv_weights):
        param_clone = param.detach().clone()
        accuracy = []
        for sparsity in tqdm(sparsities, desc=f'scanning {i_layer}/{len(named_conv_weights)} weight - {name}'):
            fine_grained_prune(param.detach(), sparsity=sparsity)
            acc = evaluate(model, dataloader, verbose=False)
            if verbose:
                print(f'\r sparsity={sparsity:.2f}: accuracy={acc:.2f}%', end='')
            # restore
            param.copy_(param_clone)
            accuracy.append(acc)
        if verbose:
            print(f'\r sparsity=[{",".join(["{:.2f}".format(x) for x in sparsities])}]: accuracy=[{", ".join(["{:.2f}%".format(x) for x in accuracy])}]', end='')
        accuracies.append(accuracy)
    return sparsities, accuracies
Please run the following cells to plot the sensitivity curves. It should take around 2 minutes to complete.
sparsities, accuracies = sensitivity_scan(
    model, dataloader['test'], scan_step=0.1, scan_start=0.4, scan_end=1.0)
sparsity=[0.40,0.50,0.60,0.70,0.80,0.90]: accuracy=[92.42%, 91.19%, 87.55%, 83.39%, 69.41%, 31.81%]
sparsity=[0.40,0.50,0.60,0.70,0.80,0.90]: accuracy=[92.93%, 92.88%, 92.71%, 92.40%, 91.32%, 84.78%]
sparsity=[0.40,0.50,0.60,0.70,0.80,0.90]: accuracy=[92.94%, 92.64%, 92.46%, 91.77%, 89.85%, 78.56%]
sparsity=[0.40,0.50,0.60,0.70,0.80,0.90]: accuracy=[92.86%, 92.72%, 92.23%, 91.09%, 85.35%, 51.31%]
sparsity=[0.40,0.50,0.60,0.70,0.80,0.90]: accuracy=[92.88%, 92.68%, 92.22%, 89.47%, 76.86%, 38.78%]
sparsity=[0.40,0.50,0.60,0.70,0.80,0.90]: accuracy=[92.92%, 92.71%, 92.63%, 91.88%, 89.90%, 82.19%]
sparsity=[0.40,0.50,0.60,0.70,0.80,0.90]: accuracy=[92.94%, 92.86%, 92.65%, 92.10%, 90.58%, 83.65%]
sparsity=[0.40,0.50,0.60,0.70,0.80,0.90]: accuracy=[92.94%, 92.92%, 92.88%, 92.81%, 92.63%, 91.34%]
sparsity=[0.40,0.50,0.60,0.70,0.80,0.90]: accuracy=[92.91%, 92.83%, 92.81%, 92.97%, 92.68%, 92.52%]
def plot_sensitivity_scan(sparsities, accuracies, dense_model_accuracy):
    lower_bound_accuracy = 100 - (100 - dense_model_accuracy) * 1.5
    fig, axes = plt.subplots(3, int(math.ceil(len(accuracies) / 3)), figsize=(15, 8))
    axes = axes.ravel()
    plot_index = 0
    for name, param in model.named_parameters():
        if param.dim() > 1:
            ax = axes[plot_index]
            curve = ax.plot(sparsities, accuracies[plot_index])
            line = ax.plot(sparsities, [lower_bound_accuracy] * len(sparsities))
            ax.set_xticks(np.arange(start=0.4, stop=1.0, step=0.1))
            ax.set_ylim(80, 95)
            ax.set_title(name)
            ax.set_xlabel('sparsity')
            ax.set_ylabel('top-1 accuracy')
            ax.legend([
                'accuracy after pruning',
                f'{lower_bound_accuracy / dense_model_accuracy * 100:.0f}% of dense model accuracy'
            ])
            ax.grid(axis='x')
            plot_index += 1
    fig.suptitle('Sensitivity Curves: Validation Accuracy vs. Pruning Sparsity')
    fig.tight_layout()
    fig.subplots_adjust(top=0.925)
    plt.show()

plot_sensitivity_scan(sparsities, accuracies, dense_model_accuracy)
Question 4 (15 pts)
Please answer the following questions using the information from the sensitivity curves above.
Question 4.1 (5 pts)
What is the relationship between pruning sparsity and model accuracy? (i.e., does accuracy increase or decrease as sparsity becomes higher?)
Your Answer:
As the pruning sparsity becomes higher, the model accuracy tends to decrease.
Question 4.2 (5 pts)
Do all the layers have the same sensitivity?
Your Answer:
No. Some layers are not sensitive (the classifier), while others are (conv0). In general, the early layers (conv0, conv1, ...) appear more sensitive.
Question 4.3 (5 pts)
Which layer is the most sensitive to the pruning sparsity?
Your Answer:
The conv0 layer.
#Parameters of each layer
Not only the accuracy but also the number of parameters in each layer affects the choice of sparsity: layers with more parameters call for larger sparsities.
Please run the following code cell to plot the distribution of #parameters in the whole model.
def plot_num_parameters_distribution(model):
    num_parameters = dict()
    for name, param in model.named_parameters():
        if param.dim() > 1:
            num_parameters[name] = param.numel()
    fig = plt.figure(figsize=(8, 6))
    plt.grid(axis='y')
    plt.bar(list(num_parameters.keys()), list(num_parameters.values()))
    plt.title('#Parameter Distribution')
    plt.ylabel('Number of Parameters')
    plt.xticks(rotation=60)
    plt.tight_layout()
    plt.show()

plot_num_parameters_distribution(model)
Selecting Sparsity Based on the Sensitivity Curves and #Parameters Distribution Above
Question 5 (10 pts)
Based on the sensitivity curves and the distribution of #parameters in the model, please select the sparsity for each layer.
Note that the overall compression ratio of the pruned model mostly depends on the layers with large #parameters, and that different layers have different sensitivities to pruning (see Question 4); the estimation sketch after the hints illustrates the first point.
Make sure that after pruning, the sparse model is 25% of the dense model's size, and that the validation accuracy after finetuning is higher than 92.5%.
Hint:
- Layers with more `#parameters` should have larger sparsities. (see Figure `#Parameter Distribution`)
- Layers that are more sensitive to pruning sparsity (i.e., whose accuracy drops quickly as sparsity becomes higher) should have smaller sparsities. (see Figure Sensitivity Curves)
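Before committing to a `sparsity_dict`, it can help to estimate the resulting compression on paper. A hedged sketch (my addition; it assumes the dict keys match `model.named_parameters()` as in this lab):

def estimate_kept_fraction(model, sparsity_dict):
    total, kept = 0, 0.0
    for name, param in model.named_parameters():
        n = param.numel()
        total += n
        kept += n * (1 - sparsity_dict.get(name, 0.0))
    return kept / total  # fraction of weights kept; aim for ~0.25 here

# e.g. print(f'{estimate_kept_fraction(model, sparsity_dict) * 100:.1f}% kept')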
recover_model()

sparsity_dict = {
    ##################### YOUR CODE STARTS HERE #####################
    # please modify the sparsity value of each layer
    # please DO NOT modify the key of sparsity_dict
    'backbone.conv0.weight': 0,
    'backbone.conv1.weight': 0.5,
    'backbone.conv2.weight': 0.5,
    'backbone.conv3.weight': 0.5,
    'backbone.conv4.weight': 0.5,
    'backbone.conv5.weight': 0.8,
    'backbone.conv6.weight': 0.8,
    'backbone.conv7.weight': 0.9,
    'classifier.weight': 0
    ##################### YOUR CODE ENDS HERE #######################
}
Please run the following cell to prune the model according to the `sparsity_dict` defined above and print the information of the sparse model.
pruner = FineGrainedPruner(model, sparsity_dict)
print(f'After pruning with sparsity dictionary')
for name, sparsity in sparsity_dict.items():
    print(f'  {name}: {sparsity:.2f}')
print(f'The sparsity of each layer becomes')
for name, param in model.named_parameters():
    if name in sparsity_dict:
        print(f'  {name}: {get_sparsity(param):.2f}')

sparse_model_size = get_model_size(model, count_nonzero_only=True)
print(f"Sparse model has size={sparse_model_size / MiB:.2f} MiB = {sparse_model_size / dense_model_size * 100:.2f}% of dense model size")
sparse_model_accuracy = evaluate(model, dataloader['test'])
print(f"Sparse model has accuracy={sparse_model_accuracy:.2f}% before finetuning")
plot_weight_distribution(model, count_nonzero_only=True)
After pruning with sparsity dictionary
backbone.conv0.weight: 0.00
backbone.conv1.weight: 0.50
backbone.conv2.weight: 0.50
backbone.conv3.weight: 0.50
backbone.conv4.weight: 0.50
backbone.conv5.weight: 0.80
backbone.conv6.weight: 0.80
backbone.conv7.weight: 0.90
classifier.weight: 0.00
The sparsity of each layer becomes
backbone.conv0.weight: 0.00
backbone.conv1.weight: 0.50
backbone.conv2.weight: 0.50
backbone.conv3.weight: 0.50
backbone.conv4.weight: 0.50
backbone.conv5.weight: 0.80
backbone.conv6.weight: 0.80
backbone.conv7.weight: 0.90
classifier.weight: 0.00
Sparse model has size=8.63 MiB = 24.50% of dense model size
Sparse model has accuracy=87.00% before finetuning
Finetune the fine-grained pruned model
As we can see from the cell output above, fine-grained pruning removes most of the model weights, but the model accuracy also drops. Therefore, we have to finetune the sparse model to recover its accuracy.
Please run the following cell to finetune the sparse model. It should take around 3 minutes to complete.
num_finetune_epochs = 5
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, num_finetune_epochs)
criterion = nn.CrossEntropyLoss()

best_sparse_model_checkpoint = dict()
best_accuracy = 0
print(f'Finetuning Fine-grained Pruned Sparse Model')
for epoch in range(num_finetune_epochs):
    # At the end of each train iteration, we have to apply the pruning mask
    # to keep the model sparse during the training
    train(model, dataloader['train'], criterion, optimizer, scheduler,
          callbacks=[lambda: pruner.apply(model)])
    accuracy = evaluate(model, dataloader['test'])
    is_best = accuracy > best_accuracy
    if is_best:
        best_sparse_model_checkpoint['state_dict'] = copy.deepcopy(model.state_dict())
        best_accuracy = accuracy
    print(f' Epoch {epoch+1} Accuracy {accuracy:.2f}% / Best Accuracy: {best_accuracy:.2f}%')
Finetuning Fine-grained Pruned Sparse Model
Epoch 1 Accuracy 92.66% / Best Accuracy: 92.66%
Epoch 2 Accuracy 92.77% / Best Accuracy: 92.77%
Epoch 3 Accuracy 92.80% / Best Accuracy: 92.80%
Epoch 4 Accuracy 92.68% / Best Accuracy: 92.80%
Epoch 5 Accuracy 92.77% / Best Accuracy: 92.80%
Please run the following cell to see the information of the best finetuned sparse model.
# load the best sparse model checkpoint to evaluate the final performance
model.load_state_dict(best_sparse_model_checkpoint['state_dict'])
sparse_model_size = get_model_size(model, count_nonzero_only=True)
print(f"Sparse model has size={sparse_model_size / MiB:.2f} MiB = {sparse_model_size / dense_model_size * 100:.2f}% of dense model size")
sparse_model_accuracy = evaluate(model, dataloader['test'])
print(f"Sparse model has accuracy={sparse_model_accuracy:.2f}% after finetuning")
Sparse model has size=8.63 MiB = 24.50% of dense model size
Sparse model has accuracy=92.80% after finetuning
Channel Pruning
In this section, we will implement channel pruning. Channel pruning removes entire channels, so that it can achieve inference speedup on existing hardware such as GPUs. Analogously to before, we remove the channels whose weights have smaller magnitudes (as measured by the Frobenius norm).
# firstly, let's restore the model weights to the original dense version
# and check the validation accuracy
recover_model()
dense_model_accuracy = evaluate(model, dataloader['test'])
print(f"dense model has accuracy={dense_model_accuracy:.2f}%")
dense model has accuracy=92.95%
Remove Channel Weights
Unlike fine-grained pruning, channel pruning removes weights from the tensor entirely; that is, the number of output channels is reduced:
\(\#\mathrm{out\_channels}_{\mathrm{new}} = \#\mathrm{out\_channels}_{\mathrm{origin}} \cdot (1 - \mathrm{sparsity})\)
After channel pruning, the weight tensor \(W\) is still dense. Therefore, we refer to the sparsity as the prune ratio.
As with fine-grained pruning, we could use different pruning ratios for different layers, but for now we use a uniform pruning ratio for all layers. We are targeting a 2x computation reduction with a uniform pruning ratio of roughly 30% (think about why; a worked check follows below).
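As a worked check of that 2x figure (my reasoning, under the simplifying assumption that a conv layer's MACs scale with both its input and output channel counts): a uniform prune ratio \(r = 0.3\) applied to consecutive layers shrinks each intermediate layer's output channels and the next layer's input channels by the same factor, so
\(\mathrm{MACs}_{\mathrm{new}} / \mathrm{MACs}_{\mathrm{origin}} \approx (1-r)^2 = 0.7^2 = 0.49 \approx 1/2\)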
Toward the end of this section, you can try different pruning ratios for different layers. You can pass a list of ratios to the `channel_prune` function.
Question 6 (10 pts)
Please complete the following functions for channel pruning.
Here we naively prune all output channels other than the first \(\#\mathrm{out\_channels}_{\mathrm{new}}\) channels.
def get_num_channels_to_keep(channels: int, prune_ratio: float) -> int:
    """A function to calculate the number of channels to PRESERVE after pruning
    Note that preserve_rate = 1. - prune_ratio
    """
    ##################### YOUR CODE STARTS HERE #####################
    return int(round((1 - prune_ratio) * channels))
    ##################### YOUR CODE ENDS HERE #####################

@torch.no_grad()
def channel_prune(model: nn.Module,
                  prune_ratio: Union[List, float]) -> nn.Module:
    """Apply channel pruning to each of the conv layer in the backbone
    Note that for prune_ratio, we can either provide a floating-point number,
    indicating that we use a uniform pruning rate for all layers, or a list of
    numbers to indicate per-layer pruning rate.
    """
    # sanity check of provided prune_ratio
    assert isinstance(prune_ratio, (float, list))
    n_conv = len([m for m in model.backbone if isinstance(m, nn.Conv2d)])
    # note that for the ratios, it affects the previous conv output and next
    # conv input, i.e., conv0 - ratio0 - conv1 - ratio1 - ...
    if isinstance(prune_ratio, list):
        assert len(prune_ratio) == n_conv - 1
    else:  # convert float to list
        prune_ratio = [prune_ratio] * (n_conv - 1)

    # we prune the convs in the backbone with a uniform ratio
    model = copy.deepcopy(model)  # prevent overwrite
    # we only apply pruning to the backbone features
    all_convs = [m for m in model.backbone if isinstance(m, nn.Conv2d)]
    all_bns = [m for m in model.backbone if isinstance(m, nn.BatchNorm2d)]
    # apply pruning. we naively keep the first k channels
    assert len(all_convs) == len(all_bns)
    for i_ratio, p_ratio in enumerate(prune_ratio):
        prev_conv = all_convs[i_ratio]
        prev_bn = all_bns[i_ratio]
        next_conv = all_convs[i_ratio + 1]
        original_channels = prev_conv.out_channels  # same as next_conv.in_channels
        n_keep = get_num_channels_to_keep(original_channels, p_ratio)

        # prune the output of the previous conv and bn
        prev_conv.weight.set_(prev_conv.weight.detach()[:n_keep])
        prev_bn.weight.set_(prev_bn.weight.detach()[:n_keep])
        prev_bn.bias.set_(prev_bn.bias.detach()[:n_keep])
        prev_bn.running_mean.set_(prev_bn.running_mean.detach()[:n_keep])
        prev_bn.running_var.set_(prev_bn.running_var.detach()[:n_keep])

        # prune the input of the next conv (hint: just one line of code)
        ##################### YOUR CODE STARTS HERE #####################
        next_conv.weight.set_(next_conv.weight.detach()[:, :n_keep])
        ##################### YOUR CODE ENDS HERE #####################

    return model
Please run the following cell to check whether your implementation is correct.
dummy_input = torch.randn(1, 3, 32, 32).cuda()
pruned_model = channel_prune(model, prune_ratio=0.3)
pruned_macs = get_model_macs(pruned_model, dummy_input)
assert pruned_macs == 305388064
print('* Check passed. Right MACs for the pruned model.')
* Check passed. Right MACs for the pruned model.
Now let's evaluate the performance of the model after uniform channel pruning with a 30% pruning ratio.
As we can see, directly removing 30% of the channels leads to low accuracy.
pruned_model_accuracy = evaluate(pruned_model, dataloader['test'])
print(f"pruned model has accuracy={pruned_model_accuracy:.2f}%")
pruned model has accuracy=28.14%
Ranking Channels by Importance
As we have seen, removing the first 30% of channels in all layers leads to a significant accuracy reduction. One possible way to mitigate this is to find and remove the less important channel weights. A popular criterion for importance is the Frobenius norm of the weights corresponding to each input channel:
\(importance_{i} = \|W_{i}\|_2, \;\; i = 0, 1, 2, \cdots, \#\mathrm{in\_channels}-1\)
We can sort the channel weights from more important to less important, and then keep the first \(k\) channels for each layer.
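To make the sorting step concrete before the exercise, here is a tiny self-contained demonstration (dummy numbers, my addition) of `torch.argsort` and `torch.index_select`, the two operations the solution below relies on:

import torch

importance = torch.tensor([0.2, 0.9, 0.5])
sort_idx = torch.argsort(importance, descending=True)  # tensor([1, 2, 0])
w = torch.arange(6.0).reshape(2, 3)                    # 2 out-channels, 3 in-channels
w_sorted = torch.index_select(w, 1, sort_idx)          # reorder the input channels
# w_sorted: tensor([[1., 2., 0.],
#                   [4., 5., 3.]])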
Question 7 (15 pts)
Please complete the following functions, which sort the weight tensor based on the Frobenius norm.
Hint:
- To calculate the Frobenius norm of a tensor, PyTorch provides the `torch.norm` API.
# function to sort the channels from important to non-important
def get_input_channel_importance(weight):
    in_channels = weight.shape[1]
    importances = []
    # compute the importance for each input channel
    for i_c in range(weight.shape[1]):
        channel_weight = weight.detach()[:, i_c]
        ##################### YOUR CODE STARTS HERE #####################
        importance = torch.norm(channel_weight, p=2)
        ##################### YOUR CODE ENDS HERE #####################
        importances.append(importance.view(1))
    return torch.cat(importances)

@torch.no_grad()
def apply_channel_sorting(model):
    model = copy.deepcopy(model)  # do not modify the original model
    # fetch all the conv and bn layers from the backbone
    all_convs = [m for m in model.backbone if isinstance(m, nn.Conv2d)]
    all_bns = [m for m in model.backbone if isinstance(m, nn.BatchNorm2d)]
    # iterate through conv layers
    for i_conv in range(len(all_convs) - 1):
        # each channel sorting index, we need to apply it to:
        # - the output dimension of the previous conv
        # - the previous BN layer
        # - the input dimension of the next conv (we compute importance here)
        prev_conv = all_convs[i_conv]
        prev_bn = all_bns[i_conv]
        next_conv = all_convs[i_conv + 1]
        # note that we always compute the importance according to input channels
        importance = get_input_channel_importance(next_conv.weight)
        # sorting from large to small
        sort_idx = torch.argsort(importance, descending=True)

        # apply to previous conv and its following bn
        prev_conv.weight.copy_(torch.index_select(
            prev_conv.weight.detach(), 0, sort_idx))
        for tensor_name in ['weight', 'bias', 'running_mean', 'running_var']:
            tensor_to_apply = getattr(prev_bn, tensor_name)
            tensor_to_apply.copy_(
                torch.index_select(tensor_to_apply.detach(), 0, sort_idx)
            )

        # apply to the next conv input (hint: one line of code)
        ##################### YOUR CODE STARTS HERE #####################
        next_conv.weight.copy_(torch.index_select(next_conv.weight.detach(), 1, sort_idx))
        ##################### YOUR CODE ENDS HERE #####################

    return model
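As a side note on the design (my observation, not part of the lab): the per-channel loop in `get_input_channel_importance` can be vectorized for 4-D conv weights by reducing over every dimension except the input-channel one:

# Hypothetical vectorized equivalent for a conv weight of shape
# (out_channels, in_channels, kh, kw): Frobenius norm per input channel.
def get_input_channel_importance_vectorized(weight):
    return torch.norm(weight.detach(), dim=(0, 2, 3))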
Now run the following cell to verify whether the results are correct.
print('Before sorting...')
dense_model_accuracy = evaluate(model, dataloader['test'])
print(f"dense model has accuracy={dense_model_accuracy:.2f}%")

print('After sorting...')
sorted_model = apply_channel_sorting(model)
sorted_model_accuracy = evaluate(sorted_model, dataloader['test'])
print(f"sorted model has accuracy={sorted_model_accuracy:.2f}%")

# make sure accuracy does not change after sorting, since it is
# equivalent transform
assert abs(sorted_model_accuracy - dense_model_accuracy) < 0.1
print('* Check passed.')
Before sorting...
dense model has accuracy=92.95%
After sorting...
sorted model has accuracy=92.95%
* Check passed.
Finally, let's compare the accuracy of the pruned model with and without channel sorting.
channel_pruning_ratio = 0.3  # pruned-out ratio

print(" * Without sorting...")
pruned_model = channel_prune(model, channel_pruning_ratio)
pruned_model_accuracy = evaluate(pruned_model, dataloader['test'])
print(f"pruned model has accuracy={pruned_model_accuracy:.2f}%")

print(" * With sorting...")
sorted_model = apply_channel_sorting(model)
pruned_model = channel_prune(sorted_model, channel_pruning_ratio)
pruned_model_accuracy = evaluate(pruned_model, dataloader['test'])
print(f"pruned model has accuracy={pruned_model_accuracy:.2f}%")
* Without sorting...
pruned model has accuracy=28.14%
* With sorting...
pruned model has accuracy=36.81%
As you can see, channel sorting can slightly improve the pruned model's accuracy, but there is still a large degradation, which is very common for channel pruning. We can perform fine-tuning to recover this accuracy drop.
num_finetune_epochs = 5
optimizer = torch.optim.SGD(pruned_model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, num_finetune_epochs)
criterion = nn.CrossEntropyLoss()

best_accuracy = 0
for epoch in range(num_finetune_epochs):
    train(pruned_model, dataloader['train'], criterion, optimizer, scheduler)
    accuracy = evaluate(pruned_model, dataloader['test'])
    is_best = accuracy > best_accuracy
    if is_best:
        best_accuracy = accuracy
    print(f'Epoch {epoch+1} Accuracy {accuracy:.2f}% / Best Accuracy: {best_accuracy:.2f}%')
Epoch 1 Accuracy 91.66% / Best Accuracy: 91.66%
Epoch 2 Accuracy 92.10% / Best Accuracy: 92.10%
Epoch 3 Accuracy 92.01% / Best Accuracy: 92.10%
Epoch 4 Accuracy 92.18% / Best Accuracy: 92.18%
Epoch 5 Accuracy 92.16% / Best Accuracy: 92.18%
Measure acceleration from pruning
After fine-tuning, the model almost recovers its accuracy. You may have already noticed that channel pruning is usually harder to recover accuracy for than fine-grained pruning. However, it directly leads to a smaller model size and less computation without requiring any specialized model format, and it can also run faster on GPUs.
Now let's compare the model size, computation, and latency of the pruned model.
# helper functions to measure latency of regular PyTorch models.
# Unlike fine-grained pruning, channel pruning
# can directly lead to model size reduction and speed up.
@torch.no_grad()
def measure_latency(model, dummy_input, n_warmup=20, n_test=100):
    model.eval()
    # warmup
    for _ in range(n_warmup):
        _ = model(dummy_input)
    # real test
    t1 = time.time()
    for _ in range(n_test):
        _ = model(dummy_input)
    t2 = time.time()
    return (t2 - t1) / n_test  # average latency

table_template = "{:<15} {:<15} {:<15} {:<15}"
print(table_template.format('', 'Original', 'Pruned', 'Reduction Ratio'))

# 1. measure the latency of the original model and the pruned model on CPU
#    which simulates inference on an edge device
dummy_input = torch.randn(1, 3, 32, 32).to('cpu')
pruned_model = pruned_model.to('cpu')
model = model.to('cpu')

pruned_latency = measure_latency(pruned_model, dummy_input)
original_latency = measure_latency(model, dummy_input)
print(table_template.format('Latency (ms)',
                            round(original_latency * 1000, 1),
                            round(pruned_latency * 1000, 1),
                            round(original_latency / pruned_latency, 1)))

# 2. measure the computation (MACs)
original_macs = get_model_macs(model, dummy_input)
pruned_macs = get_model_macs(pruned_model, dummy_input)
print(table_template.format('MACs (M)',
                            round(original_macs / 1e6),
                            round(pruned_macs / 1e6),
                            round(original_macs / pruned_macs, 1)))

# 3. measure the model size (params)
original_param = get_num_parameters(model)
pruned_param = get_num_parameters(pruned_model)
print(table_template.format('Param (M)',
                            round(original_param / 1e6, 2),
                            round(pruned_param / 1e6, 2),
                            round(original_param / pruned_param, 1)))

# put model back to cuda
pruned_model = pruned_model.to('cuda')
model = model.to('cuda')
Original Pruned Reduction Ratio
Latency (ms) 24.2 13.0 1.9
MACs (M) 606 305 2.0
Param (M) 9.23 5.01 1.8
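One caveat worth adding (my note, not in the original notebook): `measure_latency` above times on CPU, where wrapping the loop with `time.time()` is fine. If you adapt it to time on GPU, remember that CUDA kernels launch asynchronously, so you should synchronize before reading the clock:

@torch.no_grad()
def measure_latency_cuda(model, dummy_input, n_warmup=20, n_test=100):
    model.eval()
    for _ in range(n_warmup):
        _ = model(dummy_input)
    torch.cuda.synchronize()  # wait for the warmup kernels to finish
    t1 = time.time()
    for _ in range(n_test):
        _ = model(dummy_input)
    torch.cuda.synchronize()  # wait for all timed kernels to finish
    t2 = time.time()
    return (t2 - t1) / n_test  # average latency in seconds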
Question 8 (10 pts)
Please answer the following questions using the information from the previous code cells.
Question 8.1 (5 pts)
Explain why removing 30% of the channels gives roughly a 50% computation saving.
Your Answer:
Because 0.7^2 = 0.49, i.e., roughly 2x: pruning shrinks both a layer's output channels and the next layer's input channels by 30%, so the MACs scale quadratically with the keep ratio.
Question 8.2 (5 pts)
Explain why the latency reduction ratio is slightly smaller than the computation reduction.
Your Answer:
MACs were reduced by 2x and Params by 1.8x, but latency only became about 1.9x faster; presumably due to memory-related overheads, which do not shrink as much as the arithmetic does.
Compare Fine-grained Pruning and Channel Pruning
Question 9 (10 pts)
After completing all the experiments in this lab, you should be familiar with fine-grained pruning and channel pruning.
Please use what you learned in the lectures and this lab to answer the following questions.
Question 9.1 (5 pts)
What are the advantages and disadvantages of fine-grained pruning and channel pruning?
You can discuss them from the perspective of compression ratio, accuracy, latency, hardware support (i.e., whether a specialized hardware accelerator is required), etc.
Your Answer:
- Fine-grained pruning
  - Pros
    - Higher accuracy
    - Usually a larger compression ratio, since we can flexibly find "redundant" weights
  - Cons
    - CPU overhead
    - Memory overhead
    - Requires hardware support for actual speedup (e.g., EIE)
- Channel pruning
  - Pros
    - Fast inference
  - Cons
    - Smaller compression ratio
Question 9.2 (5 pts)
If you want to run the model faster on a smartphone, which pruning method would you use? Why?
Your Answer:
Channel pruning, because it does not require special hardware support and has faster inference time.