👩‍💻 Lab 2

lab
quantization
linear
kmeans
K-means & Linear Quantization
Author

Jung Yeon Lee

Published

March 6, 2024

This post provides the solutions and explanations for Lab 2, a hands-on exercise on K-means Quantization and Linear Quantization covered in Lectures 5 and 6. For the original lab notebook, refer to the link to the original course; for the Korean translation and solutions, refer to this link. Clicking the Colaboratory button below opens a Colab notebook in which you can run the lab notebook directly.

Lab 2: Quantization

Goals

In this lab, we will practice quantizing a classical neural network model to reduce both model size and latency. The goals of this lab are as follows:

  • Understand the basic concept of quantization.
  • Implement and apply k-means quantization.
  • Implement and apply quantization-aware training for k-means quantization.
  • Implement and apply linear quantization.
  • Implement and apply integer-only inference for linear quantization.
  • Gain a basic understanding of the performance improvements (e.g., speedup) from quantization.
  • Understand the differences and tradeoffs between these quantization approaches.

Contents

์ฃผ์š” ์„น์…˜์€ K-Means Quantization ๊ณผ Linear Quantization 2๊ฐ€์ง€๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.

์ด๋ฒˆ ์‹ค์Šต ๋…ธํŠธ์—์„œ ์ด 10๊ฐœ์˜ ์งˆ๋ฌธ์„ ํ†ตํ•ด ํ•™์Šตํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.:

  • K-Means Quantization์— ๋Œ€ํ•ด์„œ๋Š” 3๊ฐœ์˜ ์งˆ๋ฌธ์ด ์žˆ์Šต๋‹ˆ๋‹ค (Question 1-3).
  • Linear Quantization์— ๋Œ€ํ•ด์„œ๋Š” 6๊ฐœ์˜ ์งˆ๋ฌธ์ด ์žˆ์Šต๋‹ˆ๋‹ค (Question 4-9).
  • Question 10์€ k-means quantization๊ณผ linear quantization์„ ๋น„๊ตํ•ฉ๋‹ˆ๋‹ค.

The setup part of the lab notebook can be found by opening the Colaboratory notebook; it is omitted from this post so that we can focus on the lab content itself.

First, let's evaluate the accuracy and model size of the FP32 model.

fp32_model_accuracy = evaluate(model, dataloader['test'])
fp32_model_size = get_model_size(model)
print(f"fp32 model has accuracy={fp32_model_accuracy:.2f}%")
print(f"fp32 model has size={fp32_model_size/MiB:.2f} MiB")
fp32 model has accuracy=92.95%
fp32 model has size=35.20 MiB

K-Means Quantization

Network quantization compresses the network by reducing the number of bits per weight required to represent a deep network. A quantized network can also have faster inference when hardware support is available.

์ด ์„น์…˜์—์„œ๋Š” Deep Compression: Compressing Deep Neural Networks With Pruning, Trained Quantization And Huffman Coding์—์„œ์ฒ˜๋Ÿผ ์‹ ๊ฒฝ๋ง์— ๋Œ€ํ•œ K-means quantization์„ ํƒ๊ตฌํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค.

[Figure: k-means quantization of a weight tensor with a codebook of centroids and labels (kmeans.png)]


\(n\)-bit k-means quantization divides the synapses into \(2^n\) clusters, and synapses within the same cluster share the same weight value.

๋”ฐ๋ผ์„œ, k-means quantization์€ ๋‹ค์Œ๊ณผ ๊ฐ™์€ codebook์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค: * centroids: \(2^n\) fp32 ํด๋Ÿฌ์Šคํ„ฐ ์ค‘์‹ฌ. * labels: ์›๋ž˜ fp32 ๊ฐ€์ค‘์น˜ ํ…์„œ์™€ ๋™์ผํ•œ #elements๋ฅผ ๊ฐ€์ง„ \(n\)-bit ์ •์ˆ˜ ํ…์„œ. ๊ฐ ์ •์ˆ˜๋Š” ํ•ด๋‹น ํด๋Ÿฌ์Šคํ„ฐ๊ฐ€ ์–ด๋””์— ์†ํ•˜๋Š”์ง€๋ฅผ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค.

์ถ”๋ก ํ•˜๋Š” ๋™์•ˆ, codebook์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ fp32 ํ…์„œ๊ฐ€ ์ถ”๋ก ์„ ์œ„ํ•ด ์ƒ์„ฑ๋ฉ๋‹ˆ๋‹ค:

quantized_weight = codebook.centroids[codebook.labels].view_as(weight)

from collections import namedtuple

Codebook = namedtuple('Codebook', ['centroids', 'labels'])

Question 1 (10 pts)

์•„๋ž˜์˜ K-Means quantization function์„ ์™„์„ฑํ•˜์„ธ์š”.

from fast_pytorch_kmeans import KMeans

def k_means_quantize(fp32_tensor: torch.Tensor, bitwidth=4, codebook=None):
    """
    quantize tensor using k-means clustering
    :param fp32_tensor:
    :param bitwidth: [int] quantization bit width, default=4
    :param codebook: [Codebook] (the cluster centroids, the cluster label tensor)
    :return:
        [Codebook = (centroids, labels)]
            centroids: [torch.(cuda.)FloatTensor] the cluster centroids
            labels: [torch.(cuda.)LongTensor] cluster label tensor
    """
    if codebook is None:
        ############### YOUR CODE STARTS HERE ###############
        # get number of clusters based on the quantization precision
        n_clusters = 2 ** bitwidth  # Calculate number of clusters as 2^bitwidth
        ############### YOUR CODE ENDS HERE #################
        # use k-means to get the quantization centroids
        kmeans = KMeans(n_clusters=n_clusters, mode='euclidean', verbose=0)
        labels = kmeans.fit_predict(fp32_tensor.view(-1, 1)).to(torch.long)
        centroids = kmeans.centroids.to(torch.float).view(-1)
        codebook = Codebook(centroids, labels)

    ############### YOUR CODE STARTS HERE ###############
    # decode the codebook into k-means quantized tensor for inference
    # hint: one line of code
    quantized_tensor = codebook.centroids[codebook.labels].view_as(fp32_tensor)
    ############### YOUR CODE ENDS HERE #################
    fp32_tensor.set_(quantized_tensor.view_as(fp32_tensor))
    return codebook

์œ„์—์„œ ์ž‘์„ฑํ•œ k-means quantization function์„ ๋”๋ฏธ ํ…์„œ์— ์ ์šฉํ•˜์—ฌ ํ™•์ธํ•ด๋ด…์‹œ๋‹ค.

test_k_means_quantize()
tensor([[-0.3747,  0.0874,  0.3200, -0.4868,  0.4404],
        [-0.0402,  0.2322, -0.2024, -0.4986,  0.1814],
        [ 0.3102, -0.3942, -0.2030,  0.0883, -0.4741],
        [-0.1592, -0.0777, -0.3946, -0.2128,  0.2675],
        [ 0.0611, -0.1933, -0.4350,  0.2928, -0.1087]])
* Test k_means_quantize()
    target bitwidth: 2 bits
        num unique values before k-means quantization: 25
        num unique values after  k-means quantization: 4
* Test passed.

Question 2 (10 pts)

๋งˆ์ง€๋ง‰ ์ฝ”๋“œ ์…€์€ 2๋น„ํŠธ k-means quantization์„ ์ˆ˜ํ–‰ํ•˜๊ณ  quantization ์ „ํ›„์˜ ํ…์„œ๋ฅผ ํ”Œ๋กฏํ•ฉ๋‹ˆ๋‹ค. ๊ฐ ํด๋Ÿฌ์Šคํ„ฐ๋Š” ๊ณ ์œ ํ•œ ์ƒ‰์ƒ์œผ๋กœ ๋ Œ๋”๋ง๋˜๋ฉฐ, quantized ํ…์„œ๋“ค์ด 4(\(2^2\))๊ฐ€์ง€ ๊ณ ์œ ํ•œ ์ƒ‰์ƒ์œผ๋กœ ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค.

์ด๋Ÿฌํ•œ ํ˜„์ƒ์„ ๊ด€์ฐฐํ•œ ๊ฒƒ์„ ๋ฐ”ํƒ•์œผ๋กœ ์งˆ๋ฌธ๋“ค์— ๋‹ตํ•˜์„ธ์š”.

Question 2.1 (5 pts)

If k-means quantization is performed with 4 bits, how many unique colors will be rendered in the quantized tensor?

Your Answer:

If 4-bit k-means quantization is performed, \(2^4 = 16\) unique colors will be rendered in the quantized tensor. With 4 bits we can represent 16 different states or combinations, from 0000 to 1111, which correspond to the 16 distinct clusters into which the tensor values can be grouped.

Question 2.2 (5 pts)

If n-bit k-means quantization is performed, how many unique colors will be rendered in the quantized tensor?

Your Answer:

If n-bit k-means quantization is performed, \(2^n\) unique colors will be rendered in the quantized tensor. With n bits we can represent \(2^n\) different states or combinations, which correspond to the \(2^n\) distinct clusters into which the tensor values can be grouped.

K-Means Quantization on Whole Model

lab 1์—์„œ ํ–ˆ๋˜ ๊ฒƒ๊ณผ ์œ ์‚ฌํ•˜๊ฒŒ, ์ด์ œ ์ „์ฒด ๋ชจ๋ธ์„ quantizingํ•˜๊ธฐ ์œ„ํ•ด k-means quantization ํ•จ์ˆ˜๋ฅผ ํด๋ž˜์Šค๋กœ ๋ž˜ํ•‘ํ•ฉ๋‹ˆ๋‹ค. KMeansQuantizer ํด๋ž˜์Šค์—์„œ๋Š” ๋ชจ๋ธ ๊ฐ€์ค‘์น˜๊ฐ€ ๋ณ€๊ฒฝ๋  ๋•Œ๋งˆ๋‹ค codebooks(i.e., centroids์™€ labels)์„ ์ ์šฉํ•˜๊ฑฐ๋‚˜ ์—…๋ฐ์ดํŠธํ•  ์ˆ˜ ์žˆ๋„๋ก codebooks์˜ ๋ณ€ํ™”๋ฅผ ๊ธฐ๋กํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

from torch.nn import parameter
class KMeansQuantizer:
    def __init__(self, model : nn.Module, bitwidth=4):
        self.codebook = KMeansQuantizer.quantize(model, bitwidth)

    @torch.no_grad()
    def apply(self, model, update_centroids):
        for name, param in model.named_parameters():
            if name in self.codebook:
                if update_centroids:
                    update_codebook(param, codebook=self.codebook[name])
                self.codebook[name] = k_means_quantize(
                    param, codebook=self.codebook[name])

    @staticmethod
    @torch.no_grad()
    def quantize(model: nn.Module, bitwidth=4):
        codebook = dict()
        if isinstance(bitwidth, dict):
            for name, param in model.named_parameters():
                if name in bitwidth:
                    codebook[name] = k_means_quantize(param, bitwidth=bitwidth[name])
        else:
            for name, param in model.named_parameters():
                if param.dim() > 1:
                    codebook[name] = k_means_quantize(param, bitwidth=bitwidth)
        return codebook

Now let's use K-Means Quantization to quantize the model into 8 bits, 4 bits, and 2 bits. Note that the storage for the codebooks is ignored when calculating the model size.

print('Note that the storage for codebooks is ignored when calculating the model size.')
quantizers = dict()
for bitwidth in [8, 4, 2]:
    recover_model()
    print(f'k-means quantizing model into {bitwidth} bits')
    quantizer = KMeansQuantizer(model, bitwidth)
    quantized_model_size = get_model_size(model, bitwidth)
    print(f"    {bitwidth}-bit k-means quantized model has size={quantized_model_size/MiB:.2f} MiB")
    quantized_model_accuracy = evaluate(model, dataloader['test'])
    print(f"    {bitwidth}-bit k-means quantized model has accuracy={quantized_model_accuracy:.2f}%")
    quantizers[bitwidth] = quantizer
Note that the storage for codebooks is ignored when calculating the model size.
k-means quantizing model into 8 bits
    8-bit k-means quantized model has size=8.80 MiB
    8-bit k-means quantized model has accuracy=92.76%
k-means quantizing model into 4 bits
    4-bit k-means quantized model has size=4.40 MiB
    4-bit k-means quantized model has accuracy=79.07%
k-means quantizing model into 2 bits
    2-bit k-means quantized model has size=2.20 MiB
    2-bit k-means quantized model has accuracy=10.00%

Trained K-Means Quantization

๋งˆ์ง€๋ง‰ ์…€์˜ ๊ฒฐ๊ณผ์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋“ฏ์ด, ๋ชจ๋ธ์„ ์ ์€ ๋น„ํŠธ๋กœ quantizeํ•  ๋•Œ ์ •ํ™•๋„๊ฐ€ ํฌ๊ฒŒ ๋–จ์–ด์ง‘๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ์ •ํ™•๋„๋ฅผ ํšŒ๋ณตํ•˜๊ธฐ ์œ„ํ•ด quantization-aware training์„ ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

During k-means quantization-aware training, the centroids are also updated. This was proposed in Deep Compression: Compressing Deep Neural Networks With Pruning, Trained Quantization And Huffman Coding.

centroids์— ๋Œ€ํ•œ ๊ทธ๋ž˜๋””์–ธํŠธ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ณ„์‚ฐ๋ฉ๋‹ˆ๋‹ค,

\(\frac{\partial \mathcal{L} }{\partial C_k} = \sum_{j} \frac{\partial \mathcal{L} }{\partial W_{j}} \frac{\partial W_{j} }{\partial C_k} = \sum_{j} \frac{\partial \mathcal{L} }{\partial W_{j}} \mathbf{1}(I_{j}=k)\)

where \(\mathcal{L}\) is the loss, \(C_k\) is the k-th centroid, and \(I_{j}\) is the label of weight \(W_{j}\).

\(\mathbf{1}()\) is the indicator function, and \(\mathbf{1}(I_{j}=k)\) means \(1\;\mathrm{if}\;I_{j}=k\;\mathrm{else}\;0\), i.e., \(I_{j}==k\).

lab์—์„œ๋Š” ๊ฐ„๋‹จํžˆ ์ตœ์‹  ๊ฐ€์ค‘์น˜์— ๋”ฐ๋ผ centroids๋ฅผ ์ง์ ‘ ์—…๋ฐ์ดํŠธํ•ฉ๋‹ˆ๋‹ค:

\(C_k = \frac{\sum_{j}W_{j}\mathbf{1}(I_{j}=k)}{\sum_{j}\mathbf{1}(I_{j}=k)}\)

Question 3 (10 pts)

์•„๋ž˜์˜ codebook update function์„ ์™„์„ฑํ•˜์„ธ์š”.

Hint:

์œ„์˜ centroids๋ฅผ ์—…๋ฐ์ดํŠธํ•˜๋Š” ๋ฐฉ์ •์‹์€ ์‹ค์ œ๋กœ ๋™์ผํ•œ ํด๋Ÿฌ์Šคํ„ฐ์— ์žˆ๋Š” ๊ฐ€์ค‘์น˜์˜ ํ‰๊ท (mean)์„ ์—…๋ฐ์ดํŠธ๋œ centroid ๊ฐ’์œผ๋กœ ์‚ฌ์šฉํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

def update_codebook(fp32_tensor: torch.Tensor, codebook: Codebook):
    """
    update the centroids in the codebook using updated fp32_tensor
    :param fp32_tensor: [torch.(cuda.)Tensor]
    :param codebook: [Codebook] (the cluster centroids, the cluster label tensor)
    """
    n_clusters = codebook.centroids.numel()
    fp32_tensor = fp32_tensor.view(-1)
    for k in range(n_clusters):
    ############### YOUR CODE STARTS HERE ###############
        codebook.centroids[k] = fp32_tensor[codebook.labels == k].mean()
    ############### YOUR CODE ENDS HERE #################
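As a quick sanity check (an illustrative sketch only, not part of the lab; the toy tensor and perturbation here are made up), we can quantize a small random tensor, nudge it as a training step would, and then refresh the centroids; after update_codebook(), each centroid equals the mean of its cluster:

toy = torch.rand(4, 4)
toy_codebook = k_means_quantize(toy, bitwidth=2)   # toy now holds the 2^2 centroid values in place
toy += 0.01 * torch.randn_like(toy)                # pretend a gradient step slightly moved the weights
update_codebook(toy, toy_codebook)                 # each centroid becomes its cluster's new mean
print(toy_codebook.centroids)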

์ด์ œ ๋‹ค์Œ ์ฝ”๋“œ ์…€์„ ์‹คํ–‰ํ•˜์—ฌ k-means quantized ๋ชจ๋ธ์„ finetuningํ•˜์—ฌ ์ •ํ™•๋„๋ฅผ ํšŒ๋ณตํ•ด๋ด…์‹œ๋‹ค. ์ •ํ™•๋„ ํ•˜๋ฝ์ด 0.5๋ณด๋‹ค ์ž‘์œผ๋ฉด finetuning์„ ์ค‘๋‹จํ•ฉ๋‹ˆ๋‹ค.

accuracy_drop_threshold = 0.5
quantizers_before_finetune = copy.deepcopy(quantizers)
quantizers_after_finetune = quantizers

for bitwidth in [8, 4, 2]:
    recover_model()
    quantizer = quantizers[bitwidth]
    print(f'k-means quantizing model into {bitwidth} bits')
    quantizer.apply(model, update_centroids=False)
    quantized_model_size = get_model_size(model, bitwidth)
    print(f"    {bitwidth}-bit k-means quantized model has size={quantized_model_size/MiB:.2f} MiB")
    quantized_model_accuracy = evaluate(model, dataloader['test'])
    print(f"    {bitwidth}-bit k-means quantized model has accuracy={quantized_model_accuracy:.2f}% before quantization-aware training ")
    accuracy_drop = fp32_model_accuracy - quantized_model_accuracy
    if accuracy_drop > accuracy_drop_threshold:
        print(f"        Quantization-aware training due to accuracy drop={accuracy_drop:.2f}% is larger than threshold={accuracy_drop_threshold:.2f}%")
        num_finetune_epochs = 5
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, num_finetune_epochs)
        criterion = nn.CrossEntropyLoss()
        best_accuracy = 0
        epoch = num_finetune_epochs
        while accuracy_drop > accuracy_drop_threshold and epoch > 0:
            train(model, dataloader['train'], criterion, optimizer, scheduler,
                  callbacks=[lambda: quantizer.apply(model, update_centroids=True)])
            model_accuracy = evaluate(model, dataloader['test'])
            is_best = model_accuracy > best_accuracy
            best_accuracy = max(model_accuracy, best_accuracy)
            print(f'        Epoch {num_finetune_epochs-epoch} Accuracy {model_accuracy:.2f}% / Best Accuracy: {best_accuracy:.2f}%')
            accuracy_drop = fp32_model_accuracy - best_accuracy
            epoch -= 1
    else:
        print(f"        No need for quantization-aware training since accuracy drop={accuracy_drop:.2f}% is smaller than threshold={accuracy_drop_threshold:.2f}%")
k-means quantizing model into 8 bits
    8-bit k-means quantized model has size=8.80 MiB
    8-bit k-means quantized model has accuracy=92.76% before quantization-aware training 
        No need for quantization-aware training since accuracy drop=0.19% is smaller than threshold=0.50%
k-means quantizing model into 4 bits
    4-bit k-means quantized model has size=4.40 MiB
    4-bit k-means quantized model has accuracy=79.07% before quantization-aware training 
        Quantization-aware training due to accuracy drop=13.88% is larger than threshold=0.50%
        Epoch 0 Accuracy 92.47% / Best Accuracy: 92.47%
k-means quantizing model into 2 bits
    2-bit k-means quantized model has size=2.20 MiB
    2-bit k-means quantized model has accuracy=10.00% before quantization-aware training 
        Quantization-aware training due to accuracy drop=82.95% is larger than threshold=0.50%
        Epoch 0 Accuracy 90.21% / Best Accuracy: 90.21%
        Epoch 1 Accuracy 90.82% / Best Accuracy: 90.82%
        Epoch 2 Accuracy 91.00% / Best Accuracy: 91.00%
        Epoch 3 Accuracy 91.12% / Best Accuracy: 91.12%
        Epoch 4 Accuracy 91.17% / Best Accuracy: 91.17%

Linear Quantization

์ด ์„น์…˜์—์„œ๋Š” linear quantization์„ ๊ตฌํ˜„ํ•˜๊ณ  ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

Linear quantization์€ range truncation ๊ณผ scaling ๊ณผ์ •์„ ๊ฑฐ์นœ ํ›„ ๋ถ€๋™ ์†Œ์ˆ˜์  ๊ฐ’์„ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ์–‘์žํ™”๋œ ์ •์ˆ˜๋กœ ์ง์ ‘ ๋ฐ˜์˜ฌ๋ฆผํ•ฉ๋‹ˆ๋‹ค.

Linear quantization์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ‘œํ˜„๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

\(r = S(q-Z)\)

where \(r\) is a floating-point real number, \(q\) is an n-bit integer, \(Z\) is an n-bit integer, and \(S\) is a floating-point real number.

\(Z\) is the quantization zero point and \(S\) is the quantization scaling factor. Both constants \(Z\) and \(S\) are quantization parameters.

n-bit Integer

An n-bit signed integer is usually represented in two's complement notation.

An n-bit signed integer can encode integers in the range \([-2^{n-1}, 2^{n-1}-1]\). For example, an 8-bit integer falls in the range [-128, 127].

def get_quantized_range(bitwidth):
    quantized_max = (1 << (bitwidth - 1)) - 1
    quantized_min = -(1 << (bitwidth - 1))
    return quantized_min, quantized_max
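A quick check of this helper at the bitwidths used in this lab (illustrative only):

print(get_quantized_range(8))  # (-128, 127)
print(get_quantized_range(4))  # (-8, 7)
print(get_quantized_range(2))  # (-2, 1)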

Question 4 (15 pts)

์•„๋ž˜์˜ linear quantization function์„ ์™„์„ฑํ•˜์„ธ์š”.

Hint:

  • From \(r=S(q-Z)\), we have \(q = r/S + Z\).
  • Since both \(r\) and \(S\) are floating-point numbers, we cannot directly add the integer \(Z\) to \(r/S\). Therefore, \(q = \mathrm{int}(\mathrm{round}(r/S)) + Z\).
  • To convert a torch.FloatTensor into a torch.IntTensor, first use torch.round(), torch.Tensor.round(), or torch.Tensor.round_() to round all values to floating-point integers.
  • Then use torch.Tensor.to(torch.int8) to convert the data type from torch.float to torch.int8.
def linear_quantize(fp_tensor, bitwidth, scale, zero_point, dtype=torch.int8) -> torch.Tensor:
    """
    linear quantization for single fp_tensor
      from
        fp_tensor = (quantized_tensor - zero_point) * scale
      we have,
        quantized_tensor = int(round(fp_tensor / scale)) + zero_point
    :param tensor: [torch.(cuda.)FloatTensor] floating tensor to be quantized
    :param bitwidth: [int] quantization bit width
    :param scale: [torch.(cuda.)FloatTensor] scaling factor
    :param zero_point: [torch.(cuda.)IntTensor] the desired centroid of tensor values
    :return:
        [torch.(cuda.)FloatTensor] quantized tensor whose values are integers
    """
    assert(fp_tensor.dtype == torch.float)
    assert(isinstance(scale, float) or
           (scale.dtype == torch.float and scale.dim() == fp_tensor.dim()))
    assert(isinstance(zero_point, int) or
           (zero_point.dtype == dtype and zero_point.dim() == fp_tensor.dim()))

    ############### YOUR CODE STARTS HERE ###############
    # Step 1: scale the fp_tensor
    scaled_tensor = fp_tensor / scale
    # Step 2: round the floating value to integer value
    rounded_tensor = torch.round(scaled_tensor)
    ############### YOUR CODE ENDS HERE #################

    rounded_tensor = rounded_tensor.to(dtype)

    ############### YOUR CODE STARTS HERE ###############
    # Step 3: shift the rounded_tensor to make zero_point 0
    shifted_tensor = rounded_tensor + zero_point
    ############### YOUR CODE ENDS HERE #################

    # Step 4: clamp the shifted_tensor to lie in bitwidth-bit range
    quantized_min, quantized_max = get_quantized_range(bitwidth)
    quantized_tensor = shifted_tensor.clamp_(quantized_min, quantized_max)
    return quantized_tensor

์œ„์—์„œ ์ž‘์„ฑํ•œ linear quantization ๊ธฐ๋Šฅ์„ ๋”๋ฏธ ํ…์„œ์— ์ ์šฉํ•˜์—ฌ ๊ธฐ๋Šฅ์„ ๊ฒ€์ฆํ•ด๋ด…์‹œ๋‹ค.

test_linear_quantize()
* Test linear_quantize()
    target bitwidth: 2 bits
        scale: 0.3333333333333333
        zero point: -1
* Test passed.

Question 5 (10 pts)

Now we need to determine the scaling factor \(S\) and the zero point \(Z\) for linear quantization.

Recall that linear quantization can be represented as \(r = S(q-Z)\).

Scale

Linear quantization์€ ๋ถ€๋™ ์†Œ์ˆ˜์  ๋ฒ”์œ„ [fp_min, fp_max]๋ฅผ ์–‘์žํ™”๋œ ๋ฒ”์œ„ [quantized_min, quantized_max]๋กœ ํˆฌ์˜(projection)ํ•ฉ๋‹ˆ๋‹ค. ์ฆ‰,

\(r_{\mathrm{max}} = S(q_{\mathrm{max}}-Z)\)

\(r_{\mathrm{min}} = S(q_{\mathrm{min}}-Z)\)

์ด ๋‘ ๋ฐฉ์ •์‹์„ ๋นผ๋ฉด, ์šฐ๋ฆฌ๋Š” ๋‹ค์Œ์„ ์–ป์Šต๋‹ˆ๋‹ค,

Question 5.1 (1 pts)

๋‹ค์Œ ํ…์ŠคํŠธ ์…€์—์„œ ์˜ฌ๋ฐ”๋ฅธ ๋‹ต์„ ์„ ํƒํ•˜๊ณ  ์ž˜๋ชป๋œ ๋‹ต์„ ์‚ญ์ œํ•ด์ฃผ์„ธ์š”.

\(S=r_{\mathrm{max}} / q_{\mathrm{max}}\)

\(S=(r_{\mathrm{max}} + r_{\mathrm{min}}) / (q_{\mathrm{max}} + q_{\mathrm{min}})\)

✅\(S=(r_{\mathrm{max}} - r_{\mathrm{min}}) / (q_{\mathrm{max}} - q_{\mathrm{min}})\)

\(S=r_{\mathrm{max}} / q_{\mathrm{max}} - r_{\mathrm{min}} / q_{\mathrm{min}}\)

fp_tensor์˜ \(r_{\mathrm{min}}\)๊ณผ \(r_{\mathrm{max}}\)๋ฅผ ๊ฒฐ์ •ํ•˜๋Š” ๋‹ค์–‘ํ•œ ๋ฐฉ๋ฒ•์ด ์žˆ์Šต๋‹ˆ๋‹ค.

  • ๊ฐ€์žฅ ํ”ํ•œ ๋ฐฉ๋ฒ•์€ fp_tensor์˜ ์ตœ์†Œ๊ฐ’๊ณผ ์ตœ๋Œ€๊ฐ’์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.
  • ๋˜ ๋‹ค๋ฅธ ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๋Š” ๋ฐฉ๋ฒ•์€ Kullback-Leibler-J ๋ฐœ์‚ฐ์„ ์ตœ์†Œํ™”ํ•˜์—ฌ fp_max๋ฅผ ๊ฒฐ์ •ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

zero point

์Šค์ผ€์ผ๋ง ์ธ์ž \(S\)๋ฅผ ๊ฒฐ์ •ํ•˜๋ฉด, \(r_{\mathrm{min}}\)๊ณผ \(q_{\mathrm{min}}\) ์‚ฌ์ด์˜ ๊ด€๊ณ„๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ œ๋กœ ํฌ์ธํŠธ \(Z\)๋ฅผ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Question 5.2 (1 pts)

๋‹ค์Œ ํ…์ŠคํŠธ ์…€์—์„œ ์˜ฌ๋ฐ”๋ฅธ ๋‹ต์„ ์„ ํƒํ•˜๊ณ  ์ž˜๋ชป๋œ ๋‹ต์„ ์‚ญ์ œํ•ด์ฃผ์„ธ์š”.

\(Z = \mathrm{int}(\mathrm{round}(r_{\mathrm{min}} / S - q_{\mathrm{min}}))\)

✅\(Z = \mathrm{int}(\mathrm{round}(q_{\mathrm{min}} - r_{\mathrm{min}} / S))\)

\(Z = q_{\mathrm{min}} - r_{\mathrm{min}} / S\)

\(Z = r_{\mathrm{min}} / S - q_{\mathrm{min}}\)

Question 5.3 (8 pts)

Complete the function below, which calculates the scale \(S\) and zero point \(Z\) from a floating-point tensor \(r\).

def get_quantization_scale_and_zero_point(fp_tensor, bitwidth):
    """
    get quantization scale for single tensor
    :param fp_tensor: [torch.(cuda.)Tensor] floating tensor to be quantized
    :param bitwidth: [int] quantization bit width
    :return:
        [float] scale
        [int] zero_point
    """
    quantized_min, quantized_max = get_quantized_range(bitwidth)
    fp_max = fp_tensor.max().item()
    fp_min = fp_tensor.min().item()

    ############### YOUR CODE STARTS HERE ###############
    # Calculate scale
    scale = (fp_max - fp_min) / (quantized_max - quantized_min)
    # Calculate zero_point
    zero_point = quantized_min - round(fp_min / scale)
    ############### YOUR CODE ENDS HERE #################

    # clip the zero_point to fall in [quantized_min, quantized_max]
    if zero_point < quantized_min:
        zero_point = quantized_min
    elif zero_point > quantized_max:
        zero_point = quantized_max
    else: # convert from float to int using round()
        zero_point = round(zero_point)
    return scale, int(zero_point)

์ด์ œ Question 4์˜ linear_quantize()์™€ Question 5์˜ get_quantization_scale_and_zero_point()์„ ํ•˜๋‚˜์˜ ํ•จ์ˆ˜๋กœ ๋ž˜ํ•‘ํ•ฉ๋‹ˆ๋‹ค.

def linear_quantize_feature(fp_tensor, bitwidth):
    """
    linear quantization for feature tensor
    :param fp_tensor: [torch.(cuda.)Tensor] floating feature to be quantized
    :param bitwidth: [int] quantization bit width
    :return:
        [torch.(cuda.)Tensor] quantized tensor
        [float] scale tensor
        [int] zero point
    """
    scale, zero_point = get_quantization_scale_and_zero_point(fp_tensor, bitwidth)
    quantized_tensor = linear_quantize(fp_tensor, bitwidth, scale, zero_point)
    return quantized_tensor, scale, zero_point
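As an illustration (not part of the lab; the tensor x here is arbitrary), we can quantize a random feature tensor with this wrapper and check the reconstruction error after dequantizing with \(r \approx S(q - Z)\); the maximum error should be roughly on the order of \(S/2\):

x = torch.randn(64)
q, scale, zero_point = linear_quantize_feature(x, bitwidth=8)
x_hat = (q.to(torch.float) - zero_point) * scale   # dequantize: r = S(q - Z)
print(scale, (x - x_hat).abs().max().item())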

Special case: linear quantization on weight tensor

Let's first look at the distribution of the weight values.

def plot_weight_distribution(model, bitwidth=32):
    # bins = (1 << bitwidth) if bitwidth <= 8 else 256
    if bitwidth <= 8:
        qmin, qmax = get_quantized_range(bitwidth)
        bins = np.arange(qmin, qmax + 2)
        align = 'left'
    else:
        bins = 256
        align = 'mid'
    fig, axes = plt.subplots(3,3, figsize=(10, 6))
    axes = axes.ravel()
    plot_index = 0
    for name, param in model.named_parameters():
        if param.dim() > 1:
            ax = axes[plot_index]
            ax.hist(param.detach().view(-1).cpu(), bins=bins, density=True,
                    align=align, color = 'blue', alpha = 0.5,
                    edgecolor='black' if bitwidth <= 4 else None)
            if bitwidth <= 4:
                quantized_min, quantized_max = get_quantized_range(bitwidth)
                ax.set_xticks(np.arange(start=quantized_min, stop=quantized_max+1))
            ax.set_xlabel(name)
            ax.set_ylabel('density')
            plot_index += 1
    fig.suptitle(f'Histogram of Weights (bitwidth={bitwidth} bits)')
    fig.tight_layout()
    fig.subplots_adjust(top=0.925)
    plt.show()

recover_model()
plot_weight_distribution(model)

์œ„์˜ ํžˆ์Šคํ† ๊ทธ๋žจ์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋“ฏ์ด, ๊ฐ€์ค‘์น˜ ๊ฐ’์˜ ๋ถ„ํฌ๋Š” (์ด ๊ฒฝ์šฐ์—๋Š” classifier๋ฅผ ์ œ์™ธํ•˜๊ณ ) ๊ฑฐ์˜ 0์„ ์ค‘์‹ฌ์œผ๋กœ ๋Œ€์นญ์ ์ž…๋‹ˆ๋‹ค . ๋”ฐ๋ผ์„œ ๊ฐ€์ค‘์น˜๋ฅผ ์–‘์žํ™”ํ•  ๋•Œ ๋ณดํ†ต ์ œ๋กœ ํฌ์ธํŠธ \(Z=0\)์œผ๋กœ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.

From \(r = S(q-Z)\), we then have

\(r_{\mathrm{max}} = S \cdot q_{\mathrm{max}}\)

\(S = r_{\mathrm{max}} / q_{\mathrm{max}}\)

๊ฐ€์ค‘์น˜ ๊ฐ’์˜ ์ตœ๋Œ€ ์ ˆ๋Œ“๊ฐ’์„ \(r_{\mathrm{max}}\)๋กœ ์ด์šฉํ•ฉ๋‹ˆ๋‹ค.

def get_quantization_scale_for_weight(weight, bitwidth):
    """
    get quantization scale for single tensor of weight
    :param weight: [torch.(cuda.)Tensor] floating weight to be quantized
    :param bitwidth: [integer] quantization bit width
    :return:
        [floating scalar] scale
    """
    # we just assume values in weight are symmetric
    # we also always make zero_point 0 for weight
    fp_max = max(weight.abs().max().item(), 5e-7)
    _, quantized_max = get_quantized_range(bitwidth)
    return fp_max / quantized_max

Per-channel Linear Quantization

2D convolution์˜ ๊ฒฝ์šฐ, ๊ฐ€์ค‘์น˜ ํ…์„œ๋Š” (num_output_channels, num_input_channels, kernel_height, kernel_width) ๋ชจ์–‘์˜ 4์ฐจ์› ํ…์„œ์ž…๋‹ˆ๋‹ค.

๋งŽ์€ ์‹คํ—˜๋“ค์„ ํ†ตํ•ด, ์„œ๋กœ ๋‹ค๋ฅธ ์ถœ๋ ฅ ์ฑ„๋„์— ๋Œ€ํ•ด ์„œ๋กœ ๋‹ค๋ฅธ ์Šค์ผ€์ผ๋ง ์ธ์ž \(S\)์™€ ์ œ๋กœ ํฌ์ธํŠธ \(Z\)๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ๋” ๋‚˜์€ ์„ฑ๋Šฅ์„ ๋ฐœํœ˜ํ•œ๋‹ค๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๊ฐ ์ถœ๋ ฅ ์ฑ„๋„์˜ ์„œ๋ธŒํ…์„œ์— ๋Œ€ํ•œ ์Šค์ผ€์ผ๋ง ์ธ์ž \(S\)์™€ ์ œ๋กœ ํฌ์ธํŠธ \(Z\)๋ฅผ ๋…๋ฆฝ์ ์œผ๋กœ ์ •ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

def linear_quantize_weight_per_channel(tensor, bitwidth):
    """
    linear quantization for weight tensor
        using different scales and zero_points for different output channels
    :param tensor: [torch.(cuda.)Tensor] floating weight to be quantized
    :param bitwidth: [int] quantization bit width
    :return:
        [torch.(cuda.)Tensor] quantized tensor
        [torch.(cuda.)Tensor] scale tensor
        [int] zero point (which is always 0)
    """
    dim_output_channels = 0
    num_output_channels = tensor.shape[dim_output_channels]
    scale = torch.zeros(num_output_channels, device=tensor.device)
    for oc in range(num_output_channels):
        _subtensor = tensor.select(dim_output_channels, oc)
        _scale = get_quantization_scale_for_weight(_subtensor, bitwidth)
        scale[oc] = _scale
    scale_shape = [1] * tensor.dim()
    scale_shape[dim_output_channels] = -1
    scale = scale.view(scale_shape)
    quantized_tensor = linear_quantize(tensor, bitwidth, scale, zero_point=0)
    return quantized_tensor, scale, 0
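A minimal usage sketch (illustration only; the dummy weight w is made up) showing what the per-channel quantizer returns for a convolution weight:

w = torch.randn(16, 3, 3, 3)   # (output channels, input channels, kernel height, kernel width)
qw, w_scale, w_zero = linear_quantize_weight_per_channel(w, bitwidth=8)
print(qw.dtype, w_scale.shape, w_zero)   # torch.int8 torch.Size([16, 1, 1, 1]) 0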

A Quick Peek at Linear Quantization on Weights

Now let's take a look at the weight distributions and model sizes when applying linear quantization to the weights with different bitwidths.

@torch.no_grad()
def peek_linear_quantization():
    for bitwidth in [4, 2]:
        for name, param in model.named_parameters():
            if param.dim() > 1:
                quantized_param, scale, zero_point = \
                    linear_quantize_weight_per_channel(param, bitwidth)
                param.copy_(quantized_param)
        plot_weight_distribution(model, bitwidth)
        recover_model()

peek_linear_quantization()

Quantized Inference

์–‘์žํ™” ํ›„, convolution ๋ฐ fully-connected layer์˜ ์ถ”๋ก ๋„ ๋ณ€๊ฒฝ๋ฉ๋‹ˆ๋‹ค.

\(r = S(q-Z)\)๋ฅผ ์ƒ๊ธฐํ•ด ๋ณด๋ฉด, ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

\(r_{\mathrm{input}} = S_{\mathrm{input}}(q_{\mathrm{input}}-Z_{\mathrm{input}})\)

\(r_{\mathrm{weight}} = S_{\mathrm{weight}}(q_{\mathrm{weight}}-Z_{\mathrm{weight}})\)

\(r_{\mathrm{bias}} = S_{\mathrm{bias}}(q_{\mathrm{bias}}-Z_{\mathrm{bias}})\)

Since \(Z_{\mathrm{weight}}=0\), we have \(r_{\mathrm{weight}} = S_{\mathrm{weight}}q_{\mathrm{weight}}\).

The floating-point convolution can then be written as

\(r_{\mathrm{output}} = \mathrm{CONV}[r_{\mathrm{input}}, r_{\mathrm{weight}}] + r_{\mathrm{bias}}\) \(\;\;\;\;\;\;\;\;= \mathrm{CONV}[S_{\mathrm{input}}(q_{\mathrm{input}}-Z_{\mathrm{input}}), S_{\mathrm{weight}}q_{\mathrm{weight}}] + S_{\mathrm{bias}}(q_{\mathrm{bias}}-Z_{\mathrm{bias}})\) \(\;\;\;\;\;\;\;\;= \mathrm{CONV}[q_{\mathrm{input}}-Z_{\mathrm{input}}, q_{\mathrm{weight}}]\cdot (S_{\mathrm{input}} \cdot S_{\mathrm{weight}}) + S_{\mathrm{bias}}(q_{\mathrm{bias}}-Z_{\mathrm{bias}})\)

๊ณ„์‚ฐ์„ ๋” ๊ฐ„๋‹จํ•˜๊ฒŒ ํ•˜๊ธฐ ์œ„ํ•ด

\(Z_{\mathrm{bias}} = 0\)

\(S_{\mathrm{bias}} = S_{\mathrm{input}} \cdot S_{\mathrm{weight}}\)

so that

\(r_{\mathrm{output}} = (\mathrm{CONV}[q_{\mathrm{input}}-Z_{\mathrm{input}}, q_{\mathrm{weight}}] + q_{\mathrm{bias}})\cdot (S_{\mathrm{input}} \cdot S_{\mathrm{weight}})\) \(\;\;\;\;\;\;\;\;= (\mathrm{CONV}[q_{\mathrm{input}}, q_{\mathrm{weight}}] - \mathrm{CONV}[Z_{\mathrm{input}}, q_{\mathrm{weight}}] + q_{\mathrm{bias}})\cdot (S_{\mathrm{input}}S_{\mathrm{weight}})\)

Since also

\(r_{\mathrm{output}} = S_{\mathrm{output}}(q_{\mathrm{output}}-Z_{\mathrm{output}})\)

it follows that

\(S_{\mathrm{output}}(q_{\mathrm{output}}-Z_{\mathrm{output}}) = (\mathrm{CONV}[q_{\mathrm{input}}, q_{\mathrm{weight}}] - \mathrm{CONV}[Z_{\mathrm{input}}, q_{\mathrm{weight}}] + q_{\mathrm{bias}})\cdot (S_{\mathrm{input}} S_{\mathrm{weight}})\)

๋”ฐ๋ผ์„œ

\(q_{\mathrm{output}} = (\mathrm{CONV}[q_{\mathrm{input}}, q_{\mathrm{weight}}] - \mathrm{CONV}[Z_{\mathrm{input}}, q_{\mathrm{weight}}] + q_{\mathrm{bias}})\cdot (S_{\mathrm{input}}S_{\mathrm{weight}} / S_{\mathrm{output}}) + Z_{\mathrm{output}}\)

Since \(Z_{\mathrm{input}}\), \(q_{\mathrm{weight}}\), and \(q_{\mathrm{bias}}\) are determined before inference, we can define

\(Q_{\mathrm{bias}} = q_{\mathrm{bias}} - \mathrm{CONV}[Z_{\mathrm{input}}, q_{\mathrm{weight}}]\)

so that

\(q_{\mathrm{output}} = (\mathrm{CONV}[q_{\mathrm{input}}, q_{\mathrm{weight}}] + Q_{\mathrm{bias}})\cdot (S_{\mathrm{input}} \cdot S_{\mathrm{weight}} / S_{\mathrm{output}}) + Z_{\mathrm{output}}\)

Question 6 (5 pts)

Complete the function for linearly quantizing the bias.

Hint:

From the inference derivation above, we obtained the following equations.

\(Z_{\mathrm{bias}} = 0\)

\(S_{\mathrm{bias}} = S_{\mathrm{input}} \cdot S_{\mathrm{weight}}\)

def linear_quantize_bias_per_output_channel(bias, weight_scale, input_scale):
    """
    linear quantization for single bias tensor
        quantized_bias = fp_bias / bias_scale
    :param bias: [torch.FloatTensor] bias weight to be quantized
    :param weight_scale: [float or torch.FloatTensor] weight scale tensor
    :param input_scale: [float] input scale
    :return:
        [torch.IntTensor] quantized bias tensor
    """
    assert(bias.dim() == 1)
    assert(bias.dtype == torch.float)
    assert(isinstance(input_scale, float))
    if isinstance(weight_scale, torch.Tensor):
        assert(weight_scale.dtype == torch.float)
        weight_scale = weight_scale.view(-1)
        assert(bias.numel() == weight_scale.numel())

    ############### YOUR CODE STARTS HERE ###############
    bias_scale = weight_scale * input_scale
    ############### YOUR CODE ENDS HERE #################

    quantized_bias = linear_quantize(bias, 32, bias_scale,
                                     zero_point=0, dtype=torch.int32)
    return quantized_bias, bias_scale, 0

Quantized Fully-Connected Layer

์–‘์žํ™”๋œ fully-connected layer์˜ ๊ฒฝ์šฐ, \(Q_{\mathrm{bias}}\)๋ฅผ ๋จผ์ € ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค. \(Q_{\mathrm{bias}} = q_{\mathrm{bias}} - \mathrm{Linear}[Z_{\mathrm{input}}, q_{\mathrm{weight}}]\)๋ฅผ ๊ธฐ์–ตํ•˜์„ธ์š”.

def shift_quantized_linear_bias(quantized_bias, quantized_weight, input_zero_point):
    """
    shift quantized bias to incorporate input_zero_point for nn.Linear
        shifted_quantized_bias = quantized_bias - Linear(input_zero_point, quantized_weight)
    :param quantized_bias: [torch.IntTensor] quantized bias (torch.int32)
    :param quantized_weight: [torch.CharTensor] quantized weight (torch.int8)
    :param input_zero_point: [int] input zero point
    :return:
        [torch.IntTensor] shifted quantized bias tensor
    """
    assert(quantized_bias.dtype == torch.int32)
    assert(isinstance(input_zero_point, int))
    return quantized_bias - quantized_weight.sum(1).to(torch.int32) * input_zero_point
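Why the one-liner above works: when every element of the input equals \(Z_{\mathrm{input}}\), the matrix-vector product reduces to a row sum,

\(\mathrm{Linear}[Z_{\mathrm{input}}, q_{\mathrm{weight}}]_{j} = \sum_{k} q_{\mathrm{weight},jk} \cdot Z_{\mathrm{input}} = Z_{\mathrm{input}} \sum_{k} q_{\mathrm{weight},jk}\)

which is exactly quantized_weight.sum(1) * input_zero_point in the code.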

Question 7 (15 pts)

์•„๋ž˜์˜ ์–‘์žํ™”๋œ fully-connected layer inference function๋ฅผ ์™„์„ฑํ•˜์„ธ์š”.

Hint:

\(q_{\mathrm{output}} = (\mathrm{Linear}[q_{\mathrm{input}}, q_{\mathrm{weight}}] + Q_{\mathrm{bias}})\cdot (S_{\mathrm{input}} S_{\mathrm{weight}} / S_{\mathrm{output}}) + Z_{\mathrm{output}}\)

def quantized_linear(input, weight, bias, feature_bitwidth, weight_bitwidth,
                     input_zero_point, output_zero_point,
                     input_scale, weight_scale, output_scale):
    """
    quantized fully-connected layer
    :param input: [torch.CharTensor] quantized input (torch.int8)
    :param weight: [torch.CharTensor] quantized weight (torch.int8)
    :param bias: [torch.IntTensor] shifted quantized bias or None (torch.int32)
    :param feature_bitwidth: [int] quantization bit width of input and output
    :param weight_bitwidth: [int] quantization bit width of weight
    :param input_zero_point: [int] input zero point
    :param output_zero_point: [int] output zero point
    :param input_scale: [float] input feature scale
    :param weight_scale: [torch.FloatTensor] weight per-channel scale
    :param output_scale: [float] output feature scale
    :return:
        [torch.CharIntTensor] quantized output feature (torch.int8)
    """
    assert(input.dtype == torch.int8)
    assert(weight.dtype == input.dtype)
    assert(bias is None or bias.dtype == torch.int32)
    assert(isinstance(input_zero_point, int))
    assert(isinstance(output_zero_point, int))
    assert(isinstance(input_scale, float))
    assert(isinstance(output_scale, float))
    assert(weight_scale.dtype == torch.float)

    # Step 1: integer-based fully-connected (8-bit multiplication with 32-bit accumulation)
    if 'cpu' in input.device.type:
        # use 32-b MAC for simplicity
        output = torch.nn.functional.linear(input.to(torch.int32), weight.to(torch.int32), bias)
    else:
        # current version pytorch does not yet support integer-based linear() on GPUs
        output = torch.nn.functional.linear(input.float(), weight.float(), bias.float())

    ############### YOUR CODE STARTS HERE ###############
    # Step 2: scale the output
    #         hint: 1. scales are floating numbers, we need to convert output to float as well
    #               2. the shape of weight scale is [oc, 1, 1, 1] while the shape of output is [batch_size, oc]
    real_scale = input_scale * weight_scale.view(-1) / output_scale
    output = output.float() * real_scale

    # Step 3: Shift output by output_zero_point
    output += output_zero_point
    ############### YOUR CODE ENDS HERE #################

    # Make sure all value lies in the bitwidth-bit range
    output = output.round().clamp(*get_quantized_range(feature_bitwidth)).to(torch.int8)
    return output

Let's verify the functionality of the defined quantized fully-connected layer.

test_quantized_fc()
* Test quantized_fc()
    target bitwidth: 2 bits
      batch size: 4
      input channels: 8
      output channels: 8
* Test passed.

Quantized Convolution

์–‘์žํ™”๋œ ์ปจ๋ณผ๋ฃจ์…˜ ๋ ˆ์ด์–ด์˜ ๊ฒฝ์šฐ, ๋จผ์ € \(Q_{\mathrm{bias}}\)๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค. \(Q_{\mathrm{bias}} = q_{\mathrm{bias}} - \mathrm{CONV}[Z_{\mathrm{input}}, q_{\mathrm{weight}}]\)๋ฅผ ๊ธฐ์–ตํ•˜์„ธ์š”.

def shift_quantized_conv2d_bias(quantized_bias, quantized_weight, input_zero_point):
    """
    shift quantized bias to incorporate input_zero_point for nn.Conv2d
        shifted_quantized_bias = quantized_bias - Conv(input_zero_point, quantized_weight)
    :param quantized_bias: [torch.IntTensor] quantized bias (torch.int32)
    :param quantized_weight: [torch.CharTensor] quantized weight (torch.int8)
    :param input_zero_point: [int] input zero point
    :return:
        [torch.IntTensor] shifted quantized bias tensor
    """
    assert(quantized_bias.dtype == torch.int32)
    assert(isinstance(input_zero_point, int))
    return quantized_bias - quantized_weight.sum((1,2,3)).to(torch.int32) * input_zero_point
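Similarly, when the input (including the padding, which quantized_conv2d() below fills with \(Z_{\mathrm{input}}\)) is a constant tensor equal to \(Z_{\mathrm{input}}\), the convolution output for each output channel \(c\) is

\(\mathrm{CONV}[Z_{\mathrm{input}}, q_{\mathrm{weight}}]_{c} = Z_{\mathrm{input}} \sum_{i,h,w} q_{\mathrm{weight},c,i,h,w}\)

which is why the code uses quantized_weight.sum((1,2,3)) * input_zero_point.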

Question 8 (15 pts)

์•„๋ž˜์˜ quantized convolution function์„ ์™„์„ฑํ•˜์„ธ์š”.

Hint: > \(q_{\mathrm{output}} = (\mathrm{CONV}[q_{\mathrm{input}}, q_{\mathrm{weight}}] + Q_{\mathrm{bias}}) \cdot (S_{\mathrm{input}}S_{\mathrm{weight}} / S_{\mathrm{output}}) + Z_{\mathrm{output}}\)

def quantized_conv2d(input, weight, bias, feature_bitwidth, weight_bitwidth,
                     input_zero_point, output_zero_point,
                     input_scale, weight_scale, output_scale,
                     stride, padding, dilation, groups):
    """
    quantized 2d convolution
    :param input: [torch.CharTensor] quantized input (torch.int8)
    :param weight: [torch.CharTensor] quantized weight (torch.int8)
    :param bias: [torch.IntTensor] shifted quantized bias or None (torch.int32)
    :param feature_bitwidth: [int] quantization bit width of input and output
    :param weight_bitwidth: [int] quantization bit width of weight
    :param input_zero_point: [int] input zero point
    :param output_zero_point: [int] output zero point
    :param input_scale: [float] input feature scale
    :param weight_scale: [torch.FloatTensor] weight per-channel scale
    :param output_scale: [float] output feature scale
    :return:
        [torch.(cuda.)CharTensor] quantized output feature
    """
    assert(len(padding) == 4)
    assert(input.dtype == torch.int8)
    assert(weight.dtype == input.dtype)
    assert(bias is None or bias.dtype == torch.int32)
    assert(isinstance(input_zero_point, int))
    assert(isinstance(output_zero_point, int))
    assert(isinstance(input_scale, float))
    assert(isinstance(output_scale, float))
    assert(weight_scale.dtype == torch.float)

    # Step 1: calculate integer-based 2d convolution (8-bit multiplication with 32-bit accumulation)
    input = torch.nn.functional.pad(input, padding, 'constant', input_zero_point)
    if 'cpu' in input.device.type:
        # use 32-b MAC for simplicity
        output = torch.nn.functional.conv2d(input.to(torch.int32), weight.to(torch.int32), None, stride, 0, dilation, groups)
    else:
        # current version pytorch does not yet support integer-based conv2d() on GPUs
        output = torch.nn.functional.conv2d(input.float(), weight.float(), None, stride, 0, dilation, groups)
        output = output.round().to(torch.int32)
    if bias is not None:
        output = output + bias.view(1, -1, 1, 1)

    ############### YOUR CODE STARTS HERE ###############
    # hint: this code block should be the very similar to quantized_linear()

    # Step 2: scale the output
    #         hint: 1. scales are floating numbers, we need to convert output to float as well
    #               2. the shape of weight scale is [oc, 1, 1, 1] while the shape of output is [batch_size, oc, height, width]
    real_scale = input_scale * weight_scale.view(-1) / output_scale
    output = output.float() * real_scale.unsqueeze(1).unsqueeze(2)

    # Step 3: shift output by output_zero_point
    #         hint: one line of code
    output += output_zero_point
    ############### YOUR CODE ENDS HERE #################

    # Make sure all value lies in the bitwidth-bit range
    output = output.round().clamp(*get_quantized_range(feature_bitwidth)).to(torch.int8)
    return output

Question 9 (10 pts)

Finally, let's put everything together and perform post-training int8 quantization of the model. We convert the model's convolution layers and linear layers into their quantized versions one by one.

  1. First, we fuse the BatchNorm layers into the preceding convolutional layers, which is a standard practice before quantization. Fusing BatchNorm reduces the extra multiplications during inference.

We will also verify that the fused model, model_fused, has the same accuracy as the original model (BN fusion is an equivalent transform that does not change the network's function).

def fuse_conv_bn(conv, bn):
    # modified from https://mmcv.readthedocs.io/en/latest/_modules/mmcv/cnn/utils/fuse_conv_bn.html
    assert conv.bias is None

    factor = bn.weight.data / torch.sqrt(bn.running_var.data + bn.eps)
    conv.weight.data = conv.weight.data * factor.reshape(-1, 1, 1, 1)
    conv.bias = nn.Parameter(- bn.running_mean.data * factor + bn.bias.data)

    return conv

print('Before conv-bn fusion: backbone length', len(model.backbone))
#  fuse the batchnorm into conv layers
recover_model()
model_fused = copy.deepcopy(model)
fused_backbone = []
ptr = 0
while ptr < len(model_fused.backbone):
    if isinstance(model_fused.backbone[ptr], nn.Conv2d) and \
        isinstance(model_fused.backbone[ptr + 1], nn.BatchNorm2d):
        fused_backbone.append(fuse_conv_bn(
            model_fused.backbone[ptr], model_fused.backbone[ptr+ 1]))
        ptr += 2
    else:
        fused_backbone.append(model_fused.backbone[ptr])
        ptr += 1
model_fused.backbone = nn.Sequential(*fused_backbone)

print('After conv-bn fusion: backbone length', len(model_fused.backbone))
# sanity check, no BN anymore
for m in model_fused.modules():
    assert not isinstance(m, nn.BatchNorm2d)

#  the accuracy will remain the same after fusion
fused_acc = evaluate(model_fused, dataloader['test'])
print(f'Accuracy of the fused model={fused_acc:.2f}%')
Before conv-bn fusion: backbone length 29
After conv-bn fusion: backbone length 21
Accuracy of the fused model=92.95%
  1. ๊ฐ ํŠน์ง• ๋งต์˜ ๋ฒ”์œ„๋ฅผ ์–ป๊ธฐ ์œ„ํ•ด ์ผ๋ถ€ ์ƒ˜ํ”Œ ๋ฐ์ดํ„ฐ๋กœ ๋ชจ๋ธ์„ ์‹คํ–‰ํ•˜์—ฌ ํŠน์ง• ๋งต์˜ ๋ฒ”์œ„๋ฅผ ์–ป๊ณ , ํ•ด๋‹น ์Šค์ผ€์ผ๋ง ํŒฉํ„ฐ์™€ ์ œ๋กœ ํฌ์ธํŠธ๋ฅผ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
# add hook to record the min max value of the activation
input_activation = {}
output_activation = {}

def add_range_recoder_hook(model):
    import functools
    def _record_range(self, x, y, module_name):
        x = x[0]
        input_activation[module_name] = x.detach()
        output_activation[module_name] = y.detach()

    all_hooks = []
    for name, m in model.named_modules():
        if isinstance(m, (nn.Conv2d, nn.Linear, nn.ReLU)):
            all_hooks.append(m.register_forward_hook(
                functools.partial(_record_range, module_name=name)))
    return all_hooks

hooks = add_range_recoder_hook(model_fused)
sample_data = iter(dataloader['train']).__next__()[0]
model_fused(sample_data.cuda())

# remove hooks
for h in hooks:
    h.remove()
  3. Finally, let's quantize the model. We convert the model with the following mapping.
nn.Conv2d: QuantizedConv2d,
nn.Linear: QuantizedLinear,
# the following two are just wrappers, as current
# torch modules do not support int8 data format;
# we will temporarily convert them to fp32 for computation
nn.MaxPool2d: QuantizedMaxPool2d,
nn.AvgPool2d: QuantizedAvgPool2d,
class QuantizedConv2d(nn.Module):
    def __init__(self, weight, bias,
                 input_zero_point, output_zero_point,
                 input_scale, weight_scale, output_scale,
                 stride, padding, dilation, groups,
                 feature_bitwidth=8, weight_bitwidth=8):
        super().__init__()
        # current version Pytorch does not support IntTensor as nn.Parameter
        self.register_buffer('weight', weight)
        self.register_buffer('bias', bias)

        self.input_zero_point = input_zero_point
        self.output_zero_point = output_zero_point

        self.input_scale = input_scale
        self.register_buffer('weight_scale', weight_scale)
        self.output_scale = output_scale

        self.stride = stride
        self.padding = (padding[1], padding[1], padding[0], padding[0])
        self.dilation = dilation
        self.groups = groups

        self.feature_bitwidth = feature_bitwidth
        self.weight_bitwidth = weight_bitwidth


    def forward(self, x):
        return quantized_conv2d(
            x, self.weight, self.bias,
            self.feature_bitwidth, self.weight_bitwidth,
            self.input_zero_point, self.output_zero_point,
            self.input_scale, self.weight_scale, self.output_scale,
            self.stride, self.padding, self.dilation, self.groups
            )

class QuantizedLinear(nn.Module):
    def __init__(self, weight, bias,
                 input_zero_point, output_zero_point,
                 input_scale, weight_scale, output_scale,
                 feature_bitwidth=8, weight_bitwidth=8):
        super().__init__()
        # current version Pytorch does not support IntTensor as nn.Parameter
        self.register_buffer('weight', weight)
        self.register_buffer('bias', bias)

        self.input_zero_point = input_zero_point
        self.output_zero_point = output_zero_point

        self.input_scale = input_scale
        self.register_buffer('weight_scale', weight_scale)
        self.output_scale = output_scale

        self.feature_bitwidth = feature_bitwidth
        self.weight_bitwidth = weight_bitwidth

    def forward(self, x):
        return quantized_linear(
            x, self.weight, self.bias,
            self.feature_bitwidth, self.weight_bitwidth,
            self.input_zero_point, self.output_zero_point,
            self.input_scale, self.weight_scale, self.output_scale
            )

class QuantizedMaxPool2d(nn.MaxPool2d):
    def forward(self, x):
        # current version PyTorch does not support integer-based MaxPool
        return super().forward(x.float()).to(torch.int8)

class QuantizedAvgPool2d(nn.AvgPool2d):
    def forward(self, x):
        # current version PyTorch does not support integer-based AvgPool
        return super().forward(x.float()).to(torch.int8)

# we use int8 quantization, which is quite popular
feature_bitwidth = weight_bitwidth = 8
quantized_model = copy.deepcopy(model_fused)
quantized_backbone = []
ptr = 0
while ptr < len(quantized_model.backbone):
    if isinstance(quantized_model.backbone[ptr], nn.Conv2d) and \
        isinstance(quantized_model.backbone[ptr + 1], nn.ReLU):
        conv = quantized_model.backbone[ptr]
        conv_name = f'backbone.{ptr}'
        relu = quantized_model.backbone[ptr + 1]
        relu_name = f'backbone.{ptr + 1}'

        input_scale, input_zero_point = \
            get_quantization_scale_and_zero_point(
                input_activation[conv_name], feature_bitwidth)

        output_scale, output_zero_point = \
            get_quantization_scale_and_zero_point(
                output_activation[relu_name], feature_bitwidth)

        quantized_weight, weight_scale, weight_zero_point = \
            linear_quantize_weight_per_channel(conv.weight.data, weight_bitwidth)
        quantized_bias, bias_scale, bias_zero_point = \
            linear_quantize_bias_per_output_channel(
                conv.bias.data, weight_scale, input_scale)
        shifted_quantized_bias = \
            shift_quantized_conv2d_bias(quantized_bias, quantized_weight,
                                        input_zero_point)

        quantized_conv = QuantizedConv2d(
            quantized_weight, shifted_quantized_bias,
            input_zero_point, output_zero_point,
            input_scale, weight_scale, output_scale,
            conv.stride, conv.padding, conv.dilation, conv.groups,
            feature_bitwidth=feature_bitwidth, weight_bitwidth=weight_bitwidth
        )

        quantized_backbone.append(quantized_conv)
        ptr += 2
    elif isinstance(quantized_model.backbone[ptr], nn.MaxPool2d):
        quantized_backbone.append(QuantizedMaxPool2d(
            kernel_size=quantized_model.backbone[ptr].kernel_size,
            stride=quantized_model.backbone[ptr].stride
            ))
        ptr += 1
    elif isinstance(quantized_model.backbone[ptr], nn.AvgPool2d):
        quantized_backbone.append(QuantizedAvgPool2d(
            kernel_size=quantized_model.backbone[ptr].kernel_size,
            stride=quantized_model.backbone[ptr].stride
            ))
        ptr += 1
    else:
        raise NotImplementedError(type(quantized_model.backbone[ptr]))  # should not happen
quantized_model.backbone = nn.Sequential(*quantized_backbone)

# finally, quantize the classifier
fc_name = 'classifier'
fc = model.classifier
input_scale, input_zero_point = \
    get_quantization_scale_and_zero_point(
        input_activation[fc_name], feature_bitwidth)

output_scale, output_zero_point = \
    get_quantization_scale_and_zero_point(
        output_activation[fc_name], feature_bitwidth)

quantized_weight, weight_scale, weight_zero_point = \
    linear_quantize_weight_per_channel(fc.weight.data, weight_bitwidth)
quantized_bias, bias_scale, bias_zero_point = \
    linear_quantize_bias_per_output_channel(
        fc.bias.data, weight_scale, input_scale)
shifted_quantized_bias = \
    shift_quantized_linear_bias(quantized_bias, quantized_weight,
                                input_zero_point)

quantized_model.classifier = QuantizedLinear(
    quantized_weight, shifted_quantized_bias,
    input_zero_point, output_zero_point,
    input_scale, weight_scale, output_scale,
    feature_bitwidth=feature_bitwidth, weight_bitwidth=weight_bitwidth
)

์–‘์žํ™” ๊ณผ์ •์ด ์™„๋ฃŒ๋˜์—ˆ์Šต๋‹ˆ๋‹ค! ๋ชจ๋ธ ์•„ํ‚คํ…์ฒ˜๋ฅผ ์ธ์‡„ํ•˜๊ณ  ์‹œ๊ฐํ™”ํ•˜๋ฉฐ ์–‘์žํ™”๋œ ๋ชจ๋ธ์˜ ์ •ํ™•์„ฑ๋„ ๊ฒ€์ฆํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

Question 9.1 (5 pts)

์–‘์žํ™”๋œ ๋ชจ๋ธ์„ ์‹คํ–‰ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” (0, 1) ๋ฒ”์œ„์˜ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๋ฅผ (-128, 127) ๋ฒ”์œ„์˜ int8 ๋ฒ”์œ„๋กœ ๋งคํ•‘ํ•˜๋Š” ์ถ”๊ฐ€์ ์ธ ์ „์ฒ˜๋ฆฌ๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ์ด ์ „์ฒ˜๋ฆฌ๋ฅผ ์ง„ํ–‰ํ•˜๋Š” ์•„๋ž˜ ์ฝ”๋“œ๋ฅผ ์™„์„ฑํ•˜์„ธ์š”.

Hint: ์–‘์žํ™”๋œ ๋ชจ๋ธ์€ fp32 ๋ชจ๋ธ๊ณผ ๊ฑฐ์˜ ๋™์ผํ•œ ์ •ํ™•๋„๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

print(quantized_model)

def extra_preprocess(x):
    # hint: you need to convert the original fp32 input of range (0, 1)
    #  into int8 format of range (-128, 127)
    ############### YOUR CODE STARTS HERE ###############
    x_scaled = x * 255
    x_shifted = x_scaled - 128
    return x_shifted.clamp(-128, 127).to(torch.int8)
    ############### YOUR CODE ENDS HERE #################

int8_model_accuracy = evaluate(quantized_model, dataloader['test'],
                               extra_preprocess=[extra_preprocess])
print(f"int8 model has accuracy={int8_model_accuracy:.2f}%")
VGG(
  (backbone): Sequential(
    (0): QuantizedConv2d()
    (1): QuantizedConv2d()
    (2): QuantizedMaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): QuantizedConv2d()
    (4): QuantizedConv2d()
    (5): QuantizedMaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): QuantizedConv2d()
    (7): QuantizedConv2d()
    (8): QuantizedMaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (9): QuantizedConv2d()
    (10): QuantizedConv2d()
    (11): QuantizedMaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (12): QuantizedAvgPool2d(kernel_size=2, stride=2, padding=0)
  )
  (classifier): QuantizedLinear()
)
int8 model has accuracy=92.90%
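Note that extra_preprocess() is itself a linear quantization of the input: up to rounding, \(q = r \cdot 255 - 128\), i.e. \(r = S(q - Z)\) with \(S = 1/255\) and \(Z = -128\), which maps the (0, 1) input range onto the full int8 range (-128, 127).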

Question 9.2 (Bonus Question; 5 pts)

Explain why there are no ReLU layers in the linear quantized model.

Your Answer:

์„ ํ˜•(Linear) ์–‘์žํ™” ๋ชจ๋ธ์—์„œ ReLU(Rectified Linear Unit) ์ธต์ด ์—†๋Š” ์ด์œ ๋Š” ์ฃผ๋กœ ์–‘์žํ™” ๊ณผ์ •์—์„œ์˜ ๋ฐ์ดํ„ฐ ํ‘œํ˜„ ๋ฐฉ์‹๊ณผ ์—ฐ์‚ฐ์˜ ํšจ์œจ์„ฑ๊ณผ ๊ด€๋ จ์ด ์žˆ์Šต๋‹ˆ๋‹ค. ์–‘์žํ™”๋Š” ๋ชจ๋ธ์˜ ๊ฐ€์ค‘์น˜๋‚˜ ํ™œ์„ฑํ™”๋ฅผ ๊ณ ์ •๋œ ๋น„ํŠธ ๋„ˆ๋น„(์˜ˆ: 8๋น„ํŠธ)์˜ ์ •์ˆ˜๋กœ ์ œํ•œํ•˜์—ฌ ์ €์žฅํ•˜๊ณ  ๊ณ„์‚ฐํ•˜๋Š” ๊ธฐ์ˆ ์ž…๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ œํ•œ์€ ๋ชจ๋ธ์˜ ํฌ๊ธฐ๋ฅผ ์ค„์ด๊ณ , ๊ณ„์‚ฐ ์†๋„๋ฅผ ํ–ฅ์ƒ์‹œํ‚ค๋ฉฐ, ์ €์ „๋ ฅ ์žฅ์น˜์—์„œ์˜ ์‹คํ–‰์„ ์šฉ์ดํ•˜๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ด ๊ณผ์ •์—์„œ ๋ฐ์ดํ„ฐ์˜ ์ •๋ฐ€๋„๊ฐ€ ์†์‹ค๋  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์ด๋Š” ๋ชจ๋ธ ์„ฑ๋Šฅ์— ์˜ํ–ฅ์„ ๋ฏธ์น  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

ReLU ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋Š” ์ž…๋ ฅ์ด ์–‘์ˆ˜์ผ ๊ฒฝ์šฐ ๊ทธ๋Œ€๋กœ ์ถœ๋ ฅํ•˜๊ณ , ์Œ์ˆ˜์ผ ๊ฒฝ์šฐ 0์œผ๋กœ ๋งŒ๋“œ๋Š” ๊ฐ„๋‹จํ•˜๊ณ  ํšจ์œจ์ ์ธ ๋น„์„ ํ˜• ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค. ReLU๋Š” ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์—์„œ ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๋ฉฐ, ํŠนํžˆ ์€๋‹‰์ธต์—์„œ ๋น„์„ ํ˜•์„ฑ์„ ์ถ”๊ฐ€ํ•˜์—ฌ ๋ชจ๋ธ์˜ ํ‘œํ˜„๋ ฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ๋ฐ ์ค‘์š”ํ•œ ์—ญํ• ์„ ํ•ฉ๋‹ˆ๋‹ค.

์„ ํ˜• ์–‘์žํ™” ๋ชจ๋ธ์—์„œ ReLU ์ธต์ด ์—†๋Š” ์ฃผ๋œ ์ด์œ ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:

  1. ์–‘์žํ™”๋œ ๋ฐ์ดํ„ฐ์˜ ๋ฒ”์œ„ ์ œํ•œ: ์ •์ˆ˜ ์–‘์žํ™” ๊ณผ์ •์—์„œ๋Š” ๋ฐ์ดํ„ฐ๊ฐ€ ํŠน์ • ๋ฒ”์œ„ ๋‚ด์˜ ๊ฐ’์œผ๋กœ ์ œํ•œ๋ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, 8๋น„ํŠธ ์–‘์žํ™”์—์„œ๋Š” ๊ฐ’์ด -128๋ถ€ํ„ฐ 127๊นŒ์ง€์˜ ์ •์ˆ˜ ๋ฒ”์œ„๋ฅผ ๊ฐ€์ง‘๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฒ”์œ„ ๋‚ด์—์„œ ReLU๋ฅผ ์ ์šฉํ•˜๋ฉด ์Œ์ˆ˜ ๊ฐ’์ด ๋ชจ๋‘ 0์œผ๋กœ ๋ณ€ํ™˜๋˜์–ด, ์–‘์ˆ˜ ๊ฐ’๋งŒ ๋‚จ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์ด ๊ณผ์ •์—์„œ ๋ฐ์ดํ„ฐ์˜ ๋ฒ”์œ„๊ฐ€ ๋”์šฑ ์ œํ•œ๋˜์–ด, ์–‘์žํ™”๋œ ๋ชจ๋ธ์˜ ํ‘œํ˜„๋ ฅ์ด ๋”์šฑ ๊ฐ์†Œํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

  2. ํšจ์œจ์„ฑ: ์–‘์žํ™”๋œ ๋ชจ๋ธ์€ ๊ฐ€๋Šฅํ•œ ํ•œ ๊ณ„์‚ฐ์„ ๊ฐ„๋‹จํ•˜๊ฒŒ ์œ ์ง€ํ•˜์—ฌ ๋น ๋ฅธ ์ถ”๋ก  ์†๋„์™€ ๋‚ฎ์€ ์ „๋ ฅ ์†Œ๋ชจ๋ฅผ ๋‹ฌ์„ฑํ•˜๋ ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค. ReLU์™€ ๊ฐ™์€ ๋น„์„ ํ˜• ํ•จ์ˆ˜๋ฅผ ์ถ”๊ฐ€ํ•˜๋ฉด, ์ถ”๋ก  ๊ณผ์ •์—์„œ ์ถ”๊ฐ€์ ์ธ ๊ณ„์‚ฐ์ด ํ•„์š”ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์–ด๋–ค ๊ฒฝ์šฐ์—๋Š” ๋ชจ๋ธ์˜ ๊ตฌ์กฐ๋‚˜ ๋ชฉ์ ์— ๋”ฐ๋ผ ์ด๋Ÿฌํ•œ ์ถ”๊ฐ€ ๊ณ„์‚ฐ ์—†์ด๋„ ์ถฉ๋ถ„ํ•œ ์„ฑ๋Šฅ์„ ๋‹ฌ์„ฑํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ, ReLU ์ธต์„ ์ƒ๋žตํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

  3. ๋ชจ๋ธ ์„ค๊ณ„์™€ ๋ชฉ์ : ํŠน์ • ์–‘์žํ™” ๋ชจ๋ธ์—์„œ๋Š” ์„ฑ๋Šฅ ์œ ์ง€๋ฅผ ์œ„ํ•ด ReLU ๋Œ€์‹  ๋‹ค๋ฅธ ๊ธฐ๋ฒ•์ด๋‚˜ ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ์–‘์žํ™” ์ „ ๋ชจ๋ธ์—์„œ ReLU๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๋Œ€์‹ , ์–‘์žํ™” ๊ณผ์ •์—์„œ ์ตœ์ ํ™”๋œ ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋ฅผ ์„ ํƒํ•˜๊ฑฐ๋‚˜, ReLU์˜ ํšจ๊ณผ๋ฅผ ๋ชจ๋ฐฉํ•  ์ˆ˜ ์žˆ๋Š” ๋‹ค๋ฅธ ๋ฐฉ๋ฒ•์„ ๋ชจ์ƒ‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๊ฒฐ๋ก ์ ์œผ๋กœ, ์„ ํ˜• ์–‘์žํ™” ๋ชจ๋ธ์—์„œ ReLU ์ธต์˜ ๋ถ€์žฌ๋Š” ๋ฐ์ดํ„ฐ์˜ ๋ฒ”์œ„ ์ œํ•œ, ๊ณ„์‚ฐ ํšจ์œจ์„ฑ, ๊ทธ๋ฆฌ๊ณ  ํŠน์ • ๋ชจ๋ธ ์„ค๊ณ„์™€ ๋ชฉ์ ์— ๊ธฐ์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ชจ๋ธ์˜ ์„ค๊ณ„์ž๋Š” ์„ฑ๋Šฅ, ์†๋„, ํฌ๊ธฐ ๋“ฑ์˜ ์š”๊ตฌ ์‚ฌํ•ญ์„ ๊ท ํ˜• ์žˆ๊ฒŒ ๊ณ ๋ คํ•˜์—ฌ ์ตœ์ ์˜ ๋ชจ๋ธ ๊ตฌ์กฐ๋ฅผ ๊ฒฐ์ •ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

Question 10 (5 pts)

Please compare the pros and cons of k-means-based quantization and linear quantization, considering aspects such as accuracy, latency, and hardware support.

Your Answer:

K-means-based quantization and linear quantization are two techniques used to reduce the precision or size of data, and each has characteristics that make it better suited to particular applications. Here we compare them in terms of accuracy, latency, and hardware support:

Accuracy:

  • K-means-based quantization: for a given number of clusters, it generally provides higher accuracy than linear quantization because it minimizes the quantization error. K-means quantization adapts to the data distribution by clustering similar values together, so it preserves more information, especially when the data does not follow a uniform distribution.
  • Linear quantization: this method applies a uniform scale over the entire range of data values. It is simpler, but it may not capture the data distribution as effectively as k-means, and the quantization error can be larger, especially when the data is not uniformly distributed.

Latency:

  • K-means-based quantization: finding the optimal clusters is computationally expensive and can be slow, especially for large datasets or high dimensions. This means k-means quantization can introduce more latency into the quantization process than linear quantization.
  • Linear quantization: thanks to its simplicity, linear quantization is generally faster to compute than k-means-based quantization. Since it involves only simple arithmetic operations, it is better suited to scenarios where low latency matters.

Hardware support:

  • K-means-based quantization: implementing k-means-based quantization efficiently can be difficult without dedicated support for the clustering algorithm. Modern GPUs and specialized accelerators (e.g., TPUs) can handle these operations more efficiently, but the complexity of the k-means algorithm can still limit its use on low-power or embedded devices.
  • Linear quantization: because of its simplicity, linear quantization can be implemented more easily on a wide range of hardware, including low-power and embedded devices. Since no clustering-related operations are needed, it is easier to implement on devices with limited compute resources.

Summary:

  • K-means-based quantization adapts better to the underlying data distribution and can therefore offer higher accuracy, at the cost of increased computational complexity and latency. It is best suited to applications where preserving high accuracy matters and compute resources are not the main constraint.
  • Linear quantization offers a balance of simplicity, speed, and broad hardware compatibility; it may not always achieve the same accuracy on complex or non-uniform data distributions, but it is well suited to real-time processing and devices with limited compute power.

The choice between k-means-based quantization and linear quantization should be made based on the specific requirements of the application, weighing the importance of accuracy, processing latency, and the available compute resources.