🧑‍🏫 Lecture 3

pruning
lecture
Pruning and Sparsity (Part I)
Author

Seunghyun Oh

Published

January 28, 2024

Over the next five chapters I plan to introduce techniques for making deep learning models lightweight: Pruning, Quantization, Neural Network Architecture Search, Knowledge Distillation, and methods for running models on TinyEngine. The material is a reorganization of TinyML and Efficient Deep Learning Computing 6.S965, the course Prof. Song Han taught at MIT in Fall 2022. See this link for the lecture slides and videos!

As the first topic, let's talk about Pruning, which literally means "trimming branches". Let's start!

1. Introduction to Pruning

Pruning์ด๋ž€ ์˜๋ฏธ์ฒ˜๋Ÿผ Neural Network์—์„œ ๋งค๊ฐœ๋ณ€์ˆ˜(๋…ธ๋“œ)๋ฅผ ์ œ๊ฑฐํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. ์ด๋Š” Dropoutํ•˜๊ณ  ๋น„์Šทํ•œ ์˜๋ฏธ๋กœ ๋ณผ ์ˆ˜ ์žˆ๋Š”๋ฐ, Dropout์˜ ๊ฒฝ์šฐ ๋ชจ๋ธ ํ›ˆ๋ จ ๋„์ค‘ ๋žœ๋ค์ ์œผ๋กœ ํŠน์ • ๋…ธ๋“œ๋ฅผ ์ œ์™ธ์‹œํ‚ค๊ณ  ํ›ˆ๋ จ์‹œ์ผœ ๋ชจ๋ธ์˜ Robustness๋ฅผ ๋†’์ด๋Š” ๋ฐฉ๋ฒ•์œผ๋กœ ํ›ˆ๋ จ์„ ํ•˜๊ณ ๋‚˜์„œ๋„ ๋ชจ๋ธ์˜ ๋…ธ๋“œ๋Š” ๊ทธ๋Œ€๋กœ ์œ ์ง€๊ฐ€ ๋œ๋‹ค. ๋ฐ˜๋ฉด Pruning์˜ ๊ฒฝ์šฐ ํ›ˆ๋ จ์„ ๋งˆ์นœ ํ›„์—, ํŠน์ • Threshold ์ดํ•˜์˜ ๋งค๊ฐœ๋ณ€์ˆ˜(๋…ธ๋“œ)์˜ ๊ฒฝ์šฐ ์‹œ Neural Network์—์„œ ์ œ์™ธ์‹œ์ผœ ๋ชจ๋ธ์˜ ํฌ๊ธฐ๋ฅผ ์ค„์ด๋ฉด์„œ ๋™์‹œ์— ์ถ”๋ก  ์†๋„ ๋˜ํ•œ ๋†’์ผ ์ˆ˜ ์žˆ๋‹ค.

\[ \underset{W_p}{argmin}\ L(x;W_p), \text{ subject to } \lvert\lvert W_p\lvert\lvert_0\ < N \]

  • \(L\) represents the objective function for neural network training
  • \(x\) is the input, \(W\) is the original weights, \(W_p\) is the pruned weights
  • \(\lvert\lvert W_p\lvert\lvert_0\) calculates the #nonzeros in \(W_p\) and \(N\) is the target #nonzeros

This can be expressed with the formula above: pruning a node corresponds to setting the relevant weights W to 0. A pruned neural network looks like the figure below.

Reference. MIT-TinyML-lecture03-Pruning-1
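As a minimal illustration of this formulation (my own sketch, not code from the lecture), the snippet below keeps only the n largest-magnitude entries of a weight tensor and zeroes the rest, so the result satisfies the \(\lvert\lvert W_p\lvert\lvert_0\) constraint:

import torch

def prune_to_n_nonzeros(weight: torch.Tensor, n: int) -> torch.Tensor:
    """Zero out everything except the n largest-magnitude entries of `weight`."""
    threshold = weight.abs().flatten().topk(n).values.min()   # n-th largest magnitude
    mask = weight.abs() >= threshold                          # ties may keep a few extra
    return weight * mask

W = torch.randn(4, 8)
W_p = prune_to_n_nonzeros(W, n=8)
print((W_p != 0).sum())     # roughly n nonzeros remain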

So why do we prune at all? The lecture shows research results like the ones below, indicating that pruning saves resources such as latency and memory.

Reference. MIT-TinyML-lecture03-Pruning-1

Since Prof. Song Han mainly works on compressing vision models, the examples are CNN-based. In every case, after pruning the model size shrinks by up to 12x and the computation by up to 6.3x.

Then the question is: "Can a model whose size has shrunk like that still maintain its performance?"

Reference. MIT-TinyML-lecture03-Pruning-1

๊ทธ๋ž˜ํ”„์—์„œ ๋ชจ๋ธ์˜ Weight ๋ถ„ํฌ๋„๋ฅผ ์œ„ ๊ทธ๋ฆผ์—์„œ ๋ณด๋ฉด, Pruning์„ ํ•˜๊ณ  ๋‚œ ์ดํ›„์— Weight ๋ถ„ํฌ๋„์˜ ์ค‘์‹ฌ์— ํŒŒ๋ผ๋ฏธํ„ฐ๊ฐ€ ์ž˜๋ ค๋‚˜๊ฐ„ ๊ฒŒ ๋ณด์ธ๋‹ค. ์ดํ›„ Fine Tuning์„ ํ•˜๊ณ  ๋‚œ ๋‹ค์Œ์˜ ๋ถ„ํฌ๊ฐ€ ๋‚˜์™€ ์žˆ๋Š”๋ฐ, ์–ด๋Š ์ •๋„ ์ •ํ™•๋„๋Š” ๋–จ์–ด์ง€์ง€๋งŒ ์„ฑ๋Šฅ์ด ์œ ์ง€๋˜๋Š” ๊ฑธ ๊ด€์ฐฐํ•  ์ˆ˜ ์žˆ๋‹ค.

Reference. MIT-TinyML-lecture03-Pruning-1

If such pruning and fine-tuning is repeated (Iterative Pruning and Fine-tuning), the graph shows that more than 90% of the parameters can be removed. A rough sketch of what that loop looks like follows below.
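This is a minimal sketch of the iterative prune-and-fine-tune schedule, assuming plain magnitude pruning; `model` and `train_one_epoch` are placeholders you would supply, and this is my own illustration rather than the lecture's code.

import torch

def magnitude_mask(weight, sparsity):
    """Boolean mask that zeroes out the smallest-magnitude `sparsity` fraction of weights."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight, dtype=torch.bool)
    threshold = weight.abs().flatten().kthvalue(k).values     # k-th smallest magnitude
    return weight.abs() > threshold

def iterative_prune(model, sparsity_schedule, finetune_epochs, train_one_epoch):
    """Alternate pruning and fine-tuning while gradually raising the sparsity."""
    for sparsity in sparsity_schedule:                 # e.g. [0.3, 0.5, 0.7, 0.9]
        masks = {}
        for name, p in model.named_parameters():
            if p.dim() > 1:                            # only weight matrices / conv kernels
                masks[name] = magnitude_mask(p.data, sparsity)
                p.data *= masks[name]
        for _ in range(finetune_epochs):               # fine-tune to recover accuracy
            train_one_epoch(model)
            with torch.no_grad():                      # keep pruned weights at zero
                for name, p in model.named_parameters():
                    if name in masks:
                        p.data *= masks[name]
    return model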

Of course, these results are for specific models and tasks, so they cannot be generalized, but if resources are a concern it certainly looks worth trying. So let's talk in more detail about which factors to consider in order to prune while maintaining performance!

The factors covered are listed below. Let's go through them in order, starting with the pruning pattern!

  • Pruning Granularity → in what pattern should we prune?
  • Pruning Criterion → by what measure do we decide which parameters to prune?
  • Pruning Ratio → what fraction of the parameters should be pruned?
  • Fine-Tuning → how should the model be fine-tuned after pruning?
  • ADMM → after pruning, how can the problem be treated as convex?
  • Lottery Ticket Hypothesis → build the model from training all the way through pruning!
  • System Support → what about hardware or software support for pruning?

2. Determine the Pruning Granularity

The case of convolutional layers: the red boxes are preserved and the white ones are pruned. Referred from MIT-TinyML-lecture03-Pruning-1

The question here is how many neurons to group together when deciding what to prune. The lecture classifies patterns by how regular they are and describes the irregular and regular cases as follows.

  • Fine-grained/Unstructured
    • More flexible pruning index choice
    • Hard to accelerate (irregular data expression)
    • Can deliver speed up on some custom hardware
  • Coarse-grained/Structured
    • Less flexible pruning index choice (a subset of the fine-grained case)
    • Easy to accelerate

Pruning์„ ํ•œ๋‹ค๊ณ  ๋ชจ๋ธ ์ถœ๋ ฅ์ด ๋‚˜์˜ค๋Š” ์‹œ๊ฐ„์ด ์งง์•„์ง€๋Š” ๊ฒƒ์ด ์•„๋‹˜๋„ ์–ธ๊ธ‰ํ•ฉ๋‹ˆ๋‹ค. Hardware Acceleration์˜ ๊ฐ€๋Šฅ๋„๊ฐ€ ์žˆ๋Š”๋ฐ, ์ด ํŠน์ง•์„ ๋ณด๋ฉด ์•Œ ์ˆ˜ ์žˆ๋“ฏ, Pruning์˜ ์ž์œ ๋„์™€ Hardware Acceleration์ด trade-off, ์ฆ‰ ๊ฒฝ๋Ÿ‰ํ™” ์ •๋„์™€ Latency์‚ฌ์ด์— trade-off ๊ฐ€ ์žˆ์„ ๊ฒƒ์ด ์˜ˆ์ธก๋ฉ๋‹ˆ๋‹ค. ํ•˜๋‚˜์”ฉ, ์ž๋ฃŒ๋ฅผ ๋ณด๋ฉด์„œ ์‚ดํŽด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

2.1 Pattern-based Pruning

Irregular์—์„œ๋„ Pattern-based Pruning์€ ์—ฐ์†์ ์ธ ๋‰ด๋Ÿฐ M๊ฐœ ์ค‘ N๊ฐœ๋ฅผ Pruning ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. ์ผ๋ฐ˜์ ์œผ๋กœ๋Š” N:M = 2:4 ์œผ๋กœ ํ•œ๋‹ค๊ณ  ์†Œ๊ฐœํ•œ๋‹ค.

Reference. Accelerating Inference with Sparsity Using the NVIDIA Ampere Architecture and NVIDIA TensorRT

Reference. Accelerating Inference with Sparsity Using the NVIDIA Ampere Architecture and NVIDIA TensorRT

For example, in each row of the matrix above, 4 out of 8 weights are non-zero. If the zero entries are removed and replaced with 2-bit indices, the matrix multiplication can run up to 2x faster on NVIDIA's Ampere GPUs. Here, sparsity can be thought of as "how much the model has been compressed". A small sketch of building such a 2:4 mask follows the bullets below.

  • N:M sparsity means that in each contiguous group of M elements, N of them are pruned
  • A classic case is 2:4 sparsity (50% sparsity)
  • It is supported by NVIDIA's Ampere GPU architecture, which delivers up to 2x speed-up and usually maintains accuracy.
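To see what the 2:4 pattern does to a weight matrix, here is a minimal sketch (my own illustration; the actual speed-up comes from Ampere's sparse tensor cores and TensorRT, not from this Python code). It assumes the number of weights per row is a multiple of 4:

import torch

def two_four_sparsify(weight: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude values in every contiguous group of 4 weights."""
    groups = weight.reshape(-1, 4)                     # one row per group of 4
    keep_idx = groups.abs().topk(2, dim=1).indices     # positions of the 2 survivors
    mask = torch.zeros_like(groups)
    mask.scatter_(1, keep_idx, 1.0)                    # 1 where a weight survives
    return (groups * mask).reshape(weight.shape)

W = torch.randn(4, 8)                                  # 8 weights per row, 4 survive per group pattern
print(two_four_sparsify(W))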

2.2 Channel-level Pruning

On the more regular end, channel-level pruning can directly reduce inference time, but it achieves a smaller compression ratio. In the figure below you can see that the sparsity differs from layer to layer.

  • Pro: Direct speed up!
  • Con: smaller compression ratio

Reference. AMC: AutoML for Model Compression and Acceleration on Mobile Devices [He et al., ECCV 2018]

The slide below shows that channel-level pruning can reduce inference time more than pruning applied to individual neurons.

Reference. AMC: AutoML for Model Compression and Acceleration on Mobile Devices [He et al., ECCV 2018]

์ž๋ฃŒ๋ฅผ ๋ณด๋ฉด Sparsity์—์„œ๋Š” ํŒจํ„ดํ™” ๋ผ ์žˆ์œผ๋ฉด ๊ฐ€์†ํ™”๊ฐ€ ์šฉ์ดํ•ด Latency, ์ถ”๋ก  ์‹œ๊ฐ„์„ ์ค„์ผ ์ˆ˜ ์žˆ์ง€๋งŒ ๊ทธ ๋งŒํผ Pruningํ•˜๋Š” ๋‰ด๋Ÿฐ์˜ ์ˆ˜๊ฐ€ ์ ์–ด ๊ฒฝ๋Ÿ‰ํ™” ๋น„์œจ์ด ์ค„ ๊ฒƒ์œผ๋กœ ๋ณด์ธ๋‹ค. ํ•˜์ง€๋งŒ ๋น„๊ต์  ๋ถˆ๊ทœ์น™ํ•œ ์ชฝ์— ์†ํ•˜๋Š” Pattern-based Pruning์˜ ๊ฒฝ์šฐ๊ฐ€ ํ•˜๋“œ์›จ์–ด์—์„œ ์ง€์›ํ•ด์ฃผ๋Š” ๊ฒฝ์šฐ, ๋ชจ๋ธ ํฌ๊ธฐ์™€ Latency๋ฅผ ๋‘˜ ๋‹ค ์ตœ์ ์œผ๋กœ ์žก์„ ์ˆ˜ ์žˆ์„ ๊ฒƒ์œผ๋กœ ๋ณด์ธ๋‹ค.

3. Determine the Pruning Criterion

Then which parameters, i.e., which neurons, should we cut? Let's split the question into synapses and neurons.

  • Which synapses? Which neurons? Which one is less important?
  • How to select which synapses and which neurons to prune

3.1 Selection of Synapses

The lecture classifies the criteria into three broad kinds: the magnitude of each individual weight, the magnitude over all weights of a channel, and a score based on a Taylor expansion that considers both gradients and weights. Before introducing the methods, Prof. Song Han mentions that leading companies have mostly used only magnitude-based pruning over the past five years; perhaps, with on-device AI gaining attention in 2023, the other criteria are gradually receiving more interest.

3.1.1 Magnitude-based Pruning

For magnitude-based criteria, two things are considered: "how large a group of weights to evaluate at once" and "which norm to use within the group". A small sketch of these scores follows the list below.

  1. Heuristic pruning criterion, Element-wise Pruning

    \[ Importance = \lvert W \lvert \]

    Reference. MIT-TinyML-lecture03-Pruning-1
  2. Heuristic pruning criterion, Row-wise Pruning, L1-norm magnitude

    \[ Importance = \sum_{i\in S}\lvert w_i \lvert, \\where\ W^{(S)}\ is\ the\ structural\ set\ S\ of\ parameters\ W \]

    Reference. MIT-TinyML-lecture03-Pruning-1
  3. Heuristic pruning criterion, Row-wise Pruning, L2-norm magnitude

    \[ Importance = \sqrt{\sum_{i\in S}\lvert w_i \lvert^2}, \\where\ W^{(S)}\ is\ the\ structural\ set\ S\ of\ parameters\ W \]

    Reference. MIT-TinyML-lecture03-Pruning-1
  4. Heuristic pruning criterion, \(L_p\)- norm

    \[ \lvert\lvert W^{(S)}\lvert\lvert_p=\Big( \sum_{i\in S} \lvert w_i \lvert^p \Big)^{\frac{1}{p}} \]
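A minimal sketch of the four scores above (element-wise, row-wise L1, row-wise L2, and the general \(L_p\) case), assuming each row of W is one structural set S; this is my own illustration of the formulas, not lecture code.

import torch

W = torch.randn(16, 64)                  # 16 rows (structural sets), 64 weights each

elementwise = W.abs()                    # 1) |w| per weight
row_l1 = W.abs().sum(dim=1)              # 2) sum_i |w_i| per row (L1)
row_l2 = W.pow(2).sum(dim=1).sqrt()      # 3) sqrt(sum_i |w_i|^2) per row (L2)

def lp_importance(W: torch.Tensor, p: float) -> torch.Tensor:
    """4) general Lp-norm importance per row: (sum_i |w_i|^p)^(1/p)."""
    return W.abs().pow(p).sum(dim=1).pow(1.0 / p)

print(torch.allclose(lp_importance(W, 2.0), row_l2))   # Lp with p=2 matches the L2 case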

3.1.2 Scaling-based Pruning

The second approach, scaling-based pruning, assigns a scaling factor to each channel and prunes based on it. How should the scaling factor be chosen? The paper introduced in the lecture makes the scaling factor \(\gamma\) a trainable parameter and reuses it from the batch normalization layer (a small sketch follows the bullets below).

  • Scale factor is associated with each filter(i.e. output channel) in convolution layers.

  • The filters or output channels with small scaling factor magnitude will be pruned

  • The scaling factors can be reused from batch normalization layer

    \[ z_o = \gamma\dfrac{z_i-\mu_{B}}{\sqrt{\sigma_B^2+\epsilon}}+\beta \]

Reference. MIT-TinyML-lecture03-Pruning-1
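A minimal sketch of reusing the batch-normalization scaling factor as the importance score (in PyTorch, \(\gamma\) is `bn.weight`); this is my own illustration, not the paper's code, and on a freshly initialized layer all \(\gamma\) are 1, so it only becomes meaningful after training.

import torch
import torch.nn as nn

def channels_to_prune(bn: nn.BatchNorm2d, prune_ratio: float) -> torch.Tensor:
    """Indices of output channels whose BN scaling factor |gamma| is smallest."""
    gamma = bn.weight.detach().abs()                  # gamma is bn.weight in PyTorch
    n_prune = int(gamma.numel() * prune_ratio)
    return gamma.argsort()[:n_prune]                  # smallest |gamma| first

bn = nn.BatchNorm2d(64)
print(channels_to_prune(bn, prune_ratio=0.3))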

3.1.3 Taylor Expansion Analysis on Pruning Error

The third method uses a Taylor expansion to approximate how much pruning increases the objective function, and prunes the weights that increase it the least. For more details on Taylor series, see here!

  • Evaluate pruning error induced by pruning synapses.
  • Minimize the objective function L(x; W)
  • A Taylor series can approximate the induced error.

\[ \delta L = L(x;W)-L(x;W_p=W-\delta W) \\ = \sum_i g_i\delta w_i + \frac{1}{2} \sum_i h_{ii}\delta w_i^2 + \frac{1}{2}\sum_{i\not=j}h_{ij}\delta w_i \delta w_j + O(\lvert\lvert \delta W \lvert\lvert^3) \] \[ where\ g_i=\dfrac{\partial L}{\partial w_i},\ h_{ij} = \dfrac{\partial^2 L}{\partial w_i \partial w_j} \]

  1. Second-Order-based Pruning

    Reference. MIT-TinyML-lecture03-Pruning-1

    Reference. MIT-TinyML-lecture03-Pruning-1

    The Optimal Brain Damage paper [LeCun et al., NeurIPS 1989] makes three assumptions in order to use this method.

     1. Because the objective function L is (nearly) quadratic, the last term is ignored (this is easier to understand if you know the error term of a Taylor series!)
     2. If the neural network has converged, the first-order term is also ignored.
     3. If the parameters are independent of each other, the cross terms are also ignored.

    The expression can then be simplified as below; the important point is that computing the Hessian matrix H is expensive!

    \[ \delta L_i = L(x;W)-L(x;W_p\lvert w_i=0)\approx \dfrac{1}{2} h_{ii}w_i^2,\ where\ h_{ii}=\dfrac{\partial^2 L}{\partial w_i^2} \]

    \[ importance_{w_i} = \lvert \delta L_i\lvert = \frac{1}{2}h_{ii}w_i^2 \] \[ *\ h_{ii} \text{ is non-negative} \]

  2. First-Order-based Pruning

    • Note: this method is not covered in the 2023 lecture. A small sketch of the first-order criterion follows this list.

    Reference. MIT-TinyML-lecture03-Pruning-1
    • If only first-order expansion is considered under an i.i.d(Independent and identically distributed) assumption,

    \[ \delta L_i = L(x;W) - L(x; W_P\lvert w_i=0) \approx g_iw_i,\ where\ g_i=\dfrac{\partial L}{\partial w_i} \] \[ importance_{w_i} = \lvert \delta L_i \lvert = \lvert g_i w_i \lvert \ or \ importance_{w_i} = \lvert \delta L_i \lvert^2 = (g_i w_i)^2 \]

    • For coarse-grained pruning, we have,

      \[ importance_{\ W^{(S)}} = \sum_{i \in S}\lvert \delta L_i \lvert^2 = \sum_{i \in S} (g_i w_i)^2,\ where\ W^{(S)}\ is\ the\ structural\ set\ of\ parameters \]
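A minimal sketch of the first-order score \(\lvert g_i w_i \rvert\) and its coarse-grained variant, computed from one backward pass; `model`, `loss_fn`, `x`, and `y` are placeholders for your own setup, and this is my own illustration of the formulas above.

import torch

def taylor_importance(weight: torch.Tensor) -> torch.Tensor:
    """First-order importance |g_i * w_i| per weight; requires .grad to be populated."""
    return (weight.grad * weight).abs()

def structured_taylor_importance(weight: torch.Tensor) -> torch.Tensor:
    """Coarse-grained variant: sum of (g_i * w_i)^2 over each output row/filter."""
    return (weight.grad * weight).pow(2).flatten(1).sum(dim=1)

# usage sketch:
# loss = loss_fn(model(x), y)
# loss.backward()                       # populates .grad for every parameter
# scores = taylor_importance(model.fc.weight)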

3.2 Selection of Neurons

This approach considers which neurons to remove (less useful → remove); as the figure below shows, it can also be applied at the channel level rather than to individual neurons. It is clearly more coarse-grained than the methods introduced so far.

  1. Percentage-of-Zero-based Pruning

    The first method looks at the fraction of zeros in each channel and removes the channels where that fraction is high. ReLU activations produce zeros in the output, and this fraction of zeros, called the Average Percentage of Zero activations (APoZ), is used to decide which channels to prune.

    • ReLU activation will generate zeros in the output activation
    • Similar to the magnitude of weights, the Average Percentage of Zero activations (APoZ) can be exploited to measure the importance of a neuron

    Reference. MIT-TinyML-lecture03-Pruning-1
  2. First-Order-based Pruning

    • Note: this method is not covered in the 2023 lecture.

    • Minimize the error on loss function introduced by pruning neurons

    • Similar to previous Taylor expansion on weights, the induced error of the objective function L(x; W) can be approximated by a Taylor series expanded on activations.

      \[ \delta L_i = L(x; W) - L(x\lvert x_i = 0; W) \approx \dfrac{\partial L}{\partial x_i}x_i \]

    • For a structural set of neurons \(x^{(S)}\) (e.g., a channel plane),

      \[ \lvert \delta L_{x^{(S)}} \lvert\ = \Large\lvert \small\sum_{i\in S}\dfrac{\partial L}{\partial x_i}x_i\Large\lvert \]

  3. Regression-based Pruning

    ์ด ๋ฐฉ๋ฒ•์€ Quantizedํ•œ ๋ ˆ์ด์–ด์˜ output \(\hat Z\)(construction error of the corresponding layerโ€™s outputs)์™€ \(Z\)๋ฅผ Training์„ ํ†ตํ•ด ์ฐจ์ด๋ฅผ ์ค„์ด๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. ์ฐธ๊ณ ๋กœ ๋ฌธ์ œ๋ฅผ ํ‘ธ๋Š” ์ž์„ธํ•œ ๊ณผ์ •์€ 2022๋…„ ๊ฐ•์˜์—๋งŒ ๋‚˜์™€ ์žˆ๋‹ค.

\[ Z=XW^T=\sum_{c=0}^{c_i-1}X_cW_c^T \]

Reference. MIT-TinyML-lecture03-Pruning-1

Defining the problem as a formula gives the expression further below, where

  • \(\beta\) is the coefficient vector of length \(c_i\) for channel selection.
  • \(\beta_c = 0\) means channel \(c\) is pruned.
  • \(N_c\) is the number of nonzero channels

The solution proceeds in two steps: first compute the channel scales \(\beta\), then train \(W\) so that the difference between the pruned layer's output \(\hat Z\) (the reconstruction error of the corresponding layer's outputs) and \(Z\) is minimized.

Solve the problem in two steps:

  • Fix W, solve \(\beta\) for channel selection → NP (nondeterministic polynomial)-hard
  • Fix \(\beta\), solve W to minimize the reconstruction error (Weight Reconstruction)

๊ฐ ๋ฌธ์ œ๋ฅผ ํ‘ธ๋Š” ๊ณผ์ •์„ ์กฐ๊ธˆ ๋” ์ž์„ธํžˆ ์‚ดํŽด๋ด๋ณด์ž. ๋ณธ ๋‚ด์šฉ์€ 2022๋…„ ๊ฐ•์˜์— ์žˆ์œผ๋‹ˆ ์ฐธ๊ณ !

The NP (nondeterministic polynomial)-hard channel-selection step can be written as follows.

\[ \underset{\beta}{argmin} \lvert\lvert Z- \sum_{c=0}^{c_i-1} \beta_cX_cW_c^T \lvert\lvert_F^2 = \lvert\lvert \sum_{c=0}^{c_i-1}X_cW_c^T - \sum_{c=0}^{c_i-1} \beta_cX_cW_c^T \lvert\lvert_F^2 \] \[ = \lvert\lvert\sum_{c=0}^{c_i-1} (1-\beta_c)X_cW_c^T \lvert\lvert_F^2, \ s.t.\ \lvert\lvert\beta\lvert\lvert_0 \ \leq N_c \]

๊ฐ•์˜์—์„œ ์†Œ๊ฐœํ•˜๋Š” ThiNet์ด๋ผ๋Š” ๋…ผ๋ฌธ์—์„œ๋Š” greedy solution์„ ์ด์šฉํ•ด์„œ ์ฑ„๋„ ํ•˜๋‚˜ํ•˜๋‚˜์”ฉ Pruning ํ•ด๋ณด๋ฉฐ objective function์˜ l2-norm ์ตœ์†Ÿ๊ฐ’์„ ๊ตฌํ•œ๋‹ค.

# Greedy channel selection (ThiNet-style): S collects the channels to prune.
# X: (n, c_i) unfolded inputs, W: (c_o, c_i) weights, N: number of channels to prune.
S = []
candidates = list(range(c_i))
while len(S) < N:
    min_norm, min_c = float('inf'), None
    for c in candidates:                      # try adding one more channel to the pruned set
        tmpS = S + [c]
        Z = X[:, tmpS] @ W[:, tmpS].t()       # output contribution of the pruned channels
        norm = Z.norm(2)
        if norm < min_norm:                   # pick the channel whose removal matters least
            min_norm, min_c = norm, c
    S.append(min_c)
    candidates.remove(min_c)

์—ฌ๊ธฐ์„œ ๋”ํ•ด์„œ \(\beta\) ๋ฅผ ๊ตฌํ•˜๋Š” ๊ณผ์ •์—์„œ ์ผ๋ฐ˜ํ™”๋ฅผ ์œ„ํ•ด LASSO ๋ฐฉ์‹์„ ์‚ฌ์šฉํ•œ๋‹ค(LASSO์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ ์—ฌ๊ธฐ์„œ). Relax the \(l_0\) to \(l_1\) regularization (LASSO):

\[ \underset{\beta}{argmin}\ \lvert\lvert Z- \sum_{c=0}^{c_i-1}\beta_cX_cW_c^T\lvert\lvert^2_F+\lambda\lvert\lvert \beta \lvert\lvert_1 \]

  • \(\lambda\) is a penalty coefficient. By increasing \(\lambda\), there will be more zeros in \(\beta\).

  • Gradually increase \(\lambda\) and solve the LASSO regression for \(\beta\), until \(\lvert\lvert \beta \lvert\lvert_0==N_c\) is met.

  • Why \(\lvert\lvert \beta \lvert\lvert_0==N_c\)?

    This is not addressed explicitly in the lecture, but presumably, if the optimum has to be found among N scales in total, the constraint keeps the number of surviving channels fixed at N while searching for the optimum.

In the second step, with the computed \(\beta\) fixed, the weights are reconstructed ("Weight Reconstruction") so that the difference between the outputs before and after pruning is minimized. The solution is a unique closed-form one obtained with the least-squares approach, shown below.

\[ \underset{W}{argmin}\ \lvert\lvert Z- \sum_{c=0}^{c_i-1}\beta_cX_cW_c^T\lvert\lvert^2_F \]

  • \(\beta\) is a coefficient vector from the previous step

  • This is a classic linear regression problem, which has a unique closed-form solution using the least square approach.

    \[ \underset{W}{argmin} \lvert\lvert Z-\hat{Z} \lvert\lvert^2_F = \lvert\lvert Z-UW^T \lvert\lvert_F^2 \]

    where

    \[ U= \Large[ \small\beta_0X_0\ \beta_1X_1 \ \cdots \beta_cX_c \cdots \beta_{c_i-1}X_{c_i-1} \Large] \]

    and thus,

    \[ W^T = (U^TU)^{-1}U^T Z \]

    • Q. Why does \((U^TU)^{-1}\) exist?

      In the least-squares method, for an arbitrary matrix \(v = (v_0, v_1, \dots, v_n)\), does \(v^Tv\) always have an inverse? Since the assumption above states "a unique closed-form solution", the columns can be regarded as linearly independent, which means the inverse exists (\(v^Tv\) is invertible).
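Finally, a toy end-to-end sketch of the two steps above with random data. This is my own illustration: the shapes, the fixed `alpha`, and the use of scikit-learn's Lasso and numpy's least squares are assumptions for demonstration, not the paper's implementation.

import numpy as np
from sklearn.linear_model import Lasso

# X: (n, c_in, k) unfolded inputs per channel, W: (c_out, c_in, k) weights (toy shapes)
rng = np.random.default_rng(0)
n, c_in, k, c_out = 256, 8, 9, 16
X = rng.standard_normal((n, c_in, k))
W = rng.standard_normal((c_out, c_in, k))
Z = np.einsum('nck,ock->no', X, W)                 # original layer output

# per-channel contributions F[n, c, o] = (X_c W_c^T)[n, o]
F = np.einsum('nck,ock->nco', X, W)
F_flat = F.transpose(1, 0, 2).reshape(c_in, -1).T  # one column per channel

# Step 1: fix W, solve the LASSO-relaxed selection for beta
# (in practice, raise alpha until enough entries of beta become zero)
beta = Lasso(alpha=0.05, fit_intercept=False).fit(F_flat, Z.reshape(-1)).coef_
keep = np.flatnonzero(beta)                        # channels with beta_c != 0 survive

# Step 2: fix beta, reconstruct W by least squares on U = [beta_c * X_c ...]
U = (X[:, keep, :] * beta[keep, None]).reshape(n, -1)
W_new, *_ = np.linalg.lstsq(U, Z, rcond=None)      # closed-form least-squares solution
W_rec = W_new.T.reshape(c_out, len(keep), k)       # reconstructed weights for kept channels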

4. Discussion

  1. Pruning์„ Dropout์ด๋ž‘ ๋น„๊ตํ•ด์„œ ์–ด๋–ค ์ฐจ์ด์ ์ด ์žˆ๋Š”๊ฐ€?

    The two methods are certainly similar in that they both remove neurons and synapses. But they differ in two respects: their purpose and their timing. Dropout's purpose is to prevent overfitting during training, while Pruning's purpose is to reduce the size of a trained model. As for timing, Dropout takes place during training, whereas Pruning takes place after training; the model is shrunk first, and if performance drops, it is fine-tuned accordingly.

    In the study group, the question came up: "Why not reduce the size through dropout? And does it really have to be done after training?" Of course, if the model size can be reduced during training, then by all means do so. But two things need to be considered. One is whether a model whose size is reduced during or before training can actually reach sufficient performance. The other is that Pruning and model compression, in my view, focus on optimization: it is unclear whether a technique like channel pruning can be applied in the middle of training, and even if something like fine-grained pruning were used, it would only shrink the model size, and it is unclear whether other metrics such as memory (e.g. RAM) or latency would benefit as well.

    ํ•„์ž๋Š” ์œ„์™€ ๊ฐ™์€ ์ตœ์ ํ™”๋ฅผ ํ†ตํ•œ ์„ฑ๋Šฅ ๊ฐœ์„ ์„ ์ด ๊ธ€์—์„œ์ฒ˜๋Ÿผ 2022๋…„ TinyML ๊ฐ•์˜์—์„œ ์ œ๊ณตํ•˜๋Š” ์‹ค์Šต์„ ํ†ตํ•ด ๊ฒฝํ—˜ํ–ˆ์—ˆ๋‹ค. ์•ž์„  ์˜ˆ์‹œ๋Š” OS๋ฅผ ๊ฐ€์ง„ ๋””๋ฐ”์ด์Šค๊ฐ€ ์•„๋‹Œ Bare-metal firmware๋กœ ํ™˜๊ฒฝ์ด ์กฐ๊ธˆ ํŠน์ˆ˜ํ•˜๊ธฐ๋„ ํ•˜๊ณ , ์‹ค์ œ๋กœ Torch๋‚˜ Tensorflowlite์—์„œ ์ œ๊ณตํ•˜๋Š” ๋ชจ๋ธ ๊ฒฝ๋Ÿ‰ํ™”๋ฅผ ์ง์ ‘์ ์œผ๋กœ ๋ถ„์„ํ•ด๋ด์•ผ ์‹ค์งˆ์ ์ธ ์˜ˆ์‹œ๋ฅผ ์•Œ ์ˆ˜ ์žˆ๊ฒ ์ง€๋งŒ, ํ˜น์—ฌ ์ดํ•ดํ•ด ์ฐธ๊ณ ๊ฐ€ ๋ ๊นŒ ๋ง๋ถ™์—ฌ ๋†“๋Š”๋‹ค.

5. Reference