
[Math] Gradient Descent (Mild Differentiation Flavor)

by ์ž„๋ฆฌ๋‘ฅ์ ˆ 2024. 12. 16.
๋ฐ˜์‘ํ˜•
๋”๋ณด๊ธฐ

This post is based on notes I originally organized for myself in Notion during my bootcamp, written up here as a study exercise.
It reflects my personal interpretation. (It may contain mistakes; corrections welcome!)
** No lecture materials are used **
** Commercial use is prohibited **

 

 

Today's Keyword
differentiation, slope, gradient descent, partial differentiation, gradient vector

 

 

Differentiation

  • ๋ณ€์ˆ˜์˜ ์›€์ง์ž„์— ๋”ฐ๋ฅธ ํ•จ์ˆ˜๊ฐ’์˜ ๋ณ€ํ™” ์ธก์ •
  • ์ตœ์ ํ™”์—์„œ ๋งŽ์ด ์‚ฌ์šฉ 

๋ฏธ๋ถ„์„ ์†์œผ๋กœ ๊ณ„์‚ฐํ•˜๋ ค๋ฉด ์ผ์ผ์ด h -> 0 ๊ทนํ•œ ๊ณ„์‚ฐํ•ด์•ผ ํ•จ
h๋ฅผ 0์œผ๋กœ ๋ณด๋‚ด๋ฉด (x, f(x))์—์„œ ์ ‘์„ ์˜ ๊ธฐ์šธ๊ธฐ๋กœ ์ˆ˜๋ ด.

  • Derivative == slope of the tangent line
    • The function must be smooth (continuous) at that point
  • Once you know the slope of the tangent at a point:
    • To increase the function value: add the derivative
    • To decrease the function value: subtract the derivative
    • If the derivative is negative (the curve rises to the left), x + f'(x) < x: adding it moves x left, and the function value increases
    • If the derivative is positive (the curve rises to the right), x + f'(x) > x: adding it moves x right, and the function value increases
  • sym.diff() computes derivatives easily (after import sympy as sym) -- see the short example below
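
A minimal sketch of the sym.diff call mentioned above (the example function here is just for illustration):

import sympy as sym
from sympy.abc import x

# derivative of f(x) = x**2 + 2x + 3 with respect to x
print(sym.diff(x**2 + 2*x + 3, x))   # prints: 2*x + 2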

 

๊ฒฝ์‚ฌ์ƒ์Šน ํ•˜๊ฐ• ๋ฒ•

 

๋ชฉ์ ํ•จ์ˆ˜๋ฅผ ์ตœ๋Œ€ํ™”

  • Gradient Ascent +
    • ๋ฏธ๋ถ„๊ฐ’์„ ๋”ํ•˜๊ธฐ
    • ํ•จ์ˆ˜์˜ ๊ทน๋Œ€๊ฐ’์˜ ์œ„์น˜
    • ์ œ์ผ ๋†’์€ ๊ณณ ์ฐพ๊ธฐ

๋ชฉ์ ํ•จ์ˆ˜๋ฅผ ์ตœ์†Œํ™”

  • Gradient Descent -
    • ๋ฏธ๋ถ„๊ฐ’์„ ๋นผ๊ธฐ
    • ํ•จ์ˆ˜์˜ ๊ทน์†Ÿ๊ฐ’์˜ ์œ„์น˜ 
    • ์ œ์ผ ๋‚ฎ์€ ๊ณณ ์ฐพ๊ธฐ 
  • ๊ทน ๊ฐ’์— ๋„๋‹ฌํ•˜๋ฉด ๋ฉˆ์ถค (= ๋ฏธ๋ถ„ ๊ฐ’์ด 0)

Let's see it in code

# init: starting point, lr: learning rate, eps: termination condition for the algorithm
var = init                  # starting point
grad = gradient(var)        # function that computes the derivative
while abs(grad) > eps:      # exactly 0 is unreachable numerically, so stop just short of it
    var = var - lr * grad   # learning rate controls the step size
    grad = gradient(var)    # keep updating the derivative
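
Here gradient() stands in for any routine that returns f'(var). lr scales each step (too large and the update can overshoot or diverge, too small and convergence is slow), and eps stops the loop once the derivative is numerically close to 0.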

Finding the minimum of f(x) = x^2 + 2x + 3 with gradient descent

import sympy as sym
import numpy as np
from sympy.abc import x

def func(val):                  # define the quadratic function
    fun = sym.poly(x**2 + 2*x + 3)
    return fun.subs(x, val), fun

def func_gradient(fun, val):    # compute the derivative
    _, function = fun(val)
    diff = sym.diff(function, x)
    return diff.subs(x, val), diff  # derivative evaluated at val / symbolic derivative

def gradient_descent(fun, init_point, lr_rate=1e-2, epsilon=1e-5):
    cnt=0
    val = init_point
    diff, _ = func_gradient(fun, init_point)
    while np.abs(diff) > epsilon:
        val = val - lr_rate*diff
        diff, _ = func_gradient(fun, val)
        cnt+=1
    print("function: {}, iterations: {}, minimum: ({}, {})".format(fun(val)[1], cnt, val, fun(val)[0]))

gradient_descent(fun=func, init_point=np.random.uniform(-2,2))
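
Since f'(x) = 2x + 2 = 0 at x = -1, the loop converges to x ≈ -1 with minimum value f(-1) = 2; the exact iteration count depends on the random starting point.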

 

 

Partial Differentiation: multivariate functions that take a vector as input

  • For a multivariate function, compute the derivative per variable and collect the results into a gradient vector.
  • Each partial derivative uses the unit vector e_i, which is 1 in the i-th coordinate and 0 elsewhere: ∂f/∂x_i (x) = lim_{h -> 0} (f(x + h*e_i) - f(x)) / h

Below, the function f(x, y) = x^2 + 2xy + 3 + cos(x + 2y) is partially differentiated with respect to x:

import sympy as sym
from sympy.abc import x, y

# define the function and take its partial derivative with respect to x
sym.diff(sym.poly(x**2 + 2*x*y + 3) + sym.cos(x + 2*y), x)
# -> 2*x + 2*y - sin(x + 2*y)

Using nabla f (the gradient vector), all variables x = (x1, ..., xd) are updated simultaneously: x <- x - lr * nabla f(x) for descent, x <- x + lr * nabla f(x) for ascent.

 

 

Gradient Vector

Plotting the -nabla f vector field for f(x, y) in (x, y, z) space shows the following:

  • Vectorized this way, every point gets a direction that leads toward the extremum.
  • +nabla f is the direction of fastest increase; -nabla f is the direction of fastest decrease!
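
As a quick numeric check (a minimal sketch added here; grad_f is a hypothetical helper, not part of the original notes): for f(x, y) = x^2 + 2y^2, -nabla f at (1, 1) points back toward the minimum at the origin.

import numpy as np

def grad_f(p):                 # gradient of f(x, y) = x**2 + 2*y**2 is (2x, 4y)
    x, y = p
    return np.array([2*x, 4*y])

print(-grad_f(np.array([1.0, 1.0])))   # [-2. -4.]: steepest-descent direction, toward (0, 0)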

In code...

  • For vectors, use the norm instead of the absolute value; only that part of the loop changes.
  • while norm(grad) > eps:
  • Finding the minimum of f(x, y) = x^2 + 2y^2 with gradient descent
import sympy as sym
import numpy as np
from sympy.abc import x, y

# Multivariate Gradient Descent

def eval_(fun, val):            # evaluate the function value at a point
    val_x, val_y = val
    fun_eval = fun.subs(x, val_x).subs(y, val_y)
    return fun_eval

def func_multi(val):            # define the function f(x, y) = x**2 + 2*y**2
    x_, y_ = val
    func = sym.poly(x**2 + 2*y**2)
    return eval_(func, [x_, y_]), func

def func_gradient(fun, val):    # compute the gradient vector
    x_, y_ = val
    _, function = fun(val)
    diff_x = sym.diff(function, x)
    diff_y = sym.diff(function, y)
    grad_vec = np.array([eval_(diff_x, [x_, y_]), eval_(diff_y, [x_, y_])], dtype=float)
    return grad_vec, [diff_x, diff_y]

def gradient_descent(fun, init_point, lr_rate=1e-2, epsilon=1e-5):
    cnt=0
    val = init_point
    diff, _ = func_gradient(fun, val)
    while np.linalg.norm(diff) > epsilon:
        val = val - lr_rate*diff
        diff, _ = func_gradient(fun, val)
        cnt+=1
    print("function: {}, iterations: {}, minimum: ({}, {})".format(fun(val)[1], cnt, val, fun(val)[0]))

# run with a random starting point
pt = [np.random.uniform(-2, 2), np.random.uniform(-2, 2)]
gradient_descent(fun=func_multi, init_point=pt)
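
Because nabla f = (2x, 4y) vanishes only at the origin, the iterates converge to the minimum at (0, 0) with value 0, to within the epsilon tolerance.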

 

๋ฐ˜์‘ํ˜•

์ตœ๊ทผ๋Œ“๊ธ€

์ตœ๊ทผ๊ธ€

skin by ยฉ 2024 ttutta