Bài giảng Tối ưu hóa nâng cao: Chương 7 - Hoàng Nam Dũng

Số trang: 34 Loại file: pdf Dung lượng: 543.20 KB Lượt xem: 9 Lượt tải: 0

tailieu_vip

Hỗ trợ phí lưu trữ khi tải xuống: 9,000 VND

Báo xấu

Xem trước 4 trang đầu tiên của tài liệu này:

Thông tin tài liệu:

Bài giảng "Tối ưu hóa nâng cao - Chương 7: Subgradient method" cung cấp cho người học các kiến thức: Last last time - gradient descent, subgradient method, step size choices, convergence analysis, lipschitz continuity, convergence analysis - Proof,... Mời các bạn cùng tham khảo.
Nội dung trích xuất từ tài liệu:
Bài giảng Tối ưu hóa nâng cao: Chương 7 - Hoàng Nam DũngSubgradient MethodHoàng Nam DũngKhoa Toán - Cơ - Tin học, Đại học Khoa học Tự nhiên, Đại học Quốc gia Hà NộiLast last time: gradient descent Consider the problem min f (x) x for f convex and differentiable, dom(f ) = Rn . Gradient descent: choose initial x (0) ∈ Rn , repeat: x (k) = x (k−1) − tk · ∇f (x (k−1) ), k = 1, 2, 3, . . . Step sizes tk chosen to be fixed and small, or by backtracking line search. If ∇f Lipschitz, gradient descent has convergence rate O(1/ε). Downsides: I Requires f differentiable — addressed this lecture. I Can be slow to converge — addressed next lecture. 1Subgradient method Now consider f convex, having dom(f ) = Rn , but not necessarily differentiable. Subgradient method: like gradient descent, but replacing gradients with subgradients, i.e., initialize x (0) , repeat: x (k) = x (k−1) − tk · g (k−1) , k = 1, 2, 3, . . . where g (k−1) ∈ ∂f (x (k−1) ) any subgradient of f at x (k−1) . 2Subgradient method Now consider f convex, having dom(f ) = Rn , but not necessarily differentiable. Subgradient method: like gradient descent, but replacing gradients with subgradients, i.e., initialize x (0) , repeat: x (k) = x (k−1) − tk · g (k−1) , k = 1, 2, 3, . . . where g (k−1) ∈ ∂f (x (k−1) ) any subgradient of f at x (k−1) . Subgradient method is not necessarily a descent method, so we (k) keep track of best iterate xbest among x (0) , ...x (k) so far, i.e., (k) f (xbest ) = min f (x (i) ). i=0,...,k 2Outline Today: I How to choose step sizes I Convergence analysis I Intersection of sets I Projected subgradient method 3Step size choices I Fixed step sizes: tk = t all k = 1, 2, 3, . . . I Fixed step length, i.e., tk = s/kg (k−1) k2 , and hence ktk g (k−1) k2 = s. I Diminishing step sizes: choose to meet conditions X∞ X∞ tk2 < ∞, tk = ∞, k=1 k=1 i.e., square summable but not summable. Important here that step sizes go to zero, but not too fast. There are several other options too, but key difference to gradient descent: step sizes are pre-specified, not adaptively computed. 4Convergence analysis Assume that f convex, dom(f ) = Rn , and also that f is Lipschitz continuous with constant L > 0, i.e., |f (x) − f (y )| ≤ L kx − y k2 for all x, y . Theorem For a fixed step size t, subgradient method satisfies (k) kx (0) − x ∗ k22 L2 t f (xbest ) − f ∗ ≤ + . 2kt 2 For fixed step length, i.e., tk = s/kg (k−1) k2 , we have (k) Lkx (0) − x ∗ k22 Ls f (xbest ) − f ∗ ≤ + . 2ks 2 For diminishing step sizes, subgradient method satisfies kx (0) − x ∗ k22 + L2 ki=1 ti2 P (k) ∗ f (xbest ) − f ≤ , 2 ki=1 ti P (k) i.e., lim f (xbest ) = f ∗ . 5 k→∞Lipschitz continuity Before the proof let consider the Lipschitz continuity assumption. Lemma f is Lipschitz continuous with constant L > 0, i.e., |f (x) − f (y )| ≤ L kx − y k2 for all x, y , is equivalent to kg k2 ≤ L for all x and g ∈ ∂f (x). Chứng minh. ⇐=: Choose subgradients gx and gy at x and y . We have gxT (x − y ) ≥ f (x) − f (y ) ≥ gyT (x − y ). Apply Cauchy-Schwarz inequality get Lkx − y k2 ≥ f (x) − f (y ) ≥ −Lkx − y k2 . 6Lipschitz continuity Before the proof let consider the Lipschitz continuity assumption. Lemma f is Lipschitz continuous with constant L > 0, i.e., |f (x) − f (y )| ≤ L kx − y k2 for all x, y , is equivalent to kg k2 ≤ L for all x and g ∈ ∂f (x). Chứng minh. =⇒: Assume kg k2 > L for some g ∈ ∂f (x). Take y = x + g /kg k2 we have ky − xk2 = 1 and f (y ) ≥ f (x) + g T (y − x) = f (x) + kg k2 > f (x) + L, contradiction. 7Convergence analysis - Proof Can prove both results from same basic inequality. Key steps: I Using definition of subgradient kx (k) − x ∗ k22 = kx (k−1) − tk g (k−1) − x ∗ k22 = kx (k−1) − x ∗ k22 − 2tk g (k−1) (x (k−1) − x ∗ ) + tk2 kg (k−1) k22 ≤ kx (k−1) − x ∗ k22 − 2tk (f (x (k−1) ) − f (x ∗ )) + tk2 kg (k−1) k22 . 8Convergence analysis - Proof Can prove both results from same basic inequality. Key steps: I Using definition of subgradient ...