Bài giảng Tối ưu hóa nâng cao: Chương 7 - Hoàng Nam Dũng
Số trang: 34
Loại file: pdf
Dung lượng: 543.20 KB
Lượt xem: 9
Lượt tải: 0
Xem trước 4 trang đầu tiên của tài liệu này:
Thông tin tài liệu:
Bài giảng "Tối ưu hóa nâng cao - Chương 7: Subgradient method" cung cấp cho người học các kiến thức: Last last time - gradient descent, subgradient method, step size choices, convergence analysis, lipschitz continuity, convergence analysis - Proof,... Mời các bạn cùng tham khảo.
Nội dung trích xuất từ tài liệu:
Bài giảng Tối ưu hóa nâng cao: Chương 7 - Hoàng Nam DũngSubgradient MethodHoàng Nam DũngKhoa Toán - Cơ - Tin học, Đại học Khoa học Tự nhiên, Đại học Quốc gia Hà NộiLast last time: gradient descent Consider the problem min f (x) x for f convex and differentiable, dom(f ) = Rn . Gradient descent: choose initial x (0) ∈ Rn , repeat: x (k) = x (k−1) − tk · ∇f (x (k−1) ), k = 1, 2, 3, . . . Step sizes tk chosen to be fixed and small, or by backtracking line search. If ∇f Lipschitz, gradient descent has convergence rate O(1/ε). Downsides: I Requires f differentiable — addressed this lecture. I Can be slow to converge — addressed next lecture. 1Subgradient method Now consider f convex, having dom(f ) = Rn , but not necessarily differentiable. Subgradient method: like gradient descent, but replacing gradients with subgradients, i.e., initialize x (0) , repeat: x (k) = x (k−1) − tk · g (k−1) , k = 1, 2, 3, . . . where g (k−1) ∈ ∂f (x (k−1) ) any subgradient of f at x (k−1) . 2Subgradient method Now consider f convex, having dom(f ) = Rn , but not necessarily differentiable. Subgradient method: like gradient descent, but replacing gradients with subgradients, i.e., initialize x (0) , repeat: x (k) = x (k−1) − tk · g (k−1) , k = 1, 2, 3, . . . where g (k−1) ∈ ∂f (x (k−1) ) any subgradient of f at x (k−1) . Subgradient method is not necessarily a descent method, so we (k) keep track of best iterate xbest among x (0) , ...x (k) so far, i.e., (k) f (xbest ) = min f (x (i) ). i=0,...,k 2Outline Today: I How to choose step sizes I Convergence analysis I Intersection of sets I Projected subgradient method 3Step size choices I Fixed step sizes: tk = t all k = 1, 2, 3, . . . I Fixed step length, i.e., tk = s/kg (k−1) k2 , and hence ktk g (k−1) k2 = s. I Diminishing step sizes: choose to meet conditions X∞ X∞ tk2 < ∞, tk = ∞, k=1 k=1 i.e., square summable but not summable. Important here that step sizes go to zero, but not too fast. There are several other options too, but key difference to gradient descent: step sizes are pre-specified, not adaptively computed. 4Convergence analysis Assume that f convex, dom(f ) = Rn , and also that f is Lipschitz continuous with constant L > 0, i.e., |f (x) − f (y )| ≤ L kx − y k2 for all x, y . Theorem For a fixed step size t, subgradient method satisfies (k) kx (0) − x ∗ k22 L2 t f (xbest ) − f ∗ ≤ + . 2kt 2 For fixed step length, i.e., tk = s/kg (k−1) k2 , we have (k) Lkx (0) − x ∗ k22 Ls f (xbest ) − f ∗ ≤ + . 2ks 2 For diminishing step sizes, subgradient method satisfies kx (0) − x ∗ k22 + L2 ki=1 ti2 P (k) ∗ f (xbest ) − f ≤ , 2 ki=1 ti P (k) i.e., lim f (xbest ) = f ∗ . 5 k→∞Lipschitz continuity Before the proof let consider the Lipschitz continuity assumption. Lemma f is Lipschitz continuous with constant L > 0, i.e., |f (x) − f (y )| ≤ L kx − y k2 for all x, y , is equivalent to kg k2 ≤ L for all x and g ∈ ∂f (x). Chứng minh. ⇐=: Choose subgradients gx and gy at x and y . We have gxT (x − y ) ≥ f (x) − f (y ) ≥ gyT (x − y ). Apply Cauchy-Schwarz inequality get Lkx − y k2 ≥ f (x) − f (y ) ≥ −Lkx − y k2 . 6Lipschitz continuity Before the proof let consider the Lipschitz continuity assumption. Lemma f is Lipschitz continuous with constant L > 0, i.e., |f (x) − f (y )| ≤ L kx − y k2 for all x, y , is equivalent to kg k2 ≤ L for all x and g ∈ ∂f (x). Chứng minh. =⇒: Assume kg k2 > L for some g ∈ ∂f (x). Take y = x + g /kg k2 we have ky − xk2 = 1 and f (y ) ≥ f (x) + g T (y − x) = f (x) + kg k2 > f (x) + L, contradiction. 7Convergence analysis - Proof Can prove both results from same basic inequality. Key steps: I Using definition of subgradient kx (k) − x ∗ k22 = kx (k−1) − tk g (k−1) − x ∗ k22 = kx (k−1) − x ∗ k22 − 2tk g (k−1) (x (k−1) − x ∗ ) + tk2 kg (k−1) k22 ≤ kx (k−1) − x ∗ k22 − 2tk (f (x (k−1) ) − f (x ∗ )) + tk2 kg (k−1) k22 . 8Convergence analysis - Proof Can prove both results from same basic inequality. Key steps: I Using definition of subgradient ...
Nội dung trích xuất từ tài liệu:
Bài giảng Tối ưu hóa nâng cao: Chương 7 - Hoàng Nam DũngSubgradient MethodHoàng Nam DũngKhoa Toán - Cơ - Tin học, Đại học Khoa học Tự nhiên, Đại học Quốc gia Hà NộiLast last time: gradient descent Consider the problem min f (x) x for f convex and differentiable, dom(f ) = Rn . Gradient descent: choose initial x (0) ∈ Rn , repeat: x (k) = x (k−1) − tk · ∇f (x (k−1) ), k = 1, 2, 3, . . . Step sizes tk chosen to be fixed and small, or by backtracking line search. If ∇f Lipschitz, gradient descent has convergence rate O(1/ε). Downsides: I Requires f differentiable — addressed this lecture. I Can be slow to converge — addressed next lecture. 1Subgradient method Now consider f convex, having dom(f ) = Rn , but not necessarily differentiable. Subgradient method: like gradient descent, but replacing gradients with subgradients, i.e., initialize x (0) , repeat: x (k) = x (k−1) − tk · g (k−1) , k = 1, 2, 3, . . . where g (k−1) ∈ ∂f (x (k−1) ) any subgradient of f at x (k−1) . 2Subgradient method Now consider f convex, having dom(f ) = Rn , but not necessarily differentiable. Subgradient method: like gradient descent, but replacing gradients with subgradients, i.e., initialize x (0) , repeat: x (k) = x (k−1) − tk · g (k−1) , k = 1, 2, 3, . . . where g (k−1) ∈ ∂f (x (k−1) ) any subgradient of f at x (k−1) . Subgradient method is not necessarily a descent method, so we (k) keep track of best iterate xbest among x (0) , ...x (k) so far, i.e., (k) f (xbest ) = min f (x (i) ). i=0,...,k 2Outline Today: I How to choose step sizes I Convergence analysis I Intersection of sets I Projected subgradient method 3Step size choices I Fixed step sizes: tk = t all k = 1, 2, 3, . . . I Fixed step length, i.e., tk = s/kg (k−1) k2 , and hence ktk g (k−1) k2 = s. I Diminishing step sizes: choose to meet conditions X∞ X∞ tk2 < ∞, tk = ∞, k=1 k=1 i.e., square summable but not summable. Important here that step sizes go to zero, but not too fast. There are several other options too, but key difference to gradient descent: step sizes are pre-specified, not adaptively computed. 4Convergence analysis Assume that f convex, dom(f ) = Rn , and also that f is Lipschitz continuous with constant L > 0, i.e., |f (x) − f (y )| ≤ L kx − y k2 for all x, y . Theorem For a fixed step size t, subgradient method satisfies (k) kx (0) − x ∗ k22 L2 t f (xbest ) − f ∗ ≤ + . 2kt 2 For fixed step length, i.e., tk = s/kg (k−1) k2 , we have (k) Lkx (0) − x ∗ k22 Ls f (xbest ) − f ∗ ≤ + . 2ks 2 For diminishing step sizes, subgradient method satisfies kx (0) − x ∗ k22 + L2 ki=1 ti2 P (k) ∗ f (xbest ) − f ≤ , 2 ki=1 ti P (k) i.e., lim f (xbest ) = f ∗ . 5 k→∞Lipschitz continuity Before the proof let consider the Lipschitz continuity assumption. Lemma f is Lipschitz continuous with constant L > 0, i.e., |f (x) − f (y )| ≤ L kx − y k2 for all x, y , is equivalent to kg k2 ≤ L for all x and g ∈ ∂f (x). Chứng minh. ⇐=: Choose subgradients gx and gy at x and y . We have gxT (x − y ) ≥ f (x) − f (y ) ≥ gyT (x − y ). Apply Cauchy-Schwarz inequality get Lkx − y k2 ≥ f (x) − f (y ) ≥ −Lkx − y k2 . 6Lipschitz continuity Before the proof let consider the Lipschitz continuity assumption. Lemma f is Lipschitz continuous with constant L > 0, i.e., |f (x) − f (y )| ≤ L kx − y k2 for all x, y , is equivalent to kg k2 ≤ L for all x and g ∈ ∂f (x). Chứng minh. =⇒: Assume kg k2 > L for some g ∈ ∂f (x). Take y = x + g /kg k2 we have ky − xk2 = 1 and f (y ) ≥ f (x) + g T (y − x) = f (x) + kg k2 > f (x) + L, contradiction. 7Convergence analysis - Proof Can prove both results from same basic inequality. Key steps: I Using definition of subgradient kx (k) − x ∗ k22 = kx (k−1) − tk g (k−1) − x ∗ k22 = kx (k−1) − x ∗ k22 − 2tk g (k−1) (x (k−1) − x ∗ ) + tk2 kg (k−1) k22 ≤ kx (k−1) − x ∗ k22 − 2tk (f (x (k−1) ) − f (x ∗ )) + tk2 kg (k−1) k22 . 8Convergence analysis - Proof Can prove both results from same basic inequality. Key steps: I Using definition of subgradient ...
Tìm kiếm theo từ khóa liên quan:
Bài giảng Tối ưu hóa nâng cao Tối ưu hóa nâng cao Tối ưu hóa Subgradient method Gradient descent Lipschitz continuity Convergence analysisTài liệu liên quan:
-
Tóm tắt luận án tiến sỹ Một số vấn đề tối ưu hóa và nâng cao hiệu quả trong xử lý thông tin hình ảnh
28 trang 225 0 0 -
BÀI TẬP TỔNG HỢP - QUY HOẠCH TUYẾN TÍNH
3 trang 68 0 0 -
Giáo trình Tối ưu hóa - PGS.TS. Nguyễn Hải Thanh
187 trang 45 0 0 -
Tổng hợp bài tập Tối ưu hoá: Phần 2
152 trang 36 0 0 -
Giáo trình tối ưu hóa - Chương 5
31 trang 35 0 0 -
Tổng hợp bài tập Tối ưu hoá: Phần 1
177 trang 29 0 0 -
Công nghiệp thực phẩm và quá trình tối ưu hóa: Phần 1
174 trang 29 0 0 -
Bài giảng Lý thuyết tối ưu - Phan Lê Na
181 trang 29 0 0 -
7 trang 29 0 0
-
Giáo trình tối ưu hóa - Chương 2
28 trang 28 0 0