
Bài giảng Tối ưu hóa nâng cao: Chương 8 - Hoàng Nam Dũng

Pages: 50 · File type: PDF · Size: 1.60 MB

Document information:

The lecture notes "Advanced Optimization - Chapter 8: Proximal gradient descent (and acceleration)" cover the following topics: the subgradient method, decomposable functions, the proximal mapping, proximal gradient descent, and more.

Content extracted from the document:
Proximal Gradient Descent (and Acceleration)

Hoàng Nam Dũng
Faculty of Mathematics - Mechanics - Informatics, VNU University of Science, Vietnam National University, Hanoi

Last time: subgradient method

Consider the problem

    \min_x f(x)

with f convex and \mathrm{dom}(f) = \mathbb{R}^n.

Subgradient method: choose an initial x^{(0)} \in \mathbb{R}^n and repeat

    x^{(k)} = x^{(k-1)} - t_k \cdot g^{(k-1)}, \quad k = 1, 2, 3, \ldots

where g^{(k-1)} \in \partial f(x^{(k-1)}). We use pre-set rules for the step sizes (e.g., the diminishing step size rule).

If f is Lipschitz, then the subgradient method has convergence rate O(1/\epsilon^2).

Upside: very generic. Downside: can be slow; that is addressed today.

Outline

Today:
- Proximal gradient descent
- Convergence analysis
- ISTA, matrix completion
- Special cases
- Acceleration

Decomposable functions

Suppose f(x) = g(x) + h(x), where
- g is convex and differentiable, with \mathrm{dom}(g) = \mathbb{R}^n,
- h is convex, not necessarily differentiable.

If f were differentiable, the gradient descent update would be

    x^+ = x - t \cdot \nabla f(x).

Recall the motivation: minimize the quadratic approximation to f around x, replacing \nabla^2 f(x) by \frac{1}{t} I:

    x^+ = \mathrm{argmin}_z \; \underbrace{f(x) + \nabla f(x)^T (z - x) + \frac{1}{2t} \|z - x\|_2^2}_{\tilde f_t(z)}.

In our case f is not differentiable, but f = g + h with g differentiable. Why don't we make a quadratic approximation to g and leave h alone? I.e., update

    x^+ = \mathrm{argmin}_z \; \tilde g_t(z) + h(z)
        = \mathrm{argmin}_z \; g(x) + \nabla g(x)^T (z - x) + \frac{1}{2t} \|z - x\|_2^2 + h(z)
        = \mathrm{argmin}_z \; \frac{1}{2t} \|z - (x - t \nabla g(x))\|_2^2 + h(z).

The term \frac{1}{2t} \|z - (x - t \nabla g(x))\|_2^2 keeps z close to the gradient update for g, while the term h(z) also makes h small.

Proximal mapping

The proximal mapping (or prox-operator) of a convex function h is defined as

    \mathrm{prox}_h(x) = \mathrm{argmin}_z \; \frac{1}{2} \|x - z\|_2^2 + h(z).

Examples:
- h(x) = 0: \mathrm{prox}_h(x) = x.
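The update above can be sketched in code: a minimal proximal gradient iteration, written here in Python with NumPy. This is an illustrative sketch, not from the lecture; the function names (`proximal_gradient`, `prox_zero`) and the quadratic test problem are our own, and the demo uses h = 0, whose prox is the identity, so the iteration reduces to plain gradient descent.

```python
import numpy as np

def proximal_gradient(grad_g, prox_h, x0, t, iters=200):
    """Proximal gradient descent: x+ = prox_{t h}(x - t * grad g(x))."""
    x = x0
    for _ in range(iters):
        x = prox_h(x - t * grad_g(x), t)
    return x

# Illustrative problem: g(x) = 0.5 * ||A x - b||_2^2, h = 0.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
b = np.array([2.0, 3.0])

grad_g = lambda x: A.T @ (A @ x - b)   # gradient of g
prox_zero = lambda x, t: x             # prox of h = 0 is the identity

# Step size t = 0.2 < 1/L with L = 4 (largest eigenvalue of A^T A).
x_star = proximal_gradient(grad_g, prox_zero, np.zeros(2), t=0.2)
# Converges to the least-squares solution x = (1, 3).
```

With a nonzero h one only swaps `prox_zero` for the corresponding prox-operator; the gradient step on g is unchanged.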
- h(x) is the indicator function of a closed convex set C: \mathrm{prox}_h is the projection onto C,

    \mathrm{prox}_h(x) = \mathrm{argmin}_{z \in C} \; \frac{1}{2} \|x - z\|_2^2 = P_C(x).

- h(x) = \|x\|_1: \mathrm{prox}_h is the 'soft-threshold' (shrinkage) operation

    \mathrm{prox}_h(x)_i = \begin{cases} x_i - 1 & x_i \ge 1 \\ 0 & |x_i| \le 1 \\ x_i + 1 & x_i \le -1. \end{cases}

Theorem. If h is convex and closed (has a closed epigraph), then

    \mathrm{prox}_h(x) = \mathrm{argmin}_z \; \frac{1}{2} \|x - z\|_2^2 + h(z)

exists and is unique for all x.

Proof. See http://www.seas.ucla.edu/~vandenbe/236C/lectures/proxop.pdf. Uniqueness holds since the objective function is strictly convex.

Optimality condition:

    z = \mathrm{prox}_h(x) \;\Leftrightarrow\; x - z \in \partial h(z) \;\Leftrightarrow\; h(u) \ge h(z) + (x - z)^T (u - z) \quad \forall u.

Properties of the proximal mapping

Theorem. Proximal mappings are firmly nonexpansive (co-coercive with constant 1):

    (\mathrm{prox}_h(x) - \mathrm{prox}_h(y))^T (x - y) \ge \|\mathrm{prox}_h(x) - \mathrm{prox}_h(y)\|_2^2.
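The soft-threshold formula for the prox of the l1-norm (with threshold 1), and the firm nonexpansiveness property, can both be checked numerically. A minimal sketch assuming NumPy; the name `soft_threshold` is our own:

```python
import numpy as np

def soft_threshold(x, tau=1.0):
    """Prox of tau * ||.||_1: shrink each entry toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

# The three cases of the formula (tau = 1):
# x_i >= 1 -> x_i - 1,  |x_i| <= 1 -> 0,  x_i <= -1 -> x_i + 1
shrunk = soft_threshold(np.array([2.0, 0.5, -3.0]))  # entries 1.0, 0.0, -2.0

# Numerical check of firm nonexpansiveness:
# (prox(x) - prox(y))^T (x - y) >= ||prox(x) - prox(y)||_2^2
rng = np.random.default_rng(0)
x, y = rng.standard_normal(8), rng.standard_normal(8)
px, py = soft_threshold(x), soft_threshold(y)
assert (px - py) @ (x - y) >= (px - py) @ (px - py) - 1e-12
```

A random check is of course not a proof, but it is a quick way to catch a sign error when implementing a prox-operator.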
