Recurrent Neural Networks for Prediction, Chapter 10
Document information:
Convergence of Online Learning Algorithms in Neural Networks. An analysis of convergence of real-time algorithms for online learning in recurrent neural networks is presented. For convenience, the analysis is focused on the real-time recurrent learning (RTRL) algorithm for a recurrent perceptron. Using the assumption of contractivity of the activation function of a neuron and relaxing the rigid assumptions of the fixed optimal weights of the system, the analysis presented is general and is applicable to a wide range of existing algorithms. ...
Content extracted from the document:
Recurrent Neural Networks for Prediction
Authored by Danilo P. Mandic, Jonathon A. Chambers
Copyright © 2001 John Wiley & Sons Ltd
ISBNs: 0-471-49517-4 (Hardback); 0-470-84535-X (Electronic)

10 Convergence of Online Learning Algorithms in Neural Networks

10.1 Perspective

An analysis of convergence of real-time algorithms for online learning in recurrent neural networks is presented. For convenience, the analysis is focused on the real-time recurrent learning (RTRL) algorithm for a recurrent perceptron. Using the assumption of contractivity of the activation function of a neuron and relaxing the rigid assumptions of the fixed optimal weights of the system, the analysis presented is general and is applicable to a wide range of existing algorithms. It is shown that some of the results obtained for stochastic gradient algorithms for linear systems can be considered as a bound for stability of RNN-based algorithms, as long as the contractivity condition holds.

10.2 Introduction

The following criteria (Bershad et al. 1990) are most commonly used to assess the performance of adaptive algorithms.

1. Convergence (consistency of the statistics).
2. Transient behaviour (how quickly the algorithm reacts to changes in the statistics of the input).
3. Convergence rate (how quickly the algorithm approaches the optimal solution), which can be linear, quadratic or superlinear.

The standard approach for the analysis of convergence of learning algorithms for linear adaptive filters is to look at convergence of the mean weight error vector, convergence in the mean square and at the steady-state misadjustment (Gholkar 1990; Haykin 1996a; Kuan and Hornik 1991; Widrow and Stearns 1985).
The analysis of convergence of steepest-descent-based algorithms has been ongoing ever since their introduction (Guo and Ljung 1995; Ljung 1984; Slock 1993; Tarrab and Feuer 1988). Some of the recent results consider the exact expectation analysis of the LMS algorithm for linear adaptive filters (Douglas and Pan 1995) and the analysis of LMS with Gaussian inputs (Bershad 1986). For neural networks as nonlinear adaptive filters, the analysis is far more difficult, and researchers have often resorted to numerical experiments (Ahmad et al. 1990). Convergence of neural networks has been considered in Shynk and Roy (1990), Bershad et al. (1993a) and Bershad et al. (1993b), where the authors used the Gaussian model for input data and a Rosenblatt perceptron learning algorithm. These analyses, however, were undertaken for a hard limiter nonlinearity, which is not convenient for nonlinear adaptive filters. Convergence of RTRL was addressed in Mandic and Chambers (2000b) and Chambers et al. (2000).

An error equation for online training of a recurrent perceptron can be expressed as

    e(k) = s(k) - \Phi(u^T(k) w(k)),    (10.1)

where s(k) is the teaching (desired) signal, w(k) = [w_1(k), ..., w_N(k)]^T is the weight vector and u(k) = [u_1(k), ..., u_N(k)]^T is an input vector. A weight update equation for a general class of stochastic gradient-based nonlinear neural algorithms can be expressed as

    w(k+1) = w(k) + \eta(k) F(u(k)) g(u(k), w(k)),    (10.2)

where \eta(k) is the learning rate, F : R^N -> R^N usually consists of N copies of the scalar function f, and g(·) is a scalar function related to the error e(k). The function F is related to data nonlinearities, which have an influence on the convergence of the algorithm. The function g is related to error nonlinearities, and it affects the cost function to be minimised. Error nonlinearities are mostly chosen to be sign-preserving (Sethares 1992).
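A member of the class (10.1)-(10.2) can be sketched numerically by taking F as the identity and g(u, w) = e(k) Φ'(u^T w) u, i.e. the plain stochastic gradient update. The sketch below is ours, not the book's full RTRL derivation: the sensitivity of the fed-back output with respect to the weights is ignored, the teacher weights are held fixed, and all names (phi, online_update, the toy dimensions) are illustrative assumptions.

```python
import numpy as np

def phi(v):
    """Logistic activation -- a common choice that is contractive where its slope < 1."""
    return 1.0 / (1.0 + np.exp(-v))

def phi_prime(v):
    s = phi(v)
    return s * (1.0 - s)

def online_update(w, u, s_desired, eta):
    """One step of (10.1)-(10.2) with F = identity and the gradient term as g."""
    v = u @ w
    e = s_desired - phi(v)                    # error equation (10.1)
    w_next = w + eta * e * phi_prime(v) * u   # weight update (10.2)
    return w_next, e

# Toy run: adapt towards a fixed "teacher" recurrent perceptron on random inputs.
rng = np.random.default_rng(0)
N = 4
w_opt = rng.normal(size=N)      # fixed optimal weights of the hypothetical teacher
w = np.zeros(N)
errors = []
y_prev = y_prev_teacher = 0.0
for k in range(2000):
    x = rng.uniform(-1, 1, size=N - 1)
    u = np.r_[x, y_prev]                    # external inputs plus output feedback
    u_teacher = np.r_[x, y_prev_teacher]
    s = phi(u_teacher @ w_opt)              # noiseless teaching signal
    w, e = online_update(w, u, s, eta=0.5)
    y_prev = phi(u @ w)
    y_prev_teacher = s
    errors.append(e)

print(float(np.mean(np.square(errors[-200:]))))  # residual MSE after adaptation
```

With a small enough learning rate the squared error shrinks over the run, consistent with the qualitative picture the analysis formalises.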
Let us assume additive noise q(k) ~ N(0, \sigma_q^2) in the output of the system, which can be expressed as

    s(k) = \Phi(u^T(k) \tilde{w}(k)) + q(k),    (10.3)

where \tilde{w}(k) are the optimal filter weights and q(k) is an i.i.d. sequence. The error equation (10.1) now becomes

    e(k) = \Phi(u^T(k) \tilde{w}(k)) - \Phi(u^T(k) w(k)) + q(k).    (10.4)

To examine the stability of algorithm (10.2), researchers often resort to linearisation. For RTRL, F is an identity matrix and g is some nonlinear, sign-preserving function of the output error. A further assumption is that the learning rate \eta is sufficiently small to allow the algorithm to be linearised around its current point in the state space. From Lyapunov stability theory, the system

    z(k+1) = F(k, z(k))    (10.5)

can be analysed via its linearised version

    z(k+1) = A(k) z(k),    (10.6)

where A is the Jacobian of F. This is the Lyapunov indirect method and assumes that A(k) is bounded in the neighbourhood of the current point in the state space and that

    lim_{||z|| -> 0} max_k ||F(k, z) - A(k) z|| / ||z|| = 0,    (10.7)

which guarantees that the time variation in the nonlinear terms of the Taylor series expansion of (10.5) does not become arbitrarily large in time (Chambers et al. 2000). Results on Lyapunov stability for a class of nonlinear systems can be found in Wang and Michel (1994) and Tanaka ...
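The indirect method can be illustrated numerically: estimate the Jacobian A of a nonlinear state map at a fixed point, as in (10.6), and check that its spectral radius is below one for local asymptotic stability. The tanh recursion and weight matrix W below are a hypothetical toy example of ours, not a system from the text.

```python
import numpy as np

def numerical_jacobian(F, z, h=1e-6):
    """Central-difference estimate of the Jacobian A = dF/dz used in (10.6)."""
    n = z.size
    A = np.zeros((n, n))
    for j in range(n):
        dz = np.zeros(n)
        dz[j] = h
        A[:, j] = (F(z + dz) - F(z - dz)) / (2 * h)
    return A

# Hypothetical contractive state map: z(k+1) = tanh(W z(k)).
W = np.array([[0.3, 0.2],
              [0.1, 0.4]])

def F(z):
    return np.tanh(W @ z)

z_star = np.zeros(2)                  # fixed point, since F(0) = 0
A = numerical_jacobian(F, z_star)     # here A equals W, because tanh'(0) = 1
rho = max(abs(np.linalg.eigvals(A)))  # spectral radius of the linearised system
print(rho)                            # rho < 1 => locally asymptotically stable
```

The condition (10.7) holds here because tanh is smooth, so the linearisation is a faithful local model; for the time-varying A(k) of an adaptive algorithm the boundedness assumption must be checked as well.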