
Recurrent Neural Networks for Prediction P9



Document information:

A Class of Normalised Algorithms for Online Training of Recurrent Neural Networks
A normalised version of the real-time recurrent learning (RTRL) algorithm is introduced. This has been achieved via local linearisation of the RTRL around the current point in the state space of the network. Such an algorithm provides an adaptive learning rate normalised by the L2 norm of the gradient vector at the output neuron. The analysis is general and also covers simpler cases of feedforward networks and linear FIR filters.
Content extracted from the document:
Recurrent Neural Networks for Prediction
Authored by Danilo P. Mandic, Jonathon A. Chambers
Copyright (c) 2001 John Wiley & Sons Ltd
ISBNs: 0-471-49517-4 (Hardback); 0-470-84535-X (Electronic)

9 A Class of Normalised Algorithms for Online Training of Recurrent Neural Networks

9.1 Perspective

A normalised version of the real-time recurrent learning (RTRL) algorithm is introduced. This has been achieved via local linearisation of the RTRL around the current point in the state space of the network. Such an algorithm provides an adaptive learning rate normalised by the L2 norm of the gradient vector at the output neuron. The analysis is general and also covers simpler cases of feedforward networks and linear FIR filters.

9.2 Introduction

Gradient-descent-based algorithms for training neural networks, such as the backpropagation, backpropagation through time, recurrent backpropagation (RBP) and real-time recurrent learning (RTRL) algorithms, typically suffer from slow convergence when dealing with statistically nonstationary inputs. In the area of linear adaptive filters, similar problems with the LMS algorithm have been addressed by utilising normalised algorithms, such as NLMS. We therefore introduce a normalised RTRL-based learning algorithm with the idea to impose similar stabilisation and convergence effects on the training of RNNs as normalisation imposes on the LMS algorithm.

In the area of linear FIR adaptive filters, it is shown (Soria-Olivas et al. 1998) that a normalised gradient-descent-based learning algorithm can be derived starting from the Taylor series expansion of the instantaneous output error of an adaptive FIR filter, given by

    e(k+1) = e(k) + \sum_{i=1}^{N} \frac{\partial e(k)}{\partial w_i(k)} \Delta w_i(k) + \frac{1}{2!} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{\partial^2 e(k)}{\partial w_i(k)\,\partial w_j(k)} \Delta w_i(k) \Delta w_j(k) + \cdots .    (9.1)

From the mathematical description of LMS [1] from Chapter 2, we have

    \frac{\partial e(k)}{\partial w_i(k)} = -x(k-i+1),  i = 1, 2, \dots, N,    (9.2)

and

    \Delta w_i(k) = \mu(k) e(k) x(k-i+1),  i = 1, 2, \dots, N.    (9.3)

Due to the linearity of the FIR filter, the second- and higher-order partial derivatives in (9.1) vanish. Combining (9.1)-(9.3) yields

    e(k+1) = e(k) - \mu(k) e(k) \|x(k)\|_2^2    (9.4)

for which the nontrivial solution gives the learning rate of a normalised LMS algorithm

    \mu_{\mathrm{NLMS}}(k) = \frac{1}{\|x(k)\|_2^2}.    (9.5)

The stability analysis of adaptive algorithms can be undertaken using contractive operators and fixed point iteration. For the contractive operator T, it follows that

    \|T z_1 - T z_2\| \leq \gamma \|z_1 - z_2\|,  0 \leq \gamma < 1,  z_1, z_2 \in \mathbb{R}^N.    (9.6)

The convergence analysis of LMS, for instance, can be undertaken starting from the misalignment [2] vector v(k) = w(k) - \tilde{w}(k) by setting z_1 = v(k+1), z_2 = v(0) and T = [I - \mu(k) x(k) x^T(k)] (Gholkar 1990). Detailed convergence analysis for a class of gradient-based learning algorithms for recurrent neural networks is given in Chapter 10.

9.3 Overview

A class of normalised gradient-based algorithms is derived, starting from the LMS algorithm for linear adaptive filters through to a normalised algorithm for training recurrent neural networks. For each case the adaptive learning rate has been derived. Stability of such algorithms is addressed in Chapter 10. The normalised algorithms are shown to outperform standard algorithms with a fixed learning rate.

[1] The two core equations for adaptation of the LMS algorithm are e(k) = d(k) - x^T(k) w(k) and w(k+1) = w(k) + \mu(k) e(k) x(k).
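To make the result of (9.1)-(9.5) concrete, the sketch below applies the normalised learning rate to one-step-ahead prediction with a linear FIR filter, combining the LMS update of footnote [1] with \mu_{\mathrm{NLMS}}(k) from (9.5). This is a minimal illustration, not code from the book; the function name nlms_predict, the default filter order N and the small constant eps added to \|x(k)\|_2^2 to guard against division by zero are assumptions made only for this example.

```python
import numpy as np

def nlms_predict(x, N=4, eps=1e-8):
    """One-step-ahead prediction of a signal x with an N-tap FIR filter
    trained online by LMS with the normalised learning rate of (9.5).
    eps is a small safeguard against division by zero for near-silent
    inputs; it is not part of (9.5)."""
    x = np.asarray(x, dtype=float)
    w = np.zeros(N)                      # filter weights w(k)
    e = np.zeros(len(x))                 # instantaneous output errors e(k)
    for k in range(N, len(x)):
        x_k = x[k - 1::-1][:N]           # tap-input vector [x(k-1), ..., x(k-N)]
        d_k = x[k]                       # desired response: the next sample
        e[k] = d_k - w @ x_k             # e(k) = d(k) - x^T(k) w(k)
        mu_k = 1.0 / (eps + x_k @ x_k)   # mu_NLMS(k) = 1 / ||x(k)||_2^2
        w = w + mu_k * e[k] * x_k        # w(k+1) = w(k) + mu(k) e(k) x(k)
    return w, e
```

Replacing mu_k with a fixed constant in the same loop recovers the standard LMS algorithm, which is the baseline the normalised algorithms are compared against.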
[2] The misalignment vector is defined as v(k) = w(k) - \tilde{w}(k), where \tilde{w}(k) is the set of optimal weights of the system.

[Figure: averaged squared prediction error in dB for the LMS, NLMS, NGD and NNGD algorithms.]
...
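As a side note to the convergence discussion above, the contraction property (9.6) can be checked numerically for the operator T = [I - \mu(k) x(k) x^T(k)] acting on misalignment vectors. The snippet below is a sketch under assumptions made only for illustration (the dimension N, the random seed and the random test vectors are arbitrary); it is not an analysis from the book.

```python
import numpy as np

# Numerical check of the contraction property (9.6) for
# T = I - mu(k) x(k) x^T(k) with the NLMS learning rate of (9.5).
rng = np.random.default_rng(0)
N = 4

x = rng.standard_normal(N)              # a sample tap-input vector x(k)
mu = 1.0 / (x @ x)                      # mu_NLMS(k) = 1 / ||x(k)||_2^2
T = np.eye(N) - mu * np.outer(x, x)     # T = I - mu(k) x(k) x^T(k)

ratios = []
for _ in range(1000):
    z1 = rng.standard_normal(N)         # two arbitrary misalignment-type vectors
    z2 = rng.standard_normal(N)
    ratios.append(np.linalg.norm(T @ z1 - T @ z2) / np.linalg.norm(z1 - z2))

# With mu(k) = 1/||x(k)||_2^2, T is an orthogonal projector, so the ratio
# stays at or below 1 (up to floating-point rounding): the map is
# non-expansive, and strictly contractive along the direction of x(k).
print(max(ratios))
```

A single rank-one update cannot contract directions orthogonal to x(k); the detailed convergence analysis for this class of algorithms is given in Chapter 10, as noted above.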
