
Recurrent Neural Networks for Prediction - Part 4

Pages: 22      File type: pdf      Size: 1,023.54 KB


Document information:

Activation Functions Used in Neural Networks. Perspective: The choice of nonlinear activation function has a key influence on the complexity and performance of artificial neural networks; note that the term neural network will be used interchangeably with the term artificial neural network. The brief introduction to activation functions given in Chapter 3 is therefore extended. Although sigmoidal nonlinear activation functions are the most common choice, there is no strong a priori justification why models based on such functions should be preferred to others...
Content extracted from the document:
Recurrent Neural Networks for Prediction
Authored by Danilo P. Mandic, Jonathon A. Chambers
Copyright © 2001 John Wiley & Sons Ltd
ISBNs: 0-471-49517-4 (Hardback); 0-470-84535-X (Electronic)

4 Activation Functions Used in Neural Networks

4.1 Perspective

The choice of nonlinear activation function has a key influence on the complexity and performance of artificial neural networks; note that the term neural network will be used interchangeably with the term artificial neural network. The brief introduction to activation functions given in Chapter 3 is therefore extended. Although sigmoidal nonlinear activation functions are the most common choice, there is no strong a priori justification why models based on such functions should be preferred to others.

We therefore introduce neural networks as universal approximators of functions and trajectories, based upon the Kolmogorov universal approximation theorem, which is valid for both feedforward and recurrent neural networks. From these universal approximation properties, we then demonstrate the need for a sigmoidal activation function within a neuron. To reduce computational complexity, approximations to sigmoid functions are further discussed. The use of nonlinear activation functions suitable for hardware realisation of neural networks is also considered.

For rigour, we extend the analysis to complex activation functions and recognise that a suitable complex activation function is a Möbius transformation. In that context, a framework for rigorous analysis of some inherent properties of neural networks, such as fixed points, nesting and invertibility, based upon the theory of modular groups of Möbius transformations is provided.

All the relevant definitions, theorems and other mathematical terms are given in Appendix B and Appendix C.

4.2 Introduction

A century ago, a set of 23 (originally) unsolved problems in mathematics was proposed by David Hilbert (Hilbert 1901–1902). In his lecture 'Mathematische Probleme' at the second International Congress of Mathematics held in Paris in 1900, he presented 10 of them. These problems were designed to serve as examples for the kinds of problems whose solutions would lead to further development of disciplines in mathematics. His 13th problem concerned solutions of polynomial equations. Although his original formulation dealt with properties of the solution of the seventh degree algebraic equation,[1] this problem can be restated as: Prove that there are continuous functions of n variables, not representable by a superposition of continuous functions of (n − 1) variables. In other words, could a general algebraic equation of a high degree be expressed by sums and compositions of single-variable functions?[2] In 1957, Kolmogorov showed that the conjecture of Hilbert was not correct (Kolmogorov 1957).

Kolmogorov's theorem is a general representation theorem stating that any real-valued continuous function f defined on an n-dimensional cube I^n (n ≥ 2) can be represented as

    f(x_1, \ldots, x_n) = \sum_{q=1}^{2n+1} \Phi_q \biggl( \sum_{p=1}^{n} \psi_{pq}(x_p) \biggr),    (4.1)

where Φ_q(·), q = 1, ..., 2n + 1, and ψ_{pq}(·), p = 1, ..., n, q = 1, ..., 2n + 1, are typically nonlinear continuous functions of one variable.

For a neural network representation, this means that an activation function of a neuron has to be nonlinear to form a universal approximator. This also means that every continuous function of many variables can be represented by a four-layered neural network with two hidden layers and an input and output layer, whose hidden units represent the mappings Φ and ψ. However, this does not mean that a network with two hidden layers necessarily provides an accurate representation of the function f. In fact, the functions ψ_{pq} of Kolmogorov's theorem are quite often highly nonsmooth, whereas for a neural network we want smooth nonlinear activation functions, as is required by gradient-descent learning algorithms (Poggio and Girosi 1990). Vitushkin (1954) showed that there are functions of more than one variable which do not have a representation by superpositions of differentiable functions (Beiu 1998). Important questions about Kolmogorov's representation are therefore existence, constructive proofs and bounds on the size of a network needed for approximation.

Kolmogorov's representation has been improved by several authors. Sprecher (1965) replaced the functions ψ_{pq} in the Kolmogorov representation by λ^{pq} ψ_q, where λ is a constant and the ψ_q are monotonic increasing functions which belong to the class of Lipschitz functions. Lorentz (1976) showed that the functions Φ_q can be replaced by only one function Φ. Hecht-Nielsen reformulated this result for MLPs so that they are able to approximate any function. In this case, the functions ψ are the nonlinear activation functions in the hidden layers, whereas the functions Φ are the nonlinear activation functions in the output layer. The functions Φ and ψ are found, however, to be generally highly nonsmooth. Further, in Katsuura and Sprecher (1994), the function ψ is obtained through a graph that is the limit point of an iterated composition of contraction mappings on their domain.

In applications of neural networks for universal approximation, the existence proof for approximation by neural networks is provided by Kolmogorov's theorem, which ...

[1] Hilbert conjectured that the roots of the equation x^7 + ax^3 + b ...
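As an informal illustration of the structure suggested by equation (4.1), namely an inner layer of ψ_{pq} units, a summation over p, an outer layer of Φ_q units, and a final sum over q, the following Python sketch evaluates such a composition with logistic sigmoid units standing in for ψ and Φ. This is a minimal sketch under stated assumptions: the function names and the random, untrained parameters are illustrative choices rather than the book's construction, and the sigmoid is used only as the smooth surrogate that the chapter argues gradient-descent learning requires (Kolmogorov's actual functions are generally nonsmooth).

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid, the smooth nonlinearity discussed in this chapter."""
    return 1.0 / (1.0 + np.exp(-x))

def kolmogorov_style_net(x, inner_params, outer_params):
    """Evaluate f(x) ~ sum_{q=1}^{2n+1} Phi_q( sum_{p=1}^{n} psi_{pq}(x_p) ).

    Illustrative only: psi_{pq} and Phi_q are modelled here by sigmoidal
    units with random, untrained parameters; this is not Kolmogorov's
    construction, just the layered structure the theorem suggests.
    """
    inner_w, inner_b = inner_params        # shapes (2n+1, n) and (2n+1, n)
    outer_w, outer_b = outer_params        # shapes (2n+1,)   and (2n+1,)

    # Inner hidden layer: one sigmoidal unit per (p, q) pair, i.e. psi_{pq}(x_p).
    psi = sigmoid(inner_w * x[np.newaxis, :] + inner_b)     # (2n+1, n)

    # Sum over p inside each outer unit, then apply the outer nonlinearity Phi_q.
    phi = sigmoid(outer_w * psi.sum(axis=1) + outer_b)      # (2n+1,)

    # Output layer: sum over q = 1, ..., 2n+1.
    return phi.sum()

# Example with n = 3 inputs and randomly drawn (hypothetical) parameters.
rng = np.random.default_rng(seed=0)
n = 3
inner_params = (rng.standard_normal((2 * n + 1, n)),
                rng.standard_normal((2 * n + 1, n)))
outer_params = (rng.standard_normal(2 * n + 1),
                rng.standard_normal(2 * n + 1))

x = np.array([0.2, -0.5, 0.7])
print(kolmogorov_style_net(x, inner_params, outer_params))
```

Viewed this way, the two hidden layers of the sketch correspond to the mappings ψ and Φ of the representation, which is the sense in which a four-layered network (input, two hidden layers, output) can represent a continuous function of many variables.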
