Recurrent Neural Networks

Bùi Quốc Khánh
Hanoi University

Summary: The main idea of a recurrent neural network (RNN) is to make use of sequential information. In traditional neural networks, all inputs and outputs are independent of one another; that is, they are not linked into a sequence. Such models are unsuitable for many problems. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the preceding computations. In other words, RNNs can remember previously computed information. Recently, LSTM networks have attracted attention and come into fairly wide use. The LSTM model is fundamentally no different from the traditional RNN model, but it uses a different function to compute the hidden states. As a result, relationships between words that are far apart can be captured very effectively. Applications of LSTM will be presented in a subsequent paper.

Abstract: One major assumption for Neural Networks (NNs), and in fact for many other machine learning models, is the independence among data samples. However, this assumption does not hold for data that is sequential in nature. One mechanism to account for sequential dependency is to concatenate a fixed number of consecutive data samples and treat them as one data point, much like moving a fixed-size sliding window over a data stream. Recurrent Neural Networks (RNNs) process the input sequence one element at a time and maintain a hidden state vector that acts as a memory for past information. They learn to selectively retain relevant information, allowing them to capture dependencies across several time steps and to use both the current input and past information when making future predictions.

Keywords: Neural Networks, Recurrent Neural Networks, Sequential Data.

RECURRENT NEURAL NETWORK

I. MOTIVATION FOR RECURRENT NEURAL NETWORKS

Before studying RNNs, it is worthwhile to understand why RNNs are needed and where NNs fall short in modeling sequential data. One major assumption for NNs, and in fact for many other machine learning models, is the independence among data samples. However, this assumption does not hold for data that is sequential in nature. Speech, language, time series, video, etc. all exhibit dependence between individual elements across time. NNs treat each data sample individually and thereby lose the benefit that can be derived by exploiting this sequential information.

One mechanism to account for sequential dependency is to concatenate a fixed number of consecutive data samples and treat them as one data point, similar to moving a fixed-size sliding window over a data stream. This approach was used in the work of [13] for time series prediction using NNs, and in that of [14] for acoustic modeling. But as mentioned by [13], the success of this approach depends on finding the optimal window size: a small window does not capture the longer dependencies, whereas a window larger than needed adds unnecessary noise. More importantly, if the data contain long-range dependencies spanning hundreds of time steps, a window-based method does not scale. Another disadvantage of conventional NNs is that they cannot handle variable-length sequences. In many domains, such as speech modeling and language translation, the input sequences vary in length.

A hidden Markov model (HMM) [15] can model sequential data without requiring a fixed-size window. HMMs map an observed sequence to a set of hidden states by defining probability distributions for transitions between hidden states and for the relationships between observed values and hidden states. HMMs are based on the Markov property, according to which each state depends only on the immediately preceding state. This severely limits the ability of HMMs to capture long-range dependencies.
Furthermore, the space complexity of HMMs grows quadratically with the number of states, so HMMs do not scale well.

RNNs process the input sequence one element at a time and maintain a hidden state vector that acts as a memory for past information. They learn to selectively retain relevant information, allowing them to capture dependencies across several time steps. This allows them to use both the current input and past information when making future predictions. All this is learned by the model automatically, without much prior knowledge of the cycles or time dependencies in the data. RNNs obviate the need for a fixed-size time window and can also handle variable-length sequences. Moreover, the number of states that can be ...
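
To make the contrast drawn in this section concrete, the following is a minimal Python sketch, not code from the paper. It assumes the commonly used tanh recurrence h_t = tanh(W_xh x_t + W_hh h_(t-1) + b_h), which this excerpt does not itself state, and it uses random, untrained weights. The names sliding_windows, make_rnn, and rnn_forward are hypothetical and chosen only for illustration.

```python
import math
import random


def sliding_windows(series, window_size):
    """Window-based approach: concatenate `window_size` consecutive
    samples into one fixed-length data point."""
    return [series[i:i + window_size]
            for i in range(len(series) - window_size + 1)]


def make_rnn(input_size, hidden_size, seed=0):
    """Create random (untrained) weights for a plain tanh RNN cell.
    The cell form is an assumption; the source gives no equations."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                for _ in range(rows)]

    return {
        "W_xh": matrix(hidden_size, input_size),   # input -> hidden
        "W_hh": matrix(hidden_size, hidden_size),  # hidden -> hidden (recurrence)
        "b_h": [0.0] * hidden_size,
    }


def rnn_forward(params, sequence):
    """Process the sequence one element at a time and return the final
    hidden state, which acts as a memory of everything seen so far."""
    hidden = [0.0] * len(params["b_h"])  # empty memory before the first element
    for x in sequence:
        # Each new hidden unit mixes the current input with the previous state.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_x, x))
                + sum(w * hj for w, hj in zip(w_h, hidden))
                + b
            )
            for w_x, w_h, b in zip(params["W_xh"], params["W_hh"], params["b_h"])
        ]
    return hidden


if __name__ == "__main__":
    # Fixed window: every data point has exactly three samples.
    print(sliding_windows([1, 2, 3, 4, 5], 3))
    # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

    # RNN: sequences of different lengths map to a state of the same size.
    params = make_rnn(input_size=1, hidden_size=4)
    short_seq = [[0.1], [0.2]]
    long_seq = [[0.1], [0.2], [0.3], [0.4], [0.5]]
    print(len(rnn_forward(params, short_seq)), len(rnn_forward(params, long_seq)))
    # 4 4
```

In this sketch the window size must be fixed in advance, which is the limitation described above, whereas rnn_forward accepts a sequence of any length and always returns a hidden state of the same size. Because the weights are untrained, the hidden state values themselves carry no meaning here; only the mechanics are illustrated.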