Research paper: Combining POMDPs trained with User Simulations and Rule-based Dialogue Management in a Spoken Dialogue System
Pages: 4
File type: pdf
File size: 2.09 MB
Views: 6
Downloads: 0
Document information:
Over several years, we have developed an approach to spoken dialogue systems that includes rule-based and trainable dialogue managers, spoken language understanding and generation modules, and a comprehensive dialogue system architecture. We present a Reinforcement Learning-based dialogue system that goes beyond standard rule-based models and computes on-line decisions of the best dialogue moves. The key concept of this work is that we bridge the gap between manually written dialog models (e.g. rule-based) and adaptive computational models such as Partially Observable Markov Decision Processes (POMDP) based dialogue managers. ...
Content extracted from the document:
Combining POMDPs trained with User Simulations and Rule-based Dialogue Management in a Spoken Dialogue System

Sebastian Varges, Silvia Quarteroni, Giuseppe Riccardi, Alexei V. Ivanov, Pierluigi Roberti
Department of Information Engineering and Computer Science
University of Trento, 38050 Povo di Trento, Italy
{varges|silviaq|riccardi|ivanov|roberti}@disi.unitn.it

Abstract

Over several years, we have developed an approach to spoken dialogue systems that includes rule-based and trainable dialogue managers, spoken language understanding and generation modules, and a comprehensive dialogue system architecture. We present a Reinforcement Learning-based dialogue system that goes beyond standard rule-based models and computes on-line decisions of the best dialogue moves. The key concept of this work is that we bridge the gap between manually written dialog models (e.g. rule-based) and adaptive computational models such as Partially Observable Markov Decision Processes (POMDP) based dialogue managers.

We demonstrate the various parameters that influence the learnt dialogue management policy by using pre-trained policies (section 4). The application domain is a tourist information system for accommodation and events in the local area. The domain of the trained DMs is identical to that of a rule-based DM that was used by human users (section 2), allowing us to compare the two directly.

The state of the POMDP keeps track of the SLU hypotheses in the form of domain concepts (10 in the application domain, e.g. main activity, star rating of hotels, dates etc.) and their values. These values may be abstracted into 'known/unknown,' for example, increasing the likelihood that the system re-visits a dialogue state which it can exploit. Representing the verification status of the concepts in the state influences – in combination with the user model (section 1.2) and N-best hypotheses – whether the system learns to use clarification questions.

1 Reinforcement Learning-based Dialogue Management

In recent years, Machine Learning techniques, in particular Reinforcement Learning (RL), have been applied to the task of dialogue management (DM) (Levin et al., 2000; Williams and Young, 2006). A major motivation is to improve robustness in the face of uncertainty, for example due to speech recognition errors. A further motivation is to improve adaptivity w.r.t. different user behaviour and application/recognition environments. The Reinforcement Learning framework is attractive because it offers a statistical model representing the dynamics of the interaction between system and user. This is in contrast to the supervised learning approach of learning system behaviour based on a fixed corpus (Higashinaka et al., 2003). To explore the range of dialogue management strategies, a simulation environment is required that includes a simulated user (Schatzmann et al., 2006) if one wants to avoid the pro…

1.1 The exploration/exploitation trade-off in reinforcement learning

The RL-DM maintains a policy, an internal data structure that keeps track of the values (accumulated rewards) of past state-action pairs. The goal of the learner is to optimize the long-term reward by maximizing the 'Q-Value' Q^π(s_t, a) of a policy π for taking action a at time t. The expected cumulative value V of a state s is defined recursively as

V^π(s_t) = Σ_a π(s_t, a) Σ_{s_{t+1}} P_{s_t s_{t+1}} [ R_{s_t s_{t+1}} + γ V^π(s_{t+1}) ]

Since an analytic solution to finding an optimal value function is not possible for realistic dialogue scenarios, V(s) is estimated by dialogue simulations. To optimize Q and populate the policy with expected values, the learner needs to explore untried …
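The policy table, the simulation-based estimation of values, and the exploration/exploitation trade-off described in section 1.1 can be sketched as a small tabular Q-learner run against a stub user simulator. This is a minimal illustration under invented assumptions, not the paper's implementation: the concepts, actions, error rate, and reward values below are all hypothetical.

```python
import random

# Toy dialogue task: the state is the set of "known" domain concepts; the
# episode ends when every concept is known. All names/numbers are invented.
CONCEPTS = ("activity", "star_rating")
ACTIONS = ("ask_activity", "ask_star_rating", "confirm")

def simulate_step(state, action):
    """Stub user simulator: asking about an unknown concept fills it with
    probability 0.8 (crudely imitating ASR/SLU errors); each turn costs -1,
    and completing the dialogue earns +10."""
    state = set(state)
    concept = action.replace("ask_", "")
    if action.startswith("ask_") and concept not in state:
        if random.random() < 0.8:
            state.add(concept)
    done = len(state) == len(CONCEPTS)
    reward = 10.0 if done else -1.0
    return frozenset(state), reward, done

def run_episode(q, epsilon=0.2, alpha=0.1, gamma=0.95):
    """One simulated dialogue: epsilon-greedy action choice (exploration vs.
    exploitation) and a Q-learning update of the policy table."""
    state = frozenset()
    for _ in range(20):  # cap the dialogue length
        if random.random() < epsilon:
            action = random.choice(ACTIONS)  # explore an untried/risky move
        else:
            action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))  # exploit
        nxt, reward, done = simulate_step(state, action)
        best_next = max(q.get((nxt, a), 0.0) for a in ACTIONS)
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (
            reward + gamma * (0.0 if done else best_next) - old
        )
        state = nxt
        if done:
            break

q_table = {}  # the "policy": accumulated value estimates per state-action pair
random.seed(0)
for _ in range(5000):
    run_episode(q_table)

print(max(ACTIONS, key=lambda a: q_table.get((frozenset(), a), 0.0)))
```

After enough simulated dialogues, the greedy action in the empty start state should be one of the "ask" moves rather than "confirm", since only asking can fill unknown concepts and reach the completion reward.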
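The recursive definition of V^π(s_t) in section 1.1 can be verified numerically by fixed-point iteration of that equation on a hand-built toy MDP. All states, transition probabilities, and rewards here are invented for illustration:

```python
# Iterate V(s) = sum_a pi(s,a) * sum_s' P[s,a,s'] * (R[s,s'] + gamma * V(s'))
# until convergence, on a tiny two-state dialogue-like MDP (numbers invented).
gamma = 0.9
states = ("asking", "done")
actions = ("ask", "confirm")

# pi(s, a): a uniform random policy, purely for illustration
policy = {s: {a: 0.5 for a in actions} for s in states}

# P[(s, a)] -> {s': prob}; R[(s, s')] -> reward
P = {
    ("asking", "ask"): {"asking": 0.3, "done": 0.7},
    ("asking", "confirm"): {"asking": 1.0},
    ("done", "ask"): {"done": 1.0},
    ("done", "confirm"): {"done": 1.0},
}
R = {("asking", "asking"): -1.0, ("asking", "done"): 10.0, ("done", "done"): 0.0}

V = {s: 0.0 for s in states}
for _ in range(200):  # fixed-point iteration of the Bellman expectation equation
    V = {
        s: sum(
            policy[s][a]
            * sum(p * (R[(s, s2)] + gamma * V[s2]) for s2, p in P[(s, a)].items())
            for a in actions
        )
        for s in states
    }

print(round(V["asking"], 3))
```

For a real dialogue system the state space is far too large for this exact evaluation, which is why, as the paper notes, V(s) is instead estimated from simulated dialogues.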
Related keywords: POMDPs trained with User Simulations, Rule-based Dialogue Management, Spoken Dialogue System, research papers, natural language processing