Research paper: Combining POMDPs trained with User Simulations and Rule-based Dialogue Management in a Spoken Dialogue System
Pages: 4
File type: pdf
File size: 2.09 MB
Views: 6
Downloads: 0
Document information:
Over several years, we have developed an approach to spoken dialogue systems that includes rule-based and trainable dialogue managers, spoken language understanding and generation modules, and a comprehensive dialogue system architecture. We present a Reinforcement Learning-based dialogue system that goes beyond standard rule-based models and computes on-line decisions of the best dialogue moves. The key concept of this work is that we bridge the gap between manually written dialog models (e.g. rule-based) and adaptive computational models such as Partially Observable Markov Decision Processes (POMDP) based dialogue managers. ...
Content extracted from the document:
Combining POMDPs trained with User Simulations and Rule-based Dialogue Management in a Spoken Dialogue System

Sebastian Varges, Silvia Quarteroni, Giuseppe Riccardi, Alexei V. Ivanov, Pierluigi Roberti
Department of Information Engineering and Computer Science
University of Trento, 38050 Povo di Trento, Italy
{varges|silviaq|riccardi|ivanov|roberti}@disi.unitn.it

Abstract

Over several years, we have developed an approach to spoken dialogue systems that includes rule-based and trainable dialogue managers, spoken language understanding and generation modules, and a comprehensive dialogue system architecture. We present a Reinforcement Learning-based dialogue system that goes beyond standard rule-based models and computes on-line decisions of the best dialogue moves. The key concept of this work is that we bridge the gap between manually written dialog models (e.g. rule-based) and adaptive computational models such as Partially Observable Markov Decision Processes (POMDP) based dialogue managers.

We demonstrate the various parameters that influence the learnt dialogue management policy by using pre-trained policies (section 4). The application domain is a tourist information system for accommodation and events in the local area. The domain of the trained DMs is identical to that of a rule-based DM that was used by human users (section 2), allowing us to compare the two directly.

The state of the POMDP keeps track of the SLU hypotheses in the form of domain concepts (10 in the application domain, e.g. main activity, star rating of hotels, dates etc.) and their values. These values may be abstracted into 'known/unknown,' for example, increasing the likelihood that the system re-visits a dialogue state which it can exploit. Representing the verification status of the concepts in the state influences – in combination with the user model (section 1.2) and N-best hypotheses – whether the system learns to use clarification questions.

1 Reinforcement Learning-based Dialogue Management

In recent years, Machine Learning techniques, in particular Reinforcement Learning (RL), have been applied to the task of dialogue management (DM) (Levin et al., 2000; Williams and Young, 2006). A major motivation is to improve robustness in the face of uncertainty, for example due to speech recognition errors. A further motivation is to improve adaptivity w.r.t. different user behaviour and application/recognition environments. The Reinforcement Learning framework is attractive because it offers a statistical model representing the dynamics of the interaction between system and user. This is in contrast to the supervised learning approach of learning system behaviour based on a fixed corpus (Higashinaka et al., 2003). To explore the range of dialogue management strategies, a simulation environment is required that includes a simulated user (Schatzmann et al., 2006) if one wants to avoid the pro…

1.1 The exploration/exploitation trade-off in reinforcement learning

The RL-DM maintains a policy, an internal data structure that keeps track of the values (accumulated rewards) of past state-action pairs. The goal of the learner is to optimize the long-term reward by maximizing the 'Q-Value' Q^π(s_t, a) of a policy π for taking action a at time t. The expected cumulative value V of a state s is defined recursively as

V^π(s_t) = Σ_a π(s_t, a) Σ_{s_{t+1}} P_{s_t s_{t+1}} [ R_{s_t s_{t+1}} + γ V^π(s_{t+1}) ]

Since an analytic solution to finding an optimal value function is not possible for realistic dialogue scenarios, V(s) is estimated by dialogue simulations. To optimize Q and populate the policy with expected values, the learner needs to explore untried …
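The policy table, the simulation-based estimation of values, and the exploration/exploitation trade-off described in section 1.1 can be sketched as a small tabular Q-learner run against a stub user simulator. This is a minimal illustration under invented assumptions, not the paper's implementation: the concepts, actions, error rate, and reward values below are all hypothetical.

```python
import random

# Toy dialogue task: the state is the set of "known" domain concepts; the
# episode ends when every concept is known. All names/numbers are invented.
CONCEPTS = ("activity", "star_rating")
ACTIONS = ("ask_activity", "ask_star_rating", "confirm")

def simulate_step(state, action):
    """Stub user simulator: asking about an unknown concept fills it with
    probability 0.8 (crudely imitating ASR/SLU errors); each turn costs -1,
    and completing the dialogue earns +10."""
    state = set(state)
    concept = action.replace("ask_", "")
    if action.startswith("ask_") and concept not in state:
        if random.random() < 0.8:
            state.add(concept)
    done = len(state) == len(CONCEPTS)
    reward = 10.0 if done else -1.0
    return frozenset(state), reward, done

def run_episode(q, epsilon=0.2, alpha=0.1, gamma=0.95):
    """One simulated dialogue: epsilon-greedy action choice (exploration vs.
    exploitation) and a Q-learning update of the policy table."""
    state = frozenset()
    for _ in range(20):  # cap the dialogue length
        if random.random() < epsilon:
            action = random.choice(ACTIONS)  # explore an untried/risky move
        else:
            action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))  # exploit
        nxt, reward, done = simulate_step(state, action)
        best_next = max(q.get((nxt, a), 0.0) for a in ACTIONS)
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (
            reward + gamma * (0.0 if done else best_next) - old
        )
        state = nxt
        if done:
            break

q_table = {}  # the "policy": accumulated value estimates per state-action pair
random.seed(0)
for _ in range(5000):
    run_episode(q_table)

print(max(ACTIONS, key=lambda a: q_table.get((frozenset(), a), 0.0)))
```

After enough simulated dialogues, the greedy action in the empty start state should be one of the "ask" moves rather than "confirm", since only asking can fill unknown concepts and reach the completion reward.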
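The recursive definition of V^π(s_t) in section 1.1 can be verified numerically by fixed-point iteration of that equation on a hand-built toy MDP. All states, transition probabilities, and rewards here are invented for illustration:

```python
# Iterate V(s) = sum_a pi(s,a) * sum_s' P[s,a,s'] * (R[s,s'] + gamma * V(s'))
# until convergence, on a tiny two-state dialogue-like MDP (numbers invented).
gamma = 0.9
states = ("asking", "done")
actions = ("ask", "confirm")

# pi(s, a): a uniform random policy, purely for illustration
policy = {s: {a: 0.5 for a in actions} for s in states}

# P[(s, a)] -> {s': prob}; R[(s, s')] -> reward
P = {
    ("asking", "ask"): {"asking": 0.3, "done": 0.7},
    ("asking", "confirm"): {"asking": 1.0},
    ("done", "ask"): {"done": 1.0},
    ("done", "confirm"): {"done": 1.0},
}
R = {("asking", "asking"): -1.0, ("asking", "done"): 10.0, ("done", "done"): 0.0}

V = {s: 0.0 for s in states}
for _ in range(200):  # fixed-point iteration of the Bellman expectation equation
    V = {
        s: sum(
            policy[s][a]
            * sum(p * (R[(s, s2)] + gamma * V[s2]) for s2, p in P[(s, a)].items())
            for a in actions
        )
        for s in states
    }

print(round(V["asking"], 3))
```

For a real dialogue system the state space is far too large for this exact evaluation, which is why, as the paper notes, V(s) is instead estimated from simulated dialogues.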
Related keywords: POMDPs trained with User Simulations, Rule-based Dialogue Management, Spoken Dialogue System, research papers, natural language processing