Integration of Speech Recognition-based Caption Editing System with Presentation Software

Pages: 22 File type: pptx Size: 764.36 KB

10.10.2023



Document information:

Contents of "Integration of Speech Recognition-based Caption Editing System with Presentation Software": introduction, preliminary survey and investigation, problems and apparatus, results, summary.
Content extracted from the document:
Students: Bùi Văn Chung, Nguyễn Quốc Uy

Contents
1. Introduction
2. Preliminary Survey and Investigation
3. Problems and Apparatus
4. Results
5. Summary

1. Introduction
1.1 Background
- Recently, an increasing amount of e-Learning material, including audio and presentation slides, has been provided through the Internet or through private networks referred to as intranets.
- Many hearing-impaired people and senior citizens require captioning to understand such content.
- The paper introduces the “IBM Caption Editing System with Presentation Integration” (hereafter CESPI), an extension of the IBM Caption Editing System (hereafter CES). CESPI includes all the functions of CES and further adds presentation integration functions.
- CES encapsulates the speech recognition engine that transcribes audio into text (CES Recorder) and also provides various editing features for error correction (CES Master and CES Client). As shown …
- CESPI integrates presentation software in various ways for both the CES Recorder and the CES Master system.

Fig. 2. Sample output of CESPI: the presentation slide image is on the left-hand side, the video image is on the upper right, and the caption is on the lower right.

- We also showed how the caption editing steps can be improved using three major concepts: “complete audio synchronization”, “completely automatic audio control”, and “status marking”.
- In CES, the output phrases from the voice recognition engine (the candidate caption lines) are laid out vertically as individual lines along with their timestamps. “Complete audio synchronization” means that the keyboard focus always matches the audio replay position.
- The second concept, “completely automatic audio control”, means that the audio is fully and automatically controlled by the system.
  Users do not have to “replay” and “stop” the audio manually, which would otherwise usually be done a huge number of times. As editing begins, the focus is set on the initial series of words, and the audio associated with that portion is replayed automatically.
- The last concept is “status marking”: unverified lines are automatically distinguished from corrected lines. As shown in Figure 3, each caption line in CES includes a button that is used to mark its status.

Fig. 3. Sample image of CES.

Fig. 4. The caption editing task using CES. All the audio processing is automatic, and the user merely needs to focus on making the necessary corrections.

- Presentation software provides many useful features for easily creating effective e-Learning content in the following two steps:
1. Prepare a presentation file by combining text, pictures, visual layout, and any other provided features.
2. Give the oral presentation using the slide show feature of the presentation software, while recording a movie with any video camera and/or recording the presentation audio.

2. Preliminary Survey and Investigation
- As shown in Table 1, 66.3% of respondents rated the multimedia composite “Strongly Agree” or “Agree”, regardless of age group. We therefore concluded that a multimedia composite is very useful for better understanding in e-Learning.

3. Problems and Apparatus
- Based on the preliminary survey and investigation, we examined the available caption editing tools that generate captions from audio and identified three major problems between CES and presentation software: “Content Layout Definitions”, “Editing Focus Linkage”, and “Exporting to Speaker Notes”.
- To address these problems, we extended our Caption Editing System (CES) to integrate it with Microsoft PowerPoint, creating our new Caption Editing System with Presentation Integration (CESPI). The architecture in terms of code interface is shown in Figure 5.

Fig. 5.
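The three caption-editing concepts from the Introduction (complete audio synchronization, completely automatic audio control, and status marking) can be illustrated with a small Python sketch. It is a hypothetical model, not the CES implementation: the names `CaptionLine`, `CaptionEditor`, `set_focus`, `correct`, and `next_unverified` are invented for illustration, each recognized phrase is assumed to carry a start and an end timestamp in seconds, and the audio player is replaced by a list that records which segments would be replayed.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CaptionLine:
    """One candidate caption line from the recognizer (hypothetical model, not CES code)."""
    text: str
    start: float  # assumed start timestamp of the phrase, in seconds
    end: float    # assumed end timestamp of the phrase, in seconds
    verified: bool = False  # status marking: False until the user confirms the line


@dataclass
class CaptionEditor:
    """Toy editor illustrating the three CES concepts; all names are illustrative."""
    lines: List[CaptionLine]
    focus: int = 0
    # Stand-in for an audio player: records each (start, end) segment that
    # would be replayed, so the behavior can be checked without real audio.
    replayed: List[Tuple[float, float]] = field(default_factory=list)

    def set_focus(self, index: int) -> None:
        """Move the keyboard focus; the audio follows automatically."""
        if not 0 <= index < len(self.lines):
            raise IndexError("caption line index out of range")
        self.focus = index
        line = self.lines[index]
        # Complete audio synchronization: the replay position is always the
        # start of the focused line.
        # Completely automatic audio control: the segment is queued for replay
        # without the user pressing "replay" or "stop".
        self.replayed.append((line.start, line.end))

    def correct(self, new_text: str) -> None:
        """Overwrite the focused line's text and mark it as verified."""
        line = self.lines[self.focus]
        line.text = new_text
        line.verified = True

    def next_unverified(self) -> Optional[int]:
        """Index of the first unverified line after the focus, or None."""
        for i in range(self.focus + 1, len(self.lines)):
            if not self.lines[i].verified:
                return i
        return None


if __name__ == "__main__":
    editor = CaptionEditor([
        CaptionLine("recently an increase amount", 0.0, 2.1),
        CaptionLine("of e learning material", 2.1, 3.8),
    ])
    editor.set_focus(0)                       # replays 0.0-2.1 s automatically
    editor.correct("Recently an increasing amount")
    nxt = editor.next_unverified()            # -> 1
    if nxt is not None:
        editor.set_focus(nxt)                 # replays 2.1-3.8 s automatically
    print(editor.replayed)  # [(0.0, 2.1), (2.1, 3.8)]
```

In this sketch, moving the focus is the only action that triggers playback, which mirrors the idea that users never press “replay” or “stop” themselves. A real editor would also need to stop the previous segment and handle corrections that change line boundaries; the sketch ignores both.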
The base platform is Microsoft Windows 2000/XP. The user interface of CESPI is built with Visual Basic 6.0, and the IBM ViaVoice engine control is implemented in Microsoft Visual C++ 6.0. CESPI communicates with ViaVoice through the Speech Manager API (SMAPI) V7.0 and with Microsoft PowerPoint through Visual Basic for Applications (VBA) V6.0.

Fig. 7. The Change Content Layout dialog on the left-hand side and the Select Layout Video+PPT+Caption dialog with the focus on the right-hand side.

3.1 Editing Focus Linkage ...