BOOK III
HEARSAY TEXT-TO-SPEECH SYNTHESIS

TABLE OF CONTENTS

HOW TO USE TEXT-TO-SPEECH SYNTHESIS MANUAL

PART I - USER'S GUIDE

SECTION 1. INTRODUCTION
    1a. HEARSAY VERSIONS
    1b. SPEECH TAILORING
SECTION 2. HEARSAY TEXT-TO-SPEECH SOFTWARE SET UP
    2a. SELECTING A HEARSAY KEY
SECTION 3. HEARSAY FUNCTIONS & MENU SYSTEM
    3a. MAIN MENU
    3b. SCREEN ECHO
    3c. KEYBOARD ECHO
    3d. READ SCREEN
    3e. MISCELLANEOUS KEYS
        1. TOGGLE MENU KEYS
        2. TOGGLE MENU SCREENS
        3. TOGGLE STATUS
        4. NEW HEARSAY KEY
        5. INITIALIZE HEARSAY
        6. UNHOOK HEARSAY
    3f. FILES MENU
SECTION 4. DICTIONARY & DICTIONARY USAGE
    4a. USING THE SPEECH EDITOR
    4b. TO CREATE OR MODIFY A DICTIONARY
    4c. DICTIONARY SPACE
SECTION 5. USING SETUP COMMANDS
    5a. MENU COLORS
    5b. HEARSAY KEY
    5c. RD-SCAN PROGRAM
    5d. ENABLE SCREEN ECHO
    5e. SET SCREEN ECHO WINDOW
    5f. MENU HELP LEVEL
    5g. HEARSAY VERSION
    5h. ENTERING MULTIPLE COMMANDS
    5i. BATCH PROGRAMS
SECTION 6. HEARSAY PROGRAMS
    6a. HS.BAT
    6b. DEMOV2.BAT
    6c. DEMOV3.BAT
    6d. HEARSAY.EXE
    6e. RD-SCAN.EXE
    6f. SP-EDIT.EXE
    6g. SPEECHV2.EXE
    6h. SPEECHV3.EXE
    6i. DEMO.SD2
    6j. DEMO.SD3
    6k. HSGOLDDR.BAS
    6l. READ.ME
    6m. README.TXT
    6n. PRINTME.BAT
    6o. README.BAT

PART II - PROGRAMMER'S GUIDE

INTRODUCTION
CALLING HEARSAY FUNCTIONS FROM BASIC
SPEAK A LINE OF TEXT
    PROGRAMMING IN BASIC
    PROGRAMMING IN ASSEMBLER
SET SCREEN ECHO PARAMETERS
    PROGRAMMING IN BASIC
    PROGRAMMING IN ASSEMBLER
SET WINDOW
    PROGRAMMING IN BASIC
    PROGRAMMING IN ASSEMBLER
UNHOOK HEARSAY GOLD FROM DOS
    PROGRAMMING IN BASIC
    PROGRAMMING IN ASSEMBLER
LOAD DICTIONARY
    PROGRAMMING IN BASIC
    PROGRAMMING IN ASSEMBLER
GET VERSION NUMBER
    PROGRAMMING IN BASIC
    PROGRAMMING IN ASSEMBLER

APPENDICES
    APPENDIX A  SAMPLE DICTIONARY SESSION
    APPENDIX B  HEARSAY PHONEMES
    APPENDIX C  READ SCREEN COMMANDS
    APPENDIX D  HEARSAY SETUP COMMANDS
    APPENDIX E  SCREEN COLOR CODES

HOW TO USE TEXT-TO-SPEECH SYNTHESIS BOOK

This book is divided into two separate parts: a User's Guide and a Programmer's Guide.
PART I - USER'S GUIDE

This part explains how to install the Hearsay Text-To-Speech software and how to get started using it. Because Hearsay is a menu-driven utility, operation is mostly a matter of reading the menus and doing the obvious. The User's Guide tells you what you can do, fully describes all menu selections, and shows how to invoke Hearsay functions from setup strings without going through the menus. The User's Guide also includes technical information about the way the Hearsay program works and how it interacts with other programs.

PART II - PROGRAMMER'S GUIDE

This part is for users who wish to call Hearsay Speech Synthesis functions directly from programs they have written, without using the menus. A BASIC interface driver is included on the Hearsay Gold disks. Part II includes examples of BASIC program segments that can be used to call specific Hearsay functions, and 8086/8088 assembler listings for each function.

PART I - HEARSAY USER'S GUIDE

1A. HEARSAY VERSIONS

The Hearsay Gold diskettes contain two versions of the Hearsay program, Version 2 and Version 3. Version 2 can be used on IBM PCs, XTs, and compatibles. Version 3, a more powerful program producing more realistic speech, can only be used if you have an IBM AT or compatible.

Hearsay is a memory-resident program that runs at the same time as whatever application program you are using, and it uses memory in addition to whatever memory your application program requires. Version 2 uses an additional 163K of memory and Version 3 uses an additional 217K. To use either version, you must have that much EXTRA memory, in addition to whatever is required by the program you are using.

1B. SPEECH TAILORING

Both Hearsay speech synthesis versions may be tailored in two ways: by customizing the pronunciation of individual words (Dictionary) and by modifying the Voice, Pitch, and Speed of the speech Hearsay uses.
Creation of a special dictionary must be done with the Hearsay program not loaded, as described in Section 4, DICTIONARY. Voice modification (Change Voice, Change Pitch, Change Speed) can be done from the Keyboard Echo or Screen Echo menus, or by setup string. Whatever the speech variables are set to, they will revert to their default values when the program is rerun (unless they are re-invoked by a setup string that calls the program) or when Hearsay is re-initialized.

There are two voices available to Hearsay, a lower-sounding voice (Voice 1, the default) and a higher-sounding voice (Voice 2). There are nine possible pitches (1 through 9, with 9 the highest) and nine speeds (1 through 9, with 9 the fastest). In both cases the default value is 6.

2. HEARSAY TEXT-TO-SPEECH SOFTWARE SETUP

Once the Hearsay programs are loaded, they will stay in memory until Hearsay is "Unhooked" or the PC is rebooted. If you want to use Hearsay every time you use your PC, Section 5 tells how to load and configure the software automatically from a .BAT file. Section 6 lists all the speech synthesis files on the Hearsay disks and explains the purpose of each.

With the Hearsay disk in drive A, or with Hearsay as your default hard disk directory, type:

    HEARSAY/V2 [ENTER]    if you are running Version 2, or
    HEARSAY/V3 [ENTER]    if you are running Version 3.

If you just type HEARSAY [ENTER], Version 2 will automatically be loaded on PCs, XTs, and compatibles, and Version 3 on ATs and compatibles.

When the DOS prompt returns, type:

    SPEECH [ENTER]
    EDITOR [ENTER]

Hearsay will prompt "Press the HEARSAY key." This key is used to pop up the Speech Synthesis menus and should be a key that is not normally used by any other program. When the [ALT] key and the Hearsay Key are pressed together, the Speech Synthesis Main Menu will appear on the screen. From this menu you can activate the various Hearsay Gold Speech Synthesis features.

2A. SELECTING A HEARSAY KEY

Do not use Function keys 1 through 7 as your Hearsay Key, because these are used by the Hearsay Gold program. We also recommend that you do not use any of the alphabet keys or the number keys. However, if your keyboard has a numeric keypad which duplicates the number keys, you can use any of the keypad keys as the Hearsay Key, because Hearsay can distinguish between the keys on the main keyboard and those on the numeric keypad. Remember that even if a key is duplicated on the keyboard, only the key you specify as the Hearsay Key, not the duplicate, will get you in and out of the Speech Synthesis menus.

Once installed, the Hearsay menus can be invoked at any time by pressing the ALT key together with your Hearsay Key.

2B. DEMO PROGRAMS

Two demo programs that illustrate Hearsay's features have been included: DEMOV2 (for Version 2) and DEMOV3 (for Version 3). To load one, simply type DEMOV2 for Version 2 or DEMOV3 for Version 3.

3. HEARSAY FUNCTIONS AND MENU SYSTEM

Hearsay Gold has powerful speech synthesis capabilities, and the Speech Synthesis menus and commands are tools for using these capabilities effectively. Hearsay is a memory-resident program. Once installed, it remains in your computer's memory, even when you run other programs, until you unhook it or reboot your system. Because of this it can be used with other programs, even though they were not designed for voice interaction.

Hearsay can create speech by reading words from the screen or characters from the keyboard. When reading from the screen, Hearsay can either read text as it is written to the screen (Screen Echo) or read text already displayed on the screen (Read Screen). Whatever the source of the text, the pronunciation of the words and the voice, pitch, and speed of the speech can be customized by the user.

The most common way of using Hearsay is from the Hearsay menus. These are easy to use and provide access to all Hearsay functions except dictionary creation, which is done with the SP-EDIT program. For more information, see Section 4, DICTIONARY.

3A. HEARSAY GOLD MAIN MENU

To access the Hearsay menus, press the ALT key together with the key you defined as your Hearsay Key when you loaded the Hearsay program. Hearsay will speak HEARSAY GOLD MAIN MENU and pop up the Hearsay Main Menu.

A status window at the top of each menu shows the ON-OFF status of the major switches. For the Main Menu these are Screen Echo and Keyboard Echo. The lower window of the Main Menu offers five other menus. (If DOS is not ready for file functions, the Main Menu will also offer a choice of F7 - Return When DOS Is Not Busy.)

The following choices are always available from the Main Menu:

    F1 - Voice Commands Menu (Not active, for future expansion)
    F2 - Screen Echo Menu
    F3 - Keyboard Echo Menu
    F4 - Read Screen Menu
    F5 - Miscellaneous Menu
    F6 - Files Menu

To access a menu, press its associated function key. To exit a menu, either back to the previous menu or out of the Hearsay menus altogether, press the SPACEBAR or the Hearsay key.

3B. SCREEN ECHO MENU enables you to:

    - turn Screen Echo on and off
    - tell Hearsay to echo the screen line by line
    - tell Hearsay to echo the screen sentence by sentence
    - tell Hearsay to speak or ignore punctuation characters
    - change voice, pitch, and speed of Hearsay speech
    - tell Hearsay which part of the screen to echo

When Screen Echo is turned on, Hearsay reads characters as they are written to the screen, translates them into words, and speaks them. Characters may be echoed from anywhere on the screen or from a selected portion of it. In the default mode, the text is not spoken until a terminating punctuation mark is written to the screen. A terminating punctuation mark is a period, colon, semicolon, question mark, or exclamation point.
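The buffering behavior described above can be illustrated with a small model. This is only a sketch of the idea, not Hearsay's actual code: the names EchoBuffer and spoken_chunks are hypothetical, and the handling of line-by-line echo (speaking only when a new line starts) is an assumption made for the illustration.

```python
# Illustrative model of Screen Echo buffering. Not Hearsay's real implementation.
# The five terminating punctuation marks named in the manual.
TERMINATORS = set(".:;?!")


class EchoBuffer:
    """Collects characters and releases them as chunks to be spoken."""

    def __init__(self, line_mode=False):
        # line_mode=False models the default (sentence-by-sentence) behavior.
        self.line_mode = line_mode
        self._chars = []

    def write(self, ch):
        """Accept one character; return the text to speak, or None."""
        if ch == "\n":
            if self.line_mode:
                # Assumption: in line mode a new line triggers speech.
                return self._flush()
            self._chars.append(" ")  # default mode: a line break is just a space
            return None
        self._chars.append(ch)
        if not self.line_mode and ch in TERMINATORS:
            return self._flush()
        return None

    def _flush(self):
        text = "".join(self._chars).strip()
        self._chars = []
        return text or None


def spoken_chunks(text, line_mode=False):
    """Feed text one character at a time and list the chunks that would be spoken."""
    buf = EchoBuffer(line_mode)
    chunks = []
    for ch in text:
        out = buf.write(ch)
        if out:
            chunks.append(out)
    return chunks
```

In this model, text that never reaches a terminating punctuation mark stays in the buffer in the default mode and is not spoken, which is the situation line-by-line echo is meant to handle.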
These Screen Echo defaults may be altered by menu selections (Toggle line mode, Toggle punctuation) or preselected by setup strings.

NOTE: Screen Echo monitors text being printed to the screen by trapping INT 10 (the video interrupt). As long as a program uses INT 10 to print its text, Hearsay can speak it aloud. Unfortunately, some programs, particularly certain word processors and spreadsheets, write their text characters directly to the video memory. For these programs, Hearsay Gold cannot detect when text is being printed to the screen and therefore cannot speak the text. Fortunately, you can always use the Read Screen option if you need this text spoken.

By default, Screen Echo speaks all text appearing on the screen. To speak only text appearing in a certain part of the screen, a window may be set (Set Window) describing the screen area to be spoken. The window is defined by its top and bottom rows, and Hearsay may be toggled to speak only what is inside the window or only what is outside of it.

In line mode the text is spoken when the cursor moves to a new line. Most programs terminate their sentences with a period, and for these programs line mode should be off. If you are using a program that does not terminate its sentences, line mode should be on.

SCREEN ECHO MENU CHOICES

F1 - TOGGLE SCREEN ECHO
This toggles Screen Echo ON and OFF. When Screen Echo is ON, all text printed to the screen (within the designated window) will be spoken by Hearsay.

F2 - TOGGLE LINE MODE
This will cause text to be spoken whenever the cursor moves to a new line. If this option is not set, a line of text will be spoken only when the line is terminated by a terminating punctuation mark (colon, semicolon, question mark, exclamation point, or period). If you are executing a program which terminates lines with periods, the line mode option should be turned off.

F3 - TOGGLE PUNCTUATION
Normally on. Words in capitals will be spelled out. Pressing [F3] from this menu will cause the words in capitals to be spoken when they are displayed on the screen while Screen Echo is on. When this feature is ON, pressing [F3] from this menu will turn it OFF.

F4 - CHANGE VOICE
Hearsay has two voices, a lower-sounding voice (Voice 1, the default) and a higher-sounding voice (Voice 2). To change the voice, simply enter the new voice number followed by [ENTER].

F5 - CHANGE PITCH
Hearsay allows nine (9) different pitches (1 to 9, with 9 being the highest; the default is 7). Pitch is changed by typing in the desired value followed by [ENTER].

F6 - CHANGE SPEED
Hearsay allows for nine (9) speeds (1 to 9 with 9 being the fastest default). Speed is changed by typing in the desired value followed by [ENTER].

F7 - SET WINDOW
Hearsay's Screen Echo option allows text to be spoken from anywhere on the screen, or from only one part of it. The Set Window Menu is used to control this option.

SET WINDOW MENU CHOICES - FROM SCREEN ECHO MENU

F1 - CHANGE TOP LINE
The top row of the speech window is normally set to 1, but this function allows you to set it to any row from 1 to 25.

F2 - CHANGE BOTTOM LINE
The bottom row of the speech window is normally set to 25, but this function allows you to set it to any row from 1 to 25.

F3 - TOGGLE WINDOW MODE
Normally the text inside the speech window is spoken. This function allows you to toggle between having the text inside the window spoken and having the text outside the window spoken.

3C. KEYBOARD ECHO MENU enables you to:

    - turn Keyboard Echo on and off
    - change voice, pitch, or speed of Hearsay speech (for Keyboard Echo only)

When Keyboard Echo is selected, each key is spoken as it is struck. No effort is made to translate the keystrokes into words; only the names of the individual keys are spoken. This feature is useful for someone who is learning to touch type, for children learning to recognize letters, or for anyone who just wants the keys to be spoken.
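The difference between echoing keys and echoing words can be shown with a short sketch. Everything here is hypothetical: the table SPECIAL_KEYS, the functions key_name and keyboard_echo, and the spoken names themselves are assumptions for illustration, since this manual does not list the names Hearsay actually speaks.

```python
# Illustrative sketch only. The spoken names below are assumptions,
# not the names Hearsay actually uses.
SPECIAL_KEYS = {
    " ": "space",
    "\r": "enter",
    "\t": "tab",
    "\x1b": "escape",
    "\x08": "backspace",
    ".": "period",
    ",": "comma",
    "?": "question mark",
}


def key_name(ch):
    """Return a spoken name for a single keystroke character."""
    if ch in SPECIAL_KEYS:
        return SPECIAL_KEYS[ch]
    if ch.isalpha():
        return ch.lower()  # letters are named individually
    return ch  # digits and other keys are passed through unchanged


def keyboard_echo(keystrokes):
    """Name each key as it is struck; keys are never joined into words."""
    return [key_name(ch) for ch in keystrokes]
```

Typing "cat" in this model yields three separate spoken items, one per key, rather than the single word "cat".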
NOTE: Hearsay's Keyboard Echo works by trapping INT 16 (the keyboard interrupt). When a program wants to get input from the keyboard, it will normally call INT 16. When this happens, Hearsay will check if a key was pressed, and if so will speak it. As long as a program uses INT 16 to read the keyboard, Hearsay can echo the keys. As far as we know, all programs that run under MS-DOS use INT 16.

KEYBOARD ECHO MENU CHOICES

F1 - TOGGLE KEYBOARD ECHO
This function toggles Keyboard Echo ON and OFF. When Keyboard Echo is ON, every key pressed will be spoken by Hearsay.

F2 - CHANGE VOICE
Hearsay has two voices, a lower-sounding voice (Voice 1, the default) and a higher-sounding voice (Voice 2). To change the voice, simply enter the new voice number followed by [ENTER]. This change will only affect the Keyboard Echo voice, not the Screen Echo or Read Screen voices.

F3 - CHANGE PITCH
Hearsay allows ten (10) different pitches (1 through 10; 10 is the highest and the default). Pitch is changed by typing in the desired value followed by [ENTER]. This change will only affect the Keyboard Echo voice, not the Screen Echo or Read Screen voices.

F4 - CHANGE SPEED
Hearsay allows for nine (9) speeds (1 through 9; 9 is the fastest, 7 is the default). The speed is changed by typing in the desired value followed by [ENTER]. This change will only affect the Keyboard Echo voice, not the Screen Echo or Read Screen voices.

3D. READ SCREEN MENU enables you to have Hearsay read letters, words, lines, or the entire screen to you after it has been written to the screen.

The Read Screen option allows Hearsay to read text that has already been written to the screen, under control of keyboard commands. A special Hearsay cursor, distinct from the flashing program cursor, is used to point to the area of the screen to be read. It starts out in the same location as the program cursor but can be moved around separately from it. Individual characters, words, or lines can be read, or the entire screen from the cursor position on. In Read Screen mode the four cursor keys and the [PAGE UP], [PAGE DOWN], [HOME], and [END] keys are used to control cursor movement, and Hearsay can also speak the location of the cursor.

When a function key is selected at the Read Screen Menu, the menu box disappears and the program screen reappears so that it can be read. Although the box is no longer shown, the function keys described on the menu are still active, so the special Read Screen cursor can be moved around the screen, and segments of text read, without the menu being displayed (see Appendix C for a summary of Read Screen commands). Pressing the [SPACE BAR] brings back the Read Screen Menu. Pressing the Hearsay key returns you to the Hearsay Main Menu.

NOTE: Because of the technique used for reading the characters on the screen, this method will usually work with programs whose text is not available to the Screen Echo function. Read Screen reads text characters printed on the screen by reading the ASCII characters in the display memory. In graphics mode, the IBM PC's display memory contains the binary representation of the screen's pixel map rather than ASCII characters, and is therefore unavailable to Read Screen. Graphics mode characters can be spoken by Screen Echo, which traps the INT 10 video interrupt when the characters are written to the screen.

Read Screen reads and speaks text already written to the screen, under control of keyboard commands. This is particularly useful for applications such as reading word processing documents. Read Screen cannot read graphics mode characters, while Screen Echo can.

READ SCREEN MENU CHOICES

F1 - READ SCREEN STARTING AT CURSOR
Reads and speaks the contents of the screen starting at the location of the Hearsay cursor.

F2 - READ LINE
Reads and speaks the line the Hearsay cursor is on.

F3 - READ WORD
Reads and speaks the word the Hearsay cursor is on.

F4 - SPELL WORD
Spells out the word the Hearsay cursor is on.

F5 - READ CHARACTER
Reads and speaks the character the Hearsay cursor is on, then moves the Hearsay cursor one position to the right.

F6 - LOCATION OF HEARSAY CURSOR
Speaks the location of the Hearsay cursor.

F7 - GO TO THE END OF CURRENT WORD
Moves the Hearsay cursor to the beginning of the next word.

In addition to the functions of the Read Screen Menu described above, the following functions are also available to you:

    [SHIFT-F3]         Reads the word the Hearsay cursor is on and then moves the Hearsay cursor to the next word.
    [PAGE UP]          Moves the Hearsay cursor up 6 lines.
    [PAGE DOWN]        Moves the Hearsay cursor down 6 lines.
    [HOME]             Moves the Hearsay cursor to line 1, column 1.
    [END]              Moves the Hearsay cursor to line 25, column 1.
    [UP ARROW]         Moves the Hearsay cursor up 1 line.
    [DOWN ARROW]       Moves the Hearsay cursor down 1 line.
    [LEFT ARROW]       Moves the Hearsay cursor one character to the left.
    [RIGHT ARROW]      Moves the Hearsay cursor one character to the right.
    [CTRL-LEFT ARROW]  Moves the Hearsay cursor to the beginning of the line.

3E. MISCELLANEOUS MENU enables you to:

    - have Hearsay speak keys as they are struck in menu mode
    - have Hearsay read and speak the menu screens
    - have Hearsay speak the menu status lines
    - select a new Hearsay key
    - reinitialize Hearsay
    - unload Hearsay and release system RAM

MISCELLANEOUS MENU CHOICES

F1 - TOGGLE MENU KEYS
Hearsay does not normally speak the keys you press when you are in the Hearsay menus. This toggle switch makes Hearsay speak the menu keys.

F2 - TOGGLE MENU SCREENS
Normally, Hearsay only speaks the title of each menu as it pops up. This toggle switch makes Hearsay also speak all of the commands on a menu.

F3 - TOGGLE STATUS
The top part of each Hearsay menu is called the Status Area. This is where Hearsay shows you information about the menu you are in. You can have Hearsay speak the contents of this area each time you enter a menu by pressing this function key. The feature can be toggled ON and OFF.

F4 - NEW HEARSAY KEY
This function key enables you to reset the Hearsay Key. You must be very careful NOT to choose F1 through F7 as your Hearsay Key, since these function keys are used by the Hearsay menus. Also, do not pick a Hearsay Key which might be used in a speech recognition command sequence. Once selected, the new Hearsay Key is the key you must use to enter the Hearsay Main Menu.

F5 - INITIALIZE HEARSAY
The Initialize command resets Hearsay to its default values and clears the Hearsay memory. Pressing [F5] restores the default Voice, Pitch, and Speed of the Screen Echo, Keyboard Echo, and Voice Commands, and erases all voice commands in memory.

F6 - UNHOOK HEARSAY
When you are through using Hearsay, you can reclaim all the RAM reserved for it by pressing [F6]. When Hearsay is "Unhooked", the memory it was using is freed for other uses. This removes all "Hooks" to your MS-DOS system, restoring your computer to the state it was in before Hearsay was installed. Another way to unhook is with the runtime command HEARSAY/X. To use Hearsay again after unhooking, you must re-install it by running the HEARSAY, SPEECH, and EDITOR programs.