Overview
- Text-To-Speech (TTS) system
- Input
- Plain text
- Speech Synthesis Markup Language (SSML)
- Available Voices
- 15+ languages
- Output
- MP3
- Ogg
- PCM (e.g. for IoT devices or telephony)
- Use cases
- TBD
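
As a minimal sketch of the SSML input option, the snippet below wraps plain text in a <speak> root and appends a named <mark> tag using only the Python standard library. The helper name build_ssml and the mark name are hypothetical. The notes do not say which TTS service or SDK is used, so no service call is shown.

```python
import xml.etree.ElementTree as ET


def build_ssml(sentence: str, mark_name: str) -> str:
    """Wrap plain text in <speak> and append a named <mark> tag.

    build_ssml is a hypothetical helper, not part of any TTS SDK.
    """
    speak = ET.Element("speak")
    # ElementTree escapes characters such as & and < in the text for us.
    speak.text = sentence
    ET.SubElement(speak, "mark", {"name": mark_name})
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Hello, world.", "end_of_greeting")

# Parsing the string back is a cheap check that it is well-formed XML.
root = ET.fromstring(ssml)
```

Building the document with an XML library instead of string concatenation avoids malformed SSML when the input text contains reserved characters.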
Lexicon
- Pronunciation lexicon ("dictionary")
- Use cases
- Stylized text ("h4ck3r")
- Acronyms
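
The idea behind a lexicon can be sketched as a lookup applied to the text before synthesis. This is a simplified illustration only: real pronunciation lexicons are usually expressed in an XML format and applied by the TTS engine itself, and the LEXICON entries and the apply_lexicon helper below are invented for the example.

```python
import re

# Hypothetical lexicon: written form -> spoken alias.
LEXICON = {
    "h4ck3r": "hacker",
    "W3C": "World Wide Web Consortium",
}


def apply_lexicon(text: str, lexicon: dict[str, str]) -> str:
    """Replace whole-word lexicon entries with their spoken alias."""
    if not lexicon:
        return text
    # Try longer entries first so overlapping entries prefer the longer match.
    keys = sorted(lexicon, key=len, reverse=True)
    # The lookarounds keep "h4ck3r" from matching inside "h4ck3rs".
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(map(re.escape, keys)) + r")(?!\w)"
    )
    return pattern.sub(lambda m: lexicon[m.group(1)], text)


spoken = apply_lexicon("A h4ck3r read the W3C spec.", LEXICON)
```

The match is case-sensitive here, which suits acronyms; a real lexicon may behave differently.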
Speech Mark
- Metadata that describes the synthesized speech
- For example, where a word or sentence starts and ends
- Types
- sentence
- word
- viseme
- ssml ("<mark>")
- Use case
- Lip-sync
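
A rough sketch of the lip-sync use case follows. It assumes speech marks arrive as newline-delimited JSON objects with time (milliseconds from the start of the audio), type, and value fields, which is how Amazon Polly emits them; the notes do not name the service, so treat the format as an assumption. The sample data, the viseme values, and the helper names are invented for illustration.

```python
import json

# Invented sample stream; "time" is in milliseconds from the start of audio.
SAMPLE = """\
{"time": 0, "type": "sentence", "start": 0, "end": 9, "value": "Hi there."}
{"time": 0, "type": "word", "start": 0, "end": 2, "value": "Hi"}
{"time": 50, "type": "viseme", "value": "k"}
{"time": 120, "type": "viseme", "value": "a"}
{"time": 210, "type": "word", "start": 3, "end": 8, "value": "there"}
{"time": 215, "type": "viseme", "value": "T"}
{"time": 400, "type": "viseme", "value": "sil"}
"""


def parse_marks(raw: str) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def viseme_at(marks: list[dict], t_ms: int):
    """Return the latest viseme at or before t_ms, or None if there is none.

    Assumes the marks are already sorted by time.
    """
    current = None
    for mark in marks:
        if mark["type"] != "viseme":
            continue
        if mark["time"] > t_ms:
            break
        current = mark["value"]
    return current


marks = parse_marks(SAMPLE)
mouth_shape = viseme_at(marks, 130)
```

An animation loop could call viseme_at with the current playback position on every frame and switch the displayed mouth shape when the returned value changes.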
Phoneme
- Basic unit of sound from which words are formed
Viseme
- Represents the position of the face and mouth when a sound is spoken
- Visual counterpart of a phoneme
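
The relationship between the two units can be sketched as a lookup table. Several phonemes often share one mouth shape (for example, "p", "b" and "m" all close the lips), so the mapping is many-to-one. The table below is a tiny hypothetical subset with invented viseme labels, not the mapping of any particular TTS engine.

```python
# Hypothetical subset of a phoneme -> viseme table.
PHONEME_TO_VISEME = {
    "p": "p", "b": "p", "m": "p",  # lips closed
    "f": "f", "v": "f",            # lower lip against upper teeth
}


def to_visemes(phonemes: list[str]) -> list[str]:
    """Map phonemes to visemes, merging consecutive repeats."""
    visemes: list[str] = []
    for phoneme in phonemes:
        # Unknown phonemes map to "?" in this sketch.
        viseme = PHONEME_TO_VISEME.get(phoneme, "?")
        if not visemes or visemes[-1] != viseme:
            visemes.append(viseme)
    return visemes


shapes = to_visemes(["b", "m", "f", "v", "p"])
```

Merging consecutive repeats reflects that the mouth does not visibly change between two phonemes that share a viseme.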