Notes on AWS, Big Data, Machine Learning and Leadership: Amazon Polly

Thursday, 1 March 2018

Amazon Polly

Overview

Text-To-Speech (TTS) system
Input
- Plain text
- Speech Synthesis Markup Language (SSML)
Available Voices
- 15+ languages
Output
- MP3
- Ogg Vorbis
- PCM (IoT or telephony)
Use cases
- TBD
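
A minimal sketch of how the inputs and output formats above might map onto a synthesis request. The helper `build_request` and its SSML detection rule are hypothetical; the keyword names (`Text`, `TextType`, `OutputFormat`, `VoiceId`) follow boto3's `synthesize_speech` call, which is left commented out so the snippet runs without AWS credentials.

```python
# Hypothetical helper: assembles keyword arguments for a Polly request.
ALLOWED_FORMATS = {"mp3", "ogg_vorbis", "pcm"}


def build_request(text, voice_id="Joanna", output_format="mp3"):
    """Return a dict of request parameters for plain text or SSML input."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    # Assumption: treat anything that starts with <speak as SSML.
    is_ssml = text.lstrip().startswith("<speak")
    return {
        "Text": text,
        "TextType": "ssml" if is_ssml else "text",
        "OutputFormat": output_format,
        "VoiceId": voice_id,
    }


plain_request = build_request("Hello from Polly.")
ssml_request = build_request(
    "<speak>Hello <break time='300ms'/> from Polly.</speak>",
    output_format="pcm",
)

# With AWS credentials configured, the request could be sent like this:
# import boto3
# response = boto3.client("polly").synthesize_speech(**plain_request)
# audio_bytes = response["AudioStream"].read()
```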

Lexicon

Pronunciation lexicon ("dictionary")
Use cases
- Stylized text ("h4ck3r")
- Acronyms
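
A sketch of what such a lexicon could look like, assuming the W3C Pronunciation Lexicon Specification (PLS) format with `<alias>` entries. The helper `build_lexicon` and the example entries are made up for illustration. In boto3 the resulting XML would be uploaded with `put_lexicon` and referenced through the `LexiconNames` parameter of `synthesize_speech`; both calls are left commented out.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(aliases, lang="en-US"):
    """Build a PLS document mapping written forms to spoken aliases."""
    lexemes = "".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(written)}</grapheme>\n"
        f"    <alias>{escape(spoken)}</alias>\n"
        "  </lexeme>\n"
        for written, spoken in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NS}" '
        f'alphabet="ipa" xml:lang="{lang}">\n'
        f"{lexemes}"
        "</lexicon>\n"
    )


# Illustrative entries: stylized text and an acronym.
lexicon_xml = build_lexicon({"h4ck3r": "hacker", "AWS": "Amazon Web Services"})

# Sanity check: the generated document is well-formed XML.
root = ET.fromstring(lexicon_xml.encode("utf-8"))
ns = {"pls": PLS_NS}
graphemes = [g.text for g in root.findall("pls:lexeme/pls:grapheme", ns)]
spoken_forms = [a.text for a in root.findall("pls:lexeme/pls:alias", ns)]

# Upload and use (requires AWS credentials):
# import boto3
# polly = boto3.client("polly")
# polly.put_lexicon(Name="notes", Content=lexicon_xml)
# polly.synthesize_speech(Text="AWS h4ck3r", OutputFormat="mp3",
#                         VoiceId="Joanna", LexiconNames=["notes"])
```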

Speech Mark

Metadata that describes the synthesized speech
- e.g. where each word or sentence starts and ends in the audio
Types
- sentence
- word
- viseme
- ssml (position of a "<mark>" tag in the SSML input)
Use case
- Lip-sync
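
A sketch of reading speech marks for lip-sync. Polly returns speech marks as a stream of line-delimited JSON objects when the request uses `OutputFormat="json"` together with `SpeechMarkTypes`. The sample data below is made up for illustration (the times and viseme values are not real Polly output), and the helper names are hypothetical.

```python
import json

# Illustrative sample only: one JSON object per line, time in milliseconds.
SAMPLE_MARKS = """\
{"time":0,"type":"sentence","start":0,"end":9,"value":"Hi Polly."}
{"time":6,"type":"word","start":0,"end":2,"value":"Hi"}
{"time":10,"type":"viseme","value":"k"}
{"time":60,"type":"viseme","value":"a"}
{"time":180,"type":"word","start":3,"end":8,"value":"Polly"}
{"time":185,"type":"viseme","value":"p"}
{"time":520,"type":"viseme","value":"sil"}
"""


def parse_speech_marks(raw):
    """Parse line-delimited JSON speech marks into a list of dicts."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def viseme_timeline(marks):
    """Keep only viseme marks as (time_ms, viseme) pairs for animation."""
    return [(m["time"], m["value"]) for m in marks if m["type"] == "viseme"]


marks = parse_speech_marks(SAMPLE_MARKS)
timeline = viseme_timeline(marks)
words = [m["value"] for m in marks if m["type"] == "word"]

# Requesting real speech marks (requires AWS credentials):
# import boto3
# response = boto3.client("polly").synthesize_speech(
#     Text="Hi Polly.", VoiceId="Joanna", OutputFormat="json",
#     SpeechMarkTypes=["sentence", "word", "viseme"])
# marks = parse_speech_marks(response["AudioStream"].read().decode("utf-8"))
```

An animation loop could then walk `timeline` and switch the mouth shape whenever playback passes the next timestamp.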

Phoneme

Basic unit of sound from which words are formed
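
As a rough illustration, SSML lets the input spell out a pronunciation phoneme by phoneme with the `<phoneme>` tag. The helper `phoneme_ssml` below is hypothetical, and the IPA transcription of "pecan" is one example pronunciation.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme_ssml(word, ipa):
    """Wrap a word in an SSML <phoneme> tag with an IPA transcription."""
    return (
        "<speak>"
        f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'
        "</speak>"
    )


# The resulting string would be sent as SSML input (TextType="ssml").
ssml = phoneme_ssml("pecan", "pɪˈkɑːn")
```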

Viseme

Represents the position of the face and mouth when a sound is spoken
Visual counterpart of a phoneme
