Thursday, 1 March 2018

Amazon Polly

Overview
  • Text-To-Speech (TTS) system
  • Input
    • Plain text
    • Speech Synthesis Markup Language (SSML)
  • Available Voices
    • 15+ languages
  • Output
    • MP3
    • Ogg Vorbis
    • PCM (IoT or telephony)
  • Use cases
    • TBD
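
As a minimal sketch of the inputs and outputs listed above, the helper below assembles the arguments for a Polly synthesize_speech request made through boto3. The helper names (build_request, synthesize_to_file), the voice "Joanna", and the "<speak" check used to detect SSML are illustrative assumptions, not part of the original notes. The network call is kept in a separate function so the argument-building part runs without AWS credentials.

```python
# Audio values accepted by the OutputFormat parameter.
AUDIO_FORMATS = {"mp3", "ogg_vorbis", "pcm"}


def build_request(text, voice_id="Joanna", output_format="mp3"):
    """Return keyword arguments for a Polly synthesize_speech call.

    Hypothetical helper: input starting with "<speak" is sent as SSML,
    anything else as plain text.
    """
    if output_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported audio format: {output_format}")
    text_type = "ssml" if text.lstrip().startswith("<speak") else "text"
    return {
        "Text": text,
        "TextType": text_type,
        "VoiceId": voice_id,
        "OutputFormat": output_format,
    }


def synthesize_to_file(path, **request):
    """Call Polly and write the audio stream to path (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the rest of the sketch runs without it

    response = boto3.client("polly").synthesize_speech(**request)
    with open(path, "wb") as out:
        out.write(response["AudioStream"].read())
```

With credentials configured, a call such as synthesize_to_file("hello.mp3", **build_request("Hello")) would write the synthesized audio to disk.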

Lexicon
  • Pronunciation lexicon ("dictionary")
  • Use cases
    • Stylized text ("h4ck3r")
    • Acronyms
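
To make the two use cases concrete, here is a small sketch that builds a lexicon in the W3C Pronunciation Lexicon Specification (PLS) format, which is the format Polly lexicons use. The function name build_lexicon and the sample entries are assumptions chosen for illustration; each entry maps a written form (grapheme) to the text that should be spoken instead (alias).

```python
from xml.sax.saxutils import escape

PLS_NAMESPACE = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(aliases, lang="en-US"):
    """Build a PLS document mapping each written form to a spoken alias."""
    lexemes = "".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(written)}</grapheme>\n"
        f"    <alias>{escape(spoken)}</alias>\n"
        "  </lexeme>\n"
        for written, spoken in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NAMESPACE}" '
        f'alphabet="ipa" xml:lang="{lang}">\n'
        f"{lexemes}"
        "</lexicon>\n"
    )


# Illustrative entries: one stylized word and one acronym.
lexicon_xml = build_lexicon({"h4ck3r": "hacker", "W3C": "World Wide Web Consortium"})
```

The resulting string could then be uploaded with the boto3 put_lexicon call (Name and Content parameters) and referenced by name in the LexiconNames parameter of a synthesis request.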

Speech Mark
  • Metadata describing the synthesized speech
    • Where each word or sentence starts and ends
  • Types
    • sentence
    • word
    • viseme
    • ssml ("<mark>")
  • Use case
    • Lip-sync
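
Speech marks are requested by setting OutputFormat to "json" and listing the wanted types in SpeechMarkTypes; the response is a stream with one JSON object per line. The sketch below parses such a stream. The sample lines are illustrative rather than captured Polly output, and parse_speech_marks is a hypothetical helper name.

```python
import json


def parse_speech_marks(raw):
    """Parse a speech-mark stream: one JSON object per line."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


# Illustrative sample, not real Polly output. "time" is in milliseconds from
# the start of the audio; "start" and "end" are offsets into the input text.
sample = "\n".join([
    '{"time":0,"type":"sentence","start":0,"end":23,"value":"Mary had a little lamb."}',
    '{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}',
    '{"time":6,"type":"viseme","value":"p"}',
    '{"time":373,"type":"word","start":5,"end":8,"value":"had"}',
])

marks = parse_speech_marks(sample)
# Keep only the word marks, as (timestamp, word) pairs.
words = [(m["time"], m["value"]) for m in marks if m["type"] == "word"]
```

Filtering by the "type" field, as done for words here, is one simple way to drive features such as highlighting text while it is being spoken.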

Phoneme
  • Basic acoustic unit from which words are formed

Viseme
  • Represents the position of the face and mouth when a sound is spoken
  • Visual counterpart of a phoneme
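
For lip-sync, an animation needs to know how long to hold each mouth shape. A rough sketch, assuming viseme speech marks have already been decoded into dictionaries and that the total audio length is known: each viseme is held until the next one begins. The function name viseme_timeline, the sample marks, and the 300 ms audio length are all hypothetical.

```python
def viseme_timeline(marks, audio_length_ms):
    """Return (viseme, start_ms, duration_ms) tuples from parsed speech marks."""
    visemes = [m for m in marks if m["type"] == "viseme"]
    timeline = []
    for i, mark in enumerate(visemes):
        # Hold each shape until the next viseme begins, or until the audio ends.
        next_time = visemes[i + 1]["time"] if i + 1 < len(visemes) else audio_length_ms
        timeline.append((mark["value"], mark["time"], next_time - mark["time"]))
    return timeline


# Hypothetical marks, already decoded from JSON.
marks = [
    {"time": 6, "type": "viseme", "value": "p"},
    {"time": 73, "type": "viseme", "value": "E"},
    {"time": 180, "type": "viseme", "value": "r"},
    {"time": 250, "type": "viseme", "value": "sil"},
]
timeline = viseme_timeline(marks, audio_length_ms=300)
```

Mapping each viseme value to an actual mouth image or blend shape is left to the rendering side and is not covered here.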
