Natural Conversations

HyperSkill bridges the gap between traditional learning and interactive experiences with its Speech-to-Text (ASR) and Text-to-Speech (TTS) features. Let's explore how these functionalities can elevate your VR/AR/Web/Desktop simulations.

Speech-to-Text Transcription (ASR): Capture Learner Speech

  • Dialogue Recording: HyperSkill's ASR feature automatically captures and transcribes user speech within your simulations. The captured speech can be used to branch dialogue through trigger utterances or to drive roleplay through Chit-Chat.

  • Analyze Trainee Communication: All recorded dialogue will appear in the HyperSkill dashboard for future review. Authors can evaluate responses, identify areas for improvement, or assess communication styles at any time.

In addition to spoken dialogue, HyperSkill also supports text-based input on desktop and web. For more information, visit the Desktop page.

Activating Speech-to-Text in HyperSkill

To enable Speech-to-Text (ASR) and capture spoken dialogue within your simulations, follow these steps:

  1. Enter Edit Mode: In HyperSkill Desktop or Web, navigate to your simulations list and enter edit mode for the simulation you want to add ASR to.

  2. Access Settings: Locate the settings menu in the tab bar.

  3. Enable Microphone Input: In the settings menu, find the microphone options labeled "Microphone" and make sure the "Enable Microphone Input" and "Microphone Always Listening" toggles are switched on.

Once you've enabled these settings, HyperSkill will be ready to capture spoken dialogue within your simulations using the device's microphone.

Text-to-Speech Generation (TTS): Breathe Life into Virtual Characters

  • Voice Customization: HyperSkill offers a variety of voice options to choose from. Select a voice that best suits the character's personality, gender, and the overall tone of your simulation. (Free plan limitations apply)

Using Text-to-Speech in HyperSkill:

Text-to-Speech functionality is integrated with various HyperSkill state actions. These actions allow you to trigger speech generation at specific points within your simulation. For example, you could use a Text-to-Speech action to:

  • Have a virtual instructor deliver introductory remarks.

  • Make characters react to learner choices with spoken feedback.

  • Provide audio cues and instructions throughout the simulation.

Supported Text-to-Speech Technologies:

HyperSkill offers two TTS options depending on your subscription plan:

  • Google Text-to-Speech: This service is available to all users and provides high-quality, natural-sounding voices in English, Spanish, and Hindi.

  • ElevenLabs Text-to-Speech (Paid Plans): Upgrade your plan to access ElevenLabs, which offers high-quality voices across multiple languages.

Configuring Text-to-Speech Settings:

You can define the Text-to-Speech engine (Google or ElevenLabs) and the spoken language within your simulation settings. Navigate to Settings > Conversational AI in edit mode to access these options.

Fine-tuning Speech with SSML:

HyperSkill's TTS supports Speech Synthesis Markup Language (SSML). This allows you to add specific instructions for the TTS engine, further customizing the generated speech. With SSML, you can control aspects like:

  • Speech rate: Adjust the speed of the narration to match your simulation's pace.

  • Pitch: Modify the character's vocal pitch to create a more distinct personality.

  • Emphasis: Highlight specific words or phrases for dramatic effect.

  • Pauses: Introduce pauses for a more natural flow of conversation.

For the full list of supported SSML elements, see Google SSML elements. You do not need to add the <speak> tag.
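
For illustration, the short snippet below combines several of these elements using markup from Google's SSML reference; the wording and the rate, pitch, and pause values are placeholders, so adjust them to suit your own characters. The markup (without a <speak> tag) goes into the text of a Text-to-Speech action:

```xml
<prosody rate="95%" pitch="+2st">
  Welcome to the assembly line induction.
  <emphasis level="strong">Always</emphasis> wear your protective equipment.
  <break time="600ms"/>
  We will begin with the first station.
</prosody>
```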

Voice options in HyperSkill:

You can choose from any of the following voices:

  • Matthew

  • Justin

  • Joey

  • Salli

  • Kimberly

  • Kendra

  • Joanna

  • Ivy

  • Brian

  • Amy

  • Emma

  • Aditi

  • Raveena

  • Russell

  • Nicole

When selecting voices for ElevenLabs in HyperSkill, you may notice duplicates. Because ElevenLabs offers fewer voice options than Google, several entries in the voice list map to the same underlying ElevenLabs voice. The following voices are duplicates of one another:

  • Kendra, Joanna, Ivy, Amy, Emma, Aditi, and Raveena

  • Joey and Russell

Tips and Common Errors

  • Experiment with different voice options to find the perfect fit for your characters. (ElevenLabs plan limitations apply)

  • Preview your TTS implementation to confirm the audio is generated correctly at experience time. Text-to-speech can sometimes break on the apostrophe ' character, typically when text was pasted into HyperSkill rather than typed (pasted text may contain a typographic apostrophe instead of the straight ' character). To fix the issue, delete and retype the apostrophe.

  • Consider the pacing and intonation of the generated speech for optimal impact.

  • Use SSML tags to fine-tune the speech and create a more nuanced performance by your virtual characters.
