
Using the Speech-to-Text API with Node.js

1. Overview

The Google Cloud Speech-to-Text API enables developers to convert audio to text in 120 languages and variants by applying powerful neural network models through an easy-to-use API.

In this codelab, you will focus on using the Speech-to-Text API with Node.js. You will learn how to send an audio file in English and other languages to the Cloud Speech-to-Text API for transcription.

What you'll learn

  • How to enable the Speech-to-Text API
  • How to authenticate API requests
  • How to install the Google Cloud client library for Node.js
  • How to transcribe audio files in English
  • How to transcribe audio files with word timestamps
  • How to transcribe audio files in different languages

What you'll need

  • A Google Cloud Platform Project
  • A browser, such as Chrome or Firefox
  • Familiarity with JavaScript/Node.js

2. Setup and requirements

Self-paced environment setup

  • Sign in to the Cloud Console and create a new project or reuse an existing one. (If you don't already have a Gmail or G Suite account, you must create one.)


Remember the project ID, a name that is unique across all Google Cloud projects. It will be referred to later in this codelab as PROJECT_ID.

  • Next, you'll need to enable billing in Cloud Console in order to use Google Cloud resources.

Running through this codelab shouldn't cost much, if anything at all. Be sure to follow any instructions in the "Cleaning up" section, which advises you how to shut down resources so you don't incur billing beyond this tutorial. New users of Google Cloud are eligible for the $300 USD Free Trial program.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Google Cloud Shell, a command-line environment running in the Cloud.

Activate Cloud Shell


If you've never started Cloud Shell before, you'll be presented with a one-time intermediate screen describing what it is. If that's the case, click Continue (and you won't ever see it again).


It should only take a few moments to provision and connect to Cloud Shell.


This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this codelab can be done with just a browser or a Chromebook.

Once connected to Cloud Shell, you should see that you are already authenticated and that the project is already set to your project ID.

  • Run the following command in Cloud Shell to confirm that you are authenticated:


If the project is not set, you can set it with this command:

3. Enable the Speech-to-Text API

Before you can begin using the Speech-to-Text API, you must enable the API. You can enable the API by using the following command in the Cloud Shell:

4. Authenticate API requests

In order to make requests to the Speech-to-Text API, you need to use a Service Account. A service account belongs to your project, and the Google Cloud client library for Node.js uses it to make Speech-to-Text API requests. Like any other user account, a service account is represented by an email address. In this section, you will use the Cloud SDK to create a service account and then create the credentials you need to authenticate as that service account.

First, set an environment variable with your PROJECT_ID, which you will use throughout this codelab. If you are using Cloud Shell, this is set for you:

Next, create a new service account to access the Speech-to-Text API by using:

Next, create the credentials that your Node.js code will use to log in as your new service account. Create these credentials and save them as a JSON file, ~/key.json, by using the following command:

Finally, set the GOOGLE_APPLICATION_CREDENTIALS environment variable. The Speech-to-Text client library for Node.js, covered in the next step, uses this variable to find your credentials. Set it to the full path of the credentials JSON file you created:

You can read more about authenticating the Speech-to-Text API.
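The client library relies on GOOGLE_APPLICATION_CREDENTIALS to find your credentials. A small startup check can therefore turn a missing or mistyped path into a clear error before any API call is made. The following is a minimal sketch under that assumption. resolveCredentialsPath is a hypothetical helper, not part of the Speech-to-Text library. The exists parameter is there only so the check can be tried without a real key file.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper (not part of @google-cloud/speech): returns the
// credentials path from the environment, or throws a descriptive error.
// `exists` defaults to fs.existsSync and can be swapped out for testing.
function resolveCredentialsPath(env = process.env, exists = fs.existsSync) {
  const value = env.GOOGLE_APPLICATION_CREDENTIALS;
  if (!value) {
    throw new Error("GOOGLE_APPLICATION_CREDENTIALS is not set");
  }
  // The variable should hold the full path, so reject relative paths.
  if (!path.isAbsolute(value)) {
    throw new Error(`Expected an absolute path, got: ${value}`);
  }
  if (!exists(value)) {
    throw new Error(`Credentials file not found: ${value}`);
  }
  return value;
}
```

Call resolveCredentialsPath() once at the top of index.js, before the client is created. The check rejects a literal "~/key.json" (for example, one set inside quotes), because only the shell expands the tilde.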

5. Install the Google Cloud Speech-to-Text API client library for Node.js

First, create a project that you will use to run this Speech-to-Text API lab by initializing a new Node.js package in a folder of your choice:

NPM asks several questions about the project configuration, such as name and version. For each question, press ENTER to accept the default values. The default entry point is a file named index.js.

Next, install the Google Cloud Speech library into the project:

For more instructions on how to set up a Node.js development environment for Google Cloud, see the Setup Guide.

Now you're ready to use the Speech-to-Text API!

6. Transcribe Audio Files

In this section, you will transcribe a pre-recorded audio file in English. The audio file is available on Google Cloud Storage.

Open the index.js file in the folder you created and replace its code with the following:

Take a minute or two to study the code and see how it is used to transcribe an audio file.

The encoding parameter tells the API which type of audio encoding the audio file uses. A .raw file contains headerless audio, so its encoding (for example, LINEAR16 for uncompressed 16-bit PCM) and its sample rate must be specified explicitly (see the doc for encoding type for more details).

In the RecognitionAudio object, you can pass the API either the uri of the audio file in Cloud Storage or the audio data itself as content (for example, the bytes of a local file). Here, we're using a Cloud Storage uri.
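The request described above can be sketched as two small functions. One builds the audio and config objects, and the other sends them with recognize() and joins the transcripts. This is a sketch, not the codelab's exact index.js. buildRecognizeRequest and transcribe are hypothetical names. The client is passed in rather than created inside, so the logic can be tried with a stand-in object. Local files are sent as base64 content, because RecognitionAudio takes either uri or content.

```javascript
const fs = require("fs");

// Hypothetical helper: builds a request for SpeechClient.recognize().
// Exactly one audio source is used: a Cloud Storage URI (gs://...) sent as
// `uri`, or a local file whose bytes are sent base64-encoded as `content`.
function buildRecognizeRequest({ gcsUri, localPath, encoding, sampleRateHertz, languageCode = "en-US" }) {
  if (Boolean(gcsUri) === Boolean(localPath)) {
    throw new Error("Provide exactly one of gcsUri or localPath");
  }
  const audio = gcsUri
    ? { uri: gcsUri }
    : { content: fs.readFileSync(localPath).toString("base64") };
  const config = { encoding, languageCode };
  // Headerless audio such as .raw needs an explicit sample rate.
  if (sampleRateHertz !== undefined) {
    config.sampleRateHertz = sampleRateHertz;
  }
  return { audio, config };
}

// Sends the request and joins the top transcript of each result.
// `client` is expected to be a SpeechClient from @google-cloud/speech;
// recognize() resolves to an array whose first element is the response.
async function transcribe(client, request) {
  const [response] = await client.recognize(request);
  return (response.results || [])
    .map((result) => result.alternatives[0].transcript)
    .join("\n");
}
```

To use it in index.js:

1. Create the client with `new speech.SpeechClient()`, after `const speech = require('@google-cloud/speech')`.
2. Call `transcribe(client, buildRecognizeRequest({ gcsUri: 'gs://YOUR_BUCKET/audio.raw', encoding: 'LINEAR16', sampleRateHertz: 16000 }))`.

The bucket path, encoding and sample rate are placeholders; match them to your actual file.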

Run the program:

You should see the following output:

7. Transcribe with word timestamps

Speech-to-Text can detect time offsets (timestamps) for the transcribed audio. Time offsets show the beginning and end of each spoken word in the supplied audio. A time offset value represents the amount of time that has elapsed from the beginning of the audio, in increments of 100ms.

Take a minute or two to study the code and see how it is used to transcribe an audio file with word timestamps. The enableWordTimeOffsets parameter tells the API to enable time offsets (see the doc for more details).
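When enableWordTimeOffsets is true, each word in the top alternative carries startTime and endTime values. These are shaped like protobuf Duration objects, with seconds plus nanos. The sketch below assumes that shape and converts each offset to fractional seconds. The helper names are hypothetical. The Long-object branch is a defensive assumption about how 64-bit seconds may be decoded.

```javascript
// Converts a protobuf Duration ({ seconds, nanos }) to fractional seconds.
// `seconds` may be a number, a string, or a Long-like object, depending on
// how the client library decodes 64-bit integers (an assumption here).
function durationToSeconds(duration) {
  if (!duration) return 0;
  const { seconds = 0, nanos = 0 } = duration;
  const whole =
    seconds !== null && typeof seconds === "object" && typeof seconds.toNumber === "function"
      ? seconds.toNumber()
      : Number(seconds);
  return whole + (nanos || 0) / 1e9;
}

// Returns a copy of a recognition config with word time offsets enabled.
function withWordTimeOffsets(config) {
  return { ...config, enableWordTimeOffsets: true };
}

// Lists each word of the top alternative with its start and end time.
function wordTimings(response) {
  const timings = [];
  for (const result of response.results || []) {
    for (const info of result.alternatives[0].words || []) {
      timings.push({
        word: info.word,
        start: durationToSeconds(info.startTime),
        end: durationToSeconds(info.endTime),
      });
    }
  }
  return timings;
}
```

Pass the response from `client.recognize({ audio, config: withWordTimeOffsets(config) })` to wordTimings. It returns one `{ word, start, end }` entry per recognized word.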

Run your program again:

8. Transcribe different languages

The Speech-to-Text API supports transcription in over 100 languages! You can find a list of supported languages here.

In this section, you will transcribe a pre-recorded audio file in French. The audio file is available on Google Cloud Storage.
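Switching languages only requires a different BCP-47 languageCode in the config, such as fr-FR for French. This is a minimal sketch, assuming the same request shape as the English example. The gs:// path below is a hypothetical placeholder, not the codelab's French sample. FLAC is assumed because a FLAC file's header carries its sample rate.

```javascript
// Hypothetical helper: returns a copy of a recognize request that uses a
// different BCP-47 language code, leaving the original request untouched.
function forLanguage(request, languageCode) {
  return { ...request, config: { ...request.config, languageCode } };
}

// Placeholder URI: point this at the French audio file you want to transcribe.
// FLAC files carry their sample rate in the header, so it is omitted here.
const frenchRequest = forLanguage(
  {
    audio: { uri: "gs://YOUR_BUCKET/french-sample.flac" },
    config: { encoding: "FLAC", languageCode: "en-US" },
  },
  "fr-FR"
);
```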

Run your program again and you should see the following output:

This is a sentence from a popular French children's tale.

For the full list of supported languages and language codes, see the documentation here.

9. Congratulations!

You learned how to use the Speech-to-Text API with Node.js to perform different kinds of transcription on audio files!

To avoid incurring charges to your Google Cloud Platform account for the resources used in this quickstart:

  • Go to the Cloud Platform Console.
  • Select the project you want to shut down, then click 'Delete' at the top: this schedules the project for deletion.

Learn more:

  • Google Cloud Speech-to-Text API: https://cloud.google.com/speech-to-text/docs
  • Node.js on Google Cloud Platform: https://cloud.google.com/nodejs/
  • Google Cloud Node.js client: https://googlecloudplatform.github.io/google-cloud-node/

This work is licensed under a Creative Commons Attribution 2.0 Generic License.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.


Google Cloud Text-to-Speech: Node.js Client

Cloud Text-to-Speech API client for Node.js

A comprehensive list of changes in each version may be found in the CHANGELOG .

  • Google Cloud Text-to-Speech Node.js Client API Reference
  • Google Cloud Text-to-Speech Documentation
  • github.com/googleapis/google-cloud-node/packages/google-cloud-texttospeech

Read more about the client libraries for Cloud APIs, including the older Google APIs Client Libraries, in Client Libraries Explained.

Table of contents:

  • Before you begin
  • Installing the client library
  • Using the client library
  • Contributing

Before you begin

  • Select or create a Cloud Platform project.
  • Enable billing for your project.
  • Enable the Google Cloud Text-to-Speech API.
  • Set up authentication with a service account so you can access the API from your local workstation.

Samples are in the samples/ directory. Each sample's README.md has instructions for running its sample.

Sample Source Code Try it
Text_to_speech.list_voices
Text_to_speech.streaming_synthesize
Text_to_speech.synthesize_speech
Text_to_speech_long_audio_synthesize.synthesize_long_audio
Quickstart

The Google Cloud Text-to-Speech Node.js Client API Reference documentation also contains samples.

Supported Node.js Versions

Our client libraries follow the Node.js release schedule. Libraries are compatible with all current active and maintenance versions of Node.js. If you are using an end-of-life version of Node.js, we recommend that you update as soon as possible to an actively supported LTS version.

Google's client libraries support legacy versions of Node.js runtimes on a best-efforts basis with the following warnings:

  • Legacy versions are not tested in continuous integration.
  • Some security patches and features cannot be backported.
  • Dependencies cannot be kept up-to-date.

Client libraries targeting some end-of-life versions of Node.js are available, and can be installed through npm dist-tags. The dist-tags follow the naming convention legacy-(version). For example, npm install @google-cloud/text-to-speech@legacy-8 installs client libraries for versions compatible with Node.js 8.

This library follows Semantic Versioning.

This library is considered to be stable. The code surface will not change in backwards-incompatible ways unless absolutely necessary (e.g. because of critical security issues) or with an extensive deprecation period. Issues and requests against stable libraries are addressed with the highest priority.

More Information: Google Cloud Platform Launch Stages

Contributions welcome! See the Contributing Guide.

Please note that this README.md, the samples/README.md, and a variety of configuration files in this repository (including .nycrc and tsconfig.json) are generated from a central template. To edit one of these files, make an edit to its templates in directory .

Apache Version 2.0

See LICENSE

Node.js - Google Cloud Text-to-Speech API Examples


For various reasons, you may need to convert text into an audio file. So-called text-to-speech technology lets you do that. Developing your own text-to-speech technology takes a long time and is not easy, so the easiest solution is to use a service, with the drawback of having to pay for it.

One such text-to-speech service is provided by Google, and it is known for good results. Google also provides an API, which makes it easy to integrate with your application. In this tutorial, I'm going to show you basic example usages of the Google Text-to-Speech API in Node.js, from the preparation to the code.

Preparation

1. Create or select a Google Cloud project

A Google Cloud project is required to use this service. Open the Google Cloud console, then create a new project or select an existing one.

2. Enable billing for the project

Like other cloud platforms, Google requires you to enable billing for your project. If you haven't set up billing, open the billing page.

3. Enable Google Text-to-Speech API

To use an API, you must enable it first. Open this page to enable the Text-to-Speech API.

4. Set up service account for authentication

For authentication, you need a service account. Create one on the service account management page and download its credentials, or use a service account you have already created.

In your .env file, add a new variable:

The .env file must be loaded, of course, so you need a module that reads .env files, such as dotenv.

Dependencies

This tutorial uses @google-cloud/text-to-speech. Add the following dependency to your package.json and run npm install:

1. Synthesize Speech

The example below shows basic speech synthesis. You need to provide the text to synthesize, the voice selection, and the audio configuration, including the audio encoding. If the request succeeds, the response contains audioContent, which you can then write to a file.
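Here is a minimal sketch of that flow. It assumes the TextToSpeechClient from @google-cloud/text-to-speech, whose synthesizeSpeech() resolves to an array whose first element holds audioContent. The helper names, the NEUTRAL gender and the MP3 encoding are illustrative choices. The client is passed in so the function can be tried with a stand-in.

```javascript
const fs = require("fs");

// Hypothetical helper: builds a synthesize request with the three parts
// described above: text input, voice selection, and audio configuration.
function buildSynthesizeRequest(text, { languageCode = "en-US", ssmlGender = "NEUTRAL", audioEncoding = "MP3" } = {}) {
  return {
    input: { text },
    voice: { languageCode, ssmlGender },
    audioConfig: { audioEncoding },
  };
}

// Calls synthesizeSpeech() on a TextToSpeechClient and writes the returned
// audioContent (binary audio data) to outPath.
async function synthesizeToFile(client, text, outPath, options) {
  const [response] = await client.synthesizeSpeech(buildSynthesizeRequest(text, options));
  await fs.promises.writeFile(outPath, response.audioContent, "binary");
  return outPath;
}
```

In a real script, create the client with `new textToSpeech.TextToSpeechClient()`, after `const textToSpeech = require('@google-cloud/text-to-speech')`. Load your .env file first so the credentials variable is set.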

2. List Voices

The example below gets the list of voices supported by the Google Text-to-Speech service. Run it to get the latest list.
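Each voice object lists its languageCodes as an array. One way to turn the listVoices() response into rows like the table below is therefore to emit one row per language code of each voice. This sketch assumes the response fields name, languageCodes, ssmlGender and naturalSampleRateHertz. listVoiceRows is a hypothetical helper. The optional languageCode argument is passed through as the request's filter.

```javascript
// Hypothetical helper: calls listVoices() on a TextToSpeechClient and
// flattens the result into one row per (language code, voice) pair.
async function listVoiceRows(client, languageCode) {
  // An optional languageCode narrows the list to voices for that language.
  const [result] = await client.listVoices(languageCode ? { languageCode } : {});
  const rows = [];
  for (const voice of result.voices || []) {
    for (const code of voice.languageCodes || []) {
      rows.push({
        languageCode: code,
        name: voice.name,
        ssmlGender: voice.ssmlGender,
        naturalSampleRateHertz: voice.naturalSampleRateHertz,
      });
    }
  }
  return rows;
}
```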

Below is the list of supported voices at the time this post was written.

Language Code Name SSML Gender Natural Sample Rate (Hz)
es-ES es-ES-Standard-A FEMALE 24000
it-IT it-IT-Standard-A FEMALE 24000
ja-JP ja-JP-Standard-A FEMALE 22050
ko-KR ko-KR-Standard-A FEMALE 22050
pt-BR pt-BR-Standard-A FEMALE 24000
tr-TR tr-TR-Standard-A FEMALE 22050
sv-SE sv-SE-Standard-A FEMALE 22050
nl-NL nl-NL-Standard-A FEMALE 24000
en-US en-US-Wavenet-D MALE 24000
de-DE de-DE-Wavenet-A FEMALE 24000
de-DE de-DE-Wavenet-B MALE 24000
de-DE de-DE-Wavenet-C FEMALE 24000
de-DE de-DE-Wavenet-D MALE 24000
en-AU en-AU-Wavenet-A FEMALE 24000
en-AU en-AU-Wavenet-B MALE 24000
en-AU en-AU-Wavenet-C FEMALE 24000
en-AU en-AU-Wavenet-D MALE 24000
en-GB en-GB-Wavenet-A FEMALE 24000
en-GB en-GB-Wavenet-B MALE 24000
en-GB en-GB-Wavenet-C FEMALE 24000
en-GB en-GB-Wavenet-D MALE 24000
en-US en-US-Wavenet-A MALE 24000
en-US en-US-Wavenet-B MALE 24000
en-US en-US-Wavenet-C FEMALE 24000
en-US en-US-Wavenet-E FEMALE 24000
en-US en-US-Wavenet-F FEMALE 24000
fr-FR fr-FR-Wavenet-A FEMALE 24000
fr-FR fr-FR-Wavenet-B MALE 24000
fr-FR fr-FR-Wavenet-C FEMALE 24000
fr-FR fr-FR-Wavenet-D MALE 24000
it-IT it-IT-Wavenet-A FEMALE 24000
ja-JP ja-JP-Wavenet-A FEMALE 24000
nl-NL nl-NL-Wavenet-A FEMALE 24000
en-GB en-GB-Standard-A FEMALE 24000
en-GB en-GB-Standard-B MALE 24000
en-GB en-GB-Standard-C FEMALE 24000
en-GB en-GB-Standard-D MALE 24000
en-US en-US-Standard-B MALE 24000
en-US en-US-Standard-C FEMALE 24000
en-US en-US-Standard-D MALE 24000
en-US en-US-Standard-E FEMALE 24000
de-DE de-DE-Standard-A FEMALE 24000
de-DE de-DE-Standard-B MALE 24000
en-AU en-AU-Standard-A FEMALE 24000
en-AU en-AU-Standard-B MALE 24000
en-AU en-AU-Standard-C FEMALE 24000
en-AU en-AU-Standard-D MALE 24000
fr-CA fr-CA-Standard-A FEMALE 24000
fr-CA fr-CA-Standard-B MALE 24000
fr-CA fr-CA-Standard-C FEMALE 24000
fr-CA fr-CA-Standard-D MALE 24000
fr-FR fr-FR-Standard-A FEMALE 24000
fr-FR fr-FR-Standard-B MALE 24000
fr-FR fr-FR-Standard-C FEMALE 24000
fr-FR fr-FR-Standard-D MALE 24000

That's all about how to use Google Text-to-Speech API in Node.js. Thank you for reading this post.


Ivan Andrianto

Ivan Andrianto is a software engineer and the founder of woolha.com. I create high-quality programming tutorials for free.


Google text-to-speech nodejs

I am trying to code a Node.js application that uses the Google TTS API. My problem is that it returns a URL to an audio file. I need to be able to hear the text automatically, without going to the link and playing the audio.


Asked by Çağdaş Öksüztepe

  • You need to provide more information about what you're looking for and what you've tried so far to get it. This isn't enough for anyone to give you an answer. –  Paul Commented Nov 10, 2017 at 12:59

3 Answers

First, install the mpv player, then try this:

Answered by Onur Durmuş

Just take the URL and "play it"; it's a link to an audio file. Example using play-sound:

The play-sound package works by executing an external player – see #options for a list. You can even specify another one with the player option. The player needs to support playing from https urls, obviously. I tried it with mpv and it works perfectly.

If you can't or don't want to use an external player, you'll need to fetch the audio, get the data buffer from the response, and play it somehow, along these lines:
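Here is a minimal sketch of the fetch-and-buffer step, assuming Node.js 18 or later for the global fetch. An older runtime such as the asker's Node 7 would need an HTTP library instead. Writing the buffer to a temporary file is a swapped-in choice so that any local player can open it. downloadAudio and saveToTempFile are hypothetical names, not the answerer's code.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Downloads audio from a URL into a Buffer. Uses the global fetch
// (Node.js 18+) unless another fetch implementation is passed in.
async function downloadAudio(url, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error(`Download failed with HTTP ${res.status}`);
  }
  return Buffer.from(await res.arrayBuffer());
}

// Writes the buffer to the OS temp directory and returns the file path,
// so the audio can be handed to a local player.
async function saveToTempFile(buffer, name = "tts-audio.mp3") {
  const filePath = path.join(os.tmpdir(), name);
  await fs.promises.writeFile(filePath, buffer);
  return filePath;
}
```

You can then give the resulting file path to a local player, for example through play-sound.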

Answered by helb

  • I tried using your play-sound code, but it doesn't play the audio at the URL and it doesn't give me any error. –  Çağdaş Öksüztepe Commented Nov 10, 2017 at 14:28
  • It did that to me when it couldn't find any of the listed players. What OS are you on? –  helb Commented Nov 10, 2017 at 14:34
  • Ubuntu 16.04. I am using node 7.5.0 and npm 4.1.2 –  Çağdaş Öksüztepe Commented Nov 10, 2017 at 14:54
  • And do you have any of the listed players installed? Please try running which mplayer afplay mpg123 mpg321 play omxplayer in terminal (and post the output here). –  helb Commented Nov 10, 2017 at 14:57
  • When I use afplay, mplayer, mpg123, play, or omxplayer, I get this error: (node:7148) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): TypeError: player.play is not a function (node:7148) DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code. –  Çağdaş Öksüztepe Commented Nov 13, 2017 at 6:09
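One likely cause of "player.play is not a function" is using the play-sound export without calling it. The package exports a factory, so `require('play-sound')` has to be invoked to get an object with play(). You can pass options such as `{ player: 'mpv' }`. A hedged sketch follows. playerOptions and playUrl are hypothetical helpers. The Promise wrapper is an added convenience. It assumes play-sound invokes its callback when the player process ends.

```javascript
// play-sound exports a factory function, so require("play-sound") must be
// called (optionally with options) to get an object that has .play().
// Using the export without calling it can lead to
// "player.play is not a function".
function playerOptions(playerName) {
  // e.g. "mpv"; with no name, play-sound tries its built-in player list.
  return playerName ? { player: playerName } : {};
}

// Plays a file path or URL with an external player. The URL case needs a
// player that can open https URLs (mpv, according to the answer above).
function playUrl(url, playerName) {
  // Required lazily so this module loads even without play-sound installed.
  const player = require("play-sound")(playerOptions(playerName));
  return new Promise((resolve, reject) => {
    player.play(url, (err) => (err ? reject(err) : resolve()));
  });
}
```

For example, `playUrl(url, 'mpv').catch(console.error)` hands the TTS URL to mpv. It also surfaces an error instead of leaving an unhandled rejection.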

Play directly to speakers in nodejs

  • [Terminal] install play: sudo apt install sox
  • [Terminal] install encoder: sudo apt install libsox-fmt-mp3
  • [Terminal] install node-gtts : npm install node-gtts
  • [IDE][speech.js] See code listing
  • It would be better to do it all in memory, but the MP3 encoder lame currently does not install on the current version of Node.js.

If they get that fixed, then this code will work.

Answered by toddmo


This repository is deprecated. All of its content and history has been moved to googleapis/google-cloud-node.

googleapis/nodejs-text-to-speech


Google Cloud Text-to-Speech: Node.js Client


Cloud Text-to-Speech API client for Node.js

A comprehensive list of changes in each version may be found in the CHANGELOG .

  • Google Cloud Text-to-Speech Node.js Client API Reference
  • Google Cloud Text-to-Speech Documentation
  • github.com/googleapis/nodejs-text-to-speech


Sample Source Code Try it
Audio Profile
List Voices
Quickstart
Ssml Addresses
Synthesize


Google-text-to-speech Packages

Splits long texts with SSML tags by batches suitable for working with AWS Polly TTS and Google Cloud Text to Speech.

node-google-text-to-speech

Google TTS(Text-To-Speech) for node.js

google-text-to-speech

free text to speech with google translate

google-tts.js

A wrapper for the Google Text To Speech API with various features.

@sefinek/google-tts-api

Google TTS (Text-To-Speech) for Node.js.

talkify-tts


A javascript text to speech (TTS) library. Originally from and used by https://talkify.net .

Give a voice to your website in a matter of minutes. Talkify library provides you with high quality text to speech (TTS) voices in many languages.

To use our backend services (our hosted voices) you will require an api-key. Visit our portal ( https://manage.talkify.net ) to create your own API-key, Talkify offers 1000 free requests per month.

Dependencies

Font Awesome 5+ (used in the Talkify Control Center)

Quick demos

  • Web Reader http://jsfiddle.net/5atrbjc6/
  • Form Reader http://jsfiddle.net/dx53bg6k/2/
  • Text selection Reader http://jsfiddle.net/t5dbcL64/
  • Enhanced text visibility http://jsfiddle.net/pwbqkzxj/2/

Include the scripts and stylesheets

Minified version, non-minified version, stylesheets.

You can find our stylesheets in the /styles folder. Include the stylesheets that you need (e.g. everything under /modern-control-center for our "modern" UI).

Play all, top to bottom

Play simple text.

High-quality voices (https://manage.talkify.net/docs#voices)

Supported languages:

Text highlighting for easy read-along

Control pitch, pauses between words, volume, speech rate, phonation and much more

Download as mp3

Playback of entire website or paragraph/s of your choice

Fully integrated UI options

Read web forms aloud

Listen to selected text

Enhanced visibility features

When useSSML is active, Talkify will translate the following markup into SSML. This has the potential of creating a smoother voice experience.

HTML tags SSML
h1 - h3 emphasis strong
b emphasis strong
strong emphasis strong
i emphasis reduced
em emphasis strong
br break-strength strong

Declarative settings

These settings are only supported by the TtsPlayer for now.

Talkify supports declarative settings. These settings will override general settings. The following attributes can be added to any element that Talkify is connected to. When these attributes are present, Talkify will use them as playback settings.

data-attribute Accepted values Example Remarks
data-talkify-wordbreakms [0, 10000] data-talkify-wordbreakms="100"
data-talkify-pitch [-5, 5] data-talkify-pitch="-2"
data-talkify-rate [-10, 10] data-talkify-rate="-2"
data-talkify-voice Any authorized voice data-talkify-voice="David"
data-talkify-phonation "soft", "normal" or "" data-talkify-phonation="soft"
data-talkify-whisper "true" or "false" data-talkify-whisper="true"
data-talkify-read-as-lowercase "true" data-talkify-read-as-lowercase="true" Some voices spell out capital letters, which might be unwanted; this setting reads the content of the element as lower case.
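These settings are plain data-* attributes, so they can also be set from script before Talkify reads them. The sketch below is a hypothetical helper, not part of Talkify. It writes data-talkify-<setting> attributes and clamps numeric values to the ranges in the table above. It only needs an object with setAttribute, such as a DOM element.

```javascript
// Numeric ranges taken from the declarative-settings table.
const TALKIFY_RANGES = {
  wordbreakms: [0, 10000],
  pitch: [-5, 5],
  rate: [-10, 10],
};

// Hypothetical helper: sets data-talkify-<name> attributes on an element.
// Numeric settings are clamped to the documented range; other values
// (voice, phonation, whisper, read-as-lowercase) are written as strings.
function applyTalkifySettings(element, settings) {
  for (const [name, value] of Object.entries(settings)) {
    let attrValue = value;
    if (Object.prototype.hasOwnProperty.call(TALKIFY_RANGES, name)) {
      const [min, max] = TALKIFY_RANGES[name];
      attrValue = Math.min(max, Math.max(min, Number(value)));
    }
    element.setAttribute(`data-talkify-${name}`, String(attrValue));
  }
  return element;
}
```

For example, `applyTalkifySettings(document.querySelector('#intro'), { rate: -2, voice: 'David' })` would mark a hypothetical #intro element to play more slowly with the David voice.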

WebReader demo

Talkify lives in its own namespace - talkify. Hence, everything below is scoped to that namespace (i.e. talkify.playlist, etc).

Auto scroll

Talkify provides opt-in auto-scrolling to the item being played.

Activate the feature by calling talkify.autoScroll.activate().

Method
activate

Playlist fluent builder

The playlist builder is Talkify's way to instantiate your playlist. It comes with a fluent API.

Entry point: talkify.playlist()

Method Parameters Default Description Mandatory
begin Entry point. Call this to start building your playlist Yes
usingPlayer TtsPlayer/Html5Player Specify which player to be used. Yes
withTextInteraction Enables you to click on paragraphs (and other text) to play No
withElements DOM elements Specifies which elements to play. If omitted, Talkify will crawl the page and select them for you No
excludeElements Array of DOM-elements [] For example: document.querySelectorAll("button") No
withTables Table configuration, array of objects* Reads tables in a more intuitive way. The relevant header is repeated before each cell No
withRootSelector string 'body' Sets the scope from where Talkify will start to crawl the page for text to play No
subscribeTo Json object Event subscriptions No
build Finalizes and creates the playlist instance Yes

*withTables parameter is an array of objects with the following properties:

  • table (DOM-query selector or actual DOM-elements)
  • headerCells (Optional. DOM-query selector or actual DOM-elements. Defaults to "th")
  • bodyCells (Optional. DOM-query selector or actual DOM-elements. Defaults to "td")

withTables works with any standard HTML-table and other non-standard tabular content (for example bootstrap grid system). For non standard tabular content, please use the optional parameters to tell Talkify which elements are header cells and which are body cells.
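The builder methods in the table above are chained in order: begin(), usingPlayer(), any optional with... calls, then build(). This sketch assumes the talkify namespace is passed in; in the browser it is the global talkify object. It also assumes a TtsPlayer. buildPlaylist and its options argument are hypothetical.

```javascript
// Hypothetical helper: chains Talkify's fluent playlist builder.
// `talkify` is the library namespace (a global in the browser), passed in
// explicitly so the dependency is visible.
function buildPlaylist(talkify, elements, options = {}) {
  const player = new talkify.TtsPlayer();
  player.enableTextHighlighting();

  // begin() and usingPlayer() are mandatory; the rest are optional.
  let builder = talkify.playlist().begin().usingPlayer(player);
  if (options.textInteraction) {
    builder = builder.withTextInteraction();
  }
  if (elements) {
    builder = builder.withElements(elements);
  }
  if (options.rootSelector) {
    builder = builder.withRootSelector(options.rootSelector);
  }
  // build() finalizes the configuration and returns the playlist instance.
  return builder.build();
}
```

In a page, `buildPlaylist(talkify, document.querySelectorAll('p'), { textInteraction: true }).play()` would begin playback of the selected paragraphs.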

This is the playlist instance returned by the builder above.

Method Parameters Default Description
getQueue Returns the playlist queue
play Begins playback of playlist
pause Pauses playlist
replayCurrent Replays the current item in the playlist
insert DOM element Inserts new HTML elements to play. Useful for elements that Talkify was unable to locate. Elements will be inserted in the correct order with respect to the page.
isPlaying True if any item is currently in a playing state
setPlayer TtsPlayer/Html5Player Sets the player that the playlist is using
enableTextInteraction Enables click to play on HTML elements
disableTextInteraction Disables click to play on HTML elements
dispose Clean up

Playlist Events

Event
onEnded
onVoiceCommandListeningStarted
onVoiceCommandListeningEnded

Player (valid for all players)

| Method | Parameters | Default | Description |
|---|---|---|---|
| enableTextHighlighting | | | Tells the player to use text highlighting. For Html5Player, this only works with localVoice. |
| disableTextHighlighting | | | Turns off text highlighting |
| subscribeTo | JSON object | | Registers event listeners |
| playText | string | | Plays the given text |
| paused | | | True if paused |
| isPlaying | | | True if playing |
| play | | | Starts playback |
| pause | | | Pauses playback |
| forceVoice | object | | For Talkify-hosted voices, a JSON object with a name property whose value is the name of a voice from /api/speech/v1/voices. For browser voices, the actual voice object from window.speechSynthesis.getVoices(). |
| enableEnhancedTextVisibility | | | Enables enhanced text visibility: a subtitle bar with a larger font size is added to the bottom of the screen. |
| disableEnhancedTextVisibility | | | Disables enhanced text visibility |
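forceVoice expects a different argument depending on the voice source, so a small helper can pick the right one. This is a hypothetical sketch: `voiceArgument` is not part of Talkify, and the browser voice list is passed in (in a page it would come from window.speechSynthesis.getVoices()).

```javascript
// Hypothetical helper: returns the argument forceVoice expects.
// - Talkify-hosted voices: a plain object with a name property
//   (the name should come from /api/speech/v1/voices).
// - Browser voices: the actual voice object, looked up by name in `voices`.
function voiceArgument(name, { hosted, voices = [] }) {
  if (hosted) {
    return { name };
  }
  const voice = voices.find((v) => v.name === name);
  if (!voice) {
    throw new Error(`Browser voice not found: ${name}`);
  }
  return voice;
}

// Assumed browser usage:
// const player = new talkify.TtsPlayer();
// player.enableTextHighlighting();
// player.forceVoice(voiceArgument("Zira", { hosted: true }));
// player.playText("Hello world");
```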

Html5Player only

Entry point: talkify.Html5Player().

| Method | Parameters | Default | Description |
|---|---|---|---|
| forceLanguage | string | | Forces a specific language. Use standard culture codes, such as sv-SE for Swedish; Talkify selects a voice that matches the culture. |
| setRate | double | 1 | Playback rate, in the range [0.0, 2.0] |
| setVolume | double | 1 | Volume, in the range [0.0, 1.0] |
| usePitch | double | 1 | Adjusts the pitch of the voice, in the range [0.0, 2.0] |
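Values coming from a UI slider or user settings may fall outside the documented ranges. A hypothetical helper like the one below clamps them before they reach setRate, setVolume or usePitch; whether Html5Player clamps on its own is not stated here, so this is a defensive sketch.

```javascript
// Hypothetical helper: clamps values to the ranges documented for Html5Player.
const HTML5_RANGES = {
  rate: [0.0, 2.0],
  volume: [0.0, 1.0],
  pitch: [0.0, 2.0],
};

function clampHtml5Setting(setting, value) {
  const range = HTML5_RANGES[setting];
  if (!range) {
    throw new Error(`Unknown setting: ${setting}`);
  }
  const [min, max] = range;
  return Math.min(max, Math.max(min, value));
}

// Assumed browser usage:
// const player = new talkify.Html5Player();
// player.forceLanguage("sv-SE");
// player.setRate(clampHtml5Setting("rate", userRate));
```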

Talkify-hosted only

Entry point: talkify.TtsPlayer(options?).

The constructor parameter options is optional, for example: { controlcenter: { container: document.querySelector('p.selector'), name: 'modern' } }

| Method | Parameters | Default | Description |
|---|---|---|---|
| setRate | int | 1 | Playback rate, a value between -5 and 5 |
| whisper | | | Sets the player to whispering mode |
| normalTone | | | Sets the player to normal mode (the opposite of whispering) |
| usePhonation | string | normal | Supports two phonations, "soft" and "normal". An empty string translates to "normal". Case-sensitive. |
| useWordBreak | int | 0 | [0, 10000]. Adds a break between words; any value above 0 is added to the voice's standard break length. |
| usePitch | int | 0 | [-10, +10]. Adjusts the pitch of the voice. |
| useVolumeBaseline | double | 0 | [-10, +10]. Adjusts the volume baseline. |
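A hypothetical helper that validates a settings object against the ranges in the table above and then calls the matching TtsPlayer methods. The settings keys (rate, wordBreak, and so on) are names invented for this sketch; `player` is any object with the documented methods, a talkify.TtsPlayer in the browser.

```javascript
// Hypothetical helper: validates settings against the documented ranges,
// then applies them. Nothing is applied if any value is out of range.
function applyTtsSettings(player, settings) {
  const ranges = {
    rate: [-5, 5],
    wordBreak: [0, 10000],
    pitch: [-10, 10],
    volumeBaseline: [-10, 10],
  };
  for (const [key, [min, max]] of Object.entries(ranges)) {
    const value = settings[key];
    if (value !== undefined && (value < min || value > max)) {
      throw new RangeError(`${key} must be between ${min} and ${max}, got ${value}`);
    }
  }
  // Phonation names are case-sensitive; "" is treated as "normal".
  if (settings.phonation !== undefined && !["", "normal", "soft"].includes(settings.phonation)) {
    throw new RangeError(`Unsupported phonation: ${settings.phonation}`);
  }
  if (settings.rate !== undefined) player.setRate(settings.rate);
  if (settings.wordBreak !== undefined) player.useWordBreak(settings.wordBreak);
  if (settings.pitch !== undefined) player.usePitch(settings.pitch);
  if (settings.volumeBaseline !== undefined) player.useVolumeBaseline(settings.volumeBaseline);
  if (settings.phonation !== undefined) player.usePhonation(settings.phonation);
  if (settings.whisper === true) player.whisper();
  if (settings.whisper === false) player.normalTone();
}
```

Validating everything before calling any setter keeps the player from ending up half-configured when one value is rejected.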

Player Events

Event
onBeforeItemPlaying
onSentenceComplete
onPause
onPlay
onResume
onItemLoaded
onTextHighligtChanged
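A minimal sketch of the JSON object passed to subscribeTo (listed in the Player table). It assumes the keys are the event names above and the values are handler functions; the logging handlers are illustrative only.

```javascript
// Builds a listener object keyed by the player event names listed above.
// Each handler forwards the event name and its arguments to `logger`.
function createPlayerListeners(logger = console.log) {
  const events = [
    "onBeforeItemPlaying",
    "onSentenceComplete",
    "onPause",
    "onPlay",
    "onResume",
    "onItemLoaded",
    "onTextHighligtChanged", // spelled as in the list above
  ];
  const listeners = {};
  for (const name of events) {
    listeners[name] = (...args) => logger(name, ...args);
  }
  return listeners;
}

// Assumed browser usage:
// player.subscribeTo(createPlayerListeners());
```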

Example: talkify.formReader.addForm(document.getElementById("form-id"));

| Method | Parameters | Default | Description |
|---|---|---|---|
| addForm | form element | None | Adds TTS functionality to the form |
| removeForm | form element | None | Unbinds all TTS functionality from the form |
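The example above binds a single form. A small sketch for binding or unbinding every form on a page, using only addForm and removeForm. The reader object is passed in (talkify.formReader in the browser) so the function can run without a DOM; `forms` is any iterable of form elements.

```javascript
// Attaches (enable = true) or detaches (enable = false) the form reader for
// every form in `forms`. Returns how many forms were processed.
function bindFormReader(formReader, forms, enable = true) {
  let count = 0;
  for (const form of forms) {
    if (enable) {
      formReader.addForm(form);
    } else {
      formReader.removeForm(form);
    }
    count += 1;
  }
  return count;
}

// Browser usage:
// bindFormReader(talkify.formReader, document.querySelectorAll("form"));
```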

This feature allows the user to select/mark text using the mouse and have that text read aloud.

| Method | Parameters | Default | Description |
|---|---|---|---|
| activate | - | - | Activates the feature |
| deactivate | - | - | Deactivates the feature |
| withTextHighlighting | - | - | Presets text highlighting to activated. Users can turn this off in the control center UI. |
| withEnhancedVisibility | - | - | Presets enhanced visibility to activated. Users can turn this off in the control center UI. |
| withVoice | voice object | { name: 'Zira' } | A voice object from the backend voice API, or at minimum an object with a name property containing a valid voice name |
| withButtonText | string | "Listen" | The text that appears on the popover button |
| excludeElements | Array of DOM elements | [] | For example: document.querySelectorAll("button") |
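A sketch that maps an options object onto the selection reader methods above. The entry point is not named in this section; the usage comment assumes it is talkify.selectionActivator, which is an assumption. The methods are called one by one rather than chained, so the sketch does not rely on them returning the activator.

```javascript
// Applies the chosen options to the selection reader, then activates it.
// Option names (textHighlighting, voiceName, ...) are invented for this sketch.
function setUpSelectionReader(activator, options = {}) {
  if (options.textHighlighting) activator.withTextHighlighting();
  if (options.enhancedVisibility) activator.withEnhancedVisibility();
  if (options.voiceName) activator.withVoice({ name: options.voiceName });
  if (options.buttonText) activator.withButtonText(options.buttonText);
  if (options.exclude) activator.excludeElements(options.exclude);
  activator.activate();
  return activator;
}

// Assumed browser usage (entry point name is an assumption):
// setUpSelectionReader(talkify.selectionActivator, {
//   textHighlighting: true,
//   voiceName: "Zira",
//   buttonText: "Read aloud",
//   exclude: document.querySelectorAll("button"),
// });
```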

React to events

TL;DR: Example at http://jsfiddle.net/andreas_hagsten/x6pve0jd/8/

Talkify provides two event models: PubSub and classic callbacks. The newer, primary model is PubSub, a loosely coupled model that lets client applications hook into the Talkify pipeline. To subscribe to an event, you pass a context key (used later when unsubscribing), the event type, and the event handler function. The event type is a string made up of topics; an event is normally divided into four topics: context, origin, type, and action.
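A minimal sketch of the subscribe/unsubscribe flow. It assumes the hub exposes subscribe(key, eventType, handler) and unsubscribe(key, eventType); in the browser this is assumed to be talkify.messageHub, which this section does not name. The hub is passed in so the flow can be exercised with a stand-in.

```javascript
// Listens for TTS play events from any Talkify instance and stops listening
// once playback has ended. `key` groups the subscriptions for unsubscribing.
function listenUntilEnded(hub, key, onPlay) {
  hub.subscribe(key, "*.player.tts.play", onPlay);
  hub.subscribe(key, "*.player.tts.ended", () => {
    hub.unsubscribe(key, "*.player.tts.play");
    hub.unsubscribe(key, "*.player.tts.ended");
  });
}

// Assumed browser usage:
// listenUntilEnded(talkify.messageHub, "my-app", () => console.log("playing"));
```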

The Context topic

Use this top-level topic if you run multiple Talkify instances; it lets you hook into a specific instance. To listen to all instances, or if you only have one, specify "*". The context ID is available in the correlationId property of your Player instance.
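A hypothetical helper that assembles an event type from the four topics, defaulting any omitted topic to the "*" wildcard. For a specific player, its correlationId would be passed as the context.

```javascript
// Joins context, origin, type and action into a dotted event type.
// Any topic that is not given defaults to the "*" wildcard.
function eventType({ context = "*", origin = "*", type = "*", action = "*" } = {}) {
  return [context, origin, type, action].join(".");
}

// e.g. eventType({ context: player.correlationId, origin: "player", type: "tts", action: "play" })
```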

The Origin topic

Where the event originates, for example "player" or "controlcenter". A common use case is listening to player events, which you do by specifying "player" in this topic.

The Type topic

The type of event, for example "tts" for TTS-based events.

The Action topic

Describes the action taken, for example "play", "loading", or "pause".

Putting all four topics together forms the event type to listen to. You can replace any part with the wildcard "*" to listen to all events for that topic.
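An illustrative matcher showing what a "*" in an event type means. This is not Talkify's implementation: in this sketch "*" matches exactly one topic, so the pattern and the event must have the same number of segments. Some events in the list below have more or fewer than four segments (for example texthighlight.enabled), and how Talkify treats those with wildcards is not covered here.

```javascript
// Segment-by-segment comparison; "*" matches any single topic.
function matchesEventType(pattern, eventType) {
  const wanted = pattern.split(".");
  const actual = eventType.split(".");
  if (wanted.length !== actual.length) {
    return false;
  }
  return wanted.every((part, i) => part === "*" || part === actual[i]);
}
```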

A few examples are shown below; a full list of supported events is linked in the Talkify README.

PubSub events

Event type (arguments: TBD)
{contextId}.player.tts.ratechanged
{contextId}.player.tts.seeked
{contextId}.player.tts.pause
{contextId}.player.tts.timeupdated
{contextId}.player.tts.play
{contextId}.player.tts.resume
{contextId}.player.tts.loading
{contextId}.player.tts.loaded
{contextId}.player.tts.ended
{contextId}.player.tts.voiceset
{contextId}.player.tts.texthighlight.enabled
{contextId}.player.tts.texthighlight.disabled
{contextId}.player.tts.prepareplay
{contextId}.player.tts.disposed
{contextId}.player.tts.error
{contextId}.player.tts.phonationchanged
{contextId}.player.tts.whisperchanged
{contextId}.player.tts.wordbreakchanged
{contextId}.player.tts.volumechanged
{contextId}.player.tts.pitchchanged
{contextId}.player.tts.created
{contextId}.player.tts.unplayable
{contextId}.player.tts.enhancedvisibilityset
{contextId}.player.tts.creating
{contextId}.player.html5.ratechanged
{contextId}.player.html5.pause
{contextId}.player.html5.utterancecomplete
{contextId}.player.html5.ended
{contextId}.player.html5.loaded
{contextId}.player.html5.play
{contextId}.player.html5.timeupdated
{contextId}.player.html5.voiceset
{contextId}.player.html5.texthighlight.enabled
{contextId}.player.html5.texthighlight.disabled
{contextId}.player.html5.prepareplay
{contextId}.player.html5.created
{contextId}.player.html5.unplayable
{contextId}.player.html5.enhancedvisibilityset
{contextId}.player.html5.creating
{contextId}.controlcenter.request.play
{contextId}.controlcenter.request.pause
{contextId}.controlcenter.request.rate
{contextId}.controlcenter.request.volume
{contextId}.controlcenter.request.pitch
{contextId}.controlcenter.request.wordbreak
{contextId}.controlcenter.request.phonation.normal
{contextId}.controlcenter.request.phonation.soft
{contextId}.controlcenter.request.phonation.whisper
{contextId}.controlcenter.request.texthighlightoggled
{contextId}.controlcenter.request.textinteractiontoggled
{contextId}.controlcenter.request.enhancedvisibility
{contextId}.controlcenter.attached
{contextId}.controlcenter.detached
{contextId}.wordhighlighter.complete
{contextId}.playlist.playing
{contextId}.playlist.loaded
{contextId}.playlist.textinteraction.enabled
{contextId}.playlist.textinteraction.disabled

Package information

Install: npm i talkify-tts

Repository: github.com/Hagsten/Talkify

Homepage: github.com/Hagsten/Talkify#readme

Collaborator: andreas.hagsten


Create voice audio files

Text-to-Speech allows you to convert words and sentences into base64-encoded audio data of natural human speech. You can then convert the audio data into a playable audio file, such as an MP3, by decoding the base64 data. The Text-to-Speech API accepts input as raw text or Speech Synthesis Markup Language (SSML).

This document describes how to create an audio file from either text or SSML input using Text-to-Speech. You can also review the Text-to-Speech basics article if you are unfamiliar with concepts like speech synthesis or SSML.

These samples require that you have installed and initialized the Google Cloud CLI. For information about setting up the gcloud CLI, see Authenticate to TTS.

Convert text to synthetic voice audio

The following code samples demonstrate how to convert a string into audio data.

You can configure the output of speech synthesis in a variety of ways, including selecting a unique voice or modulating the output in pitch, volume, speaking rate, and sample rate.

Refer to the text:synthesize API endpoint for complete details.

To synthesize audio from text, make an HTTP POST request to the text:synthesize endpoint. In the body of your POST request, specify the type of voice to synthesize in the voice configuration section, specify the text to synthesize in the text field of the input section, and specify the type of audio to create in the audioConfig section.

The following code snippet sends a synthesis request to the text:synthesize endpoint and saves the results to a file named synthesize-text.txt. Replace PROJECT_ID with your project ID.
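As an alternative to a shell command, here is a minimal Node.js (18+, for the built-in fetch) sketch of the same REST request. Assumptions: `accessToken` is obtained separately (for example from `gcloud auth print-access-token`), and the default language code and audio encoding are example values.

```javascript
// Sends a text:synthesize request and saves the raw JSON response to a file.
const fs = require("node:fs/promises");

const SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize";

// Builds the request body: input text, voice selection and audio config.
function buildSynthesizeBody(text, { languageCode = "en-US", voiceName, audioEncoding = "MP3" } = {}) {
  const voice = { languageCode };
  if (voiceName) voice.name = voiceName;
  return { input: { text }, voice, audioConfig: { audioEncoding } };
}

async function synthesizeToFile(accessToken, projectId, body, outFile = "synthesize-text.txt") {
  const response = await fetch(SYNTHESIZE_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "x-goog-user-project": projectId,
      "Content-Type": "application/json; charset=utf-8",
    },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`Synthesis failed: ${response.status} ${await response.text()}`);
  }
  // The saved JSON contains the base64-encoded audio in its audioContent field.
  await fs.writeFile(outFile, await response.text());
}
```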

The Text-to-Speech API returns the synthesized audio as base64-encoded data contained in the JSON output. The JSON output in the synthesize-text.txt file looks similar to the following code snippet.

To decode the results from the Text-to-Speech API as an MP3 audio file, run the following command from the same directory as the synthesize-text.txt file.
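A Node.js sketch of the same decoding step: it reads the saved JSON, decodes its base64 audioContent field with Buffer, and writes the bytes to a file. The output filename synthesize-text-audio.mp3 is an example.

```javascript
// Decodes the audioContent field of a saved synthesis response into an audio file.
const fs = require("node:fs");

function decodeAudioFile(jsonFile = "synthesize-text.txt", outFile = "synthesize-text-audio.mp3") {
  const { audioContent } = JSON.parse(fs.readFileSync(jsonFile, "utf8"));
  if (typeof audioContent !== "string") {
    throw new Error(`No audioContent field in ${jsonFile}`);
  }
  const audio = Buffer.from(audioContent, "base64");
  fs.writeFileSync(outFile, audio);
  return audio.length; // number of bytes written
}
```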

To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech API reference documentation for your language: Go, Java, Node.js, or Python.

To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
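A minimal Node.js sketch with the @google-cloud/text-to-speech client library (installed with `npm install @google-cloud/text-to-speech`), following the TextToSpeechClient / synthesizeSpeech pattern of the library's quickstart. Assumptions: the client is created by the caller and passed in, so the function can be exercised without credentials; the voice settings and output filename are illustrative.

```javascript
// Synthesizes `text` with the given client and writes the audio to `outFile`.
const fs = require("node:fs/promises");

async function synthesizeWithClient(client, text, outFile = "output.mp3") {
  const request = {
    input: { text },
    voice: { languageCode: "en-US", ssmlGender: "NEUTRAL" },
    audioConfig: { audioEncoding: "MP3" },
  };
  // synthesizeSpeech resolves to an array whose first element is the response.
  const [response] = await client.synthesizeSpeech(request);
  // With the client library, audioContent is binary data, not a base64 string.
  await fs.writeFile(outFile, response.audioContent);
  return outFile;
}

// Usage (requires the package and Application Default Credentials):
// const textToSpeech = require("@google-cloud/text-to-speech");
// const client = new textToSpeech.TextToSpeechClient();
// synthesizeWithClient(client, "Hello, world!").then(console.log);
```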

Additional languages

C#: Follow the C# setup instructions on the client libraries page, then visit the Text-to-Speech reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, then visit the Text-to-Speech reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, then visit the Text-to-Speech reference documentation for Ruby.

Convert SSML to synthetic voice audio

Using SSML in your audio synthesis request can produce audio that is more similar to natural human speech. Specifically, SSML gives you finer-grain control over how the audio output represents pauses in the speech or how the audio pronounces dates, times, acronyms, and abbreviations.
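A hypothetical helper showing one way to use SSML for pause control: it escapes XML special characters in each sentence and joins the sentences with an explicit `<break>` element inside `<speak>`. The resulting string would go in the ssml field of the input section instead of text.

```javascript
// Turns plain sentences into SSML with a pause between them.
function sentencesToSsml(sentences, pause = "500ms") {
  // Escape characters that would otherwise break the XML markup.
  const escapeXml = (s) =>
    s.replace(/&/g, "&amp;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;")
      .replace(/"/g, "&quot;")
      .replace(/'/g, "&apos;");
  const body = sentences.map(escapeXml).join(`<break time="${pause}"/>`);
  return `<speak>${body}</speak>`;
}

// e.g. { input: { ssml: sentencesToSsml(["Hello.", "Welcome back."]) }, voice: {...}, audioConfig: {...} }
```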

For more details on the SSML elements supported by the Text-to-Speech API, see the SSML reference.

To synthesize audio from SSML, make an HTTP POST request to the text:synthesize endpoint. In the body of your POST request, specify the type of voice to synthesize in the voice configuration section, specify the SSML to synthesize in the ssml field of the input section, and specify the type of audio to create in the audioConfig section.

The following code snippet sends a synthesis request to the text:synthesize endpoint and saves the results to a file named synthesize-ssml.txt. Replace PROJECT_ID with your project ID.

The Text-to-Speech API returns the synthesized audio as base64-encoded data contained in the JSON output. The JSON output in the synthesize-ssml.txt file looks similar to the following code snippet.

To decode the results from the Text-to-Speech API as an MP3 audio file, run the following command from the same directory as the synthesize-ssml.txt file.

Try it for yourself

If you're new to Google Cloud, create an account to evaluate how Text-to-Speech performs in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2024-09-05 UTC.
