Using the Speech-to-Text API with Node.js
1. Overview
The Google Cloud Speech-to-Text API enables developers to convert audio to text in 120 languages and variants by applying powerful neural network models in an easy-to-use API.
In this codelab, you will focus on using the Speech-to-Text API with Node.js. You will learn how to send an audio file in English and other languages to the Cloud Speech-to-Text API for transcription.
What you'll learn
- How to enable the Speech-to-Text API
- How to authenticate API requests
- How to install the Google Cloud client library for Node.js
- How to transcribe audio files in English
- How to transcribe audio files with word timestamps
- How to transcribe audio files in different languages
What you'll need
- A Google Cloud Platform Project
- A browser, such as Chrome or Firefox
- Familiarity with JavaScript/Node.js
2. Setup and requirements
Self-paced environment setup
- Sign in to the Cloud Console and create a new project or reuse an existing one. (If you don't already have a Gmail or G Suite account, you must create one.)
Remember the project ID, a unique name across all Google Cloud projects. It will be referred to later in this codelab as PROJECT_ID.
- Next, you'll need to enable billing in Cloud Console in order to use Google Cloud resources.
Running through this codelab shouldn't cost much, if anything at all. Be sure to follow any instructions in the "Cleaning up" section, which advises how to shut down resources so you don't incur billing beyond this tutorial. New users of Google Cloud are eligible for the $300 USD Free Trial program.
Start Cloud Shell
While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Google Cloud Shell , a command line environment running in the Cloud.
Activate Cloud Shell
If you've never started Cloud Shell before, you'll be presented with an intermediate screen (below the fold) describing what it is. If that's the case, click Continue (and you won't ever see it again). Here's what that one-time screen looks like:
It should only take a few moments to provision and connect to Cloud Shell.
This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this codelab can be done with simply a browser or your Chromebook.
Once connected to Cloud Shell, you should see that you are already authenticated and that the project is already set to your project ID.
- Run the following command in Cloud Shell to confirm that you are authenticated and to check which project is configured:
If the project is not set correctly, you can set it with this command:
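The commands themselves were lost in extraction; a hedged reconstruction using the standard gcloud CLI (replace `<PROJECT_ID>` with your own project ID):

```shell
# List the account Cloud Shell authenticated you with:
gcloud auth list

# Show the currently configured project:
gcloud config list project

# If the project is not set correctly, set it explicitly:
gcloud config set project <PROJECT_ID>
```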
3. Enable the Speech-to-Text API
Before you can begin using the Speech-to-Text API, you must enable the API. You can enable the API by using the following command in the Cloud Shell:
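The enable command did not survive extraction; the standard way to do this with the gcloud CLI is:

```shell
# Enable the Cloud Speech-to-Text API for the current project:
gcloud services enable speech.googleapis.com
```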
4. Authenticate API requests
In order to make requests to the Speech-to-Text API, you need to use a Service Account. A Service Account belongs to your project, and it is used by the Google Cloud Node.js client library to make Speech-to-Text API requests. Like any other user account, a service account is represented by an email address. In this section, you will use the Cloud SDK to create a service account and then create the credentials you need to authenticate as that service account.
First, set an environment variable with your PROJECT_ID, which you will use throughout this codelab. If you are using Cloud Shell, this will be set for you:
Next, create a new service account to access the Speech-to-Text API by using:
Next, create the credentials that your Node.js code will use to log in as your new service account. Create these credentials and save them as a JSON file, ~/key.json, by using the following command:
Finally, set the GOOGLE_APPLICATION_CREDENTIALS environment variable, which is used by the Speech-to-Text API Node.js library, covered in the next step, to find your credentials. The environment variable should be set to the full path of the credentials JSON file you created, by using:
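The commands for the three steps above were lost in extraction. A sketch using the standard gcloud CLI — the service-account name my-speech-sa is an assumption, not the codelab's original name:

```shell
# 1. Create the service account:
gcloud iam service-accounts create my-speech-sa \
  --display-name "speech-to-text codelab service account"

# 2. Create a JSON key for it, saved as ~/key.json:
gcloud iam service-accounts keys create ~/key.json \
  --iam-account my-speech-sa@${PROJECT_ID}.iam.gserviceaccount.com

# 3. Point the client library at the key file:
export GOOGLE_APPLICATION_CREDENTIALS=~/key.json
```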
You can read more about authenticating the Speech-to-Text API .
5. Install the Google Cloud Speech-to-Text API client library for Node.js
First, create the project that you will use to run this Speech-to-Text API lab: initialize a new Node.js package in a folder of your choice:
NPM asks several questions about the project configuration, such as name and version. For each question, press ENTER to accept the default values. The default entry point is a file named index.js .
Next, install the Google Cloud Speech library to the project:
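The two commands were lost in extraction; a reconstruction of the steps described above:

```shell
# Initialize a new Node.js package (press ENTER to accept each default):
npm init

# Install the Cloud Speech client library:
npm install --save @google-cloud/speech
```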
For more instructions on how to set up a Node.js development environment for Google Cloud, please see the Setup Guide.
Now you're ready to use the Speech-to-Text API!
6. Transcribe Audio Files
In this section, you will transcribe a pre-recorded audio file in English. The audio file is available on Google Cloud Storage.
Navigate to the index.js file inside the project folder and replace the code with the following:
Take a minute or two to study the code and see how it is used to transcribe an audio file.
The encoding parameter tells the API which type of audio encoding you're using for the audio file. FLAC is the encoding type for .flac files, while LINEAR16 is the type for .raw files (see the documentation on encoding types for more details).
In the RecognitionAudio object, you can pass the API either the uri of an audio file in Cloud Storage or the content of a local audio file. Here, we're using a Cloud Storage uri.
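The codelab's listing did not survive extraction. Below is a minimal sketch of what index.js could look like, assuming the @google-cloud/speech client library installed earlier; the gs:// URI, sample rate, and helper names are my assumptions, not the codelab's originals.

```javascript
// index.js — hedged reconstruction, not the codelab's original listing.

// Build the recognize() request.
function buildRequest(gcsUri) {
  return {
    config: {
      encoding: 'FLAC',        // must match the audio file's encoding
      sampleRateHertz: 16000,  // must match the audio file's sample rate
      languageCode: 'en-US',
    },
    // RecognitionAudio: `uri` for a Cloud Storage file; for a local file,
    // pass its bytes via `content` instead.
    audio: { uri: gcsUri },
  };
}

async function transcribe(gcsUri) {
  // Requires `npm install @google-cloud/speech` and
  // GOOGLE_APPLICATION_CREDENTIALS pointing at your key file.
  const speech = require('@google-cloud/speech');
  const client = new speech.SpeechClient();
  const [response] = await client.recognize(buildRequest(gcsUri));
  return response.results
    .map((result) => result.alternatives[0].transcript)
    .join('\n');
}

// Example (placeholder URI):
// transcribe('gs://your-bucket/audio.flac').then(console.log);
```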
Run the program:
You should see the following output:
7. Transcribe with word timestamps
Speech-to-Text can detect time offsets (timestamps) for the transcribed audio. Time offsets show the beginning and end of each spoken word in the supplied audio. A time offset value represents the amount of time that has elapsed from the beginning of the audio, in increments of 100ms.
Take a minute or two to study the code and see how it is used to transcribe an audio file with word timestamps. The enableWordTimeOffsets parameter tells the API to enable time offsets (see the doc for more details).
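The listing for this step was also lost; a sketch of the one request change, plus a small formatter for the per-word results (the helper names are mine, not the codelab's):

```javascript
// Same request as before, with one addition: enableWordTimeOffsets.
function buildTimestampRequest(gcsUri) {
  return {
    config: {
      encoding: 'FLAC',
      sampleRateHertz: 16000,
      languageCode: 'en-US',
      enableWordTimeOffsets: true, // ask for per-word start/end times
    },
    audio: { uri: gcsUri },
  };
}

// Each entry of result.alternatives[0].words is { word, startTime, endTime },
// with times as { seconds, nanos }. Format one entry as "word: 1.7s - 2.0s".
function formatWord(w) {
  const secs = (t) => `${t.seconds || 0}.${(t.nanos || 0) / 100000000}`;
  return `${w.word}: ${secs(w.startTime)}s - ${secs(w.endTime)}s`;
}
```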
Run your program again:
8. Transcribe different languages
The Speech-to-Text API supports transcription in over 100 languages! You can find a list of supported languages here.
In this section, you will transcribe a pre-recorded audio file in French. The audio file is available on Google Cloud Storage.
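Only the language code (and, of course, the audio file) changes relative to the English example. A sketch, with a placeholder URI rather than the codelab's actual sample file:

```javascript
// 'fr-FR' selects French; see the supported-languages list for other codes.
function buildFrenchRequest(gcsUri) {
  return {
    config: {
      encoding: 'FLAC',
      sampleRateHertz: 16000,
      languageCode: 'fr-FR', // the only change from the English request
    },
    audio: { uri: gcsUri },
  };
}
```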
Run your program again and you should see the following output:
This is a sentence from a popular French children's tale .
For the full list of supported languages and language codes, see the documentation here .
9. Congratulations!
You learned how to use the Speech-to-Text API using Node.js to perform different kinds of transcription on audio files!
To avoid incurring charges to your Google Cloud Platform account for the resources used in this quickstart:
- Go to the Cloud Platform Console .
- Select the project you want to shut down, then click Delete at the top: this schedules the project for deletion.
- Google Cloud Speech-to-Text API: https://cloud.google.com/speech-to-text/docs
- Node.js on Google Cloud Platform: https://cloud.google.com/nodejs/
- Google Cloud Node.js client: https://googlecloudplatform.github.io/google-cloud-node/
This work is licensed under a Creative Commons Attribution 2.0 Generic License.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates.
Google Cloud Text-to-Speech: Node.js Client
Cloud Text-to-Speech API client for Node.js
A comprehensive list of changes in each version may be found in the CHANGELOG .
- Google Cloud Text-to-Speech Node.js Client API Reference
- Google Cloud Text-to-Speech Documentation
- github.com/googleapis/google-cloud-node/packages/google-cloud-texttospeech
Read more about the client libraries for Cloud APIs, including the older Google APIs Client Libraries, in Client Libraries Explained .
Table of contents:
- Before you begin
- Installing the client library
- Using the client library
- Contributing
- Select or create a Cloud Platform project .
- Enable billing for your project .
- Enable the Google Cloud Text-to-Speech API .
- Set up authentication with a service account so you can access the API from your local workstation.
Samples are in the samples/ directory. Each sample's README.md has instructions for running its sample.
Sample | Source Code | Try it |
---|---|---|
Text_to_speech.list_voices | | |
Text_to_speech.streaming_synthesize | | |
Text_to_speech.synthesize_speech | | |
Text_to_speech_long_audio_synthesize.synthesize_long_audio | | |
Quickstart | | |
The Google Cloud Text-to-Speech Node.js Client API Reference documentation also contains samples.
Supported Node.js Versions
Our client libraries follow the Node.js release schedule . Libraries are compatible with all current active and maintenance versions of Node.js. If you are using an end-of-life version of Node.js, we recommend that you update as soon as possible to an actively supported LTS version.
Google's client libraries support legacy versions of Node.js runtimes on a best-efforts basis with the following warnings:
- Legacy versions are not tested in continuous integration.
- Some security patches and features cannot be backported.
- Dependencies cannot be kept up-to-date.
Client libraries targeting some end-of-life versions of Node.js are available, and can be installed through npm dist-tags . The dist-tags follow the naming convention legacy-(version) . For example, npm install @google-cloud/text-to-speech@legacy-8 installs client libraries for versions compatible with Node.js 8.
This library follows Semantic Versioning .
This library is considered to be stable . The code surface will not change in backwards-incompatible ways unless absolutely necessary (e.g. because of critical security issues) or with an extensive deprecation period. Issues and requests against stable libraries are addressed with the highest priority.
More Information: Google Cloud Platform Launch Stages
Contributions welcome! See the Contributing Guide .
Please note that this README.md , the samples/README.md , and a variety of configuration files in this repository (including .nycrc and tsconfig.json ) are generated from a central template. To edit one of these files, make an edit to its templates in directory .
Apache Version 2.0
See LICENSE
Node.js - Google Cloud Text-to-Speech API Examples
For some reason, you may need to convert text into an audio file. So-called text-to-speech technology allows you to do so. Developing your own text-to-speech system takes a long time and is not easy. Therefore, the easiest solution is to use a service, with the drawback of having to pay.
One such text-to-speech service is provided by Google. It's known to have pretty good results. They also provide an API, which makes it easy to integrate with your application. In this tutorial, I'm going to show you the basic example usages of the Google Text-to-Speech API in Node.js, from preparation to code.
Preparation
1. Create or select a Google Cloud project
A Google Cloud project is required to use this service. Open the Google Cloud console, then create a new project or select an existing project.
2. Enable billing for the project
Like other cloud platforms, Google requires you to enable billing for your project. If you haven't set up billing, open billing page .
3. Enable Google Text-to-Speech API
To use an API, you must enable it first. Open this page to enable Text-to-Speech API.
4. Set up service account for authentication
As for authentication, you need to create a new service account. Create a new one on the service account management page and download the credentials, or you can use your already created service account.
In your .env file, you have to add a new variable, GOOGLE_APPLICATION_CREDENTIALS, whose value is the path to the downloaded credentials file.
The .env file must be loaded, of course, so you need to use a module for reading .env files, such as dotenv.
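For example — the key path below is a placeholder, not a real path:

```javascript
// .env (one line; the path is wherever you saved the key):
// GOOGLE_APPLICATION_CREDENTIALS=/home/user/credentials/key.json

// Then, at the very top of your entry file, before creating any client:
require('dotenv').config(); // npm install dotenv
```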
Dependencies
This tutorial uses @google-cloud/text-to-speech . Add the following dependency to your package.json and run npm install
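The dependency listing was lost in extraction; installing the package directly is equivalent:

```shell
npm install --save @google-cloud/text-to-speech
```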
1. Synthesize Speech
The example below is a basic example of how to use speech synthesis. You need to provide the text to synthesize, the audio encoding, and (optionally) a voice output configuration. If successful, the API returns audioContent in the response body, which you can then write to a file.
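The post's listing was lost in extraction. A hedged sketch using @google-cloud/text-to-speech — the voice settings, output file name, and helper names are my assumptions:

```javascript
// Build the synthesizeSpeech() request.
function buildSynthesisRequest(text) {
  return {
    input: { text },
    voice: { languageCode: 'en-US', ssmlGender: 'FEMALE' },
    audioConfig: { audioEncoding: 'MP3' },
  };
}

async function synthesizeToFile(text, outFile) {
  const fs = require('fs');
  // Requires `npm install @google-cloud/text-to-speech` and credentials.
  const textToSpeech = require('@google-cloud/text-to-speech');
  const client = new textToSpeech.TextToSpeechClient();
  const [response] = await client.synthesizeSpeech(buildSynthesisRequest(text));
  // response.audioContent holds the encoded audio bytes.
  fs.writeFileSync(outFile, response.audioContent, 'binary');
}

// Example:
// synthesizeToFile('Hello, world!', 'output.mp3');
```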
2. List Voices
The example below gets the list of voices supported by the Google Text-to-Speech service. You may need to run it to get the latest list.
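Since the original listing was lost, here is a sketch of the listVoices call, plus a small formatter matching the table below (the helper names are mine):

```javascript
async function listVoices(languageCode) {
  // Requires `npm install @google-cloud/text-to-speech` and credentials.
  const textToSpeech = require('@google-cloud/text-to-speech');
  const client = new textToSpeech.TextToSpeechClient();
  // languageCode is optional; omit it to get every voice.
  const [result] = await client.listVoices({ languageCode });
  return result.voices;
}

// Format one voice the way the table below presents it.
function formatVoice(v) {
  return `${v.languageCodes[0]} | ${v.name} | ${v.ssmlGender} | ${v.naturalSampleRateHertz}`;
}
```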
Below is the list of supported voices at the time this post was written.
Language Code | Name | SSML Gender | Natural Sample Rate (Hz) |
es-ES | es-ES-Standard-A | FEMALE | 24000 |
it-IT | it-IT-Standard-A | FEMALE | 24000 |
ja-JP | ja-JP-Standard-A | FEMALE | 22050 |
ko-KR | ko-KR-Standard-A | FEMALE | 22050 |
pt-BR | pt-BR-Standard-A | FEMALE | 24000 |
tr-TR | tr-TR-Standard-A | FEMALE | 22050 |
sv-SE | sv-SE-Standard-A | FEMALE | 22050 |
nl-NL | nl-NL-Standard-A | FEMALE | 24000 |
en-US | en-US-Wavenet-D | MALE | 24000 |
de-DE | de-DE-Wavenet-A | FEMALE | 24000 |
de-DE | de-DE-Wavenet-B | MALE | 24000 |
de-DE | de-DE-Wavenet-C | FEMALE | 24000 |
de-DE | de-DE-Wavenet-D | MALE | 24000 |
en-AU | en-AU-Wavenet-A | FEMALE | 24000 |
en-AU | en-AU-Wavenet-B | MALE | 24000 |
en-AU | en-AU-Wavenet-C | FEMALE | 24000 |
en-AU | en-AU-Wavenet-D | MALE | 24000 |
en-GB | en-GB-Wavenet-A | FEMALE | 24000 |
en-GB | en-GB-Wavenet-B | MALE | 24000 |
en-GB | en-GB-Wavenet-C | FEMALE | 24000 |
en-GB | en-GB-Wavenet-D | MALE | 24000 |
en-US | en-US-Wavenet-A | MALE | 24000 |
en-US | en-US-Wavenet-B | MALE | 24000 |
en-US | en-US-Wavenet-C | FEMALE | 24000 |
en-US | en-US-Wavenet-E | FEMALE | 24000 |
en-US | en-US-Wavenet-F | FEMALE | 24000 |
fr-FR | fr-FR-Wavenet-A | FEMALE | 24000 |
fr-FR | fr-FR-Wavenet-B | MALE | 24000 |
fr-FR | fr-FR-Wavenet-C | FEMALE | 24000 |
fr-FR | fr-FR-Wavenet-D | MALE | 24000 |
it-IT | it-IT-Wavenet-A | FEMALE | 24000 |
ja-JP | ja-JP-Wavenet-A | FEMALE | 24000 |
nl-NL | nl-NL-Wavenet-A | FEMALE | 24000 |
en-GB | en-GB-Standard-A | FEMALE | 24000 |
en-GB | en-GB-Standard-B | MALE | 24000 |
en-GB | en-GB-Standard-C | FEMALE | 24000 |
en-GB | en-GB-Standard-D | MALE | 24000 |
en-US | en-US-Standard-B | MALE | 24000 |
en-US | en-US-Standard-C | FEMALE | 24000 |
en-US | en-US-Standard-D | MALE | 24000 |
en-US | en-US-Standard-E | FEMALE | 24000 |
de-DE | de-DE-Standard-A | FEMALE | 24000 |
de-DE | de-DE-Standard-B | MALE | 24000 |
en-AU | en-AU-Standard-A | FEMALE | 24000 |
en-AU | en-AU-Standard-B | MALE | 24000 |
en-AU | en-AU-Standard-C | FEMALE | 24000 |
en-AU | en-AU-Standard-D | MALE | 24000 |
fr-CA | fr-CA-Standard-A | FEMALE | 24000 |
fr-CA | fr-CA-Standard-B | MALE | 24000 |
fr-CA | fr-CA-Standard-C | FEMALE | 24000 |
fr-CA | fr-CA-Standard-D | MALE | 24000 |
fr-FR | fr-FR-Standard-A | FEMALE | 24000 |
fr-FR | fr-FR-Standard-B | MALE | 24000 |
fr-FR | fr-FR-Standard-C | FEMALE | 24000 |
fr-FR | fr-FR-Standard-D | MALE | 24000 |
That's all about how to use Google Text-to-Speech API in Node.js. Thank you for reading this post.
Ivan Andrianto
Ivan Andrianto is a software engineer and the founder of woolha.com. I create high-quality programming tutorials for free.
Google text-to-speech nodejs
I am trying to code a Node.js application that uses the Google TTS API. My problem is that it returns a URL to an audio file. I need to be able to hear the text automatically, without going to the link and playing the audio.
- text-to-speech
- You need to provide more information about what you're looking for and what you've tried so far to get it. This isn't enough for anyone to give you an answer. – Paul Commented Nov 10, 2017 at 12:59
3 Answers
First, install the mpv player, then try this:
Just take the url and "play it" – it's a link to audio file. Example using play-sound :
The play-sound package works by executing an external player – see #options for a list. You can even specify another one with the player option. The player needs to support playing from https urls, obviously. I tried it with mpv and it works perfectly.
If you can't or don't want to use the external player, you'll need to fetch the audio, get the data buffer from response and play it somehow. So something along this way:
- I tried using your play sound code but i doesn't play the audio at the url and it doesn't give me any error. – Çağdaş Öksüztepe Commented Nov 10, 2017 at 14:28
- It did that to me when it couldn't find any of the listed players. What OS are you on? – helb Commented Nov 10, 2017 at 14:34
- Ubuntu 16.04. I am using node 7.5.0 and npm 4.1.2 – Çağdaş Öksüztepe Commented Nov 10, 2017 at 14:54
- And do you have any of the listed players installed? Please try running which mplayer afplay mpg123 mpg321 play omxplayer in terminal (and post the output here). – helb Commented Nov 10, 2017 at 14:57
- when i use Afplay, mplayer , mpg123, play, omxplayer, i get this error: ===> (node:7148) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): TypeError: player.play is not a function (node:7148) DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code. – Çağdaş Öksüztepe Commented Nov 13, 2017 at 6:09
Play directly to speakers in nodejs
- [Terminal] install play: sudo apt install sox
- [Terminal] install encoder: sudo apt install libsox-fmt-mp3
- [Terminal] install node-gtts : npm install node-gtts
- [IDE][speech.js] See code listing
- it would be better to do this all in memory, but the mp3 encoder lame currently does not install on the current version of Node.js
If they get that fixed, then this code will work
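A sketch of what the missing speech.js listing plausibly looked like, assuming node-gtts's stream() API and sox's `play` from the steps above — treat every name here as an assumption, not the answer's original code:

```javascript
// speech.js — hedged reconstruction; requires sox, libsox-fmt-mp3, node-gtts.
const { spawn } = require('child_process');

function say(text, lang = 'en') {
  const gtts = require('node-gtts')(lang); // npm install node-gtts
  // sox's `play` reads the mp3 stream from stdin and plays it directly.
  const player = spawn('play', ['-t', 'mp3', '-'], {
    stdio: ['pipe', 'ignore', 'inherit'],
  });
  gtts.stream(text).pipe(player.stdin);
}

// say('Hello from Node.js');
```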
This repository is deprecated. All of its content and history has been moved to googleapis/google-cloud-node.
googleapis/nodejs-text-to-speech
Google Cloud Text-to-Speech: Node.js Client
Cloud Text-to-Speech API client for Node.js
A comprehensive list of changes in each version may be found in the CHANGELOG .
- Google Cloud Text-to-Speech Node.js Client API Reference
- Google Cloud Text-to-Speech Documentation
- github.com/googleapis/nodejs-text-to-speech
Samples are in the samples/ directory. Each sample's README.md has instructions for running its sample.
Sample | Source Code | Try it |
---|---|---|
Audio Profile | | |
List Voices | | |
Quickstart | | |
Ssml Addresses | | |
Synthesize | | |
Google-text-to-speech Packages
Splits long texts with SSML tags by batches suitable for working with AWS Polly TTS and Google Cloud Text to Speech.
node-google-text-to-speech
Google TTS(Text-To-Speech) for node.js
google-text-to-speech
free text to speech with google translate
google-tts.js
A wrapper for the Google Text To Speech API with various features.
@sefinek/google-tts-api
Google TTS (Text-To-Speech) for Node.js.
talkify-tts
- 0 Dependencies
- 1 Dependents
- 47 Versions
A JavaScript text-to-speech (TTS) library. Originally from and used by https://talkify.net .
Give a voice to your website in a matter of minutes. Talkify library provides you with high quality text to speech (TTS) voices in many languages.
To use our backend services (our hosted voices) you will require an API key. Visit our portal ( https://manage.talkify.net ) to create your own API key; Talkify offers 1000 free requests per month.
Dependencies
Configuration
- Form reader
- Text selection reader
Installation
Font Awesome 5+ (Used in Talkify Control Center)
Quick demos
- Web Reader http://jsfiddle.net/5atrbjc6/
- Form Reader http://jsfiddle.net/dx53bg6k/2/
- Text selection Reader http://jsfiddle.net/t5dbcL64/
- Enhanced text visibility http://jsfiddle.net/pwbqkzxj/2/
Include the scripts and stylesheets
Minified version
Non-minified version
Stylesheets
You find our stylesheets under /styles folder. Include the stylesheets that you need (i.e. all under /modern-control-center for our "modern" UI).
Play all, top to bottom
Play simple text.
High quality voices ( https://manage.talkify.net/docs#voices )
Supported languages:
Text highlighting for easy read-along
Control pitch, pauses between words, volume, speech rate, phonation and much more
Download as mp3
Playback of entire website or paragraph/s of your choice
Fully integrated UI options
Read web forms aloud
Listen to selected text
Enhanced visibility features
When useSSML is active, Talkify will translate the following markup into SSML. This has the potential of creating a smoother voice experience.
HTML tags | SSML |
---|---|
h1 - h3 | emphasis strong |
b | emphasis strong |
strong | emphasis strong |
i | emphasis reduced |
em | emphasis strong |
br | break-strength strong |
Declarative settings
These settings are only supported by the TtsPlayer for now.
Talkify supports declarative settings. These settings will override general settings. The following attributes can be added to any element that Talkify is connected to. When these attributes are present, Talkify will use them as playback settings.
data-attribute | Accepted values | Example | Remarks |
---|---|---|---|
data-talkify-wordbreakms | [0, 10000] | data-talkify-wordbreakms="100" | |
data-talkify-pitch | [-5, 5] | data-talkify-pitch="-2" | |
data-talkify-rate | [-10, 10] | data-talkify-rate="-2" | |
data-talkify-voice | Any authorized voice | data-talkify-voice="David" | |
data-talkify-phonation | "soft", "normal" or "" | data-talkify-phonation="soft" | |
data-talkify-whisper | "true" or "false" | data-talkify-whisper="true" | |
data-talkify-whisper | "true" or "false" | data-talkify-whisper="true" | |
data-talkify-read-as-lowercase | "true" | data-talkify-read-as-lowercase="true" | Some voices spell out capital letters, which might be unwanted, this setting will read the content of the element as lower case |
WebReader demo
Talkify lives in its own namespace - talkify. Hence, everything below is scoped to that namespace (i.e. talkify.playlist, etc).
Auto scroll
Talkify provides an opt in auto scroll to the item to be played.
Activate the feature by calling talkify.autoScroll.activate()
Method |
---|
activate |
Playlist fluent builder
The playlist builder is Talkify's way to instantiate your playlist. It comes with a fluent API.
Entry point: talkify.playlist()
Method | Parameters | Default | Description | Mandatory |
---|---|---|---|---|
begin | Entry point. Call this to start building your playlist | Yes | ||
usingPlayer | TtsPlayer/Html5Player | Specify which player to be used. | Yes | |
withTextInteraction | Enables you to click on paragraphs (and other text) to play | No | ||
withElements | DOM elements | Specifies which elements to play. If omitted, Talkify will crawl the page and select for you | No |
excludeElements | Array of DOM-elements | [] | For example: document.querySelectorAll("button") | No |
withTables | Table configuration, array of objects* | Reads tables in a more intuitive way. The relevant header is repeated before each cell | No | |
withRootSelector | string | 'body' | Sets the scope from where Talkify will start to crawl the page for text to play | No |
subscribeTo | Json object | Event subscriptions | No | |
build | Finalizes and creates the playlist instance | Yes |
*withTables parameter is an array of objects with the following properties:
- table (DOM-query selector or actual DOM-elements)
- headerCells (Optional. DOM-query selector or actual DOM-elements. Defaults to "th")
- bodyCells (Optional. DOM-query selector or actual DOM-elements. Defaults to "td")
withTables works with any standard HTML-table and other non-standard tabular content (for example bootstrap grid system). For non standard tabular content, please use the optional parameters to tell Talkify which elements are header cells and which are body cells.
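Putting the builder methods above together — a browser-side sketch assembled only from the methods documented in this section (the root selector and event handler are illustrative, and this requires the Talkify scripts to be loaded on the page):

```javascript
// Runs in the browser with the Talkify scripts loaded.
var playlist = talkify
    .playlist()
    .begin()
    .usingPlayer(new talkify.TtsPlayer())
    .withRootSelector('article')
    .withTextInteraction()
    .subscribeTo({
        onEnded: function () { console.log('Playback finished'); }
    })
    .build();

playlist.play();
```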
This is the instance built from the playlist builder above.
Method | Parameters | Default | Description |
---|---|---|---|
getQueue | Returns the playlist queue | ||
play | Begins playback of playlist | ||
pause | Pauses playlist | ||
replayCurrent | Replays the current item in the playlist | ||
insert | DOM element | Inserts new html elements to play. Useful for elements that Talkify were unable to locate. Elements will be inserted in correct order with respect to the page. | |
isPlaying | True if any item is currently in a playing state | ||
setPlayer | TtsPlayer/Html5Player | Sets the player that the playlist is using | |
enableTextInteraction | Enables click to play on HTML elements | ||
disableTextInteraction | Disables click to play on HTML elements | ||
dispose | Clean up |
Playlist Events
Event |
---|
onEnded |
onVoiceCommandListeningStarted |
onVoiceCommandListeningEnded |
Player (valid for all players)
Method | Parameters | Default | Description |
---|---|---|---|
enableTextHighlighting | Tells the player to use text highlighting. For Html5Player this only works on localVoice. | ||
disableTextHighlighting | Turns off text highlighting. | ||
subscribeTo | Json object | Event listeners | |
playText | string | Plays a text | |
paused | True if paused | ||
isPlaying | True if playing | ||
play | Play | ||
pause | Pause | ||
forceVoice | object | For Talkify hosted voices, this is a JSON object with a name property. The value of name should be the name of a voice from /api/speech/v1/voices. For browser voices, this is the actual voice from window.speechSynthesis.getVoices() | |
enableEnhancedTextVisibility | Enables enhanced text visibility. Subtitle-bar, with a larger font-size, is added to the bottom of the screen. | ||
disableEnhancedTextVisibility | Disables enhanced text visibility |
Html5Player only
Entry point: talkify.Html5Player().
Method | Parameters | Default | Description |
---|---|---|---|
forceLanguage | string | | Forces the usage of a specific language. Use standard cultures such as se-SE for Swedish. Talkify will select a voice that matches the culture. |
setRate | double | 1 | [0.0, 2.0] Playback rate. |
setVolume | double | 1 | [0.0, 1.0] Playback volume. |
usePitch | double | 1 | [0.0, 2.0] Adjusts the pitch of the voice. |
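The Html5Player settings above can be applied like this (a sketch: method names per the table, and the parameter values are example choices within the documented ranges):

```javascript
// Sketch: configuring the browser built-in player per the table above.
function configureHtml5Player(player) {
  player.forceLanguage('se-SE'); // culture string; Talkify picks a matching voice
  player.setRate(1.2);           // within [0.0, 2.0]
  player.setVolume(0.8);         // within [0.0, 1.0]
  return player;
}

if (typeof window !== 'undefined' && window.talkify) {
  configureHtml5Player(new window.talkify.Html5Player()).playText('Hej världen');
}
```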
Talkify hosted only
Entry point: talkify.TtsPlayer(options?).
The constructor parameter "options" is optional. Example: { controlcenter: { container: document.querySelector('p.selector'), name: 'modern' } }
Method | Parameters | Default | Description |
---|---|---|---|
setRate | int | 1 | Playback rate. A value between -5 and 5. |
whisper | | | Sets the player to whispering mode |
normalTone | | | Sets the player to normal mode (the opposite of whispering) |
usePhonation | string | normal | Supports two phonations: "soft" and "normal". An empty string translates to "normal". Case sensitive. |
useWordBreak | int | 0 | [0, 10000] Adds a break between each word. Any value above 0 adds to the voice's standard break length. |
usePitch | int | 0 | [-10, +10] Adjusts the pitch of the voice. |
useVolumeBaseline | double | 0 | [-10, +10] Adjusts the volume baseline |
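A sketch of a Talkify-hosted player using the constructor options and tuning methods above (the specific values are example choices within the documented ranges; verify method availability against your version):

```javascript
// Sketch: a Talkify-hosted player tuned via the methods in the table above.
function tuneTtsPlayer(player) {
  player.setRate(2);           // between -5 and 5
  player.usePhonation('soft'); // "soft" or "normal" (case sensitive)
  player.useWordBreak(150);    // [0, 10000]; adds to the voice's standard break
  player.usePitch(-2);         // [-10, +10]
  return player;
}

if (typeof window !== 'undefined' && window.talkify) {
  // The optional options object mirrors the constructor example above.
  var player = new window.talkify.TtsPlayer({ controlcenter: { name: 'modern' } });
  tuneTtsPlayer(player).playText('A softer voice, slightly lower in pitch.');
}
```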
Player Events
Event |
---|
onBeforeItemPlaying |
onSentenceComplete |
onPause |
onPlay |
onResume |
onItemLoaded |
onTextHighligtChanged |
Form reader
Example: talkify.formReader.addForm(document.getElementById("form-id"));
Method | Parameters | Default | Description |
---|---|---|---|
addForm | form element | None | Adds TTS functionality to the form. |
removeForm | form element | None | Unbinds all TTS functionality from the form |
This feature allows the user to select/mark text using the mouse and have that text read aloud.
Method | Parameters | Default | Description |
---|---|---|---|
activate | - | - | Call this method to activate the feature |
deactivate | - | - | Call this method to deactivate the feature |
withTextHighlighting | - | - | Presets text highlighting to activated. Users can turn this off in the control center UI |
withEnhancedVisibility | - | - | Presets enhanced visibility to activated. Users can turn this off in the control center UI |
withVoice | voice object | { name: 'Zira' } | A voice object from our backend voice API, or at the very least an object with a name property containing a valid voice name |
withButtonText | string | "Listen" | The text that appears on popover button |
excludeElements | Array of DOM-elements | [] | For example: document.querySelectorAll("button") |
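The selection-reader configuration might be chained like this. A sketch only: the table does not show the constructor for this feature, so `reader` stands for whatever instance your Talkify version exposes, and the code assumes the with* methods are chainable.

```javascript
// Sketch: configuring the select-to-listen feature via the table above.
// `reader` is a placeholder for the selection-reader instance your Talkify
// version provides; check the Talkify README for the actual entry point.
function configureSelectionReader(reader) {
  return reader
    .withTextHighlighting()       // preset text highlighting to activated
    .withVoice({ name: 'Zira' })  // the documented default voice object
    .withButtonText('Read this'); // label on the popover button
}

// Later, once configured: configureSelectionReader(reader).activate();
```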
React to events
TL;DR: example at http://jsfiddle.net/andreas_hagsten/x6pve0jd/8/
Talkify provides two event models: PubSub and classic callbacks. The newer, and primary, model is PubSub, a loosely coupled model that enables client applications to hook into the Talkify pipeline. To subscribe to events, pass a context key (used when unsubscribing), the event type, and the event handler function. The event type is a string containing topics. An event is normally divided into four topics: context, origin, type, and action.
The Context topic
Use this top-level topic if you run multiple instances of Talkify; it allows you to hook into a specific Talkify instance. If you want to listen to all instances, or only have one, specify "*". You will find the context ID in the "correlationId" property of your Player instance.
The Origin topic
Where the event originates from, for example "player" or "controlcenter". A common use case is listening to player events, which is done by specifying "player" in this topic section.
The Type topic
Type of event. For example "tts" for TTS-based events.
The Action topic
This is the topic that describes what action is taken. This can be "play", "loading", "pause" and so forth.
Putting all four topics together forms the event type to listen to. You can replace any part with the wildcard "*", which means you listen to all events of the given topic.
A few examples can be seen below. A full list of supported events is listed here.
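Putting the four topics together, a subscription might look like the following sketch. The `composeTopic` helper is a hypothetical illustration added here; the `talkify.messageHub.subscribe(contextKey, eventType, handler)` shape follows the Talkify examples, but verify the exact signature against your version.

```javascript
// Hypothetical helper: an event type is just the four topics joined with dots.
function composeTopic(contextId, origin, type, action) {
  return [contextId, origin, type, action].join('.');
}

var topic = composeTopic('*', 'player', 'tts', 'ended');
// "*.player.tts.ended": any instance, player origin, tts type, ended action

if (typeof talkify !== 'undefined' && talkify.messageHub) {
  talkify.messageHub.subscribe('my-app', topic, function (message, topic) {
    console.log('TTS playback ended:', topic);
  });
}
```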
PubSub events
Type | args (TBD) |
---|---|
{contextId}.player.tts.ratechanged | |
{contextId}.player.tts.seeked | |
{contextId}.player.tts.pause | |
{contextId}.player.tts.timeupdated | |
{contextId}.player.tts.play | |
{contextId}.player.tts.resume | |
{contextId}.player.tts.loading | |
{contextId}.player.tts.loaded | |
{contextId}.player.tts.ended | |
{contextId}.player.tts.voiceset | |
{contextId}.player.tts.texthighlight.enabled | |
{contextId}.player.tts.texthighlight.disabled | |
{contextId}.player.tts.prepareplay | |
{contextId}.player.tts.disposed | |
{contextId}.player.tts.error | |
{contextId}.player.tts.phonationchanged | |
{contextId}.player.tts.whisperchanged | |
{contextId}.player.tts.wordbreakchanged | |
{contextId}.player.tts.volumechanged | |
{contextId}.player.tts.pitchchanged | |
{contextId}.player.tts.created | |
{contextId}.player.tts.unplayable | |
{contextId}.player.tts.enhancedvisibilityset | |
{contextId}.player.tts.creating | |
- | - |
{contextId}.player.html5.ratechanged | |
{contextId}.player.html5.pause | |
{contextId}.player.html5.utterancecomplete | |
{contextId}.player.html5.ended | |
{contextId}.player.html5.loaded | |
{contextId}.player.html5.play | |
{contextId}.player.html5.timeupdated | |
{contextId}.player.html5.voiceset | |
{contextId}.player.html5.texthighlight.enabled | |
{contextId}.player.html5.texthighlight.disabled | |
{contextId}.player.html5.prepareplay | |
{contextId}.player.html5.created | |
{contextId}.player.html5.unplayable | |
{contextId}.player.html5.enhancedvisibilityset | |
{contextId}.player.html5.creating | |
- | - |
{contextId}.controlcenter.request.play | |
{contextId}.controlcenter.request.pause | |
{contextId}.controlcenter.request.rate | |
{contextId}.controlcenter.request.volume | |
{contextId}.controlcenter.request.pitch | |
{contextId}.controlcenter.request.wordbreak | |
{contextId}.controlcenter.request.phonation.normal | |
{contextId}.controlcenter.request.phonation.soft | |
{contextId}.controlcenter.request.phonation.whisper | |
{contextId}.controlcenter.request.texthighlightoggled | |
{contextId}.controlcenter.request.textinteractiontoggled | |
{contextId}.controlcenter.request.enhancedvisibility | |
{contextId}.controlcenter.attached | |
{contextId}.controlcenter.detached | |
- | - |
{contextId}.wordhighlighter.complete | |
- | - |
{contextId}.playlist.playing | |
{contextId}.playlist.loaded | |
{contextId}.playlist.textinteraction.enabled | |
{contextId}.playlist.textinteraction.disabled |
- text to speech
- speech synthesis
Install: npm i talkify-tts
GitHub: github.com/Hagsten/Talkify (readme: github.com/Hagsten/Talkify#readme)
- Cloud Text-to-Speech API
Create voice audio files
Text-to-Speech allows you to convert words and sentences into base64 encoded audio data of natural human speech. You can then convert the audio data into a playable audio file like an MP3 by decoding the base64 data. The Text-to-Speech API accepts input as raw text or Speech Synthesis Markup Language (SSML) .
This document describes how to create an audio file from either text or SSML input using Text-to-Speech. You can also review the Text-to-Speech basics article if you are unfamiliar with concepts like speech synthesis or SSML.
These samples require that you have installed and initialized the Google Cloud CLI. For information about setting up the gcloud CLI, see Authenticate to TTS .
Convert text to synthetic voice audio
The following code samples demonstrate how to convert a string into audio data.
You can configure the output of speech synthesis in a variety of ways, including selecting a unique voice or modulating the output in pitch, volume, speaking rate, and sample rate .
Refer to the text:synthesize API endpoint for complete details.
To synthesize audio from text, make an HTTP POST request to the text:synthesize endpoint. In the body of your POST request, specify the type of voice to synthesize in the voice configuration section, specify the text to synthesize in the text field of the input section, and specify the type of audio to create in the audioConfig section.
The following code snippet sends a synthesis request to the text:synthesize endpoint and saves the results to a file named synthesize-text.txt . Replace PROJECT_ID with your project ID.
The Text-to-Speech API returns the synthesized audio as base64-encoded data contained in the JSON output. The JSON output in the synthesize-text.txt file looks similar to the following code snippet.
To decode the results from the Text-to-Speech API as an MP3 audio file, run the following command from the same directory as the synthesize-text.txt file.
To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries . For more information, see the Text-to-Speech API reference documentation for Go, Java, Node.js, or Python.
To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment .
Additional languages
C# : Please follow the C# setup instructions on the client libraries page and then visit the Text-to-Speech reference documentation for .NET.
PHP : Please follow the PHP setup instructions on the client libraries page and then visit the Text-to-Speech reference documentation for PHP.
Ruby : Please follow the Ruby setup instructions on the client libraries page and then visit the Text-to-Speech reference documentation for Ruby.
Convert SSML to synthetic voice audio
Using SSML in your audio synthesis request can produce audio that is more similar to natural human speech. Specifically, SSML gives you finer-grained control over how the audio output represents pauses in the speech and how the audio pronounces dates, times, acronyms, and abbreviations.
For more details on the SSML elements supported by Text-to-Speech API, see the SSML reference .
To synthesize audio from SSML, make an HTTP POST request to the text:synthesize endpoint. In the body of your POST request, specify the type of voice to synthesize in the voice configuration section, specify the SSML to synthesize in the ssml field of the input section, and specify the type of audio to create in the audioConfig section.
The following code snippet sends a synthesis request to the text:synthesize endpoint and saves the results to a file named synthesize-ssml.txt . Replace PROJECT_ID with your project ID.
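A sketch of the corresponding request body: it mirrors the plain-text request but uses the ssml field of the input section, per the REST description above. The SSML elements shown (break and say-as) are example markup, and the voice settings are example choices.

```javascript
// Sketch: request body for SSML synthesis; same endpoint as the text request.
function buildSsmlRequest(ssml) {
  return {
    input: { ssml: ssml },                                // SSML instead of text
    voice: { languageCode: 'en-US', ssmlGender: 'MALE' }, // example voice choice
    audioConfig: { audioEncoding: 'MP3' }                 // output audio type
  };
}

const body = buildSsmlRequest(
  '<speak>Here is a pause.<break time="500ms"/>' +
  'And a spelled-out acronym: ' +
  '<say-as interpret-as="characters">SSML</say-as>.</speak>'
);
// POST `body` to https://texttospeech.googleapis.com/v1/text:synthesize with the
// same headers as the plain-text example, and save the JSON response to
// synthesize-ssml.txt.
```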
The Text-to-Speech API returns the synthesized audio as base64-encoded data contained in the JSON output. The JSON output in the synthesize-ssml.txt file looks similar to the following code snippet.
To decode the results from the Text-to-Speech API as an MP3 audio file, run the following command from the same directory as the synthesize-ssml.txt file.
Try it for yourself
If you're new to Google Cloud, create an account to evaluate how Text-to-Speech performs in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-09-05 UTC.