Microsoft Cognitive Services provides a Speech-to-Text API that converts audio files into text. In this post, we’ll walk through the steps to use this API with Node.js to transcribe speech from a WAV audio file.

Step-by-Step Guide

1. Prerequisites

Before starting, ensure you have:

Node.js installed on your system (available at Node.js Official Website).
A Microsoft Azure subscription. If you don’t have one, you can sign up on the Azure website.
A Speech resource created in Azure (step 2 shows how to create one).

2. Creating a Cognitive Services Resource in Azure

To use the Speech-to-Text API, you first need to create a Speech service on Azure. Here’s how:

Sign in to the Azure Portal:

Go to Azure Portal and log in with your Microsoft account.

Create a new Speech Service:

In the Azure Portal, search for “Speech” under the AI + Machine Learning category.
Click on Create and select Speech under the “Language service” section.
Fill in the necessary details:
Subscription: Select your subscription (you can create a free trial if needed).
Resource Group: Choose or create a new resource group.
Region: Select a region (e.g., East US).
Name: Name your Speech service (e.g., “MySpeechService”).
Once all details are filled out, click Review + Create and then Create.

Get Your Subscription Key and Region:

After the Speech service is created, navigate to the Keys and Endpoint section in your resource.
Copy Key 1 (your subscription key) and the Region (e.g., eastus). You’ll need these to authenticate your requests.
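
Since the key is a secret, one option is to keep it out of the source file and read it from environment variables. The sketch below shows one way to do that. The helper name loadSpeechSettings and the variable names SPEECH_KEY and SPEECH_REGION are assumptions made for this example, not names required by Azure or the SDK.

```javascript
// Hypothetical helper: reads the Speech key and region from environment
// variables. SPEECH_KEY and SPEECH_REGION are names chosen for this sketch.
function loadSpeechSettings(env = process.env) {
  const key = (env.SPEECH_KEY || "").trim();
  const region = (env.SPEECH_REGION || "").trim();

  // Collect every missing variable so the error names all of them at once.
  const missing = [];
  if (!key) missing.push("SPEECH_KEY");
  if (!region) missing.push("SPEECH_REGION");
  if (missing.length > 0) {
    throw new Error("Missing environment variable(s): " + missing.join(", "));
  }

  return { key, region };
}

// Example with a plain object standing in for process.env:
const settings = loadSpeechSettings({
  SPEECH_KEY: "YOUR_SUBSCRIPTION_KEY",
  SPEECH_REGION: "eastus",
});
console.log(settings.region); // prints "eastus"
```

The returned key and region can then be passed to sdk.SpeechConfig.fromSubscription in place of hardcoded strings.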

3. Install Required Node.js Packages

To work with the Cognitive Services Speech API, you need to install the Microsoft Cognitive Services Speech SDK.

Install it by running the following command in your Node.js project:

npm install microsoft-cognitiveservices-speech-sdk

4. Code Walkthrough

Now, let’s go through the code. The code below shows how to transcribe an audio file to text.

const fs = require("fs");  // File system module, used to read the audio file
const sdk = require("microsoft-cognitiveservices-speech-sdk");  // Speech SDK

// Replace with your subscription key and region
const speechKey = "YOUR_SUBSCRIPTION_KEY";
const speechRegion = "YOUR_REGION";

// Set up the Speech Config with your subscription key and region
const speechConfig = sdk.SpeechConfig.fromSubscription(speechKey, speechRegion);
speechConfig.speechRecognitionLanguage = "en-US";  // Language for speech recognition

// Configure audio input (replace with the path to your own WAV file)
const audioConfig = sdk.AudioConfig.fromWavFileInput(fs.readFileSync("C:\\path\\to\\your\\audio.wav"));

// Initialize the Speech Recognizer with the speech configuration and audio input
const speechRecognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

// Event handler that gets triggered each time a phrase has been processed
speechRecognizer.recognized = (s, e) => {
  if (e.result.reason === sdk.ResultReason.RecognizedSpeech) {
    console.log("Recognized: " + e.result.text);  // Output the recognized text
  } else {
    console.log("Speech could not be recognized.");
  }
};

// Stop recognition once and release the recognizer so the process can exit
let stopped = false;
const stopRecognition = () => {
  if (stopped) return;
  stopped = true;
  speechRecognizer.stopContinuousRecognitionAsync(() => speechRecognizer.close());
};

// Triggered when recognition is canceled, for example by an error or the end of the audio
speechRecognizer.canceled = (s, e) => {
  if (e.reason === sdk.CancellationReason.Error) {
    console.error("Recognition canceled: " + e.errorDetails);
  }
  stopRecognition();
};

// Triggered when the recognition session ends
speechRecognizer.sessionStopped = () => stopRecognition();

// Start continuous recognition to transcribe the audio
speechRecognizer.startContinuousRecognitionAsync();
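
The script above prints each phrase as it arrives. If the transcript is needed as a single string, one possible approach is to wrap the recognizer's events in a Promise. The following is a minimal sketch, not the SDK's own helper: the function name transcribeAll is hypothetical, and it assumes the recognizer exposes the recognized, canceled, and sessionStopped events together with startContinuousRecognitionAsync, stopContinuousRecognitionAsync, and close, as sdk.SpeechRecognizer does.

```javascript
// Hypothetical helper: resolves with the full transcript once the
// recognition session ends, or rejects if recognition is canceled by an error.
// `recognizer` is expected to be an sdk.SpeechRecognizer and `sdk` the
// imported microsoft-cognitiveservices-speech-sdk module.
function transcribeAll(recognizer, sdk) {
  return new Promise((resolve, reject) => {
    const phrases = [];
    let finished = false;

    // Stop recognition, release the recognizer, and settle the promise once.
    const finish = (error) => {
      if (finished) return;
      finished = true;
      const settle = () => {
        recognizer.close();
        if (error) reject(error);
        else resolve(phrases.join(" "));
      };
      recognizer.stopContinuousRecognitionAsync(settle, settle);
    };

    // Keep only phrases that were actually recognized as speech.
    recognizer.recognized = (sender, event) => {
      if (event.result.reason === sdk.ResultReason.RecognizedSpeech) {
        phrases.push(event.result.text);
      }
    };

    // A cancellation caused by an error rejects; any other reason ends normally.
    recognizer.canceled = (sender, event) => {
      if (event.reason === sdk.CancellationReason.Error) {
        finish(new Error(event.errorDetails));
      } else {
        finish();
      }
    };

    recognizer.sessionStopped = () => finish();

    recognizer.startContinuousRecognitionAsync(
      () => {},
      (startError) => finish(new Error(String(startError)))
    );
  });
}
```

With the recognizer from the walkthrough, this could be called as transcribeAll(speechRecognizer, sdk).then((text) => console.log(text)), replacing the manual event handlers and the direct call to startContinuousRecognitionAsync.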

5. Running the Code

Save the code above in a file (for example, speechToText.js), then run it with Node.js from the project folder:

node speechToText.js
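
If the script prints nothing, or only reports that speech could not be recognized, the audio format is one thing worth checking. The Speech SDK documentation describes its default WAV input as 16 kHz or 8 kHz, 16-bit, mono PCM. The sketch below reads those fields from a WAV header. The function names readWavHeader and isDefaultSpeechFormat are hypothetical, and the parser assumes the "fmt " chunk comes directly after the RIFF header, which is common but not guaranteed for every file.

```javascript
// Hypothetical helper: parses the format fields of a canonical WAV header.
// Assumes the "fmt " chunk starts at byte 12, right after "RIFF....WAVE".
function readWavHeader(buffer) {
  const tag = (start, end) => buffer.toString("ascii", start, end);
  if (buffer.length < 36 ||
      tag(0, 4) !== "RIFF" ||
      tag(8, 12) !== "WAVE" ||
      tag(12, 16) !== "fmt ") {
    throw new Error("Not a WAV file with a leading fmt chunk");
  }
  return {
    audioFormat: buffer.readUInt16LE(20),    // 1 means uncompressed PCM
    channels: buffer.readUInt16LE(22),
    sampleRate: buffer.readUInt32LE(24),     // samples per second
    bitsPerSample: buffer.readUInt16LE(34),
  };
}

// Hypothetical check against the default input format described above.
function isDefaultSpeechFormat(header) {
  return header.audioFormat === 1 &&
    header.channels === 1 &&
    header.bitsPerSample === 16 &&
    (header.sampleRate === 16000 || header.sampleRate === 8000);
}
```

For a file on disk, the header can be read with readWavHeader(require("fs").readFileSync("C:\\path\\to\\your\\audio.wav")). A file that fails the check may need to be converted before it is passed to the recognizer.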

Conclusion

With the steps above, you can set up speech-to-text transcription using Microsoft Cognitive Services. This setup is a starting point for integrating voice recognition features into your applications.

Happy coding!

About the Author

Harika Puppala

Reference:

Puppala, H. (2025). Getting Started with Speech-to-Text Using Microsoft Cognitive Services. Available at: Getting Started with Speech-to-Text Using Microsoft Cognitive Services | by Harika Puppala | Medium [Accessed: 7 August 2025].
