Microsoft Cognitive Services provides a Speech-to-Text API that converts audio files into text. In this post, we’ll walk through the steps to use this API with Node.js to transcribe speech from an audio file.
Step-by-Step Guide
1. Prerequisites
Before starting, ensure you have:
- Node.js installed on your system (available from the official Node.js website).
- A Microsoft Azure subscription. If you don’t have one, you can sign up on the Azure website.
- A Microsoft Cognitive Services Speech resource created in Azure (covered in the next step).
2. Creating a Cognitive Services Resource in Azure
To use the Speech-to-Text API, you first need to create a Speech service on Azure. Here’s how:
Sign in to the Azure Portal:
- Go to Azure Portal and log in with your Microsoft account.
Create a new Speech Service:
- In the Azure Portal, search for “Speech” under the AI + Machine Learning category.
- Click on Create and select Speech under the “Language service” section.
- Fill in the necessary details:
Subscription: Select your subscription (you can create a free trial if needed).
Resource Group: Choose or create a new resource group.
Region: Select a region (e.g., East US).
Name: Name your Speech service (e.g., “MySpeechService”).
- Once all details are filled out, click Review + Create and then Create.
Get Your Subscription Key and Region:
- After the Speech service is created, navigate to the Keys and Endpoint section in your resource.
- Copy Key 1 (your subscription key) and the Region (e.g., eastus). You’ll need these to authenticate your requests.
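Since the key is a secret, you may prefer not to paste it directly into source code. The sketch below shows one way to read both values from environment variables instead. The helper name loadSpeechSettings and the variable names SPEECH_KEY and SPEECH_REGION are assumptions made for this example, not something the Speech service requires.

```javascript
// Hypothetical helper: reads the subscription key and region from environment
// variables. SPEECH_KEY and SPEECH_REGION are names chosen for this sketch.
function loadSpeechSettings(env = process.env) {
  const speechKey = env.SPEECH_KEY;
  const speechRegion = env.SPEECH_REGION;

  // Fail early with a clear message instead of sending an empty key to Azure.
  if (!speechKey || !speechRegion) {
    throw new Error("Set SPEECH_KEY and SPEECH_REGION before running the script");
  }
  return { speechKey, speechRegion };
}
```

The returned speechKey and speechRegion can then be used wherever the walkthrough uses the hard-coded "YOUR_SUBSCRIPTION_KEY" and "YOUR_REGION" placeholders.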
3. Install Required Node.js Packages
To work with the Cognitive Services Speech API, you need to install the Microsoft Cognitive Services Speech SDK.
Install it by running the following command in your Node.js project:
npm install microsoft-cognitiveservices-speech-sdk
4. Code Walkthrough
Now, let’s go through the code. The code below shows how to transcribe an audio file to text.
const fs = require('fs'); // Importing file system module to read audio files
const sdk = require("microsoft-cognitiveservices-speech-sdk"); // Importing the SDK for Speech API
// Replace with your subscription key and region
const speechKey = "YOUR_SUBSCRIPTION_KEY";
const speechRegion = "YOUR_REGION";
// Setting up the Speech Config with your subscription key and region
const speechConfig = sdk.SpeechConfig.fromSubscription(speechKey, speechRegion);
speechConfig.speechRecognitionLanguage = "en-US"; // Setting the language for speech recognition
// Configuring audio input (you need to replace with the path to your own audio file)
const audioConfig = sdk.AudioConfig.fromWavFileInput(fs.readFileSync("C:\\path\\to\\your\\audio.wav"));
// Initializing the Speech Recognizer with the speech configuration and audio input
const speechRecognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
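// Event handler that gets triggered when speech is recognized
speechRecognizer.recognized = (s, e) => {
    if (e.result.reason === sdk.ResultReason.RecognizedSpeech) {
        console.log("Recognized: " + e.result.text); // Output the recognized text
    } else {
        console.log("Speech could not be recognized.");
    }
};
// Event handler for cancellation (end of the audio file, or an error such as a wrong key or region)
speechRecognizer.canceled = (s, e) => {
    if (e.reason === sdk.CancellationReason.Error) {
        console.log("Canceled: " + e.errorDetails); // Report why recognition failed
    }
    speechRecognizer.stopContinuousRecognitionAsync();
};
// Event handler for the end of the session: stop recognition once there is nothing left to transcribe
speechRecognizer.sessionStopped = (s, e) => {
    speechRecognizer.stopContinuousRecognitionAsync();
};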
// Start continuous recognition to transcribe the audio
speechRecognizer.startContinuousRecognitionAsync();
5. Running the Code
Save the code above as speechToText.js, then run it with Node.js:
node speechToText.js
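If the script prints nothing useful, the input file is one of the first things worth checking, since the walkthrough hands its contents to fromWavFileInput. The sketch below reads the basic fields of a WAV header from a Buffer. It assumes the common layout in which the "fmt " chunk directly follows the RIFF header (a 44-byte header), which not every WAV file uses. The helper name describeWavHeader is hypothetical and is not part of the Speech SDK.

```javascript
// Hypothetical helper (not part of the Speech SDK): reads the fields of a
// canonical 44-byte WAV header, assuming the "fmt " chunk comes first.
function describeWavHeader(buffer) {
  if (buffer.length < 44) {
    throw new Error("File is too short to contain a WAV header");
  }
  const riff = buffer.toString("ascii", 0, 4);
  const wave = buffer.toString("ascii", 8, 12);
  if (riff !== "RIFF" || wave !== "WAVE") {
    throw new Error("Not a RIFF/WAVE file");
  }
  if (buffer.toString("ascii", 12, 16) !== "fmt ") {
    throw new Error("Unexpected layout: 'fmt ' chunk is not first");
  }
  return {
    audioFormat: buffer.readUInt16LE(20), // 1 means uncompressed PCM
    channels: buffer.readUInt16LE(22),
    sampleRate: buffer.readUInt32LE(24), // samples per second
    bitsPerSample: buffer.readUInt16LE(34),
  };
}

// Usage, assuming the same placeholder path as in the walkthrough:
// const fs = require("fs");
// console.log(describeWavHeader(fs.readFileSync("C:\\path\\to\\your\\audio.wav")));
```

Printing these values before calling the API makes it easier to tell a file problem apart from a key or region problem.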
Conclusion
With the steps above, you can easily set up speech-to-text functionality using Microsoft Cognitive Services. This setup is perfect for integrating voice recognition features into your applications.
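When integrating this into a larger application, you will often want the whole transcript as a value rather than as console output. The sketch below wraps the same event handlers used in the walkthrough in a Promise. The helper name collectTranscript is hypothetical; it expects a recognizer and the sdk object to be passed in, and it only relies on the recognized, canceled, and sessionStopped events plus the start and stop calls.

```javascript
// Hypothetical helper: resolves with the full transcript once recognition ends.
// `recognizer` is a SpeechRecognizer and `sdk` is the imported Speech SDK module.
function collectTranscript(recognizer, sdk) {
  return new Promise((resolve, reject) => {
    const parts = [];
    let finished = false;

    // Stop recognition exactly once, then settle the promise.
    const finish = (error) => {
      if (finished) return;
      finished = true;
      recognizer.stopContinuousRecognitionAsync(
        () => (error ? reject(error) : resolve(parts.join(" "))),
        (stopError) => reject(error || new Error(String(stopError)))
      );
    };

    // Keep each recognized phrase; ignore results with no recognized speech.
    recognizer.recognized = (s, e) => {
      if (e.result.reason === sdk.ResultReason.RecognizedSpeech) {
        parts.push(e.result.text);
      }
    };

    // Cancellation covers both the end of the audio and real errors.
    recognizer.canceled = (s, e) => {
      const failed = e.reason === sdk.CancellationReason.Error;
      finish(failed ? new Error(e.errorDetails) : undefined);
    };

    recognizer.sessionStopped = () => finish();

    recognizer.startContinuousRecognitionAsync(undefined, (startError) =>
      finish(new Error(String(startError)))
    );
  });
}
```

With the names from the walkthrough, it could be called as collectTranscript(speechRecognizer, sdk).then(console.log). Note that it replaces any handlers previously assigned to the recognizer.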
Happy coding!
About the Author
Harika Puppala
SDE II @ Microsoft | Ex – Paypal, ADP | Node.js | JavaScript | JAVA | Kafka | AWS | GCP | SQL | NoSQL | Express | Database | Microservices | Distributed Systems | System Design
Puppala, H (2025). Getting Started with Speech-to-Text Using Microsoft Cognitive Services. Available at: Getting Started with Speech-to-Text Using Microsoft Cognitive Services | by Harika Puppala | Medium [Accessed: 7th August 2025].