Introduction to speech translation using Azure Cognitive Speech Services

Microsoft Azure offers a wide range of services. One of them is a speech translation service, which is part of Cognitive Services. How does it work, and how can you use it in your application? In this blog post, I will try to answer these questions and show an example of translating speech to text from microphone input.

Speech translation

Speech translation is the process by which conversational spoken phrases are immediately translated and then spoken aloud in a second language. This technology enables conversations between people who speak different languages, which can be especially useful in intercultural exchange, research, or global business interactions. Enough with the definitions, let's get down to more technical things.

Speech translation on Azure

Azure Cognitive Services Speech Translation includes services based on machine learning and artificial intelligence. They primarily focus on real-time multilingual speech translation, enabling developers to add end-to-end real-time speech translation to their applications or services. It can be used both ways: you can stream speech audio, or you can send text to the service, and receive a stream of results. These results include the recognized text in the source language and its translation in the target language.

Create an Azure Speech Resource

We can now move to Microsoft Azure. First, make sure you have an Azure subscription. Once that is in place, you can create a resource. In the Marketplace search box, type 'speech' and select the first item shown below:

The next step is to configure the resource: choose a name, a subscription, and a pricing tier. As for the pricing tier, I recommend the free option; it will be enough for testing. After filling in all the fields, click the 'Create' button. After a few minutes, you should see something similar to this in the notifications:

Now you can go to your resource and see all of its settings. The important tab is 'Keys and endpoint', because you will need a subscription key and the region name to configure the service in your application:
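If you prefer the command line, the same resource can be created with the Azure CLI. A rough equivalent of the portal steps above might look like this (the resource group name, resource name, and region here are placeholders; substitute your own):

```shell
# Create a free-tier (F0) Speech resource in an existing resource group.
az cognitiveservices account create \
    --name my-speech-resource \
    --resource-group my-resource-group \
    --kind SpeechServices \
    --sku F0 \
    --location westeurope \
    --yes

# Retrieve the subscription keys for the resource.
az cognitiveservices account keys list \
    --name my-speech-resource \
    --resource-group my-resource-group
```

The second command prints the same keys shown in the 'Keys and endpoint' tab.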

Create the application

In order to quickly test the service, we will create a .NET Core console application. After creating a project and installing the following library from NuGet Packages:
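The library in question is the Speech SDK. Assuming the NuGet package is Microsoft.CognitiveServices.Speech (the package shown in the original screenshot), the project setup from the command line might look like this:

```shell
# Create a new console project and add the Speech SDK package.
dotnet new console -o SpeechTranslationDemo
cd SpeechTranslationDemo
dotnet add package Microsoft.CognitiveServices.Speech
```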

you can proceed to the implementation. First, add the following using directives:
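The code block with the using directives did not survive in this copy of the post. Assuming the Speech SDK package mentioned above, the namespaces you will most likely need are:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Translation;
```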

Now you can move on to creating the function responsible for speech recognition and translation. It should be an asynchronous method. First, create an instance of a speech translation config whose arguments are the previously mentioned subscription key and service region. Additionally, set the speech (source) language and the target language. Finally, create a translation recognizer that uses the default microphone as the audio input device.
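The original code listing is missing here, so below is a minimal sketch of what such a method could look like with the Speech SDK. The key, region, and language codes are placeholders; replace them with the values from your resource's 'Keys and endpoint' tab:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Translation;

class Program
{
    static async Task Main() => await TranslateFromMicrophoneAsync();

    static async Task TranslateFromMicrophoneAsync()
    {
        // Placeholder credentials -- use the subscription key and region
        // from the 'Keys and endpoint' tab of your Speech resource.
        var config = SpeechTranslationConfig.FromSubscription(
            "YourSubscriptionKey", "YourServiceRegion");

        // Source (spoken) language and one or more target languages.
        config.SpeechRecognitionLanguage = "en-US";
        config.AddTargetLanguage("de");

        // Use the default microphone as the audio input device.
        using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
        using var recognizer = new TranslationRecognizer(config, audioConfig);

        Console.WriteLine("Say something...");
        var result = await recognizer.RecognizeOnceAsync();

        if (result.Reason == ResultReason.TranslatedSpeech)
        {
            Console.WriteLine($"Recognized: {result.Text}");
            foreach (var pair in result.Translations)
            {
                Console.WriteLine($"Translated [{pair.Key}]: {pair.Value}");
            }
        }
        else
        {
            Console.WriteLine($"Recognition failed: {result.Reason}");
        }
    }
}
```

RecognizeOnceAsync returns after a single utterance; for continuous translation of longer speech you would instead subscribe to the recognizer's Recognizing/Recognized events and call StartContinuousRecognitionAsync.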

Let’s see how it works

The service works great and can be used in real systems. The possibilities of Microsoft Azure surprise me every time. If you want to explore this topic further, I encourage you to check the official documentation of the Speech service.
