by Alessandro Ebersol (Agent Smith)
Speech recognition is one of the most interesting areas of computing, and great strides have been made in recent years. These advances have reached personal computing: what was achieved in research on voice recognition technology is now available to home users.
Speech recognition can be used to improve accessibility, or simply when you're tired of typing very long texts. Yes, being able to rest your hands after typing a lot of text is extremely comfortable.
Users of other operating systems can enjoy voice recognition easily. On Windows there is Dragon NaturallySpeaking; on the Mac, Dictanote and WordQ SpeakQ. But what about Linux?
In Linux, we have several speech recognition mechanisms. However, they are usually extremely complicated to get working. Most of these solutions rely on pocketsphinx, a lightweight speech recognition engine from the CMU Sphinx project.
But do not be afraid, because voice recognition technology has come a long way thanks to Google (now part of Alphabet Inc.), and we PCLinuxOS users can take advantage of the work of the largest Internet search engine.
Google's voice recognition infrastructure: the Google Cloud Speech API
In 2016, Google unveiled its new Cloud Speech API, announced at the NEXT event in San Francisco, as a limited preview for developers. The underlying speech recognition technology had already been used for some time in various Google products, such as Google Search, which offers the option of searching by voice.
Speech-to-text conversion is based on deep neural networks, state-of-the-art machine learning algorithms that have recently proven particularly effective at detecting patterns in audio and video signals. The neural network is updated as Google collects new speech samples, so new terms are learned and recognition accuracy keeps improving. This same Google technology powers "OK Google", Android, Google Chrome and the Google Assistant. With its cloud-based speech recognition, Google intends to rival Nuance, maker of the old Dragon Dictate (now Dragon NaturallySpeaking).
Voice recognition in the cloud
Speech-to-text features are used in a variety of cases, including voice-controlled smart assistants on mobile devices, home automation, audio transcription, and automatic classification of phone calls.
Now that this technology is available to developers as a cloud service, any application can integrate speech-to-text. This makes it a valuable alternative to the widely used Nuance technology (used by Apple's Siri and Samsung's S-Voice, for example) and a challenger to other solutions such as IBM Watson Speech to Text and the Microsoft Bing Speech API.
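To give an idea of what "integrating speech recognition" means for a developer, here is a minimal sketch in Python that only builds the JSON body for a synchronous request to the Cloud Speech-to-Text REST endpoint (`speech:recognize`, API version v1). It assumes raw 16-bit mono PCM audio (`LINEAR16`) sampled at 16 kHz and Brazilian Portuguese (`pt-BR`); the function name `build_recognize_request` is hypothetical. Actually sending the request requires Google Cloud credentials (an API key or OAuth token), which are not shown, and the API has evolved since the 2016 preview, so check Google's current documentation before relying on it.

```python
import base64
import json

# Synchronous recognition endpoint of the Cloud Speech-to-Text REST API (v1).
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(audio_bytes: bytes, language_code: str = "pt-BR",
                            sample_rate_hz: int = 16000) -> str:
    """Return the JSON body for a speech:recognize call (hypothetical helper).

    Assumes the audio is raw 16-bit little-endian mono PCM (LINEAR16).
    """
    body = {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
        "audio": {
            # Audio sent inline in the JSON body must be base64-encoded.
            "content": base64.b64encode(audio_bytes).decode("ascii"),
        },
    }
    return json.dumps(body)


if __name__ == "__main__":
    # One second of silence as placeholder audio: 16000 samples x 2 bytes each.
    silence = bytes(16000 * 2)
    payload = build_recognize_request(silence)
    print("POST", RECOGNIZE_URL)
    print(payload[:80], "...")
```

The response to such a request is a JSON document with the recognized transcript(s); web services like the ones reviewed below hide all of this behind a microphone button.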
Now, how do we benefit from Google's cloud speech recognition?
We can enjoy the same technology Google builds into its own products, because several websites offer voice recognition services through the Google Cloud Speech API.
To use these services, you will need an Internet connection, a microphone attached to your computer, and the Google Chrome browser. Below is my analysis of some of these services.
Speechnotes is a powerful, speech-enabled online notepad with a clean and efficient design, so you can focus on your thoughts. Its team aims to provide the best online dictation tool, using state-of-the-art speech recognition for the most accurate results currently achievable, along with built-in (automatic or manual) tools that increase efficiency, productivity and user comfort. It works entirely online in your Chrome browser. No download, installation or registration is required, so you can start working immediately.
My rating: It supports Brazilian Portuguese and more than 40 other languages, saves to Google Drive, and if you sign up for the service, you can access your documents from any machine logged into your account. I've used it, and its voice recognition is fast and accurate. It works in Google Chrome and on cell phones too.
SpeechTexter is a professional online speech-to-text converter designed to simplify and speed up your work. The company aims to provide the best online transcription experience, using innovative speech recognition technology with highly accurate results. On computers, this technology is supported only by the Chrome browser; other browsers have not yet implemented speech recognition. No registration is required, so you can get to work immediately.
My rating: It supports Brazilian Portuguese and more than 40 other languages, and it can save text as TXT, print it, or copy it to the clipboard. There is no registration, which means your texts are not available across machines: you can only use it on one machine at a time. In my tests, it was a bit slower than Speechnotes.
This utility is a text-to-speech/speech-to-text service. What does that mean? It means you can simply dictate text, and it will be written without you lifting a finger. Conversely, you can type text and the site will read it aloud.
It offers the following features:
- Free and online
- No download, installation or registration required; multiplatform
- Supports only 7 languages (and Brazilian Portuguese is not one of them)
- You can pause or stop dictation (it saves the position of the last word)
- Recognizes voice commands to enter punctuation: for example, say "Comma" to type ","
- Smart capitalization
- You can save, copy, print, or send the dictated text
My rating: Since it does not support Brazilian Portuguese, I have not tested it. However, it seems to be the service with the fewest features among those evaluated.
Dictation.io can recognize and transcribe popular languages, including English, Spanish, French, Italian, Portuguese, and more. The full list of supported languages can be checked here: https://dictation.io/languages. You can add new paragraphs, punctuation marks, smileys and other special characters using simple voice commands. For example, say "New line" to move the cursor to the next line, or say "Smiley Face" to insert the smiley :-). The site provides a list of supported commands.
My rating: It lets you perform several actions on the recognized text: copy it to the clipboard, save it as .TXT, send it as a tweet, send it by email, print it, export it as PDF, or have the computer read aloud the text you dictated. In terms of speed, it is similar to Speechnotes.
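The voice commands offered by these services (saying "Comma" to type ",", "New line" to start a new line, "Smiley Face" to insert :-)) amount to post-processing the recognized words. The sites do not document how they do this, so the Python sketch below is only an illustration under assumptions: the command table `VOICE_COMMANDS` and the function `apply_voice_commands` are hypothetical, and only a handful of English commands are included.

```python
# Hypothetical command table: spoken phrase -> (text to insert, attach to previous word?)
VOICE_COMMANDS = {
    "comma": (",", True),
    "period": (".", True),
    "question mark": ("?", True),
    "new line": ("\n", True),
    "smiley face": (":-)", False),
}


def apply_voice_commands(transcript: str) -> str:
    """Replace spoken command phrases in a transcript with the symbols they stand for."""
    words = transcript.split()
    text = ""
    i = 0
    while i < len(words):
        # Try two-word commands first ("new line"), then single words ("comma").
        two = " ".join(words[i:i + 2]).lower()
        one = words[i].lower()
        if i + 1 < len(words) and two in VOICE_COMMANDS:
            symbol, glue = VOICE_COMMANDS[two]
            i += 2
        elif one in VOICE_COMMANDS:
            symbol, glue = VOICE_COMMANDS[one]
            i += 1
        else:
            # An ordinary word: keep it as recognized.
            symbol, glue = words[i], False
            i += 1
        # Punctuation sticks to the previous word; everything else gets a space,
        # except at the very start or right after a line break.
        if glue or not text or text.endswith("\n"):
            text += symbol
        else:
            text += " " + symbol
    return text
```

For example, `apply_voice_commands("hello comma world period")` returns `"hello, world."`. Real services likely also handle capitalization and language-specific command words, which this sketch leaves out.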
There are even more options than the four analyzed here, but evaluating them would require more research time. What matters is that if you need to dictate lengthy texts, PCLinuxOS can help you accomplish the task. When it comes to speech recognition, Linux is no longer a second-class citizen.
I hope you enjoyed this article, which was produced with the help of Speechnotes and later edited in LibreOffice.
Enjoy these new capabilities, and see you next month! Look ma! I entered text with no hands!