{"id":483,"date":"2018-09-25T05:27:06","date_gmt":"2018-09-25T09:27:06","guid":{"rendered":"https:\/\/langa.com\/?p=483"},"modified":"2018-09-24T15:00:20","modified_gmt":"2018-09-24T19:00:20","slug":"native-voice-to-text-can-you-here-mi-noun","status":"publish","type":"post","link":"https:\/\/langa.com\/index.php\/2018\/09\/25\/native-voice-to-text-can-you-here-mi-noun\/","title":{"rendered":"Native voice-to-text: Can you here mi noun?"},"content":{"rendered":"<p>I&#8217;m again experimenting with voice-to-text transcription. I keep hoping for that Star Trek experience &#8212; you know, where the characters whap their comm badge, speak with normal speed and diction, and have the computer fully understand them.<\/p>\n<p>But for me, that&#8217;s truly fiction. In real life, with free-form text, voice-to-text accuracy is likely to range from <i>meh<\/i> to <i>awful<\/i>.<\/p>\n<p>For example, I recently used voice-to-text in a jokey email to my wife in which I said:<br \/>\n<strong>\u201cYou may have missed it, lucky girl.\u201d<\/strong><\/p>\n<p>Voice to text rendered it as:<br \/>\n<strong>\u201cYou may have missed it, like a cow.\u201d<\/strong><\/p>\n<p>I\u2019m glad I proofread that before I sent it!<\/p>\n<p>(I bet my wife is, too!)<\/p>\n<p>That kind of not-even-close transcription is all too common with voice to text. 
And even when the errors are less severe, they still require much back-pedaling, correction, and revision before your text is ready for anything but the most casual of uses.<\/p>\n<p>On more than one occasion, I\u2019ve dictated a paragraph or two into a voice-to-text application, but when I later tried to read the transcript, so many words were mangled that I couldn&#8217;t figure out what I was originally trying to say!<\/p>\n<p>Still, it had been a while since I explored voice to text, and with all the current activity (OK Google, Siri, Alexa, Bixby, etc.), I thought it was time for a fresh look at the kinds of native voice-to-text being built into today&#8217;s devices and operating systems.<\/p>\n<p>I wasn\u2019t so much interested in exploring the pre-scripted commands that those apps can respond to &#8212; with a context-limited universe of words to choose from, most of these applications do pretty well.<\/p>\n<p>You can, for example, say to an Android phone, \u201cOK Google, navigate from here to Boston Common by foot.\u201d Google will open Maps, find the best walking route from your current location to Boston Common (or wherever you specify), and launch turn-by-turn directions.<\/p>\n<p>That\u2019s cool, but it\u2019s not free-form, flowing, natural speech. True natural speech-to-text transcription is a whole \u2018nother ballgame.<\/p>\n<p>Want to see some real-life examples? Below, you\u2019ll find four transcripts of voice-to-text output, as produced by four different combinations of hardware and software.<\/p>\n<p>They&#8217;re all versions of the following sample paragraph that I wrote manually &#8212; fingers on keyboard. The sample paragraph also explains more of the setup:<\/p>\n<p style=\"padding-left: 30px;\"><em>First, I\u2019ll read this paragraph using Google\u2019s \u201cvoice typing\u201d (the voice-to-text option built into the Gboard keyboard on my Android Samsung S8 phone). 
I\u2019ll read it using the phone\u2019s built-in mic, and then again using a brand new Bluetooth 4.1 headset optimized for dictation (condenser microphone on a boom). Next, I\u2019ll read this paragraph using Windows 10\u2019s built-in dictation function (<strong>Windows Key+H<\/strong>) both with my PC&#8217;s built-in microphone, and with the same Bluetooth headset used for the Android versions.<\/em><\/p>\n<p>But before I show you the results, let me stipulate that <span style=\"text-decoration: underline;\"><em>none<\/em><\/span> of these transcription apps handle punctuation very well. The Android app is supposed to recognize punctuation such as \u201copen parentheses\u201d and \u201cclose parentheses,\u201d but it&#8217;s spotty at best. Windows 10 can understand that you mean a \u201c<strong><big>;<\/big><\/strong>\u201d when you say the word \u201csemicolon,\u201d but it falls down in areas such as recognizing that you want to start a new paragraph when you say \u201cnew paragraph.\u201d (It inserts the phrase \u201cnew paragraph\u201d into the body of your text.)<\/p>\n<p>The poor handling of punctuation is a real problem unless you&#8217;re producing very simple texts. But it\u2019s a whole different kind of trouble than, say, mistaking \u201clucky girl\u201d for \u201clike a cow!\u201d<\/p>\n<p>Now, back to the results: Up first, the nearly-perfect output of Google\u2019s Gboard voice typing on a Galaxy S8, with a good-quality dictation headset connected via Bluetooth 4.1, and with the phone connected to the internet via my office 5GHz Wi-Fi router.<\/p>\n<p>As you can see, the punctuation and capitalization are funky, but the words themselves are OK.<\/p>\n<p style=\"padding-left: 30px;\"><em>First I&#8217;ll read this paragraph using Google&#8217;s voice typing The Voice to Text option built into the gboard keyboard on my Android Samsung S8 phone. 
I&#8217;ll read it using the phone&#8217;s built-in mic, and then again using a brand new Bluetooth 4.1 headset optimized for dictation condenser microphone on a boom. Next, I&#8217;ll read this paragraph using Windows 10 built-in dictation function Windows key + H both with my PCS built-in microphone, and with the same Bluetooth headset used for the Android versions.<\/em><\/p>\n<p>Next up: A surprise &#8212; I also got the same nearly perfect results using the phone\u2019s standard, built-in mic instead of the separate headset!<\/p>\n<p style=\"padding-left: 30px;\"><em>First, I&#8217;ll read this paragraph using Google&#8217;s voice typing The Voice to Text option built into the gboard keyboard on my Android Samsung S8 phone. I&#8217;ll read it using the phone&#8217;s built-in mic, and then again using a brand new Bluetooth 4.1 headset optimized for dictation condenser microphone on a boom. Next, I&#8217;ll read this paragraph using Windows 10 built-in dictation function Windows &#8211; key + H both with my PCS built-in microphone and with the same Bluetooth headset used for the Android versions.<\/em><\/p>\n<p>Next, in the third example, you\u2019ll see that Windows 10\u2019s built-in dictation function (<strong>Windows Key+H<\/strong>) didn\u2019t do as well. The PC itself shouldn\u2019t have been a problem &#8212; it\u2019s an SSD-based 64-bit, 2.4GHz Core i7, with a wired Ethernet connection to my office router. But here\u2019s the result of reading the above sample paragraph, using the PC&#8217;s built-in microphone. I&#8217;ve highlighted the worst non-punctuation\/verbiage errors in red.<\/p>\n<p style=\"padding-left: 30px;\"><em>First, I&#8217;ll read this paragraph using Google Voice typing Voice to text option built into the <span style=\"color: #ff0000;\"><strong>Jeep board<\/strong><\/span> keyboard on my Samsung S 8 phone. 
I&#8217;ll read it using the phone&#8217;s built in mic, and then again using a brand new Bluetooth 4.1 headset optimized for dictation condenser microphone on a boom. next, I&#8217;ll read this paragraph using Windows 10 built in dictation function <strong><span style=\"color: #ff0000;\">windows key<\/span><\/strong> plus H both with my PC&#8217;s built in microphone, and with the same <strong><span style=\"color: #ff0000;\">blue tooth<\/span><\/strong> headset use for Android versions.<\/em><\/p>\n<p>I can understand mistaking <strong>Gboard<\/strong> for <strong>Jeep board<\/strong>, I guess, but shouldn&#8217;t Windows recognize the phrase \u201c<strong>Windows Key<\/strong>?\u201d And how can it not know <strong>Bluetooth<\/strong>?<\/p>\n<p>The fourth and last test was very odd: Using my dictation headset (same as above) actually made Windows 10\u2019s built-in dictation function a little <strong><span style=\"text-decoration: underline;\"><em>worse<\/em><\/span><\/strong>! I have no idea why, because using a good headset usually improves voice recognition. Not this time.<\/p>\n<p style=\"padding-left: 30px;\"><em>First, <span style=\"color: #ff0000;\"><strong>out<\/strong><\/span> read this paragraph using Google&#8217;s voice typing voice to text option built into the<span style=\"color: #ff0000;\"><strong> Jeep board<\/strong><\/span> keyboard on my Android Samsung S 8 phone. I&#8217;ll read it using the phone&#8217;s built in mic, and then again using a brand new Bluetooth 4.1 headset optimized for dictation condenser microphone on a boom. 
Next, I&#8217;ll read this paragraph using windows tens built in dictation function <span style=\"color: #ff0000;\"><strong>windows key<\/strong><\/span> plus H both with my <span style=\"color: #ff0000;\"><strong>kisise<\/strong><\/span> built in microphone and with the same <span style=\"color: #ff0000;\"><strong>blue tooth<\/strong><\/span> headset used for Android version.<\/em><\/p>\n<p>Along with the same errors mentioned above, this iteration mistook \u201c<strong>I\u2019ll<\/strong>\u201d for \u201c<strong>out<\/strong>\u201d and turned \u201c<strong>my PC\u2019s<\/strong>\u201d into \u201c<strong>kisise.<\/strong>\u201d That last is truly baffling: It\u2019s not even an English word, and makes at least that part of the sentence wholly unintelligible.<\/p>\n<p>That many fundamental verbiage errors, along with a dozen or so punctuation and capitalization errors, is, to me, unacceptable in so brief a paragraph (just 83 words). For me, it&#8217;s still much faster and cleaner to type.<\/p>\n<p>I love the convenience of voice-to-text (especially when there\u2019s no keyboard, or only a virtual\/on-screen keyboard available). And voice to text can work quite well indeed if you stick to scripted commands (words and phrases that fall within the context of what the software is already expecting), and use careful diction. But that\u2019s not normal speech.<\/p>\n<p>For natural, free-form, flowing speech, voice to text still isn&#8217;t quite ready for prime time.<\/p>\n<p>Those Star Trek comm badges are going to have to wait<span style=\"background-color: #ffffff; color: #006000;\">!<\/span><\/p>\n<hr \/>\n<p><em>Permalink: <a href=\"https:\/\/wp.me\/paaiox-7N\">https:\/\/wp.me\/paaiox-7N<\/a><\/em><\/p>\n<p><em>Ask me anything! 
Click the <a href=\"https:\/\/langa.com\/index.php\/contact\/\">CONTACT<\/a> link on any page.<\/em><\/p>\n<p><em>Share this item via the links below:<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;m again experimenting with voice-to-text transcription. I keep hoping for that Star Trek experience &#8212; you know, where the characters whap their comm badge, speak with normal speed and diction, and have the computer fully understand them. But for me, that&#8217;s truly fiction. In real life, with free-form text, voice-to-text accuracy is likely to&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[6,10,12,5,19],"tags":[],"class_list":["post-483","post","type-post","status-publish","format-standard","hentry","category-long-form","category-science-and-tech","category-smartphones","category-windows","category-writing"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/paaiox-7N","jetpack-related-posts":[{"id":394,"url":"https:\/\/langa.com\/index.php\/2018\/09\/17\/september-2018-site-update\/","url_meta":{"origin":483,"positio
n":0},"title":"September 2018 Site Update","author":"Fred Langa","date":"2018-09-17","format":false,"excerpt":"OK! The test drive has been working fine --- I've been posting a mix of content types via immediate- and delayed-action posts, and from desktop and mobile apps. Everything seems to work more or less as it should. Finally! (I'd still like to tune the site appearance, but that's far\u2026","rel":"","context":"In &quot;Langa.Com Site News&quot;","block_context":{"text":"Langa.Com Site News","link":"https:\/\/langa.com\/index.php\/category\/langa-com-news\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/langa.com\/wp-content\/uploads\/2018\/09\/sitestats-country.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":1012,"url":"https:\/\/langa.com\/index.php\/2018\/11\/13\/interesting-site-freelivetranscript\/","url_meta":{"origin":483,"position":1},"title":"Interesting site: FreeLiveTranscript","author":"Fred Langa","date":"2018-11-13","format":false,"excerpt":"FreeLiveTranscript.com is an open-sourced, browser-based, speech-to-text application that creates \"live transcripts of speech on the web, that can be displayed (and edited) in real-time on a big screen, or watched on anybody's personal device.\" It's not really meant for personal speech-to-text\/dictation --- you can use it for that, but there\u2026","rel":"","context":"In &quot;Interesting Site&quot;","block_context":{"text":"Interesting Site","link":"https:\/\/langa.com\/index.php\/category\/interesting-site\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/langa.com\/wp-content\/uploads\/2018\/11\/freelivetranscript.jpg?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/langa.com\/wp-content\/uploads\/2018\/11\/freelivetranscript.jpg?resize=350%2C200 1x, https:\/\/i0.wp.com\/langa.com\/wp-content\/uploads\/2018\/11\/freelivetranscript.jpg?resize=525%2C300 1.5x, 
https:\/\/i0.wp.com\/langa.com\/wp-content\/uploads\/2018\/11\/freelivetranscript.jpg?resize=700%2C400 2x"},"classes":[]},{"id":3621,"url":"https:\/\/langa.com\/index.php\/2019\/10\/25\/can-i-recover-lost-internet-files\/","url_meta":{"origin":483,"position":2},"title":"&#8220;Can I recover lost internet files?&#8221;","author":"Fred Langa","date":"2019-10-25","format":false,"excerpt":"(Answer requested by Rodrigo Melgar) Rodrigo's full question: \"How can I recover lost internet files? I want to recover Steam reviews I deleted a while ago (I only managed to retrieve a couple using Wayback Machine). Bear in mind it\u2019s only raw text, which used to be under the link\u2026","rel":"","context":"In &quot;A reader asks...&quot;","block_context":{"text":"A reader asks...","link":"https:\/\/langa.com\/index.php\/category\/a-reader-asks\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":687,"url":"https:\/\/langa.com\/index.php\/2018\/10\/14\/have-you-played-the-hidden-text-adventure-game-in-chrome\/","url_meta":{"origin":483,"position":3},"title":"Have you played the hidden text-adventure game in Chrome?","author":"Fred Langa","date":"2018-10-14","format":false,"excerpt":"Yup, it's yet another \"easter egg\" hidden inside Chrome, this one recently-discovered: A classic 1980's-style, text-based adventure game, of the sort from the early days of computing! Graphics? GRAPHICS? We don't need no steenking graphics! 
Just look at the text around the blue G: See for yourself: Open the Developer's\u2026","rel":"","context":"In &quot;Cool Site&quot;","block_context":{"text":"Cool Site","link":"https:\/\/langa.com\/index.php\/category\/cool-site\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/langa.com\/wp-content\/uploads\/2018\/10\/googletextadventure.jpg?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/langa.com\/wp-content\/uploads\/2018\/10\/googletextadventure.jpg?resize=350%2C200 1x, https:\/\/i0.wp.com\/langa.com\/wp-content\/uploads\/2018\/10\/googletextadventure.jpg?resize=525%2C300 1.5x, https:\/\/i0.wp.com\/langa.com\/wp-content\/uploads\/2018\/10\/googletextadventure.jpg?resize=700%2C400 2x"},"classes":[]},{"id":2813,"url":"https:\/\/langa.com\/index.php\/2019\/05\/22\/how-do-i-run-a-batch-file-in-windows-10\/","url_meta":{"origin":483,"position":4},"title":"&#8220;How do I run a batch file in Windows 10?&#8221;","author":"Fred Langa","date":"2019-05-22","format":false,"excerpt":"Um, you click on it? I think you\u2019re actually asking how to create and then run a batch file. That\u2019s easy. It's also pretty old-school, as Microsoft is pushing everyone towards the much-more-powerful, but much-more-complicated, PowerShell environment instead. 
But batch files still work fine In Windows 10; they're still useful\u2026","rel":"","context":"In &quot;A reader asks...&quot;","block_context":{"text":"A reader asks...","link":"https:\/\/langa.com\/index.php\/category\/a-reader-asks\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/langa.com\/wp-content\/uploads\/2019\/05\/batch-file-2019-05-20_14-40-44.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/langa.com\/wp-content\/uploads\/2019\/05\/batch-file-2019-05-20_14-40-44.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/langa.com\/wp-content\/uploads\/2019\/05\/batch-file-2019-05-20_14-40-44.jpg?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/langa.com\/wp-content\/uploads\/2019\/05\/batch-file-2019-05-20_14-40-44.jpg?resize=700%2C400&ssl=1 2x"},"classes":[]},{"id":318,"url":"https:\/\/langa.com\/index.php\/2018\/09\/10\/a-reader-asks-what-was-the-last-generation-that-learned-to-type-on-a-typewriter\/","url_meta":{"origin":483,"position":5},"title":"A reader asks: What was the last generation that learned to type on a typewriter?","author":"Fred Langa","date":"2018-09-10","format":false,"excerpt":"Q: \"What was the last generation that learned to type on a typewriter instead of a computer keyboard?\" (via Quora) A: In the developed world, the \u201cboomer\u201d generation \u2014 offspring of WW2-era parents \u2014 are almost surely the last. I'm part of the tail end of the Boomer generation. 
In\u2026","rel":"","context":"In &quot;A reader asks...&quot;","block_context":{"text":"A reader asks...","link":"https:\/\/langa.com\/index.php\/category\/a-reader-asks\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_likes_enabled":false,"_links":{"self":[{"href":"https:\/\/langa.com\/index.php\/wp-json\/wp\/v2\/posts\/483","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/langa.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/langa.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/langa.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/langa.com\/index.php\/wp-json\/wp\/v2\/comments?post=483"}],"version-history":[{"count":6,"href":"https:\/\/langa.com\/index.php\/wp-json\/wp\/v2\/posts\/483\/revisions"}],"predecessor-version":[{"id":491,"href":"https:\/\/langa.com\/index.php\/wp-json\/wp\/v2\/posts\/483\/revisions\/491"}],"wp:attachment":[{"href":"https:\/\/langa.com\/index.php\/wp-json\/wp\/v2\/media?parent=483"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/langa.com\/index.php\/wp-json\/wp\/v2\/categories?post=483"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/langa.com\/index.php\/wp-json\/wp\/v2\/tags?post=483"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}