rhu: (Default)
[personal profile] rhu

I'm serious about this question, although you may not believe it.

As you probably know, computers store everything as sequences of bits. It is often necessary to translate those bit sequences into another format that a person can read or that can be transferred from one computer to another. So, for example:

Decimal encoding simply renders them as a number, such as 127. This is great when the typical user needs to understand the number, such as when answering "How many minutes should the egg timer wait before beeping?"
Hexadecimal encoding is similar but uses the letters A-F along with the digits 0-9 so that each character represents a group of four bits.
Base-64 encoding translates them into strings of seemingly random characters, such as QmFzZS02NCBlbmNvZGluZwo=, which are great for sending large attachments through email systems that might munge raw binary data.
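
As a rough illustration of those three encodings, here is a small Python sketch that renders the same bytes each way. The variable names are made up for the example; the sample inputs are the single byte 127 and the text that the Base-64 string above decodes to.

```python
import base64

# One byte with the value 127, shown three ways.
one_byte = bytes([127])
as_decimal = str(int.from_bytes(one_byte, "big"))     # "127"
as_hex = one_byte.hex().upper()                       # "7F"
as_base64 = base64.b64encode(one_byte).decode("ascii")

# The Base-64 example string from the post decodes to readable text.
sample = "QmFzZS02NCBlbmNvZGluZwo="
decoded = base64.b64decode(sample)

print(as_decimal, as_hex, as_base64)
print(decoded)
```

Each hexadecimal character carries four bits and each Base-64 character carries six, which is why the Base-64 form is shorter for long inputs.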

Now, here's my idea and my question. Often at my company, we have moderately long bit strings (32 to 128 bytes is typical) that sometimes need to be read over the phone. It occurred to me today that how we display and enter these numbers is arbitrary and transient, so we aren't limited to the encodings I listed above. In particular, I was wondering:

How many monosyllabic words could be used to encode binary data with no ambiguity, accounting for regional pronunciations?

What I'm thinking is: assume that we can make a list of 256 such words: "cat," "blue," etc. Then each byte of the bit string could be represented by a single word, and users would be less likely to transpose digits or confuse a "B" and a "P" on the phone. Our GUIDs or license keys would end up looking like spam headers:

blue scoff cat pie shoot wing
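
A minimal sketch of the one-word-per-byte idea in Python might look like the following. The 256-entry vocabulary here is only a placeholder: it is generated mechanically from made-up consonant-vowel-consonant syllables, not a vetted list of real, phone-safe words, and the names `WORDS`, `encode`, and `decode` are hypothetical.

```python
from itertools import product

# Placeholder vocabulary (an assumption for illustration only):
# 8 onsets x 4 vowels x 8 codas = 256 distinct pseudo-words.
ONSETS = "bdfklmst"
VOWELS = "aeio"
CODAS = "bdgklmnt"
WORDS = ["".join(parts) for parts in product(ONSETS, VOWELS, CODAS)]
INDEX = {word: i for i, word in enumerate(WORDS)}


def encode(data: bytes) -> str:
    """Render each byte as the word at that byte's position in the list."""
    return " ".join(WORDS[b] for b in data)


def decode(phrase: str) -> bytes:
    """Map each word back to its byte value; unknown words raise KeyError."""
    return bytes(INDEX[word] for word in phrase.lower().split())


key = bytes([0, 17, 255])
spoken = encode(key)
print(spoken)
print(decode(spoken) == key)
```

A real list would replace `WORDS` with 256 hand-picked words; the lookup logic would stay the same.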

Would such a scheme be practical? Could one construct a list of 256 or even 1,024 words chosen to provide error-resistant spoken encoding?
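
To get a feel for the trade-off between list size and phrase length, this sketch counts how many words a key of a given size would need. It assumes the whole key is treated as one large number, so a word does not have to line up with a byte boundary; `words_needed` is a hypothetical helper name.

```python
def words_needed(n_bytes: int, vocab_size: int) -> int:
    """Smallest word count k such that vocab_size**k covers every n_bytes value."""
    total_values = 2 ** (8 * n_bytes)
    count, capacity = 0, 1
    while capacity < total_values:
        capacity *= vocab_size
        count += 1
    return count


for size in (16, 32, 128):
    print(size, "bytes:",
          words_needed(size, 256), "words from a 256-word list,",
          words_needed(size, 1024), "words from a 1,024-word list")
```

A 1,024-word list carries 10 bits per word instead of 8, so it shortens the phrase by roughly a fifth at the cost of a vocabulary four times as large to vet.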

(no subject)

Date: 2006-02-15 11:21 pm (UTC)
From: [identity profile] hahathor.livejournal.com
I think you'd need words of more than 3 phonemes. Take the short list you gave: blue scoff cat pie shoot wing. Let's say your transmitter has a regional accent, or a cold, or is just rattling off words really fast and has a slip of the tongue, and gives the word "coot." That's just one phoneme off from "cat" and also one phoneme off from "shoot." With sets of three phonemes, you'd need each phoneme set to be unique. If you had five-phoneme sets, you could have sets that match on two phonemes and you could generate a fairly large universe of words. I'm not a linguist, so I won't bother to estimate the number of usable phonemes, nor am I a semanticist, so I won't estimate the number of words, but I will go out on a limb far enough to say three-phoneme words ain't gonna cut it.
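
The "one phoneme off" concern can be checked mechanically. The sketch below flags word pairs whose phoneme sequences differ in fewer than two positions. The phoneme spellings are rough, hand-written assumptions rather than dictionary transcriptions, and the distance measure is deliberately crude (position-by-position mismatches plus any difference in length).

```python
from itertools import combinations

# Rough, hand-written phoneme spellings (assumptions for illustration).
PHONEMES = {
    "cat": ("k", "ae", "t"),
    "coot": ("k", "uu", "t"),
    "shoot": ("sh", "uu", "t"),
    "wing": ("w", "ih", "ng"),
}


def distance(a, b):
    """Count mismatched positions, plus the difference in length."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + abs(len(a) - len(b))


def too_close(vocab, min_distance=2):
    """Return word pairs that a single misheard phoneme could confuse."""
    return [
        (u, v)
        for u, v in combinations(sorted(vocab), 2)
        if distance(vocab[u], vocab[v]) < min_distance
    ]


print(too_close(PHONEMES))
```

Under these assumed spellings, "coot" sits one phoneme away from both "cat" and "shoot", while "cat" and "shoot" themselves differ in two positions.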

(no subject)

Date: 2006-02-16 02:46 am (UTC)
From: [personal profile] sethg
You mean this?

The mnemonic encoding presented here is a method for converting binary data into a sequence of words suitable for transmission or storage by voice, handwriting, memorization or other non-computerized means.

The encoding converts 32 bits of data into 3 words from a vocabulary of 1626 words. The words have been chosen to be easy to understand over the phone and recognizable internationally as much as possible....
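
The arithmetic behind "32 bits into 3 words" works because 1626 cubed is slightly larger than 2 to the 32nd power. The sketch below shows one way to split a 32-bit value into three list positions using plain base-1626 digits. This is an assumption about how such a scheme could work, not necessarily the procedure the quoted encoder uses, and it produces indices rather than words because the actual word list is not reproduced here.

```python
BASE = 1626  # vocabulary size given in the quoted description

# Three words are enough: 1626**3 is just above 2**32.
assert BASE ** 3 >= 2 ** 32


def to_indices(value: int) -> tuple:
    """Split a 32-bit value into three base-1626 digits, lowest first."""
    if not 0 <= value < 2 ** 32:
        raise ValueError("value must fit in 32 bits")
    low = value % BASE
    mid = (value // BASE) % BASE
    high = value // (BASE * BASE)
    return (low, mid, high)


def from_indices(indices) -> int:
    """Rebuild the 32-bit value from its three base-1626 digits."""
    low, mid, high = indices
    return low + BASE * mid + BASE * BASE * high


value = 0xDEADBEEF
digits = to_indices(value)
print(digits, from_indices(digits) == value)
```

Each of the three indices would then select one word from the 1626-word vocabulary.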

(no subject)

Date: 2006-02-16 02:41 pm (UTC)
From: [identity profile] 530nm330hz.livejournal.com
Yeah, something like that.

Profile

rhu: (Default)
Andrew M. Greene