rhu | Auditory encoding of binary data

I'm serious about this question, although you may not believe it.

As you probably know, computers store everything as sequences of binary numbers. It is often necessary to translate those bit sequences into another format that can be transferred from one computer to another. So, for example:

Decimal encoding simply renders them as a number, such as 127. This is great when the typical user needs to understand the number, such as when answering "How many minutes should the egg timer wait before beeping?"

Hexadecimal encoding is similar but uses the letters A-F along with the digits 0-9 so that each character represents a grouping of four bits.

Base-64 encoding translates them into strings of seemingly random characters, such as QmFzZS02NCBlbmNvZGluZwo=, which are great for sending large attachments through email systems that might munge raw binary data.
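To make the three encodings above concrete, here is a minimal Python sketch using only the standard library. The variable names are illustrative; the byte value 127 and the Base-64 string are the ones quoted above.

```python
import base64

# One byte holding the value 127, as in the egg-timer example.
data = bytes([127])

# Decimal: render the bits as an ordinary base-10 number.
decimal = str(int.from_bytes(data, "big"))

# Hexadecimal: one character per group of four bits.
hexadecimal = data.hex().upper()

# Base-64: the sample string above is the encoding of the text
# "Base-64 encoding" followed by a newline.
b64 = base64.b64encode(b"Base-64 encoding\n").decode("ascii")

print(decimal)      # 127
print(hexadecimal)  # 7F
print(b64)          # QmFzZS02NCBlbmNvZGluZwo=
```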

Now, here's my idea and my question. Often at my company, we have moderately long bit strings (32 to 128 bytes is typical) that sometimes need to be read over the phone. It occurred to me today that how we display and enter these numbers is arbitrary and transient, so we aren't limited to the encodings I listed above. In particular, I was wondering:

How many monosyllabic words could be used to encode binary data with no ambiguity, accounting for regional pronunciations?

What I'm thinking is: assume that we can make a list of 256 such words: "cat," "blue," etc. Then each byte of the bit string could be represented by a single word, and users would be less likely to transpose digits or confuse a "B" and a "P" on the phone. Our GUIDs or license keys would become like spam headers:

blue scoff cat pie shoot wing

Would such a scheme be practical? Could one construct a list of 256 or even 1,024 words chosen to provide error-resistant spoken encoding?
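As a rough sketch of the one-word-per-byte idea, the mapping itself is just a table lookup in each direction. The 256-entry list below is a placeholder built from made-up syllables (an assumption for illustration only); it has not been vetted for how it sounds over the phone, and a real list would need hand-picked words. The function names are hypothetical too.

```python
# Placeholder vocabulary: 16 onsets x 16 rimes = 256 distinct syllables.
# These are NOT vetted words; they only stand in for a real word list.
ONSETS = ["b", "d", "f", "g", "h", "j", "k", "l",
          "m", "n", "p", "r", "s", "t", "v", "z"]
RIMES = ["ab", "ad", "ag", "am", "an", "ap", "at", "ib",
         "id", "ig", "im", "in", "ip", "it", "ob", "od"]
WORDS = [onset + rime for onset in ONSETS for rime in RIMES]
INDEX = {word: value for value, word in enumerate(WORDS)}


def encode(data: bytes) -> str:
    """Render each byte as the word at that position in the list."""
    return " ".join(WORDS[byte] for byte in data)


def decode(text: str) -> bytes:
    """Map each word back to its byte; unknown words raise KeyError."""
    return bytes(INDEX[word] for word in text.lower().split())


key = bytes([0x00, 0x12, 0xFF])
spoken = encode(key)
print(spoken)                 # bab dag zod
print(decode(spoken) == key)  # True
```

With 256 words each word carries 8 bits, so a 32-byte key becomes 32 words. A 1,024-word list would carry 10 bits per word, which no longer lines up with byte boundaries and needs bit-level packing.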

hahathor.livejournal.com

I think you'd need words of more than 3 phonemes. Take the short list you gave: blue scoff cat pie shoot wing. Let's say your transmitter has a regional accent, or a cold, or is just rattling off words really fast and has a slip of the tongue. And gives the word "coot." That's just one phoneme off from "cat" and also one phoneme off from "shoot." With sets of three phonemes, you'd need each phoneme set to be unique. If you had five-phoneme sets, you could have sets that match on two phonemes and you could generate a fairly large universe of words - I'm not a linguist so I won't bother to estimate the number of usable phonemes, nor am I a semanticist, so I won't estimate the number of words, but I will go out on a limb far enough to say three-phoneme words ain't gonna cut it.
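The "one phoneme off" concern can be checked mechanically once each candidate word has a phoneme transcription. The sketch below is only an illustration: the transcriptions are hand-written assumptions using informal symbols, and the position-by-position comparison is a crude stand-in for a real model of which sounds listeners actually confuse.

```python
from itertools import combinations, zip_longest

# Hand-written, informal transcriptions (assumed for illustration).
LEXICON = {
    "cat": ("k", "ae", "t"),
    "coot": ("k", "uw", "t"),
    "shoot": ("sh", "uw", "t"),
    "wing": ("w", "ih", "ng"),
}


def phoneme_distance(a, b):
    """Count positions where two transcriptions differ.

    Shorter transcriptions are padded, so a missing phoneme counts
    as a difference.
    """
    return sum(x != y for x, y in zip_longest(a, b))


def confusable_pairs(lexicon, min_distance=2):
    """List word pairs that differ in fewer than min_distance phonemes."""
    return [
        (w1, w2)
        for (w1, p1), (w2, p2) in combinations(lexicon.items(), 2)
        if phoneme_distance(p1, p2) < min_distance
    ]


print(confusable_pairs(LEXICON))
# [('cat', 'coot'), ('coot', 'shoot')]
```

Here "cat" and "shoot" are two phonemes apart, yet "coot" sits one step from each, which is the failure described above. A list that requires at least two differing phonemes between every pair would reject any word that falls this close to an existing entry.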

sethg

You mean this?

The mnemonic encoding presented here is a method for converting binary data into a sequence of words suitable for transmission or storage by voice, handwriting, memorization or other non-computerized means.

The encoding converts 32 bits of data into 3 words from a vocabulary of 1626 words. The words have been chosen to be easy to understand over the phone and recognizable internationally as much as possible....
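The figure of 1626 words follows from simple arithmetic: three words must cover all 2^32 values, and 1626 is the smallest vocabulary size whose cube reaches that. The sketch below checks this and splits a 32-bit value into three word indices. The digit order and function names are assumptions for illustration and are not taken from the quoted scheme.

```python
VOCAB_SIZE = 1626
WORDS_PER_GROUP = 3
BITS_PER_GROUP = 32


def smallest_vocab(bits, words):
    """Smallest n such that n ** words covers every value of `bits` bits."""
    n = 1
    while n ** words < 2 ** bits:
        n += 1
    return n


def to_indices(value, base=VOCAB_SIZE, count=WORDS_PER_GROUP):
    """Split a 32-bit value into `count` word indices, least significant first."""
    if not 0 <= value < 2 ** BITS_PER_GROUP:
        raise ValueError("value must fit in 32 bits")
    digits = []
    for _ in range(count):
        value, digit = divmod(value, base)
        digits.append(digit)
    return digits


def from_indices(digits, base=VOCAB_SIZE):
    """Inverse of to_indices."""
    value = 0
    for digit in reversed(digits):
        value = value * base + digit
    return value


print(smallest_vocab(32, 3))  # 1626
print(1625 ** 3 < 2 ** 32)    # True, so 1625 words would fall short
print(to_indices(127))        # [127, 0, 0]
```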

530nm330hz.livejournal.com

Yeah, something like that.
