rhu: (torah)
[personal profile] rhu
So for the past few months I've been working on rethinking the Dafcast website. And I'd like a sanity check here, folks.

Quick background: about 3 years ago I had a great idea (no, really) about a new way to do a "Daf Yomi" podcast. (Background to the background: "Daf Yomi" is a study program in which you learn one page of Talmud every day, getting through the whole thing in 7 1/2 years. The twelfth cycle ends on August 2, and on August 3 we start all over at the beginning.)

The problem is that for my podcast to work, I needed an English text that isn't encumbered by copyright. So phase one, I thought, would be to crowdsource translating the Talmud. I figured that if 1,000 people each volunteered to translate 3 pages, we'd get the whole thing done. And this is the internet, so how hard would it be to find 1,000 people who'd be willing to translate 3 pages?

Turns out, pretty hard. In 3 years I had a dozen volunteers who submitted a total of... zero pages. Then one more volunteer showed up a few months ago and did a great job on three pages.

So I had a new plan. I developed a website that I code-named Machine-Assisted Translation And Notation of Torah she-be-al Peh. (The acronym is cute in Hebrew, trust me.) The basic idea is that it has the entire corpus in Hebrew, and as I translate each page I break it down into words and short phrases whose translations can be recycled. It's not a machine-translation system -- it knows nothing about context or grammar; it just picks the most frequently used translation as the default and offers all the others as options. But since this is a closed corpus, the idea is that once I've used it to help me translate a page manually, it doesn't need to translate that page ever again. It's just supposed to make things faster.
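
For readers who want the "most frequent translation as the default" idea made concrete, here is a minimal sketch in Python. It is an illustration only, not the site's actual code: the class name `FragmentMemory`, its methods, and the use of transliterated keys such as "bar" are all assumptions made for the example.

```python
from collections import Counter, defaultdict


class FragmentMemory:
    """Hypothetical store of fragment translations (illustration only)."""

    def __init__(self):
        # Maps each source fragment to a tally of the translations used for it.
        self._counts = defaultdict(Counter)

    def record(self, fragment, translation):
        """Note that `fragment` was translated as `translation` on some page."""
        self._counts[fragment][translation] += 1

    def suggest(self, fragment):
        """Return (default, alternatives) for a fragment.

        The default is the most frequently used translation so far; the
        alternatives are every other translation seen, most frequent first.
        Unknown fragments yield (None, []).
        """
        counts = self._counts.get(fragment)
        if not counts:
            return None, []
        ranked = [translation for translation, _ in counts.most_common()]
        return ranked[0], ranked[1:]


# Example: "bar" has been translated two ways on earlier pages.
memory = FragmentMemory()
memory.record("bar", "son of")
memory.record("bar", "son of")
memory.record("bar", "wheat")
default, alternatives = memory.suggest("bar")
print(default, alternatives)
```

Nothing here looks at context or grammar; the translator still chooses among the options by hand, which matches the description above.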

Here's a 2-minute video showing it in action: http://www.screencast.com/t/c1KpxBOH. And you can see a rough preview of the end result at http://cycle13.dafcast.net. [PLEASE DON'T FORWARD THESE LINKS AROUND YET. I'm not friends-locking this post, so strangers may stumble across it, but I'm not really ready to go all-out public with this before the cycle begins --- and, depending on the answers to the questions at the end of this post, possibly not even then!]

The other piece of the system is that it generates a "heatmap" showing me what the translation-so-far looks like, and a list of which pages would give me the most "bang for my buck" -- that is, if I jump around and do pages out of order, which ones have the most frequently used words that are not yet in the database? That got me off to a great start: it took a couple of weeks to get to the point where 2/3 of the total text of the Talmud was covered by my database. Again, that doesn't yield a great translation. (Sometimes "bar" means "son of" and sometimes it means "wheat" -- and worse, "אין" in Hebrew means "not" and in Aramaic it means "Yes!") But it's a useful starting point.
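
The coverage figure and the "bang for my buck" list can be sketched roughly as follows. This is one plausible scoring rule, not necessarily the one the site uses: it assumes each page is already split into fragments, and it scores a page by how often its not-yet-translated fragments occur across the whole corpus. The function names and the toy page labels are hypothetical.

```python
from collections import Counter


def coverage(pages, known):
    """Fraction of all fragment occurrences in the corpus that are in `known`."""
    total = sum(len(fragments) for fragments in pages.values())
    covered = sum(
        1 for fragments in pages.values() for f in fragments if f in known
    )
    return covered / total if total else 0.0


def rank_pages(pages, known):
    """Order pages by payoff: translating the first page listed would add the
    fragments that occur most often corpus-wide and are not yet in `known`."""
    corpus_freq = Counter(f for fragments in pages.values() for f in fragments)
    scores = {
        page: sum(corpus_freq[f] for f in set(fragments) if f not in known)
        for page, fragments in pages.items()
    }
    # Highest score first; ties are broken by page label for a stable order.
    return sorted(scores, key=lambda page: (-scores[page], page))


# Toy corpus: page label -> list of fragments (placeholders, not real text).
pages = {
    "2a": ["a", "b", "a"],
    "2b": ["a", "c"],
    "3a": ["d"],
}
known = {"b"}
print(rank_pages(pages, known))
print(coverage(pages, known))
```

Computing `coverage` for each page separately, rather than for the whole corpus, would give the per-page numbers a heatmap could be colored by.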

I haven't abandoned my long-term vision. Once the translation is far enough along, I still hope, with God's help, to turn it into scripts and to record podcasts in time for cycle fourteen in 7 1/2 more years.

But right now I'm looking at where I am, relative to August 3, and I'm realizing it's still a stretch goal. (I want to be at least one month ahead of the Daf Yomi schedule, on average.) And because I'm rushing, and because I want my fragments to be reusable, I haven't had the time to copy-edit and make sure that the punctuation and capitalization are correct, or that all the fudgy connector words are there.

So I'd like your feedback. Am I insane to keep at this? Should I plow ahead at whatever pace I can manage and not worry about whether or not I'm keeping up? Should I ask for help and, if so, should I limit that to copy editing and cleanup or should I ask for help with translations? (I've been reluctant to do that because right now the way the system works is captured in my brain; does my atomic decomposition strategy make sense to anyone else or would we end up with a useless mash?) Should I add on the RSS feed and daily emails that I had been hoping to get set up in time for the start of the cycle, or is the content not good enough to worry about syndication at this point?

(no subject)

Date: 2012-07-12 11:17 pm (UTC)
From: [identity profile] michael carasik (from livejournal.com)
I'm a little dubious about the idea, and certainly wouldn't have anything intelligent to say about the code. Just thought I'd point out the typo on THIS page (it should be אין, of course). And I wonder whether -- if the translation is just a means to an end -- it wouldn't be better to figure out a different way to get where you actually want to go.

(no subject)

Date: 2012-07-12 11:51 pm (UTC)
ext_87516: (530nm330Hz)
From: [identity profile] 530nm330hz.livejournal.com
Thanks for catching the typo, which I've repaired above.

Given your expertise in this area, I find your doubt worrisome. Thanks. (I mean that. No point in asking for a sanity check if I won't listen to people who tell me I'm being unrealistic.)

(no subject)

Date: 2012-07-13 02:13 am (UTC)
cellio: (talmud)
From: [personal profile] cellio
To what extent does the podcast rely on having a translation versus a summary? You were planning to dramatize the discussions as a way of engaging listeners, right? How literal does that need to be?

(And I apologize for my ongoing slackitude for the pages you sent me. :-( )

(no subject)

Date: 2012-07-13 02:17 am (UTC)
ext_87516: (torah)
From: [identity profile] 530nm330hz.livejournal.com
The podcast is intended to cover every word on one daf per episode. So it really does want to start with a translation, I think.

No need to apologize. If you had left a gaping hole in an otherwise complete perek, I'd worry. But -- aside from the pages I got out of the blue just recently -- I was batting 0.000. If you'd sent something in that I then couldn't use, I'd be the one feeling guilty!

(no subject)

Date: 2012-07-13 02:21 am (UTC)
cellio: (talmud)
From: [personal profile] cellio
Oh! Ok, then yes, translation is critical and this is a large task. I forget; have you already ruled out approaching copyright-holders for existing translations? A podcast may be sufficiently different from printed volumes that they might be willing to go along with the idea in the interests of spreading torah learning? (This is probably old ground and I've just forgotten, sorry!)

"If you'd sent something in that I then couldn't use, I'd be the one feeling guilty!"

You shouldn't. As you may recall, I said I didn't think I had enough skill and you encouraged me to try anyway. If I produce something then I will have benefitted from the effort even if no one else does, by getting my skills to the point where I can do it!

(no subject)

Date: 2012-07-13 05:51 am (UTC)
From: [identity profile] vettecat.livejournal.com
Sounds like a cool idea but I don't think you should pressure yourself timewise. You'll be more upset with yourself if you rush to publish an imperfect product. Nobody will be upset if it's delayed, and people will appreciate it whenever it's available.

BTW this may be of interest: http://www.webshas.org/

Profile

rhu: (Default)
Andrew M. Greene
