Sanity check: The Dafcast for Cycle 13
Jul. 12th, 2012 02:52 pm
So for the past few months I've been working on rethinking the Dafcast website. And I'd like a sanity check here, folks.
Quick background: about 3 years ago I had a great idea (no, really) about a new way to do a "Daf Yomi" podcast. (Background to the background: "Daf Yomi" is a study program in which you learn one page of Talmud every day, getting through the whole thing in 7 1/2 years. The twelfth cycle ends on August 2, and on August 3 we start all over at the beginning.)
The problem is that for my podcast to work, I needed an English text that isn't encumbered by copyright. So phase one, I thought, would be to crowdsource translating the Talmud. I figured that if 1,000 people each volunteered to translate 3 pages, we'd get the whole thing done. And this is the internet, so how hard would it be to find 1,000 people who'd be willing to translate 3 pages?
Turns out, pretty hard. In 3 years I had a dozen volunteers who submitted a total of... zero pages. Then one more volunteer showed up a few months ago and did a great job on three pages.
So I had a new plan. I developed a website that I code-named Machine-Assisted Translation And Notation of Torah she-be-al Peh. (The acronym is cute in Hebrew, trust me.) The basic idea is that it has the entire corpus in Hebrew, and as I translate each page I break it down into words and short phrases whose translation can be recycled. It's not a Machine-Translation system -- it knows nothing about context or grammar; it just picks the most frequently-used translation as the default and offers all the others as options. But since this is a closed corpus, the idea is that once I've used it to help me translate a page manually, it doesn't need to translate that page ever again. It's just supposed to make things faster.
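For the programmers reading along, the "most frequent translation wins" idea can be sketched in a few lines of Python. This is not the site's actual code, just a minimal illustration under some assumptions: the class name `FragmentMemory` and its methods are made up for the example, and fragments are matched as exact strings with no awareness of context or grammar.

```python
from collections import Counter, defaultdict


class FragmentMemory:
    """Hypothetical store of fragment translations (illustration only)."""

    def __init__(self):
        # source fragment -> Counter of translations chosen so far
        self._counts = defaultdict(Counter)

    def record(self, source, translation):
        """Remember that `source` was translated as `translation` once more."""
        self._counts[source][translation] += 1

    def suggest(self, source):
        """Return (default, alternatives) for a fragment.

        The default is the translation used most often so far; every other
        recorded translation is offered as an option. Unknown fragments
        yield (None, []).
        """
        counts = self._counts.get(source)
        if not counts:
            return None, []
        ranked = [text for text, _ in counts.most_common()]
        return ranked[0], ranked[1:]


memory = FragmentMemory()
memory.record("בר", "son of")
memory.record("בר", "son of")
memory.record("בר", "wheat")

default, options = memory.suggest("בר")
# default is "son of" (used twice); options is ["wheat"]
```

The human translator still picks the right option for each occurrence; the counts only decide which one is shown first.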
Here's a 2-minute video showing it in action: http://www.screencast.com/t/c1KpxBOH. And you can see a rough preview of the end result at http://cycle13.dafcast.net. [PLEASE DON'T FORWARD THESE LINKS AROUND YET. I'm not friends-locking this post, so strangers may stumble across it, but I'm not really ready to go all-out public with this before the cycle begins --- and, depending on the answers to the questions at the end of this post, possibly not even then!]
The other piece of the system is that it generates a "heatmap" showing me what the translation-so-far looks like, and a list of which pages would give me the most "bang for my buck" -- that is, if I jump around and do pages out of order, which ones have the most frequently used words that are not yet in the database? That got me off to a great start: it took a couple of weeks to get to the point where 2/3 of the total text of the Talmud was covered by my database. Again, that doesn't yield a great translation. (Sometimes "bar" means "son of" and sometimes it means "wheat" -- and worse, "אין" in Hebrew means "not" and in Aramaic it means "Yes!") But it's a useful starting point.
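Here is a rough Python sketch of how a "bang for my buck" ranking and a coverage figure could be computed. It is an illustration, not the real system: the function names `coverage` and `rank_pages` are invented, the page names and single-letter "words" are placeholder data, and it assumes each page is already split into word tokens and that a word counts as covered when it appears in a set of known words.

```python
from collections import Counter


def coverage(pages, known):
    """Fraction of all word occurrences in the corpus that are already known."""
    total = sum(len(tokens) for tokens in pages.values())
    covered = sum(1 for tokens in pages.values() for t in tokens if t in known)
    return covered / total if total else 0.0


def rank_pages(pages, known):
    """Order pages by how much corpus-wide text their unknown words unlock.

    A page's score is the sum, over its distinct not-yet-known words, of how
    often each word occurs in the whole corpus. Ties are broken by page name.
    """
    freq = Counter(t for tokens in pages.values() for t in tokens)
    scores = {
        name: sum(freq[t] for t in set(tokens) - known)
        for name, tokens in pages.items()
    }
    return sorted(scores, key=lambda name: (-scores[name], name))


# Placeholder corpus: three tiny "pages" of made-up tokens.
pages = {
    "2a": ["a", "b", "a"],
    "2b": ["b", "c"],
    "3a": ["a", "d", "d"],
}
known = {"b"}

print(coverage(pages, known))    # 0.25 (2 of 8 word occurrences)
print(rank_pages(pages, known))  # ['3a', '2a', '2b']
```

Translating page "3a" first would add "a" and "d" to the database, which between them account for 5 of the 8 word occurrences in this toy corpus.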
I haven't abandoned my long-term vision. Once the translation is far enough along, I still hope, with God's help, to turn it into scripts and to record podcasts in time for cycle fourteen in 7 1/2 more years.
But right now I'm looking at where I am, relative to August 3, and I'm realizing it's still a stretch goal. (I want to be at least one month ahead of the Daf Yomi schedule, on average.) And because I'm rushing, and because I want my fragments to be reusable, I haven't had the time to copy-edit and make sure that the punctuation and capitalization are correct, or that all the fudgy connector words are there.
So I'd like your feedback. Am I insane to keep at this? Should I plow ahead at whatever pace I can manage and not worry about whether or not I'm keeping up? Should I ask for help and, if so, should I limit that to copy editing and cleanup or should I ask for help with translations? (I've been reluctant to do that because right now the way the system works is captured in my brain; does my atomic decomposition strategy make sense to anyone else or would we end up with a useless mash?) Should I add on the RSS feed and daily emails that I had been hoping to get set up in time for the start of the cycle, or is the content not good enough to worry about syndication at this point?