rhu: (Default)
[personal profile] rhu
I'm writing a little web-based word game, and put together a Python script to compute scores, which involves scanning a wordlist.

Scanning the list by opening a text file and iterating through: 7 seconds
Importing the list into a SQLite DB and using SELECT LIKE: 15 seconds
Computing the set of substrings in Python and using SELECT IN: 19 seconds

If you'd asked me going in, I would have expected these to be in the other order. Of course, there are some other tricks I can start using with indexing the SQL tables and optimizing my queries that will help.

Results are not surprising

Date: 2010-06-25 03:51 pm (UTC)
From: [identity profile] jikamens.livejournal.com
SELECT LIKE requires a full scan of every row in the database. LIKE doesn't benefit from indices. So you're doing the same work as just scanning the text file, plus all the overhead of SQLite.

I'm not sure exactly what you mean by "Computing the set of substrings in Python and using SELECT IN," so I can't comment on its performance.

If you want this to be sped up by the DB, you're going to have to build a more complex data model that does all the heavy lifting in advance, so you end up with queries that can benefit from indices.

That's, e.g., what password crackers for old-style UNIX crypt()ed passwords do -- they've essentially calculated huge numbers of encrypted password strings in advance, so cracking a password often requires no more than a single indexed table lookup.

Re: Results are not surprising

Date: 2010-06-25 05:06 pm (UTC)
ext_87516: (Default)
From: [identity profile] 530nm330hz.livejournal.com
CREATE INDEX on the SQL table knocks the SELECT IN time down to 0.2 seconds. That makes me happy. (Splitting the text file into ones for each word length knocked that time down to 0.5 seconds, so SQLite wins.)

(In the particular game I'm building, your score is based on the longest words that can be found in the grid you've filled in. So if along one particular path you have the letters SWREN, you'd get 4 points. That's what I mean by having Python precompute the substrings: I mean

[s[j:k] for j in range(0,4) for k in range(j+1, 5)]

Profile

rhu: (Default)
Andrew M. Greene

January 2013

S M T W T F S
  12345
6789101112
13141516171819
20212223242526
2728293031  

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags