SQL gripe

Mar. 13th, 2007 02:19 pm
rhu: (Default)
[personal profile] rhu
SQL is a standard language for expressing database design. It has the following useful constructs:

* A datum may be assigned the value NULL, which means (roughly) "there is no meaningful value to put here." NULL is not considered equal to any other value, even another NULL.

* A column (such as "the mayors of all the cities") may be constrained to have unique values ("no person can be the mayor of more than one city at the same time.")

But if you mark a column as unique, it won't allow more than one row to have NULL as its value, even though testing two NULLs for equality returns false. So there's no way to express "If there is a value here, it must be unique; but any number of rows may have an unknown or empty value at the same time."

Surely I'm not the first person to run into this problem. But I can't find any good solutions --- neither in my SQL books nor via Google.

Edited to add: Thanks to [livejournal.com profile] abbasegal for suggesting setting up a unique nonclustered index on a schemabound view.

(no subject)

Date: 2007-03-13 07:31 pm (UTC)
From: [identity profile] http://users.livejournal.com/_opus_/
Hmm. It sounds like a DB-specific behaviour. I have no problem with the following in mySQL:

mysql> create table foo (foo_id int not null primary key auto_increment,
first char(64) unique,
second char(64) unique);
Query OK, 0 rows affected (0.00 sec)

mysql> insert into foo (first) values ('test1'), ('test2');
Query OK, 2 rows affected (0.03 sec)
Records: 2 Duplicates: 0 Warnings: 0

mysql> select * from foo;
+--------+-------+--------+
| foo_id | first | second |
+--------+-------+--------+
| 1 | test1 | NULL |
| 2 | test2 | NULL |
+--------+-------+--------+
2 rows in set (0.02 sec)

(no subject)

Date: 2007-03-18 07:23 am (UTC)
From: [identity profile] abbasegal.livejournal.com
Which DB are you using? A google search seemed to imply that it works in Oracle and DB2 (apparently this behavior is not quite defined by the standard).

Regardless, assuming you want to enforce this constraint at the DB (and not app) level, you could probably whip something up with a trigger on the column to check for uniqueness.

If you are using SQL Server I found these two different approaches:
http://weblogs.sqlteam.com/brettk/archive/2005/04/20/4592.aspx
http://weblogs.sqlteam.com/davidm/archive/2004/05/21/1364.aspx

-AS

(no subject)

Date: 2007-03-19 09:11 pm (UTC)
ext_87516: (Default)
From: [identity profile] 530nm330hz.livejournal.com
You are correct. But I have to use MS SQL Server.

(no subject)

Date: 2007-03-19 09:12 pm (UTC)
ext_87516: (Default)
From: [identity profile] 530nm330hz.livejournal.com
Ah, I like the idea of creating a view with a clustered index. Triggers can lead to performance problems and, of course, the point of having a DB constraint is to protect against bugs -- and triggers can themselves have bugs and race conditions.

(no subject)

Date: 2007-03-20 12:36 am (UTC)
sethg: a petunia flower (Default)
From: [personal profile] sethg
Not in PostgreSQL, either:

...[E]ven in the presence of a unique constraint it is possible to store an unlimited number of rows that contain a null value in at least one of the constrained columns. This behavior conforms to the SQL standard, but we have heard that other SQL databases may not follow this rule.

(no subject)

Date: 2007-03-20 12:48 am (UTC)
sethg: a petunia flower (Default)
From: [personal profile] sethg
You could also refactor the table into two separate tables, i.e., instead of

CREATE TABLE one_big_table (
id INTEGER PRIMARY KEY,
foo INTEGER UNIQUE -- may be null
-- other fields
);

do something like this:

CREATE TABLE first_little_table (
id INTEGER PRIMARY KEY
-- other fields
);

CREATE TABLE second_little_table (
id INTEGER NOT NULL REFERENCES first_little_table.id,
foo INTEGER UNIQUE NOT NULL
);

CREATE VIEW one_big_view AS
SELECT * FROM first_little_table LEFT JOIN second_little_table USING (id);
--SQL99 outer join syntax

There are relational-theory purists (well, at least one purist that I know of) who would say that you should always refactor a schema this way instead of allowing fields in a table to be null.

(no subject)

Date: 2007-03-20 01:10 am (UTC)
ext_87516: (Default)
From: [identity profile] 530nm330hz.livejournal.com
Tried that. But in this case there are other constraints that can be expressed only within one table.

The actual example is that my main table is a list of Engines, which have among their many properties a Status, which is an enumerated tinyint, and a CurrentTaskId, which is a foreign key into the Tasks table. I want to ensure that no task is simultaneously assigned to more than one engine, that no engine is simultaneously assigned to more than one task, and that an engine's CurrentTaskId is constrained to be NULL for certain status values (such as INITIALIZING or AVAILABLE) and non-NULL for other status values (such as ASSIGNED or WORKING). MSSQL, at least, does not allow check constraints to include SELECT statements, and it would be prohibitively expensive to wrap each update in a transaction with deferred checking. So I chose the DB design that lets me describe two out of my three constraints, and tried to ensure the third by careful programming. But apparently there's a race condition that I didn't catch, and I want to use SQL constraints to nail the thing.

In my experience, relational-theory purists don't concern themselves much with piddling details like throughput and preventing deadlock by eliminating the need for transactions.

(no subject)

Date: 2007-03-20 03:21 pm (UTC)
ext_87516: (Default)
From: [identity profile] 530nm330hz.livejournal.com
The schemabound view with a unique nonclustered index works like a charm! Thanks so much for the reference.

(no subject)

Date: 2007-03-20 09:26 pm (UTC)
From: [identity profile] abbasegal.livejournal.com
I'm glad it worked!

Profile

rhu: (Default)
Andrew M. Greene

January 2013

S M T W T F S
  12345
6789101112
13141516171819
20212223242526
2728293031  

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags