SQL is a standard language for expressing database design. It has the following useful constructs:
* A datum may be assigned the value NULL, which means (roughly) "there is no meaningful value to put here." NULL is not considered equal to any other value, even another NULL.
* A column (such as "the mayors of all the cities") may be constrained to have unique values ("no person can be the mayor of more than one city at the same time.")
But if you mark a column as unique, it won't allow more than one row to have NULL as its value, even though testing two NULLs for equality returns false. So there's no way to express "If there is a value here, it must be unique; but any number of rows may have an unknown or empty value at the same time."
Surely I'm not the first person to run into this problem. But I can't find any good solutions --- neither in my SQL books nor via Google.
Edited to add: Thanks to
abbasegal for suggesting setting up a unique nonclustered index on a schemabound view.
* A datum may be assigned the value NULL, which means (roughly) "there is no meaningful value to put here." NULL is not considered equal to any other value, even another NULL.
* A column (such as "the mayors of all the cities") may be constrained to have unique values ("no person can be the mayor of more than one city at the same time.")
But if you mark a column as unique, it won't allow more than one row to have NULL as its value, even though testing two NULLs for equality returns false. So there's no way to express "If there is a value here, it must be unique; but any number of rows may have an unknown or empty value at the same time."
Surely I'm not the first person to run into this problem. But I can't find any good solutions --- neither in my SQL books nor via Google.
Edited to add: Thanks to
(no subject)
Date: 2007-03-20 12:48 am (UTC)CREATE TABLE one_big_table (
id INTEGER PRIMARY KEY,
foo INTEGER UNIQUE -- may be null
-- other fields
);
do something like this:
CREATE TABLE first_little_table (
id INTEGER PRIMARY KEY
-- other fields
);
CREATE TABLE second_little_table (
id INTEGER NOT NULL REFERENCES first_little_table.id,
foo INTEGER UNIQUE NOT NULL
);
CREATE VIEW one_big_view AS
SELECT * FROM first_little_table LEFT JOIN second_little_table USING (id);
--SQL99 outer join syntax
There are relational-theory purists (well, at least one purist that I know of) who would say that you should always refactor a schema this way instead of allowing fields in a table to be null.
(no subject)
Date: 2007-03-20 01:10 am (UTC)The actual example is that my main table is a list of Engines, which have among their many properties a Status, which is an enumerated tinyint, and a CurrentTaskId, which is a foreign key into the Tasks table. I want to ensure that no task is simultaneously assigned to more than one engine, that no engine is simultaneously assigned to more than one task, and that an engine's CurrentTaskId is constrained to be NULL for certain status values (such as INITIALIZING or AVAILABLE) and non-NULL for other status values (such as ASSIGNED or WORKING). MSSQL, at least, does not allow check constraints to include SELECT statements, and it would be prohibitively expensive to wrap each update in a transaction with deferred checking. So I chose the DB design that lets me describe two out of my three constraints, and tried to ensure the third by careful programming. But apparently there's a race condition that I didn't catch, and I want to use SQL constraints to nail the thing.
In my experience, relational-theory purists don't concern themselves much with piddling details like throughput and preventing deadlock by eliminating the need for transactions.