SQL Filtered Index
Came across a data situation where a previous decision to allow 1-N
matching for a parent->child table relationship was now unecessary and had led to confusion on data matching because the availability to add multiple records meant duplication was not apparent to the user.
Data Axiom: Respect the data
We need to make data accessible and fundamentally useful for applications to build quality software
Therefore I am at the situation where I have a table like so
CREATE TABLE MyTable (
Sequence INT NOT NULL,
Column1 DateTime NULL,
Column2 DateTime NULL,
Column3 VARCHAR(12) NULL,
ImportantNote VARCHAR(MAX) NULL,
....
)
GO
CREATE UNIQUE INDEX [IX_U_MyTable_Sequence] ON [dbo].[MyTable] ([Sequence])
GO
I did not design this table, but now I needed to make it work so that I guarantee uniqueness for each record and switch the data relationship from 1-N
to 1-1
(it's ok to do this as the application code already only displays the latest record anyway).
There are more fields but for brevity will only show the important ones. Previously most of the application and database code depended on the autogenerated Sequence
field from another table that would be filled in with a new auto incremented int value each time note processing was run.
This led to notes disappearing as only the latest note was visible from the application code (yes another flaw as the database was designed by different people at different times than the application code who did the shortcut of only showing the latest note). I digress but now I come along years later and should change the system to match 1-1
between parent->child table
The issue here is that the important note
field needs to be tied some uniqueness in the table and prior to this the only uniqueness was the Sequence
but I can make uniqueness on this table by combining Column1, Column2, Column3
into a composite index.
Great I will do that
CREATE UNIQUE NONCLUSTERED INDEX [IX_U_MyTable_MultiColumn] ON [dbo].[MyTable] ([Column1], [Column2], [Column3] ASC)
GO
Oops that doesn't work because I already said that in many cases there are multiple matching records that only differ on the Sequence
number so I want to null out the 3 columns that will give uniqueness and match going forward, but if I group the records by the three column proposed uniqueness and order by highest sequence in each grouping and set all but the first to NULL
I create the 1-1
matching I want but I get multiple non unique matches in the NULL, NULL, NULL
bucket (sadness).
However I can keep my little trick if I turn my index into a filtered index (this is better in postgres I think) in this case filtering on NULL
like so
CREATE UNIQUE NONCLUSTERED INDEX [IX_U_MyTable_MultiColumn] ON [dbo].[MyTable] ([Column1], [Column2], [Column3] ASC) WHERE ([Column1] IS NOT NULL AND [Column2] IS NOT NULL AND [Column3] IS NOT NULL);
GO
Now the uniqueness condition of the index only pops in when all 3 columns are not NULL
, otherwise all other data gets thrown into a big no matching bucket of records. They are preserved, but will no longer be considered going forward and second point all new records will be created with this data filled in by virtue of the application code filling it in and at a later point we can go in and delete all the NULL
records and set the 3 columns to NOT NULL
but until the system stabilizes need to run like this for a little while.
Thanks filtered indexes.
Scripts
DROP INDEX IF EXISTS [IX_U_MyTable_Sequence] ON [dbo].[MyTable];
Or if you prefer ye olde syntax
IF EXISTS(SELECT * FROM sys.indexes WHERE object_id = object_id('dbo.MyTable') AND NAME ='IX_U_MyTable_Sequence')
DROP INDEX [IX_U_MyTable_Sequence] ON [dbo].[MyTable];
and add the new filtered multiple column index
CREATE UNIQUE NONCLUSTERED INDEX [IX_U_MyTable_MultiColumn] ON [dbo].[MyTable] ([Column1], [Column2], [Column3] ASC) WHERE ([Column1] IS NOT NULL AND [Column2] IS NOT NULL AND [Column3] IS NOT NULL);
GO