SQL Filtered Index

Came across a data situation where a previous decision to allow 1-N matching for a parent->child table relationship was now unecessary and had led to confusion on data matching because the availability to add multiple records meant duplication was not apparent to the user.  

Data Axiom: Respect the data

We need to make data accessible and fundamentally useful for applications to build quality software

Therefore I am at the situation where I have a table like so

    Sequence INT NOT NULL,
    Column1 DateTime NULL,
    Column2 DateTime NULL,
    Column3 VARCHAR(12) NULL,
    ImportantNote VARCHAR(MAX) NULL,

CREATE UNIQUE INDEX [IX_U_MyTable_Sequence] ON [dbo].[MyTable] ([Sequence])

I did not design this table, but now I needed to make it work so that I guarantee uniqueness for each record and switch the data relationship from 1-N to 1-1 (it's ok to do this as the application code already only displays the latest record anyway).

There are more fields but for brevity will only show the important ones.  Previously most of the application and database code depended on the autogenerated Sequence field from another table that would be filled in with a new auto incremented int value each time note processing was run.  

This led to notes disappearing as only the latest note was visible from the application code (yes another flaw as the database was designed by different people at different times than the application code who did the shortcut of only showing the latest note).  I digress but now I come along years later and should change the system to match 1-1 between parent->child table

The issue here is that the important note field needs to be tied some uniqueness in the table and prior to this the only uniqueness was the Sequence but I can make uniqueness on this table by combining Column1, Column2, Column3 into a composite index.

Great I will do that

CREATE UNIQUE NONCLUSTERED INDEX [IX_U_MyTable_MultiColumn] ON [dbo].[MyTable] ([Column1], [Column2], [Column3] ASC)

Oops that doesn't work because I already said that in many cases there are multiple matching records that only differ on the Sequence number so I want to null out the 3 columns that will give uniqueness and match going forward, but if I group the records by the three column proposed uniqueness and order by highest sequence in each grouping and set all but the first to NULL I create the 1-1 matching I want but I get multiple non unique matches in the NULL, NULL, NULL bucket (sadness).

However I can keep my little trick if I turn my index into a filtered index (this is better in postgres I think) in this case filtering on NULL like so

CREATE UNIQUE NONCLUSTERED INDEX [IX_U_MyTable_MultiColumn] ON [dbo].[MyTable] ([Column1], [Column2], [Column3] ASC) WHERE ([Column1] IS NOT NULL AND [Column2] IS NOT NULL AND [Column3] IS NOT NULL);

Now the uniqueness condition of the index only pops in when all 3 columns are not NULL, otherwise all other data gets thrown into a big no matching bucket of records.  They are preserved, but will no longer be considered going forward and second point all new records will be created with this data filled in by virtue of the application code filling it in and at a later point we can go in and delete all the NULL records and set the 3 columns to NOT NULL but until the system stabilizes need to run like this for a little while.

Thanks filtered indexes.


DROP INDEX IF EXISTS [IX_U_MyTable_Sequence] ON [dbo].[MyTable];

Or if you prefer ye olde syntax

IF EXISTS(SELECT * FROM sys.indexes WHERE object_id = object_id('dbo.MyTable') AND NAME ='IX_U_MyTable_Sequence')
    DROP INDEX [IX_U_MyTable_Sequence] ON [dbo].[MyTable];

and add the new filtered multiple column index

CREATE UNIQUE NONCLUSTERED INDEX [IX_U_MyTable_MultiColumn] ON [dbo].[MyTable] ([Column1], [Column2], [Column3] ASC) WHERE ([Column1] IS NOT NULL AND [Column2] IS NOT NULL AND [Column3] IS NOT NULL);