A datawash solution:
Problem: From Active Directory I had a long list, where column alias was not unique.
Instead of do any manual filtering away the aliases that were found more than once, I just wanted SQL server to throw away the duplets, leaving only one alias bin the list.
If I did
select distinct alias from ADList
I would only get the column alias from the List.
If I did
select distinct alias, ou from ADList
I would not get a list of unique aliases since ou would differ and there would still be more than one of the same alias.
In other words: Souce table contains duplets in the key column. The data columns differs, which makes it not possilbe to filter with select distinct alone.
So what to do?
Solution: Use row_number() with partition clause to filter out the 1st instance of the key
-- Print distinct (all columns included) with alias as a unique key with AdUserids as ( select a.*, Row_number() OVER (Partition by a.Alias ORDER BY a.Alias ) as instanceno from (select * from ADList) a ) Select * from AdUserids where instanceno = 1
The row_number() returns 1 for each unique alias. For the second alias of an existing alias, it will return 2.
Now I can get a unique alias key by throwing away all other instanceno than the 1st.
Yes, I agree – not easy to remember this SQL, but it solves my problem.
The row_number() also comes in another version:
-- Add a row-number column to any an ordered select select a.*, Row_number() OVER (ORDER BY a.Alias ) as rowno from (select distinct Alias from ADList) a
Without the Partition by clause it will add a row-number column to the existing select
Links:
http://msdn.microsoft.com/en-us/library/ms186734.aspx
http://mysqlserverblog.com/2007/12/03/tsql-row_number_function.aspx
Thanks to Raoul Illyés
The End.