SQL2005: Create a unique key with Row_number()

A datawash solution:

Problem: From Active Directory I had a long list, where column alias was not unique.
Instead of do any manual filtering away the aliases that were found more than once, I just wanted SQL server to throw away the duplets, leaving only one alias bin the list.
If I did

select distinct alias from ADList

I would only get the column alias from the List.

If I did

select distinct alias, ou from ADList

I would not get a list of unique aliases since ou would differ and there would still be more than one of the same alias.

In other words: Souce table contains duplets in the key column. The data columns differs, which makes it not possilbe to filter with select distinct alone.

So what to do?

Solution: Use row_number() with partition clause to filter out the 1st instance of the key

-- Print distinct (all columns included) with alias as a unique key
with AdUserids as (
        select a.*, Row_number() OVER (Partition by a.Alias ORDER BY a.Alias ) as instanceno
        from (select * from ADList) a
)
Select * from AdUserids
where instanceno = 1

The row_number() returns 1 for each unique alias. For the second alias of an existing alias, it will return 2.
Now I can get a unique alias key by throwing away all other instanceno than the 1st.
Yes, I agree – not easy to remember this SQL, but it solves my problem.

The row_number() also comes in another version:

-- Add a row-number column to any an ordered select
select a.*, Row_number() OVER (ORDER BY a.Alias ) as rowno
from (select distinct Alias from ADList) a

Without the Partition by clause it will add a row-number column to the existing select

Links:
http://msdn.microsoft.com/en-us/library/ms186734.aspx
http://mysqlserverblog.com/2007/12/03/tsql-row_number_function.aspx

Thanks to Raoul Illyés


After having struggled with SQL, another clever guy told me how to do the same trick with Excel.

In a small data sample, where I want to filter away the rows, where col1 has duplicated data you select whole that column:

Sample data

Choose Data – Advanced… A message pops up:

Confirm header row included

Now press OK to inform Excel that header is included in data. A form appears:

Unique only

Check the Unique records only and press OK.

Voila! 3 rows/records out of 7 are filtered away.

Filtered unique rows

Thanks to Stefan Carlsson.

The End.

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: