C# Tip: Use a SortedSet to avoid duplicates and sort items
Using the right data structure is crucial to building robust and efficient applications. So, why use a List or a HashSet to sort items (and remove duplicates) when you have a SortedSet?
As you probably know, you can create collections of items without duplicates by using a HashSet<T> object.
It is quite useful to remove duplicates from a list of items of the same type.
How can we ensure that we always have sorted items? The answer is simple: SortedSet<T>!
HashSet: a collection without duplicates
A simple HashSet creates a collection of unordered items without duplicates.
This example
var hashSet = new HashSet<string>();
hashSet.Add("Turin");
hashSet.Add("Naples");
hashSet.Add("Rome");
hashSet.Add("Bari");
hashSet.Add("Rome");
hashSet.Add("Turin");
var resultHashSet = string.Join(',', hashSet);
Console.WriteLine(resultHashSet);
prints this string: Turin,Naples,Rome,Bari. Here the items happen to come out in insertion order, but HashSet<T> makes no guarantee about ordering, so you should not rely on it.
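The duplicate check is also visible in the return value of Add: it returns true when the item was inserted and false when an equal item was already in the set. A minimal sketch, reusing one of the city names from above (the variable names are only illustrative):

```csharp
using System;
using System.Collections.Generic;

var cities = new HashSet<string>();

// Add returns true when the item is new, false when it is already in the set
bool firstRome = cities.Add("Rome");
bool secondRome = cities.Add("Rome");

Console.WriteLine(firstRome);    // True
Console.WriteLine(secondRome);   // False
Console.WriteLine(cities.Count); // 1
```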
SortedSet: a sorted collection without duplicates
To sort those items, we have two approaches.
You can simply sort the collection once you’ve finished adding items:
var hashSet = new HashSet<string>();
hashSet.Add("Turin");
hashSet.Add("Naples");
hashSet.Add("Rome");
hashSet.Add("Bari");
hashSet.Add("Rome");
hashSet.Add("Turin");
// OrderBy (from System.Linq) works directly on the set: no intermediate list is needed
var items = hashSet.OrderBy(s => s);
var resultHashSet = string.Join(',', items);
Console.WriteLine(resultHashSet);
Or, even better, use the right data structure, a SortedSet<T>:
var sortedSet = new SortedSet<string>();
sortedSet.Add("Turin");
sortedSet.Add("Naples");
sortedSet.Add("Rome");
sortedSet.Add("Bari");
sortedSet.Add("Rome");
sortedSet.Add("Turin");
var resultSortedSet = string.Join(',', sortedSet);
Console.WriteLine(resultSortedSet);
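Both snippets print Bari,Naples,Rome,Turin. But with the second approach the collection is kept in order as you add items: you never need a separate sorting step or an extra sorted copy of the data, and the items are still sorted after any later insertion. The trade-off is that each Add on a SortedSet costs O(log n), while on a HashSet it is O(1) on average, so SortedSet pays off when you need the items ordered at all times rather than just once at the end.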
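To see this in practice, here is a small sketch (the extra city and the way the set is initialized are just for illustration) where an item is added after the first print: the set is still in order, with no re-sorting on our side. Min and Max are properties of SortedSet<T> that return the first and last items according to the sort order.

```csharp
using System;
using System.Collections.Generic;

// The constructor accepts any IEnumerable<T>; duplicates are dropped on the way in
var sortedSet = new SortedSet<string>(new[] { "Turin", "Naples", "Rome", "Bari", "Rome" });
Console.WriteLine(string.Join(',', sortedSet)); // Bari,Naples,Rome,Turin

// A later insertion lands directly in its sorted position
sortedSet.Add("Milan");
Console.WriteLine(string.Join(',', sortedSet)); // Bari,Milan,Naples,Rome,Turin

Console.WriteLine(sortedSet.Min); // Bari
Console.WriteLine(sortedSet.Max); // Turin
```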
Use custom sorting rules
What if we wanted to use a SortedSet with a custom object, like User?
public class User
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public User(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }
}
Of course, we can do that:
var set = new SortedSet<User>();
set.Add(new User("Davide", "Bellone"));
set.Add(new User("Scott", "Hanselman"));
set.Add(new User("Safia", "Abdalla"));
set.Add(new User("David", "Fowler"));
set.Add(new User("Maria", "Naggaga"));
set.Add(new User("Davide", "Bellone"));//DUPLICATE!
foreach (var user in set)
{
    Console.WriteLine($"{user.LastName} {user.FirstName}");
}
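But we will get an exception at runtime: the SortedSet does not know how to compare two User instances, and it complains that at least one object must implement IComparable.
That’s why we must update our User class so that it implements the IComparable interface: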
public class User : IComparable
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public User(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }
    // Sort by last name first, then by first name
    public int CompareTo(object obj)
    {
        // By convention, any instance is greater than null
        if (obj is null) return 1;
        var other = (User)obj;
        var lastNameComparison = LastName.CompareTo(other.LastName);
        return (lastNameComparison != 0)
            ? lastNameComparison
            : FirstName.CompareTo(other.FirstName);
    }
}
In this way, everything works as expected:
Abdalla Safia
Bellone Davide
Fowler David
Hanselman Scott
Naggaga Maria
Notice that the second Davide Bellone has disappeared: CompareTo returned 0 when it was compared with the first one, so the set treated it as a duplicate.
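If you cannot (or do not want to) modify the class, SortedSet<T> also has a constructor that accepts an IComparer<T>. The sketch below is one possible way to do it, not the approach used above: it relies on a simplified stand-in for User and on a comparer built with Comparer<T>.Create. The names SimpleUser and byLastThenFirst are made up for this example, and the ordinal string comparison is my own choice.

```csharp
using System;
using System.Collections.Generic;

// Order by last name, then by first name, using ordinal (culture-independent) comparison
var byLastThenFirst = Comparer<SimpleUser>.Create((a, b) =>
{
    int byLastName = string.CompareOrdinal(a.LastName, b.LastName);
    return byLastName != 0
        ? byLastName
        : string.CompareOrdinal(a.FirstName, b.FirstName);
});

var users = new SortedSet<SimpleUser>(byLastThenFirst)
{
    new SimpleUser("Davide", "Bellone"),
    new SimpleUser("Scott", "Hanselman"),
    new SimpleUser("Safia", "Abdalla"),
    new SimpleUser("Davide", "Bellone"), // dropped: the comparer returns 0
};

foreach (var user in users)
{
    Console.WriteLine($"{user.LastName} {user.FirstName}");
}
// Abdalla Safia
// Bellone Davide
// Hanselman Scott

// Hypothetical stand-in for the User class of this article
record SimpleUser(string FirstName, string LastName);
```

With this approach the class stays free of sorting logic, and different sets can order the same type in different ways.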
This article first appeared on Code4IT
Wrapping up
Choosing the right data type is crucial for building robust and performant applications.
In this article, we’ve used a SortedSet to keep the items of a collection sorted and free of duplicates.
I’ve never used it in a project. So, how did I know about it? I just explored the libraries I was using!
From time to time, spend some minutes reading the documentation, have a glimpse of the most common libraries, and so on: you’ll find lots of stuff that you’ve never thought existed!
Toy with your code! Explore it. Be curious.
And have fun!
🐧