Questions to ask before choosing a Collection implementation in Java

In any non trivial project, developers are faced with having to deal a collection of objects. Java Collections framework provides a rich set of options, but too often developers choose one of the popular options such as an ArrayList or a HashSet without thinking through the consequences of his choice. On the other hand, some choose “performance” as the foremost criteria even though the scalability requirements are hard to predict beforehand.

My rule of thumb in choosing the right implementation of the Collection or Map interface is as follows.

Thread safe

The first question that should be asked is this — do I need thread safety or not? Most of the Collection implementations do not provide thread safety by default due to performance reasons. If you need your code to be thread-safe, then either you have to use a thread-safe implementation or be willing to wrap your collection using Collections.synchronized…

Duplicates

Next critical criteria in choosing a good implementation is whether I have to store duplicate elements or not. Many collection implementations do permit duplicates, though this may not be the desirable way in your code. This applies to duplicate elements in case of Lists or Queues and duplicate keys in case of a key-value pair collection such as Map.

Order

Does the ordering of elements in the collection matter? If yes, what type of ordering — natural ordering or the insertion order?

Null

Do we have to store nulls or not is another factor which could influence your choice

Performance

This should be in my opinion the last criteria in most of the applications. Performance of different implementations varies, depending upon the type of implementation and also depending upon the thread-safety. ArrayList provides very fast insertion and retrieval, making it a good candidate for collections which require fast access to random elements. But if you want to remove elements frequently, then LinkedList is a better candidate. But in practice for most cases, performance considerations are much less decisive in choosing the right implementation.

Quick look
Type Implementation Thread-safe Ordered Duplicates Allows null Insertion Retrieval Removal
List ArrayList No Yes Yes Yes O(1) O(1) O(n)
List LinkedList No Yes Yes Yes O(1) O(n) O(1)
List CopyOnArrayList Yes Yes Yes Yes O(n) O(1) O(n)
Set HashSet No No No Yes O(1) na O(1)
Set LinkedHashSet No Yes No Yes O(1) na O(1)
Set TreeSet No Yes No Yes O(logn)  na O(logn)
Set ConcurrentSkipListSet Yes Yes No No  O(logn)  O(logn)  O(logn)
Queue ArrayBlockingQueue Yes Yes No No O(1) O(1) O(1)
Queue PriorityQueue No Yes No No O(logn) O(1) O(logn)
Queue LinkedBlockingQueue Yes Yes No No O(1) O(1) O(1)

Summary

Choosing the right Collection implementation is often tricky, but you can’t make serious mistakes if you keep in mind some of the above rules of thumb. Thread-safety and ordering requirements should primarily dictate the choice of any implementation and only when you are faced with unusual scalability requirements, start looking at performance as a major decision criteria.

 

Leave a Reply