Archive for May, 2016

Questions to ask before choosing a Collection implementation in Java

Wednesday, May 25th, 2016

In any non-trivial software application, developers are faced with having to deal with a collection of objects. Java Collections framework provides a rich set of options, but quite often I have seen people choosing one of the popular options such as an ArrayList or a HashSet without thinking through the consequences of their choice. On the other hand, some choose “performance” as the foremost criteria even though the scalability requirements are hard to predict beforehand.

My rule of thumb in choosing the right implementation of the Collection or Map interface is as follows:

Thread safety

In my opinion, thread safety requirements are a good start for narrowing down your options. Do I need thread safety or not? Most of the Collection implementations do not provide thread safety by default due to performance reasons. If you need your code to be thread-safe, then either you have to use a thread-safe implementation or be willing to wrap your collection using Collections.synchronized

Duplicates

Next critical criteria in choosing a good implementation are whether you have to store duplicate elements or not. Many collection implementations do permit duplicates, though this may not be desirable depending upon your requirements. This applies to duplicate elements in case of Lists or Queues and duplicate keys in case of a key-value pair structure such as Map.

Order

Does the ordering of elements in the collection matter? If yes, what type of ordering — natural ordering or the insertion order?

Null

Do we have to store nulls or not is another factor which could influence your choice.

Performance

This should be in my opinion the last criteria for most of the applications. The performance of different implementations varies, depending upon the type of implementation and also depending upon the thread-safety. ArrayList provides very fast insertion and retrieval, making it a good candidate for collections which require fast access to random elements. But if you want to remove elements frequently, then LinkedList is a better candidate. But in practice for most cases, you may not know in advance how much scalability is needed. Also, performance differences will start to show up only for very large datasets, so better to avoid premature optimization.

[table caption=”Quick look” width=”500″ colwidth=”20|100|50″ colalign=”left|left|center|left|right”]

Type, Implementation, Thread-safe, Ordered, Duplicates, Allows null, Insertion, Retrieval, Removal

List, ArrayList, No, Yes, Yes, Yes, O(1), O(1), O(n)

List, LinkedList, No, Yes, Yes, Yes, O(1), O(n), O(1)

List, CopyOnArrayList, Yes, Yes, Yes, Yes, O(n), O(1), O(n)

Set, HashSet, No, No, No, Yes, O(1), na, O(1)

Set, LinkedHashSet, No, Yes, No, Yes, O(1), na, O(1)

Set, TreeSet, No, Yes, No, Yes, O(logn), na, O(logn)

Set, ConcurrentSkipListSet, Yes, Yes, No, No, O(logn), O(logn), O(logn)

Queue, ArrayBlockingQueue, Yes, Yes, No, No, O(1), O(1), O(1)

Queue, PriorityQueue, No, Yes, No, No, O(logn), O(1), O(logn)

Queue, LinkedBlockingQueue, Yes, Yes, No, No, O(1), O(1), O(1)
[/table]

Summary

Choosing the right Collection implementation is often tricky, but you can’t make serious mistakes if you keep in mind some of the above rules of thumb. Thread-safety and ordering requirements should primarily dictate the choice of any implementation and only when you are faced with unusual scalability requirements, start looking at performance as major decision criteria.