Monday, April 30, 2012

A few words about ConcurrentHashMap

There is a popular belief that, because ConcurrentHashMap is part of the java.util.concurrent package then it is designed to deal with multi-threading only. However, ConcurrentHashMap also has quite a useful property that I will exploit in this post, without mentioning multi-threading at all.

Let's start with the following code:

import java.util.Map;
import java.util.HashMap;

public class HashMapTest {
 // creates a map with the keys from an array and
 // initializes the values with zero 
 public static Map<String, Integer> prepareMap(String[] arr) {
  Map<String, Integer> map = new HashMap<String, Integer>();
  for (String str : arr) {
   map.put(str, new Integer(0));
  }
  return map;
 }

 // updates the map, i.e. increases the values by one
 public static void updateMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   Integer i = map.get(str);
   map.put(str, new Integer(i.intValue() + 1));
  }
 }

 // prints the map content
 public static void printMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   System.out.println(str + " -- " + map.get(str));
  }
 }

 static public void main(String[] args) {
  String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

  Map<String, Integer> map = prepareMap(arr);
  printMap(map);
  updateMap(map);
  printMap(map);
 }
}

Nothing complicated, we generate a map and process it within the updateMap method. Everything works fine at this point. However, let's imagine that we need a slightly more complicated processing of the map, e.g. if we encounter the key "C", then we need to add a new entry to the map, something like this:

 // updates the map, i.e. increases the values by one
 public static void updateMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   Integer i = map.get(str);
   map.put(str, new Integer(i.intValue() + 1));
   if (str.equals("C")) {
    // we have a special case, let's treat it
    map.put("C1", new Integer(0));
   }
  }
 }

Now, when we execute the updated code, it fails (run time) with the following exception:

Exception in thread "main" java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
        at java.util.HashMap$KeyIterator.next(Unknown Source)
        at HashMapTest.updateMap(HashMapTest.java:19)
        at HashMapTest.main(HashMapTest.java:55)

And this is exactly what we are told by the specification:

"Returns a Set view of the keys contained in this map. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa. If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation), the results of the iteration are undefined."

In other words, HashMap (in fact, this isn't specific to HashMap only) doesn't allow for "updates while iterating", more specifically adding new entries with new keys.

One way to fix the problem is to create a copy of the key set, something like this:

import java.util.Map;
import java.util.HashMap;
import java.util.HashSet;

public class HashMapTest {
 // creates a map with the keys from an array and
 // initializes the values with zero 
 public static Map<String, Integer> prepareMap(String[] arr) {
  Map<String, Integer> map = new HashMap<String, Integer>();
  for (String str : arr) {
   map.put(str, new Integer(0));
  }
  return map;
 }

 // updates the map, i.e. increases the values by one
 public static void updateMap(Map<String, Integer> map) {
  HashSet<String> keys = new HashSet<String>(map.keySet());
  for (String str : keys) {
   Integer i = map.get(str);
   map.put(str, new Integer(i.intValue() + 1));
   if (str.equals("C")) {
    // we have a special case, let's treat it
    map.put("C1", new Integer(0));
   }
  }
 }

 // prints the map content
 public static void printMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   System.out.println(str + " -- " + map.get(str));
  }
 }

 static public void main(String[] args) {
  String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

  Map<String, Integer> map = prepareMap(arr);
  printMap(map);
  updateMap(map);
  printMap(map);
 }
}

But, if we look at the output, we will mention that "C1" node is unprocessed:

    ...
    A -- 1
    B -- 1
    C -- 1
    C1 -- 0
    H – 1
    ...

As a result, we will need a new routine to process the unprocessed data. There is a different solution though, to use ConcurrentHashMap, which according to the specification:

"Retrievals reflect the results of the most recently completed update operations holding upon their onset."

And here is the final code:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HashMapTest {
 // creates a map with the keys from an array and
 // initializes the values with zero 
 public static Map<String, Integer> prepareMap(String[] arr) {
  Map<String, Integer> map = new ConcurrentHashMap<String, Integer>();
  for (String str : arr) {
   map.put(str, new Integer(0));
  }
  return map;
 }

 // updates the map, i.e. increases the values by one
 public static void updateMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   Integer i = map.get(str);
   map.put(str, new Integer(i.intValue() + 1));
   if (str.equals("C")) {
    // we have a special case, let's treat it
    map.put("C1", new Integer(0));
   }
  }
 }

 // prints the map content
 public static void printMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   System.out.println(str + " -- " + map.get(str));
  }
 }

 static public void main(String[] args) {
  String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

  Map<String, Integer> map = prepareMap(arr);
  printMap(map);
  updateMap(map);
  printMap(map);
 }
}

All the entries are processed accordingly.

Now, let's check the efficiency, by executing the following code:

 static public void main(String[] args) {
  String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

  long total = 0;
  int max = 1000000;
  for (int i = 0; i < max; i++) {
   long start = System.currentTimeMillis();
   Map<String, Integer> map = prepareMap(arr);
   updateMap(map);
   long end = System.currentTimeMillis();
   total += end - start;
  }
  System.out.println("Average time " + ((1.0*total)/max) + "ms");
 }

Average for the ConcurrentHashMap version:

Average time 0.0022ms

Average for the HashMap and HashSet version:

Average time 0.0010ms

Indeed, ConcurrentHashMap brings some (worth to consider) performance penalties. But it isn't like it is twice slower than the HashMap and HashSet version, because in this example we are operating with a small number of entries:

String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

If we increase the number of entries to e.g. 259 (from "A0", "A1", ... to "Z9", excluding "C1"), then we have the following figures:

For the ConcurrentHashMap version:

Average time 0.048ms

For the HashMap and HashSet version:

Average time 0.033ms