Software Development

Implementing a Queryable Map in Java


In this post I’ll show you how to implement a Queryable Map in Java.

What’s a Queryable Map? Think of modern databases. They allow us to efficiently fetch records by primary key while also allowing us to create secondary indexes that fetch records by other attributes. The idea is to have those same indexing and querying capabilities but in a Java Collection.

We would like to have an in-memory Collection that allows us to fetch data efficiently by any number of attributes.

You can find the code in this github repository: https://github.com/lucasmajerowicz/java-queryable-map.

The way QueryableMap works is very much like a document database. When creating an instance of QueryableMap we need to define what the primary key will be and optionally what attributes we would like to index. Once there is data in the collection, we can either fetch it by primary key or by any of the indeed attributes.

The key principles that guided me in the implementation were:

  • QueryableMap should support any data type.
  • Adding, retrieving and removing data should be as efficient as possible
  • Adding indexes should require minimum coding

Let’s look at an example and see how things work. Take the following Entity classes (assume they all contain Getters for their private fields):

public class Order {
    private final String id;
    private final Customer customer;
    private final List items;
    ...
}

public class Customer {
    private final String id;
    private final String name;
    ...
}

public class LineItem {
    private final String itemId;
    private final int quantity;
    private final float price;
    ...
}

We want to store a collection of Orders and be able to fetch orders by any of the following fields: order id, customer id and item id. Note that an order might contain several items. Here is how you would do it using QueryableMap:

QueryableMap map = QueryableMap.newBuilder()
                .keyFunction(Order::getId)
                .addIndex("customer_index", order -> order.getCustomer().getId())
                .addIndex("item_id_index", order -> order.getItems().stream().map(LineItem::getItemId).collect(toList()))
                .build();

To create an instance of QueryableMap we need to define which field from Order will act as the primary key (in our case Order::getId). For each index we want to add, we need to give it a name and define which field from Order to index. For example, since we would like to index the order’s customer id, we’ve created the following lambda order -> order.getCustomer().getId(), which returns the corresponding customer id for a given order.

For the item id index, note that since an order might contain several items, our lambda returns a list of item ids. Internally QueryMap will index each value in the list separately.

Once the map has been created, inserting data is as simple as

Order order = new Order(...);
map.put(order);

In order to retrieve data, we can do so either by key as follows

Order order = map.get("order id")

or we can “run some queries” based on the previously defined indexes as follows:

Collection<Order> orders_with_customer_1 = map.query("customer_index", "customer_1");
Collection<Order> orders_containing_item_1 = map.query("item_index", "item_1");

Note that the primary key and values to be indexes can be of any type. In this example, they are all strings but that’s a coincidence.

Implementation

The implementation of QueryableMap is surprisingly short and consists of two classes.

Firstly we have a class called SecondaryIndex, which works very similarly to Multi Map: it allows us to store multiple values for a single key.

public class SecondaryIndex {
    private Map> map = new ConcurrentHashMap<>();

    public void put(K k, V v) {
        Set vs = map.computeIfAbsent(k, k1 -> new HashSet<>());
        vs.add(v);
    }

In our case, we’ll have an instance of SecondaryIndex for every field we want to index. The keys will contain the different values present in our data for the indexed field. The values, for each key, will be the document ids that have that key. For example, the following entry in the secondary index [“customer_1”] -> [“1”, “5”] would indicate that the orders with ids 1 and 5 belong to customer_1.
The public interface of SecondaryIndex contains a get method, which allows to get all the corresponding document ids for a given attribute value, and a remove which allows to remove a document id from the index.

Secondly we have the QueryableMap. An instance of this class will contain a regular map, which will contain the values or records indexed by id. In addition, for each defined secondary index, it will contain a pair of Function and SecondaryIndex.

public class QueryableMap {
    private final Function keyFunction;
    private Map map = new ConcurrentHashMap<>();
    private Map indices = new ConcurrentHashMap<>();
	...

	private static class IndexEntry {
			private final String name;
			private final Function function;
			private final SecondaryIndex map;	
		...
	}
}

Every time we add a new record to QueryableMap, the record will be first inserted in the regular map. This will allow us to fetch records by id. For each registered index, we’ll use the function to compute the value for the index and insert it into the SecondaryIndex instance: the key will the value we computed and the value will be the record id.

public void put(V value) {
        K key = keyFunction.apply(value);

        delete(key);
        map.put(key, value);

        indices.values().forEach(index -> {
            Object indexKey = index.getFunction().apply(value);
            if (indexKey instanceof Collection) {
                ((Collection) indexKey)
                        .forEach(eachIndexKey -> index.getMap().put(eachIndexKey, key));
            } else {
                index.getMap().put(indexKey, key);
            }
        });
    }

Note that when the result of the function is a collection, we index each value separately.

Querying is taking the correct secondary index, finding all the record ids that match the given input value, and then returning corresponding records:

    public Collection query(String indexName, Object value) {
        SecondaryIndex indexMap = indices.get(indexName).getMap();

        return indexMap.get(value)
                .stream()
                .map(map::get)
                .collect(Collectors.toList());
    }

Memory Consumption

Based on tests that I have run, with the current implementation and assuming the record ids are UUIDs, every index we add takes about 40 MBs per million records in the map. The reason indexes require that amount of memory is because we are using HashSets to store the mappings between values and keys (https://github.com/lucasmajerowicz/java-queryable-map/blob/master/src/main/java/SecondaryIndex.java#L8). By using HashSets, removing elements from the map becomes very efficient.

If you are concerned about memory consumption and can afford having less efficient deletes, you could use ArrayLists or Sorted Lists instead of HashSets. In this case, the 40 MBs per million records becomes only 5 MBs.

Limitation

The current implementation only contains basic Collection operations: add, fetch and remove. Also as you might have noticed, it only supports querying using equality. Adding support for range queries would not be hard to do.

Another limitation is that the fields to be indexed should be immutable.

Software Development
Spring Data Couchbase – Pitfalls to Avoid
Software Development
Want to be a great developer? Start reading Schopenhauer
Software Development
Are You Agile? 6 Signs Your Team May Not Be