
Indexing Nodes in Neo4J
I’ve been playing with #neo4j quite a bit lately. It’s a great & fun project: a graph database that mitigates all the bullshit you have to deal with when trying to, ya know, do graph stuff. For example: find all User Nodes whose gender property is set to female, that have an outgoing likes relationship to the punk music Node, and that are less than 3 degrees of separation from Node #4. Stuff like that. It’s super good at doing this.
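To make that kind of query a bit more concrete, here’s a rough plain-Java sketch of what it boils down to: a breadth-first walk out from Node #4 that keeps female nodes liking punk music that are fewer than 3 hops away. This is not Neo4J’s traversal API. The Person class, the "knows" relationship that defines degrees of separation, and modeling "likes" as a set of strings are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DegreesQuery {
    // Hypothetical in-memory node standing in for a Neo4J Node with properties.
    static final class Person {
        final long id;
        final String gender;
        final Set<String> likes = new HashSet<>();    // stand-in for outgoing "likes" relationships
        final List<Person> knows = new ArrayList<>(); // assumed relationship that defines "degrees"
        Person(long id, String gender) { this.id = id; this.gender = gender; }
    }

    // Breadth-first search from start; returns IDs of nodes fewer than maxDegrees hops
    // away whose gender matches and that "like" the given thing.
    static List<Long> find(Person start, int maxDegrees, String gender, String liked) {
        List<Long> matches = new ArrayList<>();
        Map<Person, Integer> depth = new HashMap<>();
        Deque<Person> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            Person p = queue.poll();
            int d = depth.get(p);
            if (p != start && gender.equals(p.gender) && p.likes.contains(liked)) {
                matches.add(p.id);
            }
            if (d + 1 >= maxDegrees) continue; // neighbours would be too far away
            for (Person next : p.knows) {
                if (!depth.containsKey(next)) { // visit each node only once
                    depth.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        return matches;
    }

    // Small example graph: 4 -> 1 -> 3 -> 5, plus 4 -> 2.
    static List<Long> demo() {
        Person n4 = new Person(4, "male");
        Person a = new Person(1, "female"); a.likes.add("punk music");
        Person b = new Person(2, "female"); b.likes.add("jazz");
        Person c = new Person(3, "female"); c.likes.add("punk music");
        Person far = new Person(5, "female"); far.likes.add("punk music"); // 3 hops away
        n4.knows.add(a); n4.knows.add(b); a.knows.add(c); c.knows.add(far);
        return find(n4, 3, "female", "punk music");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [1, 3]
    }
}
```

Node 5 matches the property filters but sits 3 hops out, so "less than 3 degrees" leaves it out.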
But here’s the deal… each Node is gettable via ID, which is nice, but the IDs are Neo4J’s internal IDs; you don’t get to set ’em when you create a Node. So what if I want to get the Node whose username property is phatduckk and start the traversal from there? The problem is that you don’t know that phatduckk is Node #4, so you need a simple & efficient way to do that lookup and grab that Node.
If your dataset is small, I guess you could just use a Map and store the mapping yourself, but that solution will fall over pretty quickly: it lives in memory, so it’s bounded by your heap and gone on restart. You could also toss the info into MySQL, but why would you do that? It just doesn’t feel right to use 2 different stores. Checking out the docs, though, you’ll see that Neo4J’s got some indexing capabilities.
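For reference, here’s roughly what the "just use a Map" option looks like: a minimal sketch, assuming you record each Node’s internal ID yourself right after creating it. UsernameLookup and its methods are hypothetical names, not part of Neo4J. Nothing is persisted, and the map has to be kept in sync with the graph by hand, which is exactly why it doesn’t hold up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory username -> internal node ID map (the "small dataset" option).
// Nothing here is persisted: the mapping is gone when the JVM exits.
final class UsernameLookup {
    // ConcurrentHashMap so several threads can register and look up at once.
    private final Map<String, Long> idsByUsername = new ConcurrentHashMap<>();

    // Call this right after creating a Node, passing the ID Neo4J assigned to it.
    void register(String username, long nodeId) {
        idsByUsername.put(username, nodeId);
    }

    // Returns the node ID for the username, or null if it was never registered.
    Long lookup(String username) {
        return idsByUsername.get(username);
    }

    public static void main(String[] args) {
        UsernameLookup lookup = new UsernameLookup();
        lookup.register("phatduckk", 4L); // pretend phatduckk ended up as Node #4
        System.out.println(lookup.lookup("phatduckk")); // 4
        System.out.println(lookup.lookup("nobody"));    // null
    }
}
```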
Initially I tried out the SingleValueIndex, which fell over in a multi-threaded scenario. So I hit up the mailing list and was advised to check out the LuceneIndexService. This worked like a charm, even with multiple threads constantly indexing the same Node.
Here’s a little test app. It’s a brute-force little hack that creates a single Node and then, 1,000,000 times across 10 threads, looks it up by username in the index and indexes it if it isn’t there yet. This is a pretty unrealistic situation, but I really wanted to make sure it behaved well in a multi-threaded scenario and didn’t frustrate me like the SingleValueIndex did.
```java
package com.digg.tmp;

import org.neo4j.api.core.*;
import org.neo4j.util.index.IndexService;
import org.neo4j.util.index.LuceneIndexService;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LuceneIndex {
    private static final String USERNAME_INDEX = "usernameIndex";
    private static final String USERNAME_PROPERTY = "username";
    private static final int NUM_THREADS = 10;
    private static final int NUM_LINES = 1000000;
    private static final String USERNAME = "phatduckk";

    public static void main(String[] args) throws InterruptedException {
        // always use a new store
        NeoService neo = new EmbeddedNeo("test-" + System.currentTimeMillis());

        // now create the node we want indexed
        Node userNode;
        Transaction txUser = neo.beginTx();
        try {
            userNode = neo.createNode();
            userNode.setProperty(USERNAME_PROPERTY, USERNAME);
            txUser.success();
        } finally {
            txUser.finish();
        }

        // now create the index & set up a pool
        IndexService idxServ = new LuceneIndexService(neo);
        ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);

        // now let's index that same node NUM_LINES times.
        // the reason we're indexing the same node is cuz i'm checking for thread safety during indexing;
        // otherwise you'd normally be indexing new nodes whose data you got from some external source
        for (int i = 0; i < NUM_LINES; i++) {
            System.out.println("line: " + i);
            executorService.execute(new IndexRunner(userNode, neo, idxServ));
        }

        // clean shutdown: let queued tasks finish, then close the index and the store
        executorService.shutdown();
        executorService.awaitTermination(1, TimeUnit.HOURS);
        idxServ.shutdown();
        neo.shutdown();
    }

    static class IndexRunner implements Runnable {
        private final NeoService neo;
        private final IndexService idxServ;
        private final Node userNode;

        IndexRunner(Node userNode, NeoService neo, IndexService idxServ) {
            this.userNode = userNode;
            this.neo = neo;
            this.idxServ = idxServ;
        }

        public void run() {
            Transaction nodeTx = neo.beginTx();
            try {
                // look the user up by username; index the node only if it isn't found
                Node nodeFromIndex = idxServ.getSingleNode(USERNAME_INDEX, USERNAME);
                if (nodeFromIndex != null) {
                    System.out.println("found " + USERNAME + " in the " + USERNAME_INDEX
                            + " index. Node ID is: " + nodeFromIndex.getId());
                } else {
                    idxServ.index(userNode, USERNAME_INDEX, USERNAME);
                }
                nodeTx.success();
            } finally {
                nodeTx.finish();
            }
        }
    }
}
```

Although this is an off-the-wall example, it also serves as a simple example of how to index a Node. Anywho, hope this helps out a few folks who ran into the same needs/problems/scenarios I did. In hindsight it’s all pretty simple & straightforward; I just went down the wrong path with the SingleValueIndex. When browsing the docs it sounded like the right tool for the job, but from what I can tell you should avoid it and use the LuceneIndexService instead.


