December 11th, 2009

Indexing Nodes in Neo4J

I’ve been playing with #neo4j quite a bit lately. It’s a great & fun project: a graph database that mitigates all the bullshit you have to deal with when trying to, ya know, do graph stuff. Example: find all User Nodes whose gender property is set to female, that have an outgoing likes relationship to the Node punk music, and that are less than 3 degrees of separation from Node #4. Stuff like that. It’s super good at doing this.
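To make that example query concrete, here’s a rough sketch of what it boils down to: a breadth-first walk out from Node #4 that stops after 2 hops (i.e. less than 3 degrees) and keeps the users that match. Heads up: this is a toy in-memory graph in plain Java. TinyGraph, addUser, connect and find are names I made up for illustration; none of it is Neo4J’s API, and Neo4J’s own traversal machinery does this kind of thing for you.

```java
import java.util.*;

// Toy in-memory graph. Every name here is hypothetical; this is NOT Neo4J's API.
class TinyGraph {
    private final Map<Integer, Map<String, String>> props = new HashMap<>();
    private final Map<Integer, Set<String>> likes = new HashMap<>();
    private final Map<Integer, Set<Integer>> neighbors = new HashMap<>();

    // Adds a user node with a gender property and a set of liked things.
    public void addUser(int id, String gender, String... liked) {
        props.put(id, Map.of("gender", gender));
        likes.put(id, new HashSet<>(Arrays.asList(liked)));
        neighbors.putIfAbsent(id, new HashSet<>());
    }

    // Connects two users in both directions.
    public void connect(int a, int b) {
        neighbors.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        neighbors.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Breadth-first search up to maxDepth hops from start; returns matching user IDs, sorted.
    public List<Integer> find(int start, int maxDepth, String gender, String liked) {
        List<Integer> hits = new ArrayList<>();
        Map<Integer, Integer> depth = new HashMap<>();
        Deque<Integer> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            int node = queue.remove();
            int d = depth.get(node);
            boolean matches = gender.equals(props.getOrDefault(node, Map.of()).get("gender"))
                    && likes.getOrDefault(node, Set.of()).contains(liked);
            if (node != start && matches) {
                hits.add(node);
            }
            if (d == maxDepth) {
                continue; // don't expand past the hop limit
            }
            for (int next : neighbors.getOrDefault(node, Set.of())) {
                // putIfAbsent returns null only the first time we see a node
                if (depth.putIfAbsent(next, d + 1) == null) {
                    queue.add(next);
                }
            }
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) {
        TinyGraph g = new TinyGraph();
        g.addUser(4, "male", "jazz");
        g.addUser(5, "female", "punk music");
        g.addUser(6, "male", "punk music");
        g.addUser(7, "female", "punk music");
        g.addUser(8, "female", "punk music");
        g.connect(4, 5);
        g.connect(4, 6);
        g.connect(6, 7);
        g.connect(7, 8);
        // less than 3 degrees of separation from node 4 means at most 2 hops
        System.out.println(g.find(4, 2, "female", "punk music"));
    }
}
```

Node 8 matches on gender and taste but sits 3 hops out, so the 2-hop limit leaves it out of the result.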

But here’s the deal… each Node is gettable via ID, which is nice – but the IDs are Neo4J’s internal IDs; you don’t get to set ’em when you create a Node. So, what if I want to get the Node whose username property is phatduckk & start the traversal from there? The problem is that you don’t know phatduckk is Node #4, so you need a simple & efficient way to do that lookup & grab that Node.

If your dataset is small, I guess, you can just use a Map and store the mapping yourself, but that solution will fall over pretty quickly. You could also toss the info into MySQL, but why would you do that? It just doesn’t feel right to use 2 different stores. So, if you check out the docs, you’ll see that Neo4J’s got some indexing capabilities of its own.
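For the record, the do-it-yourself Map approach looks roughly like the sketch below. UsernameLookup, register and nodeIdFor are hypothetical names of mine, not anything from Neo4J; the only assumption is that you stash the ID Neo4J hands you when you create the Node.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Neo4J: maps a username to the node ID Neo4J assigned.
class UsernameLookup {
    private final Map<String, Long> idsByUsername = new ConcurrentHashMap<>();

    // Records the mapping; returns false if the username was already registered.
    public boolean register(String username, long nodeId) {
        return idsByUsername.putIfAbsent(username, nodeId) == null;
    }

    // Returns the stored node ID, or null when the username is unknown.
    public Long nodeIdFor(String username) {
        return idsByUsername.get(username);
    }

    public static void main(String[] args) {
        UsernameLookup lookup = new UsernameLookup();
        lookup.register("phatduckk", 4L);
        // the ID you get back is what you'd use to fetch the Node by ID
        System.out.println(lookup.nodeIdFor("phatduckk"));
    }
}
```

It works, but the mapping lives only in memory: it’s gone on restart unless you persist it yourself, and it has to fit in the heap. That’s the “fall over pretty quickly” part.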

Initially I tried out the SingleValueIndex, which fell over in a multi-threaded scenario. So, I hit up the list and was advised to check out the LuceneIndexService. This worked like a charm, even with multiple threads constantly indexing the same Node.

Here’s a little test app. It’s a brute-force little hack that creates a single Node and indexes it by its username property 1,000,000 times (NUM_LINES in the code) using 10 threads. This is a pretty unrealistic situation, but I really wanted to make sure it behaved well in a multi-threaded scenario and didn’t frustrate me like the SingleValueIndex did.

package com.digg.tmp;

import org.neo4j.api.core.*;
import org.neo4j.util.index.IndexService;
import org.neo4j.util.index.LuceneIndexService;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LuceneIndex {
    private static final String USERNAME_INDEX = "usernameIndex";
    private static final int NUM_THREADS = 10;
    private static final int NUM_LINES = 1000000;
    private static final String USERNAME = "phatduckk";

    public static void main(String[] args) {
        // always use a new store
        NeoService neo = new EmbeddedNeo("test-" + System.currentTimeMillis());

        // now create the node we want indexed:
        Transaction txUser = neo.beginTx();
        Node userNode;
        try {
            userNode = neo.createNode();
            userNode.setProperty(USERNAME_INDEX, USERNAME);
            txUser.success();
        } finally {
            // always finish the transaction, even if something above throws
            txUser.finish();
        }

        // now create the index & setup a pool
        IndexService idxServ = new LuceneIndexService(neo);
        final ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);

        // now let's index that same node NUM_LINES times
        // we index the same node on purpose to check for thread-safety issues during indexing;
        // normally you'd be indexing new nodes whose data you got from some external source
        for (int i = 0; i < NUM_LINES; i++) {
            System.out.println("line: " + i);
            IndexRunner command = new IndexRunner(userNode, neo, idxServ);
            executorService.execute(command);
        }

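        // stop accepting new tasks; already-queued tasks still run, and the JVM can exit afterwards
        executorService.shutdown();
        // once the pool has drained you should also do a clean neo.shutdown() ;-)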
    }

    static class IndexRunner implements Runnable {
        NeoService neo;
        IndexService idxServ;
        Node userNode;

        IndexRunner(Node userNode, NeoService neo, IndexService idxServ) {
            this.userNode = userNode;
            this.neo = neo;
            this.idxServ = idxServ;
        }

        public void run() {
            Transaction nodetx = neo.beginTx();
            try {
                Node nodeFromIndex = idxServ.getSingleNode(USERNAME_INDEX, USERNAME);

                if (nodeFromIndex != null) {
                    System.out.println("found " + USERNAME + " in the " + USERNAME_INDEX
                            + " index. Node ID is: " + nodeFromIndex.getId());
                } else {
                    idxServ.index(userNode, USERNAME_INDEX, USERNAME);
                }

                nodetx.success();
            } finally {
                // always finish the transaction, even if the lookup or indexing throws
                nodetx.finish();
            }
        }
    }
}

Although this is an off-the-wall example, it can also serve as a simple example of how to index a Node. Anywho – hope this helps out a few folks that ran into the same needs/problems/scenarios I did. In hindsight it’s all pretty simple & straightforward – I just went down the wrong path with the SingleValueIndex… when browsing the docs it sounded like the right tool for the job but, from what I can tell, you should avoid it and use the LuceneIndexService instead.

<?php

if (! isset($argv[1])) {
    echo "enter a search term:\n";
    echo 'php ' . __FILE__ . " <search_term>\n";
    exit;
}

$term = urlencode($argv[1]);
$url = "http://ax.phobos.apple.com.edgesuite.net/WebObjects/MZStoreServices.woa/wa/wsSearch?limit=10&entity=software&term=$term";
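$json = file_get_contents($url);

// file_get_contents() returns false when the request fails
if ($json === false) {
    echo "request failed\n";
    exit(1);
}

print_r(json_decode($json, true));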

?>

More info here.


I stumbled upon Paul William’s plugin for embedding a Gist into a WordPress blog.

It’s a quick ‘n clean plugin but it relies on a JS <script> tag to render the Gist’s content… so, I made a quick tweak to get the plugin to actually put the Gist’s content into your HTML source. There may already be something similar but, eh, it was just a quick hackjob.

The plugin’s code and instructions for installation & usage are in the Gist below.

<?php
/*
Plugin Name: Gistson - Embedded Gist WP Plugin
Plugin URI: http://arin.me/blog/tag/gistson
Description: Use a shortcode [gist id="12345"] to embed A Gist from http://gist.github.com into your blog
Version: 0.1
Author: Arin Sarkissian
Author URI: http://arin.me

Copyright 2009 Arin Sarkissian

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/

/*
CREDIT:
Heavily based on Paul William's plugin:
http://www.entropytheblog.com/blog/
http://www.entropytheblog.com/blog/2008/12/wordpress-github-gist-shortcode-plugin/
Main difference is that this version doesn't do a JS, <script>, embed... the code from your gist is
actually in the HTML source.

INSTALL:
Toss the gistson.php file into your blog's wp-content/plugins folder. Log in to WP and enable the plugin.

USE:
Put this <LINK> tag in <HEAD> of header.php
<link rel="stylesheet" href="http://gist.github.com/stylesheets/gist/embed.css"/>
When you wanna embed a gist just type in:
[gist id="gist-id-from-gist.github.com-here"]
example:
[gist id="250709"]
You can exclude the attribution by doing this:
[gist id="250709" nometa="true"]
This is useful for when you have multiple gists. But for big chunks of code etc
I'd encourage you to keep the attribution cuz those guys have a business to run
*/

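function gist_shortcode_func($atts, $content = null) {
    // bail out quietly if the shortcode has no id
    if (empty($atts['id'])) {
        return '';
    }

    $url = 'http://gist.github.com/' . trim($atts['id']) . '.json';
    $json = file_get_contents($url);
    $assoc = ($json === false) ? null : json_decode($json, true);

    // nothing usable came back, so render nothing instead of a PHP notice
    if (!is_array($assoc) || !isset($assoc['div'])) {
        return '';
    }

    if (isset($atts['nometa'])) {
        // you'll end up with 2 1px borders at the bottom =(
        $assoc['div'] = preg_replace('/<div class="gist\-meta">.*?(<\/div>)/is', '', $assoc['div']);
    }

    return $assoc['div'];
}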
add_shortcode('gist', 'gist_shortcode_func');

?>

Oh ya – I named it Gistson ’cause it grabs the Gist’s data via an HTTP GET to a JSON doc. Ya, I know, not too creative.

December 7th, 2009

I Want A More Better A3

So, I’ve had my current A3 for about 2 years now and I love it but…

There’s a lot to like about the car but I settled when I bought. I had an A4 Wagon before that and dug it but never really made peace with the fact that I was driving around in a station wagon. So, one day, I randomly popped into the dealership and traded it in for the A3.

I was looking for something smaller and a bit less expensive so I didn’t pop for some features. One feature I was stoked on was the iPod integration & did get that… unfortunately I didn’t bother trying it before I signed all the paperwork and it turns out it sucked.

Anyways – over the last couple years I’ve been mostly happy with the A3 but really wished I had the nav (better iPod integration), bigger wheels, quattro & a few other niceties that, at the time, I was convinced I didn’t want/need.

Well, I’m pretty close to deciding that I kinda want it all at this point. I don’t want a different car – just a “better” version of the one I already have. So, today I almost got the car above; things didn’t work out (dealer was a dick) so I didn’t get it. I’m glad I didn’t though… for now. That one had everything I wanted except the nav/iPod kit. Turns out there’s none in America w/ the config I want and ordering’s my only option.

I’m gonna think on this for a bit and see what happens. If you know me (even remotely) then you know the odds are pretty high that I’ll be placing an order pretty soon :) .
