
Blekko’s Pretty Rad


Writing comments in your code is fine and dandy, but sometimes it’s just a fucking waste of time. Here’s an example:
<?php
class UserDataAccessClass
{
    /**
     * Get the number of followers a user has
     *
     * @param string $userID The user id
     *
     * @return int
     */
    public function getNumFollowers($userID)
    {
        return valueFromAServiceCallOrQueryEtc($userID);
    }
}
?>
So, yay – that chunk of code passes PHPCS (using the PEAR standard). All the parameters are documented, there’s a line of text explaining what the function does and the docblock even states the return type… how cute!
But why? Why do I have to type all that crap? The function’s name is self-documenting & its sole parameter is obvious. The return type makes sense to me but the rest is bullshit. God forbid your function takes multiple parameters; then you’d have to line up the @param types and descriptions ’cause PHP people have a strange hardon for lining shit up.
The truth is the only reason I do all that stuff is cuz we run PHPCS on our code at work and I don’t wanna be “that guy”. If it wasn’t a “standard” at work there’s no way in hell I’d ever bother.
I’d much rather have the documentation go like this instead:
<?php
class UserDataAccessClass
{
    /**
     * @return int
     */
    public function getNumFollowers(string $userID)
    {
        return valueFromAServiceCallOrQueryEtc($userID);
    }
}
?>
…And to be 100% honest the only reason I’d include the @return is ’cause I’m an Eclipse user & it helps PDT’s static analysis (aka autocomplete gets more better).


I’ve been playing with #neo4j quite a bit lately. It’s a great & fun project. It’s a graph database that mitigates all the bullshit you have to deal with when trying to, ya know, do graph stuff. Example: find all User Nodes whose gender property is set to female, that have an outgoing likes relationship to the Node punk music and are less than 3 degrees of separation from Node #4. Stuff like that. It’s super good at doing this.
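To make the “degrees of separation” part concrete, here’s a rough sketch of doing just that piece by hand in plain Java, with no Neo4j involved. The adjacency Map, the class name and the `within` helper are all made up for illustration, and it only covers the hop counting, not the gender or likes filters.

```java
import java.util.*;

public class DegreesOfSeparation {
    /**
     * Breadth-first walk: returns every node reachable from start
     * in at most maxDepth hops (start itself is excluded).
     */
    public static Set<Integer> within(Map<Integer, List<Integer>> graph, int start, int maxDepth) {
        Set<Integer> seen = new HashSet<>();
        seen.add(start);
        Deque<Integer> frontier = new ArrayDeque<>();
        frontier.add(start);

        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<Integer> next = new ArrayDeque<>();
            for (int node : frontier) {
                for (int neighbor : graph.getOrDefault(node, Collections.<Integer>emptyList())) {
                    // Set.add returns false for nodes we already visited
                    if (seen.add(neighbor)) {
                        next.add(neighbor);
                    }
                }
            }
            frontier = next;
        }

        seen.remove(start);
        return seen;
    }

    public static void main(String[] args) {
        // hypothetical toy graph: 4 -> 5 -> 6 -> 7
        Map<Integer, List<Integer>> graph = new HashMap<>();
        graph.put(4, Arrays.asList(5));
        graph.put(5, Arrays.asList(6));
        graph.put(6, Arrays.asList(7));

        // "less than 3 degrees" means at most 2 hops
        System.out.println(within(graph, 4, 2));
    }
}
```

That’s already a fair bit of code for one clause of the query, and it assumes the whole graph fits in a Map.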
But here’s the deal… each Node is gettable via ID, which is nice – but the IDs are Neo4J’s internal IDs; you don’t get to set ’em when you create a Node. So, what if I want to get the Node whose username property is phatduckk & start the traversal from there? The problem is that you don’t know that phatduckk is Node #4, so you need a simple & efficient way to do that lookup & grab that Node.
If your dataset is small, I guess you can just use a Map and store the mapping yourself, but that solution will fall over pretty quickly. You could also toss the info into MySQL but why would you do that? It just doesn’t feel right to use 2 different stores. So, checking out some of the docs, you’ll see that Neo4J’s got some indexing capabilities.
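For what it’s worth, the do-it-yourself Map approach would look something like this. It’s only a sketch: the class and method names are invented, and the node IDs are plain longs standing in for whatever ID Neo4j hands back when you create a Node.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class UsernameLookup {
    // username -> Neo4j internal node ID (hypothetical, hand-maintained mapping)
    private final Map<String, Long> idsByUsername = new ConcurrentHashMap<>();

    /**
     * Remember which node ID belongs to a username.
     * Returns the ID that ends up stored: if another thread registered
     * the username first, its ID wins and is returned instead.
     */
    public long register(String username, long nodeId) {
        Long existing = idsByUsername.putIfAbsent(username, nodeId);
        return existing != null ? existing : nodeId;
    }

    /** Returns the node ID for a username, or null if it was never registered. */
    public Long find(String username) {
        return idsByUsername.get(username);
    }

    public static void main(String[] args) {
        UsernameLookup lookup = new UsernameLookup();
        lookup.register("phatduckk", 4L);
        System.out.println(lookup.find("phatduckk"));
    }
}
```

It lives in memory only, so the mapping is gone as soon as the process exits unless you persist it somewhere yourself, which is part of why it doesn’t hold up.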
Initially I tried out the SingleValueIndex which fell over in a multi-threaded scenario. So, I hit up the list and was advised to check out the LuceneIndexService. This worked like a charm. Even with multiple threads constantly indexing the same Node.
Here’s a little test app. It’s a brute-force little hack that creates a single Node and indexes it by its username property 1,000,000 times (NUM_LINES in the code) using 10 threads. This is a pretty unrealistic situation but I really wanted to make sure it behaved well in a multi-threaded scenario and didn’t frustrate me like the SingleValueIndex did.
package com.digg.tmp;

import org.neo4j.api.core.*;
import org.neo4j.util.index.IndexService;
import org.neo4j.util.index.LuceneIndexService;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LuceneIndex {
    private static final String USERNAME_INDEX = "usernameIndex";
    private static final int NUM_THREADS = 10;
    private static final int NUM_LINES = 1000000;
    private static final String USERNAME = "phatduckk";

    public static void main(String[] args) {
        // always use a new store
        NeoService neo = new EmbeddedNeo("test-" + System.currentTimeMillis());

        // now create the node we want indexed:
        Transaction txUser = neo.beginTx();
        Node userNode = neo.createNode();
        userNode.setProperty(USERNAME_INDEX, USERNAME);
        txUser.success();
        txUser.finish();

        // now create the index & setup a pool
        IndexService idxServ = new LuceneIndexService(neo);
        final ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);

        // now let's index that same node NUM_LINES times
        // the reason we're indexing the same node is cuz i'm checking for thread safety issues during indexing
        // otherwise you'd normally be indexing new nodes whose data you got from some external source
        for (int i = 0; i < NUM_LINES; i++) {
            System.out.println("line: " + i);
            IndexRunner command = new IndexRunner(userNode, neo, idxServ);
            executorService.execute(command);
        }

        // should do a clean neo.shutdown() at some point ;-)
    }

    static class IndexRunner implements Runnable {
        NeoService neo;
        IndexService idxServ;
        Node userNode;

        IndexRunner(Node userNode, NeoService neo, IndexService idxServ) {
            this.userNode = userNode;
            this.neo = neo;
            this.idxServ = idxServ;
        }

        public void run() {
            Transaction nodetx = neo.beginTx();
            Node nodeFromIndex = idxServ.getSingleNode(USERNAME_INDEX, USERNAME);
            if (nodeFromIndex != null) {
                System.out.println("found " + USERNAME + " in the " + USERNAME_INDEX
                        + " index. Node ID is: " + nodeFromIndex.getId());
            } else {
                idxServ.index(userNode, USERNAME_INDEX, USERNAME);
            }
            nodetx.success();
            nodetx.finish();
        }
    }
}

Although this is an off-the-wall example, it can also serve as a simple example of how to index a Node. Anywho – hope this helps out a few folks that ran into the same needs/problems/scenarios I did. In hindsight it’s all pretty simple & straightforward – I just went down the wrong path with the SingleValueIndex… when browsing the docs it sounded like the right tool for the job but, from what I can tell, you should avoid it and use the LuceneIndexService instead.
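One loose end in the test app: it never shuts the thread pool down, and the worker threads that Executors.newFixedThreadPool creates aren’t daemon threads, so the JVM just sits there after the work is done. Here’s a minimal sketch of draining the pool first, using a dummy counter in place of the Neo4j calls. The class and method names are made up, and the one-minute timeout is an arbitrary pick.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolShutdown {
    /** Runs `tasks` trivial jobs on `threads` workers and returns how many finished. */
    public static int runAll(int threads, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger done = new AtomicInteger();

        for (int i = 0; i < tasks; i++) {
            // stand-in for the real IndexRunner work
            pool.execute(done::incrementAndGet);
        }

        // stop accepting new work, then wait for the queued jobs to drain
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow(); // give up on whatever is still running
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pool", e);
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("finished jobs: " + runAll(10, 1000));
    }
}
```

In the real app, the neo.shutdown() that the comment mentions would go right after the pool has drained.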


<?php

if (! isset($argv[1])) {
    echo "enter a search term:\n";
    echo 'php ' . __FILE__ . " <search_term>\n";
    exit;
}

$term = urlencode($argv[1]);
$url = "http://ax.phobos.apple.com.edgesuite.net/WebObjects/MZStoreServices.woa/wa/wsSearch?limit=10&entity=software&term=$term";
$json = file_get_contents($url);

print_r(json_decode($json, true));

?>
More info here.


I stumbled upon Paul William’s plugin for embedding a Gist into a WordPress blog.
It’s a quick ‘n clean plugin but it relies on a JS <script> tag to render the Gist’s content… so, I made a quick tweak to get the plugin to actually put the Gist’s content into your HTML source. There may already be something similar but, eh, it was just a quick hackjob.
The plugin’s code and instructions for installation & usage are in the Gist below.
<?php
/*
Plugin Name: Gistson - Embedded Gist WP Plugin
Plugin URI: http://arin.me/blog/tag/gistson
Description: Use a shortcode [gist id="12345"] to embed A Gist from http://gist.github.com into your blog
Version: 0.1
Author: Arin Sarkissian
Author URI: http://arin.me

Copyright 2009 Arin Sarkissian

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/

/*
CREDIT:
    Heavily based on Paul William's plugin:
        http://www.entropytheblog.com/blog/
        http://www.entropytheblog.com/blog/2008/12/wordpress-github-gist-shortcode-plugin/
    Main difference is that this version doesn't do a JS, <script>, embed...
    the code from your gist is actually in the HTML source.

INSTALL:
    Toss the gistson.php file into your blog's wp-content/plugins folder.
    Login to WP and enable the plugin.

USE:
    Put this <LINK> tag in <HEAD> of header.php
        <link rel="stylesheet" href="http://gist.github.com/stylesheets/gist/embed.css"/>
    When you wanna embed a gist just type in:
        [gist id="gist-id-from-gist.github.com-here"]
        example: [gist id="250709"]
    You can exclude the attribution by doing this:
        [gist id="250709" nometa="true"]
    This is useful for when you have multiple gists. But for big chunks of code etc
    I'd encourage you to keep the attribution cuz those guys have a business to run
*/

function gist_shortcode_func($atts, $content = null)
{
    $url = 'http://gist.github.com/' . trim($atts['id']) . '.json';
    $json = file_get_contents($url);
    $assoc = json_decode($json, true);

    if (isset($atts['nometa'])) {
        // you'll end up with 2 1px borders at the bottom =(
        $assoc['div'] = preg_replace('/<div class="gist\-meta">.*?(<\/div>)/is', '', $assoc['div']);
    }

    return $assoc['div'];
}
add_shortcode('gist', 'gist_shortcode_func');

?>
Oh ya – I named it Gistson ’cause it grabs the Gist’s data via an HTTP GET to a JSON doc. Ya, I know, not too creative.

