IMDB Finding the Path

Neo4j Wiki から

Here we'll look into an area where a graph database like Neo4j shines: following the paths between nodes.

[edit] How to find a shortest path between two nodes

The basic idea of finding the shortest path is to run two traversers until they meet. The traversers are created from the input nodes.

[edit] Interfaces

Some interfaces you want to know about:

The search for paths has to stop at some point (there may be no path between the nodes!), so we'll need a StopEvaluator:

public interface StopEvaluator
{
	public boolean isStopNode( TraversalPosition currentPos );
}

And we'll have to check if it's time to return some nodes from the traversal as well. What about using a ReturnableEvaluator?

public interface ReturnableEvaluator
{	
	public boolean isReturnableNode( TraversalPosition currentPos );
}

Feeding the traverser the implementations of these two interfaces, things can't go wrong!

As the methods that we need to implement both get a TraversalPosition as input, you may be interested in having a look at this interface as well.

public interface TraversalPosition
{
	public Node currentNode();
	
	// null if start node
	public Node previousNode();

	// null if start node
	public Relationship lastRelationshipTraversed();

	public int depth();

	public int returnedNodesCount();
	
	public boolean notStartNode();

	public boolean isStartNode();
}

What we have so far would look something like this:

Image:Imdb.util.png


To make Spring framework (and ourselves) happy we use an interface for the path finder:

public interface PathFinder
{
    List<Node> shortestPath( Node startNode, Node endNode, RelationshipType relType );
}

[edit] Implementation

We start out by defining the maximum depth to traverse (from both ends) and implement the interface:

public class SimplePathFinder implements PathFinder
{
    private static final int MAXIMUM_DEPTH = 5;

    public List<Node> shortestPath( final Node startNode, final Node endNode, final RelationshipType relType )
    {
        return findPath( startNode, endNode, relType );
    }

Now for the real work to be performed. We will use a simple approach here, using two maps that map the traversed nodes for each traverser to the path to the node. This way we can easily track back the full path when the traversers meet.

What will be returned is a list starting with an actor/actress node and then continuing by pairs of (movie, actor/actress) nodes until the other end is reached.

    private List<Node> findPath( final Node startActor, final Node endActor, final RelationshipType relType )
    {
        final Map<Node,List<Node>> traversedNodes1 = new HashMap<Node,List<Node>>();
        final Map<Node,List<Node>> traversedNodes2 = new HashMap<Node,List<Node>>();

To control the traversing process, we will create implementations of the StopEvaluator and ReturnableEvaluator interfaces. We set up the traversers using those implementations and go for a breadth-first traversal following relationships in both directions, constrained by the given relationship type to use.

        final StopEvaluator stopEval = new PathStopEval();
        final PathReturnEval returnEval1 = new PathReturnEval( traversedNodes1, traversedNodes2 );
        final PathReturnEval returnEval2 = new PathReturnEval( traversedNodes2, traversedNodes1 );
        final Traverser trav1 = startActor.traverse( Order.BREADTH_FIRST, stopEval, returnEval1, relType, Direction.BOTH );
        final Traverser trav2 = endActor.traverse( Order.BREADTH_FIRST, stopEval, returnEval2, relType, Direction.BOTH );

Now everything is set up, so we just iterate from both ends and check for a match. We return whatever match that happens to be the first we find. As we use breadth-first traversals we can be sure that there's no shorter path to be found.

As we want the resulting path to end in the Bacon node for all cases, we reverse it in the second case.

        Iterator<Node> itr1 = trav1.iterator();
        Iterator<Node> itr2 = trav2.iterator();
        while ( itr1.hasNext() || itr2.hasNext() )
        {
            if ( itr1.hasNext() )
            {
                itr1.next();
            }
            if ( returnEval1.getMatch() != null )
            {
                return returnEval1.getMatch();
            }
            if ( itr2.hasNext() )
            {
                itr2.next();
            }
            if ( returnEval2.getMatch() != null )
            {
                List<Node> path = returnEval2.getMatch();
                Collections.reverse( path );
                return path;
            }
        }
        return Collections.emptyList();
    }

Our stop evaluator will set a limit on the depth of the traversals:

    private static class PathStopEval implements StopEvaluator
    {
        public boolean isStopNode( TraversalPosition currentPos )
        {
            return currentPos.depth() >= MAXIMUM_DEPTH;
        }
    }

The ReturnableEvaluator starts out by declaring fields for the maps for traversed nodes of this traverser and the other traverser. We also use a field for the list of matches.

    private static class PathReturnEval implements ReturnableEvaluator
    {
        private final Map<Node,List<Node>> myNodes;
        private final Map<Node,List<Node>> otherNodes;
        private List<Node> match = null;

        public PathReturnEval( final Map<Node,List<Node>> myNodes, final Map<Node,List<Node>> otherNodes )
        {
            this.myNodes = myNodes;
            this.otherNodes = otherNodes;
        }

The next method to look at is the heart of the path finder. First, it will find out the path to the current node, constructed by using the path to the previous node and adding the previous node to it.

        public boolean isReturnableNode( TraversalPosition currentPos )
        {
            Node prevNode = currentPos.previousNode();
            List<Node> prevList = Collections.emptyList();
            List<Node> newList = new LinkedList<Node>();
            if ( prevNode != null )
            {
                prevList = myNodes.get( prevNode );
                newList.addAll( prevList );
                newList.add( prevNode );
            }

Having a trace backwards, it's time to look forward, and check if the current node was found by the other traverser as well. If it wasn't, we just update the map. If it was, then we found a match, and construct the full path of the match.

Getting the full path is done by adding together:

  • path from the start node to the current node
  • the current node
  • the path from the current node to the end node (the order of this part must be reversed)

At the end we return true to make sure that all the nodes within the maximum depth level will be traversed.

            Node currentNode = currentPos.currentNode();
            if ( !otherNodes.containsKey( currentNode ) )
            {
                myNodes.put( currentNode, newList );
            }
            else
            {
                List<Node> otherList = otherNodes.get( currentNode );
                match = new LinkedList<Node>( newList );
                match.add( currentNode );
                Collections.reverse( otherList );
                match.addAll( otherList );
            }
            return true;
        }

Finally, we just want to return the path we found (or null if no path was found).

        protected List<Node> getMatch()
        {
            return match;
        }
    }
Conclusions:
  • traversers provide a convenient and flexible way to navigate the graph
  • it may take some time to get comfortable using the evaluators, but it's well worth it


Next part: IMDB Web application Index page: overview

Neo4j のサイト
ツールボックス