Design Guide

From the Neo4j Wiki

This page is a work in progress. Send any questions you may have to the mailing list and we'll make sure you get an answer!

So you have read and understood the Getting Started In One Minute Guide, completed the Getting Started Guide, and have a basic understanding of the Neo4j API, but still don't really know how to begin using Neo4j in your project? Then this is the guide for you. This document provides how-tos and design guidelines for a number of common scenarios that you will face in a typical project.



How to wrap nodes in POJOs

POJO is an acronym for Plain Old Java Object.

Basic structure

Let us consider an example. You are building a shiny new community website with e-commerce functionality, and this time you want to store all your customers in Neo4j. You have concluded that you need a Customer interface to represent your customers. So how do we implement this in Neo4j?

public interface Customer
{
    public void setFirstName( String firstName );

    public String getFirstName();

    // ...
}

A common pattern is to wrap a Neo4j node within a POJO that implements the domain interface. In our Customer example, this means:

import org.neo4j.graphdb.Node;

public class CustomerImpl implements Customer
{
    private final Node underlyingNode;
    
    private static final String KEY_FIRST_NAME = "firstName";

    public CustomerImpl( Node underlyingNode )
    {
        this.underlyingNode = underlyingNode;
    }

    public void setFirstName( String firstName )
    {
        underlyingNode.setProperty( KEY_FIRST_NAME, firstName );
    }

    public String getFirstName()
    {
        return ( String ) underlyingNode.getProperty( KEY_FIRST_NAME );
    }

    // ...
}

So we have a POJO called CustomerImpl that wraps a Neo4j node. CustomerImpl is a wrapper object that delegates all operations to the underlying node.

The code that creates the Customer object is responsible for first creating a new Node and supplying it in the constructor when creating the Customer object. You might prefer centralizing the creational code in a factory rather than spreading it around in the application. See the section below for more tips on how to do that.
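To try the wrapper pattern without a running database, here is a minimal sketch that replaces Neo4j's Node with a hypothetical in-memory MapNode class that holds only a property map. MapNode is an assumption made for illustration and is not part of the Neo4j API. As described above, CustomerImpl keeps no state of its own and delegates every call to the node.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.neo4j.graphdb.Node: just a property map
class MapNode
{
    private final Map<String, Object> properties = new HashMap<String, Object>();

    void setProperty( String key, Object value )
    {
        properties.put( key, value );
    }

    Object getProperty( String key )
    {
        if ( !properties.containsKey( key ) )
        {
            // Neo4j throws NotFoundException here; an unchecked stand-in is used instead
            throw new IllegalStateException( "No such property: " + key );
        }
        return properties.get( key );
    }
}

interface Customer
{
    void setFirstName( String firstName );

    String getFirstName();
}

// The wrapper holds only the node and delegates all reads and writes to it
class CustomerImpl implements Customer
{
    private static final String KEY_FIRST_NAME = "firstName";

    private final MapNode underlyingNode;

    CustomerImpl( MapNode underlyingNode )
    {
        this.underlyingNode = underlyingNode;
    }

    public void setFirstName( String firstName )
    {
        underlyingNode.setProperty( KEY_FIRST_NAME, firstName );
    }

    public String getFirstName()
    {
        return ( String ) underlyingNode.getProperty( KEY_FIRST_NAME );
    }
}
```

Because all data lives in the node, wrappers can be created and discarded freely.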

How to reach the enclosing POJO from the node instance

So you have a Node instance and you know what type it is supposed to be, but you don't know how to turn it into the matching POJO? If that seems hard, you may have been adding state to your POJOs. As long as the POJO holds no state other than the Node itself, all you have to do is:

    Customer customer = new CustomerImpl( node );
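The difference between a stateless wrapper and a stateful one can be sketched with a hypothetical in-memory MapNode (not the Neo4j API). StatelessCustomer keeps everything in the node, so a fresh wrapper sees it. StatefulCustomer keeps a nickname in a Java field, so that value is lost as soon as the node is wrapped again.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a Neo4j Node (property map only)
class MapNode
{
    private final Map<String, Object> properties = new HashMap<String, Object>();

    void setProperty( String key, Object value )
    {
        properties.put( key, value );
    }

    Object getProperty( String key, Object defaultValue )
    {
        return properties.containsKey( key ) ? properties.get( key ) : defaultValue;
    }
}

// Stateless: everything lives in the node, so any new wrapper sees it
class StatelessCustomer
{
    private final MapNode node;

    StatelessCustomer( MapNode node )
    {
        this.node = node;
    }

    void setFirstName( String firstName )
    {
        node.setProperty( "firstName", firstName );
    }

    String getFirstName()
    {
        return ( String ) node.getProperty( "firstName", null );
    }
}

// Anti-pattern: the nickname lives only in this Java object, not in the node
class StatefulCustomer
{
    private final MapNode node;
    private String nickname;

    StatefulCustomer( MapNode node )
    {
        this.node = node;
    }

    String getFirstName()
    {
        return ( String ) node.getProperty( "firstName", null );
    }

    void setNickname( String nickname )
    {
        this.nickname = nickname;
    }

    String getNickname()
    {
        return nickname;
    }
}
```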

Coding relationships between objects

Now consider the possibility that your customers buy things on your website. You want to model this by creating an Order object and connecting that object to the Customer.

public interface Order 
{
    public void setOrderNumber( long orderNumber );
	
    public long getOrderNumber();
    
    public Customer getCustomer();
}

The Order implementation would hold the underlying node and implement setOrderNumber()/getOrderNumber() the same way the Customer implementation handles setFirstName()/getFirstName(). But what about the getCustomer() method? How do we connect an order to a customer? By using relationships! We want to create a relationship as illustrated by the ASCII art below:

(Customer Node)---CUSTOMER_TO_ORDER--->(Order Node)     

In other words, the Customer node has a relationship to the node representing the Order, meaning that the Customer placed that Order.

To code this, we first need to define the CUSTOMER_TO_ORDER relationship type in our implementation of the RelationshipType interface (defining relationship types is covered in the Getting Started Guide and in the API documentation for GraphDatabaseService). The getCustomer() method can then follow that relationship:

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.Node;

public class OrderImpl implements Order
{
    private final Node underlyingNode;
    
    // ...

    public Customer getCustomer()
    {
        Node customerNode = underlyingNode.getSingleRelationship(
            RelationshipTypes.CUSTOMER_TO_ORDER, Direction.INCOMING ).getStartNode();
        return new CustomerImpl( customerNode );
    }
}

Every time getCustomer() is called on an OrderImpl, it retrieves the incoming relationship of type CUSTOMER_TO_ORDER and fetches the start node of that relationship. The start node of a CUSTOMER_TO_ORDER relationship always represents a customer, so we can simply pass that node into the CustomerImpl constructor.

What if our Order node has no such relationship? getSingleRelationship() will return null, and we'll get a NullPointerException! In this example, it is reasonable to assume that an Order has exactly one Customer (never zero, never two). If for some reason that is not the case, it represents a fatal error that should produce an unchecked exception. getSingleRelationship() is a convenience method designed for this common scenario, and by returning null it does the Right Thing™: the resulting NullPointerException is exactly that unchecked exception.

Actually, getSingleRelationship() behaves correctly "out of the box" in two scenarios:

  • The first is when a domain invariant states that a concept must at all times be associated with exactly ONE instance of another concept. This holds for our Customer/Order example: if we find ourselves without a Customer for an Order, we're in big trouble and should raise an unchecked exception (see item 40, Effective Java). In this scenario, we simply call getSingleRelationship() and assume that it returns a valid relationship. If it returns null, there is our unchecked exception.
  • The second is when there are either ZERO or ONE associations to another concept. Imagine that in our example an Order could optionally be put in a Cart. Sometimes it wouldn't be (ZERO relationships), sometimes it would (ONE relationship), but it would never belong to two or more Carts. In this scenario, we call getSingleRelationship() as above but check whether it returns null and handle that case appropriately.

In both scenarios, if getSingleRelationship() finds more than one relationship of the given type and direction, it raises an unchecked exception.

For more information, see the API documentation of getSingleRelationship().
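The two scenarios and the more-than-one case can be sketched against a hypothetical in-memory graph. GNode, GRel and the enums below are assumptions, not the Neo4j API. The getSingleRelationship method here mimics the contract described above: it returns null when nothing matches and throws an unchecked exception (IllegalStateException in this sketch) when more than one relationship matches. ORDER_IN_CART is a hypothetical relationship type for the optional Cart. Both methods return raw nodes to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

enum Direction { INCOMING, OUTGOING }

// CUSTOMER_TO_ORDER comes from the text; ORDER_IN_CART is hypothetical
enum RelTypes { CUSTOMER_TO_ORDER, ORDER_IN_CART }

// Hypothetical in-memory relationship: start --type--> end
final class GRel
{
    final GNode start;
    final GNode end;
    final RelTypes type;

    GRel( GNode start, GNode end, RelTypes type )
    {
        this.start = start;
        this.end = end;
        this.type = type;
    }
}

// Hypothetical in-memory node that records each relationship on both ends
final class GNode
{
    final List<GRel> relationships = new ArrayList<GRel>();

    GRel createRelationshipTo( GNode other, RelTypes type )
    {
        GRel rel = new GRel( this, other, type );
        relationships.add( rel );
        other.relationships.add( rel );
        return rel;
    }

    // null if nothing matches, unchecked exception if more than one matches
    GRel getSingleRelationship( RelTypes type, Direction direction )
    {
        GRel found = null;
        for ( GRel rel : relationships )
        {
            boolean sideMatches = direction == Direction.OUTGOING ? rel.start == this : rel.end == this;
            if ( rel.type != type || !sideMatches )
            {
                continue;
            }
            if ( found != null )
            {
                throw new IllegalStateException( "More than one " + type + " relationship" );
            }
            found = rel;
        }
        return found;
    }
}

final class OrderImpl
{
    private final GNode underlyingNode;

    OrderImpl( GNode underlyingNode )
    {
        this.underlyingNode = underlyingNode;
    }

    // Exactly ONE customer is an invariant: a missing one surfaces as a NullPointerException
    GNode getCustomerNode()
    {
        return underlyingNode.getSingleRelationship(
            RelTypes.CUSTOMER_TO_ORDER, Direction.INCOMING ).start;
    }

    // ZERO or ONE cart is legal, so the null case is handled explicitly
    GNode getCartNodeOrNull()
    {
        GRel rel = underlyingNode.getSingleRelationship( RelTypes.ORDER_IN_CART, Direction.OUTGOING );
        return rel == null ? null : rel.end;
    }
}
```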

Self-relationships

When designing your domain model, you might want a relationship whose start node and end node are the same. Unfortunately, Neo4j doesn't support that, so you'll need a small workaround. One solution is to create a middle node, as in this ASCII example:

   (Node)----[REL_TYPE]---
      ^                  |
      |                  v
      |            (middle-node)
      |                  |
      -------[REL_TYPE]--|

You can think of the middle node as a node that simply sits in the middle of the self-relationship. The downside is that your code will have to handle this as a special case.
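Here is a minimal sketch of the middle-node workaround. It uses a hypothetical in-memory node class rather than the Neo4j API. The createSelfRelationship method builds the two-hop loop, and hasSelfRelationship shows the special-case code needed to read it back.

```java
import java.util.ArrayList;
import java.util.List;

enum RelTypes { REL_TYPE }

// Hypothetical in-memory relationship: start --type--> end
final class GRel
{
    final GNode start;
    final GNode end;
    final RelTypes type;

    GRel( GNode start, GNode end, RelTypes type )
    {
        this.start = start;
        this.end = end;
        this.type = type;
    }
}

// Hypothetical in-memory node that only tracks outgoing relationships
final class GNode
{
    final List<GRel> outgoing = new ArrayList<GRel>();

    GRel createRelationshipTo( GNode other, RelTypes type )
    {
        GRel rel = new GRel( this, other, type );
        outgoing.add( rel );
        return rel;
    }
}

final class SelfRelationships
{
    private SelfRelationships()
    {
    }

    // Models node --type--> middle --type--> node instead of a direct loop
    static GNode createSelfRelationship( GNode node, RelTypes type )
    {
        GNode middle = new GNode();
        node.createRelationshipTo( middle, type );
        middle.createRelationshipTo( node, type );
        return middle;
    }

    // The special case: a "self-relationship" is two hops that lead back to the node
    static boolean hasSelfRelationship( GNode node, RelTypes type )
    {
        for ( GRel first : node.outgoing )
        {
            if ( first.type != type )
            {
                continue;
            }
            for ( GRel second : first.end.outgoing )
            {
                if ( second.type == type && second.end == node )
                {
                    return true;
                }
            }
        }
        return false;
    }
}
```

This check also matches an ordinary two-node cycle A -> B -> A of the same type. A real implementation would probably mark middle nodes, for example with a property, to tell the two cases apart.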

Adding a relationship between wrapped nodes

A common need is to connect two objects. For example, we would need to connect an Order object to a Customer object.

Example:

To add an Order to a Customer object, we define the following method in the Customer interface:

    public void addOrder( Order order );

As we have described above, we have a relationship type CUSTOMER_TO_ORDER that is used to relate a Customer node to an Order node. So our implementation should then look something like:

    public void addOrder( Order order )
    {
        underlyingNode.createRelationshipTo( orderNode, RelationshipTypes.CUSTOMER_TO_ORDER );
    }

But wait, we have a problem here: We actually do not have access to the orderNode since we have the wrapped Order object, not the node.

This problem will come up so often that we need to make the underlying node accessible to the other wrapper classes.

Solution: as long as you place all Neo4j wrapper classes in the same package, you can define the following package-private method in each of them:

    Node getUnderlyingNode()
    {
        return underlyingNode;
    }

Because the method has default (package-private) access, the underlying node is available to every class in the same package, but not to classes in other packages, including subclasses. With this in mind, we can now rewrite our addOrder method so that it works:

    public void addOrder( Order order )
    {
        Node orderNode = ( ( OrderImpl ) order ).getUnderlyingNode();
        getUnderlyingNode().createRelationshipTo( orderNode, RelationshipTypes.CUSTOMER_TO_ORDER );
    }
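You can try out the package-private accessor and the cast with a hypothetical in-memory node class standing in for Neo4j's Node. GNode and GRel below are assumptions, not the Neo4j API. The sketch also shows the cost of the cast: passing an Order implementation other than OrderImpl fails with a ClassCastException.

```java
import java.util.ArrayList;
import java.util.List;

enum RelationshipTypes { CUSTOMER_TO_ORDER }

// Hypothetical in-memory relationship: start --type--> end
final class GRel
{
    final GNode start;
    final GNode end;
    final RelationshipTypes type;

    GRel( GNode start, GNode end, RelationshipTypes type )
    {
        this.start = start;
        this.end = end;
        this.type = type;
    }
}

// Hypothetical in-memory node that only tracks outgoing relationships
final class GNode
{
    final List<GRel> outgoing = new ArrayList<GRel>();

    GRel createRelationshipTo( GNode other, RelationshipTypes type )
    {
        GRel rel = new GRel( this, other, type );
        outgoing.add( rel );
        return rel;
    }
}

interface Order
{
}

interface Customer
{
    void addOrder( Order order );
}

class OrderImpl implements Order
{
    private final GNode underlyingNode;

    OrderImpl( GNode underlyingNode )
    {
        this.underlyingNode = underlyingNode;
    }

    // Package-private: other wrappers in this package can reach the node
    GNode getUnderlyingNode()
    {
        return underlyingNode;
    }
}

class CustomerImpl implements Customer
{
    private final GNode underlyingNode;

    CustomerImpl( GNode underlyingNode )
    {
        this.underlyingNode = underlyingNode;
    }

    GNode getUnderlyingNode()
    {
        return underlyingNode;
    }

    public void addOrder( Order order )
    {
        // Unwrap the order; an Order that is not an OrderImpl fails with ClassCastException
        GNode orderNode = ( ( OrderImpl ) order ).getUnderlyingNode();
        getUnderlyingNode().createRelationshipTo( orderNode, RelationshipTypes.CUSTOMER_TO_ORDER );
    }
}
```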

Going from a collection of nodes to a collection of wrappers

Going from a node to its enclosing wrapper POJO is trivial with a properly designed POJO domain layer, as shown elsewhere in this section. Going from a collection of nodes to a collection of their wrapper POJOs is equally easy but necessarily requires a bit more code: iterate through the collection, create a wrapper for every node, and put it into a new collection.

For example, if we wanted to provide a method that returns all orders from a specific customer, it could look something like this:

    public Collection<Order> getOrders()
    {
        Collection<Node> orderNodes = getOrderNodes();
        return wrapNodeCollection( orderNodes );
    }

    private Collection<Node> getOrderNodes()
    {
        Traverser traverser = underlyingNode.traverse(
            Traverser.Order.BREADTH_FIRST, StopEvaluator.DEPTH_ONE,
            ReturnableEvaluator.ALL_BUT_START_NODE,
            RelationshipTypes.CUSTOMER_TO_ORDER, Direction.OUTGOING );
        return traverser.getAllNodes();
    }

    private Collection<Order> wrapNodeCollection( Collection<Node> orderNodes )
    {
        Collection<Order> orderCollection = new LinkedList<Order>();
        for ( Node node : orderNodes )
        {
            orderCollection.add( new OrderImpl( node ) );
        }
        return orderCollection;
    }
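The wrapNodeCollection idea generalizes to any node and wrapper type. The sketch below assumes Java 8 and uses a hypothetical helper named Wrapping.wrapAll, which is not part of Neo4j. It takes the wrapping step as a Function, so you could pass in a constructor reference such as OrderImpl::new.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Function;

final class Wrapping
{
    private Wrapping()
    {
    }

    // Eagerly wraps every node; all wrappers are held in memory at the same time
    static <N, W> List<W> wrapAll( Collection<? extends N> nodes, Function<? super N, ? extends W> wrapper )
    {
        List<W> result = new ArrayList<W>( nodes.size() );
        for ( N node : nodes )
        {
            result.add( wrapper.apply( node ) );
        }
        return result;
    }
}
```

An ArrayList sized to the input avoids the per-element allocations of a LinkedList. Either way, every wrapper is held in memory at once.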

However, if you're dealing with large numbers of nodes (millions), it's more efficient to use the iterator idiom. With this idiom, wrappers are created one at a time as the caller consumes them, instead of all being held in memory at once. Neo4j also uses the iterator idiom internally, which makes this approach even more efficient. The idea is to create an Iterator that wraps the traverser (which is a Java Iterable) representing our collection. Rewritten with this in mind, getOrders() looks like this:

    public Iterator<Order> getOrders()
    {
        return new Iterator<Order>()
        {
            private final Iterator<Node> iterator = underlyingNode.traverse(
                Traverser.Order.BREADTH_FIRST,
                StopEvaluator.DEPTH_ONE,
                ReturnableEvaluator.ALL_BUT_START_NODE,
                RelationshipTypes.CUSTOMER_TO_ORDER,
                Direction.OUTGOING ).iterator();

            public boolean hasNext()
            {
                return iterator.hasNext();
            }

            public Order next()
            {
                Node nextNode = iterator.next();
                return new OrderImpl( nextNode );
            }

            public void remove()
            {
                iterator.remove();
            }
        };
    }

Does this seem like an ample opportunity for a utility that creates and manages Neo4j-backed POJOs and their collection views of each other? You're right. It is. But more about that later.
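As a first step towards such a utility, the anonymous iterator in getOrders() can be generalized into a reusable class. MappingIterator is a hypothetical name, and the sketch assumes Java 8's java.util.function.Function. The class wraps any source iterator (for example, a traverser's iterator) and creates each wrapper only when next() is called.

```java
import java.util.Iterator;
import java.util.function.Function;

// Lazily converts each element of a source iterator into a wrapper object
final class MappingIterator<F, T> implements Iterator<T>
{
    private final Iterator<? extends F> source;
    private final Function<? super F, ? extends T> mapper;

    MappingIterator( Iterator<? extends F> source, Function<? super F, ? extends T> mapper )
    {
        this.source = source;
        this.mapper = mapper;
    }

    @Override
    public boolean hasNext()
    {
        return source.hasNext();
    }

    @Override
    public T next()
    {
        // The wrapper is created only when the caller asks for the next element
        return mapper.apply( source.next() );
    }

    @Override
    public void remove()
    {
        source.remove();
    }
}
```

With this class, getOrders() could become return new MappingIterator<Node, Order>( traverser.iterator(), OrderImpl::new );, assuming OrderImpl has the single-argument constructor shown earlier.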

Checking for equality between two wrapper POJOs

Maybe you have an Order instance and wonder whether it is equal to some other instance. Just override the equals() method so that it forwards the comparison to the underlying node. Override hashCode() the same way so that equal wrappers also have equal hash codes:

    @Override
    public boolean equals( Object obj )
    {
        if ( obj instanceof OrderImpl )
        {
            return getUnderlyingNode().equals( 
                ( ( OrderImpl ) obj ).getUnderlyingNode() );
        }
        return false;
    }

    @Override
    public int hashCode()
    {
        return getUnderlyingNode().hashCode();
    }
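The reason to override hashCode() along with equals() shows up in hash-based collections. In this sketch, a hypothetical empty ToyNode class stands in for Neo4j's Node, and plain object identity plays the role of node equality. Two wrappers around the same node collapse into a single entry in a HashSet.

```java
// Hypothetical stand-in for a Neo4j Node; object identity plays the role of node equality
final class ToyNode
{
}

final class OrderImpl
{
    private final ToyNode underlyingNode;

    OrderImpl( ToyNode underlyingNode )
    {
        this.underlyingNode = underlyingNode;
    }

    ToyNode getUnderlyingNode()
    {
        return underlyingNode;
    }

    @Override
    public boolean equals( Object obj )
    {
        if ( obj instanceof OrderImpl )
        {
            return getUnderlyingNode().equals( ( ( OrderImpl ) obj ).getUnderlyingNode() );
        }
        return false;
    }

    @Override
    public int hashCode()
    {
        // Consistent with equals: wrappers of the same node share a hash code
        return getUnderlyingNode().hashCode();
    }
}
```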

Organizing your graph

Of course, there are probably many valid ways to structure your nodes for a given data model, and the good news with Neo4j is that the structure can easily evolve to meet new demands and requirements. Here we present a basic structure that we believe is a convenient way to represent a traditional data model. We also have a Neo_Meta_Model in the pipeline, planned for Neo4j 1.1, that uses OWL to define your domain model.

Subreferences

Our data structure has two major types of data: Customers and Orders. The question is how to add them to the graph. Neo4j starts out with a reference node, which you can think of as a known entry point into the graph. One idea is to attach all Customer and Order nodes directly to the reference node. However, that clutters the reference node with more and more relationships as the application grows.

Instead, we propose creating a subreference node and attaching it to the reference (start) node. The subreference node will be the connection point for all nodes of the same type.

Let's take our example with Customers. Instead of connecting the Customer nodes directly to the start node, we create a new node that will be our subreference node for customers.

The graph would then be:

(Start Node)---CUSTOMERS--->(Customer Subref Node)---CUSTOMER--->(Customer Node 1)
                                                  |
                                                  ---CUSTOMER--->(Customer Node 2)     

The benefits of this approach are:

  • Easier to follow and understand the graph.
  • The subreference node can be used to gather global data (about customers and orders).

We will see more of how the subreference node is wrapped in the section about creating a Neo4j-independent API.

How to create a Neo4j-independent API

It is generally a very good idea to hide implementation details behind component boundaries so that components are not tightly coupled to each other. For example, in most cases there is no point in letting the business-layer implementation depend on the inner workings of the data layer. A lot has been written about the importance of structuring your code to minimize coupling, and if you feel uncertain in this area, we recommend spending some time reading up on the topic.

In this section, we want to highlight and show some traditional ways of doing this.

Use Interfaces

As we have already seen, our beans need to be somewhat dependent on Neo4j; for example, the constructor takes a Neo4j Node object as a parameter. It is therefore a good design principle to extract all public methods into an interface and use that interface in all calls.

Make use of Factories

To avoid spreading object creation throughout your code, it is a good idea to gather it in a few factory classes. For example, to create Order objects that wrap a Neo4j Node, we could create a factory class called OrderFactory that is responsible for creating Orders.

Define an interface for the Factory

Let's define an interface called OrderFactory.

public interface OrderFactory 
{
    public Order createOrder();	
}

The Factory can wrap a Subreference Node

As we saw earlier we can group nodes of the same type under something we called a subreference node. This node can hold common information regarding the subnodes, for example an id counter for generating unique ids. We will see how this can be done further down.

So if we have structured our nodes under a subreference node, we could then actually wrap the subreference node inside the factory implementation for that type of object.

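import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;

public class OrderFactoryImpl implements OrderFactory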
{
    private final GraphDatabaseService graphDb;

    private final Node orderFactoryNode;

    public OrderFactoryImpl( GraphDatabaseService graphDb )
    {
        this.graphDb = graphDb;

        Relationship rel = graphDb.getReferenceNode().getSingleRelationship(
            MyRelationshipTypes.ORDERS, Direction.OUTGOING );

        if ( rel == null )
        {
            orderFactoryNode = graphDb.createNode();
            graphDb.getReferenceNode().createRelationshipTo( orderFactoryNode,
                MyRelationshipTypes.ORDERS );

        }
        else
        {
            orderFactoryNode = rel.getEndNode();
        }
    }

    public Order createOrder()
    {
        Node node = graphDb.createNode();
        orderFactoryNode.createRelationshipTo( node,
            MyRelationshipTypes.ORDER );
        return new OrderImpl( node );
    }
}

The constructor sets up the subreference node if it does not already exist. In our case, it creates a node that is related to the start node (the global reference node) through an ORDERS relationship.

The createOrder method creates a new node and connects it to the subreference node using the relationship type ORDER. This gives us the following graph:

(Start Node)---ORDERS--->(Orders Subref Node)---ORDER--->(Order Node)
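You can exercise the constructor's get-or-create logic with a hypothetical in-memory graph. ToyGraph, GNode and GRel are assumptions standing in for GraphDatabaseService, Node and Relationship; they are not the Neo4j API. To stay short, the sketch returns raw nodes from createOrderNode instead of OrderImpl wrappers.

```java
import java.util.ArrayList;
import java.util.List;

enum MyRelationshipTypes { ORDERS, ORDER }

// Hypothetical in-memory relationship: start --type--> end
final class GRel
{
    final GNode start;
    final GNode end;
    final MyRelationshipTypes type;

    GRel( GNode start, GNode end, MyRelationshipTypes type )
    {
        this.start = start;
        this.end = end;
        this.type = type;
    }
}

// Hypothetical in-memory node that only tracks outgoing relationships
final class GNode
{
    final List<GRel> outgoing = new ArrayList<GRel>();

    GRel createRelationshipTo( GNode other, MyRelationshipTypes type )
    {
        GRel rel = new GRel( this, other, type );
        outgoing.add( rel );
        return rel;
    }

    // First outgoing relationship of this type, or null if there is none
    GRel getFirstOutgoing( MyRelationshipTypes type )
    {
        for ( GRel rel : outgoing )
        {
            if ( rel.type == type )
            {
                return rel;
            }
        }
        return null;
    }
}

// Hypothetical stand-in for GraphDatabaseService: a reference node plus node creation
final class ToyGraph
{
    private final GNode referenceNode = new GNode();

    GNode getReferenceNode()
    {
        return referenceNode;
    }

    GNode createNode()
    {
        return new GNode();
    }
}

final class OrderFactoryImpl
{
    private final ToyGraph graphDb;
    private final GNode orderFactoryNode;

    OrderFactoryImpl( ToyGraph graphDb )
    {
        this.graphDb = graphDb;
        GRel rel = graphDb.getReferenceNode().getFirstOutgoing( MyRelationshipTypes.ORDERS );
        if ( rel == null )
        {
            // First use: create the subreference node below the reference node
            orderFactoryNode = graphDb.createNode();
            graphDb.getReferenceNode().createRelationshipTo( orderFactoryNode, MyRelationshipTypes.ORDERS );
        }
        else
        {
            orderFactoryNode = rel.end; // reuse the existing subreference node
        }
    }

    GNode getSubreferenceNode()
    {
        return orderFactoryNode;
    }

    // Returns the raw node; the factory in the text wraps it in an OrderImpl
    GNode createOrderNode()
    {
        GNode node = graphDb.createNode();
        orderFactoryNode.createRelationshipTo( node, MyRelationshipTypes.ORDER );
        return node;
    }
}
```

A second factory built over the same graph finds the existing ORDERS relationship and reuses the subreference node instead of creating another one.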

A simple Id Generator

It's common to give data objects internal ids. In a relational database, these are often generated by the database itself. Neo4j also assigns ids, but since everything is stored as a node, the id is unique across all nodes rather than per type, and the id of a deleted node may be reused later. Often, however, we want a sequential counter for specific types of nodes, such as Customers or Orders.

To create an automated sequential id for orders we could extend the OrderFactoryImpl with:

    private static final String KEY_COUNTER = "counter";

    private synchronized long getNextId()
    {
        // Read the current counter; a missing property means no id has been handed out yet
        long counter = ( Long ) orderFactoryNode.getProperty( KEY_COUNTER, 0L );

        // Store the incremented value and return the id that was just reserved
        orderFactoryNode.setProperty( KEY_COUNTER, counter + 1 );
        return counter;
    }

The constant KEY_COUNTER holds the name of the counter property stored on the OrderFactory's subreference node. The getNextId method returns the next id that has never been used for an order. The method is synchronized so that concurrent callers never receive the same id. It is also private, so it is only used by the createOrder method, which we now rewrite as:

    public Order createOrder()
    {
        Node node = graphDb.createNode();
        orderFactoryNode.createRelationshipTo( node,
            MyRelationshipTypes.ORDER );

        Order order = new OrderImpl( node );
        order.setId( getNextId() );

        return order;
    }

Right after creating the Order object, we fetch the next available id and set it on the new order. This assumes that the Order interface has a setId(long) method.
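The counter logic can be tested in isolation with a hypothetical PropertyNode class standing in for the subreference node; it is not the Neo4j API. OrderIdGenerator is likewise a hypothetical name for the part of OrderFactoryImpl that hands out ids.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the OrderFactory subreference node: a property map only
final class PropertyNode
{
    private final Map<String, Object> properties = new HashMap<String, Object>();

    synchronized Object getProperty( String key, Object defaultValue )
    {
        return properties.containsKey( key ) ? properties.get( key ) : defaultValue;
    }

    synchronized void setProperty( String key, Object value )
    {
        properties.put( key, value );
    }
}

// Hypothetical extraction of the id logic from OrderFactoryImpl
final class OrderIdGenerator
{
    private static final String KEY_COUNTER = "counter";

    private final PropertyNode orderFactoryNode;

    OrderIdGenerator( PropertyNode orderFactoryNode )
    {
        this.orderFactoryNode = orderFactoryNode;
    }

    // Read, increment and write as one step so two callers of this instance never get the same id
    synchronized long getNextId()
    {
        long counter = ( Long ) orderFactoryNode.getProperty( KEY_COUNTER, 0L );
        orderFactoryNode.setProperty( KEY_COUNTER, counter + 1 );
        return counter;
    }
}
```

The synchronized keyword only guards a single generator instance inside one JVM, so all callers should share the same factory. Two generators over the same node could still interleave their reads and writes.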

Summary

This is one way of making your code more structured and more resilient to changes in the inner workings. If you go with this approach, your business logic should use only the general interfaces for the data objects and the factories whenever it needs access to data.

Search

Searching by relations

Traversing nodes is where Neo4j shines, so we should use traversal as much as possible. Many searches are for a certain category or characteristic of an object. Basically, if an object has a property with a limited set of possible values, consider turning that property into a graph of its own and relating your object nodes to the appropriate value node.

For example: "Find all customers from Sweden."

Consider creating a Countries node to which you attach Country nodes.

(Start Node)--COUNTRIES-->(Countries Node)---COUNTRY-->(Sweden Node)
                                       |
                                       ------COUNTRY-->(Denmark Node)     

Now when you set the country for a customer, you set it to the country node:

(Customer Node)--LIVES_IN-->(Sweden Node)

This can be implemented the same way we did with customers and orders. Once LIVES_IN relationships exist between customers and countries, getting all customers from a country is easy:

    public Iterable<Customer> getCustomers()
    {
        // Only incoming LIVES_IN relationships: (Customer)--LIVES_IN-->(Country)
        Iterable<Relationship> rels = countryNode.getRelationships(
            MyRelationshipTypes.LIVES_IN, Direction.INCOMING );

        List<Customer> custs = new ArrayList<Customer>();

        for ( Relationship rel : rels )
        {
            Node customerNode = rel.getStartNode();
            Customer cust = new CustomerImpl( customerNode );
            custs.add(cust);
        }

        return custs;
    }

This method retrieves all relationships "LIVES_IN" where the current country is the end node. We then retrieve the start node and wrap it in a CustomerImpl object and add it to the returned list of customers.
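You can sketch the same lookup without a database using a hypothetical in-memory graph (GNode and GRel are assumptions, not Neo4j classes). CountryImpl is a hypothetical wrapper for a country node. Its getCustomers method keeps only the LIVES_IN relationships that end at the country and wraps their start nodes.

```java
import java.util.ArrayList;
import java.util.List;

enum MyRelationshipTypes { LIVES_IN }

// Hypothetical in-memory relationship: start --type--> end
final class GRel
{
    final GNode start;
    final GNode end;
    final MyRelationshipTypes type;

    GRel( GNode start, GNode end, MyRelationshipTypes type )
    {
        this.start = start;
        this.end = end;
        this.type = type;
    }
}

// Hypothetical in-memory node that records each relationship on both ends
final class GNode
{
    final String name;
    final List<GRel> relationships = new ArrayList<GRel>();

    GNode( String name )
    {
        this.name = name;
    }

    void createRelationshipTo( GNode other, MyRelationshipTypes type )
    {
        GRel rel = new GRel( this, other, type );
        relationships.add( rel );
        other.relationships.add( rel );
    }
}

final class CustomerImpl
{
    private final GNode underlyingNode;

    CustomerImpl( GNode underlyingNode )
    {
        this.underlyingNode = underlyingNode;
    }

    String getName()
    {
        return underlyingNode.name;
    }
}

final class CountryImpl
{
    private final GNode countryNode;

    CountryImpl( GNode countryNode )
    {
        this.countryNode = countryNode;
    }

    // Customers are the start nodes of LIVES_IN relationships that end at this country
    List<CustomerImpl> getCustomers()
    {
        List<CustomerImpl> customers = new ArrayList<CustomerImpl>();
        for ( GRel rel : countryNode.relationships )
        {
            if ( rel.type == MyRelationshipTypes.LIVES_IN && rel.end == countryNode )
            {
                customers.add( new CustomerImpl( rel.start ) );
            }
        }
        return customers;
    }
}
```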

Searching using traversing

The Traverser API is a very powerful tool that can be used to mine and modify data in the graph. For example to retrieve all customer nodes from the country Sweden we could create a simple traverser:

Traverser trav = swedenNode.traverse( Traverser.Order.DEPTH_FIRST, StopEvaluator.DEPTH_ONE,
    ReturnableEvaluator.ALL_BUT_START_NODE, MyRelationshipTypes.LIVES_IN, Direction.INCOMING );
// iterate over traverser...

Or, if we wanted all orders placed by customers from Sweden, we just add one more relationship type and adjust the evaluators:

Traverser trav = swedenNode.traverse( Traverser.Order.DEPTH_FIRST, StopEvaluator.END_OF_GRAPH,
    new ReturnableEvaluator()
    {
        public boolean isReturnableNode( TraversalPosition pos )
        {
            // Return only nodes reached over a CUSTOMER_TO_ORDER relationship, i.e. orders
            return !pos.isStartNode()
                && pos.lastRelationshipTraversed().isType( MyRelationshipTypes.CUSTOMER_TO_ORDER );
        }
    },
    MyRelationshipTypes.LIVES_IN, Direction.INCOMING,
    MyRelationshipTypes.CUSTOMER_TO_ORDER, Direction.OUTGOING );
// iterate over traverser...

Note that relationships are traversed equally fast regardless of their direction.
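To show what the second traverser does, here is a hand-written breadth-first walk over a hypothetical in-memory graph. GNode, GRel and ToyTraversal are assumptions, not the Neo4j Traverser API. The walk follows only the listed relationship types in the listed directions and returns the nodes reached over a CUSTOMER_TO_ORDER relationship, mirroring the custom ReturnableEvaluator. It walks breadth-first rather than depth-first, but the set of orders returned is the same. For simplicity, it allows only one direction per relationship type.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

enum Direction { INCOMING, OUTGOING }

enum MyRelationshipTypes { LIVES_IN, CUSTOMER_TO_ORDER }

// Hypothetical in-memory relationship: start --type--> end
final class GRel
{
    final GNode start;
    final GNode end;
    final MyRelationshipTypes type;

    GRel( GNode start, GNode end, MyRelationshipTypes type )
    {
        this.start = start;
        this.end = end;
        this.type = type;
    }
}

// Hypothetical in-memory node that records each relationship on both ends
final class GNode
{
    final String name;
    final List<GRel> relationships = new ArrayList<GRel>();

    GNode( String name )
    {
        this.name = name;
    }

    void createRelationshipTo( GNode other, MyRelationshipTypes type )
    {
        GRel rel = new GRel( this, other, type );
        relationships.add( rel );
        other.relationships.add( rel );
    }
}

final class ToyTraversal
{
    private ToyTraversal()
    {
    }

    // Breadth-first walk that follows only the allowed (type, direction) pairs and
    // returns every node first reached over a relationship of returnType
    static List<GNode> traverse( GNode start, Map<MyRelationshipTypes, Direction> allowed,
        MyRelationshipTypes returnType )
    {
        List<GNode> result = new ArrayList<GNode>();
        Set<GNode> visited = new HashSet<GNode>();
        Deque<GNode> queue = new ArrayDeque<GNode>();
        visited.add( start );
        queue.add( start );
        while ( !queue.isEmpty() )
        {
            GNode current = queue.poll();
            for ( GRel rel : current.relationships )
            {
                Direction direction = allowed.get( rel.type );
                if ( direction == null )
                {
                    continue; // relationship type not part of this traversal
                }
                boolean outgoing = rel.start == current;
                if ( outgoing != ( direction == Direction.OUTGOING ) )
                {
                    continue; // wrong direction relative to the current node
                }
                GNode next = outgoing ? rel.end : rel.start;
                if ( visited.add( next ) )
                {
                    queue.add( next );
                    if ( rel.type == returnType )
                    {
                        result.add( next );
                    }
                }
            }
        }
        return result;
    }
}
```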

Searching by index

Sometimes you need to perform a lookup to find an entry point into the graph. For example, an administrator may enter an order id and expect details about the order with that id in return. If we have a lot of Orders, traversing them to find the right one may be inefficient, so we would like an index on the order id instead. This is an exact-match lookup on the order id property and can be achieved with various index utilities. Neo4j has an index service with Lucene as its backend; to use it, include this in your pom.xml:

    <dependency>
        <groupId>org.neo4j</groupId>
        <artifactId>neo4j-index</artifactId>
        <version>1.0-b1</version>
    </dependency>

You will now have access to org.neo4j.index.lucene.LuceneIndexService (see Component APIs).

(Start Node)---ORDERS--->(Order Subref Node)---ORDER--->(Order Node 1)

Our OrderFactory implementation will then instantiate (or rather, be given) a LuceneIndexService. The factory uses it to index the order id in the createOrder method, and a new getOrderById method uses the same index to retrieve the order:

    private IndexService index; // ...

    public Order createOrder()
    {
        Node node = graphDb.createNode();
        orderFactoryNode.createRelationshipTo( node,
            MyRelationshipTypes.ORDER );

        long orderId = getNextId();

        // add index
        index.index( node, "order", orderId );

        Order order = new OrderImpl( node );
        order.setId( orderId );

        return order;
    }

    public Order getOrderById( long orderId )
    {
        // Use the index to look up the order node by its id
        Node orderNode = index.getSingleNode( "order", orderId );
        if ( orderNode != null )
        {
            return new OrderImpl( orderNode );
        }
        // No order with that id: return null here, or throw if that should never happen
        return null;
    }
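The index idea can be sketched with a plain HashMap standing in for LuceneIndexService. ToyIndex, OrderNode and ToyOrderFactory are hypothetical names. Unlike the real service, this map is neither persistent nor transactional. The sketch mirrors the index(node, key, value) and getSingleNode(key, value) calls used above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an order node; the real code uses org.neo4j.graphdb.Node
final class OrderNode
{
    final long orderId;

    OrderNode( long orderId )
    {
        this.orderId = orderId;
    }
}

// Hypothetical in-memory exact-match index: (key, value) -> node
final class ToyIndex
{
    private final Map<String, Map<Object, OrderNode>> entries = new HashMap<String, Map<Object, OrderNode>>();

    void index( OrderNode node, String key, Object value )
    {
        entries.computeIfAbsent( key, k -> new HashMap<Object, OrderNode>() ).put( value, node );
    }

    // Returns the node stored under (key, value), or null if there is none
    OrderNode getSingleNode( String key, Object value )
    {
        Map<Object, OrderNode> byValue = entries.get( key );
        return byValue == null ? null : byValue.get( value );
    }
}

final class ToyOrderFactory
{
    private final ToyIndex index = new ToyIndex();
    private long counter = 0;

    synchronized OrderNode createOrder()
    {
        long orderId = counter++;
        OrderNode node = new OrderNode( orderId );
        index.index( node, "order", orderId ); // the value is boxed to a Long
        return node;
    }

    // Takes a long so that the boxed lookup key is a Long, matching what was indexed
    OrderNode getOrderById( long orderId )
    {
        return index.getSingleNode( "order", orderId );
    }
}
```

In this HashMap version, the lookup key must have the same boxed type as the indexed value: an Integer 5 does not match a Long 5. That is one reason getOrderById here takes a long.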

Searching by wildcard

To perform wildcard searches, such as "give me all customers whose first name starts with A*", take a look at the Lucene full-text index service.
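To illustrate what a prefix query such as A* has to do, here is a sketch that swaps in a sorted in-memory map instead of Lucene. PrefixIndex is a hypothetical class, and this is not how the Lucene full-text index works internally. Because a TreeMap keeps its keys sorted, all names starting with a given prefix form one contiguous run that can be read with tailMap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical prefix index: first name -> ids of customers with that name
final class PrefixIndex
{
    private final TreeMap<String, List<Long>> byName = new TreeMap<String, List<Long>>();

    void add( String firstName, long customerId )
    {
        List<Long> ids = byName.get( firstName );
        if ( ids == null )
        {
            ids = new ArrayList<Long>();
            byName.put( firstName, ids );
        }
        ids.add( customerId );
    }

    // Equivalent of the query "prefix*": matching keys are contiguous in sorted order
    List<Long> findByPrefix( String prefix )
    {
        List<Long> result = new ArrayList<Long>();
        for ( Map.Entry<String, List<Long>> entry : byName.tailMap( prefix, true ).entrySet() )
        {
            if ( !entry.getKey().startsWith( prefix ) )
            {
                break; // past the last key with this prefix
            }
            result.addAll( entry.getValue() );
        }
        return result;
    }
}
```

The comparison is case-sensitive; a real search would normally normalize case first.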

Transaction handling

All operations that work with the graph (even read operations) must be wrapped in a transaction. Read more about transaction handling in Neo4j.
