Common mistakes with Neo

From the Neo4j Wiki

Common mistakes

Transaction usage

Always use a try-finally block for transactions. During a transaction, Neo4j may take locks on the nodes and relationships that are modified. Write locks are held until the transaction commits or rolls back, so failing to close a transaction properly means those locks are never released. Read more about locks and isolation in transactions.
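The control flow can be sketched as follows. To keep the example runnable without a database, SketchTx is a hypothetical stand-in that only records what happens. Its success() and finish() methods are modeled on the calls of the same name on the embedded Java API's Transaction (an assumption about the API version in use); real code would obtain the transaction from the database's beginTx() instead. Treat it as an illustration of the pattern, not as the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a transaction; NOT the Neo4j class.
// It records calls so the control flow can be observed without a database.
class SketchTx
{
    final List<String> log = new ArrayList<String>();
    private boolean markedSuccessful = false;

    void success()
    {
        markedSuccessful = true;
    }

    // Commits if success() was called, otherwise rolls back.
    // Either way this is the point where locks are released.
    void finish()
    {
        log.add( markedSuccessful ? "commit" : "rollback" );
        log.add( "locks released" );
    }
}

class TransactionSketch
{
    // Runs 'work' inside the given transaction and always finishes it.
    static void runInTx( SketchTx tx, Runnable work )
    {
        try
        {
            work.run();
            tx.success();   // reached only if the work did not throw
        }
        finally
        {
            tx.finish();    // always runs, so locks cannot leak
        }
    }

    public static void main( String[] args )
    {
        SketchTx ok = new SketchTx();
        runInTx( ok, () -> { } );
        System.out.println( ok.log );

        SketchTx failed = new SketchTx();
        try
        {
            runInTx( failed, () -> { throw new IllegalStateException( "boom" ); } );
        }
        catch ( IllegalStateException e )
        {
            System.out.println( failed.log );
        }
    }
}
```

The exception still propagates to the caller in the failing case; the finally block only guarantees that the transaction is closed first.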

Neo4j does not support nested transactions. Trying to open a new transaction while one is already present results in a placebo transaction that forwards all work performed to the real parent transaction.
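A toy model may make the placebo behavior easier to picture. Everything in it (Manager, Tx, write, commits) is a hypothetical name invented for this sketch, and it deliberately leaves out rollback and success marking. It only shows the point made above: work done through the inner handle ends up in the one real transaction, and nothing is committed until the outer transaction finishes.

```java
import java.util.ArrayList;
import java.util.List;

class PlaceboSketch
{
    // Toy transaction manager: at most one real transaction at a time.
    static class Manager
    {
        final List<String> committed = new ArrayList<String>();
        private List<String> pending = null;    // non-null while a real transaction is open
        int commits = 0;

        Tx beginTx()
        {
            if ( pending != null )
            {
                return new Tx( this, true );    // already in a transaction: hand out a placebo
            }
            pending = new ArrayList<String>();
            return new Tx( this, false );
        }
    }

    static class Tx
    {
        private final Manager manager;
        final boolean placebo;

        Tx( Manager manager, boolean placebo )
        {
            this.manager = manager;
            this.placebo = placebo;
        }

        // All work lands in the single real transaction, placebo or not.
        void write( String change )
        {
            manager.pending.add( change );
        }

        void finish()
        {
            if ( placebo )
            {
                return;     // the outer transaction decides when to commit
            }
            manager.committed.addAll( manager.pending );
            manager.pending = null;
            manager.commits++;
        }
    }
}
```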

Finally, don't make transactions too granular, since finishing a transaction has a lot of overhead. Creating 10k nodes and opening a new transaction for each one can take a long time (~5 s, depending on the hard disk). However, don't let transactions get too big either, since that can consume a lot of memory. Creating nodes and committing after every 10k node creates yields much better performance (~500k node creates in 5 s).
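The batching idea can be sketched like this. CountingDb is a hypothetical stand-in that counts operations instead of touching a disk, so the example shows only how the number of commits changes with the batch size, not the timings quoted above. In real code each batch would be its own transaction, closed in a finally block.

```java
class BatchCommitSketch
{
    // Hypothetical stand-in for the database; counts operations instead of doing I/O.
    static class CountingDb
    {
        int nodesCreated = 0;
        int commits = 0;

        void createNode()
        {
            nodesCreated++;
        }

        void commit()
        {
            commits++;
        }
    }

    // Creates 'total' nodes, committing after every 'batchSize' creates.
    static void createNodes( CountingDb db, int total, int batchSize )
    {
        if ( batchSize < 1 )
        {
            throw new IllegalArgumentException( "batchSize must be at least 1" );
        }
        int inCurrentBatch = 0;
        for ( int i = 0; i < total; i++ )
        {
            db.createNode();
            if ( ++inCurrentBatch == batchSize )
            {
                db.commit();
                inCurrentBatch = 0;
            }
        }
        if ( inCurrentBatch > 0 )
        {
            db.commit();    // commit the final, partial batch
        }
    }

    public static void main( String[] args )
    {
        CountingDb perNode = new CountingDb();
        createNodes( perNode, 10000, 1 );
        CountingDb batched = new CountingDb();
        createNodes( batched, 10000, 10000 );
        System.out.println( perNode.commits + " commits vs " + batched.commits );
    }
}
```

The batch size is the knob described above: 1 pays the commit overhead for every node, while a very large value keeps more uncommitted state in memory.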

Don't use DTOs

Unbalanced graph

It is possible to store all your data using only one node and some properties, but it is a very bad idea to do so. It would also be possible to store everything using only nodes and relationships, but that is probably also a bad idea. Networks are very powerful: as you increase depth you can capture exponentially more data and still have just about the same runtime characteristics (given that you model your network properly). It is usually obvious whether something is better modeled as a relationship or as a property, but always keep in mind what happens when the amount of data starts to grow. If more data results in nodes or relationships getting more and more properties (the same number of nodes and relationships, just more properties), you are doing something wrong. The same goes for relationships: if more data results in a single node getting more and more relationships, you are doing something wrong. Rules of thumb:

  • When data size grows, you should get more nodes, relationships and properties, not just one of them.
  • When data size grows, single nodes should not just get more and more relationships.

If you manage to accomplish this, you have a well-modeled network and your application will scale very well.
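A little arithmetic illustrates the depth argument. The sketch below compares hanging every item directly off one root node with a layered tree in which no node has more than fanOut children. The fan-out of 100 and all names are assumptions chosen for illustration, not a recommended layout for any particular domain.

```java
class FanOutSketch
{
    // Relationships on the root when every item hangs directly off it.
    static int maxDegreeFlat( int items )
    {
        return items;
    }

    // For a tree built bottom-up with at most 'fanOut' children per node,
    // returns { hops from root to an item, largest child count of any node }.
    static int[] layered( int items, int fanOut )
    {
        if ( fanOut < 2 )
        {
            throw new IllegalArgumentException( "fanOut must be at least 2" );
        }
        int depth = 0;
        int maxChildren = 0;
        int levelSize = items;
        while ( levelSize > 1 )
        {
            maxChildren = Math.max( maxChildren, Math.min( levelSize, fanOut ) );
            levelSize = ( levelSize + fanOut - 1 ) / fanOut;   // parents needed above this level
            depth++;
        }
        return new int[] { depth, maxChildren };
    }

    public static void main( String[] args )
    {
        for ( int items : new int[] { 10000, 1000000 } )
        {
            int[] result = layered( items, 100 );
            System.out.println( items + " items: flat root has " + maxDegreeFlat( items )
                + " relationships; layered tree has depth " + result[0]
                + " and at most " + result[1] + " children per node" );
        }
    }
}
```

Going from 10,000 to 1,000,000 items adds one level to the layered tree while the busiest node stays at 100 children, whereas the flat root grows from 10,000 to 1,000,000 relationships.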

Unnecessary synchronization

Neo4j is thread safe. This means that all Neo4j API operations can be invoked concurrently without resulting in non-deterministic behavior. Neo4j automatically detects deadlocks before they happen and raises a DeadlockDetectedException. However, real JVM deadlocks may still arise due to over-synchronization.

// don't do this
synchronized void methodA()
{
    nodeA.setProperty( "prop1", 1 );
    methodB();
}

synchronized void methodB()
{
    nodeB.setProperty( "prop2", 2 );
}

The code above is very deadlock-prone when methods A and B are called concurrently (the Neo4j kernel uses read/write locks internally, see Transactions#Isolation). First of all, the two methods shouldn't be synchronized, since they are already thread safe. Secondly, Neo4j will not be able to detect all JVM deadlocks here, because Neo4j has no idea about any synchronization performed before a Neo4j API call. Rule of thumb: strive for open calls. Calling a Neo4j API operation with locks held is always risky.
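One way to follow the rule of thumb is sketched below. SketchNode is a hypothetical thread-safe stand-in for a node so that the example runs without a database, and methodC, callCount and callCountLock are invented for illustration. The first two methods are the earlier ones with synchronized removed. The third shows what to do when the application has state of its own to guard: hold the lock only while touching that state, then make the API call with no lock held.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class OpenCallSketch
{
    // Hypothetical thread-safe stand-in for a node; NOT the Neo4j Node interface.
    static class SketchNode
    {
        final Map<String, Object> properties = new ConcurrentHashMap<String, Object>();

        void setProperty( String key, Object value )
        {
            properties.put( key, value );
        }
    }

    final SketchNode nodeA = new SketchNode();
    final SketchNode nodeB = new SketchNode();

    private final Object callCountLock = new Object();
    private int callCount = 0;  // application state that does need guarding

    // No 'synchronized': the node calls are already thread safe.
    void methodA()
    {
        nodeA.setProperty( "prop1", 1 );
        methodB();
    }

    void methodB()
    {
        nodeB.setProperty( "prop2", 2 );
    }

    void methodC()
    {
        int snapshot;
        synchronized ( callCountLock )
        {
            snapshot = ++callCount;     // lock held only for our own state
        }
        nodeB.setProperty( "calls", snapshot );     // open call: no lock held here
    }
}
```

Because the property write happens after the lock is released, two threads calling methodC concurrently may store their snapshots in either order. If the stored value must always match the counter, that needs a different design rather than a wider synchronized block.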

Further information

See more information about how to design your graph and performance hints.
