From the Neo4j Wiki


General Neo4j Questions

Why do I get an OutOfMemoryError when injecting data?

You are most likely creating too many new nodes, relationships and properties in a single transaction. Split the injection into several smaller transactions. If you still get an OutOfMemoryError, you are probably not calling finish() on the top-level transaction.
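A minimal sketch of splitting an import into several transactions. To keep it self-contained, `Tx` and `Db` are hypothetical stand-ins for the Neo4j 1.x `Transaction` and `GraphDatabaseService` types (only the calls used here are modelled), and `importAll` and its `batchSize` parameter are illustrative names, not part of Neo4j.

```java
import java.util.List;

public class BatchedImport {
    /** Stand-in for the Neo4j 1.x Transaction: success() marks it, finish() ends it. */
    public interface Tx {
        void success();
        void finish();
    }

    /** Hypothetical stand-in for the parts of GraphDatabaseService used here. */
    public interface Db {
        Tx beginTx();
        void createNode(String name);
    }

    /** Creates one node per name, using one transaction per batch; returns the number of transactions used. */
    public static int importAll(Db db, List<String> names, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int transactions = 0;
        for (int start = 0; start < names.size(); start += batchSize) {
            int end = Math.min(start + batchSize, names.size());
            Tx tx = db.beginTx();
            try {
                for (String name : names.subList(start, end)) {
                    db.createNode(name);
                }
                tx.success(); // mark this batch for commit
            } finally {
                tx.finish(); // always finish the top-level transaction, even on failure
            }
            transactions++;
        }
        return transactions;
    }
}
```

A suitable batch size depends on the available heap and on how large each node is, so it is best found by experiment.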

How do I query/search for a property?

You need the index component. See http://components.neo4j.org/neo4j-index/

Why do I get a RuntimeException "Could not create data source..."?

You already have a running Neo4j instance using the same physical Neo4j database, i.e. the same path on disk. Look at the thrown IllegalStateException; the path should appear in one of the nested exception messages.

How concurrent is Neo4j, when are locks taken?

By default, concurrent reads can be performed even if the data being read is being modified in another transaction. Concurrent writes to the same nodes or relationships have to wait for each other. In other words, reads take no locks, while a write operation takes a lock on the node or relationship being modified and holds it until the transaction is committed or rolled back.

What's today's practical limit on the number of Nodes, Relationships and Properties on one unsharded Neo4j instance?

Currently, Neo4j uses integers as the primary IDs for storage primitives. That gives an ID space of 2^32 (roughly 4 billion) each for relationships, nodes and properties. This limit will be removed as of version 1.3 of Neo4j.
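The size of a 32-bit ID space can be checked with a few lines of Java. The class and method names below are purely illustrative.

```java
public class IdSpace {
    /** Number of distinct IDs that fit in the given number of bits (0 to 62). */
    public static long capacity(int bits) {
        if (bits < 0 || bits > 62) {
            throw new IllegalArgumentException("bits must be between 0 and 62");
        }
        return 1L << bits;
    }

    public static void main(String[] args) {
        System.out.println(capacity(32)); // prints 4294967296
    }
}
```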

How about sharding the data?

Sharding data with Neo4j is possible, but the logic has to be maintained on the client side. Neo4j will provide utilities to make the upfront sharding decisions easier as part of version 1.3. Replication and failover are already handled by the HA setups.
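As an illustration of what client-side logic can mean, here is a minimal sketch of hash-based routing. This strategy is an assumption chosen for the example, not something Neo4j prescribes; `ShardRouter` and `shardFor` are hypothetical names, and the sketch ignores the harder problem of relationships that span shards.

```java
public class ShardRouter {
    private final int shardCount;

    public ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    /** Maps a key to a shard index in the range [0, shardCount). */
    public int shardFor(String key) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), shardCount);
    }
}
```

The client would then open or look up the Neo4j instance that corresponds to the returned index before reading or writing the entity.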

How long does it take to load a large Neo4j database?

The fastest way to load a Neo4j database is to initially populate it using the Batch Inserter. Depending on the circumstances, it can load 500M relationships and nodes in under an hour.

Why are my changes to the graph not persisted?

You either forgot to call tx.success() before tx.finish(), or tx.failure() has been invoked, which marks the transaction as rollback-only.
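The rule above can be captured in a toy model. This is not Neo4j code: `TxOutcome` and `wouldCommitOnFinish` are hypothetical names that only mimic the contract of success(), failure() and finish() as described here.

```java
public class TxOutcome {
    private boolean markedSuccess;
    private boolean markedFailure;

    /** Mirrors tx.success(): asks for a commit when the transaction finishes. */
    public void success() {
        markedSuccess = true;
    }

    /** Mirrors tx.failure(): marks the transaction as rollback-only. */
    public void failure() {
        markedFailure = true;
    }

    /** True if finish() would commit, false if it would roll back. */
    public boolean wouldCommitOnFinish() {
        return markedSuccess && !markedFailure;
    }
}
```

In this model a transaction that never calls success() rolls back, and so does one where failure() was called at any point.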

I am injecting a big data set into Neo4j and injection speed is not that fast. Why?

Neo4j is written to be fast for 1) many concurrent reads and 2) smaller concurrent transactional updates, which is the common use case for most applications. Try to group more operations into a single transaction to get higher injection speed. This is typically only a problem during testing ("I need to load all this data to test something") and not in production, where data growth is better suited to smaller transactional updates. The batch inserter may also help here.

How fast are all the different Neo4j API operations?

Characteristics are

  • constant time for add/remove/get property and create/delete/get node or relationship
  • linear time for getting relationships on a node.

Speed will depend very much on hardware and also on graph type. Using today's standard hardware should result in 1000-3000 traversals or property gets per ms (reads) and about 10-100 inserts/updates per ms (writes). Modifying transactions come with a lot of overhead, so when performing only a few operations in each transaction your write speed will drop dramatically and be tied to how fast your storage media can perform flush operations.

Further details on performance tuning are explained in the Neo Performance Guide.
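As a back-of-the-envelope illustration of the write rates quoted above, the sketch below converts an operation count and a rate in operations per millisecond into an estimated duration. The class and method names are illustrative, and the result is only as good as the assumed rate, which presumes well-batched transactions.

```java
public class LoadEstimate {
    /** Estimated milliseconds needed for the given number of operations at opsPerMs. */
    public static double millis(long operations, double opsPerMs) {
        if (opsPerMs <= 0) {
            throw new IllegalArgumentException("opsPerMs must be positive");
        }
        return operations / opsPerMs;
    }

    /** The same estimate expressed in hours. */
    public static double hours(long operations, double opsPerMs) {
        return millis(operations, opsPerMs) / 3_600_000.0;
    }
}
```

For example, 500 million writes at 100 writes per ms come to 5,000,000 ms, or roughly 1.4 hours, while at 10 writes per ms the same load takes about 14 hours.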

How big a graph can machine X handle?

With normal rotating media here are some guidelines: Laptop 1-2 GB RAM handles tens of millions of primitives. A standard server 4-8 GB RAM handles hundreds of millions of primitives. More expensive servers with 16-32GB RAM can handle billions of primitives. With Solid State Drives (SSDs) you can handle larger graphs on less RAM.

How does Neo4j play in an OSGi environment?

All components in the Neo4j component distribution behave nicely in an OSGi environment: the jars are packaged as bundles, and there is a sample test OSGi IMDB application in the examples section [1]. However, the Neo4j API is not yet tightened up, so neo4j-kernel.jar exposes not only the org.neo4j.graphdb package but all internal packages as well. This will be straightened out in a future release.

Can I access a running Neo4j instance from more than one machine?

Neo4j is at its core an in-process database, accessible only from the JVM it runs in. However, with remote-graphdb you can use the same functionality from other clients via RMI and a very similar client API.

What ways of encoding/holding metadata and type of nodes and properties are there?

There are a number of interesting approaches to this, involving both holding the metadata in the graph and outside the graph (in code):

  • Use the navigational context

This approach builds on the basic assumption that you know the type of the properties if you know the type that the node represents an instance of (the "type of the node"). The type of the node is deduced from how the node is reached. From a node of a known type you know the type of each node at the other end of a relationship based on the type of the relationship. This means that given a start node of a known type you will know the type of all nodes you can reach from it. In order to know the type of a start node you can use different indexes for different types, so that the nodes in one specific index always represent the same type.

To differentiate between subtypes, some of the other approaches can be used.

See http://lists.neo4j.org/pipermail/user/2008-October/000848.html

  • RDF and OWL

Basically, every node maintains a relationship to its type node (your shadow node), something like x?--RDF:TYPE-->type_node, which contains info on what the type is, what properties it has, etc.

Alternatively, describe the types of things in code (Java in this case) and enforce the restrictions and type conversions on properties there. This does not capture any meta info in the graph, but it is easy to do.

  • Annotate the nodes with type info

In this approach, there is a "type" or "classname" property on every node that is used to derive the type to serialize/deserialize the object into; the rest of the meta info is contained in the upper code layers. Andreas Ronge's JRuby bindings use this approach.

  • Encode everything into a String property

This approach means shuffling everything into a string property, basically treating properties as BLOBs. It works in some cases, but certainly locks down your data in these properties.
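The "annotate the nodes with type info" approach from the list above can be sketched as follows. A plain `Map` stands in for a node's properties so that the example runs without Neo4j; the `"type"` key, the `Person` class and both helper methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class TypedNodes {
    /** Hypothetical property key holding the type name. */
    public static final String TYPE_KEY = "type";

    public static class Person {
        public final String name;

        public Person(String name) {
            this.name = name;
        }
    }

    /** Serializes a Person into node-style properties, including the type marker. */
    public static Map<String, Object> toProperties(Person person) {
        Map<String, Object> props = new HashMap<>();
        props.put(TYPE_KEY, "Person");
        props.put("name", person.name);
        return props;
    }

    /** Uses the type marker to decide which class to deserialize into. */
    public static Object fromProperties(Map<String, Object> props) {
        Object type = props.get(TYPE_KEY);
        if ("Person".equals(type)) {
            return new Person((String) props.get("name"));
        }
        throw new IllegalArgumentException("Unknown or missing type: " + type);
    }
}
```

With a real node, the same idea would read and write these values through the node's property methods instead of a map.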

How can I get the total number of nodes and relationships currently in Neo4j?

For the time being you can use the following non-official API:


where the class would be Node.class, Relationship.class or PropertyStore.class

Checking if a relationship exists between two nodes

Checking if a relationship (with certain properties or not) exists between two nodes requires a linear search from either of the two nodes. So if many relationships (of the given type) exist between the two nodes, performance will depend on how many relationships there currently are.
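A minimal model of that linear search, using hypothetical `Node` and `Rel` classes in place of the Neo4j types so that it runs standalone. The cost grows with the number of relationships on the start node.

```java
import java.util.ArrayList;
import java.util.List;

public class LinearRelationshipCheck {
    /** Simplified stand-in for a graph node that remembers its outgoing relationships. */
    public static class Node {
        final List<Rel> outgoing = new ArrayList<>();

        public Rel createRelationshipTo(Node other, String type) {
            Rel rel = new Rel(other, type);
            outgoing.add(rel);
            return rel;
        }
    }

    /** Simplified stand-in for a typed, directed relationship. */
    public static class Rel {
        final Node end;
        final String type;

        Rel(Node end, String type) {
            this.end = end;
            this.type = type;
        }
    }

    /** Scans every outgoing relationship of 'from' until one of the given type ends at 'to'. */
    public static boolean isConnected(Node from, Node to, String type) {
        for (Rel rel : from.outgoing) {
            if (rel.type.equals(type) && rel.end == to) {
                return true;
            }
        }
        return false;
    }
}
```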

This can be solved more efficiently with relationship indexing:

import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.index.RelationshipIndex;

// Run inside an open transaction; graphDb, node1, node2 and the KNOWS
// RelationshipType are assumed to exist already.
RelationshipIndex index = graphDb.index().forRelationships( "my-index" );
Relationship myRelationship = node1.createRelationshipTo( node2, KNOWS );
Relationship myOtherRelationship = node1.createRelationshipTo( node2, KNOWS );
index.add( myRelationship, "date", 1234567890 );
index.add( myOtherRelationship, "date", 1234567999 );
// You can also index the types as a normal key/value pair
index.add( myRelationship, "type", myRelationship.getType().name() );
index.add( myOtherRelationship, "type", myOtherRelationship.getType().name() );

// Query the index, restricting hits to relationships from node1 to node2
boolean exists = index.get( "date", 1234567890, node1, node2 ).getSingle() != null;
// or, as a single query string
boolean existsByQuery = index.query( "date:1234567890 AND type:KNOWS", node1, node2 ).getSingle() != null;

Can Neo4j be used with other Transaction Managers than the one used internally in the Neo4j kernel?

Right now Neo4j (like other NoSQL stores) does not play very well with external JTA transaction managers. This is mainly because the JTA specification is not clear enough about the rollback functionality for XA resources. For RDBMSes, that missing functionality sits in the SQLConnection API, which is outside the spec and thus not applicable to Neo4j.

Work is going on towards a solution for this, first off with integration into Spring transaction management. This will be part of Neo4j 1.3, so hold out, help is coming :)

License Questions

Neo4j is released under AGPL3.

I have created a library that uses Neo4j Enterprise or Advanced Edition, can I release it under <insert license here (e.g. ASL2.0)>?

For a short guide to licensing, please see the Licensing Guide.

The short answer for ASL2.0 is 'yes', but you should consult with a lawyer. The same answer applies to other licenses that are accepted to be compatible with AGPLv3.

Our answer here is based on our interpretation of the GNU Affero General Public License; we do not make any guarantees about the legal validity of this response. In order to get legally valid responses you should consult with a lawyer.

You can release your library under any license that is compatible with GPLv3 if you depend on Neo4j Community Edition. Most of the licenses that are compatible with GPLv3 are also compatible with AGPLv3. What makes a license compatible with AGPLv3 is defined in the license text. A list of licenses stating their compatibility with GPL is available at Wikipedia. David Wheeler has published a nice essay on compatibility between FLOSS licenses that features an easy-to-understand graph depicting how these licenses are compatible with each other.

AGPLv3 is essentially GPLv3 with one addition. This addition stipulates that if the software is used through a network, the users who access it through the network must also get access to the source code of the software.

Let's look specifically at what it means to release a library that uses Neo4j under The Apache License version 2.0 (ASL2.0):

  • You are allowed to license your code under ASL2.0.
  • Users of your software must adhere to the terms of the ASL2.0 (from your code) as well as the AGPLv3 (from Neo4j).
  • This means that any software that uses your library AND Neo4j must be released under a license that is compatible with AGPLv3.
  • If anyone modifies your library in such a way that it no longer depends on Neo4j, they will only need to consider what ASL2.0 binds them to, since the AGPLv3 component (Neo4j) is no longer in the mix.

Can I use Neo4j in a project/product based on Eclipse and EPL plugins?

In principle, the GPL and the AGPL, under which the different editions of Neo4j are licensed, are not compatible with the EPL. However, there are three ways to deal with this:

1. Don't distribute Neo4j as part of the Eclipse-product distribution and let the user install the Neo4j plugin manually or semi-manually. Not a great way but doable.

2. Get back to Neo Technology and together work out a way to provide your project with a free commercial license to Neo4j. This has been done in a number of cases, even with Eclipse-based products. From a legal perspective that is the cleanest solution because then no AGPL is involved at all.

3. Potentially, the Neo4j licensing could be changed in a way similar to the Aptana License, explicitly making an exception to the AGPL for allowing usage with Eclipse based products and projects. This is not something that can be done immediately. For more information on the Aptana approach to things, see http://www.aptana.com/legal/aplgplex and http://www.aptana.com/legal

I'd like the developers to be able to run Neo4j locally during development. Do I need 10 basic server licenses for my developers or is there some kind of developer license similar to the developer licenses for WebLogic or WebSphere?

You don't need any licenses from us for internal development, QA etc., since that is covered by the AGPL: no external users access these systems, and their users are within the same organization and thus automatically have access to the source code.

If I have a completely internal app, say a company directory, that uses Neo4j, and I have a similar server topology -- 1 QA server and 2 production servers -- do I still need a commercial license for the production servers?

For purely internal apps within the same organisation as the one providing the source code, the use is covered by the AGPL, so per se you don't need any commercial license for Neo4j except, as you mention, if you require services, upgrades and other help. Happy hacking in that case!
