Configuration Settings

From Neo4j Wiki

How to add configuration settings

When creating the embedded Neo4j instance, it is possible to pass in parameters contained in a map where keys and values are strings. A utility method exists to load a standard Java properties file into such a map:

    Map<String,String> configuration = EmbeddedGraphDatabase.loadConfigurations( "neo4j_config.props" );
    GraphDatabaseService graphDb = new EmbeddedGraphDatabase( "my-neo4j-db/", configuration );
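
If you would rather build the map yourself (for instance to merge in a few overrides before starting the database), a plain java.util.Properties round trip is enough. The following is a minimal, stdlib-only sketch and not the implementation of loadConfigurations; the class name ConfigFiles and its method are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of the Neo4j API.
class ConfigFiles
{
    // Reads Java properties syntax and returns the entries as a String map.
    static Map<String,String> toMap( Reader source )
    {
        Properties props = new Properties();
        try
        {
            props.load( source );
        }
        catch ( IOException e )
        {
            throw new UncheckedIOException( e );
        }
        Map<String,String> config = new HashMap<String,String>();
        for ( String key : props.stringPropertyNames() )
        {
            config.put( key, props.getProperty( key ) );
        }
        return config;
    }
}
```

A map built this way (for example from a java.io.FileReader on neo4j_config.props) can be passed as the second constructor argument in the same way as the configuration map above.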

The default configuration parameters can be found in Neo_default.props. See also the Performance Guide for a further discussion of Neo4j performance.

When using the Neo4j REST server, see Getting Started REST#Supplying configuration file for how to add configuration settings in that case.

Memory mapped I/O settings

Each file in the Neo4j store can use memory mapped I/O for reading and writing. Best performance is achieved if the full file can be memory mapped, but if there isn't enough memory for that, Neo4j will try to make the best use of the memory it gets (regions of the file that are accessed often are more likely to be memory mapped).

Neo4j makes heavy use of the java.nio package, which was introduced in Java 1.4. Native I/O may result in memory being allocated outside the normal Java heap, so that memory usage needs to be taken into consideration. A well configured OS with large disk caches will help a lot once we get cache misses in the node and relationship caches. It is therefore not a good idea to use all available memory as Java heap.

If you look into the directory of your Neo4j database, you will find its store files, all prefixed by neostore:

  • nodestore stores information about nodes
  • relationshipstore holds all the relationships
  • propertystore stores the property records and all simple property values such as primitive types (for both relationships and nodes)
  • propertystore strings stores all string properties
  • propertystore arrays stores all array properties

There are other files there as well, but they are normally not interesting in this context.

This is how the default memory mapping configuration looks:

neostore.nodestore.db.mapped_memory=25M
neostore.relationshipstore.db.mapped_memory=50M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.propertystore.db.arrays.mapped_memory=130M

Optimizing for traversals example

To tune the memory mapping settings, start by investigating the size of the different store files found in the directory of your Neo4j database. Here is an example of some of the files and sizes in a Neo4j database:

 14M neostore.nodestore.db
510M neostore.propertystore.db
1.2G neostore.propertystore.db.strings
304M neostore.relationshipstore.db

In this example the application is running on a machine with 4GB of RAM. We've reserved about 2GB for the OS and other programs. The Java heap is set to 1.5GB, which leaves about 500MB of RAM that can be used for memory mapping.

If traversal speed is the highest priority it is good to memory map as much as possible of the node- and relationship stores.

An example configuration on the example machine focusing on traversal speed would then look something like:

neostore.nodestore.db.mapped_memory=15M
neostore.relationshipstore.db.mapped_memory=285M
neostore.propertystore.db.mapped_memory=100M
neostore.propertystore.db.strings.mapped_memory=100M
neostore.propertystore.db.arrays.mapped_memory=0M
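
The arithmetic behind this example can be checked with a few lines of Java. This is only a sketch: the class MappedMemoryBudget and its methods are hypothetical helpers (not part of Neo4j), and the parser assumes sizes are written as a whole number followed by M or G, as in the settings above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for checking a memory mapping budget; not part of Neo4j.
class MappedMemoryBudget
{
    // Parses sizes such as "285M" or "3G" into megabytes (1G = 1024M here).
    static long toMegabytes( String size )
    {
        String s = size.trim();
        long number = Long.parseLong( s.substring( 0, s.length() - 1 ) );
        char unit = Character.toUpperCase( s.charAt( s.length() - 1 ) );
        switch ( unit )
        {
            case 'M': return number;
            case 'G': return number * 1024;
            default: throw new IllegalArgumentException( "Unsupported unit in " + size );
        }
    }

    // Sums every setting whose key ends with ".mapped_memory", in megabytes.
    static long totalMappedMegabytes( Map<String,String> config )
    {
        long total = 0;
        for ( Map.Entry<String,String> entry : config.entrySet() )
        {
            if ( entry.getKey().endsWith( ".mapped_memory" ) )
            {
                total += toMegabytes( entry.getValue() );
            }
        }
        return total;
    }

    public static void main( String[] args )
    {
        Map<String,String> config = new LinkedHashMap<String,String>();
        config.put( "neostore.nodestore.db.mapped_memory", "15M" );
        config.put( "neostore.relationshipstore.db.mapped_memory", "285M" );
        config.put( "neostore.propertystore.db.mapped_memory", "100M" );
        config.put( "neostore.propertystore.db.strings.mapped_memory", "100M" );
        config.put( "neostore.propertystore.db.arrays.mapped_memory", "0M" );

        long budget = 4096 - 2048 - 1536; // RAM - OS reserve - Java heap, in MB
        long mapped = totalMappedMegabytes( config );
        System.out.println( mapped + "M mapped, budget " + budget + "M" );
    }
}
```

With the values from this example the settings add up to 500M, which stays within the roughly 512M left over after the OS reserve and the Java heap.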

Batch insert example

The configuration should suit the data set you are about to inject using Batch insert. Let's say we have a random-like graph with 10M nodes and 100M relationships. Each node (and maybe some relationships) has different properties of string and Java primitive types (but no arrays). The important thing with a random graph is to give lots of memory to the relationship and node stores:

neostore.nodestore.db.mapped_memory=90M
neostore.relationshipstore.db.mapped_memory=3G
neostore.propertystore.db.mapped_memory=50M
neostore.propertystore.db.strings.mapped_memory=100M
neostore.propertystore.db.arrays.mapped_memory=0M

The configuration above will fit the entire graph (with the exception of properties) in memory. A rough formula to calculate the memory needed for the nodes:

  • number_of_nodes * 9 bytes

and for relationships:

  • number_of_relationships * 33 bytes
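
Applied to the example graph, the two formulas can be worked through as follows. This is a small sketch; the class name StoreSizeEstimate is hypothetical, and the record sizes are simply the 9 and 33 bytes quoted above.

```java
// Hypothetical helper applying the rough formulas above; not part of Neo4j.
class StoreSizeEstimate
{
    static final long NODE_RECORD_BYTES = 9;
    static final long RELATIONSHIP_RECORD_BYTES = 33;

    static long nodeStoreBytes( long numberOfNodes )
    {
        return numberOfNodes * NODE_RECORD_BYTES;
    }

    static long relationshipStoreBytes( long numberOfRelationships )
    {
        return numberOfRelationships * RELATIONSHIP_RECORD_BYTES;
    }

    public static void main( String[] args )
    {
        long nodes = 10000000L;          // 10M nodes
        long relationships = 100000000L; // 100M relationships
        System.out.println( "node store bytes: " + nodeStoreBytes( nodes ) );
        System.out.println( "relationship store bytes: " + relationshipStoreBytes( relationships ) );
    }
}
```

For 10M nodes and 100M relationships this gives 90,000,000 bytes (about 90M) and 3,300,000,000 bytes (about 3.3G) respectively.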

Properties will typically only be injected once and never read so a few megabytes for the property store and string store is usually enough. If you have very large strings or arrays you may want to increase the amount of memory assigned to the string and array store files.

An important thing to remember is that the above configuration will need a Java heap of 3.3G+, since in batch inserter mode normal Java buffers that get allocated on the heap are used instead of memory mapped ones.

Cache settings

The type of cache Neo4j uses to cache graph primitives (nodes and relationships) from the underlying store can be configured with the cache_type property, such as:

cache_type = soft

The different options are:

  • soft (default): LRU cache using soft references. Soft references are cleaned when the GC thinks it's needed. Use if your application load isn't very high.
  • weak: LRU cache using weak references. It is nice to the garbage collector in that GC can clean such a reference whenever it finds it. Use if your application is under heavy load with lots of reads and traversals.
  • none (not recommended): Doesn't cache any nodes or relationships and should only be used in special scenarios.
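
The difference between the soft and weak options comes down to which java.lang.ref reference type holds the cached object. The sketch below illustrates that idea only; it is not Neo4j's cache implementation, the class ReferenceCache is hypothetical, and it is not thread safe.

```java
import java.lang.ref.Reference;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Illustration only; this is not Neo4j's cache implementation.
class ReferenceCache<K,V>
{
    private final Map<K,Reference<V>> entries = new HashMap<K,Reference<V>>();
    private final boolean weak;

    // weak = true uses WeakReference, weak = false uses SoftReference.
    ReferenceCache( boolean weak )
    {
        this.weak = weak;
    }

    void put( K key, V value )
    {
        Reference<V> ref;
        if ( weak )
        {
            // May be cleared as soon as the GC finds no strong references.
            ref = new WeakReference<V>( value );
        }
        else
        {
            // Cleared only when the GC decides memory is needed.
            ref = new SoftReference<V>( value );
        }
        entries.put( key, ref );
    }

    // Returns the cached value, or null if it is absent or already collected.
    V get( K key )
    {
        Reference<V> ref = entries.get( key );
        if ( ref == null )
        {
            return null;
        }
        V value = ref.get();
        if ( value == null )
        {
            entries.remove( key ); // drop the stale entry
        }
        return value;
    }
}
```

In both cases a lookup can come back empty after a garbage collection, at which point the object would have to be loaded from the store again.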

See more information about LRU cache.

LRU cache behaviour (in comparison with, for example, using a LinkedHashMap):

  • faster gets and puts
  • it is more memory efficient (more nodes and relationships can be cached using the same amount of memory)
  • it doesn't rely on trying to read the JVM heap size in realtime (which is known to be unreliable)
  • it should use the heap more efficiently (this depends on the JVM implementation, though)

You can read about references and relevant JVM settings for Sun HotSpot here:

JVM Settings

The JVM is configured by passing command line flags when starting the JVM. The most important configuration parameters for Neo4j are the ones that control the memory and garbage collector, but some of the parameters for configuring the Just In Time compiler are also of interest.

This is an example of starting up your application's main class using 64-bit server VM mode and a heap space of 1GB:

 java -d64 -server -Xmx1024m -cp /path/to/neo4j-kernel.jar:/path/to/jta.jar:/path/to/your-application.jar com.example.yourapp.MainClass

Looking at the example above you will also notice one of the most basic command line parameters: the one for specifying the classpath. The classpath is the path in which the JVM searches for your classes. It is usually a list of jar files. The classpath is specified with the flag -cp (or -classpath) followed by the value of the classpath. For Neo4j applications this should at least include the path to the Neo4j neo4j-kernel.jar and the Java Transaction API (jta.jar) as well as the path where the classes for your application are located. On Linux, Unix and Mac OS X the elements in the path list are separated by a colon (:); on Windows they are separated by a semicolon (;).
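
If classes are not found at startup, it can help to print the classpath the JVM actually received. The following sketch uses only standard system properties; the class name ClasspathDump is hypothetical.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; prints one classpath element per line.
class ClasspathDump
{
    // Splits a classpath string on the given platform separator.
    static String[] entries( String classpath, String separator )
    {
        return classpath.split( Pattern.quote( separator ) );
    }

    public static void main( String[] args )
    {
        String classpath = System.getProperty( "java.class.path" );
        // File.pathSeparator is ":" on Linux, Unix and Mac OS X and ";" on Windows.
        for ( String entry : entries( classpath, File.pathSeparator ) )
        {
            System.out.println( entry );
        }
    }
}
```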

Configuring the memory for the JVM

There are two main memory parameters for the JVM: one controls the heap space and the other controls the stack space. The heap space parameter is the most important one for Neo4j, since this governs how many objects you can allocate. The stack space parameter governs how deep the call stack of your application is allowed to get.

When it comes to heap space the general rule is: the larger the heap space the better, but make sure the heap fits in the RAM of the computer. If the heap is paged out to disk, performance will degrade rapidly. Having a heap that is much larger than what your application needs is not good either, since this means that the JVM will accumulate a lot of dead objects before the garbage collector is executed. This leads to long garbage collection pauses and undesired performance behavior.

Having a larger heap space will mean that Neo4j can handle larger transactions and more concurrent transactions. A large heap space will also make Neo4j run faster since it means Neo4j can fit a larger portion of the graph in its caches, meaning that the nodes and relationships your application uses frequently are always available quickly. The default heap size for a 32bit JVM is 64MB (and 30% larger for 64bit), which is too small for most real applications.

Neo4j works fine with the default stack space configuration, but if your application implements some recursive behavior it is a good idea to increase the stack size. Note that the stack size setting applies to every thread, so if your application is running a lot of concurrent threads, the total memory reserved for stacks grows with the number of threads.

  • The heap size is set by specifying the -Xmx???m parameter to hotspot, where ??? is the heap size in megabytes.
    Default heap size is 64MB for 32bit JVMs, 30% larger (appr. 83MB) for 64bit JVMs.
  • The stack size is set by specifying the -Xss???m parameter to hotspot, where ??? is the stack size in megabytes.
    Default stack size is 512kB for 32bit JVMs on Solaris, 320kB for 32bit JVMs on Linux (and Windows), and 1024kB for 64bit JVMs.
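
To confirm that a heap setting has taken effect, the running JVM can be asked for its maximum heap size. This is a minimal sketch using the standard Runtime API; the class name HeapInfo is hypothetical, and the reported value may be somewhat lower than the number passed to -Xmx.

```java
// Hypothetical diagnostic helper; reports the maximum heap the JVM will use.
class HeapInfo
{
    // Maximum heap size in megabytes as seen by the running JVM.
    static long maxHeapMegabytes()
    {
        return Runtime.getRuntime().maxMemory() / ( 1024 * 1024 );
    }

    public static void main( String[] args )
    {
        System.out.println( "Max heap: " + maxHeapMegabytes() + "MB" );
    }
}
```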

When running Neo4j on Windows, the size of the memory-mapped nioneo configuration needs to be added to the heap size parameter. On Linux and Unix systems, memory mapped I/O is not included in the heap size.

Most modern CPUs implement a Non-Uniform Memory Access (NUMA) architecture, where different parts of the memory have different access speeds. Sun's Hotspot JVM is able to allocate objects with awareness of the NUMA structure as of version 1.6.0 update 18. When enabled, this can give up to 40% performance improvements. To enable NUMA awareness, specify the -XX:+UseNUMA parameter.

Guidelines for heap size

This table is a guideline for how much memory to assign to various parts of the system for a web application using Neo4j, running on server hardware with mechanical disks.

Number of primitives  RAM size    Heap configuration  Reserved RAM for the OS  Memory mapping configuration
10M                   2GB         512MB               the rest                 100-512MB
100M                  8GB+        1-4GB               1-2GB                    the rest
1B+                   16GB-32GB+  4GB+                1-2GB                    the rest

With a Solid State Drive the heap settings can be configured lower since disk access isn't as expensive, thus making caching less important and memory-mapping more important.

Selecting VM mode

A JVM generally has two execution modes, client mode and server mode. On computers with 64-bit processors there are also modes for executing the JVM in 32-bit mode and 64-bit mode. Selecting VM modes is done by specifying command line parameters, -client for client mode, -server for server mode, -d32 for 32-bit mode, and -d64 for 64-bit mode. The mode modifying command line arguments must be the first arguments specified to the java command, since they modify the behavior of the entire VM and thus the interpretation of all the other parameters. We recommend always executing Neo4j applications in a server mode JVM.
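
Which mode a JVM ended up in can be read back from its system properties. The sketch below is an assumption-laden diagnostic: java.vm.name is a standard property, but sun.arch.data.model is specific to Sun/Hotspot-derived JVMs and may be missing elsewhere, in which case "unknown" is reported. The class name VmInfo is hypothetical.

```java
// Hypothetical diagnostic helper; reports the VM name and data model.
class VmInfo
{
    // On Hotspot the VM name typically mentions "Client" or "Server".
    static String vmName()
    {
        return System.getProperty( "java.vm.name" );
    }

    // "32" or "64" on Hotspot-derived JVMs; "unknown" if the property is absent.
    static String dataModel()
    {
        return System.getProperty( "sun.arch.data.model", "unknown" );
    }

    public static void main( String[] args )
    {
        System.out.println( vmName() + " (" + dataModel() + "-bit)" );
    }
}
```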

The server and client mode settings affect the default value for a number of configuration parameters, most notably those of the Just In Time compiler (JIT) and the Garbage Collector (GC). In Sun's Hotspot JVM the client mode uses a JIT that compiles to an optimized form earlier, but the JIT used in server mode gathers more information about the running code before optimizing, and thus does a far better job of optimizing the code. Other JVM implementations may have different strategies for how and when code is optimized, so consult the manual for the JVM you are using for details on how to tune your JVM for your code.

Configuring the Garbage Collector

The Hotspot JVM comes with a number of garbage collectors. Choosing the right one is a delicate but important decision. The default choice of garbage collectors depends on a number of factors, such as which JVM release you are using, client/server mode, whether you are running on a 64-bit or 32-bit CPU and whether you have multiple cores or not. It is possible to introspect which garbage collectors the JVM uses through the system MXBeans. This information is easily accessed for a running JVM through the jconsole application. It is also easy to introspect the values programmatically. Here is a short Java program that prints out the names of the garbage collectors in use for the current JVM:

import java.lang.management.ManagementFactory;
import java.lang.management.GarbageCollectorMXBean;

public class GcIntrospect
{
    public static void main( String[] args )
    {
        for ( GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans() )
        {
            System.out.println( gc.getName() );
        }
    }
}

The Hotspot JVM usually uses two garbage collectors, one for the young generation and one for the tenured generation. For general advice on selecting an appropriate garbage collector for your Java Platform application, see this article. This is our advice on how to configure garbage collection for Neo4j.

We have found that the Concurrent Mark and Sweep Compactor gives the best performance for Neo4j Applications.

Once the heap size is well configured, the second thing to tune for the garbage collector is the sizes of the different generations of the heap. The default settings are well tuned for "normal" applications, and work quite well for most applications, but if you have an application with either a really high allocation rate or a lot of long lived objects, you might want to consider tuning the sizes of the heap generations. The ratio between the young and tenured generation of the heap is specified by using the -XX:NewRatio=# command line option (where # is replaced with a number). The default ratio is 1:12 for client mode JVM, and 1:8 for server mode JVM. You can also specify the size of the young generation explicitly using the -Xmn command line option, which works just like the -Xmx option that specifies the total heap space.
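
To see which generation sizes a given -XX:NewRatio or -Xmn setting actually produces, the heap memory pools can be listed through the standard java.lang.management API. This is a sketch in the same spirit as the GcIntrospect program above; the class name HeapPools is hypothetical, and the pool names printed depend on the JVM and the garbage collectors in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; lists the heap memory pools of the current JVM.
class HeapPools
{
    // One line per heap pool with its maximum size, if the JVM defines one.
    static List<String> describeHeapPools()
    {
        List<String> lines = new ArrayList<String>();
        for ( MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans() )
        {
            if ( pool.getType() != MemoryType.HEAP )
            {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            long max = usage == null ? -1 : usage.getMax();
            String size = max < 0 ? "undefined" : ( max / ( 1024 * 1024 ) ) + "MB";
            lines.add( pool.getName() + ": max=" + size );
        }
        return lines;
    }

    public static void main( String[] args )
    {
        for ( String line : describeHeapPools() )
        {
            System.out.println( line );
        }
    }
}
```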

GC shortname         Generation  Command line parameter   Comment
Copy                 Young       -XX:+UseSerialGC         The Copying collector
MarkSweepCompact     Tenured     -XX:+UseSerialGC         The Mark and Sweep Compactor
ConcurrentMarkSweep  Tenured     -XX:+UseConcMarkSweepGC  The Concurrent Mark and Sweep Compactor
ParNew               Young       -XX:+UseParNewGC         The parallel Young Generation Collector. Can only be used with the Concurrent Mark and Sweep Compactor.
PS Scavenge          Young       -XX:+UseParallelGC       The parallel object scavenger
PS MarkSweep         Tenured     -XX:+UseParallelGC       The parallel mark and sweep collector

These are the default configurations on some platforms according to our non-exhaustive research:

JVM                                              -d32 -client                    -d32 -server                  -d64 -client                    -d64 -server
Mac OS X Snow Leopard, 64-bit, Hotspot 1.6.0_17  ParNew and ConcurrentMarkSweep  PS Scavenge and PS MarkSweep  ParNew and ConcurrentMarkSweep  PS Scavenge and PS MarkSweep
Ubuntu, 32-bit, Hotspot 1.6.0_16                 Copy and MarkSweepCompact       Copy and MarkSweepCompact     N/A                             N/A

A note on the Garbage First garbage collector (G1)

Sun has developed a new garbage collector that has been included in the JVM update releases since JVM 1.6_u14, called the Garbage First (or simply G1) collector. The Garbage First garbage collector could prove to be very beneficial for Neo4j applications. Since this is still an experimental garbage collector the Neo4j team has not done enough testing with it yet. To try the G1 GC add the following command line options to the java command:

 -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC

Block size

Neo4j stores data on disk in blocks which have fixed sizes. These are the current record sizes in bytes that can be used to calculate the actual store size:

nodestore 9
relationshipstore 33
propertystore 25
stringstore 133
arraystore 133

All properties except strings and arrays will take a single propertystore record (25 bytes). A string or array property will use one record from the propertystore and then as many blocks as needed from the string/array store file (each block of 133 bytes can store 120 bytes of data). This means that if all your strings are multiples of 120 bytes in size you will make very efficient use of the store file, while if they are empty you will not make very good use of the space (exactly like a common file system taking up space for empty files).
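
The space taken by a single string or array property with the default block size can be estimated as follows. This is a rough sketch under two assumptions: an empty value still occupies one block (our reading of the file system comparison above), and a string needs two bytes per character. The class name PropertyStoreUsage is hypothetical.

```java
// Hypothetical estimator using the default record sizes listed above; not part of Neo4j.
class PropertyStoreUsage
{
    static final long PROPERTY_RECORD_BYTES = 25;
    static final long BLOCK_RECORD_BYTES = 133;
    static final long BLOCK_PAYLOAD_BYTES = 120;

    // Number of string/array store blocks needed for the given payload.
    // Assumption: an empty value still takes up one block.
    static long blocksNeeded( long payloadBytes )
    {
        long blocks = ( payloadBytes + BLOCK_PAYLOAD_BYTES - 1 ) / BLOCK_PAYLOAD_BYTES;
        return Math.max( 1, blocks );
    }

    // Total bytes on disk: one property record plus the blocks holding the value.
    static long bytesOnDisk( long payloadBytes )
    {
        return PROPERTY_RECORD_BYTES + blocksNeeded( payloadBytes ) * BLOCK_RECORD_BYTES;
    }

    // Assumption: strings consume two bytes per character.
    static long stringPayloadBytes( int stringLength )
    {
        return 2L * stringLength;
    }

    public static void main( String[] args )
    {
        System.out.println( bytesOnDisk( stringPayloadBytes( 60 ) ) ); // fills one block exactly
        System.out.println( bytesOnDisk( stringPayloadBytes( 61 ) ) ); // spills into a second block
    }
}
```

A 60 character string (120 bytes of data) fits in one block and takes 25 + 133 = 158 bytes, while a 61 character string needs a second block and takes 25 + 2 * 133 = 291 bytes.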

Beginning with kernel version 1.1, the block size for string and array store can be configured when the store is created. This is how it's done:

  Map<String,String> config = new HashMap<String, String>();
  config.put( "string_block_size", "60" );
  config.put( "array_block_size", "300" );
  // create a new store with string block size 60 and array block size 300
  new EmbeddedGraphDatabase( "path-to-db-that-does-not-exist", config).shutdown();

The default value (120 bytes) was picked to fit common size string/array properties in one block since it will be slower to load a property that is spread out on many blocks. When tweaking these values remember that strings will consume twice the string length in bytes so a string block size of 60 will be able to fit a string of length 30 in a single block.

Note that the block sizes can only be configured at creation time.