RDF / SPARQL Quick-Start Guide

From the Neo4j Wiki

Neo4j MetaModel, SPARQL Quick-start guide (Under development)

The Neo4j graph database is versatile because it connects many different ideas and concepts in one package. This is one of the reasons application development on the Neo4j platform can feel like being a kid in a candy store.

As the web moves closer to the ideas championed by Tim Berners-Lee and the W3C, the Semantic Web concepts of RDF, OWL and SPARQL are gaining attention. Neo4j offers a fairly robust implementation of these ideas in the form of tools and libraries that help developers work in this emerging field.

In this article, we will explore the org.neo4j.rdf and org.neo4j.meta packages to create a simple application in which we create a GraphDatabase, import some simple RDF data to it and query it using SPARQL.

The first thing we do of course is to create a GraphDatabaseService, a LuceneIndexService and a FulltextIndex (in that order).

 GraphDatabaseService graphDb = new EmbeddedGraphDatabase( "var/graphdb" );  
 LuceneIndexService indexService = new LuceneIndexService( graphDb );
 FulltextIndex fulltextIndex = new SimpleFulltextIndex( graphDb, new File( "var/graphdb/fulltext-index" ) );

The graph database is now up and running. Next we have to create the RdfStore and the MetaModel. RdfStore is an interface representing a location where RDF triples are stored. In Neo4j, it acts as a wrapper around the GraphDatabaseService through which you can read and write RDF data while still using all other Neo4j functionality, such as traversals. The MetaModel interface represents a way to structure the RDF triples so that they can be connected in various ways, which is at the heart of any linked data. Describing RDF data with a schema lets the developer systematically represent not only relationships, but also restrictions, data types, hierarchies, properties and so on.

For example, one way to think of an RDF schema is as a description of a person. A person has various properties such as height, weight and hair colour. A person can also be related to other people as a friend, parent or co-worker. A restriction that we can impose is that a friend must be another person (a person being a friend with a colour or a number would not make sense in our schema). Having this information, we can now think of RDF triples as statements about this schema. We can say that 'Bob' is a person. He has a height property of 180 cm. His hair colour property is 'brown'. He also has co-worker relationships with other people such as 'Alison', who has her own properties, and so on. If we now give each schema element a URI, we can connect and infer all kinds of information about all kinds of data.
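The person example can be sketched in plain Java before any Neo4j classes are involved. The class below is purely illustrative: `PersonSchemaSketch`, `triple` and `friendViolations` are hypothetical names and not part of the Neo4j RDF packages, and short labels stand in for URIs. It stores statements as subject/predicate/object triples and checks the restriction that a friend must be a person.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only; none of these names come from the Neo4j RDF API.
class PersonSchemaSketch {
    // A triple is modelled as a three-element array: { subject, predicate, object }.
    static String[] triple(String subject, String predicate, String object) {
        return new String[] { subject, predicate, object };
    }

    // Returns every "friend" triple whose object has not been declared a person.
    static List<String[]> friendViolations(List<String[]> triples) {
        Set<String> persons = new HashSet<String>();
        for (String[] t : triples) {
            if (t[1].equals("type") && t[2].equals("person")) {
                persons.add(t[0]);
            }
        }
        List<String[]> violations = new ArrayList<String[]>();
        for (String[] t : triples) {
            if (t[1].equals("friend") && !persons.contains(t[2])) {
                violations.add(t);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<String[]> triples = Arrays.asList(
            triple("Bob", "type", "person"),
            triple("Alison", "type", "person"),
            triple("Bob", "height", "180 cm"),
            triple("Bob", "hairColour", "brown"),
            triple("Bob", "coWorker", "Alison"),
            triple("Bob", "friend", "Alison"),  // allowed: Alison is a person
            triple("Bob", "friend", "brown"));  // violates the restriction
        for (String[] t : friendViolations(triples)) {
            System.out.println("Invalid friend: " + t[0] + " -> " + t[2]);
        }
    }
}
```

Running `main` reports only the last triple, because 'brown' was never declared to be a person.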

We create an RdfStore and a MetaModel.

 MetaModel model = new MetaModelImpl( graphDb, indexService );
 RdfStore store = new VerboseQuadStore( graphDb, indexService, model, null );

RDF triples consist of three things (of course): A subject, a predicate and an object. Using the previous example, a subject would be the person 'Bob', a predicate would be 'co-worker' and an object would be 'Alison'. A convention in the RDF world is that everything is a URI, so to describe the schema more formally we will attach a URI prefix to each unit in the schema:

 String PERSON = "http://neo4j.org/person";
 String KNOWS = "http://neo4j.org/knows";
 String TYPE = "http://neo4j.org/type"; 
 String NAME = "http://neo4j.org/name"; 
 String NICK = "http://neo4j.org/nickname"; 

The above strings describe a schema that differs from our earlier example, but the idea remains the same. Nonetheless, this is our schema!

We also need RDF-type URIs for our resources (instances):

 String BOB = "http://neo4j.org/bob"; 
 String ALISON = "http://neo4j.org/alison";

With this information in hand we can now build a simple model using our schema, make statements about our resources (describe them, connect them) and in the end, query all this information.

At this point, we need to wrap our code inside a Transaction since we will be reading and writing data to the triple store.

 Transaction tx = graphDb.beginTx();

Now we define the MetaModel:

 MetaModelNamespace namespace = model.getGlobalNamespace();
 MetaModelClass personClass = namespace.getMetaClass(PERSON, true );
 MetaModelProperty nameProperty = namespace.getMetaProperty(NAME, true );
 MetaModelProperty typeProperty = namespace.getMetaProperty(TYPE, true );
 MetaModelProperty nickName = namespace.getMetaProperty( NICK, true );
 personClass.getDirectProperties().add( nameProperty );
 personClass.getDirectProperties().add( typeProperty );
 nameProperty.getDirectSubs().add( nickName );

In the first line, we obtain an RDF namespace. Simply put, the URI prefix of the entities that we described above can be considered the namespace of our model. This leads to a choice: 1) use the global namespace, as we just did, and keep the full URIs, or 2) use the MetaModel's getNamespace method with the prefix as its argument. Of course, if you use the getNamespace method, you need to remove the prefix from all the entities described.

The next four lines create the entities in the namespace. Note how the person is represented as a class, while name, type and nickname are represented as properties. Next we associate the properties with the class. In Neo4j, the properties of a class are exposed as a Collection, so the Collection's add() method is used to insert them. We can also add sub-properties of properties and sub-classes of classes, creating a hierarchy in our model. Any semi-sophisticated model will have some kind of hierarchy in its schema. Since the nickname property is a kind of name, we simply add it as a sub-property of name.
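To see why a property hierarchy is useful, consider the following plain-Java sketch. It is not how MetaModelProperty is implemented; `SubPropertySketch` and its methods are hypothetical names, and the hierarchy is assumed to be acyclic. The point it illustrates is that asking for the values of a property can also return the values of its sub-properties.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; this is not the Neo4j meta model implementation.
class SubPropertySketch {
    // Maps a sub-property URI to its direct super-property URI.
    private final Map<String, String> superOf = new HashMap<String, String>();

    void addSubProperty(String superProperty, String subProperty) {
        superOf.put(subProperty, superProperty);
    }

    // True if 'property' equals 'ancestor' or is a direct or indirect sub-property of it.
    // Assumes the hierarchy contains no cycles.
    boolean isSameOrSubPropertyOf(String property, String ancestor) {
        String current = property;
        while (current != null) {
            if (current.equals(ancestor)) {
                return true;
            }
            current = superOf.get(current);
        }
        return false;
    }

    // Collects the objects of all { subject, predicate, object } triples about 'subject'
    // whose predicate is 'property' or one of its sub-properties.
    List<String> valuesOf(List<String[]> triples, String subject, String property) {
        List<String> values = new ArrayList<String>();
        for (String[] t : triples) {
            if (t[0].equals(subject) && isSameOrSubPropertyOf(t[1], property)) {
                values.add(t[2]);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String name = "http://neo4j.org/name";
        String nick = "http://neo4j.org/nickname";
        String alison = "http://neo4j.org/alison";

        SubPropertySketch schema = new SubPropertySketch();
        schema.addSubProperty(name, nick);

        List<String[]> triples = Arrays.asList(
            new String[] { alison, name, "Alison" },
            new String[] { alison, nick, "Ally" });

        // Asking for names also yields the nickname, because nickname is a kind of name.
        System.out.println(schema.valuesOf(triples, alison, name));
    }
}
```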

We are ready to insert our RDF data into the triple store. To put a triple into a triple store, we need to wrap it into a Statement. There are two types of Statements: CompleteStatement and WildcardStatement. The first one that we will look at is the CompleteStatement. The 'Complete' part means that there must actually exist a real subject, a predicate and an object.

 ArrayList<CompleteStatement> statements = new ArrayList<CompleteStatement>();
 statements.add(new CompleteStatement((Resource) new Uri(BOB), new Uri(TYPE), new Uri(PERSON), Context.NULL));
 statements.add(new CompleteStatement((Resource) new Uri(ALISON), new Uri(TYPE), new Uri(PERSON), Context.NULL));
 statements.add(new CompleteStatement((Resource) new Uri(BOB), new Uri(NAME), new Literal("Bob"), Context.NULL));
 statements.add(new CompleteStatement((Resource) new Uri(ALISON), new Uri(NAME), new Literal("Alison"), Context.NULL));
 statements.add(new CompleteStatement((Resource) new Uri(ALISON), new Uri(NICK), new Literal("Ally"), Context.NULL));

We have introduced 4 new classes, so let's discuss them. Resource is an Interface representing an RDF Resource. Essentially, it is the 'thing' that is being described by the Statement. Notice that all of the constructors for CompleteStatement require a Resource as the subject of the triple. The classes Uri and Context are Resource implementors. In RDF, the subject of a triple is either a URI or a BlankNode. BlankNodes are beyond the scope of this tutorial. Note that even though resources tend to be represented by Internet-based namespaces with the usual 'http://' prefix, they do not necessarily have to be accessible by HTTP. As long as the resources represent data about a 'thing' and it has a unique URI string, it could be anything.

The Literal class represents an rdfs:Literal, which can be either a primitive data type (int, byte, etc.) or a reference data type (an Object). In our case, we will work with String values. An optional argument for Literal is the datatype (represented by a Uri). Even though we will not be using it here, the datatype argument is often necessary for proper data interpretation. Consider a Literal represented by a String: "13". How does an RDF consumer application interpret this? Often "13" should really be the integer 13. When this is the case, RDF calls for the use of typed literals, which are represented by the datatype in Neo4j. The datatype is simply a URI that associates a literal with a particular data type. So, to represent "13" as an integer, we create a Literal as follows:

 Literal thirteen = new Literal("13", new Uri("http://www.w3.org/2001/XMLSchema#integer"));
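How a consumer might act on the datatype can be sketched in plain Java. The class below is a hypothetical helper, not part of the Neo4j RDF packages, and it handles only two XML Schema datatypes; everything else is left as a String.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Plain-Java illustration only; 'TypedLiteralSketch' is a hypothetical helper.
class TypedLiteralSketch {
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    // Converts the lexical form of a literal to a Java value according to its datatype URI.
    // A null datatype means a plain literal, which stays a String.
    static Object interpret(String lexicalForm, String datatypeUri) {
        if (datatypeUri == null) {
            return lexicalForm;
        }
        if (datatypeUri.equals(XSD + "integer")) {
            return new BigInteger(lexicalForm);
        }
        if (datatypeUri.equals(XSD + "decimal")) {
            return new BigDecimal(lexicalForm);
        }
        // Datatypes this sketch does not know about keep their lexical form.
        return lexicalForm;
    }

    public static void main(String[] args) {
        Object plain = interpret("13", null);
        Object typed = interpret("13", XSD + "integer");
        System.out.println(plain.getClass().getSimpleName() + " " + plain);
        System.out.println(typed.getClass().getSimpleName() + " " + typed);
    }
}
```

The same lexical form "13" comes back as a String when no datatype is given and as a BigInteger when the xsd:integer datatype is attached.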

The Context class is used for reification in which a Statement can be assigned a URI and treated as a Resource for the purpose of another Statement, which can then add additional information. We will not be using this in our simple graph so we simply use Context.NULL.

Finally, we add the statements ArrayList into our triple store...

 store.addStatements(statements.toArray(new CompleteStatement[] {}));

... and close the transaction:

 tx.success();
 tx.finish();

The final step of this tutorial is to query our triple store using the Neo4j SPARQL implementation.

Here is where things get a little complicated. The SPARQL package used by Neo4j is the external name.levering.ryan.sparql (the website for SPARQL Engine can be found [here](http://sparql.sourceforge.net/)), so a few things need to happen for the SPARQL Engine to communicate properly with the RdfStore. The communication between the two systems takes place through the MetaModelProxy interface, so we need a class that implements that interface. This class is called MetaModelMockUp. To avoid confusion, we will simply import it and will not discuss the methods within.

As a final pre-query step, we outline how many Resources we have in our triple store. This is stored in a HashMap as a pair (Resource, count). In our graph, we only have 2 Resources (BOB and ALISON).

 Map<String, Integer> counts = new HashMap<String, Integer>();
 counts.put( PERSON, new Integer( 2 ) );
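The tutorial hard-codes the count because the graph is tiny. As a rough sketch of how such a map could be derived instead, the plain-Java helper below counts instances per class by scanning type statements. `InstanceCountSketch` and `countInstances` are hypothetical names, triples are plain { subject, predicate, object } arrays rather than Neo4j Statement objects, and duplicate statements are assumed not to occur.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; not part of the Neo4j RDF or SPARQL packages.
class InstanceCountSketch {
    // Counts, for every class URI, how many subjects are declared to be of that class.
    // Assumes each type statement appears at most once.
    static Map<String, Integer> countInstances(List<String[]> triples, String typePredicate) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String[] t : triples) {
            if (t[1].equals(typePredicate)) {
                Integer current = counts.get(t[2]);
                counts.put(t[2], current == null ? 1 : current + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String type = "http://neo4j.org/type";
        String person = "http://neo4j.org/person";
        List<String[]> triples = Arrays.asList(
            new String[] { "http://neo4j.org/bob", type, person },
            new String[] { "http://neo4j.org/alison", type, person },
            new String[] { "http://neo4j.org/bob", "http://neo4j.org/name", "Bob" });
        System.out.println(countInstances(triples, type));
    }
}
```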

Now we can create the query executor:

 Neo4jSparqlEngine engine = new Neo4jSparqlEngine(
     new VerboseQuadStrategy( new VerboseQuadExecutor( graphDb, indexService, model, null ), model ),
     (MetaModelProxy) new MetaModelMockUp( model, counts ) );

We have already mentioned MetaModelMockUp. The VerboseQuadStrategy is a RepresentationStrategy that determines how the graph is represented to the Neo4jSparqlEngine. When we issue a SPARQL query against our graph, the Neo4jSparqlEngine must map that query onto the triple store and select the components that match it.

We also create the Query object:

 Query query = null;

Suppose we want to get the names and nicknames of all the people in our graph that have a nickname:

 try {
   query = engine.parse( new StringReader("SELECT ?p ?n " +
                                          "WHERE { " +
                                          "?p <http://neo4j.org/type> <http://neo4j.org/person> . " +
                                          "?p <http://neo4j.org/nickname> ?n . }") );
 } catch(ParseException e) {
   System.out.println("Parse Exception: " + e);
 }

We will not go into SPARQL itself here; more information can be found in the W3C SPARQL technical specification.

 RdfBindingSet result = ( ( SelectQuery ) query ).execute( new Neo4jRdfSource() );

Executing the query produces a result set of type RdfBindingSet. It behaves like a table: each row holds pairs of variables and the values bound to them. To read a returned variable's name and value, we can use the getName() and getValue() methods, respectively.

 Iterator it = result.iterator();
 while(it.hasNext()) {
   Neo4jBindingRow row = (Neo4jBindingRow)it.next();
   for(Variable var: row.getVariables()) {
     System.out.println(var.getName() + ", " + row.getValue( var ).toString());
   }
 }

Looking back at the query, the '?p' variable binds to each subject that is of type 'person', and the '?n' variable binds to that subject's nickname literal.

The result is as expected:

 p, http://neo4j.org/alison
 n, Ally
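Only Alison appears because both triple patterns share '?p', so a person without a nickname is dropped. The plain-Java sketch below illustrates that joining behaviour on the tutorial's five statements. It is a toy matcher under simplifying assumptions, not the SPARQL Engine's actual algorithm, and `PatternMatchSketch`, `unify` and `match` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy triple-pattern matcher for illustration; not the real SPARQL Engine.
class PatternMatchSketch {
    static boolean isVariable(String term) {
        return term.startsWith("?");
    }

    // Tries to extend 'binding' so that 'pattern' matches 'triple'; returns null on mismatch.
    static Map<String, String> unify(String[] pattern, String[] triple, Map<String, String> binding) {
        Map<String, String> extended = new LinkedHashMap<String, String>(binding);
        for (int i = 0; i < 3; i++) {
            String term = pattern[i];
            if (isVariable(term)) {
                String bound = extended.get(term);
                if (bound == null) {
                    extended.put(term, triple[i]);
                } else if (!bound.equals(triple[i])) {
                    return null; // variable already bound to a different value
                }
            } else if (!term.equals(triple[i])) {
                return null; // constant term does not match
            }
        }
        return extended;
    }

    // Evaluates the patterns left to right, keeping only bindings that satisfy all of them.
    static List<Map<String, String>> match(List<String[]> patterns, List<String[]> triples) {
        List<Map<String, String>> rows = new ArrayList<Map<String, String>>();
        rows.add(new LinkedHashMap<String, String>());
        for (String[] pattern : patterns) {
            List<Map<String, String>> next = new ArrayList<Map<String, String>>();
            for (Map<String, String> row : rows) {
                for (String[] triple : triples) {
                    Map<String, String> extended = unify(pattern, triple, row);
                    if (extended != null) {
                        next.add(extended);
                    }
                }
            }
            rows = next;
        }
        return rows;
    }

    public static void main(String[] args) {
        String ns = "http://neo4j.org/";
        List<String[]> triples = Arrays.asList(
            new String[] { ns + "bob", ns + "type", ns + "person" },
            new String[] { ns + "alison", ns + "type", ns + "person" },
            new String[] { ns + "bob", ns + "name", "Bob" },
            new String[] { ns + "alison", ns + "name", "Alison" },
            new String[] { ns + "alison", ns + "nickname", "Ally" });
        List<String[]> patterns = Arrays.asList(
            new String[] { "?p", ns + "type", ns + "person" },
            new String[] { "?p", ns + "nickname", "?n" });
        for (Map<String, String> row : match(patterns, triples)) {
            System.out.println(row);
        }
    }
}
```

The first pattern binds '?p' to both Bob and Alison; the second pattern then keeps only the binding for which a nickname statement exists.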

You can find the source files on my GitHub page here.
