Site usage analysis

From Neo4j Wiki

This is a small example that models how users navigate a site, so that their behavior can be analyzed.

Image:SiteUsage.png

Looking at the image, there are users and pages (which can contain links to other pages). When a user navigates the site, the visited pages are stored as a path. By analyzing the data in this model we could answer some interesting questions, such as the ones discussed in the sections below.

The model doesn't show a case where a user navigates to the same page more than once in the same path, but that is possible as well, i.e. any number of relationships (even of the same type) can exist between two nodes.

Over time the number of paths will increase, possibly affecting query performance. One solution is to have an "aggregator" which looks at paths, merges identical ones and increases a counter on the merged path. For example, if 100 different users took an identical path (one or more times) through the site, those paths can be merged into one path node, which additionally has relationships to all those users and a counter for how many times that path has been taken. The number of path nodes then grows with the number of distinct paths rather than with the number of visits, which helps keep queries fast even for very large data sets.
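
The merging step can be illustrated with a minimal in-memory sketch. It is not Neo4j code: recorded paths are assumed to arrive as (user, list of pages) pairs, and the names `MergedPath` and `aggregate_paths` are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MergedPath:
    """Stands in for one merged path node (hypothetical structure)."""
    pages: tuple
    count: int = 0                            # how many times the path was taken
    users: set = field(default_factory=set)   # users related to this path


def aggregate_paths(recorded):
    """Merge identical paths; `recorded` yields (user, [page, ...]) pairs."""
    merged = {}
    for user, pages in recorded:
        key = tuple(pages)                    # identical page sequences share a key
        entry = merged.setdefault(key, MergedPath(pages=key))
        entry.count += 1
        entry.users.add(user)
    return merged


# Made-up sample data: four recorded paths, two distinct page sequences.
recorded = [
    ("alice", ["home", "products", "cart"]),
    ("bob", ["home", "products", "cart"]),
    ("alice", ["home", "products", "cart"]),
    ("carol", ["home", "about"]),
]
merged = aggregate_paths(recorded)
```

In this sample the four recorded paths collapse into two merged paths, and the counter preserves how often each one was taken.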

Which path through my site are users most likely to travel?

Assuming you'd like to find a path that begins where users start their navigation, we could use an algorithm like:

  • Find the page which has the most PAGE_STEP relationships with index 1 connected to it.
  • Traverse one step to the pages linked to that page and see which of those has the most PAGE_STEP relationships with index 2 connected to it.
  • Traverse on until you get to a page which has no links from it or has no PAGE_STEP relationships with the previous index + 1 connected to it.
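
The steps above can be sketched in plain Python, without the Neo4j API. The sketch assumes each PAGE_STEP relationship is given as a (path id, index, page) triple and that page links are given as a dictionary; both representations, and the name `most_travelled_path`, are assumptions made for this illustration. As in the description, the count at each step covers all PAGE_STEP relationships with that index on a linked page, whichever page they were reached from.

```python
from collections import Counter


def most_travelled_path(steps, links):
    """Greedy walk: at each index, move to the linked page with the most steps.

    steps: iterable of (path_id, index, page) triples, one per PAGE_STEP.
    links: dict mapping a page to the pages it links to.
    """
    counts = Counter((index, page) for _path_id, index, page in steps)

    # Step 1: the page with the most PAGE_STEP relationships at index 1.
    first = {page: n for (index, page), n in counts.items() if index == 1}
    if not first:
        return []
    current = max(first, key=first.get)
    route = [current]

    index = 1
    while True:
        index += 1
        # Linked pages that have at least one step with the next index.
        candidates = {
            page: counts[(index, page)]
            for page in links.get(current, ())
            if counts[(index, page)] > 0
        }
        if not candidates:          # no links, or no step with index + 1
            break
        current = max(candidates, key=candidates.get)
        route.append(current)
    return route


# Made-up sample data: four recorded paths over a five-page site.
steps = [
    (1, 1, "home"), (1, 2, "products"), (1, 3, "cart"),
    (2, 1, "home"), (2, 2, "products"), (2, 3, "cart"),
    (3, 1, "home"), (3, 2, "about"),
    (4, 1, "blog"), (4, 2, "home"),
]
links = {
    "home": ["products", "about", "blog"],
    "products": ["cart", "home"],
    "about": ["home"],
    "blog": ["home"],
    "cart": [],
}
route = most_travelled_path(steps, links)
```

Ties between equally popular pages are broken arbitrarily here; a real implementation would need to decide how to handle them.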

To find a well-travelled path in the middle of your site, the algorithm becomes a bit more complicated and a bit slower, but that could be done as well.

How often are users navigating from page X to page Y with a maximum of three steps?

This could be solved with something like:

  • From the node representing page X, traverse all PAGE_STEP relationships to path nodes.
  • For each traversed PAGE_STEP relationship (let's call it "relationship 1"), look at the path node to see if it's connected to the node representing page Y with a PAGE_STEP relationship (called "relationship 2") where the difference in index between "relationship 1" and "relationship 2" is <= 3.
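
A minimal Python sketch of this check follows, again without the Neo4j API. It assumes each PAGE_STEP relationship is a (path id, index, page) triple, and it counts a visit to X only when Y appears later in the same path, i.e. with an index difference from 1 to 3. That direction is an assumption, and the name `count_navigations` is hypothetical.

```python
from collections import defaultdict


def count_navigations(steps, page_x, page_y, max_steps=3):
    """Count visits to page_x followed by page_y within max_steps steps.

    steps: iterable of (path_id, index, page) triples, one per PAGE_STEP.
    """
    # Group the steps by path, mirroring the traversal to each path node.
    by_path = defaultdict(list)
    for path_id, index, page in steps:
        by_path[path_id].append((index, page))

    total = 0
    for visits in by_path.values():
        y_indexes = [i for i, page in visits if page == page_y]
        for i, page in visits:
            # "Relationship 1" is the step on X, "relationship 2" a step on Y.
            if page == page_x and any(0 < j - i <= max_steps for j in y_indexes):
                total += 1
    return total


# Made-up sample data: four recorded paths.
steps = [
    (1, 1, "home"), (1, 2, "products"), (1, 3, "cart"),
    (2, 1, "home"), (2, 2, "a"), (2, 3, "b"), (2, 4, "c"), (2, 5, "cart"),
    (3, 1, "cart"), (3, 2, "home"),
    (4, 1, "home"), (4, 2, "about"), (4, 3, "home"), (4, 4, "cart"),
]
hits = count_navigations(steps, "home", "cart")
```

In the sample data, path 2 is not counted because "cart" is four steps after "home", and path 3 is not counted because "cart" comes before "home".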