I’ve been stretching myself even more than usual lately, and I’ve decided to chronicle some of the things I’m adding to my repertoire.

I’ve always been a database disciple. I’m not an engineer, programmer, developer, or any other “-er” who writes code for a living. But even as a 20-something MBA student in the very early 90s, I recognized the value of data, and incorporated it into my work. I go into the how & why of this in more detail in my post, “Lego Strategy”.

I recently learned about Neo4j, an open source “Graph Database”. Graph databases have gained quite a bit of traction in the past several years; you may have read about them in the flurry of articles about “Facebook’s Social Graph” and the like. But it wasn’t until I started digging in that I realized just how powerful these things can be. While I’m not going to even scratch the surface here, I did want to share a little of the learning process I’m going through. I know it’s going to help me think creatively, and I think it will help you too.

Of course, you can get a much better description of Graph Databases on Wikipedia, and you can even geek out on the subtle distinctions between RDBMS, NoSQL, and all that other alphabet soup. But I’m going to show you a super-simple example I built myself, and let it speak for itself. And again, I’m a beginner with this. This is basically what I learned to do by following Chapter 1 of the tutorial.

I’ll summarize, though, one of the main distinguishing features of a Graph database[1]: whereas “Relational Databases” store data in tables made up of rows and columns, a Graph database thinks in terms of nodes and relationships. I’ll show you what I mean.

The Data

For various reasons that are not really germane to this post, I was interested in continents, countries and languages. So I went out to the web and got two lists:

  • Countries in the world by Continent. I limited this list to United Nations member countries only (with two exceptions, which don’t matter for this example)
  • Countries in the world and the languages they speak.

These lists weren’t really all that pretty, but since I’m not writing a post on ETL, I won’t go into how I hammered them into shape. What I will say is that I loaded them into two different databases:

  1. A MySQL database with the following tables:
    • continents
    • continent_lookup
    • country
    • lang_lookup
    • languages
  2. A Neo4j database with:
    • Nodes for Continents, Countries, and Languages
    • Relationships between countries and the continents on which they’re situated
    • Relationships between languages and the countries by which they’re spoken.[2]
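Here’s a minimal Python sketch of the second design, just to make “nodes and relationships” concrete. It’s plain Python standing in for a graph, not the Neo4j API; the Node and Rel classes, the ids, and the tiny data sample are my own illustrative assumptions. Notice that the many-to-many cases (Belgium speaking two languages, French being spoken in two countries) are simply extra relationships, with no lookup table in sight.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for graph elements (NOT the Neo4j API).
@dataclass(frozen=True)
class Node:
    id: int
    label: str  # "language", "country" or "continent"
    name: str

@dataclass(frozen=True)
class Rel:
    start: Node
    rel_type: str  # "SPOKEN_BY" or "LOCATED_ON"
    end: Node

french = Node(1, "language", "French")
dutch = Node(2, "language", "Dutch")
belgium = Node(3, "country", "Belgium")
canada = Node(4, "country", "Canada")
europe = Node(5, "continent", "Europe")
north_america = Node(6, "continent", "North America")

RELS = [
    Rel(french, "SPOKEN_BY", belgium),
    Rel(dutch, "SPOKEN_BY", belgium),    # one country, two languages
    Rel(french, "SPOKEN_BY", canada),    # one language, two countries
    Rel(belgium, "LOCATED_ON", europe),
    Rel(canada, "LOCATED_ON", north_america),
]

def follow(node, rel_type):
    """Nodes reached by following outgoing `rel_type` relationships from `node`."""
    return [r.end for r in RELS if r.start == node and r.rel_type == rel_type]

def follow_back(node, rel_type):
    """Nodes reached by following incoming `rel_type` relationships into `node`."""
    return [r.start for r in RELS if r.end == node and r.rel_type == rel_type]

print([n.name for n in follow(french, "SPOKEN_BY")])        # ['Belgium', 'Canada']
print([n.name for n in follow_back(belgium, "SPOKEN_BY")])  # ['French', 'Dutch']
```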

Now, if you’re really paying attention, you’ll have noticed a couple of other key differences between the Graph and the MySQL (i.e., relational) data: in the Graph, there’s no such thing as a “NULL”, and you don’t need to think about one-to-one, one-to-many, many-to-many, and all that noise. But we’re not going into that stuff here. It’s extra-credit reading.

The Sample Question

I want to find the answer to a simple question:

What are the countries that speak French, grouped by the continents they’re on?

The MySQL Way

Again, I’m not a database professional. But getting the answer to this simple question was kinda ugly, and I don’t really see a (much) better way to accomplish it:

Query

SELECT C3.`continent`, C1.`country`
FROM country C1

-- link each country to its continent through the lookup table
JOIN `continent_lookup` C2
ON C1.`id` = C2.`country_id`

JOIN continents C3
ON C2.`continent_id` = C3.`id`

-- link each country to its languages through the other lookup table
JOIN lang_lookup L1
ON C1.`id` = L1.`country_id`

JOIN languages L2
ON L1.`language_id` = L2.`id`

WHERE L2.`language` = 'French'
-- UN member countries only
AND C1.`un_joindate` IS NOT NULL
ORDER BY C3.`continent`, C1.`country`;
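If you want to poke at the relational version without a MySQL server, here’s a minimal sketch using Python’s built-in sqlite3 module. The table and column names come from the query above; the column types, the placeholder un_joindate values, the made-up non-member row, and the handful of rows are illustrative assumptions, not my actual data.

```python
import sqlite3

# Hypothetical miniature of the MySQL schema. Table and column names come from
# the query above; the column types and rows are illustrative assumptions.
SETUP = """
CREATE TABLE continents (id INTEGER PRIMARY KEY, continent TEXT);
CREATE TABLE country (id INTEGER PRIMARY KEY, country TEXT, un_joindate TEXT);
CREATE TABLE languages (id INTEGER PRIMARY KEY, language TEXT);
-- Lookup (junction) tables: SQL's way of expressing many-to-many links.
CREATE TABLE continent_lookup (country_id INTEGER, continent_id INTEGER);
CREATE TABLE lang_lookup (country_id INTEGER, language_id INTEGER);

INSERT INTO continents VALUES (1, 'Europe'), (2, 'North America');
-- un_joindate: any non-NULL value marks a UN member; row 3 is a made-up non-member.
INSERT INTO country VALUES (1, 'Belgium', '1945'), (2, 'Canada', '1945'),
                           (3, 'Hypothetical Territory', NULL);
INSERT INTO languages VALUES (1, 'French'), (2, 'Dutch'), (3, 'English');
INSERT INTO continent_lookup VALUES (1, 1), (2, 2), (3, 1);
INSERT INTO lang_lookup VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 1);
"""

# Same five-table join as the MySQL query, with the language as a parameter.
QUERY = """
SELECT C3.continent, C1.country
FROM country C1
JOIN continent_lookup C2 ON C1.id = C2.country_id
JOIN continents C3 ON C2.continent_id = C3.id
JOIN lang_lookup L1 ON C1.id = L1.country_id
JOIN languages L2 ON L1.language_id = L2.id
WHERE L2.language = ? AND C1.un_joindate IS NOT NULL
ORDER BY C3.continent, C1.country
"""

def countries_speaking(conn, language):
    """Return (continent, country) rows for UN members that speak `language`."""
    return conn.execute(QUERY, (language,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(countries_speaking(conn, "French"))
# [('Europe', 'Belgium'), ('North America', 'Canada')]
```

Asking about another language only changes the parameter, e.g. countries_speaking(conn, "English"); the five-table join stays exactly the same.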

Basically, I have to join all three “main” tables, plus the two lookup tables that connect them, with a surprising amount of contortion just to get that silly question answered. And here’s the answer, exactly as MySQL spits it out:

Output

French-speaking countries by Continent
continent | country
Africa | Algeria
Africa | Benin
Africa | Burkina Faso
Africa | Burundi
Africa | Cameroon
Africa | Central African Republic
Africa | Chad
Africa | Comoros
Africa | Cote d'Ivoire
Africa | Democratic Republic of the Congo
Africa | Djibouti
Africa | Egypt
Africa | Equatorial Guinea
Africa | Gabon
Africa | Guinea
Africa | Madagascar
Africa | Mali
Africa | Mauritania
Africa | Mauritius
Africa | Morocco
Africa | Niger
Africa | Rwanda
Africa | Senegal
Africa | Seychelles
Africa | Togo
Africa | Tunisia
Asia | Cambodia
Asia | Laos
Asia | Lebanon
Asia | Syria
Asia | Vietnam
Central America and the Antilles | Dominica
Central America and the Antilles | Grenada
Central America and the Antilles | Haiti
Central America and the Antilles | Saint Lucia
Central America and the Antilles | Saint Vincent and the Grenadines
Central America and the Antilles | Trinidad and Tobago
Europe | Andorra
Europe | Belgium
Europe | France
Europe | Greece
Europe | Italy
Europe | Luxembourg
Europe | Monaco
Europe | Switzerland
North America | Canada
Oceania | Vanuatu
South America | Argentina
South America | Brazil

Not very visual, is it? Well, hang on!

The Neo4j Way

Here’s the Cypher version. Yeah, it took (well, is still taking) some time to learn the syntax and grammar of Cypher (the Neo4j query language), but I think you’ll immediately see the difference:

Query

MATCH (l:language { name: "French" })
-[r1:SPOKEN_BY]->(c1:country)
-[r2:LOCATED_ON]->(c2:continent)
RETURN l,c1,c2

Crazy, right? If you’re a geek, you immediately see the elegance of this.
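For the geeks: that MATCH clause describes a path shape (a language node, a SPOKEN_BY relationship to a country node, then a LOCATED_ON relationship to a continent node), and Neo4j returns one row for every path in the graph that fits it. Here’s a rough pure-Python imitation of that idea. It’s only an illustration, not how Neo4j actually evaluates Cypher, and the edge list is a small hand-picked sample.

```python
# Hypothetical edge list: (start, relationship type, end). Plain names stand in
# for nodes here, which only works because no two nodes in this sample share a name.
EDGES = [
    ("French", "SPOKEN_BY", "France"),
    ("French", "SPOKEN_BY", "Canada"),
    ("French", "SPOKEN_BY", "Senegal"),
    ("French", "SPOKEN_BY", "Switzerland"),
    ("German", "SPOKEN_BY", "Germany"),
    ("German", "SPOKEN_BY", "Switzerland"),
    ("France", "LOCATED_ON", "Europe"),
    ("Germany", "LOCATED_ON", "Europe"),
    ("Switzerland", "LOCATED_ON", "Europe"),
    ("Canada", "LOCATED_ON", "North America"),
    ("Senegal", "LOCATED_ON", "Africa"),
]

def outgoing(node, rel_type):
    """End nodes of every `rel_type` relationship that starts at `node`."""
    return [end for start, kind, end in EDGES if start == node and kind == rel_type]

def match_paths(language):
    """Imitate MATCH (l)-[:SPOKEN_BY]->(c1)-[:LOCATED_ON]->(c2) RETURN l, c1, c2
    for one language: one (l, c1, c2) row per matching path."""
    return [
        (language, country, continent)
        for country in outgoing(language, "SPOKEN_BY")
        for continent in outgoing(country, "LOCATED_ON")
    ]

for row in match_paths("German"):
    print(row)
# ('German', 'Germany', 'Europe')
# ('German', 'Switzerland', 'Europe')
```

The linear scan in outgoing() is just the simplest thing that works for a dozen edges; it says nothing about how Neo4j stores or finds relationships.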

But here’s where the beauty transcends the geeky and gets straight to the stuff that geeks really exist for: turning data into information. When I enter those four lines of code into the Neo4j Browser (Neo4j serves its own web interface, so you can work with it from a regular web browser; I used Safari), here’s what I get:

Output

[Figure: Neo4j Browser graph of French, the countries that speak it, and their continents]

Wow! Now that’s something I can get my head around! You can see, right there, visually, the effect of French colonialism. By contrast, if I plug “German” into that query, here’s the result:

[Figure: the same graph for German]

Amazing, right? You can see right there that Germany had no Louis!

Not to worry, though: under the hood, Neo4j still understands the full context of the results. In other words, it’s not just pretty pictures. It can also output that ugly table, which is actually important if you want to feed the results into anything other than the diagrams above:


French-speaking countries by Continent
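keys/0 | keys/1 | keys/2 | length | _fields/0/identity | _fields/0/labels/0 | _fields/0/properties/name | _fields/0/properties/id | _fields/0/id | _fields/1/identity | _fields/1/labels/0 | _fields/1/properties/name | _fields/1/properties/id | _fields/1/id | _fields/2/identity | _fields/2/labels/0 | _fields/2/properties/continent | _fields/2/properties/id | _fields/2/id | _fieldLookup/l | _fieldLookup/c1 | _fieldLookup/c2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 188 | country | Vietnam | 189 | 188 | 539 | continent | Asia | 1 | 539 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 186 | country | Vanuatu | 187 | 186 | 546 | continent | Oceania | 8 | 546 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 100 | country | Madagascar | 101 | 100 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 98 | country | Luxembourg | 99 | 98 | 540 | continent | Europe | 2 | 540 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 92 | country | Lebanon | 93 | 92 | 539 | continent | Asia | 1 | 539 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 90 | country | Laos | 91 | 90 | 539 | continent | Asia | 1 | 539 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 81 | country | Italy | 82 | 81 | 540 | continent | Europe | 2 | 540 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 71 | country | Haiti | 72 | 71 | 542 | continent | Central America and the Antilles | 4 | 542 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 68 | country | Guinea | 69 | 68 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 66 | country | Grenada | 67 | 66 | 542 | continent | Central America and the Antilles | 4 | 542 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 65 | country | Greece | 66 | 65 | 540 | continent | Europe | 2 | 540 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 60 | country | Gabon | 61 | 60 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 59 | country | France | 60 | 59 | 540 | continent | Europe | 2 | 540 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 53 | country | Equatorial Guinea | 54 | 53 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 51 | country | Egypt | 52 | 51 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 47 | country | Dominica | 48 | 47 | 542 | continent | Central America and the Antilles | 4 | 542 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 46 | country | Djibouti | 47 | 46 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 39 | country | Cote d'Ivoire | 40 | 39 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 174 | country | Trinidad and Tobago | 175 | 174 | 542 | continent | Central America and the Antilles | 4 | 542 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 175 | country | Tunisia | 176 | 175 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 168 | country | Syria | 169 | 168 | 539 | continent | Asia | 1 | 539 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 172 | country | Togo | 173 | 172 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 151 | country | Seychelles | 152 | 151 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 167 | country | Switzerland | 168 | 167 | 540 | continent | Europe | 2 | 540 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 144 | country | Saint Vincent and the Grenadines | 145 | 144 | 542 | continent | Central America and the Antilles | 4 | 542 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 149 | country | Senegal | 150 | 149 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 141 | country | Rwanda | 142 | 141 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 143 | country | Saint Lucia | 144 | 143 | 542 | continent | Central America and the Antilles | 4 | 542 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 115 | country | Morocco | 116 | 115 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 124 | country | Niger | 125 | 124 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 108 | country | Mauritius | 109 | 108 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 112 | country | Monaco | 113 | 112 | 540 | continent | Europe | 2 | 540 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 104 | country | Mali | 105 | 104 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 107 | country | Mauritania | 108 | 107 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 32 | country | Central African Republic | 33 | 32 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 33 | country | Chad | 34 | 33 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 37 | country | Comoros | 38 | 37 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 44 | country | Democratic Republic of the Congo | 45 | 44 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 27 | country | Burundi | 28 | 27 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 28 | country | Cambodia | 29 | 28 | 539 | continent | Asia | 1 | 539 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 29 | country | Cameroon | 30 | 29 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 30 | country | Canada | 31 | 30 | 545 | continent | North America | 7 | 545 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 16 | country | Belgium | 17 | 16 | 540 | continent | Europe | 2 | 540 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 18 | country | Benin | 19 | 18 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 23 | country | Brazil | 24 | 23 | 543 | continent | South America | 5 | 543 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 26 | country | Burkina Faso | 27 | 26 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 2 | country | Algeria | 3 | 2 | 541 | continent | Africa | 3 | 541 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 3 | country | Andorra | 4 | 3 | 540 | continent | Europe | 2 | 540 | 0 | 1 | 2
l | c1 | c2 | 3 | 317 | language | French | 126 | 317 | 6 | country | Argentina | 7 | 6 | 543 | continent | South America | 5 | 543 | 0 | 1 | 2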
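To give one example of feeding those results into something else: here’s a minimal Python sketch that groups rows by continent, which answers the original question in the same shape as the MySQL output. It assumes each row has already been reduced to (language, country, continent) names, mirroring the l, c1 and c2 columns; the six sample rows are copied from the table above.

```python
from collections import defaultdict

# A few rows from the table above, reduced (by assumption) to
# (language, country, continent) names, mirroring the l, c1, c2 columns.
ROWS = [
    ("French", "Vietnam", "Asia"),
    ("French", "Vanuatu", "Oceania"),
    ("French", "Madagascar", "Africa"),
    ("French", "Luxembourg", "Europe"),
    ("French", "Lebanon", "Asia"),
    ("French", "Canada", "North America"),
]

def group_by_continent(rows):
    """Map each continent to a sorted list of its countries."""
    groups = defaultdict(list)
    for _language, country, continent in rows:
        groups[continent].append(country)
    # Sort continents and countries, like the SQL ORDER BY continent, country.
    return {continent: sorted(groups[continent]) for continent in sorted(groups)}

for continent, countries in group_by_continent(ROWS).items():
    print(f"{continent}: {', '.join(countries)}")
# Africa: Madagascar
# Asia: Lebanon, Vietnam
# Europe: Luxembourg
# North America: Canada
# Oceania: Vanuatu
```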

I hope this was as enlightening for you as it was for me!

References

1. nota bene: one of the features. There are other features, and I have only begun to get my head around the implications of these differences.
2. Note that a country can speak more than one language, and, of course, a language can be spoken by more than one country.