This content introduces Cypher, a declarative graph query language for Neo4j, explaining its origins, core concepts like pattern matching, and basic operations for creating, querying, and managing data within a property graph.
Hey everyone. Welcome back to the Intro to Graph Databases Series. This is episode five,
and my name is Ryan Boyd from the Neo4j developer relations team. I'll be your guide today to
teach you an introduction to Cypher, the graph query language. In order to understand the
value in Cypher as a graph query language, it's important for you to understand why we
created Cypher. And that has to do with the history of the Neo4j developer surface. When Neo4j
first started around the year 2000, we had an embeddable Java API. This Java API allowed
you to imperatively traverse a graph, create new relationships and nodes, and that sort
of thing, but was only accessible within Java. Come to around the 1.x series of Neo4j in
2010 and we wanted to extend these capabilities from Java into other clients - other clients
that are acting outside of the database server and they're on the network. And for that,
we created a REST-based API. And this REST-based API was still pretty low level. We had some
different requirements, though, as we expanded Neo4j's adoption and Neo4j became more popular with
different communities. So we really wanted a declarative query language that's readable
and expressive. Very similar to how many developers use SQL when interacting with tables, we wanted
a similar declarative language for graphs. We wanted it to be able to do all CRUD operations.
Create, read, update, delete, all the main operations. Not just be a querying language.
We also wanted to base it on patterns. The core patterns that you're looking for in a
graph. And we wanted to make it really powerful so that you could actually convert people
from using the more imperative languages over to this declarative language. And we wanted
to allow it to be opened up and adopted by other graph technology.
So in order to do this, we invented the next set of developer surface for Neo4j and that
was Cypher over HTTP, the Cypher query language that we're going to review today. And that
Cypher over HTTP library also allowed you to access Neo4j remotely from outside of the
Java environment, but provided this declarative language in order to access it. And then as
we advanced on and did a 3.0, 3.1, 3.2 series of Neo4j, we also launched the Bolt Protocol,
a binary protocol that makes it much more type safe to interact with Neo4j and provides
a series of official language drivers - such as Java, .NET, Python, and JavaScript - and
user-defined procedures and functions so that you could still do those interactions with
the Java API if you want to, but call them from within Cypher. All of the procedures
and functions can be called from within Cypher. And that was very important to us. From an
openness perspective, we did release Cypher under the openCypher Project. openCypher aims
to deliver a full and open specification of the industry's most widely adopted graph database
query language Cypher. You can visit opencypher.org to find out more about the openCypher Project
and other databases such as SAP HANA, which have adopted the Cypher technology.
Now, as I review Cypher, I first want to give you a reminder, a little bit of a recap, on what
property graphs are. A property graph has a concept of a node and a relationship and
properties on both those nodes and relationships. So in the case here, we have Ann, who loves
Dan, Dan who loves Ann back. Ann lives with Dan and Ann drives a car which is owned by
Dan. Dan also drives that same car. So you can see here that it's very easy to read this
graph even without having a full understanding of graph databases, and Cypher, and property
graphs. It's really easy to read what's happening here. And this is an important characteristic
of Neo4j and graph databases: the whiteboard model is the physical model. We try to reduce
the number of translations between the business owner, and the developer, and the underlying
system, which is executing and storing data. So we create the nodes and relationships in
the underlying data store as sort of the nouns and the verbs and create the properties on
the nodes as sort of the adjectives and the properties on the relationships as sort of
the adverbs. So that's the overview or recap of the Property Graph Model.
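
As a rough sketch, the whiteboard graph described above could be written down in Cypher like this. The label names, relationship type names, and the `since` property are assumptions chosen for illustration; the video only describes the picture in words.

```cypher
// Nodes are the nouns; their properties (name) are the adjectives.
CREATE (ann:Person {name: 'Ann'}),
       (dan:Person {name: 'Dan'}),
       (car:Car),
       // Relationships are the verbs.
       (ann)-[:LOVES]->(dan),
       (dan)-[:LOVES]->(ann),
       (ann)-[:LIVES_WITH]->(dan),
       (ann)-[:DRIVES]->(car),
       (dan)-[:DRIVES]->(car),
       // A relationship property is the adverb (hypothetical value).
       (car)-[:OWNED_BY {since: 2011}]->(dan)
```

Each arrow in the statement corresponds to one arrow on the whiteboard, which is the point of the "whiteboard model is the physical model" remark.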
Cypher as a query language is based off of patterns. It's about creating patterns in
a graph, patterns of nodes and relationships, and then it's about finding those patterns
when you're doing your queries. So here is an example of one pattern that you might specify.
It's a fairly complex pattern, but very easy to read, and understand, and even code in
Cypher. So we want to find out who drives a car that is owned by a lover. And in this
case, we just write it out. Match a person who drives a car which is owned by another
person and the original person loves that other person. It's a really simple query to
read and write, even though the question is a little bit more complex. Now, patterns in
Cypher use ASCII-Art. And for those of you who weren't around back in the day, ASCII-Art
is basically using the keys on the keyboard in order to generate graphics. And in this
case, ASCII-Art for nodes means that you're using parentheses to surround nodes. And you
can either just use a blank set of parentheses if you'll never need to refer to that node
again after that part in the query, or you can specify an alias inside the node, such
as P here, in order to refer to that node later in the query. There are also labels
or tags on nodes. These allow you to group nodes together by roles and types.
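
The "who drives a car that is owned by a lover" question could be sketched as the query below. The labels `Person` and `Car` and the relationship types `DRIVES`, `OWNED_BY`, and `LOVES` are assumed names, since the transcript does not spell them out.

```cypher
// p1 drives a car, the car is owned by p2, and p1 loves p2.
// Reusing the alias p1 at the end ties the pattern back to the same node.
MATCH (p1:Person)-[:DRIVES]->(c:Car)-[:OWNED_BY]->(p2:Person)<-[:LOVES]-(p1)
RETURN p1
```

The parentheses are the nodes and the arrows are the relationships, so the query reads almost word for word like the spoken sentence.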
So in the case here of a person, a person is also a mammal, so this person has a second
label that is a mammal, and we're going to still refer to that person as P as the alias.
Now, nodes can also have properties. So, for instance, you might want to set the name on
a person. In this case here, we're setting the name on a person as the string value Veronica.
These properties can have a wide variety of different types of values, including a lot
of the basic Java types and arrays of the basic types. Now, ASCII-Art for relationships.
Well, relationships are wrapped with hyphens or square brackets. So you can see
here. Let's say you were trying to talk about the hired relationships. So Joe hired John.
You can see here that we could either specify the relationship HIRED with an alias h to
refer to that relationship later in the query, or we can just say that there is a relationship
without specifying the type and without specifying an alias. The direction of the relationship
is specified with less than and greater than symbols. So you could see here person one,
let's say Kate, hired person two, let's say John, or vice versa. And in the vice versa
case, we have Kate was hired by John or John hired Kate. But it's just showing the opposite
direction with the less than symbol instead of the greater than symbol.
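
The node and relationship notation described so far might look like the following. Each pattern is wrapped in a small `MATCH ... RETURN` so it can be run on its own; the labels and the `HIRED` type are assumed names taken from the spoken description.

```cypher
// Anonymous node: empty parentheses, never referred to again.
MATCH () RETURN count(*);

// Node with the alias p and two labels.
MATCH (p:Person:Mammal) RETURN p;

// Node with a property, written in the JSON-like map syntax.
MATCH (p:Person {name: 'Veronica'}) RETURN p;

// Relationship with alias h and type HIRED: p1 hired p2.
MATCH (p1:Person)-[h:HIRED]->(p2:Person) RETURN p1, h, p2;

// Relationship with no alias and no type: any relationship from p1 to p2.
MATCH (p1:Person)-->(p2:Person) RETURN p1, p2;

// Opposite direction: p1 was hired by p2.
MATCH (p1:Person)<-[:HIRED]-(p2:Person) RETURN p1, p2;
```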
Relationships can also have properties which can be specified using the JSON-like
syntax here. In this case, specifying that that person was hired with a type of full-time employee.
Now I've mentioned the word aliases a few times here, and I want to reemphasize this.
So the H in hired here, the P-one, the P-two, these all represent references. So you're
defining references or aliases such that, later in the query, you can access those references.
So in this case here, if these were in a MATCH statement, we might want to return the person
back as a response to the query. Or if these were in a MATCH statement, maybe
we want to use that person that we found, and we want to add additional properties
or delete the person, or something along those lines. So these are simply aliases that make
it easier to access the matched nodes or relationships later in the query.
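
A small sketch of aliases being reused later in a query. The `type` value `'fulltime'` and the `employed` property are hypothetical, picked only to show the mechanics.

```cypher
// h, p1, and p2 are aliases bound by the MATCH pattern.
MATCH (p1:Person)-[h:HIRED {type: 'fulltime'}]->(p2:Person)
// The same aliases are then used to modify and return what was matched.
SET p2.employed = true
RETURN p1.name, h.type, p2.name
```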
Now let's give you a basic create statement as well as a basic query statement. These
are very simple examples here. Later on, we'll get more and more complex, and we'll give
you more complex query operations as well. To create data inside the graph, you simply
specify it very much the same way as you would if you were trying to query for that data.
So if we wanted to create two nodes and a relationship between the nodes, in this case,
we wanted to create two nodes with Person as the label, and have name properties on each of
them, and the LOVES relationship in between, we simply use the CREATE statement in Cypher
and we say, "CREATE a person, brace, and that JSON-like syntax, name Ann, loves another
person, brace, and the JSON syntax, name Dan." And that's all there really is to it. This
is all that it takes to create these two nodes and the corresponding relationship and the
properties on the node in the graph. Now let's say we wanted to run a pretty basic
query and say, "Okay, we know Ann loves someone. Who does Ann love? Or whom does Ann love?"
And that query is actually quite simple here. We just say, "Find me a person named Ann who
loves another person." And then you can see how we're using the alias here called
op to return that other person back as a result of the query. In this case, we're returning
the node and we get Dan, of course, so Ann loves Dan. And this is a really simple
example here, but it does show you the power of Cypher.
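
Written out, the create statement and the query read aloud above would look roughly like this. The `Person` label, the `name` property, and the `op` alias come from the narration; the `LOVES` spelling of the relationship type is an assumption.

```cypher
// Create two Person nodes and the LOVES relationship between them.
CREATE (:Person {name: 'Ann'})-[:LOVES]->(:Person {name: 'Dan'});

// Whom does Ann love? op ("other person") is the alias that gets returned.
MATCH (p:Person {name: 'Ann'})-[:LOVES]->(op:Person)
RETURN op;
```

The create pattern and the match pattern have the same shape, which is why creating data looks so similar to querying it.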
What happens if we want to add more properties to our graph? Well, let's find Ann's car.
We're going to find a person whose name is Ann. And you can see here, I use single quotes
here when specifying Ann. You can use either single quotes or double quotes. And let's
find a person whose name is Ann who drives a car, and let's figure out the car that Ann
drives. We can actually do this in another way. The first example was using the JSON-like
syntax for specifying the name of the person that we're searching for. In this case, we're
actually just saying that the pattern that we're searching for is a person who drives
a car, and then we're restricting the traversal by specifying the name as Ann. Both of these
queries should really have the same performance, but different people find one more readable
than the other. So take those two different options and understand that
they mean the same thing. But in this case, we're returning the car that Ann drives.
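
The two equivalent ways of finding Ann's car might be sketched as follows, assuming a `Car` label and a `DRIVES` relationship type.

```cypher
// Option 1: the name is given inline with the JSON-like map syntax.
MATCH (p:Person {name: 'Ann'})-[:DRIVES]->(c:Car)
RETURN c;

// Option 2: the pattern is generic and a WHERE clause restricts it.
MATCH (p:Person)-[:DRIVES]->(c:Car)
WHERE p.name = 'Ann'
RETURN c;
```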
Now, let's say we wanted to add greater description to that car and we wanted to indicate the
brand and the model of the car. We can easily do that with the SET operation. So the SET
syntax in Cypher allows us to set additional properties on the node that we found. So it's
the same match statement. We're trying to find in the graph a person who drives a car,
where the person's name is Ann. And we're also trying to return that car. But before
we return it, we're going to set those two additional properties, the brand and the model,
and then the returned car will have those properties set. One important aspect of dealing
with graphs is dealing with the integrity of the graph, the integrity of the data, and
Neo4j really focuses on being a transactional database, an OLTP database with ACID compliance. We're
all about making sure that the data that you set and the data that you return is all predictable.
And one of the aspects of that is ensuring uniqueness in the graph. We don't want you
to have a bunch of different Anns in your graph. Let's say that your graph looked like
this. How would you know how to differentiate one Ann from another Ann? That would be very
difficult. So if we're going to assume here that the name Ann is unique amongst the population
in our graph, we can actually ensure that, that there can only be one Ann.
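
Going back to the SET step described a moment ago, a minimal sketch could look like this. The brand and model values are placeholders, not values confirmed by the transcript.

```cypher
// Same MATCH as before: find the car that Ann drives.
MATCH (p:Person)-[:DRIVES]->(c:Car)
WHERE p.name = 'Ann'
// Add two properties to the matched car before returning it.
SET c.brand = 'Volvo', c.model = 'V70'
RETURN c
```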
We can do that with constraints. In this case here, we're going to say create constraint
on the person-labeled nodes, assert that the person's name is unique. It's a very simple
constraint but it's very powerful to prevent us from adding multiple Anns. So let's say
we did try to add another Ann. If we try to add another Ann by creating another person-labeled
node with the name of Ann, we would actually get an error that looks something like this.
Constraint validation failed, indicating that there's another node in our graph already
with the same label and the same property, and that violates our constraint that we set. Of
course, this is all fine and dandy, but we don't want to actually experience this error.
We want to actually be able to ensure uniqueness at the time that we create our nodes and relationships.
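
The uniqueness constraint described above, in the syntax that was current for the Neo4j versions discussed in this video, might be written as below. Newer Neo4j releases use a different form, noted in the comment; check the manual for the version you run.

```cypher
// Older syntax, matching "create constraint on ... assert ... is unique".
// Newer Neo4j versions instead use:
//   CREATE CONSTRAINT FOR (p:Person) REQUIRE p.name IS UNIQUE
CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE;

// If a Person named Ann already exists, this statement is expected to be
// rejected with a constraint validation error.
CREATE (:Person {name: 'Ann'});
```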
So let's say that we want Ann to have a pet dog. So we want to say Ann has a pet named
Sam. If we did something like this, create a person named Ann, has a pet dog named Sam,
we'd actually experience the same error as we talked about before, a constraint validation
error, because Ann already exists in our database. So instead of using two CREATE statements
here, we can use a MERGE statement and say MERGE a person named Ann. Basically,
what this does is it looks in the graph for a person named Ann and operates on that person
node. If it does not exist, it will create it. And that way, when you execute the next
step in terms of creating the relationships, you can create that dog attached to either
the existing person or a new person. In this case here, what we're doing is saying,
"Let me find an existing person in the graph whose name is Ann. And if I do not find that,
then when I create a new person with the name of Ann, set the Twitter property to Ann's
Twitter handle." And then you'll notice the last statement here has actually changed the
CREATE statement for the pet to a MERGE statement. And what this will do is create a
HAS_PET relationship from Ann to a dog named Sam if that exact pattern doesn't already
exist in the graph, and only if that exact pattern does not already exist in the graph.
So this will not ever result in Ann having two pets, both as dogs named Sam.
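
Put together, the MERGE-based version described above could be sketched like this. The `Dog` label, the `HAS_PET` type, and the `'@ann'` handle are assumed names and placeholder values.

```cypher
// Find the Person named Ann, or create her if she does not exist.
MERGE (a:Person {name: 'Ann'})
  // Only runs when the node had to be created.
  ON CREATE SET a.twitter = '@ann'
// Create the HAS_PET relationship and the dog only if this exact
// pattern is not already attached to Ann.
MERGE (a)-[:HAS_PET]->(:Dog {name: 'Sam'})
```

Running the statement a second time should leave the graph unchanged, since both MERGE clauses find what they are looking for.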
All right. So we have created our graph, and our graph has our Ann in it, has Sam in it,
saying Ann has that pet Sam. That's a very basic introduction to Cypher and some of the create operations,
a little bit of querying and a little bit of modifying operations. And the next video
in the series-- okay. I probably shouldn't say that again considering the delay between the
last couple of videos. But we will teach you, either through the next video or through our
documentation tutorials in the Neo4j sandbox, how you can do many more complicated queries
on your graph. You can also learn through our Neo4j online training at neo4j.com/graphacademy,
or through classroom training which is accessible there as well. So thank you very much and
I hope you have a fantastic day. And feel free to reach out if you have any questions.