JSONPath in Python - Examples, Syntax and Tutorial
What is JSONPath?
JSONPath is a query language for selecting and filtering elements of JSON structures. It's inspired by XPath, a query language used for selecting elements of XML documents.
Based on the proposal by Stefan Goessner, JSONPath comes implemented in libraries for many high level programming languages. Python is no exception, with several libraries available.
JSONPath Python Libraries
There's more than one JSONPath packages for Python.
pythonpath-ng
is the most feature-complete. It combines capabilities of pythonpath-rw
and pythonpath-rw-ext
with ability to update or remove nodes.
It's also the most popular JSONPath package on pypi.org
, so we'll use it in our examples below.
You can install the library using pip
:
pip install jsonpath-ng
JSONPath Syntax
Here's a quick reference of JSONPath syntax.
If you're just getting started with JSONPath, you should probably just glance over it and jump straight to examples. You can always return to this table for reference.
If you're familiar with XPath, you'll notice that the syntax and the results are quite similar.
Expression | Example | Result |
---|---|---|
$ |
$ |
Selects the root object |
.property or ['property'] |
$.movies or $['movies'] |
Returns a child element or property by name |
* |
$.movies[*] or $.movies[0].* |
Wildcard. .* returns all fields of an element, [*] selects all members of an array |
.. |
$..year |
Recursive descent - return all values of the given property in the structure. Here: returns all years from all movies. |
[index] |
$.movies[0] |
Returns the child element at index |
[0,1] |
$.movies[0].cast[0,1] |
Returns the first and second child elements |
[start:end] |
$.movies[0].cast[:2] |
Similar to Python list slicing syntax. Return child elements at positions start through end |
@ |
See below | Reference to current object in filtering expressions |
[?(filter)] |
$.movies[?(@.year < 1990)] |
Apply a filter to selected element. Here: Returns all movies where year < 1990 |
Using JSONPath in Python
In order to query a json with JSONPath, we'll first need a json file. Let's use a simple json structure with data about several movies:
{
"movies": [
{
"title": "Reservoir Dogs",
"director": "Quentin Tarantino",
"year": 1992,
"cast": [ "Harvey Keitel", "Tim Roth", "Michael Madsen", "Chris Penn" ]
},
{
"title": "Pulp Fiction",
"director": "Quentin Tarantino",
"year": 1994,
"cast": [ "John Travolta", "Uma Thurman", "Samuel L. Jackson", "Bruce Willis" ]
},
{
"title": "Jackie Brown",
"director": "Quentin Tarantino",
"year": 1997,
"cast": [ "Pam Grier", "Samuel L. Jackson", "Robert Forster", "Bridget Fonda", "Michael Keaton", "Robert De Niro" ]
},
{
"title": "Kill Bill: Vol. 1",
"director": "Quentin Tarantino",
"year": 2003,
"cast": [ "Uma Thurman", "David Carradine", "Daryl Hannah", "Michael Madsen", "Lucy Liu", "Vivica A. Fox" ]
},
{
"title": "Kill Bill: Vol. 2",
"director": "Quentin Tarantino",
"year": 2004,
"cast": [ "Uma Thurman", "David Carradine", "Daryl Hannah", "Michael Madsen", "Vivica A. Fox" ]
},
{
"title": "Taxi Driver",
"director": "Martin Scorsese",
"year": 1976,
"cast": [ "Robert De Niro", "Jodie Foster", "Cybill Schepherd" ]
},
{
"title": "Goodfellas",
"director": "Martin Scorsese",
"year": 1990,
"cast": [ "Robert De Niro", "Ray Liotta", "Joe Pesci" ]
},
{
"title": "The Age of Innocence",
"director": "Martin Scorsese",
"year": 1993,
"cast": [ "Daniel Day-Lewis", "Michelle Pfeiffer", "Winona Ryder" ]
},
{
"title": "Mean Streets",
"director": "Martin Scorsese",
"year": 1973,
"cast": [ "Robert De Niro", "Harvey Keitel", "David Proval" ]
}
]
}
If you want to follow along, you might want to save this into a JSON file movies.json
. We'll load the file into Python and answer some queries using JSONPath.
Let's cut straight to an example:
import json
from jsonpath_ng.ext import parse
with open('./movies.json') as movies_json:
movies = json.load(movies_json)
jsonpath_expression = parse("$.movies[?(@.cast[:] =~ 'De Niro')].title")
for match in jsonpath_expression.find(movies):
print(match.value)
Run it and you'll get the following output:
Jackie Brown
Taxi Driver
Goodfellas
Mean Streets
This example prints titles of all movies from our movies.json
starring Robert De Niro. How it works?
First, we use json
module from the Python Standard Library to load our little movie database from ./movies.json
.
When using jsonpath_ng
we'll first parse our query using a parser.
from jsonpath_ng.ext import parse
...
jsonpath_expression = parse("$.movies[?(@.cast[:] =~ 'De Niro')].title")
In this case, we used the parse
method from the jsonpath_ng.ext
module which provides a more powerful parser than jsonpath_ng.parse
. Specifically, we used it because we wanted to use a regular expression filter, but more on that later.
Nothing much happens when we run the line above. In order to actually execute our JSONPath query on the json object, we called the find
method on our jsonpath_expression
instance. It will return an array of match objects, or an empty array if no matches were found.
In order to get the actual matched element from our original json, we'll access the .value
on match objects. So, in order to print all the matched movie titles, we iterate through matches and print the corresponding values:
for match in jsonpath_expression.find(movies):
print(match.value)
JSONPath Examples
Let's go through some more examples. We'll start with easy and get to more convoluted ones.
Select all movie titles
jsonpath_expression = parse("$.movies[*].title")
Reservoir Dogs
Pulp Fiction
Jackie Brown
Kill Bill: Vol. 1
Kill Bill: Vol. 2
Taxi Driver
Goodfellas
The Age of Innocence
Mean Streets
Here we used $.movies[*]
to select all elements of $.movies
field. Then .title
selects the title field of all returned movies
elements.
Select title of the first movie
jsonpath_expression = parse("$.movies[0].title")
Reservoir Dogs
$.movies[0]
selects the first movie by using [index]
syntax.
Select all the years
jsonpath_expression = parse("$..year")
1992
1994
1997
2003
2004
1976
1990
1993
1973
..
is the recursive descent operator. It searches through all children of an element. Here we're looking for year
property of all sub-elements of the root $
. Hence $..year
.
Select movies filmed before 1990
jsonpath_expression = parse("$.movies[?(@.year < 1990)]")
{'title': 'Taxi Driver', 'director': 'Martin Scorsese', 'year': 1976, 'cast': ['Robert De Niro', 'Jodie Foster', 'Cybill Schepherd']}
{'title': 'Mean Streets', 'director': 'Martin Scorsese', 'year': 1973, 'cast': ['Robert De Niro', 'Harvey Keitel', 'David Proval']}
We used the filter syntax: [?(filter_expression)]
.
Inside the filter expression, @
refers to each element of $.movies
. Hence, we're saying: select all elements of $.movies
such that for each movie year < 1990
.
Note that this returns entire movie
nodes. We can also select a specific field for the filtered nodes - see the next example.
Select titles of movies filmed before 1990
jsonpath_expression = parse("$.movies[?(@.year < 1990)].title")
Taxi Driver
Mean Streets
Select movies starring Uma Thurman AND Samuel L. Jackson
jsonpath_expression = parse(
"$.movies[?(@.cast[*] =~ 'Uma Thurman' & @.cast[*] =~ 'Samuel L\. Jackson')]"
)
{'title': 'Pulp Fiction', 'director': 'Quentin Tarantino', 'year': 1994, 'cast': ['John Travolta', 'Uma Thurman', 'Samuel L. Jackson', 'Bruce Willis']}
Here we're using the regular expression match operator =~
together with logical AND &
, to combine the two regex conditions.
Related articles
Code Review Best Practices - Lessons from the Trenches
Python Real World Applications - How Companies such as Netflix, Instagram and Airbnb Use Python