What is JSONPath?

JSONPath is a query language for selecting and filtering elements of JSON structures. It is inspired by, and very similar to, XPath, the query language used for selecting elements of XML documents.

Based on the proposal by Stefan Goessner, JSONPath is implemented in libraries for many high-level programming languages. Python is no exception, with several libraries available.

JSONPath Python Libraries

There is more than one JSONPath package for Python. jsonpath-ng is the most feature-complete: it combines the capabilities of jsonpath-rw and jsonpath-rw-ext with the ability to update or remove nodes.

jsonpath-ng is also the most popular JSONPath package on pypi.org, so we'll use it in the examples below.

You can install the library using pip:

pip install jsonpath-ng

JSONPath Syntax

Here's a quick reference of JSONPath syntax.

If you're just getting started with JSONPath, you should probably just glance over it and jump straight to examples. You can always return to this table for reference.

If you're familiar with XPath, you'll notice that the syntax and the results are quite similar.

Expression | Example | Result
--- | --- | ---
$ | $ | Selects the root object
.property or ['property'] | $.movies or $['movies'] | Returns a child element or property by name
* | $.movies[*] or $.movies[0].* | Wildcard. .* returns all fields of an element, [*] selects all members of an array
.. | $..year | Recursive descent: returns all values of the given property anywhere in the structure. Here: the years of all movies
[index] | $.movies[0] | Returns the child element at index
[0,1] | $.movies[0].cast[0,1] | Returns the first and second child elements
[start:end] | $.movies[0].cast[:2] | Similar to Python list slicing. Returns child elements from position start up to, but not including, end
@ | See below | Reference to the current object in filter expressions
[?(filter)] | $.movies[?(@.year < 1990)] | Applies a filter to the selected elements. Here: returns all movies where year < 1990

Using JSONPath in Python

To query JSON with JSONPath, we first need some JSON data. Let's use a simple structure with data about several movies:

{
	"movies": [
		{
			"title": "Reservoir Dogs",
			"director": "Quentin Tarantino",
			"year": 1992,
			"cast": [ "Harvey Keitel", "Tim Roth", "Michael Madsen", "Chris Penn" ]
		},
		{
			"title": "Pulp Fiction",
			"director": "Quentin Tarantino",
			"year": 1994,
			"cast": [ "John Travolta", "Uma Thurman", "Samuel L. Jackson", "Bruce Willis" ]
		},
		{
			"title": "Jackie Brown",
			"director": "Quentin Tarantino",
			"year": 1997,
			"cast": [ "Pam Grier", "Samuel L. Jackson", "Robert Forster", "Bridget Fonda", "Michael Keaton", "Robert De Niro" ]
		},
		{
			"title": "Kill Bill: Vol. 1",
			"director": "Quentin Tarantino",
			"year": 2003,
			"cast": [ "Uma Thurman", "David Carradine", "Daryl Hannah", "Michael Madsen", "Lucy Liu", "Vivica A. Fox" ]
		},
		{
			"title": "Kill Bill: Vol. 2",
			"director": "Quentin Tarantino",
			"year": 2004,
			"cast": [ "Uma Thurman", "David Carradine", "Daryl Hannah", "Michael Madsen", "Vivica A. Fox" ]
		},
		{
			"title": "Taxi Driver",
			"director": "Martin Scorsese",
			"year": 1976,
			"cast": [ "Robert De Niro", "Jodie Foster", "Cybill Schepherd" ]
		},
		{
			"title": "Goodfellas",
			"director": "Martin Scorsese",
			"year": 1990,
			"cast": [ "Robert De Niro", "Ray Liotta", "Joe Pesci" ]
		},
		{
			"title": "The Age of Innocence",
			"director": "Martin Scorsese",
			"year": 1993,
			"cast": [ "Daniel Day-Lewis", "Michelle Pfeiffer", "Winona Ryder" ]
		},
		{
			"title": "Mean Streets",
			"director": "Martin Scorsese",
			"year": 1973,
			"cast": [ "Robert De Niro", "Harvey Keitel", "David Proval" ]
		}
	]
}

If you want to follow along, save this as movies.json. We'll load the file into Python and answer some queries using JSONPath.

Let's cut straight to an example:

import json
from jsonpath_ng.ext import parse

with open('./movies.json') as movies_json:
	movies = json.load(movies_json)

jsonpath_expression = parse("$.movies[?(@.cast[:] =~ 'De Niro')].title")

for match in jsonpath_expression.find(movies):
	print(match.value)
    

Run it and you'll get the following output:

Jackie Brown
Taxi Driver
Goodfellas
Mean Streets

This example prints the titles of all movies in movies.json starring Robert De Niro. How does it work?

First, we use the json module from the Python standard library to load our little movie database from ./movies.json.

With jsonpath_ng, we first parse the query into an expression object:

from jsonpath_ng.ext import parse

...

jsonpath_expression = parse("$.movies[?(@.cast[:] =~ 'De Niro')].title")

In this case, we used the parse function from the jsonpath_ng.ext module, which provides a more powerful parser than jsonpath_ng.parse. Specifically, we needed it for the regular expression filter, but more on that later.

Parsing alone doesn't run the query. To execute the JSONPath query against our data, we call the find method on the jsonpath_expression instance. It returns a list of match objects, or an empty list if nothing matched.

To get the matched element from the original JSON, we read the .value attribute of each match object. So, to print all the matched movie titles, we iterate over the matches and print their values:

for match in jsonpath_expression.find(movies):
	print(match.value)

JSONPath Examples

Let's go through some more examples. We'll start with the easy ones and work up to more involved ones.

Select all movie titles

jsonpath_expression = parse("$.movies[*].title")

Reservoir Dogs
Pulp Fiction
Jackie Brown
Kill Bill: Vol. 1
Kill Bill: Vol. 2
Taxi Driver
Goodfellas
The Age of Innocence
Mean Streets

Here we used $.movies[*] to select all elements of the $.movies array. Then .title selects the title field of each returned movie.

Select title of the first movie

jsonpath_expression = parse("$.movies[0].title")

Reservoir Dogs

$.movies[0] selects the first movie by using [index] syntax.

Select all the years

jsonpath_expression = parse("$..year")

1992
1994
1997
2003
2004
1976
1990
1993
1973

.. is the recursive descent operator. It searches through all descendants of an element, at any depth. Here we're looking for the year property anywhere below the root $, hence $..year.

Select movies filmed before 1990

jsonpath_expression = parse("$.movies[?(@.year < 1990)]")

{'title': 'Taxi Driver', 'director': 'Martin Scorsese', 'year': 1976, 'cast': ['Robert De Niro', 'Jodie Foster', 'Cybill Schepherd']}
{'title': 'Mean Streets', 'director': 'Martin Scorsese', 'year': 1973, 'cast': ['Robert De Niro', 'Harvey Keitel', 'David Proval']}

We used the filter syntax: [?(filter_expression)].

Inside the filter expression, @ refers to the element currently being tested, here each element of $.movies in turn. In other words: select every element of $.movies whose year is less than 1990.

Note that this returns entire movie nodes. We can also select a specific field for the filtered nodes - see the next example.

Select titles of movies filmed before 1990

jsonpath_expression = parse("$.movies[?(@.year < 1990)].title")

Taxi Driver
Mean Streets
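
For comparison, the filter reads much like a list comprehension. Here is a minimal plain-Python sketch of the same selection on a three-movie excerpt of the data. It also shows that the comparison is strict, so Goodfellas (1990) is excluded.

```python
# Three-movie excerpt of the article's data, in the original order.
data = {
    "movies": [
        {"title": "Taxi Driver", "year": 1976},
        {"title": "Goodfellas", "year": 1990},
        {"title": "Mean Streets", "year": 1973},
    ]
}

# Equivalent of $.movies[?(@.year < 1990)].title:
# "movie" plays the role of @, the condition is the filter,
# and movie["title"] corresponds to the trailing .title.
titles_before_1990 = [
    movie["title"] for movie in data["movies"] if movie["year"] < 1990
]

print(titles_before_1990)
```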

Select movies starring Uma Thurman AND Samuel L. Jackson

jsonpath_expression = parse(
	r"$.movies[?(@.cast[*] =~ 'Uma Thurman' & @.cast[*] =~ 'Samuel L\. Jackson')]"
)

{'title': 'Pulp Fiction', 'director': 'Quentin Tarantino', 'year': 1994, 'cast': ['John Travolta', 'Uma Thurman', 'Samuel L. Jackson', 'Bruce Willis']}

Here we're using the regular expression match operator =~ together with the logical AND operator & to combine the two regex conditions.
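
To see what the combined filter is doing, here is a rough plain-Python equivalent on a three-movie excerpt. It assumes that =~ behaves like Python's re.search, that is, the pattern may match anywhere in the string. The helper cast_matches is a name introduced for this sketch.

```python
import re

# Three-movie excerpt of the article's data.
movies = [
    {"title": "Pulp Fiction",
     "cast": ["John Travolta", "Uma Thurman", "Samuel L. Jackson", "Bruce Willis"]},
    {"title": "Jackie Brown",
     "cast": ["Pam Grier", "Samuel L. Jackson", "Robert Forster"]},
    {"title": "Kill Bill: Vol. 1",
     "cast": ["Uma Thurman", "David Carradine", "Daryl Hannah"]},
]


def cast_matches(movie, pattern):
    """Hypothetical helper: True if any cast member matches the regex."""
    return any(re.search(pattern, name) for name in movie["cast"])


# Both conditions must hold for the same movie, mirroring the & operator.
with_both = [
    movie["title"]
    for movie in movies
    if cast_matches(movie, "Uma Thurman")
    and cast_matches(movie, r"Samuel L\. Jackson")
]

print(with_both)
```

The raw string keeps the backslash in the pattern intact, so the dot is matched literally rather than as "any character".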

