How to extract data from JSON files in bash/shell

JSON is everywhere these days, and perhaps, like me, you find yourself writing shell scripts that need to pull a value out of a JSON file.

jq is a command-line utility that makes this straightforward. It's a popular tool that's supported on pretty much every Unix platform and comes prepackaged on some OSs (not on OS X, unfortunately, but installing it with Homebrew is trivial: brew install jq).
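If a script depends on jq, it can be worth failing early with a readable message when jq isn't installed. Here is a minimal sketch using the POSIX `command -v` builtin. `require_jq` is a hypothetical helper name, not something jq itself provides:

```shell
# Hypothetical helper: returns non-zero (with a message on stderr) if jq is missing.
require_jq() {
  if ! command -v jq >/dev/null 2>&1; then
    echo "error: jq is required (on OS X: brew install jq)" >&2
    return 1
  fi
}

# Stop the script right away if jq can't be found on the PATH.
require_jq || exit 1
echo "jq found: $(jq --version)"
```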

Of course, there are other ways to pull data out of JSON in the shell. You could use sed or awk, but most of those solutions tend to be a little on the ugly side, because those tools aren't designed with JSON in mind.
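To illustrate, here is a small example with a made-up JSON object whose title contains escaped quotes. A naive sed pattern that grabs everything between `"title":"` and the next quote cuts the value short, while jq actually parses the JSON and decodes the escapes:

```shell
# A made-up JSON object whose title contains escaped quotes
json='{"title":"Say \"hi\"","by":"me"}'

# Naive sed: capture everything between "title":" and the next double quote.
# It stops at the escaped quote, so it prints only: Say \ (a stray backslash, no hi)
printf '%s\n' "$json" | sed -n 's/.*"title":"\([^"]*\)".*/\1/p'

# jq parses the JSON and decodes the escapes, printing the full title
printf '%s\n' "$json" | jq -r .title
```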

To give you a taste of jq, let's write a shell script that grabs the top 10 Hacker News stories and displays the title, author and URL of each.

#!/usr/bin/env bash
# Grab the IDs of the top 10 stories, one per line
TOP_10_STORIES=$(curl -s https://hacker-news.firebaseio.com/v0/topstories.json | jq -r '.[0:10] | .[]')
for i in $TOP_10_STORIES
do
  # Fetch each item once and reuse the JSON for every property
  STORY=$(curl -s "https://hacker-news.firebaseio.com/v0/item/$i.json")
  echo "Title: $(echo "$STORY" | jq -r .title)"
  echo "User: $(echo "$STORY" | jq -r .by)"
  echo "URL: $(echo "$STORY" | jq -r .url)"
  echo "------------"
done

That's it! Pretty short, huh? OK, let's break this down.

We start by curling the Hacker News topstories endpoint, which returns a JSON array of story IDs that looks something like this:

[
19801708,
19805306,
19806188,
19810163,
19803783,
19802914,
// ...
]

First we hand this blob of JSON off to jq by piping it (with |) into our jq command, which looks like this: jq -r '.[0:10] | .[]'.

  • -r (short for --raw-output) tells jq to print strings without their surrounding quotation marks, giving us the raw value.

  • .[0:10] is a slice filter: it takes the elements from index 0 up to (but not including) index 10, i.e. the first 10 elements of the array. The result looks like this:

[
19801708,
19805306,
19806188,
19810163,
19803783,
19802914,
19810901,
19804922,
19785500,
19802137
]

The problem is that jq doesn't strip out the square brackets: the slice is still a JSON array, so if we tried to iterate over this output we would end up looping over the brackets (and the commas) as well. To "unwrap" the values, we pipe the output of the slice to another filter, .[], which emits each element of the array on its own line.
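You can see what each piece does on a small literal array (a stand-in for the topstories response, not live data). The comments show what each command prints:

```shell
# A short literal array standing in for the topstories response
ids='[19801708,19805306,19806188,19810163,19803783]'

# The slice on its own is still one JSON array (-c prints it compactly):
printf '%s\n' "$ids" | jq -c '.[0:3]'
# [19801708,19805306,19806188]

# Piping the slice into .[] emits each element as a separate value:
printf '%s\n' "$ids" | jq '.[0:3] | .[]'
# 19801708
# 19805306
# 19806188

# -r only changes how strings are printed; numbers look the same either way:
printf '%s\n' '["alpha","beta"]' | jq '.[]'
# "alpha"
# "beta"
printf '%s\n' '["alpha","beta"]' | jq -r '.[]'
# alpha
# beta
```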

We can then safely iterate over each id:

TOP_10_STORIES=$(curl -s https://hacker-news.firebaseio.com/v0/topstories.json | jq -r '.[0:10] | .[]')
for i in $TOP_10_STORIES
do
  echo "$i" # do something with each $i (bash needs at least one command here)
done
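The loop above relies on the shell splitting the unquoted $TOP_10_STORIES on whitespace, which works because the IDs are plain numbers. A common alternative is to pipe jq's output straight into a while read loop. Here is a minimal sketch, where list_item_urls is a hypothetical helper name and a literal array stands in for the real API response:

```shell
# Hypothetical helper: reads the topstories JSON array on stdin and prints
# the item URL for each of the first 10 IDs, one per line.
list_item_urls() {
  jq -r '.[0:10] | .[]' | while read -r id; do
    echo "https://hacker-news.firebaseio.com/v0/item/$id.json"
  done
}

# A literal array stands in for the real topstories response
printf '%s\n' '[19801708,19805306]' | list_item_urls
```

One caveat of this design: in bash, each part of a pipeline runs in a subshell by default, so variables assigned inside the while loop are not visible after it finishes.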

For each ID, we make another curl request to retrieve the details for that item. The JSON payload returned to us looks like this:

{
"by": "dhouston",
"descendants": 71,
"id": 8863,
"kids": [
8952,
9224,
8917
// ..
],
"score": 111,
"time": 1175714200,
"title": "My YC app: Dropbox - Throw away your USB drive",
"type": "story",
"url": "http://www.getdropbox.com/u/2/screencast.html"
}
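To experiment with property filters without hitting the API, you can paste a trimmed copy of that payload into a variable. This is only a sketch: descendants and time are dropped, and kids is cut down to the three IDs shown above, so .kids | length reports 3 only for this trimmed copy:

```shell
# Trimmed copy of the item payload above (descendants/time dropped, kids cut to 3)
STORY='{"by":"dhouston","id":8863,"kids":[8952,9224,8917],"score":111,"title":"My YC app: Dropbox - Throw away your USB drive","type":"story","url":"http://www.getdropbox.com/u/2/screencast.html"}'

echo "$STORY" | jq -r .title           # a top-level string property
echo "$STORY" | jq -r .score           # a number (-r makes no difference here)
echo "$STORY" | jq -r '.kids[0]'       # the first element of a nested array
echo "$STORY" | jq -r '.kids | length' # the number of elements in an array
```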

All that's left now is to print out the properties we're interested in. Extracting a property with jq is super simple: jq -r .myProperty. So that we don't have to make the same request once per property, we assign the curl output to a variable:

STORY=$(curl -s "https://hacker-news.firebaseio.com/v0/item/$i.json")

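Then we create our little property-getter expression, echo "$STORY" | jq -r .title, and interpolate it into a prettified string with command substitution, $(...), echoing the result for the user and repeating for each property. Quoting "$STORY" stops the shell from word-splitting the JSON or expanding glob characters such as * inside it: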

echo "Title: $(echo "$STORY" | jq -r .title)"
echo "User: $(echo "$STORY" | jq -r .by)"
echo "URL: $(echo "$STORY" | jq -r .url)"
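As a variant (not the approach used above), jq can format all three lines in a single call using its string interpolation syntax, \(...), together with the alternative operator //, which substitutes a fallback when a value is null or missing. That can matter here: items without an external link (text posts, for example) may have no url field, and jq -r .url simply prints null for them. print_story is a hypothetical helper name:

```shell
# Hypothetical helper: formats one item read from stdin with a single jq call.
# \(...) is jq string interpolation; // supplies "n/a" when .url is null or absent.
print_story() {
  jq -r '"Title: \(.title)", "User: \(.by)", "URL: \(.url // "n/a")"'
}

# Offline example using a trimmed item payload
echo '{"by":"dhouston","title":"My YC app: Dropbox - Throw away your USB drive","url":"http://www.getdropbox.com/u/2/screencast.html"}' | print_story
```

Inside the loop, the three echo lines could then become echo "$STORY" | print_story, which also saves starting jq two extra times per story.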

The final output should look something like this:

Title: Twisted graphene has become the big thing in physics
User: furcyd
URL: https://www.quantamagazine.org/how-twisted-graphene-became-the-big-thing-in-physics-20190430/
------------
Title: Huge study finds drugs stop HIV transmission
User: ahakki
URL: https://www.theguardian.com/society/2019/may/02/end-to-aids-in-sight-as-huge-study-finds-drugs-stop-hiv-transmission
------------
Title: New physics needed to probe the origins of life
User: headalgorithm
URL: https://www.nature.com/articles/d41586-019-01318-z
------------
Title: Verizon reportedly seeking to sell Tumblr
User: doppp
URL: https://techcrunch.com/2019/05/02/verizon-reportedly-seeking-to-sell-tumblr/