Traditionally, Linux tools like grep, awk, and sed help you slice and dice log files on the server. This works great if you have ssh access to the server and you are interested in files generated on that server.
However, as systems become containerised and logs are unified at a central location, the simplicity of handling log files in the command line interface is being lost. This is a challenge for users who live inside their terminals.
Parseable wrote the CLI client pb to address this challenge. The idea behind pb is to allow terminal aficionados to analyze their centralised, unified logs in their terminals using the tools they love.
This post shows how to use pb to analyze logs stored in Parseable using the query subcommand, and pipe the results through the popular command-line tool jq to process and format them further.
To make the workflow concrete, the post uses logs from the Nginx Ingress controller on Kubernetes as an example. It then goes over several recipes showing which fields are most useful in these logs.
Prerequisites
Install pb
pb allows you to manage Streams, Users, and Data on a Parseable server instance. It is available as a single binary for all popular operating systems. Head over to the latest releases page and download the binary for your OS.
Once downloaded, on Unix-like systems, make the binary executable using:
chmod +x ./pb
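Optionally, you can move the binary to a directory on your PATH so you can call it as pb from anywhere (the /usr/local/bin location below is just a common choice; the rest of this post keeps using ./pb from the download directory):
sudo mv ./pb /usr/local/bin/pb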
You should now be able to run the binary. For example, use the --help flag, and you should see output like the following:
➜ ./pb -h
pb is the command line interface for Parseable
Usage:
pb [flags]
pb [command]
Available Commands:
autocomplete Generate autocomplete script
cluster Cluster operations for Parseable.
help Help about any command
profile Manage different Parseable targets
query Run SQL query on a log stream
role Manage roles
stream Manage streams
tail Stream live events from a log stream
user Manage users
version Print version
Flags:
-h, --help help for pb
-v, --version Print version
Use "pb [command] --help" for more information about a command.
You need a running Parseable cluster with data to work with. If you have your own, you can use it by saving its configuration as a profile:
pb profile add local http://localhost:8000 admin admin
If you don't set a profile, pb defaults to using the Parseable demo cluster, http://demo.parseable.com. Read more about setting up a Parseable server in the docs.
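Once you have added a profile, you can make it the active target for subsequent commands (the profile name local matches the example above):
./pb profile default local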
Install jq
jq is a popular command-line tool for filtering, mapping, and formatting JSON data. As pb outputs data in JSON, this tutorial uses jq to process the JSON output of querying log streams from Parseable.
If you don't already have it installed, follow the installation documentation for jq.
Run queries
You use pb to run standard SQL queries against Parseable log stream data with an optional time range. For example, the command below returns all columns of all rows in the last minute:
./pb query run "select * from backend" --from=1m --to=now
You can also live tail all the data for a log stream:
./pb tail backend
Stop tailing with Ctrl+C.
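You can also pipe the live tail through jq; this assumes your pb version emits each event as a JSON object on its own line, and the status field and value here are only illustrative:
./pb tail backend | jq 'select(.status == 500)'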
If you run either of these queries, the returned results are long and hard to read. This is where jq comes in.
Processing and formatting results with jq
By piping (a Linux mechanism for directing the output of one command into another) the output of pb into jq with the '.' (identity) filter, you format the JSON data in a more human-readable way:
./pb query run "select * from backend" --from=1h --to=now | jq '.'
Even though it is well formatted, this is a lot to read. You can also return a slice of the results, for example, the items at indexes 10 through 19:
./pb query run "select * from backend" --from=1h --to=now | jq '.[10:20]'
jq allows for complex filtering and data transformations using a wide variety of operators and functions, including map(), which applies a filter expression to every element of an array.
For example, maybe you're only interested in viewing rows that contain a particular value, such as PATCH, in the method field. Pass the select() function, containing a logical expression, into map():
./pb query run "select host, id, method, status from backend where status = 500" | jq '. | map(select(.method == "PATCH"))'
You can use regular expressions for further processing and manipulation. Maybe you want to take the value of a date and time field, split it into separate date and time parts, and output them as a new JSON array:
./pb query run "select p_timestamp from backend" | jq '. | map(.p_timestamp | capture("(?<date>[^T]+)T(?<time>.+)"))'
Outputting results
In all of these cases, you can use output redirection to save the results to a file, for example, to save the output of the last example to a dates.json file:
./pb query run "select p_timestamp from backend" | jq '. | map(.p_timestamp | capture("(?<date>[^T]+)T(?<time>.+)"))' > dates.json
Nginx Ingress Recipes
Now that you have a better idea of using pb with jq to query, parse, and filter logs, let's look at a fixed log format, such as logs generated by Nginx Ingress on a Kubernetes cluster, and see how to use pb and jq together to achieve the most common analysis tasks.
The Parseable demo instance has a log stream, ingress-nginx, that collects Nginx Ingress logs live, so you can point pb at the demo instance using:
./pb profile default demo
After this, you can run all of these commands to see them in action.
- Find all HTTP errors (status codes 4xx and 5xx) in the last 1 hour.
./pb query run "SELECT * FROM \`ingress-nginx\` WHERE status >= 400" --from=1h --to=now | jq '.[]'
- Count of all HTTP errors in the last 1 hour.
./pb query run "SELECT status, COUNT(*) as count FROM \`ingress-nginx\` WHERE status >= '400' GROUP BY status" --from=1h --to=now | jq '.[]'
- Identify slow requests (taking more than 1 second) in the last 1 hour.
./pb query run "SELECT * FROM \`ingress-nginx\` WHERE CAST(request_time AS FLOAT) > 1.0" --from=1h --to=now | jq '.[] | {time: .p_timestamp, endpoint: .request, duration: .request_time, status: .status}'
- Analyse traffic by endpoint in the last 1 hour.
./pb query run "SELECT request, COUNT(*) as count FROM \`ingress-nginx\` GROUP BY request ORDER BY count DESC LIMIT 10" --from=1h --to=now | jq '.[] | "\(.request): \(.count)"'
- Status code distribution over the last 1 hour.
./pb query run "SELECT status, COUNT(*) as count FROM \`ingress-nginx\` GROUP BY status" --from=1h --to=now | jq -r '.[] | [.status, .count] | @tsv' | sort -k1,1
- Find requests to specific Kubernetes service (demo-parseable-ingestor-service-80) in the last 1 hour.
./pb query run "SELECT * FROM \`ingress-nginx\` WHERE proxy_upstream_name = 'demo-parseable-ingestor-service-80'" --from=1h --to=now | jq '.[] | {timestamp: .p_timestamp, method: .method, path: .request, status: .status}'
- Average response time by endpoint in the last 1 hour.
./pb query run "SELECT request, AVG(CAST(request_time AS FLOAT)) as avg_time FROM \`ingress-nginx\` GROUP BY request ORDER BY avg_time DESC LIMIT 10" --from=1h --to=now | jq -r '.[] | "\(.request): \(.avg_time) seconds"'
- Segregate traffic by user agent in the last 1 hour.
./pb query run "SELECT http_user_agent, COUNT(*) as count FROM \`ingress-nginx\` GROUP BY http_user_agent ORDER BY count DESC" --from=1h --to=now | jq '.[] | {agent: .http_user_agent, count: .count}'
- Requests per hour over the last 1 day.
./pb query run "SELECT DATE_TRUNC('hour', p_timestamp) as hour, COUNT(*) as count FROM \`ingress-nginx\` GROUP BY hour ORDER BY hour" --from=1d --to=now | jq -r '.[] | "\(.hour): \(.count) requests"'
- Summary of response codes and times (sampling 1,000 events from the last 3 hours).
./pb query run "SELECT * FROM \`ingress-nginx\` LIMIT 1000" --from=3h --to=now | jq -r 'group_by(.status) | map({status: .[0].status, count: length, avg_time: (map(.request_time | tonumber) | add / length)}) | sort_by(.count) | reverse | .[] | "\(.status)\t\(.count)\t\(.avg_time)"' | column -t
Summary
This post showed how to use the pb command-line client and jq to create complex analyses of log streams. jq has pages of options available, and thanks to the power of piping in Linux, the possibilities it brings when working with Parseable data are endless.
Do you prefer this blog as a video? View this on our YouTube channel here