Parseable Release v1.3.0

July 6, 2024 · 8 min read

Founder

Parseable v1.3.0 is out now! This release includes a good mix of new features, improvements, and bug fixes. In this post, we'll take a detailed look at what’s new.

We'd love to hear your feedback on this release. Please feel free to create an issue on our GitHub or join our Slack community.

New Features

Saved filters and queries in the explore logs view

A long-pending request from our community has been the ability to save a filter in order to return at a later date to a specific view without having to re-apply the filter from scratch. The same goes for queries.

We initially considered implementing this as a purely client-side feature, i.e. on the Console only, to deliver it more quickly. The idea was to use the browser's local storage to keep a saved filter’s details and then load it from there on demand. But this approach would have been too limiting; for instance, the same user would not have been able to see their saved filters when logging in from a different browser or device. Sharing filters across users would not work either. And any browser event that cleared local storage would mean the loss of all the saved filters, many of which are carefully created after months of analysis.

The other option was to do it correctly, i.e. a server-driven approach. All filters from a user are now reliably stored on the backend (the same store where log data is kept). There are API calls for all the CRUD operations related to filters, but instead of a database, we use S3 or disk as the store. We’ve seen this approach scale extremely well with the log data, and it made sense to follow a similar approach for user-saved filters.
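To make the idea of CRUD over a plain object store more concrete, here is a minimal sketch that keeps one JSON file per saved filter on disk. It is not Parseable's actual implementation or API: the `FilterStore` class, its method names, the per-user directory layout, and the filter body fields are all assumptions made for illustration.

```python
import json
import tempfile
import uuid
from pathlib import Path


class FilterStore:
    """Hypothetical file-backed store: one JSON object per saved filter."""

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, user, filter_id):
        # Assumed layout: <root>/<user>/<filter_id>.json
        return self.root / user / f"{filter_id}.json"

    def create(self, user, body):
        filter_id = uuid.uuid4().hex
        path = self._path(user, filter_id)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(body))
        return filter_id

    def read(self, user, filter_id):
        return json.loads(self._path(user, filter_id).read_text())

    def update(self, user, filter_id, body):
        path = self._path(user, filter_id)
        if not path.exists():
            raise KeyError(filter_id)
        path.write_text(json.dumps(body))

    def delete(self, user, filter_id):
        self._path(user, filter_id).unlink()

    def list_ids(self, user):
        user_dir = self.root / user
        if not user_dir.is_dir():
            return []
        return sorted(p.stem for p in user_dir.glob("*.json"))


# Usage: a temporary directory stands in for the disk or S3 bucket.
store = FilterStore(tempfile.mkdtemp())
saved_id = store.create("alice", {"stream": "app", "query": "level = 'error'"})
print(store.read("alice", saved_id))
```

Because every filter is a small, independent object, the same layout maps naturally onto S3 keys, which is the property that makes a database unnecessary here.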

The explore page now includes the ability to save filters and apply them. A saved filter can also optionally include the time range. If you include the time range, this essentially means you’ll see the exact same data when you apply the filter. Otherwise, the same filter will be applied to whatever time range is already set.
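The optional time range behaviour can be sketched in a few lines. This is an illustrative model only; the `SavedFilter` type, its fields, and `effective_time_range` are hypothetical names, not part of Parseable's API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# (start, end) as ISO-8601 strings; an assumed representation.
TimeRange = Tuple[str, str]


@dataclass
class SavedFilter:
    name: str
    query: str
    time_range: Optional[TimeRange] = None  # None means "not saved with the filter"


def effective_time_range(saved: SavedFilter, current: TimeRange) -> TimeRange:
    """Use the saved range if present, else whatever range is already set."""
    return saved.time_range if saved.time_range is not None else current


current_view = ("2024-07-06T00:00:00Z", "2024-07-06T01:00:00Z")
pinned = SavedFilter("errors", "level = 'error'",
                     ("2024-07-01T00:00:00Z", "2024-07-02T00:00:00Z"))
floating = SavedFilter("errors", "level = 'error'")

print(effective_time_range(pinned, current_view))    # the saved range
print(effective_time_range(floating, current_view))  # the current view's range
```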

Saved Filters

Auto-completion in pb

If you like the command line, you’ll love pb, our command line client for Parseable. The latest pb release adds auto-completion: you can now use the Tab key to complete commands automatically.

Improvements

S3 Call Optimization

We reduced S3 calls across several Parseable subsystems by keeping stream information and metadata in memory. This in-memory state is kept up to date, and Parseable tries to serve stream and metadata lookups from it instead of making an S3 call. This further reduces costs for our customers.

We also improved the stream creation flow in distributed mode. With this release, the querier node pushes updates to all live ingestor nodes whenever a stream is created or updated. This provides two benefits. First, it reduces the S3 calls needed, because previously the ingestor node had to check for a stream in S3 before starting ingestion. Second, it improves ingestion performance, because the stream is already known to the ingestor and does not need to be checked before ingestion.
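The pattern described above is essentially a read-through cache that can also be updated by a push from another node. The sketch below is an illustration under stated assumptions, not Parseable's code: `StreamMetadataCache`, `put`, `get`, and the fake `fetch_from_object_store` function are hypothetical.

```python
class StreamMetadataCache:
    """Hypothetical in-memory cache of stream metadata."""

    def __init__(self, fetch_from_store):
        self._fetch = fetch_from_store  # fallback that costs one object-store call
        self._streams = {}

    def put(self, name, metadata):
        # Called when another node pushes a stream create/update.
        self._streams[name] = metadata

    def get(self, name):
        # Serve from memory when possible; hit the store only on a miss.
        if name not in self._streams:
            self._streams[name] = self._fetch(name)
        return self._streams[name]


store_calls = []


def fetch_from_object_store(name):
    # Stand-in for an S3 GET; records each call so the saving is visible.
    store_calls.append(name)
    return {"stream": name, "schema": {}}


cache = StreamMetadataCache(fetch_from_object_store)
cache.get("app")                                   # miss: one store call
cache.get("app")                                   # hit: no store call
cache.put("web", {"stream": "web", "schema": {}})  # pushed update
cache.get("web")                                   # hit: no store call
print(len(store_calls))
```

In this sketch, a stream whose metadata was pushed ahead of time never triggers a store lookup on the ingestion path, which is the effect the new stream creation flow aims for.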

Bug Fixes

With the last release, v1.2.0, there were reports of a few panics/crashes during certain workflows. We took the time to look deeply into these occurrences. The following issues were reproduced and fixed in this release:

a. panic caused in syncing invalid parquet files from staging to storage
b. panic caused in resolving schema from invalid arrow files from staging
c. panic caused in data type mismatches
d. panic caused when using custom time partition in a stream and log event has null value in the time partition field
e. panic caused when using custom column partitions in a stream and log event has null value in the column partition field

Load Testing

Performance under load is a key aspect of any data analytics system, and Parseable is no exception. As we progress, we’ll keep improving to ensure Parseable is the fastest purpose-built log analytics platform on the market. From this release onwards, we’ll publish details on ingestion performance. We’ve been running these tests internally for a while, and we’re now making the results public.

Parseable Setup

Parseable was deployed on a Kubernetes cluster in distributed mode. We set up 3 ingest nodes and 1 query node in the Parseable cluster. Each Parseable pod was allocated 2 vCPU and 4 GiB memory. We also made sure to deploy each pod on a separate node to avoid any resource contention.

Load Generation

We use K6 on Kubernetes for load testing. K6 is a modern load testing tool that allows you to write test scripts in JavaScript. For simpler deployment and easier scale-out, we used the K6 Operator. Refer to this blog post for the steps we followed to set up K6 on Kubernetes.

The load testing script is available in the Quest repository. The load testing script generates a single HTTP request with several events batched. This script is configurable with:

Number of schema P_SCHEMA_COUNT - total number of different schema types present in a single HTTP call.
Number of events P_EVENTS_COUNT - total number of events per schema to be sent in a single batch.
Virtual users VUs - the simulated users that run separate and concurrent iterations of this test script. This is a K6 construct.

For example, if you’ve set P_SCHEMA_COUNT=30 and P_EVENTS_COUNT=40 and VUs are set to 10, there will be 10 virtual users running this script in parallel. Each iteration of the script will generate a total of 30 × 40 = 1200 events in a single HTTP call (sent to the Parseable server). Of these 1200 events, the first 40 share one schema, the next 40 share a different schema, and so on.
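The batch arithmetic can be checked with a small sketch. This is not the Quest load testing script (which is a K6 JavaScript script); `build_batch` and the event shape, including the `schema_id` and `field_<n>` keys, are hypothetical and only mirror the grouping described above.

```python
def build_batch(schema_count, events_per_schema):
    """Build one request body: events grouped by schema, one group after another."""
    batch = []
    for schema_id in range(schema_count):
        for event_no in range(events_per_schema):
            # Each schema gets its own field name so the schemas differ.
            batch.append({"schema_id": schema_id, f"field_{schema_id}": event_no})
    return batch


# P_SCHEMA_COUNT=30 and P_EVENTS_COUNT=40, as in the example above.
batch = build_batch(30, 40)
print(len(batch))                                # 1200
print(batch[39]["schema_id"], batch[40]["schema_id"])  # 0 1
```

With 10 virtual users, each user sends batches like this one concurrently, so the events in flight at any moment scale with the VU count while the per-request size stays at 1200.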

Results

Test Run 1: 1 Query, 3 Ingestor Nodes, 2 vCPU, 4 GiB Memory each node, 9 k6 clients to ingest data, Number of batches per HTTP request - 300, Run time: 10 mins

Metric	Ingestor1	Ingestor2	Ingestor3	Total
HTTP Requests Count	81317	73610	83384	238311
Total Events Ingested in 10 mins (HTTP Requests * Number of Batches per Request)	24395100	22083000	25015200	71493300
Total Events Ingested in 1 sec	40658.5	36805	41692	119155.5
Throughput in MB/sec				44.5 MB/s

Test Run 2: 1 Query, 3 Ingestor Nodes, 2 vCPU, 4 GiB Memory each node, 15 k6 clients to ingest data, Number of batches per HTTP request - 300, Run time: 10 mins

Metric	Ingestor1	Ingestor2	Ingestor3	Total
HTTP Requests Count	123603	118308	107871	349782
Total Events Ingested in 10 mins (HTTP Requests * Number of Batches per Request)	37080900	35492400	32361300	104934600
Total Events Ingested in 1 sec	61801.5	59154	53935.5	174891
Throughput in MB/sec				65.5 MB/s

Test Run 3: 1 Query, 3 Ingestor Nodes, 3 vCPU, 4 GiB Memory each node, 15 k6 clients to ingest data, Number of batches per HTTP request - 300, Run time: 10 mins

Metric	Ingestor1	Ingestor2	Ingestor3	Total
HTTP Requests Count	136036	138804	124195	399035
Total Events Ingested in 10 mins (HTTP Requests * Number of Batches per Request)	40810800	41641200	37258500	119710500
Total Events Ingested in 1 sec	68018	69402	62097.5	199517.5
Throughput in MB/sec				74.7 MB/s

Test Run 4: 1 Query, 3 Ingestor Nodes, 3 vCPU, 4 GiB Memory each node, 15 k6 clients to ingest data, Number of batches per HTTP request - 525, Run time: 10 mins

Metric	Ingestor1	Ingestor2	Ingestor3	Total
HTTP Requests Count	83811	86633	74761	245205
Total Events Ingested in 10 mins (HTTP Requests * Number of Batches per Request)	44000775	45482325	39249525	128732625
Total Events Ingested in 1 sec	73334.625	75803.875	65415.875	214554.375
Throughput in MB/sec				76.8 MB/s
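The per-second figures in the tables follow directly from the request counts: total events are HTTP requests multiplied by the number of batches per request, divided by the 600-second run time. The short sketch below reproduces the Total column for each run from the numbers reported above; the function and variable names are ours, chosen for illustration.

```python
RUN_SECONDS = 10 * 60  # each test ran for 10 minutes


def events_per_second(http_requests, batches_per_request, run_seconds=RUN_SECONDS):
    """Total events ingested divided by the run time in seconds."""
    return http_requests * batches_per_request / run_seconds


# (total HTTP requests, batches per request), taken from the tables above.
runs = {
    "Test Run 1": (238311, 300),
    "Test Run 2": (349782, 300),
    "Test Run 3": (399035, 300),
    "Test Run 4": (245205, 525),
}

for name, (requests, per_request) in runs.items():
    print(name, events_per_second(requests, per_request))
# Test Run 1 119155.5
# Test Run 2 174891.0
# Test Run 3 199517.5
# Test Run 4 214554.375
```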
