Parseable

What is Parseable?


Parseable is an open source, column-oriented data lake platform, purpose-built for observability. It treats growing telemetry data as a data engineering problem rather than a search or time-series problem.

Parseable is written in Rust and built on Apache Arrow and Apache Parquet. Rust gives predictable, low-latency performance with no garbage collection pauses and a small memory footprint.

Arrow is the in-memory columnar format, which enables vectorized query execution and zero-copy data exchange. Parquet is the at-rest storage format: open, columnar, and widely supported.
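To make the row-versus-column distinction concrete, here is a minimal Python sketch. It does not use Arrow itself: the standard `array` module stands in for Arrow's typed buffers, and the field names are hypothetical. An aggregate over `latency_ms` reads only that one contiguous buffer, and a `memoryview` shares the buffer without copying it, a rough analogue of zero-copy exchange.

```python
from array import array
from typing import Dict, List

# Hypothetical telemetry events in row form: one dict per event.
rows: List[dict] = [
    {"ts": 1_700_000_000, "status": 200, "latency_ms": 12.5},
    {"ts": 1_700_000_001, "status": 500, "latency_ms": 340.0},
    {"ts": 1_700_000_002, "status": 200, "latency_ms": 7.5},
]

# Column form: one typed, contiguous buffer per field ("q" = int64,
# "H" = uint16, "d" = float64). Arrow arrays follow the same principle,
# with extra structure (validity bitmaps, offsets) that this sketch omits.
columns: Dict[str, array] = {
    "ts": array("q", (r["ts"] for r in rows)),
    "status": array("H", (r["status"] for r in rows)),
    "latency_ms": array("d", (r["latency_ms"] for r in rows)),
}


def mean_latency_rowwise(events: List[dict]) -> float:
    # Visits every event object even though only one field is needed.
    return sum(e["latency_ms"] for e in events) / len(events)


def mean_latency_columnar(cols: Dict[str, array]) -> float:
    # Scans a single buffer of doubles; the other columns are never read.
    latency = cols["latency_ms"]
    return sum(latency) / len(latency)


# A memoryview exposes the column's buffer to other code without copying it.
latency_view = memoryview(columns["latency_ms"])
```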

  • Data lake engine: The Parseable data lake engine is built for high-throughput ingestion. Incoming telemetry is staged locally and then forwarded to object storage in batches. This stage-and-forward design absorbs ingestion spikes without pushing backpressure onto the sender, and writes efficiently sized objects to the store. The engine handles compression, object storage prefix and partition layout, and metadata management.

  • Query engine: The query engine supports SQL and PromQL natively. Storage and compute are separate, so query nodes scale independently of ingestion. To keep queries fast, the engine caches frequently accessed data on the query node's local disk, serving hot queries from local NVMe rather than round-tripping to object storage on every call.
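The stage-and-forward pattern in the data lake engine can be sketched as a small buffer that accepts events immediately and hands them to an uploader in batches. This is a hypothetical illustration, not Parseable's code: `upload` stands in for an object-storage write, staging happens in memory rather than durably, and `max_events` and `max_age_s` are illustrative thresholds rather than Parseable settings.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class StageAndForwardBuffer:
    """Hypothetical stage-and-forward buffer (illustrative, not Parseable's API)."""

    upload: Callable[[List[dict]], None]  # stand-in for an object-storage PUT
    max_events: int = 1000                # flush when this many events are staged
    max_age_s: float = 60.0               # or when the oldest staged event is this old
    clock: Callable[[], float] = time.monotonic
    _staged: List[dict] = field(default_factory=list, init=False)
    _first_at: Optional[float] = field(default=None, init=False)

    def ingest(self, event: dict) -> None:
        # Accepting an event is just an append, so the sender never waits
        # on a round-trip to object storage.
        if not self._staged:
            self._first_at = self.clock()
        self._staged.append(event)
        if self._should_flush():
            self.flush()

    def _should_flush(self) -> bool:
        if len(self._staged) >= self.max_events:
            return True
        return self._first_at is not None and self.clock() - self._first_at >= self.max_age_s

    def flush(self) -> None:
        # Forward everything staged so far as one batch (one larger object
        # instead of many tiny ones).
        if not self._staged:
            return
        batch, self._staged = self._staged, []
        self._first_at = None
        self.upload(batch)
```

Because the age check here only runs when an event arrives, a real engine would also flush on a timer; the sketch leaves that out to stay short.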

Because every telemetry signal is stored as standard Parquet on object storage, the data is never trapped inside Parseable: any Parquet-compatible engine can read it.

Telemetry signals

Telemetry signals have different shapes, query patterns, and volumes. Logs are wide and dynamic, metrics are dense and numeric, traces are linked trees of spans — and each performs best when its storage layout, schema handling, and query interface match the way that signal is actually produced and read.

Data lake architecture was built for exactly this: holding distinct but related datasets (structured, semi-structured, and unstructured) in one open store, with storage decoupled from compute so each dataset can be queried on its own terms. That makes the data lake the natural foundation for observability.

Logs

Logs are generally the highest-volume signal. They range from unstructured text to semi-structured JSON, with constantly drifting schemas. The most common query pattern is a filter over a time window: finding the needle in the haystack. Teams are also increasingly running aggregations over wide, high-cardinality events.

Parseable can ingest logs with either a dynamic or a fixed schema. With a dynamic schema, new fields are picked up automatically, and sparse or wide events stay cheap because columnar storage reads only the columns a query touches.
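A rough illustration of dynamic-schema ingestion follows. It is a minimal Python sketch under simple assumptions, not Parseable's schema logic: the schema is widened whenever a new field name appears (recording the Python type of its first non-null value), type conflicts are not handled, and fields missing from an event become nulls in their column.

```python
import json
from typing import Any, Dict, List


def merge_schema(schema: Dict[str, str], event: Dict[str, Any]) -> Dict[str, str]:
    """Return a copy of `schema` widened with any new fields in `event`."""
    merged = dict(schema)
    for name, value in event.items():
        # The first non-null value decides the recorded type; later type
        # conflicts are ignored in this sketch.
        if name not in merged and value is not None:
            merged[name] = type(value).__name__
    return merged


def to_columns(events: List[Dict[str, Any]], schema: Dict[str, str]) -> Dict[str, List[Any]]:
    # One list per field, one slot per event; absent fields become None (null).
    return {name: [event.get(name) for event in events] for name in schema}


raw_lines = [
    '{"level": "info", "msg": "started"}',
    '{"level": "error", "msg": "timeout", "upstream": "payments", "retry": 2}',
    '{"level": "info", "msg": "ok", "retry": 0}',
]
events = [json.loads(line) for line in raw_lines]

schema: Dict[str, str] = {}
for event in events:
    schema = merge_schema(schema, event)
# schema == {"level": "str", "msg": "str", "upstream": "str", "retry": "int"}

columns = to_columns(events, schema)

# A filter on `level` reads one column; `msg`, `upstream`, and `retry` are untouched.
error_rows = [i for i, level in enumerate(columns["level"]) if level == "error"]
```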

For verbose, unstructured log strings, Parseable offers LogIQ, a mechanism that automatically splits the log data into separate columns for better compression and query performance.
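LogIQ's internals are not described on this page, so the sketch below shows only the general idea of splitting a log string into columns, using one hand-written regular expression. The line layout, the column names, and the fallback behavior are assumptions for illustration; LogIQ itself is described as doing the split automatically rather than from a fixed pattern.

```python
import re
from typing import Dict, List, Optional

# Hypothetical layout: "<timestamp> <LEVEL> [<service>] <free-text message>".
LOG_LINE = re.compile(
    r"(?P<ts>\S+) (?P<level>[A-Z]+) \[(?P<service>[^\]]+)\] (?P<message>.*)"
)
FIELDS = ["ts", "level", "service", "message"]


def split_line(line: str) -> Dict[str, Optional[str]]:
    match = LOG_LINE.match(line)
    if match is None:
        # Lines that don't fit the pattern keep their raw text in the message column.
        return {"ts": None, "level": None, "service": None, "message": line}
    return match.groupdict()


def to_columns(lines: List[str]) -> Dict[str, List[Optional[str]]]:
    rows = [split_line(line) for line in lines]
    return {name: [row[name] for row in rows] for name in FIELDS}


columns = to_columns([
    "2024-05-01T12:00:00Z INFO [checkout] order 42 placed",
    "2024-05-01T12:00:01Z ERROR [payments] card declined",
    "malformed line",
])
```

Once `level` and `service` are their own columns, their few distinct values repeat many times, which columnar encodings compress well, and filters on them no longer need to scan the full message text.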

Metrics

Metrics are numeric time series identified by a name and a set of labels. The hard problem is cardinality: every new combination of label values is another series, and traditional TSDBs slow down or fall over as that count climbs.
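The cardinality arithmetic is easy to see in a small example. Assuming hypothetical label sets for one metric, the worst-case number of series is the product of the number of distinct values per label, so a single high-cardinality label multiplies the total.

```python
from math import prod
from typing import Dict, List


def max_series(label_values: Dict[str, List[str]]) -> int:
    """Worst-case series count for one metric name: product of distinct values per label."""
    return prod(len(set(values)) for values in label_values.values())


# Hypothetical labels for a single metric such as http_requests_total.
labels = {
    "service": ["api", "web", "worker"],  # 3 values
    "region": ["us-east", "eu-west"],     # 2 values
    "status": ["200", "404", "500"],      # 3 values
}
baseline = max_series(labels)  # 3 * 2 * 3 = 18

# Adding a per-pod label with 100 values multiplies the bound by 100.
with_pod = max_series({**labels, "pod": [f"pod-{i}" for i in range(100)]})  # 1800
```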

Parseable stores metrics in the columnar data lake and exposes them through native PromQL. With additional metadata such as a series hash and frequency, label values are stored as column data rather than expanded into a per-series index, so high-cardinality labels don't carry the indexing penalty they do in a traditional TSDB. Metrics can be ingested over OpenTelemetry or Prometheus remote write.
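One way to picture labels stored as column data is a table with one row per sample, one column per label, and a series hash that identifies each label combination. This is a minimal sketch under assumptions: the table layout, the JSON canonicalization, the label columns, and the choice of BLAKE2b are illustrative, and it does not model the frequency metadata or Parseable's actual encoding.

```python
import hashlib
import json
from typing import Dict, List

LABEL_COLUMNS = ["service", "region", "pod"]  # hypothetical label columns


def series_hash(metric: str, labels: Dict[str, str]) -> str:
    # Sort labels so that label order does not change the hash; JSON encoding
    # keeps names and values unambiguous even if they contain "=" or ",".
    canonical = json.dumps([metric, sorted(labels.items())])
    return hashlib.blake2b(canonical.encode("utf-8"), digest_size=8).hexdigest()


def new_table() -> Dict[str, list]:
    return {name: [] for name in ["series_hash", "metric", "ts", "value", *LABEL_COLUMNS]}


def append_sample(table: Dict[str, list], metric: str, labels: Dict[str, str],
                  ts: int, value: float) -> None:
    # Each sample is one row; label values go into ordinary columns
    # instead of a per-series index entry.
    table["series_hash"].append(series_hash(metric, labels))
    table["metric"].append(metric)
    table["ts"].append(ts)
    table["value"].append(value)
    for name in LABEL_COLUMNS:
        table[name].append(labels.get(name))


def sum_where(table: Dict[str, list], column: str, wanted: str) -> float:
    # A label filter is a scan over one label column plus the value column.
    return sum(v for v, label in zip(table["value"], table[column]) if label == wanted)


table = new_table()
append_sample(table, "http_requests_total", {"service": "api", "pod": "pod-1"}, 1, 10.0)
append_sample(table, "http_requests_total", {"pod": "pod-2", "service": "api"}, 1, 5.0)
append_sample(table, "http_requests_total", {"service": "web", "pod": "pod-3"}, 1, 2.0)
api_total = sum_where(table, "service", "api")
```

A new pod value here adds rows and a new hash value, not a new index structure, which is the property the paragraph above relies on.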

Traces

A trace is a tree of spans tied together by a trace ID, where each span is a structured event with its own timing and attributes. Trace workloads mix two access patterns: fetch a single trace by ID, and aggregate across many spans for service-level views — latency percentiles, error rates, dependency maps.

Parseable stores spans as columnar events, so both patterns run on the same data: trace lookup by ID, and the aggregations behind Parseable's APM features. Traces are ingested over OTLP.
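Both trace access patterns can run over the same flat span rows. The Python sketch below is a hypothetical illustration with made-up spans and field names: one function fetches a trace by ID and rebuilds its parent-child tree, the other computes a nearest-rank p95 latency and an error rate per service. It is not how Parseable's APM features are implemented.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical span rows; parent_id is None for a trace's root span.
spans = [
    {"trace_id": "t1", "span_id": "a", "parent_id": None, "service": "frontend", "duration_ms": 120, "error": False},
    {"trace_id": "t1", "span_id": "b", "parent_id": "a", "service": "checkout", "duration_ms": 80, "error": False},
    {"trace_id": "t1", "span_id": "c", "parent_id": "b", "service": "payments", "duration_ms": 45, "error": True},
    {"trace_id": "t2", "span_id": "d", "parent_id": None, "service": "frontend", "duration_ms": 30, "error": False},
    {"trace_id": "t2", "span_id": "e", "parent_id": "d", "service": "checkout", "duration_ms": 20, "error": False},
]


def get_trace(rows: List[dict], trace_id: str) -> Dict[Optional[str], List[str]]:
    """Pattern 1: fetch one trace by ID and rebuild its parent -> children map."""
    children: Dict[Optional[str], List[str]] = defaultdict(list)
    for span in rows:
        if span["trace_id"] == trace_id:
            children[span["parent_id"]].append(span["span_id"])
    return dict(children)  # root spans are listed under the key None


def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile for 0 < p <= 100."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def service_stats(rows: List[dict]) -> Dict[str, dict]:
    """Pattern 2: aggregate across all spans, grouped by service."""
    by_service: Dict[str, List[dict]] = defaultdict(list)
    for span in rows:
        by_service[span["service"]].append(span)
    return {
        service: {
            "p95_ms": percentile([s["duration_ms"] for s in group], 95),
            "error_rate": sum(s["error"] for s in group) / len(group),
        }
        for service, group in by_service.items()
    }
```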

Use cases

  • Cutting observability costs: Using object storage as the primary store, with columnar Parquet compression, ties storage cost to object storage such as S3 rather than to an always-hot SSD fleet.
  • Observability for sensitive environments: Complete telemetry data sovereignty while maintaining full observability capabilities.
  • High-cardinality and wide-event workloads: Columnar storage and query execution make high-cardinality labels and wide, dynamic logs first-class citizens.
  • Telemetry data as an open, queryable data asset: Whether for ad-hoc analysis, LLM training or fine-tuning, or custom applications, your telemetry data is always open and accessible in Parquet format.
  • Consolidating observability tooling: Get rid of tool sprawl with a unified layer for all your telemetry signals, and a single source of truth for your data.

Key differentiators

Data lake design

All signals share a columnar foundation, with storage decoupled from compute. Up to 90% compression.

Run anywhere

Deploy as a managed cloud, in your own cloud account, on-prem, or fully air-gapped.

OpenTelemetry native

Ingest, manage, and query OpenTelemetry logs, metrics, and traces natively, with zero configuration needed.

Zero lock-in

Open source code and open Parquet storage mean zero lock-in for your observability data.

Built for speed

Vectorized query execution and millisecond-range responses, backed by NVMe and memory caching.

Resource utilization

Up to 75% less memory and 50% less CPU than Elastic at the same ingestion and query load.

Next steps

Learn about the telemetry data lake architecture

Start with Parseable Cloud or run it on your laptop

Send telemetry data into Parseable

PromQL, alerting, dashboards, anomaly detection

Integrate with your existing tools and workflows
