23 posts tagged with "Parseable"

Read about Parseable features, integrations, and capabilities for efficient log management and data analysis.

Storage Modes, Configuration Options and Authentication Flavours

· 9 min read
Nikhil Sinha
Head of Engineering @ Parseable

Parseable is a cloud-native log data analytics engine, written in Rust, that uses Apache Arrow and Parquet as its underlying data structures. Its ability to interface with cloud storage solutions like AWS S3 and Azure Blob Storage enables long-term, cost-effective log retention.

This guide provides an in-depth overview of the configuration parameters and authentication methods Parseable supports for these cloud storage providers, along with the advanced settings Parseable makes available.

We'll walk through each storage provider's connection and authentication settings, making it easy to configure Parseable to meet your needs.

AWS S3 Configuration for Parseable

Parseable provides comprehensive support for connecting to AWS S3 or S3-compatible object stores such as MinIO. This section outlines the mandatory parameters, supported authentication methods, and additional configurations to fine-tune the S3 connection.

Mandatory Environment Variables for AWS S3

To establish a connection to AWS S3, you need to set the following mandatory environment variables (a sample follows the list):

  • P_S3_URL: The endpoint for AWS S3 or compatible storage. Defaults to the region-based endpoint (e.g., s3.us-east-2.amazonaws.com for the us-east-2 region).
  • P_S3_REGION: Specifies the AWS region where the S3 bucket is located. Please refer to Amazon Simple Storage Service endpoints and quotas for the regions and their respective endpoints.
  • P_S3_BUCKET: Defines the S3 bucket name where Parseable will store log data.
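For example, a minimal set of exports for a hypothetical bucket in us-east-2 might look like this (all values are placeholders, not defaults):

```bash
# Hypothetical values -- replace with your own endpoint, region, and bucket.
export P_S3_URL="https://s3.us-east-2.amazonaws.com"
export P_S3_REGION="us-east-2"
export P_S3_BUCKET="my-parseable-logs"
```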

Authentication Options for AWS S3

Parseable supports multiple authentication mechanisms for AWS S3, offering flexibility for different deployment environments:

  • Access Key and Secret Key: Set the environment variables P_S3_ACCESS_KEY and P_S3_SECRET_KEY. The AWS access key, paired with the secret key, authenticates Parseable to AWS and grants access to the S3 bucket based on the permissions attached to those credentials. This is essential when Parseable runs outside AWS EC2 or ECS, where IAM roles are unavailable.

  • IMDSv1 Fallback: For Parseable instances running on EC2, AWS credentials can be sourced from the Instance Metadata Service (IMDS), avoiding the need for explicit P_S3_ACCESS_KEY and P_S3_SECRET_KEY. To enable this, set the environment variable P_AWS_IMDSV1_FALLBACK and configure your EC2 instance to allow IMDS access. First, enable the Instance Metadata Service when creating your EC2 instance (under the Advanced details section); this is required to obtain the credentials.

Then set the Metadata version to V1 and V2 (token optional). Please refer to the metadata service docs for more details.

  • Metadata Endpoint: Set the optional environment variable P_AWS_METADATA_ENDPOINT to specify a custom endpoint URL for retrieving instance metadata. By default, Parseable uses the standard AWS metadata endpoints:

    • IPv4 default: http://169.254.169.254
    • IPv6 default: http://fd00:ec2::254

This configuration is particularly useful when working with a custom setup or infrastructure where metadata needs to be accessed differently than AWS's defaults.

  • Container Credentials Relative URI: Set the optional environment variable AWS_CONTAINER_CREDENTIALS_RELATIVE_URI. If you run the Parseable server on Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS), this variable sets the path AWS uses to grant temporary, scoped credentials to applications running in a container (such as ECS tasks), without requiring hardcoded AWS credentials. A combined launch sketch follows this list.
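Putting the pieces together, here is a hedged sketch of launching Parseable against S3 with static credentials; all values are placeholders, and the s3-store subcommand is assumed from the storage modes Parseable documents (local-store, s3-store, blob-store):

```bash
# A minimal sketch: static access keys plus the mandatory variables above.
export P_S3_URL="https://s3.us-east-2.amazonaws.com"
export P_S3_REGION="us-east-2"
export P_S3_BUCKET="my-parseable-logs"
export P_S3_ACCESS_KEY="<aws-access-key>"
export P_S3_SECRET_KEY="<aws-secret-key>"

# Start Parseable in S3 mode (subcommand assumed from the documented modes).
parseable s3-store
```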

Advanced Configurations for AWS S3

Parseable provides several advanced configuration options for AWS S3, which are especially useful for development or custom storage setups:

  • Allow HTTP: Allows Parseable to connect over plain HTTP instead of HTTPS. This can be useful in development or testing environments where SSL is not configured, or for local testing with S3-compatible storage services, like MinIO, that might run on HTTP.

  • Connect Timeout: Parseable sets a connect timeout of 5 seconds: if a connection from Parseable to your S3 bucket cannot be established within 5 seconds, the operation times out and returns an error. This ensures the server doesn't hang indefinitely when attempting to connect to the S3 bucket.

  • Timeout: Parseable caps any S3 operation at 300 seconds, after which the operation times out and returns an error. This setting defines the total allowed time for the request, including connection establishment, data transfer, and response handling.

  • Allow Invalid Certificates: Bypasses SSL/TLS certificate validation. This is useful in development environments, testing scenarios, or when connecting to servers with self-signed or untrusted certificates. Set the environment variable P_S3_TLS_SKIP_VERIFY to true to enable this setting.

  • SSE-C Encryption Key: SSE-C allows you to provide your own encryption key for encrypting data instead of letting AWS manage the encryption keys. The key must be a 256-bit key for AES-256 encryption.

To supply an SSE-C encryption key, set the environment variable P_S3_SSEC_ENCRYPTION_KEY before starting the server, with a value in the format SSE-C:AES256:<base64_encryption_key>. Note that SSE-C requires HTTPS; Amazon S3 or a compatible service may reject requests made over HTTP when using SSE-C. A key-generation sketch follows.
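As an illustrative sketch, you could generate a random 256-bit key with OpenSSL and pass it in the documented format:

```bash
# Generate 32 random bytes (256 bits) and base64-encode them in one step.
ENC_KEY=$(openssl rand -base64 32)

# Documented format: SSE-C:AES256:<base64_encryption_key>.
# Remember that SSE-C requires HTTPS end to end.
export P_S3_SSEC_ENCRYPTION_KEY="SSE-C:AES256:${ENC_KEY}"
```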

  • Send Checksum Header: Enables the SHA-256 checksum algorithm for object integrity checks during upload. Set the optional environment variable P_S3_CHECKSUM to true to use this setting; it is false by default.

You can find more details on object integrity at https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html

  • Virtual Hosted Style Access: Set the optional environment variable P_S3_PATH_STYLE to false to use virtual-hosted-style requests, in which case P_S3_URL must include the bucket name. By default this property is true and path-style requests are used, as sketched below.
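For instance, the two styles for a hypothetical bucket named my-parseable-logs:

```bash
# Path-style (the default, P_S3_PATH_STYLE=true): bucket name is in the path.
export P_S3_URL="https://s3.us-east-2.amazonaws.com"
export P_S3_PATH_STYLE=true

# Virtual-hosted style: bucket name is part of the endpoint host.
export P_S3_URL="https://my-parseable-logs.s3.us-east-2.amazonaws.com"
export P_S3_PATH_STYLE=false
```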

  • Retry Config: Parseable applies the following retry configuration to its AWS S3 connection:

  • Max Retries = 5: the maximum number of times to retry a request.

  • Retry Timeout = 120 secs: the maximum length of time from the initial request after which no further retries are attempted and the server returns an error. This also bounds how long a request's credentials must remain valid.

  • Backoff Config:

    • Initial Backoff = 100 ms: the initial delay before the first retry is attempted.
    • Max Backoff = 15 secs: the maximum length of time between two consecutive retries.
    • Base = 2: the multiplier for the next backoff duration, i.e. 100 ms, 200 ms, 400 ms and so on, capped at the maximum backoff.

Azure Blob Storage Configuration for Parseable

Parseable also supports multiple authentication options and key configuration parameters for Azure Blob Storage.

Mandatory Environment Variables for Azure Blob Storage

To establish a connection to Azure Blob Storage, you need to set the following mandatory environment variables (a sample follows the list):

  • P_AZR_URL: The endpoint for Azure Blob Storage, accessible in the Azure portal under Storage Account > Settings > Endpoints > Primary Endpoint > Blob Service.
  • P_AZR_ACCOUNT: The Azure Storage Account name as specified in the Azure portal.
  • P_AZR_CONTAINER: The container name where log data will be stored, as set in your Azure Storage Account.
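For example (the account and container names are placeholders):

```bash
# Hypothetical values -- replace with your own storage account and container.
export P_AZR_URL="https://mystorageaccount.blob.core.windows.net"
export P_AZR_ACCOUNT="mystorageaccount"
export P_AZR_CONTAINER="parseable-logs"
```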

Authentication Options for Azure Blob Storage

Azure Blob Storage offers both access key-based and Azure AD-based authentication:

  • Storage Account Access Key: Set the P_AZR_ACCESS_KEY environment variable for access key-based authentication. The key is available under the Security + Networking section of your storage account in the Azure portal.

  • Client ID, Client Secret, and Tenant ID: For applications registered in Azure Active Directory, set the P_AZR_CLIENT_ID, P_AZR_CLIENT_SECRET, and P_AZR_TENANT_ID environment variables to enable client-secret authorization. The Client ID (Application ID) is generated when you create an application in Azure AD. The Client Secret is a secret string the application uses to prove its identity when requesting a token; it can also be referred to as an application password.

The secret can be created from the Manage -> Certificates & Secrets page of the registered app in Azure AD. Every Azure AD instance is identified by a unique identifier called the Tenant ID, which is associated with an organization's account. The Tenant ID can be retrieved from the Manage -> Properties -> Tenant ID section in Azure AD. A sketch of the resulting environment follows.
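As a hedged sketch, all identifiers below are placeholders, and the blob-store subcommand is assumed from Parseable's documented storage modes:

```bash
# Client-secret authorization for Azure Blob Storage; every value is a
# placeholder taken from your own Azure AD app registration.
export P_AZR_URL="https://mystorageaccount.blob.core.windows.net"
export P_AZR_ACCOUNT="mystorageaccount"
export P_AZR_CONTAINER="parseable-logs"
export P_AZR_CLIENT_ID="<application-id>"
export P_AZR_CLIENT_SECRET="<application-password>"
export P_AZR_TENANT_ID="<tenant-id>"

# Start Parseable in Azure Blob mode (subcommand assumed from the modes).
parseable blob-store
```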

Advanced Configuration for Azure Blob Storage

Parseable provides several advanced configuration options for Azure Blob Storage, which are especially useful for development or custom storage setups:

  • Allow HTTP: Allows Parseable to connect over plain HTTP instead of HTTPS. This can be useful in development or testing environments where SSL is not configured, or for local testing against services that run on HTTP.

  • Connect Timeout: Parseable sets a connect timeout of 5 seconds: if a connection from Parseable to your Azure Blob Storage cannot be established within 5 seconds, the operation times out and returns an error. This ensures the server doesn't hang indefinitely when attempting to connect to Azure Blob Storage.

  • Timeout: Parseable caps any Blob Storage operation at 300 seconds, after which the operation times out and returns an error. This setting defines the total allowed time for the request, including connection establishment, data transfer, and response handling.

  • Retry Config: Parseable applies the following retry configuration to its Azure Blob Storage connection:

  • Max Retries = 5: the maximum number of times to retry a request.

  • Retry Timeout = 120 secs: the maximum length of time from the initial request after which no further retries are attempted and the server returns an error. This also bounds how long a request's credentials must remain valid.

  • Backoff Config:

    • Initial Backoff = 100 ms: the initial delay before the first retry is attempted.
    • Max Backoff = 15 secs: the maximum length of time between two consecutive retries.
    • Base = 2: the multiplier for the next backoff duration, i.e. 100 ms, 200 ms, 400 ms and so on, capped at the maximum backoff.

Conclusion

This guide covers the detailed configurations and options Parseable provides for connecting to AWS S3 and Azure Blob Storage. From authentication and endpoint settings to retry and backoff configurations, Parseable allows seamless integration with cloud providers, making it an optimal choice for scalable and durable log data management.

For more specific environment variable details, refer to Parseable documentation on AWS S3 Configuration and Azure Blob Storage Configuration.

Parseable Release v1.6.0

· 4 min read

Parseable v1.6.0 is out now. Check out the new features, improvements, and bug fixes in Parseable since the last release.

Features

Azure Blob Storage support

In our quest to be truly multi-cloud, we have now added support for Azure Blob Storage. You can now deploy Parseable on Azure and use Azure Blob Storage as the backend data storage. This is available in the latest release v1.6.0.

With this release, Parseable supports AWS S3, other S3-compatible stores, and Azure Blob Storage, giving you unparalleled flexibility in choosing a cloud provider.

Shareable URLs for dashboards and explore pages

This was one of the most commonly requested features. You can now share your dashboards and explore pages with a simple URL. To share a dashboard or explore page, click on the share icon in the top right corner of the page. This will generate a URL that you can share with your team members.

Note that you can share the URL with anyone who has access to your Parseable instance. This means they will need to login to their Parseable account to view the shared dashboard or explore page.

Click on the share button in the top right corner of the dashboard or explore page to generate a shareable URL.

Generate schema for a log event

One of the common challenges with dynamic schema is that users lose fine-grained control over the field types. For example, they can't enforce a field to be a string or a number.

The alternative is to create streams with static schema, but that requires users to define the schema upfront. This can be cumbersome and error prone.

With this release, we have added a way to generate a schema from a log event (when creating a static schema stream). To use this feature:

  • Click on the Create Stream button on landing page.
  • Select Static Schema as the schema type.
  • Toggle the Auto Detect Fields and Datatype button to ON.
  • You can now either import a JSON file or paste a valid JSON object or JSON array with sample log events (see the sample below).
  • Click submit to generate the schema. This will be populated in the schema editor for you to review and edit.
  • Review the schema and click on Create Stream to create the stream.
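For instance, a hypothetical sample event you could import to seed the schema generator; the field names are illustrative, not required by Parseable:

```bash
# Write a sample log event to a file for import into the stream creator.
cat > sample-log.json <<'EOF'
{
  "timestamp": "2024-10-01T12:00:00Z",
  "level": "ERROR",
  "service": "checkout",
  "status_code": 500,
  "message": "payment gateway timed out"
}
EOF
```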


Enhancements

Retry object storage calls

Parseable server now ensures any failed object storage API calls will be retried. This is a best effort retry mechanism and will help in cases where the object storage service is temporarily unavailable.

Sync files on Sigterm

With this release, Parseable server will sync all the files to backing storage when it receives a SIGTERM signal. This ensures that all the data is safely stored before the server shuts down.

Detect timestamp in log events

For dynamic schema streams, the server incorrectly detected timestamp fields as string fields. This is now fixed. The server will now correctly detect timestamp fields and store them as timestamp fields. This helps better utilize the timestamp fields in queries and visualizations.

Date range picker improvements and Timezone support

We have made several improvements to the date range picker in the explore page. You can now select a date range with a single click. We have also added timezone support to the identified time columns. This will help you select the correct date range based on your timezone.

Breaking Changes

The new helm chart version 1.6.0 is not backward compatible with the previous versions. Specifically, we've removed the field .Values.local. We've added a new field called .Values.store, which is a required field. Possible values for this field are local-store, s3-store or blob-store.

Please update your Helm chart values accordingly.
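As a hedged sketch, an upgrade with the new required field might look like this (the release name and the parseable/parseable chart reference are assumptions; use your own):

```bash
# Upgrade an existing release, setting the new required .Values.store field.
# store must be one of: local-store, s3-store, blob-store.
helm upgrade parseable parseable/parseable \
  --namespace parseable \
  --set store=s3-store
```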

Parseable Release v1.5.4

· 2 min read

Parseable v1.5.4 is out now. Check out the new features, improvements, and bug fixes in Parseable since the last release.


Features

Custom Dashboards

Search and visualization are key aspects of making sense of ad hoc log events. Search helps when you know what you're looking for, while visualization helps when you don't know yet. We released the first version of dashboards as part of version v1.5.0 and have made several improvements based on user feedback in the last few releases.

Now, with v1.5.4, our fully customizable dashboards are out of beta! Tailor your visualizations with tiles backed by SQL queries: the data fetched by a SQL query is transformed to feed visualizations, including charts, graphs, and tables.

With this release, you can:

  • Choose from a selection of visualization options, including colors, sizes, orientations, and custom tick formats.
  • Export your dashboard views in PNG, CSV, or JSON formats.
  • Organize tiles with an adjustable, draggable layout to create dashboards that suit your workflow.
  • Build and share unlimited dashboards, and import tile configurations with just one click.
  • Access pre-built dashboard templates to get started quickly.


Explore page improvements

The explore page includes an expandable sidebar for streamlined data exploration. This enhancement lets you:

  • Quickly view columns and schema for each stream.
  • Pin, drag, and arrange columns with full control over column width, word wrapping, and visibility settings to suit your preferences.

Performance improvements

Version v1.5.4 includes performance improvements in the query API specifically. You can expect faster query execution and improved response times. In our internal benchmarks based on ClickBench, we observed a 30% improvement in query execution times.

Bug Fixes

Role-Based Access Control (RBAC)

We’ve fixed bugs in RBAC, ensuring a reliable and secure experience. The Console now exposes an option only if the logged-in user has permission to call the corresponding API. So, if you don't see an option you expected, please review the detailed API access control documentation.

OAuth login issue

We fixed a security issue by ensuring the redirect URI and base URI match for OAuth login. This ensures that the user is redirected to the correct page after login.

Parseable Release v1.4.0

· 5 min read

Parseable v1.4.0 is out now! We've added features and enhancements based on inputs from the community.

This release puts the spotlight on the new SSD/NVMe-based hot tier for faster data retrieval and reduced S3 costs, a versatile JSON view on the explore page, saved filters in pb for streamlined workflows, and much more. Let's dive into the details.

New features

Hot Tier in distributed Parseable cluster

Parseable now allows tiering of data: you can store data locally on the query node for faster retrieval and reduced S3 costs. We recommend using dedicated SSDs on the query nodes to fully leverage the performance. This feature is especially useful for real-time data analysis and monitoring, where most queries target recently ingested, locally available data. Refer to the documentation for more details.

JSON View from the Explore Page

Several of our users love the table view for even the unstructured log events, but we also got several requests for a raw view of the data. We have now added a JSON view to the Explore page, with the raw JSON data for each log event.

The JSON view also allows you to run a text search or jq queries on a given query result. This offers a more flexible way to explore the data, especially for unstructured logs, exactly the way you want.
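For example, if you export a query result as JSON, the same style of jq filter can be run locally; the level, timestamp, and message fields here are hypothetical:

```bash
# Keep only error-level events from an exported query result, projecting two
# illustrative fields; assumes the export is a top-level JSON array.
jq '.[] | select(.level == "ERROR") | {timestamp, message}' query-result.json
```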


Saved filters in pb

pb, the Parseable CLI client, now supports saving, applying, and deleting filters. We introduced the saved filter feature on the server in a previous release, and it was an instant hit among users.

CLI users, however, felt left out. With this release, we have added the saved filter feature to pb as well. You can now save, apply, or delete filters from the CLI itself. This feature is especially useful for users who prefer to work from the terminal.

Enhancements

Partition management for streams

If you created streams with custom partitions, or with custom time partitions for historical data, at the time of stream creation, you can now manage the columns and partitions for those streams in the stream management page.

Unified filter and SQL modal

Filter Builder and SQL options are now merged into a single modal for easy switching between the two.

Delete offline ingestors

You can now delete all the entries for an offline ingestor from the cluster page. To remove a live ingestor, you need to stop the ingestor first and then delete it from the cluster page.

Copy capabilities for data

Copy column text directly from the explore page, from both the table and JSON views.

Honor new line characters in table view

The table view now respects newline characters in log events, displaying field data across multiple lines where applicable.

CORS configuration

The server now supports configurable Cross-Origin Resource Sharing (CORS) through the environment variable P_CORS. By default, CORS is disabled; set P_CORS=true to enable it. Refer to the documentation for more details.

Security

One of our dependencies, the object_store crate, was found to have a security vulnerability (see the relevant CVE).

With version v1.4.0, we have updated the object_store crate to 0.10.2, which fixes the vulnerability. We recommend all users upgrade to the latest version to avoid any security risks.

Load testing

Since the last release, we have made it a point to include ingestion (load testing) performance in the release notes. We have tested the ingestion performance for this release as well, and we are happy to report that the performance has improved by 10% compared to the last release.

Parseable setup

Parseable v1.4.0 was deployed on a Kubernetes cluster in distributed mode. We set up 3 ingest nodes and 1 query node in the Parseable cluster. Each Parseable pod was allocated 2 vCPU and 4 GiB memory. We also deployed each pod on a separate node to avoid resource contention.

Load generation

We use K6 on Kubernetes for load testing. K6 is a modern load testing tool that lets you write test scripts in JavaScript. For simpler deployment and easier scale-out, we used the K6 Operator. Refer to the steps we followed to set up K6 on Kubernetes in this blog post.

The load testing script is available in the Quest repository.

Results

Test Run 1: 1 query node, 3 ingestor nodes; 2 vCPU and 4 GiB memory per node; 15 k6 clients ingesting data; 300 batches per HTTP request; run time: 10 minutes.


Test Run 2: 1 query node, 3 ingestor nodes; 3 vCPU and 4 GiB memory per node; 15 k6 clients ingesting data; 525 batches per HTTP request; run time: 10 minutes.


Note: We're hard at work on a better, standardized load test for Parseable, and we will share the results in upcoming release notes. We'll also add query performance results in future releases for a fuller overview.

Five Drawbacks of CloudWatch - How to Switch to Parseable

· 5 min read
Shivam Soni
Guest Author

AWS CloudWatch is a popular choice for log management and monitoring, particularly for those deeply integrated into the AWS ecosystem. However, despite its widespread use, several drawbacks make it less appealing for specific applications, especially those requiring flexibility, cost-efficiency, and high customizability.

In this article, we'll consider when to use AWS CloudWatch versus Parseable and explain how to make the switch to Parseable.

Optimize Data Transfer from Parseable with Apache Arrow Flight

· 6 min read
Nikhil Sinha
Head of Engineering @ Parseable

Written in Rust, Parseable leverages Apache Arrow and Parquet as its underlying data structures, offering high throughput and low latency without the overhead of traditional indexing methods. This makes it an ideal solution for environments that require efficient log management, whether deployed on public or private clouds, containers, VMs, or bare metal environments. This guide will delve into the integration of Arrow Flight with Parseable, providing a comprehensive setup for your client.

How to monitor your Parseable metadata in a Grafana dashboard

· 5 min read

As Parseable deployments in the wild are handling larger and larger volumes of logs, we needed a way to enable users to monitor their Parseable instances.

Typically this would mean setting up Prometheus to capture Parseable ingest and query node metrics and visualize those metrics on a Grafana dashboard. We added Prometheus metrics support in Parseable to enable this use case.

But we wanted a simpler, self-contained approach that allows users to monitor their Parseable instances without needing to set up Prometheus.

This led us to figure out a way to store the Parseable server's internal metrics in a special log stream called pmeta. This stream keeps track of important information about all of the ingestors in the cluster, including the URL of each ingestor, its commit ID, the number of events it has processed, and its staging file location and size.
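As a sketch, you could pull these metrics with a SQL query against the pmeta stream over Parseable's query API; the endpoint, credentials, and time range below are placeholders:

```bash
# Query the pmeta stream; replace host, credentials, and the time window
# with your own values.
curl -u admin:admin \
  -H 'Content-Type: application/json' \
  -X POST 'http://localhost:8000/api/v1/query' \
  -d '{
    "query": "select * from pmeta",
    "startTime": "2024-10-01T00:00:00Z",
    "endTime": "2024-10-01T01:00:00Z"
  }'
```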

Load testing Parseable with K6

· 6 min read

Integrating K6 with Kubernetes allows developers to run load tests in a scalable and distributed manner. By deploying K6 in a Kubernetes cluster, you can use Kubernetes orchestration capabilities to manage and distribute the load testing across multiple nodes. This setup ensures you can simulate real-world traffic and usage patterns more accurately, providing deeper insights into your application's performance under stress.

Parseable Release v1.3.0

· 8 min read

Parseable v1.3.0 is out now! This release includes a good mix of new features, improvements, and bug fixes. In this post, we'll take a detailed look at what's new in this release.

We'd love to hear your feedback on this release. Please feel free to create an issue on our GitHub or join our Slack community.

New Features

Saved filters and queries in the explore logs view

A long-pending request from our community has been the ability to save a filter in order to return at a later date to a specific view without having to re-apply the filter from scratch. The same goes for queries.

We initially considered implementing this as a purely client-side feature, i.e. on the Console only, to deliver it more quickly. The idea was to use the browser's local store to keep a saved filter’s details and then load it from there on demand. But this approach would have been too limiting; for instance, the same user would not have been able to see their saved filters when logging in from a different browser or IP address. Also, sharing filters across users would not work and any browser event that cleared local storage would essentially mean the loss of all the saved filters, many of which are carefully created after months of analysis.

Build a robust logging system with Temporal and Parseable

· 6 min read

Temporal is a leading workflow orchestration platform. It provides a robust, durable execution platform with guarantees on workflow execution, state management, and error handling.

One of the key aspects of a production-grade application is the ability to reliably log and monitor workflow execution. The log and event data can be used for debugging, auditing, custom behavior analysis, and more. Temporal applications are no different. Temporal provides logging capabilities, but it can be challenging to manage and analyze logs at scale.

In this post, we'll see how to extend the default Temporal logging to ship logs to a Parseable instance. By integrating Parseable with Temporal you can:

  • Ingest workflow logs to create a centralized data repository.
  • Correlate events, errors, and activities across your workflows.
  • Analyze and query logs in Parseable for debugging and monitoring.
  • Set up reliable audit trails for your workflows using Parseable.
  • Set up alerts and notifications in Parseable for critical events in your workflows.

and much more.
