High Cardinality Blog Series - Part 2

Benefits and Costs of High Cardinality Data in Observability

By Pankaj Thakkar and Ashish Hanwadikar

Published on Jul 26, 2024

High cardinality data plays a pivotal role in enhancing observability by providing detailed insights into system performance and user behavior. Its ability to offer granularity across various attributes allows for thorough analysis, slicing, and dicing of data, enabling organizations to delve deep into their operational metrics and event logs. In this blog, we'll explore the multifaceted advantages of high cardinality data, along with the challenges it presents in terms of costs and resource management within observability systems. 

Benefits of High Cardinality

The benefits of high cardinality in observability are multifaceted and pivotal for gaining comprehensive insights into system performance and user behavior. Here are some key advantages:


  1. Granular Analysis

    High cardinality enables detailed granularity across multiple dimensions, supporting comprehensive analysis, segmentation, and slicing and dicing of data based on various attributes. For example, in analyzing application performance, trace or span events capture detailed information about individual transactions or requests within a distributed system. High cardinality attributes such as service name, endpoint, user ID, and operation type provide fine-grained dimensions for analyzing the performance and behavior of the system, revealing the various factors influencing its operation. Capturing high cardinality attributes also uncovers unexpected issues (aka "unknown unknowns") during troubleshooting, aiding the identification and resolution of complex problems.


  2. Precise Troubleshooting 

    With high cardinality data, it's easier to pinpoint the root cause of issues or anomalies. High cardinality enables correlation and contextual analysis by incorporating multiple dimensions into the investigation process. Contextual analysis involves examining events within their broader context to understand the sequence of events leading up to a particular outcome or behavior. 

    For example, consider this trace data:


    {traceID="3ca7188ed2774f2e", spanID="7b3f080f9c16292d", service="order-service", operation="validateOrder", duration="50ms", httpMethod="POST", url="/order/submit", statusCode="200", customerID="xyz789", paymentMode="credit_card"}


    Using this data, when investigating a slowdown in duration, you can analyze the trace or span events by attributes such as service, operation, or customerID to identify the specific service, operation, or user responsible for the degradation. Additionally, you can correlate this span with others using the traceID to see how the request moved through the distributed system and where it encountered bottlenecks. A minimal sketch of this attribute-based slicing appears after this list.


  3. Predictability of Future Behavior 

    High cardinality data, with its diverse range of distinct values, enhances the precision of statistical and machine learning models, especially in recommendation systems and forecasting applications. By mitigating the risk of overgeneralization, it generates insights grounded in a richer and more diverse dataset. For example, high cardinality data regarding user interactions with a web application can lead to more accurate predictions of future system behavior and potential bottlenecks during certain timeframes. 
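
To make the slicing and dicing concrete, here is a minimal Python sketch that groups span records like the one above by arbitrary attribute combinations. The record shape and field names (duration_ms and so on) are illustrative assumptions, not the schema of any particular tracing backend:

    from collections import defaultdict

    # Hypothetical span records modeled on the trace data example above.
    spans = [
        {"traceID": "3ca7188ed2774f2e", "service": "order-service",
         "operation": "validateOrder", "customerID": "xyz789", "duration_ms": 50},
        {"traceID": "9f1c2ab4d77e01aa", "service": "order-service",
         "operation": "validateOrder", "customerID": "abc123", "duration_ms": 480},
        {"traceID": "5d0e7c91b3a2f4ce", "service": "payment-service",
         "operation": "chargeCard", "customerID": "abc123", "duration_ms": 120},
    ]

    def slice_by(spans, *keys):
        """Group spans by the given attributes; report count and max duration."""
        groups = defaultdict(list)
        for span in spans:
            groups[tuple(span[k] for k in keys)].append(span["duration_ms"])
        return {g: {"count": len(d), "max_ms": max(d)} for g, d in groups.items()}

    # Slice by coarse dimensions first to locate the slow path...
    print(slice_by(spans, "service", "operation"))
    # ...then add a high cardinality attribute to find the user affected.
    print(slice_by(spans, "service", "operation", "customerID"))

Each extra attribute multiplies the number of possible groups, which is exactly where the cost discussion in the next section picks up.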

While high cardinality data presents numerous benefits, its predominant challenge revolves around elevated costs. If the observability system is not optimized effectively, expenses related to data ingestion, analysis, and storage can escalate significantly. These aspects will be examined in the next section.

Cost of High Cardinality 

High cardinality can lead to substantial costs, especially for metrics, because metric series are emitted or scraped continuously at regular intervals, producing large data volumes. Even when metric values remain unchanged, it is sometimes essential to keep recording them, because the absence of a series can itself convey important insights or signal an expected state.
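
To see why volumes grow so quickly, note that the number of unique series for a metric is, in the worst case, the product of the cardinalities of its labels. A minimal Python sketch with made-up label counts (the numbers are illustrative assumptions, not figures from a real deployment):

    import math

    # Hypothetical per-label cardinalities for a single request-latency metric.
    label_cardinalities = {
        "service": 10,
        "endpoint": 50,
        "status_code": 5,
        "customer_id": 10_000,  # one high cardinality label dominates
    }

    # Worst case: every label combination is observed at least once.
    series = math.prod(label_cardinalities.values())
    print(f"unique series: {series:,}")  # 25,000,000

    # Each series is scraped on an interval, so samples accrue continuously.
    scrape_interval_s = 15
    samples_per_day = series * (86_400 // scrape_interval_s)
    print(f"samples per day: {samples_per_day:,}")  # 144,000,000,000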

The cost implications of high cardinality data need to be addressed across various aspects of observability systems:


  • Edge Costs

    High cardinality metrics increase CPU, memory, and resource usage in both the originating application and collection agents (e.g., Prometheus, Datadog). In edge applications, relying on events or logs rather than high cardinality metrics also requires more buffer space and resources.


  • Network Egress Costs

    High cardinality metrics lead to higher egress costs due to the data transfer across the network to observability platforms. These costs are particularly elevated when data is sent to SaaS observability vendors, as opposed to hosting observability in a private cloud environment and managing internal data transfers.


  • Data Ingestion Costs

    After the data reaches the observability platform, high cardinality directly impacts the cost of ingestion. As the number of data points increases, so does the demand on CPU and memory during ingestion. Scaling up resources becomes essential to handle this increased workload efficiently.


  • Query Costs

    High cardinality can pose challenges at query time, particularly when performing aggregations or transformations that must process a large number of individual data points to compute their results.


  • Storage Costs

    Storage concerns arise because high cardinality increases the number of unique data points that need to be stored. Each unique value requires additional storage space, especially if the system needs to maintain indexes or other data structures for efficient querying.
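
As a rough illustration of the storage math, continuing the 25-million-series example above (the bytes-per-sample figure is an assumption for illustration; real compressed TSDB encodings vary):

    # Rough storage estimate; indexes and other structures are extra.
    series = 25_000_000
    samples_per_day = series * (86_400 // 15)   # 15s scrape interval
    bytes_per_sample = 2                        # assumed compressed size
    retention_days = 30

    storage_bytes = samples_per_day * bytes_per_sample * retention_days
    print(f"~{storage_bytes / 1e12:.1f} TB for {retention_days} days")  # ~8.6 TB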

Because high cardinality demands more computational and storage resources during data ingestion, processing, query execution, and storage, these costs compound across the pipeline. However, with proper strategies in place, organizations can manage them effectively.

We will discuss these strategies in the next posts in this series.

Observe. Analyze. Automate