By Ashish Hanwadikar and Harold Lim
Published on Jul 23, 2024
Welcome to our blog series on high cardinality data in observability! In the upcoming posts, we'll dig into the details of handling and analyzing high cardinality data, examine its impact on monitoring and troubleshooting, and discuss effective strategies for managing it to leverage its full potential. Throughout this series, we'll cover a range of topics including:
Defining High Cardinality Data: What is high cardinality, and why has it gained importance recently?
Advantages and Challenges: The benefits and drawbacks of high cardinality data in observability.
Cardinality in Metrics: Common pitfalls in high cardinality metrics and practical examples to help you build the right strategy for managing them.
Cardinality in Events, Logs, and Traces: Real-world scenarios illustrating how cardinality impacts various types of observability data.
Data Analysis Techniques: Techniques for reducing and shaping cardinality to improve data analysis.
Efficient Storage Strategies: Indexing strategies for efficiently storing high cardinality data.
To kick things off, our first post will tackle the basics: defining high cardinality data and exploring why it has risen to prominence in recent years. Stay tuned for insights that will enhance your understanding of this critical aspect of observability!
High Cardinality: What is It?
The dimensionality of observability data is the number of distinct attributes it has; a high-dimensionality dataset has many such attributes. Observability datasets commonly carry hundreds of dimensions so that every possible facet of the data can be explored.
The cardinality of a data attribute refers to the number of distinct values it can take. High cardinality occurs when individual attributes within a dataset have a large number of unique values, when combinations of dimensions lead to many unique permutations, or both.
For example, in a dataset tracking website visits, the combination of "email ID", "user ID", and "page ID" can lead to thousands or even millions of unique combinations.
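To illustrate, here is a minimal Python sketch of measuring cardinality; the field names (email_id, user_id, page_id) and the sample records are hypothetical, chosen only to mirror the example above:

```python
# Hypothetical website-visit records mirroring the example above.
visits = [
    {"email_id": "a@example.com", "user_id": "u1", "page_id": "home"},
    {"email_id": "a@example.com", "user_id": "u1", "page_id": "pricing"},
    {"email_id": "b@example.com", "user_id": "u2", "page_id": "home"},
]

# Cardinality of a single attribute: the number of distinct values it takes.
page_cardinality = len({v["page_id"] for v in visits})

# Cardinality of a combination of attributes: the number of distinct tuples.
combo_cardinality = len({(v["email_id"], v["user_id"], v["page_id"]) for v in visits})

print(page_cardinality)   # 2 distinct pages
print(combo_cardinality)  # 3 distinct (email, user, page) combinations
```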
Consider a microservices setup where monitoring CPU usage involves attributes such as service name, namespace, and Kubernetes cluster name. This granularity can extend further to nodes within each cluster and pods within each node. For instance, with 100 services distributed among 10 Kubernetes clusters, each cluster having 500 nodes, 3 namespaces, and 100 pods per node, the cardinality would reach 150 million unique combinations.
Diagram Illustrating How Monitoring Distributed Services Across Clusters, Nodes, Namespaces, and Pods Can Grow Exponentially
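As a quick sanity check on that figure, the worst-case cardinality is simply the product of the number of distinct values in each dimension; the counts below are the ones assumed in the example above:

```python
# Worst-case cardinality is the product of distinct values per dimension.
services   = 100
clusters   = 10
nodes      = 500  # per cluster
namespaces = 3
pods       = 100  # per node

unique_combinations = services * clusters * nodes * namespaces * pods
print(f"{unique_combinations:,}")  # 150,000,000
```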
This level of detail is crucial for identifying issues within microservice architectures, allowing precise analysis of performance at the individual service, cluster, node, and pod levels.
However, as demonstrated, cardinality can rapidly escalate in such complex environments, presenting challenges in data ingestion, storage, processing, and analysis. In this blog series, we explore the significance of high cardinality in observability, discussing its advantages and drawbacks. Additionally, we propose strategies for effectively managing these datasets to optimize diagnostic costs in cloud-native environments.
Why Has Cardinality Emerged as a Critical Topic in Observability?
In cloud-native distributed systems, substantial amounts of fine-grained data are continuously generated. High-cardinality data plays a crucial role in observability and troubleshooting, and when combined with distributed tracing and service level objectives, it drives the key metrics defined by DORA, such as faster Mean Time to Recovery (MTTR) and Mean Time to Detection (MTTD). The significance of cardinality in observability has increased in recent years, largely due to the distinctive characteristics of microservices- and serverless-based cloud-native applications and the opportunities and costs they entail.
Dynamic Microservices Environments
In dynamic microservices environments, the increase in cardinality is driven by the nature of cloud-native distributed systems and Kubernetes. As demonstrated in the introduction above, these systems host a multitude of resources such as servers, containers, and microservices, each of which can be characterized by diverse attributes or tags. The dimensionality of the observability dataset emitted from such microservices, and the Cartesian product of its attribute values, add a further multiplier to the cardinality explosion. Managing and tracking each combination can rapidly strain the observability system, potentially causing scale issues and performance degradation.
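To make this concrete, here is a minimal sketch using the Python prometheus_client library; the metric name and label values are hypothetical. Each distinct combination of label values materializes a separate time series under the same metric name, so the series count grows with the Cartesian product of the label cardinalities:

```python
from prometheus_client import Gauge

# Hypothetical CPU-usage metric with three labels (dimensions).
cpu_usage = Gauge(
    "container_cpu_usage_ratio",
    "CPU usage as a fraction of the container limit",
    ["service", "namespace", "pod"],
)

# Each distinct (service, namespace, pod) combination is its own time series:
# two label sets below -> two series the backend must ingest, index, and store.
cpu_usage.labels(service="checkout", namespace="prod", pod="checkout-7f9c").set(0.42)
cpu_usage.labels(service="checkout", namespace="prod", pod="checkout-8d1a").set(0.37)
```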
The Shift from Monitoring to Observability
While earlier tools focused on monitoring, requiring prior knowledge of the signals to watch (the "known unknowns"), observability is focused on discovering "unknown unknowns" in real time. The detailed information available in high cardinality data provides a magnifying glass into these unknown unknowns, making it possible to examine anomalies in the dataset, outlier events, and correlated dimensions that can significantly influence the reliability, availability, serviceability (RAS), and performance of applications and infrastructure.
Cost and Scalability Considerations
To help improve RAS and analyze performance, organizations have embraced high cardinality signals. This requires observability systems to adopt new ways of collecting, storing, and analyzing this data to fully leverage its benefits. Storing and analyzing high-cardinality data can be resource-intensive and expensive, so understanding and managing cardinality has become essential to keeping observability systems scalable while optimizing costs.
In the upcoming blog post, we'll explore the costs and benefits of high cardinality data and provide practical examples and strategies for effectively managing high cardinality across these various observability data streams.
Stay tuned!