Tata 1mg, a leading digital healthcare platform, faced challenges with fragmented observability tools. By adopting Kloudfuse, they accelerated troubleshooting, reduced costs, and saw widespread adoption across their engineering and DevOps teams.
40%: Cost reduction despite doubling data volume
300+: Engineers and DevOps team members using the platform
Faster troubleshooting with improved incident response
Tata 1mg is a leading digital healthcare platform in India, offering a comprehensive range of services including e-pharmacy, diagnostics, e-consultations, B2B solutions, and retail operations. Known for its engineering-driven approach, 1mg is committed to building robust systems that enhance the overall healthcare experience for consumers and businesses alike.
Pankaj Pandey, Director of Engineering at Tata 1mg, emphasized the importance of observability in enhancing system reliability and performance. His responsibilities span platform architecture, system design, DevOps, supply chain engineering, and security, among other areas. Pankaj noted that the lack of a unified observability solution hindered the team's ability to troubleshoot issues efficiently and maintain system integrity.
Pankaj Pandey
Director of Engineering, Tata 1mg
Tata 1mg previously relied on a custom-built observability stack that included:
Elastic APM: For application performance monitoring.
Fluentd and Kibana: For log aggregation and visualization.
AWS CloudWatch: For infrastructure monitoring.
This setup proved to be complex and costly, with maintenance burdens that impacted the team’s efficiency.
Tata 1mg runs a microservices architecture in a heterogeneous environment that spans multiple programming languages and frameworks. This complex, diverse technology stack created several challenges:
Complex Monitoring Needs: With many microservices running across different environments, it was difficult to monitor overall system performance.
Lack of Unified Insights: Observability data was scattered across several open-source tools, which made it hard to correlate events and gain deep insight into system performance.
High Maintenance Costs: Although open-source observability tools such as Kibana and Logstash were free to license, they incurred high infrastructure and maintenance costs. Keeping these tools up to date, managing upgrades, and ensuring compatibility across frameworks and versions created a significant operational burden.
Limited Insights and Visibility: The existing tools lacked the granular insights needed to monitor Kubernetes effectively, making it difficult to identify issues such as pods reaching 100% CPU usage and causing request timeouts. Log analysis capabilities were also limited, which made it harder to build a complete picture of system behavior.
Tata 1mg turned to Kloudfuse to address these challenges. Implementing Kloudfuse delivered several key benefits:
Unified Observability: Kloudfuse provided a centralized platform that consolidated metrics, events, logs, and traces, allowing more than 300 Tata 1mg engineers and DevOps team members to monitor and troubleshoot in one place.
Cost Efficiency: The implementation resulted in a remarkable 40% reduction in overall costs, even as the volume of monitored data doubled.
Enhanced Troubleshooting: With Kloudfuse, the engineering team set up real-time alerts on a range of metrics, which greatly improved incident response times and reduced downtime. This faster response helped keep Mean Time to Recovery (MTTR) stable and allowed the team to identify issues more quickly, even as system complexity grew.
Standardized Logging and Traceability: Kloudfuse enabled Tata 1mg to establish a consistent log message format across all services, simplifying the management of diverse log formats in their expanding technology stack.
Deeper Insights: The new observability platform provided granular visibility into Kubernetes events and pod performance, empowering the team to proactively address performance bottlenecks.
Tata 1mg’s partnership with Kloudfuse marks a significant step forward in their observability journey. By moving to a unified platform, they have not only reduced operational costs but also improved their ability to monitor, analyze, and respond to system performance issues in real time. As Tata 1mg continues to scale its operations, the insights gained from Kloudfuse will be pivotal in maintaining a high-performance digital healthcare environment.