Tata 1mg’s Transformative Observability Journey with Kloudfuse

Executive Summary

Tata 1mg, a leading digital healthcare platform, faced challenges with fragmented observability tools. By adopting Kloudfuse, the company accelerated troubleshooting, reduced costs, and saw the platform widely adopted across its engineering and DevOps teams.

  • 40% Cost Reduction Despite Doubling Data Volume

  • Broad Adoption by 300+ Engineers and DevOps

  • Faster Troubleshooting With Improved Incident Response

About Tata 1mg

Tata 1mg is a leading digital healthcare platform in India, offering a comprehensive range of services including e-pharmacy, diagnostics, e-consultations, B2B solutions, and retail operations. Known for its engineering-driven approach, 1mg is committed to building robust systems that enhance the overall healthcare experience for consumers and businesses alike.

Leadership Perspective

Pankaj Pandey, Director of Engineering at Tata 1mg, emphasized the importance of observability in enhancing system reliability and performance. His responsibilities spanned platform architecture, system design, DevOps, supply chain engineering, and security, among others. Pankaj noted that the lack of a unified observability solution hindered the team's ability to troubleshoot issues efficiently and maintain system integrity.

“Kloudfuse has transformed the way we monitor and manage our infrastructure and applications. It has not only enhanced our operational efficiency but also empowered our teams to proactively address issues before they impact our customers. Kloudfuse 3.0 introduces powerful new features that will undoubtedly be a game changer for our organization.”

Pankaj Pandey

Director of Engineering, Tata 1mg

Observability Infrastructure Prior to Kloudfuse

Tata 1mg employed a custom-built observability stack that included:

  • Elastic APM: For application performance monitoring.

  • Fluentd and Kibana: For log aggregation and visualization.

  • AWS CloudWatch: For monitoring infrastructure.

This setup proved to be complex and costly, with maintenance burdens that impacted the team’s efficiency.

Challenges

Tata 1mg runs a microservices architecture in a heterogeneous environment that uses various programming languages and frameworks. This complex and diverse technology stack created several challenges. Key issues included:

  1. Complex Monitoring Needs: The complexity of multiple microservices made it difficult to monitor system performance across different environments.

  2. Lack of Unified Insights: Observability data was scattered across various open-source tools, making it difficult to correlate events and gain deep insights into system performance.

  3. High Maintenance Costs: The use of open-source observability tools like Elastic Kibana and Logstash resulted in high infrastructure and maintenance costs, despite the licensing being free. Keeping these tools up to date, managing upgrades, and ensuring compatibility among various frameworks and versions created a significant operational burden.

  4. Limited Insights and Visibility: Existing observability tools lacked the granular insights needed for effective monitoring in Kubernetes, making it difficult to identify issues like pods reaching 100% CPU usage and causing request timeouts. Log analysis was also limited, hindering comprehensive understanding.
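
The CPU saturation case in item 4 can be illustrated with a minimal sketch. It assumes per-pod CPU usage and CPU limits are already available in millicores; the `PodCpuSample` type, the pod names, and the 0.95 threshold are hypothetical and are not taken from Tata 1mg's setup or from any Kloudfuse or Kubernetes API.

```python
from dataclasses import dataclass


@dataclass
class PodCpuSample:
    """One CPU reading for a pod (hypothetical shape, not a real Kubernetes type)."""
    pod: str
    usage_millicores: float
    limit_millicores: float


def saturated_pods(samples, threshold=0.95):
    """Return (pod, utilization) pairs where usage / limit >= threshold,
    sorted with the highest utilization first."""
    flagged = []
    for s in samples:
        if s.limit_millicores <= 0:
            continue  # no CPU limit set, so utilization is undefined
        utilization = s.usage_millicores / s.limit_millicores
        if utilization >= threshold:
            flagged.append((s.pod, round(utilization, 2)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Illustrative readings only; the pod names are made up.
samples = [
    PodCpuSample("checkout-7d9f", 1000, 1000),
    PodCpuSample("search-5c2a", 300, 1000),
    PodCpuSample("cart-9b1e", 480, 500),
]
print(saturated_pods(samples))
# [('checkout-7d9f', 1.0), ('cart-9b1e', 0.96)]
```

A pod pinned at its CPU limit is throttled, which is one plausible route to the request timeouts described above, so a check of this kind is usually paired with latency or error-rate data rather than used alone.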

Solution: Transition to Kloudfuse

Tata 1mg turned to Kloudfuse to address these challenges. The implementation of Kloudfuse offered several key benefits:

  1. Unified Observability: Kloudfuse provided a centralized platform that consolidated observability across metrics, events, logs, and traces, allowing over 300 Tata 1mg engineers and DevOps staff to monitor and troubleshoot in one place.

  2. Cost Efficiency: The implementation resulted in a remarkable 40% reduction in overall costs, even as the volume of monitored data doubled.

  3. Enhanced Troubleshooting: With Kloudfuse, the engineering team established real-time alerts for various metrics, which greatly improved incident response times and reduced downtime. This increased responsiveness helped maintain a stable Mean Time to Recovery (MTTR) and allowed the team to identify issues more quickly, even with added complexity.

  4. Standardized Logging and Traceability: Kloudfuse enabled Tata 1mg to establish a consistent log message format across all services, simplifying the management of diverse log formats in their expanding technology stack.

  5. Deeper Insights: The new observability platform provided granular visibility into Kubernetes events and pod performance, empowering the team to proactively address performance bottlenecks.
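
As an illustration of the consistent log message format mentioned in item 4, the sketch below emits every log line as a JSON object with the same fields, using Python's standard `logging` module. The field names (`timestamp`, `level`, `service`, `logger`, `message`, `trace_id`) and the service name are assumptions chosen for the example; the case study does not describe the actual format Tata 1mg adopted.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render every record as one JSON object with a fixed set of fields.
    The field names are illustrative assumptions, not a documented schema."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "service": self.service,
            "logger": record.name,
            "message": record.getMessage(),
            # trace_id is optional; pass it via `extra` to link logs to traces
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(payload)


def build_logger(service):
    """Create a logger whose output always uses JsonFormatter."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines from the root logger
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter(service))
        logger.addHandler(handler)
    return logger


# Hypothetical service name and trace id.
logger = build_logger("order-service")
logger.info("payment timed out", extra={"trace_id": "abc123"})
```

Because every service writes the same fields, a log platform can filter on `service` or join on `trace_id` without per-service parsing rules.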

"Embracing Kloudfuse’s unified observability has been transformative for us. It not only streamlined our monitoring processes but also empowered our team to gain deeper insights into system performance, ultimately enhancing our ability to respond to challenges in real-time."

Pankaj Pandey

Director of Engineering, Tata 1mg

Conclusion

Tata 1mg’s partnership with Kloudfuse marks a significant step forward in its observability journey. By transitioning to a unified platform, the company has not only reduced its operational costs but also improved its ability to monitor, analyze, and respond to system performance issues in real time. As Tata 1mg continues to scale its operations, the insights gained from Kloudfuse will be pivotal in maintaining a high-performance digital healthcare environment.

All Rights Reserved ® Kloudfuse 2024
