Housing.com Replaces ELK Stack with Last9, Eliminating Infra Overhead While Cutting Costs

  • Real Estate Technology
  • 200 engineers
  • APAC
  • Amazon Web Services

Housing.com operates one of India’s leading real estate platforms, serving millions of users with property listings, virtual tours, and transaction services. With 200 engineers managing a complex microservices architecture on AWS, maintaining system reliability and debugging production issues quickly is critical to user experience.

The platform processes 2TB of daily logs across 20+ technology pods, each with different log formats. This diversity, combined with unpredictable traffic spikes from sources like Google’s web crawler, created significant observability challenges that consumed valuable engineering resources.

Growing Pains

High Operational Overhead

A self-managed ELK cluster with 8 data nodes, 30 log fleet servers, and Kafka brokers took up one engineer's dedicated time every month for maintenance

Fragile Log Pipeline

Grok filters failed whenever developers changed log formats, causing missing logs and pipeline breakages

Inadequate Retention

3-4 day retention meant logs were gone by the time teams completed immediate fixes and could perform detailed RCA

Scaling Challenges

Infrastructure couldn’t handle unpredictable traffic spikes, causing log lag and missing data during critical periods

Housing’s self-managed Elasticsearch cluster struggled with the complexity of standardizing logs across multiple business units and technology stacks. The team spent significant effort managing Grok filters at the Logstash level to parse raw application logs from diverse sources.

When developers changed log formats — adding new fields or modifying existing ones — the pipeline would break unless the team manually updated Grok patterns. This created a constant firefighting cycle. Over four years, Housing revamped their internal stack 2-3 times, each time encountering new scaling and reliability challenges.

Last9 allowed us to offload the operational overhead of scaling so we could focus on business alerting rather than infrastructure management. Earlier, one manpower monthly was dedicatedly working on our ELK setup, and my team was quite occupied with firefighting the self-managed stack.

Upendra Singh

Associate Director DevOps, Housing.com

The 3-4 day retention window created a particularly painful problem: teams would fix production issues first (the priority), but by the time engineers were freed up to conduct thorough root cause analysis, the relevant logs were already deleted. This incomplete evidence trail reduced RCA completion rates and prevented the team from identifying underlying systemic issues.

The Last9 Advantage

Managed Infrastructure

Eliminated dedicated engineering time for ELK maintenance and scaling

Dynamic Remapping

Regex-based log parsing with updates reflected in 10-15 minutes without service restarts

Extended Retention

14-day retention within budget enables complete RCA with historical correlation

Built-in Alerting

Replaced custom scripts with scheduled searches and Slack integration for anomaly detection

Housing evaluated multiple managed observability solutions but chose Last9 for its combination of cost-effectiveness, technical capabilities, and engineering-focused support. The support quality was a decisive factor — Housing needed a vendor that understood complex log aggregation requirements and could provide hands-on consultation rather than just documentation.

Last9’s remapping feature replaced Housing’s complex Grok filter management with a more maintainable approach. The team uses regex-based parsing with Last9’s testing tool to quickly validate patterns before deployment. If remapping fails, logs remain available in raw format rather than being lost entirely — a critical safety net that didn’t exist with their ELK setup.
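
The raw-format fallback described above can be illustrated with a minimal Python sketch. This is not Last9's implementation or configuration syntax; the log layout, the pattern, and the `remap` function are assumptions made up for illustration. It only shows the general idea: when a regex stops matching because a log format changed, the line is kept unparsed instead of being dropped.

```python
import re

# Hypothetical log layout: "<timestamp> <LEVEL> <app_name> <message>".
# Housing.com's real formats are not given in the case study.
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<app_name>[\w-]+) (?P<message>.*)$"
)


def remap(raw_line: str) -> dict:
    """Extract fields when the pattern matches; otherwise keep the raw line."""
    match = LINE_PATTERN.match(raw_line)
    if match is None:
        # Fallback: the log is still stored and searchable as raw text.
        return {"raw": raw_line, "parsed": False}
    return {"raw": raw_line, "parsed": True, **match.groupdict()}


# A line in the expected format is parsed into fields.
ok = remap("2025-01-01T10:00:00Z ERROR search-api upstream timeout")

# A developer switches the service to JSON logs; the pattern no longer
# matches, but the record survives in raw form.
changed = remap('{"ts": "2025-01-01T10:00:00Z", "msg": "new JSON format"}')
```

In this sketch a format change degrades the record to unparsed text rather than breaking ingestion, which is the property the team found missing in their Grok-based pipeline.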

The migration involved close collaboration between Last9’s success engineering team and Housing’s DevOps engineers. Multiple deep-dive sessions helped transition Housing’s team from four years of ELK patterns to Last9’s different visualization and querying approach.

Industry’s Most Complex Remapping Rules

Housing has one of the most complex remapping rule sets. This complexity stems from managing multiple business units with different logging mechanisms and no standardization across teams.

The remapping configuration handles:

  • Service name standardization across inconsistent naming patterns
  • URL extraction for API performance tracking in their tech scorecard
  • Log routing between business units (Housing vs PropTiger) using field attributes
  • Dynamic URL pattern aggregation using regex transformation to normalize URLs containing UUIDs and project IDs
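
As a rough illustration of the URL aggregation and routing items above, the following Python sketch collapses UUIDs and numeric IDs in a URL path into placeholders and picks a destination from a field attribute. The placeholder tokens, the `business_unit` field name, and both function names are assumptions; the case study does not publish Housing's actual rules.

```python
import re

# Standard 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
# A path segment made only of digits, e.g. a project ID.
NUMERIC_ID_RE = re.compile(r"(?<=/)\d+(?=/|$)")


def normalize_url_path(path: str) -> str:
    """Replace UUIDs and numeric IDs so equivalent URLs aggregate together."""
    path = path.split("?", 1)[0]  # drop the query string
    path = UUID_RE.sub("{uuid}", path)
    return NUMERIC_ID_RE.sub("{id}", path)


def route(record: dict) -> str:
    """Pick a destination from a (hypothetical) business_unit attribute."""
    return "proptiger" if record.get("business_unit") == "proptiger" else "housing"


normalized = normalize_url_path(
    "/api/v1/projects/48213/units/3f2b8c1e-9d4a-4e6f-8a1b-2c3d4e5f6a7b?src=web"
)
# normalized == "/api/v1/projects/{id}/units/{uuid}"
```

Normalizing before aggregation keeps the number of distinct URL patterns small, so per-endpoint metrics are not fragmented across thousands of one-off paths.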

For their Databricks integration powering tech scorecard analytics, Housing needed aggregated 5xx error counts and URL analysis. The team switched the integration's data source from ELK to Last9, extracting fields such as URL, URL_path, and app_name to track service-to-service communication.
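
The kind of aggregation this integration relies on can be sketched in a few lines of Python. The `app_name` and `URL_path` field names come from the text above; the `status` field, the sample records, and the `count_5xx` function are assumptions for illustration, not the team's actual Databricks job.

```python
from collections import Counter


def count_5xx(records):
    """Count server errors per (app_name, URL_path) pair."""
    counts = Counter()
    for record in records:
        # "status" is a hypothetical field holding the HTTP status code.
        if 500 <= int(record["status"]) <= 599:
            counts[(record["app_name"], record["URL_path"])] += 1
    return counts


# Made-up sample records, not real Housing.com data.
sample = [
    {"app_name": "search-api", "URL_path": "/api/v1/search", "status": 502},
    {"app_name": "search-api", "URL_path": "/api/v1/search", "status": 200},
    {"app_name": "search-api", "URL_path": "/api/v1/search", "status": 500},
    {"app_name": "listing-api", "URL_path": "/api/v1/projects/{id}", "status": "503"},
]
errors = count_5xx(sample)
```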

Last9’s remapping handles log parsing dynamically — no service restarts when patterns change. With ELK, grok filter changes required pipeline restarts and risked losing logs entirely.

Ashish Mishra

DevOps Engineer, Housing.com

This standardization happens at ingestion without touching application code, and changes take effect within 10-15 minutes. Their ELK setup, by contrast, required modifying Grok patterns and restarting services, which often broke the entire pipeline.

Built-in Alerting Replaces Custom Scripts

Housing previously ran auxiliary services that queried ELK data and sent alerts to stakeholders — additional infrastructure they had to maintain. Last9’s alerting capabilities eliminated this overhead entirely.

The team now uses:

  • Scheduled searches for log pattern monitoring
  • Service-level alerts with customizable thresholds
  • No-data scenario detection to catch missing log indexes
  • Slack integration for immediate team notification

All alert configuration happens within Last9 without external scripts or coordination with additional services.
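
To make the alert types above concrete, here is a minimal Python sketch of how one scheduled-search window might be evaluated. It is not Last9's alerting engine or API: the state names, `evaluate_window`, and `slack_payload` are assumptions. The only external detail relied on is that Slack incoming webhooks accept a JSON body with a `text` field; the sketch builds that body without sending it.

```python
def evaluate_window(total_logs: int, matching_logs: int, threshold: int) -> str:
    """Classify one scheduled-search window.

    total_logs    -- all logs received from the service in the window
    matching_logs -- logs that matched the search pattern
    threshold     -- matches above this count raise the alert
    """
    if total_logs == 0:
        # Nothing arrived at all: likely a broken shipper or missing index.
        return "no_data"
    if matching_logs > threshold:
        return "firing"
    return "ok"


def slack_payload(alert_name: str, state: str) -> dict:
    """Build the JSON body for a Slack incoming webhook (not sent here)."""
    return {"text": f"[{state}] {alert_name}"}


state = evaluate_window(total_logs=0, matching_logs=0, threshold=10)
message = slack_payload("payment-service 5xx pattern", state)
```

Treating "no data" as its own state, rather than as zero matches, is what lets a silent pipeline failure surface as an alert instead of looking like a healthy service.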

Key Results

Zero Infrastructure Toil

Eliminated monthly dedicated engineering time previously spent maintaining ELK, Kafka, and Logstash

Higher RCA Completion Rate

14-day retention enables teams to perform detailed root cause analysis after resolving immediate production issues

Improved Reliability

Dynamic remapping prevents log loss from application format changes, with raw format fallback protection

Cost-Effective Scale

Managed solution fits within budget constraints while handling 2TB daily log volume with extended retention

Housing successfully deprecated their in-house ELK implementation, allowing their platform engineering team to focus on product development rather than observability infrastructure maintenance. The combination of extended retention, reliable ingestion, and built-in alerting has improved their ability to debug production issues and complete thorough post-mortems.

The team continues to work with Last9’s support on ongoing optimizations, including their Databricks integration for tech scorecard metrics.

