Supercharged Performance at Indmoney: How We Slashed Latency and Errors


Sunny Shah

INDmoney Slashed Latency & Errors with AWS Global Accelerator
Table Of Contents
  • Introduction
  • Our Previous Setup and Its Challenges
  • The Solution: A High-Performance Setup with AWS Global Accelerator
  • Unmistakable Advantages & Impressive Results
  • Addressing Common Questions
  • Conclusion

Introduction

In the fast-paced world of financial applications, speed and reliability are critical. At Indmoney, we recently overhauled our network setup to significantly improve our app’s performance. The results: a ~12% improvement in average app-to-backend latency, P99 latency cut in half, and a reduction of over 50% in DNS resolution errors. This blog post explains how we moved from a Cloudflare-based setup to a high-performance solution built on AWS Global Accelerator, the challenges we faced, the solutions we implemented, and the results we achieved.

Our Previous Setup and Its Challenges

Our initial configuration relied on Cloudflare to connect our app to the backend. Here’s how it worked:

  • App to Cloudflare: The app connected to Cloudflare over the public internet.
  • Cloudflare’s Role: Cloudflare, with edge servers in major countries, acted as a reverse proxy. Because requests from the app were signed with mTLS client certificates, Cloudflare’s Web Application Firewall (WAF) applied only minimal checks before forwarding them to our AWS Application Load Balancer (ALB).
  • DNS Management: Cloudflare also handled DNS with a short 30-second caching time (TTL).

While this setup provided some security and routing benefits, it had significant drawbacks:

  • High Latency from Two Public-Internet Hops: Requests crossed the public internet twice, from the app to Cloudflare and then from Cloudflare to AWS, and P99 latencies often exceeded one second. This was particularly challenging in India, where public internet reliability can be inconsistent.
  • Frequent DNS Resolutions and Errors: The short 30-second TTL forced clients to resolve DNS frequently. This was made worse by unreliable ISP-provided DNS servers, which often prioritize data collection over performance. Crucially, because Cloudflare did not provide us with static public IPs, we could not confidently raise the TTL above 30 seconds.

These issues hit our power users hardest: people who spend hours on the app every day ran into noticeable delays and errors.
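To make the TTL problem concrete, here is a back-of-the-envelope model of why long sessions suffer most. It is a minimal sketch that assumes the client re-resolves every time its cached record expires and that each lookup fails independently with the same probability; the 4-hour session and the 0.1% per-lookup failure rate are hypothetical numbers chosen for illustration, not measurements from our app.

```python
# Back-of-the-envelope model. The session length and per-lookup failure
# rate are hypothetical; failures are assumed to be independent.

SECONDS_PER_DAY = 24 * 60 * 60


def lookups_needed(ttl_seconds: int, active_seconds: int) -> int:
    """Worst-case DNS lookups for a client that stays active for
    `active_seconds` and re-resolves each time its cached record expires."""
    # Ceiling division: a partially used TTL window still costs one lookup.
    return -(-active_seconds // ttl_seconds)


def p_any_failure(lookups: int, p_fail: float) -> float:
    """Probability that at least one of `lookups` independent lookups fails."""
    return 1.0 - (1.0 - p_fail) ** lookups


if __name__ == "__main__":
    session_seconds = 4 * 60 * 60  # hypothetical 4-hour power-user session
    p_fail = 0.001                 # hypothetical 0.1% failure rate per lookup
    for ttl in (30, SECONDS_PER_DAY):
        n = lookups_needed(ttl, session_seconds)
        print(f"TTL {ttl:>6}s: {n:>4} lookups, "
              f"P(at least one failure) = {p_any_failure(n, p_fail):.3f}")
```

Under these assumptions, a 4-hour session with a 30-second TTL needs 480 lookups and has roughly a 38% chance of hitting at least one failed resolution, while a 24-hour TTL needs a single lookup.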

The Solution: A High-Performance Setup with AWS Global Accelerator

To tackle these challenges, we transitioned to AWS Global Accelerator for our app-to-backend API connections, and it has been a game-changer. Here's what the new setup looks like:

  • Anycast Networking with Static IPs: AWS Global Accelerator gives us two dedicated static anycast IPs and uses anycast to route each user’s traffic to the nearest AWS edge location. Because these IPs do not change, we could raise our DNS TTL to 24 hours (86,400 seconds), dramatically reducing DNS resolution errors for our users.
  • Private Network Advantage: From the nearest edge location, traffic travels to our AWS Load Balancer over AWS’s high-speed private network, completely bypassing the public internet for this segment. For example, a request from a user in Patna is routed to the nearest AWS edge location (such as Kolkata) and then carried to our Mumbai data center over AWS’s private backbone. This has significantly reduced latency and improved stability.
  • mTLS on Application Load Balancer: We implemented mTLS directly on our Application Load Balancer. This ensures that only signed traffic from our mobile apps reaches the backend, and all other traffic is securely dropped, maintaining robust protection.
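For readers who want to try something similar, the sketch below shows the three Global Accelerator resources such a setup involves, using the boto3 `globalaccelerator` client: an accelerator (which is allocated the two static anycast IPs), a TCP listener on port 443, and an endpoint group in the Mumbai Region (`ap-south-1`) pointing at the ALB. It is an illustrative sketch, not our production code: the accelerator name, the ALB ARN, the single port, and the endpoint weight are assumptions. The client is passed in so the function can be exercised without AWS credentials.

```python
"""Sketch: put AWS Global Accelerator in front of an existing ALB.

Hypothetical values: the accelerator name, the ALB ARN, TCP/443 as the only
port, weight 128, and ap-south-1 (Mumbai) as the endpoint Region.
"""
import uuid


def provision_accelerator(client, name: str, alb_arn: str,
                          endpoint_region: str = "ap-south-1") -> dict:
    # 1. The accelerator; AWS allocates two static anycast IPv4 addresses.
    acc = client.create_accelerator(
        Name=name,
        IpAddressType="IPV4",
        Enabled=True,
        IdempotencyToken=str(uuid.uuid4()),
    )["Accelerator"]

    # 2. A TCP listener for HTTPS. TLS is not terminated here; it passes
    #    through to the ALB, which performs the mTLS client-certificate check.
    listener = client.create_listener(
        AcceleratorArn=acc["AcceleratorArn"],
        Protocol="TCP",
        PortRanges=[{"FromPort": 443, "ToPort": 443}],
        ClientAffinity="NONE",
        IdempotencyToken=str(uuid.uuid4()),
    )["Listener"]

    # 3. An endpoint group in the origin Region that targets the ALB.
    client.create_endpoint_group(
        ListenerArn=listener["ListenerArn"],
        EndpointGroupRegion=endpoint_region,
        EndpointConfigurations=[{
            "EndpointId": alb_arn,
            "Weight": 128,
            "ClientIPPreservationEnabled": True,
        }],
        IdempotencyToken=str(uuid.uuid4()),
    )

    # These static IPs are what the public DNS record should point to.
    ips = [ip for ip_set in acc.get("IpSets", []) for ip in ip_set["IpAddresses"]]
    return {"accelerator_arn": acc["AcceleratorArn"], "static_ips": ips}
```

In practice the client would come from `boto3.client("globalaccelerator", region_name="us-west-2")`, because the Global Accelerator API is served from the US West (Oregon) Region regardless of where the endpoints live. Using a TCP listener keeps TLS end to end, so the ALB still receives the client certificate it needs for mTLS.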

Unmistakable Advantages & Impressive Results

This strategic shift brought immediate and significant benefits:

  • Quick DNS Resolution: With a 24-hour DNS TTL and static IPs, most users now resolve DNS just once daily. For our power users, this minimizes disruptions, as DNS errors are far less likely during their extended sessions, ensuring greater stability.
  • Enhanced Latency and Reliability: By leveraging anycast and AWS’s private network, we've dramatically reduced latency and significantly improved reliability compared to unpredictable public internet routing.

The impact was undeniable:

  • ~12% reduction in average app-to-backend latencies.
  • P99 latencies halved, dropping from over one second to far more manageable levels.
  • Over 50% reduction in 5XX errors, greatly enhancing app stability and user trust.

These improvements have profoundly elevated the user experience, particularly for latency-sensitive financial transactions and real-time data access.
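One way to see how P99 can fall much further than the average is that tail percentiles are set by the slowest few requests, which is where unpredictable public-internet hops hurt most. The sketch below computes a nearest-rank percentile over a synthetic set of latencies (98 requests at 120 ms and 2 at 1,500 ms); these numbers are made up for illustration and are not our production data.

```python
import math
from statistics import mean


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    `pct` percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Synthetic latencies in milliseconds: 98 fast requests and 2 slow ones.
    latencies_ms = [120] * 98 + [1500] * 2
    print("mean:", mean(latencies_ms))
    print("p50: ", percentile_nearest_rank(latencies_ms, 50))
    print("p99: ", percentile_nearest_rank(latencies_ms, 99))
```

With these samples the mean is 147.6 ms while the P99 is 1,500 ms: two slow requests barely move the average but determine the P99 entirely, so removing slow outliers shrinks P99 far more than the mean.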

Addressing Common Questions

We shared this journey on LinkedIn, sparking insightful questions from our community. Here are some highlights:

  • Did mTLS on the ALB pose challenges for mobile app compatibility or certificate management?

    No issues so far. The mTLS implementation was seamless, scales well, and works exactly as expected.

  • What’s the cost impact of switching to AWS managed services? Does it compromise web security without Cloudflare’s WAF?

    While AWS Global Accelerator involves costs, the performance gains and enhanced reliability justified the investment. It's crucial to understand we still use Cloudflare for website traffic and CDN needs (e.g., image resizing, caching), where its strengths truly shine. For backend APIs, robust security checks (e.g., SQL injection, XSS) are handled at our API gateway level, maintaining comprehensive protection. This highlights that different problems require different specialized solutions.

  • Why were DNS issues so prevalent in the old setup?

    Frequent DNS resolutions (every 30 seconds) magnified the problems caused by unreliable ISP DNS servers. Many ISPs force queries through their own DNS resolvers, often to collect user data, which makes resolution less stable. With static IPs and a 24-hour TTL, we drastically cut these errors by minimizing how often clients need to look up our domain.

  • What’s the performance boost for latency-sensitive use cases like algo trading?

    Benchmarking and A/B testing definitively showed P99 latencies halved compared to Cloudflare. This is a game-changer for time-critical applications. We strongly recommend conducting your own benchmarking in your specific context.

  • How does AWS Global Accelerator compare to Cloudflare’s static IPs or advanced routing options like Argo?

    While Cloudflare offers static IPs (often on enterprise plans) and features like Argo routing to improve performance, it fundamentally still relies on public internet connections to reach your origin data center. AWS Global Accelerator's core advantage is its private network link to AWS origins. This dedicated private connection provides superior, consistent latency and reliability, especially for dynamic, latency-sensitive APIs, which Cloudflare cannot match for traffic destined for AWS. Global Accelerator also provides dedicated static IPs at a negligible cost.

  • How is DNS handled now?

    We continue to use Cloudflare DNS to resolve Global Accelerator’s static IPs, effectively blending Cloudflare’s robust DNS strengths with AWS’s unparalleled routing efficiency.
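As a sketch of what the DNS side might look like, the helper below builds DNS-only (unproxied) A records for the accelerator’s two static IPs with a 24-hour TTL, shaped like the JSON body of Cloudflare’s v4 `POST /zones/{zone_id}/dns_records` endpoint. The hostname `api.example.com` and the documentation-range IPs are placeholders, and the helper sends nothing over the network. Setting `proxied` to false matters: a proxied record would send API traffic back through Cloudflare’s edge instead of to Global Accelerator.

```python
import json

MAX_TTL_SECONDS = 86_400  # 24 hours; the largest TTL Cloudflare accepts


def ga_dns_records(hostname: str, static_ips: list[str],
                   ttl: int = MAX_TTL_SECONDS) -> list[dict]:
    """Build one unproxied A record per Global Accelerator static IP."""
    if not 0 < ttl <= MAX_TTL_SECONDS:
        raise ValueError(f"ttl must be in (0, {MAX_TTL_SECONDS}] seconds")
    return [
        {
            "type": "A",
            "name": hostname,
            "content": ip,
            "ttl": ttl,
            # DNS-only: Cloudflare answers the query but does not proxy the
            # traffic, so clients connect straight to the anycast IPs.
            "proxied": False,
        }
        for ip in static_ips
    ]


if __name__ == "__main__":
    # Placeholder values (RFC 5737 documentation addresses), not real accelerator IPs.
    for record in ga_dns_records("api.example.com", ["192.0.2.10", "198.51.100.20"]):
        print(json.dumps(record))
```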

Conclusion

By strategically adopting AWS Global Accelerator and anycast networking, Indmoney has transformed our app’s performance, slashing latencies and errors while boosting reliability. Cloudflare remains an invaluable tool for CDN and website use cases, but for our critical, latency-sensitive backend APIs, AWS Global Accelerator proved to be the ideal solution. If your application struggles with similar network performance issues, this approach could be worth exploring, because in the world of fintech, every millisecond truly matters.
