India processes over 12 billion UPI transactions per month, and integrating with this payment infrastructure is effectively a baseline requirement for any consumer-facing application in the Indian market. However, building a payment system that handles high volume reliably is far more complex than wiring up the Razorpay SDK. In this post, we share the architectural patterns and hard-won lessons from building payment systems that process over ₹2 crore in monthly GMV.
The first challenge is webhook reliability. Payment confirmations arrive asynchronously, and Razorpay webhooks can occasionally be delayed, duplicated, or delivered out of order. We handle this with an idempotency layer that combines transaction IDs with a per-payment state machine. Every payment moves through a defined lifecycle (initiated, authorized, captured, settled), and each transition is validated against that lifecycle before it is applied, so a redelivered or late-arriving webhook can neither be processed twice nor move a payment backwards. This prevents double-charges, missed confirmations, and the dreaded ghost transactions that erode user trust.
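A minimal sketch of how an idempotent state machine like this might look, assuming each webhook carries a unique event ID and state is held in memory. The `State`, `Payment`, `reachable`, and `apply_event` names are hypothetical, and the `FAILED` state is an assumption not listed in the lifecycle above.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    INITIATED = "initiated"
    AUTHORIZED = "authorized"
    CAPTURED = "captured"
    SETTLED = "settled"
    FAILED = "failed"  # assumption: terminal failure state, not named in the post


# Direct transitions allowed from each state (hypothetical lifecycle graph).
NEXT = {
    State.INITIATED: {State.AUTHORIZED, State.FAILED},
    State.AUTHORIZED: {State.CAPTURED, State.FAILED},
    State.CAPTURED: {State.SETTLED},
    State.SETTLED: set(),
    State.FAILED: set(),
}


def reachable(src: State, dst: State) -> bool:
    """True if dst can be reached from src by following zero or more transitions."""
    stack, visited = [src], set()
    while stack:
        s = stack.pop()
        if s == dst:
            return True
        if s not in visited:
            visited.add(s)
            stack.extend(NEXT[s])
    return False


@dataclass
class Payment:
    payment_id: str
    state: State = State.INITIATED
    seen_events: set[str] = field(default_factory=set)


def apply_event(payment: Payment, event_id: str, new_state: State) -> str:
    # Idempotency: a redelivered webhook carries an event ID we've already handled.
    if event_id in payment.seen_events:
        return "duplicate"
    payment.seen_events.add(event_id)

    if new_state != payment.state and reachable(payment.state, new_state):
        # Forward move; may skip a state whose webhook is still in flight.
        payment.state = new_state
        return "applied"
    if reachable(new_state, payment.state):
        # Already at or past this state: a late, out-of-order event. Ignore it.
        return "stale"
    # No path in either direction (e.g. "captured" after "failed"): flag for review.
    return "rejected"
```

With this design, a `captured` event that overtakes its `authorized` event is applied, and the late `authorized` event then returns `"stale"` instead of moving the payment backwards. In a real database, the seen-event set would likely be a table with a unique constraint on the event ID, and the read-check-write on the state would run inside a transaction that locks the payment row, so two concurrent deliveries cannot race.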
UPI-specific edge cases deserve special attention. Auto-debit mandates, collect requests, and intent flows each have unique failure modes. We have encountered scenarios where a UPI payment is debited from the user's bank but the PSP callback never arrives. For these cases, we run a reconciliation job every 15 minutes that cross-references our pending transactions against the Razorpay API, automatically resolving discrepancies and flagging genuine failures for manual review.
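The reconciliation pass can be sketched as a pure function over pending transactions, assuming the gateway reports the payment statuses `created`, `authorized`, `captured`, and `failed`. The status fetcher is injected so the logic stays testable. In production it might wrap the Razorpay Python SDK's `client.payment.fetch(payment_id)` and read the `status` field, though that wiring is an assumption. `PendingTxn`, `ReconcileResult`, `reconcile`, and the 10-minute grace period are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable

# Assumption: gateway statuses treated as success / failure.
SUCCESS_STATUSES = {"authorized", "captured"}
FAILURE_STATUSES = {"failed"}


@dataclass
class PendingTxn:
    payment_id: str
    created_at: datetime


@dataclass
class ReconcileResult:
    resolved: dict[str, str] = field(default_factory=dict)  # payment_id -> gateway status
    flagged: list[str] = field(default_factory=list)        # needs manual review
    still_pending: list[str] = field(default_factory=list)  # retry on the next run


def reconcile(
    pending: list[PendingTxn],
    fetch_status: Callable[[str], str],
    now: datetime,
    grace: timedelta = timedelta(minutes=10),  # hypothetical grace period
) -> ReconcileResult:
    result = ReconcileResult()
    for txn in pending:
        if now - txn.created_at < grace:
            # Too recent: the PSP callback may still be on its way.
            result.still_pending.append(txn.payment_id)
            continue
        try:
            status = fetch_status(txn.payment_id)
        except Exception:
            # API or network error: don't guess, try again on the next run.
            result.still_pending.append(txn.payment_id)
            continue
        if status in SUCCESS_STATUSES:
            # Gateway saw the debit even though our callback never arrived.
            result.resolved[txn.payment_id] = status
        elif status in FAILURE_STATUSES:
            result.flagged.append(txn.payment_id)
        else:
            # e.g. "created": no debit recorded yet, so keep waiting.
            result.still_pending.append(txn.payment_id)
    return result
```

A scheduler running every 15 minutes would then apply `resolved` to local state, push `flagged` into a review queue, and leave `still_pending` for the next pass.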
Scaling the system requires careful attention to database design. We partition transaction tables by date and route analytics queries to read replicas, since running them on the primary would compete with transactional writes. Redis serves as our real-time cache for active sessions, while PostgreSQL holds the persistent state. This architecture has allowed us to scale from a few hundred daily transactions to tens of thousands without a meaningful increase in latency or infrastructure costs.
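One way to implement date-based partitioning is PostgreSQL's declarative range partitioning. The sketch below assumes monthly partitions on a `created_at` column; the table and column names and the `partition_ddl` helper are hypothetical. A scheduled job could run the generated statements a few months ahead, because in PostgreSQL an insert whose key matches no partition fails unless a default partition exists.

```python
from datetime import date

# Hypothetical parent table. On a partitioned table, the primary key must
# include the partition key, hence (id, created_at).
PARENT_DDL = """
CREATE TABLE IF NOT EXISTS transactions (
    id           bigserial,
    payment_id   text        NOT NULL,
    state        text        NOT NULL,
    amount_paise bigint      NOT NULL,
    created_at   timestamptz NOT NULL,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
"""


def next_month(d: date) -> date:
    """First day of the month after d's month."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def partition_ddl(parent: str, first: date, months: int) -> list[str]:
    """CREATE statements for `months` monthly partitions starting at first's month."""
    stmts = []
    start = first.replace(day=1)
    for _ in range(months):
        end = next_month(start)
        name = f"{parent}_{start:%Y_%m}"
        # Range bounds are inclusive of FROM and exclusive of TO.
        stmts.append(
            f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
        )
        start = end
    return stmts
```

Queries that filter on `created_at` then touch only the relevant months, and old partitions can be detached or archived without rewriting the live table.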