What Caused The Kuda App Downtime On January 31 And How We Fixed It
A database backup triggered an app downtime on January 31. Here’s how we fixed it and what we’ve done to prevent a recurrence.
From the morning of Tuesday, January 31 to the evening of Wednesday, February 1, we had an app downtime that we didn’t anticipate. Important app services including transfers, bill payments and transaction notifications were unavailable to a large number of our customers. We know that this made things difficult for many people and we understand how frustrating it is to not be able to use your money when you need to.
In case you missed the messages we sent on the app and posted on social media on Thursday morning, we’ve fixed the downtime and all services have been working fine since then.
As much as we’d like to carry on with business as usual, you deserve a full explanation for the service outage and a clear picture of what we’ve done to make sure that such a thing doesn’t happen again. Here’s everything you need to know, in the simplest terms possible.
What caused the downtime?
Banks and fintechs like Kuda rely heavily on database platforms that are hosted by third-party service providers. We use these platforms to store customer data securely and manage most of our processes, and our service provider is one of the largest in the world.
On Tuesday morning, our database service provider started a database backup outside our standard backup window. While it’s true that database backups are routine operations and necessary, this particular one didn’t go smoothly. It limited our database’s capacity to handle app services and eventually triggered a downtime that caused most of those services to fail, beginning with card transactions.
Even though the database backup was unscheduled and our service provider should have let us know before starting it, backups are important for keeping our services running at all times. Let’s explain this a bit:
During a backup, we duplicate our database and store it in case we need it for any recovery process. Having a copy prevents the loss of our customers’ vital information and helps us recover in the event of a data glitch.
We use three different kinds of operational database backups at Kuda: full backups that create a copy of the entire data set, differential backups that copy only the data that has changed since the last full backup we’ve done, and transaction log backups that copy the database’s running record of changes, including our customers’ transactions and app actions.
During a backup process, the database infrastructure can be stretched to its capacity, which is why we plan each one carefully and run it when as few customers as possible are using their Kuda app.
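For readers who'd like to see the difference between the three backup types in code, here is a toy Python sketch. It is an illustration only and not Kuda's actual backup system: the `ToyDatabase` class, its method names and the account keys are all made up for this example, and a real database engine does far more work at each step.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatabase:
    """Hypothetical in-memory stand-in for a database (illustration only)."""
    rows: dict = field(default_factory=dict)                # current data
    log: list = field(default_factory=list)                 # changes not yet log-backed-up
    changed_since_full: set = field(default_factory=set)    # keys changed since last full backup

    def write(self, key, value):
        self.rows[key] = value
        self.log.append((key, value))
        self.changed_since_full.add(key)

    def full_backup(self):
        """Copy the entire data set and reset the differential baseline."""
        self.changed_since_full.clear()
        return dict(self.rows)

    def differential_backup(self):
        """Copy only what has changed since the last FULL backup."""
        return {key: self.rows[key] for key in self.changed_since_full}

    def log_backup(self):
        """Copy the record of changes made since the last log backup."""
        entries, self.log = self.log, []
        return entries


db = ToyDatabase()
db.write("acct-1", 100)
db.write("acct-2", 50)
full = db.full_backup()          # everything so far
first_log = db.log_backup()      # the two writes above

db.write("acct-1", 80)
diff_a = db.differential_backup()   # only acct-1 has changed since the full backup
db.write("acct-3", 10)
diff_b = db.differential_backup()   # acct-1 and acct-3: still measured from the full backup
log_b = db.log_backup()             # the two changes since the previous log backup

# Recovery: the full backup plus the latest differential gives the current state.
restored = {**full, **diff_b}
```

Notice that each differential backup grows until the next full backup resets it, and that a full backup has to read every record. On a real database this is the kind of load that can stretch capacity when a backup runs at a busy time.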
How did we fix it?
We reached out to our service provider immediately and, together, we got to work on managing the backup to reduce its impact on app services.
As part of the management process, our service provider scaled up our database’s capacity so that services could run smoothly, but this had the opposite effect. It knocked our database offline, which meant that even the most basic app processes like sign-ins and account balance updates were either slow or unavailable for most customers.
When our database came back online, it was in recovery mode and its performance was still not at its best. Some customers could access their accounts and use some app services but most customers couldn’t.
At around 11:00 pm on Wednesday, we were able to restore app services for more customers but we still weren’t satisfied with the quality of those services — they were slow and failed intermittently.
From Wednesday night till Thursday morning, we continued working to restore and improve app services for all customers and speed up transactions while monitoring the database until we were confident enough to announce that we had fixed the downtime.
Throughout the downtime, everyone’s money was safe in their accounts and there was no data breach. We understand that seeing an account balance of ₦0.00 during the downtime was very scary, but that only happened because the service that loads account balances failed, so your Kuda app defaulted to showing zero instead of your true balance.
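To make the zero-balance display easier to picture, here is a small Python sketch of the general pattern. It is not the Kuda app's code: the function names `load_balance` and `format_balance`, the "Balance unavailable" message and the failing service are all hypothetical. The sketch shows one common way to keep "we couldn't load your balance" separate from "your balance is zero".

```python
from typing import Callable, Optional


def load_balance(fetch: Callable[[], float]) -> Optional[float]:
    """Return the balance, or None if the (hypothetical) balance service fails."""
    try:
        return fetch()
    except Exception:
        # A failed lookup is reported as "unknown", never as 0.
        return None


def format_balance(balance: Optional[float]) -> str:
    """Render a balance for display, keeping 'unknown' distinct from zero."""
    if balance is None:
        return "Balance unavailable"
    return f"₦{balance:,.2f}"


def failing_service() -> float:
    # Stand-in for a balance service that is down.
    raise TimeoutError("balance service did not respond")


during_outage = format_balance(load_balance(failing_service))
normal_day = format_balance(load_balance(lambda: 2500.0))
truly_empty = format_balance(load_balance(lambda: 0.0))
```

With this pattern, an outage produces "Balance unavailable", while ₦0.00 appears only for an account that really holds nothing.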
What have we done to make sure it doesn’t happen again?
1. We’ve asked our provider to immediately increase the size and expertise of the technical support team assigned to us so that if there’s any issue with their service in the future, they’ll sort it out in the shortest time possible.
2. We’ve improved communications between our provider and our internal technical team to make sure that we get clear information well ahead of any technical updates or maintenance on our provider’s end.
3. We’re exploring the option of adding another world-class service provider to make sure that we can always provide stable services.
4. Even though this downtime wasn’t caused by an internal fault, we’ve expanded the capacity of our internal backup systems to handle our growing needs.
We’re confident that the steps we’ve taken have minimised the chance of this happening again. If there’s ever a downtime or glitch, though, we’ve built a status page that’ll let you know what’s happening and what we’re doing to fix it.
You can subscribe to get updates on the status page or you can check the page any time you need up-to-date information about the condition of the services on your Kuda app. We promise not to keep you in the dark, even if that means telling you an inconvenient truth.
Thank you for your patience and once again, we’re sorry for the stress that the downtime may have caused you.
The Kuda Team