If you use KeyChest: a major incident made us realize how much it has grown


#1

This post is first and foremost my way of saying sorry to all users of our KeyChest monitoring for a significant loss of data we experienced last Saturday. I say this as the CEO of Enigma Bridge, the company that develops KeyChest and runs its free cloud service.

We lost around 40% of all production data, which we were unable to recover. This may affect all KeyChest users: even for the user accounts we recovered, some domain names and “active domains” may have been lost.

We decided to keep running this free KeyChest service, but we also reviewed how we operate it and got really serious about it. We want to make it a reliable service. We have already implemented several measures, and we hope to turn it into a high-availability system very soon.

We want to be as transparent as possible, so our new status page at https://keychest.status.io will show not only planned maintenance but also every remote (SSH) access to the server(s). These accesses are now logged automatically.

I have described the details of the incident and what we changed on my blog:

Let me say once again how sorry we are for this. While it is a free service without any guarantee, I personally consider it a serious blunder that should never have happened.


#2

So, a dev of yours dropped some user data, right?
Nowadays, when I read “major incident”, I assume data loss in the form of a hack and serious trouble for me.
But your post-mortem only sounds like serious trouble for you.

Hope you (and your devs) learned something…


#3

That’s an accurate summary.

And trouble for those who depend on it. Plus, we have large users with 1,000+ servers in KeyChest.

I guess it all depends on which is the bigger problem: getting data stolen, or losing data or functionality you assume is there.


#4

Hi @DanCvrcek

Great summary on the blog, and thanks for the transparency.

You are not the first SaaS product to have an incident nor will you be the last.

Keep up the good work. The improvement plan looks good :smiley:

Andrei

