We had a series of outages yesterday. The first one started at 8:29 AM EDT and lasted about 4 hours. It was caused by a partial power outage that shut down our database server. By 12:25 PM EDT the power was restored and Toggl was operational again.
The unexpected shutdown caused problems with our application servers, which began to run slower over time. The servers crashed completely at 7:24 PM EDT, and we had the system working again by 12:14 AM EDT.
After that we had several short periods of downtime, mainly because the system was not yet stable after the crashes.
Taken together, this was the worst outage in Toggl’s history.
First, we are very sorry for all the trouble this has caused! We are analyzing the incident very closely, and our first priority is to be better prepared for power, hardware, and network outages in the future.
There are 3 things that we will improve:
1. Redundancy. Until now our system consisted of several application servers and a single database server. The database was obviously a single point of failure. Backups are taken every 12 hours and stored offsite, but when the database server lost power, the whole system became unavailable.
2. Offline support for Toggl. Toggl Desktop and the mobile apps support offline work, so Toggl’s server is not required for time tracking. We will improve this support over time and also make it available on the Toggl website itself.
3. Communication. We will improve our crisis communications so that you are kept consistently informed whenever problems arise. This also includes our internal communication, e.g. automated notifications when the system slows down.
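For the technically curious, here is a rough idea of what an automated slowdown notification could look like. This is only an illustrative sketch, not our actual monitoring code: the `SlowdownMonitor` class, the 2000 ms threshold, and the 5-sample window are all hypothetical choices made for the example.

```python
from collections import deque
from statistics import mean


class SlowdownMonitor:
    """Tracks recent response times and reports a sustained slowdown.

    Hypothetical helper for illustration; threshold and window are assumed values.
    """

    def __init__(self, threshold_ms=2000.0, window=5):
        self.threshold_ms = threshold_ms
        self.samples = deque(maxlen=window)  # keeps only the newest `window` samples
        self.alerting = False

    def record(self, response_ms):
        """Add one sample; return a message when the alert state changes, else None."""
        self.samples.append(response_ms)
        if len(self.samples) < self.samples.maxlen:
            return None  # not enough data yet to judge
        average = mean(self.samples)
        if average > self.threshold_ms and not self.alerting:
            self.alerting = True
            return f"SLOWDOWN: average response {average:.0f} ms over last {len(self.samples)} checks"
        if average <= self.threshold_ms and self.alerting:
            self.alerting = False
            return f"RECOVERED: average response {average:.0f} ms"
        return None
```

Averaging over a small window, rather than alerting on a single slow request, avoids a flood of notifications from one-off spikes while still catching the kind of gradual slowdown described above.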
Once again, we apologize for the outage. Your patience is much appreciated, and we promise to keep improving the service over time.
UPDATE: Our service provider announced further repairs that will cause some brief downtime, sometime between 23:59 on 8/09/2011 and 06:00 on 8/10/2011 CST. The maintenance window covers the worst case; if all goes to plan, the downtime should be significantly shorter. We will announce when the site is back up.
UPDATE 2: Power is restored and the site is back up. We are running checks to make sure everything is working correctly.