Previous incidents
API unavailable
Resolved Jan 23 at 11:13pm CET
We experienced an outage affecting api.botrains.io. The incident was caused by a regression in the latest Gunicorn release, which we upgraded to earlier today to benefit from the latest security improvements. The issue surfaced several hours after deployment (21:34) and resulted in a worker startup failure that caused the service to stop accepting traffic.
During the incident, the chatbot was temporarily hidden, and API requests were unavailable. Salesforce case processing continued to opera...
Topic Migration
Resolved Jan 21 at 03:17pm CET
We've updated how we handle topics internally to improve metrics speed and enable continuous topic assignment. This required a brief maintenance window with downtime. Over the next few hours, topic metrics may be incomplete as data migrates to the new format.
Database Overload
Resolved Jan 19 at 04:53pm CET
Starting this morning, the platform experienced occasional downtime caused by expensive analytics queries. End users accessing the service through the chat bubble mostly did not see errors.
We're intermittently experiencing downtime due to high read load on the database.
We are actively investigating offending SQL queries.
Apparently the provisioned IOPS are not sufficient for the increased load. We've also identified costly queries. Remediation pending.