OneSchema Web Server 502s
Incident Report for OneSchema
Postmortem

Impact

  • Starting at 11:06 AM, OneSchema experienced an outage across the US, EU, and CA regions after an update to our internal queuing service caused the web service app.oneschema.co to go down in those regions. The build contained an error that CI only partially caught, which is why some regions were impacted. At 11:21 AM, a rollback of the queuing service was deployed and the incident was resolved. During the outage, embeds opened by users in those regions failed to load and returned 502 responses, and users visiting the admin dashboard saw a 502 error page.

Path forward

  • We have prioritized updates to our CI system to prevent any build with end-to-end test failures from reaching any of our production regions.
  • We will investigate making deploys through our CI system atomic, so that a failure in any one region during the deploy process immediately rolls back all regions.
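
As a rough illustration of the two items above, the following Python sketch shows one way an all-or-nothing regional deploy could behave: a build with end-to-end test failures is blocked before any region is touched, and a failure in one region rolls back every region that already received the build. This is a minimal sketch under stated assumptions, not OneSchema's actual deploy tooling. The function name `deploy_atomically`, its parameters, and the `deploy` and `rollback` callables are all hypothetical.

```python
from typing import Callable, Iterable, List


def deploy_atomically(
    regions: Iterable[str],
    e2e_tests_passed: bool,
    deploy: Callable[[str], None],      # hypothetical: deploys the new build to one region
    rollback: Callable[[str], None],    # hypothetical: restores the previous build in one region
) -> List[str]:
    """Deploy to every region, or leave every region on the previous build.

    Returns the regions now running the new build (empty if the deploy
    was blocked or rolled back).
    """
    # Gate: a build with end-to-end test failures never reaches production.
    if not e2e_tests_passed:
        return []

    deployed: List[str] = []
    for region in regions:
        try:
            deploy(region)
        except Exception:
            # The failing region may be partially updated, so roll it back
            # together with every region that already took the new build.
            for touched in [*deployed, region]:
                rollback(touched)
            return []
        deployed.append(region)
    return deployed
```

In this sketch the regions are deployed one at a time, so a failure in the first region stops the rollout before the others are touched. A real system would also need to handle a rollback that itself fails, which is left out here.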

We apologize for the downtime caused by this outage. Please reach out to your dedicated OneSchema Support contact or Account Manager if you have any specific questions about the impact of this outage on your system.

Posted Jun 15, 2023 - 21:20 UTC

Resolved
Roll out complete -- Systems functioning
Posted Jun 14, 2023 - 18:24 UTC
Identified
The issue has been identified and a fix is rolling out.
Posted Jun 14, 2023 - 18:22 UTC
Investigating
We are currently investigating this issue.
Posted Jun 14, 2023 - 18:07 UTC
This incident affected: Web Application, Embed Application, and API.