Business Continuity and Disaster Recovery management
This section outlines the Business Continuity Plans (BCP) and Disaster Recovery (DR) agreements and documentation available for the GovWifi service. BCP and DR documentation is stored in the team drive and supports the existing P1 procedures and comms.
Strategy for dealing with outages
- Rely on a resilient architecture
- Triage and fix the service if possible
- Migrate faulty services to a secondary region if the primary region is offline
- Rebuild the complete environment in the new AWS account
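The strategy above reads as an escalation ladder, and a minimal sketch can make the ordering explicit. The function, its parameters and the trigger conditions are hypothetical and only illustrate the list; in particular, treating an unusable AWS account as the trigger for a full rebuild is an assumption, and the actual decision is made by people, not by code.

```python
from enum import Enum


class Response(Enum):
    RELY_ON_RESILIENCE = "rely on the resilient architecture"
    TRIAGE_AND_FIX = "triage and fix the service"
    MIGRATE_TO_SECONDARY_REGION = "migrate faulty services to the secondary region"
    REBUILD_IN_NEW_ACCOUNT = "rebuild the environment in the new AWS account"


def choose_response(service_degraded: bool,
                    primary_region_online: bool,
                    account_usable: bool) -> Response:
    """Pick the most severe response that the situation calls for.

    All three inputs are illustrative assumptions, not fields from any
    real GovWifi tooling.
    """
    # Assumption: a full rebuild is only needed when the account itself is lost.
    if not account_usable:
        return Response.REBUILD_IN_NEW_ACCOUNT
    # From the strategy list: migrate if the primary region is offline.
    if not primary_region_online:
        return Response.MIGRATE_TO_SECONDARY_REGION
    # The region is up but the service is misbehaving: fix it in place.
    if service_degraded:
        return Response.TRIAGE_AND_FIX
    # Nothing is wrong beyond what the architecture already absorbs.
    return Response.RELY_ON_RESILIENCE


if __name__ == "__main__":
    # Example: the primary region is down but the account is still usable.
    print(choose_response(service_degraded=True,
                          primary_region_online=False,
                          account_usable=True).value)
```

The checks run from the most severe condition to the least, so a region outage is never mistaken for an ordinary fault that can be fixed in place.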
Service ownership and decision-makers
The Product Manager decides when to invoke internal BCP procedures and either migrate the faulty service or recover the environment in the new region. In the Product Manager's absence, the Technical Lead, Infrastructure Lead or Technical Architect can make this decision.
Documentation
The Business Continuity and Disaster Recovery document is the primary reference for:
- Existing support agreements and escalation paths for AWS services, GOV.UK Notify, the Cyber team and GitHub
- Communication templates pre-configured within Status.io and GOV.UK Notify to support region-level outages
- Service dependencies, along with possible scenarios, agreed timelines and the steps required to bring the service online and restore it after a disaster
Other essential documents that support the BCP and DR plan are:
- GovWifi service playbook - common troubleshooting steps helpful during P1-P3 incidents
- GovWifi data recovery playbook - procedures required to recover GovWifi datasets from onsite and offsite backups