Monitoring
Grafana
Grafana is an open source analytics and monitoring platform. We use it to monitor the health of GovWifi in real time.
Every GovWifi environment has its own Grafana instance, which runs on AWS EC2. This Google Document contains in-depth information on the technical setup (you must be a member of the GovWifi team to view this document).
You can access the dashboard using the links below (you must be logged into the VPN and signed into the dashboard to do this):
Where The Data Comes From / Grafana Data Sources
The data in our Grafana comes primarily from Prometheus and Elasticsearch. Both of these services are hosted in AWS. Prometheus collects data relating to our RADIUS servers, for example how many authentication requests are happening over a given period. The fine-grained data from Prometheus tends to be more useful to engineers, whilst the data from Elasticsearch provides a high-level overview of how the system is being used. The Elasticsearch data is also used to generate monthly reports that are sent to GPA.
Elasticsearch
The admin and logging-api applications have functions that collect a range of metrics (such as active users in a specific time period) and push them to our Elasticsearch cluster in AWS. This data is pulled from our databases at various intervals (hourly, daily, monthly and so on) and sent to Elasticsearch by ECS scheduled jobs. Each scheduled job runs a rake task that pushes the data to Elasticsearch at its specified interval. This is an example of such a job in our Terraform code.
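As a rough illustration of what such a job does, the sketch below builds the container override that makes an ECS task run a single rake task. It is not taken from our Terraform code: the cluster, task definition, container and rake task names are placeholders, and it assumes the task is invoked with `bundle exec rake`.

```shell
# Sketch only: every name passed to this function in the example is a
# placeholder, not a real GovWifi value.

# Build the --overrides JSON that makes one container run one rake task.
# Assumes the application runs rake through "bundle exec".
rake_overrides() {
  local container="$1" rake_task="$2"
  printf '{"containerOverrides":[{"name":"%s","command":["bundle","exec","rake","%s"]}]}' \
    "$container" "$rake_task"
}

# Example one-off run (a Fargate task would also need --launch-type and
# --network-configuration):
# aws ecs run-task --cluster example-cluster \
#   --task-definition example-task-definition \
#   --overrides "$(rake_overrides example-container example:publish_metrics)"
```

A scheduled job passes the same kind of command override on a timer, so running it by hand like this can be a quick way to check a rake task in isolation.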
The metrics are also backed up in an S3 bucket in each GovWifi environment. This is configured by the govwifi-dashboard module in Terraform. The diagram below shows the resources that Elasticsearch interacts with. A scalable version is available in our team drive:
Using Grafana Data To Generate Monthly Reports
The GovWifi team uses the metrics in Grafana to generate monthly reports. In-depth instructions on generating the monthly reports can be found in this document (you will need to be a member of the GovWifi team to see it).
Hosted on GOV.UK PaaS (Platform as a Service)
Prior to November 2023 there was an additional Grafana instance hosted on GOV.UK PaaS. It was used for monitoring the performance of the GovWifi Product Pages and Tech Docs. This data is now collected by Google Analytics. The PaaS is scheduled to be decommissioned at the end of December 2023, and the Product Pages and Tech Docs are now hosted on GitHub Pages.
Google Analytics
We currently have a Google Analytics dashboard which shows a summary of visits to our Product Page and Admin site.
There is an additional dashboard which used to allow for more detailed investigations of how people used these pages. However, this dashboard is currently broken.
Prometheus
Prometheus is an open source application used for event monitoring and alerting. It records real-time metrics in a time series database, collects them using an HTTP pull model, and supports flexible queries and real-time alerting.
We run a Prometheus server which scrapes metrics from Prometheus log exporters running on the FreeRADIUS containers.
These Prometheus exporters provide a wide range of information about the state of the FreeRADIUS server and the packets being processed.
The information is used for diagnostics and tracking service availability.
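The sketch below shows how this information could be queried through the Prometheus HTTP API. It assumes Prometheus is reachable on 127.0.0.1:9090 (for example through a port forward), and the metric name in the second example is a placeholder; check the exporter's output for the real metric names.

```shell
# Sketch only. PROM_URL assumes Prometheus is reachable locally, for example
# through a port forward to port 9090.
PROM_URL="${PROM_URL:-http://127.0.0.1:9090}"

# Build a PromQL expression for the per-second rate of a counter, summed
# across all scraped instances, over a time window such as 5m.
request_rate_expr() {
  local metric="$1" window="${2:-5m}"
  printf 'sum(rate(%s[%s]))' "$metric" "$window"
}

# Run an instant query against the Prometheus HTTP API and print the JSON reply.
prom_query() {
  curl --silent --fail --get "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Examples:
# "up" is 1 for each target whose last scrape succeeded and 0 otherwise.
# prom_query 'up'
# The metric name below is a placeholder, not a real exporter metric.
# prom_query "$(request_rate_expr example_access_requests_total 5m)"
```

The `up` query is a simple availability check for the exporters, while the rate expression is the general shape of a "requests over a given period" query.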
If you have SSM access, you can run the commands below to see the dashboard. If not, please speak to the reliability engineers on the team about access.
The previous SSH method, shown below, is being deprecated and will be removed soon. We advise setting up SSM access instead.
ssh -L 9090:127.0.0.1:9090 prometheus.<env>.govwifi
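The commands below look up the instance ID of the Prometheus server and use it to start a port forwarding session via SSM. Replace the <ENV>, <env> and <region> placeholders with the values from the server's Name tag, the AWS account and the AWS region.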
INSTANCE_ID=$(gds aws govwifi-<env> -- aws ec2 describe-instances --filter "Name=tag:Name,Values=<ENV> Prometheus-Server" --query "Reservations[].Instances[?State.Name == 'running'].InstanceId[]" --region <region> --output text)
gds aws govwifi-<env> -- aws ssm start-session --target $INSTANCE_ID --document-name AWS-StartPortForwardingSession --parameters '{"portNumber":["9090"],"localPortNumber":["9090"]}' --region <region>
For example, for Dev London:
INSTANCE_ID=$(gds aws govwifi-development -- aws ec2 describe-instances --filter "Name=tag:Name,Values=Alpaca Prometheus-Server" --query "Reservations[].Instances[?State.Name == 'running'].InstanceId[]" --region eu-west-2 --output text)
gds aws govwifi-development -- aws ssm start-session --target $INSTANCE_ID --document-name AWS-StartPortForwardingSession --parameters '{"portNumber":["9090"],"localPortNumber":["9090"]}' --region eu-west-2
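If you open the tunnel often, the two commands can be wrapped in a small helper. This is a sketch rather than part of the GovWifi tooling: the function names and argument order are illustrative, and it assumes exactly one running instance matches the Name tag.

```shell
# Sketch only: function and argument names are illustrative.

# Build the JSON parameters for AWS-StartPortForwardingSession.
# The local port defaults to the remote port when omitted.
port_forward_params() {
  local remote_port="$1" local_port="${2:-$1}"
  printf '{"portNumber":["%s"],"localPortNumber":["%s"]}' "$remote_port" "$local_port"
}

# Usage: prometheus_tunnel <account> <name-tag> <region>
prometheus_tunnel() {
  local account="$1" name_tag="$2" region="$3" instance_id
  # Look up the running instance that carries the given Name tag.
  instance_id=$(gds aws "$account" -- aws ec2 describe-instances \
    --filter "Name=tag:Name,Values=${name_tag}" \
    --query "Reservations[].Instances[?State.Name == 'running'].InstanceId[]" \
    --region "$region" --output text)
  if [ -z "$instance_id" ]; then
    echo "No running instance tagged '${name_tag}' in ${region}" >&2
    return 1
  fi
  # Forward local port 9090 to port 9090 on the instance.
  gds aws "$account" -- aws ssm start-session --target "$instance_id" \
    --document-name AWS-StartPortForwardingSession \
    --parameters "$(port_forward_params 9090 9090)" \
    --region "$region"
}

# Example, matching the Dev London commands:
# prometheus_tunnel govwifi-development "Alpaca Prometheus-Server" eu-west-2
```

Checking for an empty instance ID before starting the session gives a clearer error than letting `aws ssm start-session` fail on a blank target.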
After running the command you should be able to access the Prometheus dashboard by entering the following address in your browser: