updated postmortem

Nicholas Orlowsky 2023-11-16 16:41:01 -05:00
parent 10782342f2
commit afdf8615fc
No known key found for this signature in database
GPG key ID: 58832FD3AC16C706
2 changed files with 6 additions and 6 deletions

@@ -1,7 +1,7 @@
 <div>
 <h1 style="margin-bottom: 0px;">Blog</h1>
 <p style="margin-top: 0px;">A collection of my thoughts, some of them may be interesting</p>
-<p><a href="./blogs/nws-postmortem-11-8-23.html">[ NWS Postmortem 11/08/23 ]</a> - November, , 2023</p>
+<p><a href="./blogs/nws-postmortem-11-8-23.html">[ NWS Postmortem 11/08/23 ]</a> - November, 16th, 2023</p>
 <p><a href="./blogs/side-project-10-20-23.html">[ Side Project Log 10/20/23 ]</a> - October 20th, 2023</p>
 <p><a href="./blogs/side-project-8-15-23.html">[ Side Project Log 8/15/23 ]</a> - August 15th, 2023</p>
 <p><a href="./blogs/side-project-8-8-23.html">[ Side Project Log 8/08/23 ]</a> - August 8th, 2023</p>

@@ -8,7 +8,7 @@
 </p>
 <p>
-The incident lasted 28 minutes after which it was automatically
+The incident lasted 38 minutes after which it was automatically
 resolved and all services were restored. This is NWS' first
 outage event of 2023.
 </p>
@@ -29,9 +29,9 @@
 At around 09:47 UTC, Cloudflare detected that our servers in
 Texas (Austin and Hill Country) were down. It did not detect an
 error, but rather an HTTP timeout. This is an indication that the
-server has lost network connectivity. When it detected that the
+server may have lost network connectivity. When Cloudflare detected that the
 servers were down, it removed their A records from the
-entry.nws.nickorlow.com domains. Since NWS' Pennsylvania servers
+entry.nws.nickorlow.com domain. Since NWS Pennsylvania servers
 have been undergoing maintenance since August 2023, this left no
 servers able to serve requests routed to entry.nws.nickorlow.com,
 resulting in the outage.
@@ -52,12 +52,12 @@
 </p>
 <p>
-No firewall rules existed that could have blocked this traffic
+No firewall rules existed that could have blocked the healthcheck traffic from Cloudflare
 for either of the NWS servers. There was no other configuration
 found that would have blocked these requests. As these servers
 are on different networks inside different buildings in different
 parts of Texas, their networking equipment is entirely separate.
-This rules out any hardware failure of networking equipment owned
+This rules out any failure of networking equipment owned
 by NWS. This leads us to believe that the issue may have been
 caused due to an internet traffic anomaly, although we are currently
 unable to confirm that this is the cause of the issue.