updated postmortem
parent 10782342f2
commit afdf8615fc
@@ -1,7 +1,7 @@
 <div>
 <h1 style="margin-bottom: 0px;">Blog</h1>
 <p style="margin-top: 0px;">A collection of my thoughts, some of them may be interesting</p>
-<p><a href="./blogs/nws-postmortem-11-8-23.html">[ NWS Postmortem 11/08/23 ]</a> - November, , 2023</p>
+<p><a href="./blogs/nws-postmortem-11-8-23.html">[ NWS Postmortem 11/08/23 ]</a> - November, 16th, 2023</p>
 <p><a href="./blogs/side-project-10-20-23.html">[ Side Project Log 10/20/23 ]</a> - October 20th, 2023</p>
 <p><a href="./blogs/side-project-8-15-23.html">[ Side Project Log 8/15/23 ]</a> - August 15th, 2023</p>
 <p><a href="./blogs/side-project-8-8-23.html">[ Side Project Log 8/08/23 ]</a> - August 8th, 2023</p>
@@ -8,7 +8,7 @@
 </p>
 
 <p>
-The incident lasted 28 minutes after which it was automatically
+The incident lasted 38 minutes after which it was automatically
 resolved and all services were restored. This is NWS' first
 outage event of 2023.
 </p>
@@ -29,9 +29,9 @@
 At around 09:47 UTC, Cloudflare detected that our servers in
 Texas (Austin and Hill Country) were down. It did not detect an
 error, but rather an HTTP timeout. This is an indication that the
-server has lost network connectivity. When it detected that the
+server may have lost network connectivity. When Cloudflare detected that the
 servers were down, it removed their A records from the
-entry.nws.nickorlow.com domains. Since NWS' Pennsylvania servers
+entry.nws.nickorlow.com domain. Since NWS Pennsylvania servers
 have been undergoing maintenance since August 2023, this left no
 servers able to serve requests routed to entry.nws.nickorlow.com,
 resulting in the outage.
@@ -52,12 +52,12 @@
 </p>
 
 <p>
-No firewall rules existed that could have blocked this traffic
+No firewall rules existed that could have blocked the healthcheck traffic from Cloudflare
 for either of the NWS servers. There was no other configuration
 found that would have blocked these requests. As these servers
 are on different networks inside different buildings in different
 parts of Texas, their networking equipment is entirely separate.
-This rules out any hardware failure of networking equipment owned
+This rules out any failure of networking equipment owned
 by NWS. This leads us to believe that the issue may have been
 caused due to an internet traffic anomaly, although we are currently
 unable to confirm that this is the cause of the issue.