??
parent
29aec69d4a
commit
10782342f2
111
out/blogs/nws-postmortem-11-8-23.html
Normal file
@ -0,0 +1,111 @
<head>
<title>Nicholas Orlowsky</title>
<link rel="stylesheet" href="/style.css">
<link rel="icon" type="image/x-icon" href="/favicon.ico">
</head>
<body>
<nav>
<a href="/">[ Home ]</a>
<a href="/blog.html">[ Blog ]</a>
<a href="/projects.html">[ Projects ]</a>
<a href="/extra.html">[ Extra ]</a>
<hr/>
</nav>

<h1>NWS Incident Postmortem 11/08/2023</h1>

<p>
On November 8th, 2023, at approximately 09:47 UTC, NWS suffered
a complete outage. All services hosted on NWS went down, along
with the NWS Management Engine and the NWS dashboard.
</p>

<p>
The incident lasted 28 minutes, after which it was automatically
resolved and all services were restored. This is NWS' first
outage event of 2023.
</p>

<h2>Cause</h2>
<p>
NWS uses several tactics to ensure uptime, among them load
balancing and failover. Cloudflare currently provides both at the
DNS level. It sends health check requests to NWS servers at
specified intervals. If it detects that a server is down, it
removes that server's A record from entry.nws.nickorlow.com (the
domain that all services on NWS point their traffic to via a
CNAME).
</p>
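<p>
As a rough illustration of the failover behavior described above (this is not
Cloudflare's actual implementation), the record selection can be sketched in
shell. The function name, the "ip=status" input format, and the example
addresses are assumptions made only for this sketch.
</p>

```shell
# Hypothetical sketch of DNS-level failover: keep an A record only for
# origins whose most recent health check passed.
# Input:  arguments of the form "ip=status", where status is "up" or "down".
# Output: one IP per line for every origin that stays in the record set.
select_a_records() {
    for origin in "$@"; do
        ip=${origin%%=*}       # text before the first "="
        status=${origin#*=}    # text after the first "="
        if [ "$status" = "up" ]; then
            printf '%s\n' "$ip"
        fi
    done
}

# Example addresses come from the documentation range 203.0.113.0/24.
# One origin is down (for example, in maintenance): the other two remain.
select_a_records "203.0.113.10=up" "203.0.113.20=up" "203.0.113.30=down"

# Every origin fails its check: nothing is printed, so the hostname
# would have no address left to resolve to.
select_a_records "203.0.113.10=down" "203.0.113.20=down" "203.0.113.30=down"
```

<p>
The second call shows the weakness of this scheme: when every origin is marked
down at once, the record set becomes empty, even if the servers themselves are
still able to serve traffic.
</p>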

<p>
At around 09:47 UTC, Cloudflare detected that our servers in
Texas (Austin and Hill Country) were down. Its health checks did
not receive an HTTP error; they timed out. A timeout usually
indicates that the server has lost network connectivity. Cloudflare
then removed the servers' A records from the
entry.nws.nickorlow.com domain. Since NWS' Pennsylvania servers
have been undergoing maintenance since August 2023, this left no
servers able to serve requests routed to entry.nws.nickorlow.com,
resulting in the outage.
</p>
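<p>
The distinction between an HTTP error and a timeout can be reproduced by hand
with curl, which exits with code 28 when an operation times out. The sketch
below is only an assumption about how one might classify a probe result; the
function names, the 5 second limit, and the category labels are hypothetical
and do not describe how Cloudflare's health checks work internally.
</p>

```shell
# classify_probe EXIT_CODE HTTP_STATUS
# Turns a curl exit code and HTTP status into a coarse health category.
classify_probe() {
    exit_code=$1
    http_status=$2
    if [ "$exit_code" -eq 28 ]; then
        echo "timeout"               # curl exit code 28: operation timed out
    elif [ "$exit_code" -ne 0 ]; then
        echo "connection-failure"    # e.g. DNS failure or connection refused
    elif [ "$http_status" -ge 200 ] && [ "$http_status" -lt 400 ]; then
        echo "healthy"
    else
        echo "http-error"            # the server answered, but with an error
    fi
}

# probe URL
# Hypothetical wrapper: requests URL with a 5 second limit and classifies it.
probe() {
    http_status=$(curl --silent --output /dev/null --max-time 5 \
        --write-out '%{http_code}' "$1")
    classify_probe "$?" "$http_status"
}

# Classification only, no network access needed:
classify_probe 28 000   # prints "timeout"
classify_probe 0 503    # prints "http-error"
```

<p>
Running something like <code>probe https://entry.nws.nickorlow.com</code> from
several networks during an incident would show whether the timeout is visible
everywhere or only from some vantage points.
</p>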

<p>
NWS utilizes UptimeRobot for monitoring the uptime statistics of
services on NWS and NWS servers. This is the source of the
statistics shown on the NWS status page.
</p>

<p>
UptimeRobot did not detect either of the Texas NWS servers as being
offline at any point during the outage, so UptimeRobot and
Cloudflare disagreed on the status of the NWS servers. Logs
on the NWS servers showed that requests from UptimeRobot were being
served, while no requests from Cloudflare appeared in the logs.
</p>

<p>
No firewall rules existed that could have blocked this traffic
on either of the NWS servers, and no other configuration was
found that would have blocked these requests. As these servers
are on different networks inside different buildings in different
parts of Texas, their networking equipment is entirely separate.
This rules out any hardware failure of networking equipment owned
by NWS. This leads us to believe that the issue may have been
caused by an internet traffic anomaly, although we are currently
unable to confirm this.
</p>

<p>
This is being actively investigated to find a more concrete root
cause. This postmortem will be updated if any new information is
found.
</p>

<p>
A similar event occurred on November 12th, 2023, and lasted 2 seconds.
</p>

<h2>Fix</h2>
<p>
The two affected servers have two things in common: both use
Spectrum as their ISP, and both are located near Austin, Texas.
The Pennsylvania server maintenance will be expedited so that we have
servers online that share neither of these with the Texas servers.
</p>

<p>
NWS will also investigate other methods of failover and load
balancing.
</p>

<p>Last updated on November 16th, 2023</p>

<footer>
<hr />
<p style="margin-bottom: 0px;">Copyright © Nicholas Orlowsky 2023</p>
<p style="margin-top: 0px; margin-bottom: 0px;">Hosting provided by <a href="https://nws.nickorlow.com">NWS</a></p>
<p style="margin-top: 0px;">Powered by <a href="https://github.com/nickorlow/anthracite">Anthracite Web Server</a></p>
</footer>
</body>
121
out/blogs/side-project-10-20-23.html
Normal file
@ -0,0 +1,121 @
<head>
<title>Nicholas Orlowsky</title>
<link rel="stylesheet" href="/style.css">
<link rel="icon" type="image/x-icon" href="/favicon.ico">
</head>
<body>
<nav>
<a href="/">[ Home ]</a>
<a href="/blog.html">[ Blog ]</a>
<a href="/projects.html">[ Projects ]</a>
<a href="/extra.html">[ Extra ]</a>
<hr/>
</nav>

<h1>Side Project Log 10/20/2023</h1>
<p>This side project log covers work done from 8/15/2023 - 10/20/2023</p>

<h2 id="anthracite">Anthracite</h2>
<a href="https://github.com/nickorlow/anthracite">[ GitHub Repo ]</a>
<p>
Anthracite is a web server written in C++. The site you're reading this on
right now is hosted on Anthracite. I wrote it to deepen my knowledge of C++ and networking protocols. My
main focus with Anthracite is performance. While developing Anthracite,
I have been exploring different optimization techniques and benchmarking
it against popular web servers such as NGINX and Apache.
Anthracite supports HTTP/1.1, and only GET requests for
files stored on the server.
</p>

<p>
Anthracite currently performs on par with NGINX and Apache when making
1000 requests for a 50MB file using 100 threads in a Docker container.
To achieve this performance, I used memory profilers to find
out what caused large or repeated memory copies to occur. I then updated
those sections of code to remove or minimize these copies. I also
made Anthracite cache every file it can serve in memory, which
avoids unnecessary and costly disk reads. The implementation of this is
subpar: the server must be restarted whenever the files it serves
change, or Anthracite will not pick up the updates.
</p>

<p>
I intend to make further performance improvements, specifically in the request
parser. I also plan to implement HTTP/2.
</p>

<h2 id="yacemu">Yet Another Chip Eight Emulator (yacemu)</h2>
<a href="https://github.com/nickorlow/yacemu">[ GitHub Repo ]</a>
<p>
YACEMU is an interpreter for the CHIP-8 instruction set written in C. My main
goal when writing it was to gain more insight into how emulation works. I had
previous experience with this from when I worked on an emulator for a slimmed-down
version of x86 called <a href="https://web.cse.ohio-state.edu/~reeves.92/CSE2421sp13/PracticeProblemsY86.pdf">Y86</a>.
So far, I've been able to get most instructions working. I still need to add
input support so that users can interact with programs running in yacemu. It has
been fairly uncomplicated to write thus far. After I complete it, I would
like to work on an emulator for a real device such as the Game Boy (this might be
biting off more than I can chew).
</p>

<h2 id="nick-vim">Nick VIM</h2>
<p>
Over the summer while I was interning, I began using VIM as my primary
text editor. I used a preconfigured version of it (<a href="https://nvchad.com/">NvChad</a>) to save time, as
setting everything up can take a while. After using it for a few months, I began
making my own configuration for VIM, taking what I liked from NvChad and leaving
behind the parts that I didn't like as much.
</p>

<img src="/blog-images/NickVIM_Screenshot.png" alt="Screenshot of an HTML file open for editing in NickVIM"/>

<p>
One important goal for Nick VIM was making it portable between different
machines. I wanted it to require as few dependencies on the host machine as possible so that I
could get Nick VIM set up on any computer in a couple of minutes. This will be especially
useful when working on my school's lab machines and when switching to new computers
in the future. I achieved this by dockerizing Nick VIM. This is based on what one of
my co-workers does with their VIM setup. The Docker container contains
all the dependencies for each language server. Whenever you edit a file with Nick VIM,
the following script runs:
</p>

<code lang="bash">
echo "Starting container..."
# Build a container name from the current directory: replace every "/" with "_",
# drop the leading "_", and append a random number so names do not collide.
cur_dir=$(pwd)
container_name=${cur_dir//\//_}
container_name="${container_name:1}_$RANDOM"
# Start the container detached, sharing the host's X11 socket and DISPLAY
# and bind-mounting the current directory at /work.
docker run --name "$container_name" --network host -e DISPLAY="$DISPLAY" -v /tmp/.X11-unix:/tmp/.X11-unix --mount type=bind,source="$cur_dir",target=/work -d nick-vim &> /dev/null

echo "Execing into container..."
docker exec -w /work -it "$container_name" bash

echo "Stopping container in background..."
docker stop "$container_name" &> /dev/null &
</code>

<p>
This script starts a new container, shares the host's X11 socket and display with it
so that the host's clipboard is available inside the container, mounts the current
directory at /work for editing, and stops the container once the shell exits.
</p>
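<p>
The container name derivation is the least obvious part of the script, so here
is a small standalone sketch of it. It mirrors the two bash expansions using
only POSIX shell features; the function name, the example path, and the fixed
suffix are hypothetical and chosen purely for illustration (the script itself
uses <code>$RANDOM</code> as the suffix).
</p>

```shell
# derive_container_name DIR SUFFIX
# Mirrors the script's naming scheme: every "/" in DIR becomes "_", the
# leading "_" (always present for an absolute path) is dropped, and SUFFIX
# is appended after one more "_".
derive_container_name() {
    name=$(printf '%s' "$1" | tr '/' '_')   # /home/nick/site -> _home_nick_site
    name=${name#_}                          # _home_nick_site -> home_nick_site
    printf '%s_%s\n' "$name" "$2"
}

# Hypothetical working directory and suffix.
derive_container_name "/home/nick/site" "12345"   # prints home_nick_site_12345
```

<p>
Because the name is built from the directory path, opening the same directory
twice still yields two distinct containers thanks to the random suffix.
</p>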

<h2 id="secane">Secane</h2>
<p><a href="https://www.youtube.com/watch?v=tKRehO7FH_s">[ Video Demo ]</a></p>
<p>
Secane was a simple ChatGPT wrapper that I wrote to practice for the behavioral part of
job interviews. It takes your resume, information about the company, and information about
the role you're interviewing for. It also integrates with OpenAI's Whisper, allowing you
to simulate talking out your answers. I made it with Next.js.
</p>

<hr/>
<p><strong>These projects had minimal/no work done on them:</strong> NWS, RingGold, SQUIRREL</p>
<p><strong>These projects I will no longer be working on:</strong> Olney</p>

<footer>
<hr />
<p style="margin-bottom: 0px;">Copyright © Nicholas Orlowsky 2023</p>
<p style="margin-top: 0px; margin-bottom: 0px;">Hosting provided by <a href="https://nws.nickorlow.com">NWS</a></p>
<p style="margin-top: 0px;">Powered by <a href="https://github.com/nickorlow/anthracite">Anthracite Web Server</a></p>
</footer>
</body>
89
src/blogs/nws-postmortem-11-8-23.filler.html
Normal file
@ -0,0 +1,89 @
<h1>NWS Incident Postmortem 11/08/2023</h1>

<p>
On November 8th, 2023, at approximately 09:47 UTC, NWS suffered
a complete outage. All services hosted on NWS went down, along
with the NWS Management Engine and the NWS dashboard.
</p>

<p>
The incident lasted 28 minutes, after which it was automatically
resolved and all services were restored. This is NWS' first
outage event of 2023.
</p>

<h2>Cause</h2>
<p>
NWS uses several tactics to ensure uptime, among them load
balancing and failover. Cloudflare currently provides both at the
DNS level. It sends health check requests to NWS servers at
specified intervals. If it detects that a server is down, it
removes that server's A record from entry.nws.nickorlow.com (the
domain that all services on NWS point their traffic to via a
CNAME).
</p>

<p>
At around 09:47 UTC, Cloudflare detected that our servers in
Texas (Austin and Hill Country) were down. Its health checks did
not receive an HTTP error; they timed out. A timeout usually
indicates that the server has lost network connectivity. Cloudflare
then removed the servers' A records from the
entry.nws.nickorlow.com domain. Since NWS' Pennsylvania servers
have been undergoing maintenance since August 2023, this left no
servers able to serve requests routed to entry.nws.nickorlow.com,
resulting in the outage.
</p>

<p>
NWS utilizes UptimeRobot for monitoring the uptime statistics of
services on NWS and NWS servers. This is the source of the
statistics shown on the NWS status page.
</p>

<p>
UptimeRobot did not detect either of the Texas NWS servers as being
offline at any point during the outage, so UptimeRobot and
Cloudflare disagreed on the status of the NWS servers. Logs
on the NWS servers showed that requests from UptimeRobot were being
served, while no requests from Cloudflare appeared in the logs.
</p>

<p>
No firewall rules existed that could have blocked this traffic
on either of the NWS servers, and no other configuration was
found that would have blocked these requests. As these servers
are on different networks inside different buildings in different
parts of Texas, their networking equipment is entirely separate.
This rules out any hardware failure of networking equipment owned
by NWS. This leads us to believe that the issue may have been
caused by an internet traffic anomaly, although we are currently
unable to confirm this.
</p>

<p>
This is being actively investigated to find a more concrete root
cause. This postmortem will be updated if any new information is
found.
</p>

<p>
A similar event occurred on November 12th, 2023, and lasted 2 seconds.
</p>

<h2>Fix</h2>
<p>
The two affected servers have two things in common: both use
Spectrum as their ISP, and both are located near Austin, Texas.
The Pennsylvania server maintenance will be expedited so that we have
servers online that share neither of these with the Texas servers.
</p>

<p>
NWS will also investigate other methods of failover and load
balancing.
</p>

<p>Last updated on November 16th, 2023</p>