diff --git a/out/blogs/nws-postmortem-11-8-23.html b/out/blogs/nws-postmortem-11-8-23.html new file mode 100644 index 0000000..d15d790 --- /dev/null +++ b/out/blogs/nws-postmortem-11-8-23.html @@ -0,0 +1,111 @@ +
++ On November 8th, 2023 at approximately 09:47 UTC, NWS suffered + a complete outage. This outage resulted in the downtime of all + services hosted on NWS and the downtime of the NWS Management + Engine and the NWS dashboard. +
+ ++ The incident lasted 28 minutes after which it was automatically + resolved and all services were restored. This is NWS' first + outage event of 2023. +
+ ++ NWS utilizes several tactics to ensure uptime. A component of + this is load balancing and failover. This service is currently + provided by Cloudflare at the DNS level. Cloudflare sends + health check requests to NWS servers at specified intervals. If + it detects that one of the servers is down, it will remove the + A record from entry.nws.nickorlow.com for that server (this domain + is where all services on NWS direct their traffic via a + CNAME). +
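The failover behavior described above can be sketched roughly as follows. This is a minimal illustration in Python, not Cloudflare's actual implementation: the function name `records_to_publish`, the server labels, and the documentation-range IP addresses are all hypothetical.

```python
from typing import Dict, List


def records_to_publish(servers: Dict[str, str], healthy: Dict[str, bool]) -> List[str]:
    """Return the A-record IPs that should stay published.

    servers maps a (hypothetical) server label to its IP address.
    healthy maps the same labels to the latest health check result.
    A server with no recorded health check is treated as down.
    """
    return sorted(ip for name, ip in servers.items() if healthy.get(name, False))


# Hypothetical servers using documentation-range addresses (RFC 5737).
servers = {"austin": "192.0.2.10", "hill-country": "192.0.2.20"}

# One server failing its health check: only the other record remains.
print(records_to_publish(servers, {"austin": True, "hill-country": False}))

# Every server failing: no A records remain, so nothing can serve traffic.
print(records_to_publish(servers, {"austin": False, "hill-country": False}))
```

When every server fails its check at once, the record set becomes empty, so traffic sent to the shared domain has nowhere to go.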
+ ++ At around 09:47 UTC, Cloudflare detected that our servers in + Texas (Austin and Hill Country) were down. The health checks did not + return an error; they timed out at the HTTP level. A timeout like this usually + indicates that the server has lost network connectivity. When Cloudflare detected that the + servers were down, it removed their A records from the + entry.nws.nickorlow.com domain. Since NWS' Pennsylvania servers + have been undergoing maintenance since August 2023, this left no + servers able to serve requests routed to entry.nws.nickorlow.com, + resulting in the outage. +
+ ++ NWS utilizes UptimeRobot for monitoring the uptime statistics of + services on NWS and NWS servers. This is the source of the + statistics shown on the NWS status page. +
+ ++ UptimeRobot did not detect either of the Texas NWS servers as being + offline for the duration of the outage. This is odd, as UptimeRobot + and Cloudflare did not agree on the status of NWS servers. Logs + on NWS servers showed that requests from UptimeRobot were being + served while no requests from Cloudflare were shown in the logs. +
+ ++ No firewall rules existed that could have blocked this traffic + for either of the NWS servers. There was no other configuration + found that would have blocked these requests. As these servers + are on different networks inside different buildings in different + parts of Texas, their networking equipment is entirely separate. + This rules out a single hardware failure of networking equipment owned + by NWS. This leads us to believe that the issue may have been + caused by an internet traffic anomaly, although we are currently + unable to confirm that this is the cause. +
+ ++ This is being actively investigated to find a more concrete root + cause. This postmortem will be updated if any new information is + found. +
+ ++ A similar event occurred on November 12th, 2023 lasting for 2 seconds. +
+ ++ The common factors between these two servers are that they both use + Spectrum as their ISP and that they are located near Austin, Texas. + The Pennsylvania server maintenance will be expedited so that we have + servers online that operate with no commonalities. +
+ ++ NWS will also investigate other methods of failover and load + balancing. +
+ +Last updated on November 16th, 2023
+ + + diff --git a/out/blogs/side-project-10-20-23.html b/out/blogs/side-project-10-20-23.html new file mode 100644 index 0000000..d003846 --- /dev/null +++ b/out/blogs/side-project-10-20-23.html @@ -0,0 +1,121 @@ + +This side project log covers work done from 8/15/2023 - 10/20/2023
+ ++ Anthracite is a web server written in C++. The site you're reading this on + right now is hosted on Anthracite. I wrote it to deepen my knowledge of C++ and networking protocols. My + main focus with Anthracite is performance. While developing Anthracite, + I have been exploring different optimization techniques and benchmarking + it against popular web servers such as NGINX and Apache. + Anthracite supports HTTP/1.1 and currently handles only GET requests for + files stored on the server. +
+ ++ Anthracite currently performs on par with NGINX and Apache when making + 1000 requests for a 50 MB file using 100 threads in a Docker container. + To achieve this performance, I used memory profilers to find + out what caused large or repeated memory copies to occur. I then updated + those sections of code to remove or minimize these copies. I also + made it so that Anthracite caches all files it can serve in memory. This + avoids unnecessary and costly disk reads. The current implementation is + subpar: the server must be restarted whenever the files + it is serving change for Anthracite to pick up the updates. +
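To illustrate the caching trade-off described above, here is a minimal sketch in Python (Anthracite itself is written in C++, and this is not its actual code). The class name `PreloadedFileCache` and the URL-style keys are assumptions made for the example.

```python
import os
import tempfile
from typing import Dict, Optional


class PreloadedFileCache:
    """Reads every file under a root directory into memory once, at startup."""

    def __init__(self, root: str) -> None:
        self._files: Dict[str, bytes] = {}
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                full_path = os.path.join(dirpath, filename)
                # Key by URL-style path relative to the root, e.g. "/index.html".
                rel = os.path.relpath(full_path, root).replace(os.sep, "/")
                with open(full_path, "rb") as f:
                    self._files["/" + rel] = f.read()

    def get(self, url_path: str) -> Optional[bytes]:
        """Serve from memory only; the disk is never touched after startup."""
        return self._files.get(url_path)


# Usage: changes made on disk after construction are not visible.
with tempfile.TemporaryDirectory() as root:
    page = os.path.join(root, "index.html")
    with open(page, "wb") as f:
        f.write(b"v1")
    cache = PreloadedFileCache(root)
    with open(page, "wb") as f:
        f.write(b"v2")
    print(cache.get("/index.html"))  # still the contents loaded at startup
```

Because files are read only in the constructor, a request never pays for a disk read, but an edited file is not seen until a new cache is built, which mirrors the restart requirement mentioned above.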
+ ++ I intend to make further performance improvements, specifically in the request + parser. I also plan to implement HTTP/2. +
+ ++ YACEMU is an interpreter for the CHIP-8 instruction set written in C. My main + goal when writing it was to gain more insight into how emulation works. I had + previous experience with this from when I worked on an emulator for a slimmed-down + version of x86 called Y86. + So far, I've been able to get most instructions working. I still need to add + input support so that users can interact with programs running in YACEMU. It has + been fairly straightforward to write thus far. After I complete it, I would + like to work on an emulator for a real device such as the Game Boy (this might be + biting off more than I can chew). +
+ ++ Over the summer while I was interning, I began using VIM as my primary + text editor. I used a preconfigured version of it (NvChad) to save time, as + setting everything up can take a while. After using it for a few months, I began + making my own configuration for VIM, taking what I liked from NvChad and leaving + behind the parts that I didn't like as much. +
+ ++ One important part of Nick VIM was ensuring that it was portable between different + machines. I wanted it to require as few dependencies on the host machine as possible so that I + could get Nick VIM set up on any computer in a couple of minutes. This will be especially + useful when working on my school's lab machines and when switching to new computers + in the future. I achieved this by dockerizing Nick VIM. This is based on what one of + my co-workers does with their VIM setup. The Docker container contains + all the dependencies for each language server. Whenever you edit a file with Nick VIM, + the following script runs: +
+ +
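+echo Starting container...
+# Derive a container name from the current directory: replace every "/" with "_",
+# drop the leading "_", and append a random suffix so names do not collide.
+cur_dir=$(pwd)
+container_name=${cur_dir//\//_}
+container_name="${container_name:1}_$RANDOM"
+# Share the host's X11 socket and bind-mount the current directory at /work.
+docker run --name "$container_name" --network host -e DISPLAY="$DISPLAY" -v /tmp/.X11-unix:/tmp/.X11-unix --mount type=bind,source="$cur_dir",target=/work -d nick-vim &> /dev/null
+
+echo Execing into container...
+docker exec -w /work -it "$container_name" bash
+
+echo Stopping container in background...
+docker stop "$container_name" &> /dev/null &
+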
+
++ This code creates a new container, forwards the host's clipboard to the container, and + mounts the current directory inside the container for editing. +
+ ++ Secane was a simple ChatGPT wrapper that I wrote to practice for the behavioral part of + job interviews. It takes your resume, information about the company, and information about + the role you're interviewing for. It also integrates with OpenAI's Whisper, allowing you + to simulate talking out your answers. I made it with Next.js. +
+ +These projects had minimal/no work done on them: NWS, RingGold, SQUIRREL
+These projects I will no longer be working on: Olney
+ + + diff --git a/src/blogs/nws-postmortem-11-8-23.filler.html b/src/blogs/nws-postmortem-11-8-23.filler.html new file mode 100644 index 0000000..dfccc2b --- /dev/null +++ b/src/blogs/nws-postmortem-11-8-23.filler.html @@ -0,0 +1,89 @@ ++ On November 8th, 2023 at approximately 09:47 UTC, NWS suffered + a complete outage. This outage resulted in the downtime of all + services hosted on NWS and the downtime of the NWS Management + Engine and the NWS dashboard. +
+ ++ The incident lasted 28 minutes after which it was automatically + resolved and all services were restored. This is NWS' first + outage event of 2023. +
+ ++ NWS utilizes several tactics to ensure uptime. A component of + this is load balancing and failover. This service is currently + provided by Cloudflare at the DNS level. Cloudflare sends + health check requests to NWS servers at specified intervals. If + it detects that one of the servers is down, it will remove the + A record from entry.nws.nickorlow.com for that server (this domain + is where all services on NWS direct their traffic via a + CNAME). +
+ ++ At around 09:47 UTC, Cloudflare detected that our servers in + Texas (Austin and Hill Country) were down. The health checks did not + return an error; they timed out at the HTTP level. A timeout like this usually + indicates that the server has lost network connectivity. When Cloudflare detected that the + servers were down, it removed their A records from the + entry.nws.nickorlow.com domain. Since NWS' Pennsylvania servers + have been undergoing maintenance since August 2023, this left no + servers able to serve requests routed to entry.nws.nickorlow.com, + resulting in the outage. +
+ ++ NWS utilizes UptimeRobot for monitoring the uptime statistics of + services on NWS and NWS servers. This is the source of the + statistics shown on the NWS status page. +
+ ++ UptimeRobot did not detect either of the Texas NWS servers as being + offline for the duration of the outage. This is odd, as UptimeRobot + and Cloudflare did not agree on the status of NWS servers. Logs + on NWS servers showed that requests from UptimeRobot were being + served while no requests from Cloudflare were shown in the logs. +
+ ++ No firewall rules existed that could have blocked this traffic + for either of the NWS servers. There was no other configuration + found that would have blocked these requests. As these servers + are on different networks inside different buildings in different + parts of Texas, their networking equipment is entirely separate. + This rules out a single hardware failure of networking equipment owned + by NWS. This leads us to believe that the issue may have been + caused by an internet traffic anomaly, although we are currently + unable to confirm that this is the cause. +
+ ++ This is being actively investigated to find a more concrete root + cause. This postmortem will be updated if any new information is + found. +
+ ++ A similar event occurred on November 12th, 2023 lasting for 2 seconds. +
+ ++ The common factors between these two servers are that they both use + Spectrum as their ISP and that they are located near Austin, Texas. + The Pennsylvania server maintenance will be expedited so that we have + servers online that operate with no commonalities. +
+ ++ NWS will also investigate other methods of failover and load + balancing. +
+ +Last updated on November 16th, 2023