init commit

Nicholas Orlowsky 2024-05-15 22:12:14 +02:00
commit edc0dd01e8
Signed by: nickorlow
GPG key ID: 838827D8C4611687
20 changed files with 2920 additions and 0 deletions

templates/blog.html

@@ -0,0 +1,18 @@
<h1>Blog</h1>
{% for blog in blogs %}
<div style="display:flex;justify-content:space-around; padding-bottom: 5px;">
<p style="margin-bottom: 0px; padding-right: 8px;">
<a
{% if blog.url.starts_with("https://") %}
href="{{ blog.url }}"
{% else %}
href="/blogs/{{ blog.url }}"
{% endif %}
>
[ {{ blog.title }} ]
</a></p>
<div style="flex-grow: 1; border-bottom: 1px dotted black;"></div>
<p style="margin-bottom: 0px; padding-left: 8px;"><i>{{ blog.date }}</i></p>
</div>
{% endfor %}


@@ -0,0 +1,89 @@
<h1>SMC Incident Postmortem 11/08/2023</h1>
<p>
On November 8th, 2023 at approximately 09:47 UTC, SMC suffered
a complete outage. This outage resulted in the downtime of all
services hosted on SMC and the downtime of the SMC Management
Engine and the SMC dashboard.
</p>
<p>
The incident lasted 38 minutes, after which it resolved
automatically and all services were restored. This is SMC's first
outage event of 2023.
</p>
<h2>Cause</h2>
<p>
SMC utilizes several tactics to ensure uptime, including
load balancing and failover. This service is currently
provided by Cloudflare at the DNS level. Cloudflare sends
health check requests to SMC servers at specified intervals. If
it detects that one of the servers is down, it removes that
server's A record from entry.nws.nickorlow.com (the domain
to which all services on SMC direct their traffic via a
CNAME).
</p>
<p>
At around 09:47 UTC, Cloudflare detected that our servers in
Texas (Austin and Hill Country) were down. The health checks did not
return an error but timed out, which indicates that the
servers may have lost network connectivity. When Cloudflare detected that the
servers were down, it removed their A records from the
entry.nws.nickorlow.com domain. Since SMC Pennsylvania servers
have been undergoing maintenance since August 2023, this left no
servers able to serve requests routed to entry.nws.nickorlow.com,
resulting in the outage.
</p>
<p>
SMC utilizes UptimeRobot for monitoring the uptime statistics of
services on SMC and SMC servers. This is the source of the
statistics shown on the SMC status page.
</p>
<p>
UptimeRobot did not detect either of the Texas SMC servers as being
offline at any point during the outage. This is odd, as it means UptimeRobot
and Cloudflare disagreed on the status of SMC servers. Logs
on SMC servers showed that requests from UptimeRobot were being
served, while no requests from Cloudflare appeared in the logs.
</p>
<p>
No firewall rules existed that could have blocked the health check traffic from Cloudflare
for either of the SMC servers. There was no other configuration
found that would have blocked these requests. As these servers
are on different networks inside different buildings in different
parts of Texas, their networking equipment is entirely separate.
This rules out any failure of networking equipment owned
by SMC. This leads us to believe that the issue may have been
caused by an internet traffic anomaly, although we are currently
unable to confirm that this is the cause of the issue.
</p>
<p>
This is being actively investigated to find a more concrete root
cause. This postmortem will be updated if any new information is
found.
</p>
<p>
A similar event occurred on November 12th, 2023, lasting 2 seconds.
</p>
<h2>Fix</h2>
<p>
The common factors between these two servers are that both use
Spectrum as their ISP and both are located near Austin, Texas.
The Pennsylvania server maintenance will be expedited so that we have
servers online that operate with no commonalities.
</p>
<p>
SMC will also investigate other methods of failover and load
balancing.
</p>
<p>Last updated on November 16th, 2023</p>


@@ -0,0 +1,9 @@
<h1>Goodbye, NWS</h1>
<p>
<b>
Nick Web Services (NWS) is now Sharpe Mountain Compute (SMC).
</b>
</p>
<p>That is all</p>

templates/dashboard.html

@@ -0,0 +1,2 @@
<h1>Under Construction</h1>
<p>The dashboard isn't ready yet! Use the <a href="https://nws.nickorlow.com/dashboard">old website</a> for now!</p>

templates/index.html

@@ -0,0 +1,30 @@
{%- import "uptime_table.html" as scope -%}
<div>
<div style="display: flex; align-items: baseline;">
<h1 style="margin-bottom: 0px;">Sharpe Mountain Compute</h1>
<p style="margin-bottom: 0px; margin-left: 2px;">fka Nick Web Services</p>
</div>
<p style="margin-top: 0px;">Pottsville, PA - Philadelphia, PA - Austin, TX</p>
<a href="https://nws.nickorlow.com">[ Old Website (NWS Branded) ]</a>
<p>
Sharpe Mountain Compute is a hosting service based out of the Commonwealth of Pennsylvania
and the State of Texas.
We are committed to achieving maximum uptime with better performance and a lower
cost than any of the major cloud services.
</p>
<p>
We operate four datacenters located across three cities in two states. This infrastructure setup ensures redundancy and failover capabilities, minimizing downtime risks. Additionally, the geographical distribution enhances speed and accessibility, reducing latency for users across different regions.
</p>
<p>
This has allowed us to maintain four nines availability (99.9931%; 38 minutes of downtime
all year) for 2023 and 100% uptime for 2024 (YTD).
</p>
<h3>Compare us to our competitors!</h3>
{% call scope::uptime_table(uptime_infos) %}
</div>

templates/layout.html

@@ -0,0 +1,38 @@
<head>
<title>Sharpe Mountain Compute</title>
<link rel="stylesheet" href="/assets/style.css">
<link rel="icon" type="image/x-icon" href="/assets/favicon.ico">
</head>
<body>
<nav>
<div style="display: flex; justify-content: space-between;">
<div>
<a href="/">[ Home ]</a>
<a href="/system_status">[ System Status ]</a>
<a href="/blog">[ Blog ]</a>
</div>
<div>
<a href="/dashboard">[ Dashboard ]</a>
</div>
</div>
</nav>
<hr/>
{{ content|safe }}
<footer>
<hr />
<div style="display: flex; justify-content: space-between;">
<div>
<p style="margin-bottom: 0px; margin-top:0px;"><b>Sharpe Mountain Compute</b></p>
<p style="margin-bottom: 0px; margin-top:0px;"><i>formerly Nick Web Services (NWS)</i></p>
<p style="margin-bottom: 0px;margin-top: 0px;">Copyright &#169; <a href="https://nickorlow.com">Nicholas Orlowsky</a> 2024</p>
<p style="margin-top: 0px;"><i>"We're getting there" - SEPTA</i></p>
</div>
<div>
<img class="flag-img" src="/assets/flag-images/us.png" title="The United States of America"/>
<img class="flag-img" src="/assets/flag-images/us-pa.png" title="The Commonwealth of Pennsylvania"/>
<img class="flag-img" src="/assets/flag-images/us-tx.png" title="The State of Texas"/>
</div>
</div>
</footer>
</body>


@@ -0,0 +1,21 @@
{%- import "uptime_table.html" as scope -%}
<h1>System Status</h1>
<h2>Datacenter Status</h2>
<p>
The status of each of Sharpe Mountain Compute's four
datacenters.
</p>
{% call scope::uptime_table(dctr_uptime_infos) %}
<h2>Service Status</h2>
<p>
The status of services people host on Sharpe Mountain Compute.
Note that the uptime and performance of services hosted on
Sharpe Mountain Compute may be affected by factors outside our control, such as
poor optimization or buggy software.
</p>
{% call scope::uptime_table(svc_uptime_infos) %}


@@ -0,0 +1,35 @@
{% macro uptime_table(uptime_infos) %}
<table style="width: 100%;">
<tr>
<th>Name</th>
<th>Uptime YTD</th>
<th>Response Time 24h</th>
<th>Current Status</th>
</tr>
{% for uptime_info in uptime_infos %}
<tr>
<td>
{% if let Some(click_url) = uptime_info.url %}
<a href="{{click_url}}">
{% endif %}
{{uptime_info.name}}
{% if let Some(click_url) = uptime_info.url %}
</a>
{% endif %}
</td>
<td>{{uptime_info.uptime}}</td>
<td>{{uptime_info.response_time}}</td>
<td
{% if uptime_info.status != "Up" %}
style="color: red;"
{% endif %}
>
{{uptime_info.status}}
</td>
</tr>
{% endfor %}
</table>
<p style="margin-top: 0px;"><i>Data current as of {{last_updated}}</i></p>
{% endmacro %}