Latest Post

Automating 24/7 Monitoring for a Weekend

On Call Support Automation

This is the story of how I was tasked with monitoring our website all weekend, but wrote a script to do it instead. I did not realize it at the time, but one Friday morning, a storm was brewing. We received an innocuous ticket from a client that orders appeared to be stuck. At that point, the ticket was assigned to my colleague, and I didn't poke around much further.

Introduction

At around 3:00 p.m., however, the ticket was transferred to me, as my colleague was planning on logging off early. Naturally, the ticket was to be prioritized, but there was no cause for alarm. During the knowledge transfer, we noticed that the scheduled job that was supposed to be processing orders appeared to be stuck. The job, which usually takes only a few seconds, had been running for over an hour and had still not completed. When checking the queue, we noticed an abnormal number of orders. What we later found out was that the client was running a special promotion where certain products were free with the sign-up of a subscription, which resulted in an absurd number of new orders coming in. I double-checked that orders were still being processed, albeit a little slowly. I still wasn't sure there was a bigger problem at hand, and a senior dev was already looking into the root cause.

One particularly unfortunate behavior was that when the scheduled job ran, it would first grab all the orders and then process them first in, last out (FILO). Orders added after the scheduled job started would not be processed until the job ran again. Since thousands of orders were coming in, some unlucky orders had to wait longer and longer, especially because the scheduled job had been restarted several times (we were unsure whether it was stuck or not).

Come 5:00 p.m., I synced with the senior developer and the project manager to make sure we were all on the same page. The job was running and processing as expected, just a little slowly. Doing the math, it would take over 1.5 days to finish processing, even if no more new orders came in. We were all aligned, and the PM let us know that he would talk to the client and explain what we had found. The senior developer works in a different timezone, so his shift had actually been over for more than four hours. With that in mind, I told him to log off and reiterated that if there were any further problems, I would handle them. Of course, at this point, there was nothing to do but wait, so I monitored for maybe another 10 minutes before stepping away from my computer to take a break. I had also made dinner plans with my significant other, and at around 6:00 p.m., I began getting ready to leave. I reminded myself that I should check my messages before leaving, but I was not too worried. Besides, I had my phone with me, so they could always reach me that way.

Where the Trouble Begins

I picked up my significant other and was on my way to the restaurant when I got a message from the Big Boss (my boss's boss). He was trying to figure out what was going on and put out a fire that had been raging. I realized I also had a missed call. I was still driving at this point, so I pulled into a parking lot and called the Big Boss back. It turned out that after I had left, the PM was not able to explain to the client what had happened or provide assurance that the orders were still being processed.
I realized at this point that I had forgotten to check my computer before leaving for dinner. Once home, I jumped on a call with the senior developer and the Big Boss and saw that I had missed about a dozen messages. I was told that the client was having a meltdown and was not convinced that orders were going through. They even suggested manually processing orders to get through the whole backlog (which would have taken more than a week and been prone to errors). The Big Boss was barely able to talk the client down and assured them that we would handle the situation. We again confirmed that the orders were being processed, even though it was slow.

To ensure that the client was happy, the Big Boss told me that we would have to take turns monitoring the orders to make sure they were still being processed, giving status updates every hour until the backlog was completely cleared. Long term, we needed to speed up the order processing times and ensure that the queue was processed FIFO instead of FILO. We had never had an issue before because the client would have at most 10 orders in a day; on this particularly lucky Friday, we had received well into the thousands. In the short term, however, this meant that we would have to take turns staying up all night to keep the client reassured. At this point, I was just glad I was not getting fired (as it was mostly my fault that the issue escalated this far). As such, I offered to take the graveyard shift.

The Automation

In the middle of the night, I realized that it was somewhat stupid to stay up all night just to count by hand how many orders were remaining. I noticed that logging in to the backoffice/admin section of the website only required basic auth. Naturally, I started working on a Puppeteer script that logs in, goes to the right page/tab, counts how many orders are remaining, and logs the number into a Google Sheets document. From there, I used the timestamp and order count to graph the progress in a chart using Sheets. To automate the script runs, I created a scheduled task in Windows to run a batch script that, in turn, runs Puppeteer (a rough sketch of what such a script might look like is included at the end of this post).

At the end of the day, I probably should have just checked my messages before I left for dinner. Writing the script honestly took the better part of a working day and was totally not worth it, but at the very least, it was fun.

Lessons Learned:
- Always have your phone with you when you are on call
- Always double-check that an issue is completely resolved from the client's perspective before assuming all work is done
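For the curious, here is a minimal sketch of what that kind of Puppeteer run might look like. The admin URL, table selector, credentials, and spreadsheet ID are placeholders rather than the real ones, and the Google Sheets call assumes a service-account key, which is not necessarily how I actually wired it up.

```typescript
// Sketch: count pending orders in the admin panel and append the number to a Google Sheet.
// URLs, selectors, credentials, and the spreadsheet ID below are hypothetical placeholders.
import puppeteer from "puppeteer";
import { google } from "googleapis";

const ADMIN_URL = "https://example.com/backoffice/orders?status=pending"; // hypothetical
const SPREADSHEET_ID = "your-spreadsheet-id"; // hypothetical

async function countPendingOrders(): Promise<number> {
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // the backoffice only used basic auth, so no login form is involved
    await page.authenticate({
      username: process.env.ADMIN_USER ?? "",
      password: process.env.ADMIN_PASS ?? "",
    });
    await page.goto(ADMIN_URL, { waitUntil: "networkidle2" });
    // assume each pending order renders as a table row
    return await page.$$eval("table.orders tbody tr", (rows) => rows.length);
  } finally {
    await browser.close();
  }
}

async function logToSheet(count: number): Promise<void> {
  const auth = new google.auth.GoogleAuth({
    keyFile: "service-account.json", // hypothetical service-account key
    scopes: ["https://www.googleapis.com/auth/spreadsheets"],
  });
  const sheets = google.sheets({ version: "v4", auth });
  // append a (timestamp, remaining orders) row that the chart reads from
  await sheets.spreadsheets.values.append({
    spreadsheetId: SPREADSHEET_ID,
    range: "Sheet1!A:B",
    valueInputOption: "USER_ENTERED",
    requestBody: { values: [[new Date().toISOString(), count]] },
  });
}

countPendingOrders()
  .then(logToSheet)
  .catch((err) => {
    console.error("monitoring run failed:", err);
    process.exit(1);
  });
```

The Windows scheduled task then only needs a batch file that invokes Node on this script at whatever interval you want the chart updated.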

Jan 29, 2026

Older Posts

Sirius Free Products Data Integration

The Case of the Free Products

This is a post-analysis of an issue that arose with a former client. It wasn't a difficult bug to diagnose so much as it was comical.

Introduction

It all started one morning when we received a ticket from the client stating that some orders were not totaling correctly. Upon further investigation, we realized that some products were not being charged at all: customers were receiving them completely free. What made it bad was that these items could range from $400 to well over $1,500. Naturally, we tried recreating the issue by adding the same products to our cart, but no matter what we did, we could not reproduce the error.

In the old version of the website, getting any prices or information for the store meant making API calls to the pricing software, Siriusware. This is quite literally ancient legacy software that slows to a crawl during peak business hours. To remedy this, we created a synchronization job that runs once a day during off-peak hours and synchronizes everything, including products, prices, and availabilities, from Sirius into our own database. What I have come to realize is that a lot of business logic is simply synchronizing data from one system to another. This meant that the prices in our store were stored in our database, even though the source of truth was Siriusware. So the next step was to verify the prices stored in the database. Were any prices incorrect or missing?

To properly understand how the prices are saved in the DB, I should preface that prices are not stored per product, but per variant. The same product could technically have different prices. For example, the same pair of headphones can come in different colors, and each color could have a different price. What was silly was that, in rare cases, a variant could also be broken into variant items, and each variant item would have its own price. For the most part, prices stay the same each day. However, there are times when a product will be on sale for a short period, so we end up storing each product, variant, and variant item by day. That might not seem so bad at first, but if you have 100 products, each product has five variants, each variant has another three variant items, and you save a whole year's worth of prices, you are looking at a cool 100 × 5 × 3 × 365 ≈ 550,000 entries in the DB. While half a million entries isn't the worst, it is not the only table that needs to be accounted for. So, to reduce the size of the prices table, we would periodically prune prices from older days. This made sense because we don't need to check the prices of products from three weeks ago if we are making a purchase today, right? … RIGHT?

Well, that is where we were wrong. When you add an item to the cart, it saves the date as well. This means that if you leave an item in your cart, the price lookup will request the price of the product, variant, or variant item using the date that the item was added to the cart. This was by design: the site administrators wanted users to be able to add products to their cart and have the price locked in until they make the payment. However, since we were pruning older prices, the price lookup would end up failing. Instead of throwing an exception, a failed lookup simply replaces the price with a null value, which conveniently gets converted to a big, fat zero.
This meant that clients who left items in their cart for three weeks or more would automatically be able to check out their items for free. And, of course, Siriusware, being the great software it is, has no guards in place to check whether the prices we send are in sync with the prices in its system; it simply, blindly approves whatever we send.

Conclusion

Ultimately, the issue was fixed by simply falling back to today's price whenever the price lookup fails for the date on the cart item (a rough sketch of the idea is at the end of this post). I recall needing to update the pricing information in the cart logic. Unsurprisingly, although the fix was seemingly simple, the checkout and cart flow was quite difficult to follow. Regardless, we got the job done.

Perhaps I am wiser now, but at the time, I do not recall being intimidated or scared, even though I was updating the checkout logic, which could adversely affect every purchase made on the site. While this was technically a small change, in hindsight, the issue could easily have become quite large if done incorrectly. In addition, I think I was always a little trigger-happy when it came to deploying to production. You could say that I was younger and less experienced. There is nothing quite like the boldness of a junior developer.

Lessons Learned:
- Make sure the fallback for null prices is not 0 (aka free)
- If you price your products by date, make sure you have a fallback for older carts/dates
- When changing checkout cart logic, probably spend a bit more time testing (there were no issues this time, but still)
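For illustration, here is a minimal sketch of the fallback pattern, with hypothetical names (CartItem, getPriceForDate) standing in for the real cart code; the actual fix lived inside a much hairier checkout flow.

```typescript
// Sketch of the fallback pattern: prefer the price for the date the item was added
// to the cart, fall back to today's price, and never let a missing price become 0.
// CartItem and PriceLookup are hypothetical stand-ins for the real cart code.
interface CartItem {
  variantItemId: string;
  addedOn: Date;
}

type PriceLookup = (variantItemId: string, date: Date) => Promise<number | null>;

async function resolvePrice(
  item: CartItem,
  getPriceForDate: PriceLookup
): Promise<number> {
  // the price "locked in" at the time the item was added to the cart
  const lockedIn = await getPriceForDate(item.variantItemId, item.addedOn);
  if (lockedIn != null) return lockedIn;

  // the dated price may have been pruned, so fall back to today's price
  const today = await getPriceForDate(item.variantItemId, new Date());
  if (today != null) return today;

  // refuse to silently turn a missing price into a free product
  throw new Error(`No price found for variant item ${item.variantItemId}`);
}
```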

Jan 21, 2026
Zendesk Discord Webhook Pocketbase

Zendesk Discord Chatbot using Webhooks

Two years ago, the support team at Yaksa (before Verndale had purchased the company) used a Microsoft Teams bot that was integrated with our Zendesk ticketing platform. The bot would message you in real time any time your ticket was updated. Although we also get emails in near real time, I have always found the emails too noisy, and Outlook/Windows would not reliably show notifications when a new email came in. I really appreciated the Zendesk Teams bot, as the instant notifications improved my response time and productivity.

Within the last year (although unrelated), around the time our company was bought out by Verndale, the Teams bot began misbehaving. It would double-message and phantom-message the team about tickets that had not been updated at all, and it would send messages much later than the actual updates, defeating the original purpose of the bot. It got so bad that everyone on the support team muted the Zendesk bot and completely stopped using it. Later that year, we migrated our messaging platform from Teams to Slack, and the Zendesk bot became a memory of the past.

Six months ago, there was an in-house AI hackathon hosted by Verndale. The support team decided to create a bot that integrated with Zendesk using webhooks. Ultimately, we didn't spend enough time working on the hackathon, which led to a presentation that was less than stellar. It did, however, give us access to various Zendesk APIs and webhooks. Funnily enough, in the back of my mind, I did consider the possibility of creating my own bot that used the webhooks, but the hackathon was over and our access, as well as any enabled webhooks, would soon be disabled. Or so I thought.

Just last month, I began playing with Docker, and one of the new applications I discovered was Uptime Kuma, a site monitoring tool that can integrate with various chat platforms and notify you when a site becomes unavailable. This has been quite useful for checking the uptime of my Plex and Immich sites. Naturally, I integrated it with Discord, but it also got me thinking again about the possibility of integrating Zendesk with Discord. I checked whether the webhooks were still active, and to my excitement, they were!

I immediately knew that I wanted to recreate the chatbot, but I also had to ensure that the uptime would be reasonable. I initially thought of hosting the bot on Vercel because of the generous free tier and the ease of integrating with Next.js, but I quickly became afraid that it would violate their terms of service: the bot would technically no longer qualify as a "hobby" project, as it is an internal tool for "commercial use." What I ended up going with was creating a site using Pocketpages on Pockethost. In the end, my solution was quite hacky, as the Pocketpages framework is not very well known, but the documentation was "good enough." The most important part was that it was free and the uptime was not my responsibility.

I won't bore you with all the details, but effectively, the webhook sent by Zendesk includes all the ticket information, such as the ticket number, assignee ID, actor ID, message, ticket type, and custom statuses. With that information, I can identify who updated the ticket, who the ticket is assigned to, whether the ticket is about to breach, the ticket number, and the client organization, and then send the update to the appropriate Discord user along with the essential details.
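As a rough illustration, here is a sketch of that routing step. The payload shape and the assignee-to-Discord mapping are assumptions on my part, since the actual field names depend on how the Zendesk webhook body is configured; Discord itself only needs a plain HTTPS POST to a webhook URL.

```typescript
// Sketch: turn an incoming Zendesk webhook payload into a Discord mention.
// The payload field names and the assignee-to-Discord mapping are hypothetical;
// Zendesk lets you define the webhook body yourself, so yours will differ.
interface ZendeskTicketEvent {
  ticketId: string;
  assigneeId: string;
  actorId: string;
  message: string;
  ticketType: string;
  customStatus: string;
  organization: string;
}

// hypothetical mapping from Zendesk assignee IDs to Discord user IDs
const DISCORD_USER_BY_ASSIGNEE: Record<string, string> = {
  "12345": "111111111111111111",
};

const DISCORD_WEBHOOK_URL = process.env.DISCORD_WEBHOOK_URL ?? "";

async function notifyAssignee(event: ZendeskTicketEvent): Promise<void> {
  // skip updates made by the assignee themselves
  if (event.actorId === event.assigneeId) return;

  const discordUserId = DISCORD_USER_BY_ASSIGNEE[event.assigneeId];
  if (!discordUserId) return; // nobody to notify

  const content =
    `<@${discordUserId}> Ticket #${event.ticketId} (${event.organization}) ` +
    `was updated: ${event.message} [status: ${event.customStatus}]`;

  await fetch(DISCORD_WEBHOOK_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ content }),
  });
}
```

One nice property of going through a Discord webhook is that the bot stays stateless: there is no gateway connection to maintain, just one POST per ticket update.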
I had a small issue with duplicate messages, because multiple webhooks would come in for the same ticket (sometimes within the span of 3 ms). I finally sorted it out by grabbing all the records from the last ~10 seconds and only sending a message if the incoming record is the oldest one in that 10-second window. Now I am able to relax and not worry about whether I have missed an update on any of the tickets assigned to me.
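Here is a minimal sketch of that dedup rule, using an in-memory array as a stand-in for the records I actually keep in Pocketbase; the window length and record shape are only illustrative.

```typescript
// Sketch of the dedup rule: record every incoming webhook, and only act on it if it is
// the first one for that ticket within the last ~10 seconds. The in-memory store below
// is a stand-in for the persisted records.
const WINDOW_MS = 10_000;

interface WebhookRecord {
  ticketId: string;
  receivedAt: number; // epoch ms
}

const recentWebhooks: WebhookRecord[] = [];

function shouldSend(ticketId: string, now: number = Date.now()): boolean {
  // drop records that have aged out of the window
  while (recentWebhooks.length && now - recentWebhooks[0].receivedAt > WINDOW_MS) {
    recentWebhooks.shift();
  }

  const duplicates = recentWebhooks.filter((r) => r.ticketId === ticketId);
  recentWebhooks.push({ ticketId, receivedAt: now });

  // only the first webhook for this ticket in the window triggers a message
  return duplicates.length === 0;
}
```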

Nov 08, 2025
Plex Automation Sonarr Radarr

Automating My Plex Server

I have been running my Plex server since approximately 2015, and I have always been too lazy to automate the workflow. It is often joked that developers spend countless hours automating workflows that take a few minutes to complete by hand. Ironically, this is probably one of the few cases where I could have benefited from setting up the automation much sooner. I am mildly ashamed to say that I have wasted countless hours over the past 10 years by not automating requests and downloads.

Introduction

Initially, I installed Plex on a Chinese Windows 2-in-1 tablet in order to share my downloaded content for online movie nights. If I remember correctly, it must have been running Windows 8. Even from the start, the storage was mounted remotely from Google Drive with rclone. In those days, rclone could not mount on Windows because WinFSP had not been developed yet, so we ran a Linux virtual machine with VMware just to take advantage of the unlimited Google Drive storage provided by G Suite.

Eventually, the unlimited storage came to an end, but I was able to pool my local drives with Drivepool for Windows. Luckily, the desktop tower case I had at the time (the Fractal R2) could fit something like fourteen 3.5" hard drives as well as two 2.5" drives. With that realization, I added a PCIe Mini-SAS card so that I could attach an additional eight hard drives. As Google wised up and closed their unlimited storage plan, I migrated my content locally. Unfortunately, I didn't have enough local storage and ended up losing more than half my library at the time. I knew the day would come, but it was still a sad day.

That brings me to my current setup, which is still running on Windows 10 and is still just a ghetto JBOD (just a bunch of drives). What makes it great, however, is the recently added power of automation through Docker.

Motivation

So what finally got me off my lazy ass and down this rabbit hole of Sonarr, Radarr, and Overseerr? It was a combination of several factors. Earlier this year, I discovered Coolify and finally figured out how Nginx works. I recently discovered Portainer, and Docker Desktop for Windows finally fixed several known memory leaks. Unironically, discovering Portainer changed the way I run apps and kicked off my self-hosting journey. Prior to using Portainer, I had a lot of concerns regarding data loss and recovery when running Docker.

Putting it all together

Finally, after figuring out a consistent way to run Docker containers without the risk of losing data, I was ready to start the setup. In the past, I had tried setting up my homelab on a separate machine using Proxmox, but found that it was incredibly hard to change its IP address. As we change ISPs fairly frequently, I figured I would skip Proxmox this time. What I do miss, however, is how easy it was to remote desktop into a Proxmox environment out of the box. So naturally, you would think that I just decided to host everything on a simple distro like Ubuntu Server or plain Ubuntu. What I ended up doing was going the lazy route: hosting Sonarr and Radarr on my existing Plex server. In hindsight, it would have made more sense to install the *arr applications on a separate Linux machine and map the existing network drives, but don't fix what isn't broken.

So I finally set up my instance following the TechHutTv guides. The guides are a fantastic resource and very flexible.
In his particular example, he tunnels all his traffic through WireGuard, whereas I do not. Overall, there were a few hiccups, but once I got everything set up, it worked pretty flawlessly (hopefully I am not jinxing myself). I was able to set up:

- Sonarr: for shows
- Radarr: for movies
- Prowlarr: for indexing torrents
- Overseerr: for making requests
- FlareSolverr: for bypassing Cloudflare
- qBittorrent: as the torrent client

The largest issue I ran into was pathing, as I was originally trying to use the Windows qBittorrent client instead of the Docker image. With that out of the way, all that was left was some tweaking of the quality settings and preferred codec types. For the most part, I am able to sit back and let Sonarr and Radarr handle incoming requests with little fuss or intervention. Of course, I still occasionally check other sources in the event a show or movie is not available, but for now, I am quite pleased that the setup is working after a weekend's worth of effort.

Lessons Learned
- Don't be afraid to automate things; it will save you time (sometimes)

Oct 15, 2025
Windows 10 Linux Fedora

Saying Goodbye to Windows (Windows 10 EOL)

With the end of life of Windows 10 and the stringent hardware requirements of Windows 11, users are left stranded without a secure path forward.

Introduction

With the new online account requirement during Windows 11 installation, forced ads within the operating system, Windows Recall, and the ever-increasing telemetry data mining, Microsoft is doing its best to alienate its user base. Ironically, StatCounter reported a spike in Windows 7 devices across the various websites it tracks (though I am unsure how accurate this really is). Some users are opting to bypass the TPM 2.0 requirement of Windows 11, while others are moving to Linux as an alternative operating system. With the steady progress that Valve has been making with Wine and Proton, Linux has become a more viable option for gamers as well as the average Windows user.

Personally, on my Plex server, out of sheer laziness, I have opted to extend my Windows 10 support for another year, as I do not want to:

- Upgrade my hardware to support Windows 11
- Completely reinstall the OS with a Linux distro and reformat the 100+ TB of data

Despite my reluctance to change the OS of my media server (my mentality at this point is to not "fix" anything that is not broken), I have tried out a few distros as my daily driver to see what switching away from Windows would look like. On said journey to replace Windows, it turns out that I am fairly unopinionated about my desktop experience as long as I am able to run these few applications:

- Visual Studio Code
- Postman
- Plex Media Player
- Parsec
- Brave
- Steam
- Discord

Even prior to the Windows 10 EOL date, I had been using Linux Mint, a Debian-based distro that is very user friendly. Since then, I have also installed Fedora with the KDE Plasma desktop environment, and aside from using a different package manager to install applications, my user experience has been mostly the same. Honestly, my workflow does not require a specific distro or operating system, as I am effectively just working with Visual Studio Code and the browser (the soy boy development environment). My active personal projects use Next.js and Pocketpages with Turbopack, which have had no issues on Windows or on Linux. Frankly speaking, they may even run better on Linux.

My biggest issue right now is not having a great remote desktop application for accessing my personal machine. On Windows, Parsec has been a great way for me to remote into various machines, but Parsec unfortunately does not offer hosting on Linux. As such, I have been looking into Rustdesk to see whether it would adequately support my use case, and tentatively, it appears that it will. I will still need additional configuration to be able to remotely access my machines from outside the network, but this is a decent start.

Aside from my remote desktop woes, Fedora so far seems stable and is working well. Although I have only been using it for less than a week, I am finding it more stable than Linux Mint, which is somewhat surprising. Another distro I am planning on checking out is Arch, but for now, Fedora and Linux Mint are both serving me well. I would honestly recommend looking into Linux as a viable alternative (assuming your daily workflow does not rely on proprietary software that is unavailable outside the Windows/Mac ecosystem).

Lessons Learned
- Fuck Microsoft

Oct 10, 2025

Third Time is the Charm

Just doing one final test before really posting frfr.

Sep 05, 2025
lifestyle random thoughts

Second blog post test

Second blog post test

Sep 05, 2025