21+ years across Linux, virtualization, networking, and observability โ with a current obsession for making AI safely useful for real operations.
I started on the data-center floor โ racking and cabling servers, planning power and cooling for 42U racks, running cage consolidations. Two decades later I'm architecting high-availability clusters, writing automation in Bash, Python, and Go, and building tooling that lets AI operate production systems without breaking them. I've done the full stack of infrastructure, top (OS) to bottom (hardware).
Today I run day-to-day operations for roughly 215 production servers as a remote systems administrator โ web, database, Node, API, and Redis fleets โ building new environments, HA web clusters, and the scripts that keep the lights on. Before that I led a NOC team of eight and oversaw 500+ servers through the kind of 2 a.m. incidents you only forget if you were lucky.
My approach is boring on purpose: build on the tools a team already trusts instead of forcing a migration, make the safeguards visible, and let the documentation write itself. The flashy part is what that approach lets you do safely โ like the projects below.
A system I designed that gives AI a persistent memory and safe, gated access to real infrastructure tools. Instead of a chat assistant that forgets everything between sessions, Maven loads the full state of the work each time โ so any task can be started or resumed instantly. It connects to existing tools through visible safeguards: every change is validated, recorded in version control, and rolled back automatically if it fails. The documentation gets produced as a byproduct of the work, not as a chore.
An atomic deploy pipeline for an enterprise monitoring fleet: stage โ validate against a copy โ commit โ deploy โ reload, with automatic rollback if validation fails. A real config change rolled out across 200+ hosts and 3,000+ services in about one second with zero errors โ and a deliberately broken change was rejected and rolled back before it could ever touch production. Safety you can watch happen.
Forked and extended an open-source asset-management tool to add a REST API, an OS-lifecycle / package model, and human-in-the-loop confirmation dialogs that show a clear before/after diff on every change. Then built automation on top that can resolve, create, and update hosts safely. Built on what the team already used โ no migration tax.
Stood up a real-time video processing pipeline on a Tesla V100 โ GPU passthrough into a VM, NVIDIA DeepStream / Triton / TensorRT โ and tuned it for stable single-stream throughput. A deep dive into GPU performance and the realities of ML infrastructure.
A production-grade lab I run to keep my edge: a Proxmox HA hypervisor with ZFS storage pools, self-hosted Git (Forgejo), a reverse proxy with automated TLS, a full observability stack (Prometheus / Grafana / Loki), and a security stack (Wazuh SIEM + Suricata IDS). Where I prototype everything before it's real.
References available upon request.