Building HarchOS: Architecture Decisions Behind Africa's Sovereign Compute Platform
From distributed scheduling to GPU-aware orchestration, we walk through the key technical choices that shaped HarchOS and why we rejected conventional cloud architectures.

When we set out to build HarchOS, we had a choice: adopt an existing orchestration platform or build from scratch. Kubernetes, Slurm, and Ray each solved parts of our problem, but none solved all of it. We needed to schedule GPU workloads across geographically distributed data centers with heterogeneous hardware, manage a three-stage inference pipeline (SENSE, THINK, ACT), enforce sovereign data residency by default, and deliver sub-12ms inference latency across the African continent. No off-the-shelf system could do all four. So we built one. This article walks through the architecture decisions that defined HarchOS and explains why each one matters for sovereign AI at continental scale.
The first and most fundamental decision was to reject the single-cluster model. Traditional orchestration systems assume a single, well-connected data center with uniform networking. Our reality is different: five hubs across Morocco, Senegal, and Côte d'Ivoire, connected by fiber links with latencies ranging from 8ms to 45ms. A single-cluster scheduler would misread those high-latency inter-hub links as node failures, constantly rescheduling workloads that were simply waiting on cross-border network round-trips. Instead, we implemented a federated scheduling model where each hub runs an independent scheduler that cooperates with peers through a gossip protocol. Workloads are placed locally by default and migrated only when capacity demands it, respecting both latency constraints and data sovereignty requirements.
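To make the local-first placement rule concrete, here is a minimal sketch in Python. It is not the actual HarchOS scheduler: every name (Hub, Workload, place, the hub cities) is hypothetical, and the real system layers gossip-derived state, queueing, and migration on top of this kind of decision. The sketch keeps a workload on its home hub when capacity and residency allow, and otherwise picks the nearest peer that satisfies both the latency budget and the sovereignty tags.

```python
from dataclasses import dataclass, field

@dataclass
class Hub:
    name: str
    country: str
    free_gpus: int
    # One-way latency in ms to each peer hub, taken from gossip state.
    peer_latency_ms: dict = field(default_factory=dict)

@dataclass
class Workload:
    name: str
    gpus: int
    allowed_countries: set     # sovereignty tags carried by the workload's data
    max_latency_ms: float      # latency budget for any cross-hub placement

def place(workload: Workload, local: Hub, peers: list[Hub]) -> Hub | None:
    """Local-first placement: keep the workload on its home hub when it fits,
    otherwise choose the nearest peer that satisfies latency and residency."""
    if workload.gpus <= local.free_gpus and local.country in workload.allowed_countries:
        return local
    candidates = [
        p for p in peers
        if p.free_gpus >= workload.gpus
        and p.country in workload.allowed_countries
        and local.peer_latency_ms.get(p.name, float("inf")) <= workload.max_latency_ms
    ]
    # Lowest-latency qualifying peer wins; None means the job queues locally instead.
    return min(candidates, key=lambda p: local.peer_latency_ms[p.name], default=None)

casablanca = Hub("casablanca", "MA", free_gpus=2,
                 peer_latency_ms={"dakar": 28, "abidjan": 45})
dakar = Hub("dakar", "SN", free_gpus=64)
abidjan = Hub("abidjan", "CI", free_gpus=128)

job = Workload("train-llm", gpus=8, allowed_countries={"MA", "SN"}, max_latency_ms=30)
print(place(job, casablanca, [dakar, abidjan]).name)   # -> "dakar"
```

Note how the sovereignty check and the latency check are evaluated together: a peer with ample capacity is simply invisible to the placement decision if either constraint fails.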
The second decision was GPU topology awareness. In a cluster with 1,798 GPUs spanning multiple hardware generations — NVIDIA A100s, H100s, and custom inference accelerators — naive scheduling leads to catastrophic performance fragmentation. A training job that requires eight interconnected GPUs cannot be split across two racks with different NVLink topologies without suffering a 4-6x throughput penalty. HarchOS maintains a real-time topology graph of every GPU in every hub, including NVLink bandwidth, PCIe lane assignments, and cooling capacity. When a workload requests GPU resources, the scheduler performs a constrained optimization that minimizes inter-GPU latency while maximizing overall cluster utilization. The result: 94% GPU utilization across the fleet, compared to the 60-70% typical of naively scheduled clusters.
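The topology-aware placement can be illustrated with a deliberately simplified sketch. The real scheduler solves a constrained optimization over NVLink bandwidth, PCIe lanes, and cooling; the version below reduces that to a greedy best-fit over NVLink islands, and every identifier (Gpu, pick_gpus, the island names) is an assumption made for illustration rather than the HarchOS API.

```python
from dataclasses import dataclass
from itertools import groupby

@dataclass(frozen=True)
class Gpu:
    gpu_id: str
    island: str            # NVLink island: GPUs fully connected over NVLink
    nvlink_gbps: float     # NVLink bandwidth within that island
    free: bool

def pick_gpus(gpus: list[Gpu], needed: int) -> list[Gpu] | None:
    """Keep the whole allocation inside a single NVLink island to avoid the
    cross-rack throughput penalty; among islands that fit, take the smallest
    (best fit) so large contiguous blocks stay available for later jobs."""
    free = sorted((g for g in gpus if g.free), key=lambda g: g.island)
    islands = {k: list(v) for k, v in groupby(free, key=lambda g: g.island)}
    feasible = [v for v in islands.values() if len(v) >= needed]
    if not feasible:
        return None   # nothing fits in one island; a real scheduler would weigh spanning
    return min(feasible, key=len)[:needed]

fleet  = [Gpu(f"a100-{i}", "rack1", 600.0, free=(i < 6)) for i in range(8)]
fleet += [Gpu(f"h100-{i}", "rack2", 900.0, free=True) for i in range(8)]
print([g.gpu_id for g in pick_gpus(fleet, needed=8)])   # -> eight rack2 GPUs
```

The best-fit rule is what keeps utilization high: it fills the smallest island that satisfies the request instead of fragmenting a large one.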
The third decision was the SENSE-THINK-ACT pipeline architecture. Rather than treating inference as a monolithic request-response cycle, we decomposed it into three distinct stages with independent scaling, fault isolation, and resource allocation. The SENSE layer ingests real-time data at 10M events per second. The THINK layer runs inference on that data using models optimized for African contexts. The ACT layer translates inference outputs into automated actions — adjusting irrigation systems, optimizing power grid distribution, or flagging anomalous financial transactions. Each stage scales independently, fails independently, and can be updated without disrupting the others. This separation of concerns transformed our operational reliability from fragile to resilient.
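A hypothetical, declarative view of the pipeline helps show what independent scaling means in practice. The StageSpec fields and the scale rule below are illustrative only, not the actual HarchOS control loop, but the key property is the same: each stage has its own replica count and its own bounded queue, so back-pressure replaces cascading failure.

```python
from dataclasses import dataclass

@dataclass
class StageSpec:
    name: str
    replicas: int        # scaled independently of the other stages
    max_replicas: int
    queue_depth: int     # bounded queue feeding the next stage

# Hypothetical declarative pipeline spec: SENSE, THINK, and ACT each get their
# own replica count and their own bounded queue, so a slow or failing stage
# applies back-pressure instead of taking the whole pipeline down.
PIPELINE = [
    StageSpec("sense", replicas=32, max_replicas=128, queue_depth=100_000),  # high-volume ingest
    StageSpec("think", replicas=16, max_replicas=64,  queue_depth=10_000),   # model inference
    StageSpec("act",   replicas=8,  max_replicas=32,  queue_depth=1_000),    # downstream actions
]

def scale(stage: StageSpec, queue_fill: float) -> int:
    """Per-stage autoscaling rule: react only to this stage's own queue."""
    if queue_fill > 0.8:
        return min(stage.replicas * 2, stage.max_replicas)
    if queue_fill < 0.2:
        return max(stage.replicas // 2, 1)
    return stage.replicas

print(scale(PIPELINE[1], queue_fill=0.9))   # THINK doubles to 32; SENSE and ACT are untouched
```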
The fourth decision was sovereign-by-default data handling. In conventional cloud architectures, data flows to wherever compute is cheapest. In our architecture, data stays where sovereignty demands it. Every data object in HarchOS carries metadata tags that specify jurisdictional constraints — which countries it may be processed in, which legal frameworks apply, and whether it may traverse international links. The scheduler enforces these constraints as hard requirements, not soft preferences. A dataset tagged for Moroccan jurisdiction will never be routed to a Senegalese hub for processing, regardless of available capacity. This adds complexity to scheduling but eliminates an entire category of compliance risk.
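Here is a minimal sketch of what sovereign-by-default enforcement can look like. The ResidencyTag type and its field names are assumptions for the purpose of illustration (the reference to Morocco's Law 09-08 is just an example value); the point is that a violated constraint raises an error rather than silently degrading into a cheaper placement.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResidencyTag:
    """Jurisdictional metadata carried by every data object (illustrative fields)."""
    allowed_countries: frozenset    # where the data may be processed
    legal_framework: str            # applicable legal framework, e.g. Morocco's Law 09-08
    may_cross_borders: bool         # whether the data may traverse international links

class ResidencyViolation(Exception):
    """Raised when a placement would break a sovereignty constraint."""

def enforce_placement(tag: ResidencyTag, hub_country: str, crosses_border: bool) -> None:
    """Hard requirement, not a soft preference: refuse rather than degrade."""
    if hub_country not in tag.allowed_countries:
        raise ResidencyViolation(f"processing in {hub_country} is not permitted")
    if crosses_border and not tag.may_cross_borders:
        raise ResidencyViolation("data may not traverse international links")

moroccan = ResidencyTag(frozenset({"MA"}), "Law 09-08 (Morocco)", may_cross_borders=False)
enforce_placement(moroccan, hub_country="MA", crosses_border=False)    # placement allowed
# enforce_placement(moroccan, hub_country="SN", crosses_border=True)   # raises ResidencyViolation
```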
The fifth and perhaps most counterintuitive decision was to build our own monitoring and observability stack rather than adopting Prometheus, Grafana, or Datadog. The reason was sovereignty itself: shipping metrics and logs to a third-party SaaS platform defeats the purpose of sovereign infrastructure. Our custom stack — internally called SENTINEL — collects, stores, and visualizes all operational telemetry within Harch Intelligence's network perimeter. It was more engineering effort upfront, but it means that no foreign company has visibility into our infrastructure's performance, capacity, or failure modes. In a world where operational intelligence is itself a strategic asset, this is not paranoia — it is due diligence.
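One small illustration of the perimeter rule SENTINEL enforces: telemetry sinks must resolve inside the internal network. The code below is a hypothetical sketch rather than SENTINEL's configuration API, and the address ranges and hostnames are placeholders.

```python
import ipaddress

# Hypothetical illustration of the perimeter invariant: every telemetry sink
# must resolve inside the sovereign network. Ranges and hostnames are placeholders.
INTERNAL_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
]

def validate_sink(url: str, resolved_ip: str) -> None:
    """Reject any exporter that would ship metrics or logs off-perimeter."""
    addr = ipaddress.ip_address(resolved_ip)
    if not any(addr in net for net in INTERNAL_RANGES):
        raise ValueError(f"telemetry sink {url} resolves outside the sovereign perimeter")

validate_sink("https://sentinel.internal/ingest", "10.4.2.17")          # accepted
# validate_sink("https://metrics.saas.example.com/v1", "203.0.113.9")   # raises ValueError
```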
HarchOS is not finished. No operating system ever is. But the architectural foundation — federated scheduling, topology awareness, pipeline decomposition, sovereign data handling, and internal observability — has proven robust across 18 months of production operation. The decisions we made early forced us to solve hard problems that easier choices would have deferred. Those deferred problems always surface at scale, and they are always more expensive to fix than to prevent. We chose to pay the cost upfront, and the result is a platform that can scale to 10,000 GPUs across 20 hubs without fundamental re-architecture.