PALO ALTO, CA – September 13, 2025 – (SeaPRwire) – In the rapidly evolving landscape of artificial intelligence, where investment and innovation collide at unprecedented scale, one of the industry’s most pressing challenges is no longer sheer computational horsepower but rather the efficiency of communication across GPUs, clusters, and clouds. Against this backdrop, Clockwork, a Silicon Valley-based startup founded out of Stanford research, has launched FleetIQ, a novel Software-Driven Fabric (SDF) that aims to close the AI efficiency gap, maximize GPU utilization, and fundamentally reshape how large-scale AI infrastructure is managed.
The unveiling of FleetIQ represents a significant milestone in the effort to transform massive AI investments into tangible outcomes. While enterprises, hyperscalers, and neocloud providers have poured billions into GPU infrastructure, studies reveal that only a fraction of that theoretical performance is realized in production. Industry analysts estimate that clusters containing tens of thousands of GPUs often run at just 30% to 55% efficiency, leaving enormous amounts of expensive hardware underutilized. In large-scale deployments—such as training a cutting-edge foundation model on a 100,000-GPU system—the financial waste from idle capacity can reach billions of dollars. The cost is not only economic but also environmental, as the energy demands of wasted cycles mount. FleetIQ was designed to directly address this unsustainable imbalance.
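The scale of the waste described above can be illustrated with a back-of-envelope calculation. The figures below (a $2-per-GPU-hour rate and a 40% utilization level) are illustrative assumptions chosen from within the ranges cited by analysts, not numbers reported by Clockwork:

```python
def idle_capacity_cost(num_gpus: int, cost_per_gpu_hour: float,
                       utilization: float, hours: float) -> float:
    """Dollar value of GPU-hours paid for but left idle."""
    idle_fraction = 1.0 - utilization
    return num_gpus * cost_per_gpu_hour * hours * idle_fraction

# A 100,000-GPU cluster at an assumed $2/GPU-hour, running at 40%
# efficiency for a full year:
waste = idle_capacity_cost(100_000, 2.0, 0.40, 24 * 365)
print(f"${waste / 1e9:.2f}B in idle capacity per year")  # roughly $1.05B
```

Even under conservative assumptions, the idle fraction of such a cluster represents on the order of a billion dollars a year, which is why utilization gains of even a few percentage points matter at this scale.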
Bridging the AI Efficiency Gap
What sets FleetIQ apart is its emphasis on the communication layer of AI infrastructure. Today’s accelerators, whether GPUs from NVIDIA and AMD or custom silicon, are already powerful engines for computation. Yet the synchronization between them—coordinating data across links, networks, and clusters—often stalls progress. A single lagging connection, congestion event, or minor fault can disrupt an entire workload, forcing expensive restarts and undermining overall productivity. This problem, widely referred to as the “AI efficiency gap,” has emerged as the central bottleneck for the next generation of AI at scale.
FleetIQ’s technology traces its roots back to foundational research at Stanford University, where co-founders Yilong Geng, Deepak Merugu, and Professor Balaji Prabhakar developed software-based solutions for highly accurate global clock synchronization and dynamic traffic control. Building on this foundation, FleetIQ introduces microsecond-level visibility across clusters, enabling operators to identify slowdowns and pinpoint failures with precision. Its stateful fault tolerance ensures that workloads continue uninterrupted even when links fail, avoiding the cascade of restarts that can stall progress for hours or days. By incorporating real-time, path-aware routing, FleetIQ dynamically steers around congestion and contention, ensuring that resources are continuously optimized.
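The mechanism described above can be sketched in simplified form: once clocks are tightly synchronized across hosts, one-way delay on each network path can be measured directly from send and receive timestamps, and traffic steered toward the least-congested path. The names, thresholds, and data shapes below are hypothetical illustrations of the general technique, not Clockwork’s actual API:

```python
from dataclasses import dataclass

@dataclass
class PathProbe:
    path_id: str
    send_ts_us: int   # sender timestamp (microseconds, synchronized clock)
    recv_ts_us: int   # receiver timestamp (microseconds, synchronized clock)

    @property
    def one_way_delay_us(self) -> int:
        # With synchronized clocks, one-way delay is a direct subtraction,
        # with no need for round-trip estimation.
        return self.recv_ts_us - self.send_ts_us

def pick_path(probes: list[PathProbe]) -> str:
    """Steer traffic to the path with the lowest measured one-way delay."""
    best = min(probes, key=lambda p: p.one_way_delay_us)
    return best.path_id

# Three candidate paths through the fabric; spine-2 shows the least queuing.
probes = [
    PathProbe("spine-1", send_ts_us=1_000, recv_ts_us=1_420),
    PathProbe("spine-2", send_ts_us=1_000, recv_ts_us=1_085),
    PathProbe("spine-3", send_ts_us=1_000, recv_ts_us=1_960),
]
print(pick_path(probes))  # spine-2
```

The key enabler is the synchronized clock: without it, one-way delays cannot be measured this directly, and congestion on asymmetric paths is much harder to localize.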
Strategic Expansion into the AI Domain
Until recently, Clockwork was known for its work in cloud performance acceleration and ultra-low-latency observability. With FleetIQ, the company is expanding its capabilities into the demanding domain of AI and GPU infrastructure. This move underscores a broader industry trend: as AI shifts from academic research and prototypes into mainstream enterprise production, the requirements for stability, predictability, and efficiency are intensifying. By extending its expertise to the GPU communication layer, Clockwork is positioning itself as a strategic partner for enterprises and governments seeking to build sustainable AI systems.
Industry support for Clockwork’s approach has been strong. The company recently closed a new funding round at four times the valuation of its previous raise. Leading the round was NEA, with participation from an array of high-profile investors, including Intel Chairman Lip-Bu Tan, former Cisco CEO John Chambers, and e& Capital. This financial backing, paired with the appointment of seasoned technology executive Suresh Vasudevan as Chief Executive Officer, signals confidence in the company’s trajectory. Vasudevan, who previously scaled Nimble Storage to IPO and grew Sysdig into a leader in cloud security, has emphasized that the next decade of AI infrastructure will be defined by breakthroughs in communication rather than raw computation.
Industry Validation and Early Deployments
FleetIQ is already in use or being tested by major organizations across industries. Uber, for example, has reported significant gains in networking observability and fault detection when running Clockwork’s platform across its hybrid, multi-cloud environment. Latency is critical to Uber’s real-time logistics operations, and reductions in outage detection times—from hours to minutes—are expected to improve service reliability for millions of users while reducing operational costs.
Other adopters include Nebius, which highlighted the need for reliability in large-scale AI workloads without hardware lock-in; NScale, a startup building planetary-scale AI infrastructure; and WhiteFiber, which uses Clockwork’s observability to accelerate deployment of GPU clusters. In Europe, Denmark’s DCAI is leveraging FleetIQ to power its sovereign AI supercomputer, Gefion, supporting research across fields such as quantum computing, drug discovery, and climate modeling. Each of these organizations underscored the same point: as AI scales, resilience and communication efficiency become mission-critical.
Partnerships with Industry Leaders
FleetIQ has also drawn support from technology giants including Broadcom and AMD. Broadcom executives pointed to the synergy between their Ethernet-centric silicon and Clockwork’s software-driven approach, which enhances observability and failover at scale. AMD similarly highlighted how Clockwork’s software complements its ROCm ecosystem and MI300X GPU systems by ensuring consistency and efficiency across deployments.
Broader Implications for AI Infrastructure
Analysts describe FleetIQ’s launch as a significant development for the industry. By functioning as a vendor-neutral, hardware-agnostic fabric, the platform helps organizations avoid being locked into proprietary ecosystems while simultaneously boosting reliability. It works seamlessly across Ethernet, InfiniBand, and RoCE, as well as across NCCL, RCCL, and custom accelerator environments. Importantly, it also enables enterprises to run training, inference, and user-facing applications concurrently on the same clusters, improving economics and reducing time-to-market.
The implications stretch beyond enterprise balance sheets. Governments and research organizations are increasingly concerned with sovereign AI infrastructure—building domestic capabilities that are sustainable, resilient, and not dependent on foreign hardware. FleetIQ’s software-only, flexible design makes it a candidate for these efforts, providing an abstraction layer that simplifies the orchestration of GPU fleets while driving better utilization.
Looking Ahead
The company’s leadership team is expanding in anticipation of growth. In addition to Vasudevan, Clockwork has named Joe Tarantino as Vice President of Worldwide Sales. Tarantino, who previously helped scale Cohesity and worked at GMI Cloud, brings deep expertise in the rapidly growing market for enterprise GPU consumption. Both executives are expected to guide Clockwork into its next phase of hypergrowth.
In sum, Clockwork’s FleetIQ is not merely an incremental improvement but a potential shift in how AI infrastructure is conceived and operated. By tackling the communication bottleneck that limits GPU utilization, the platform seeks to transform idle silicon into productive intelligence. The company’s momentum, bolstered by investment, customer adoption, and leadership expertise, suggests that FleetIQ could become a cornerstone technology in the race to scale AI sustainably.