Sometimes, a demo is the best way to grasp a product’s potential, and that’s exactly the case with Runware. Visit Runware’s website, enter a prompt, and hit enter to generate an image. You’ll be amazed at how quickly the result appears: it takes less than a second.
As a newcomer in the AI inference and generative AI space, Runware is focused on building proprietary servers and optimizing the software layer to eliminate bottlenecks and enhance inference speeds for image generation models. The startup has successfully raised $3 million in funding from notable investors, including Andreessen Horowitz’s Speedrun, LakeStar’s Halo II, and Lunar Ventures.
Runware aims not to reinvent the wheel, but to make it spin more efficiently. The company designs its own servers, packing as many GPUs as possible onto a single motherboard. It also implements a custom cooling system and manages its own data centers.
To optimize AI model performance, Runware has enhanced the orchestration layer with BIOS and operating system tweaks to reduce cold start times. Additionally, the company has developed proprietary algorithms to effectively allocate inference workloads.
“If you consider companies like Together AI, Replicate, and Hugging Face, they all sell compute based on GPU time,” said co-founder and CEO Flaviu Radulescu in an interview with TechCrunch. “When you compare the time it takes for us to generate an image against theirs, along with the pricing, it’s clear that we offer significantly lower costs and much faster results.”
“It’s going to be impossible for them to match this performance,” he added. “Especially in a cloud provider environment, where virtualization introduces extra delays.”
Runware is focused on optimizing the entire inference pipeline by enhancing both hardware and software. The company aims to incorporate GPUs from various vendors in the near future. This matters to many startups, because Nvidia’s dominance in the GPU market keeps the cost of its GPUs high.
“Currently, we only use Nvidia GPUs, but we see this as just the starting point for our software layer,” Radulescu explained. “Our ability to rapidly switch models in and out of GPU memory allows us to efficiently serve multiple customers on the same GPUs.”
“We’re different from our competitors, who typically load a model onto a GPU for a specific task. We’ve developed a software solution that enables us to switch models in GPU memory while performing inference.”
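Runware has not published how its model switching works, so the following is only a rough illustration of the general idea of sharing one GPU's memory among several models. It is a toy Python sketch that assumes a simple least-recently-used eviction policy; the `ModelCache` class, the model names, and the sizes are all hypothetical, and it does not attempt to model switching that overlaps with running inference, which is what Radulescu describes.

```python
from collections import OrderedDict


class ModelCache:
    """Toy LRU cache standing in for GPU memory (hypothetical, not Runware's code)."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        # Maps model name -> size in GB, ordered from least to most recently used.
        self.resident = OrderedDict()
        # Counts how many times a model had to be loaded into the cache.
        self.loads = 0

    def used_gb(self):
        return sum(self.resident.values())

    def acquire(self, name, size_gb):
        """Make `name` resident, evicting least recently used models if needed."""
        if size_gb > self.capacity_gb:
            raise ValueError(f"{name} does not fit in {self.capacity_gb} GB")
        if name in self.resident:
            self.resident.move_to_end(name)  # mark as most recently used
            return "hit"
        while self.used_gb() + size_gb > self.capacity_gb:
            self.resident.popitem(last=False)  # evict the least recently used model
        self.resident[name] = size_gb
        self.loads += 1
        return "load"


# Five requests from different customers hitting one simulated 24 GB GPU.
cache = ModelCache(capacity_gb=24)
requests = [("model_a", 10), ("model_b", 10), ("model_a", 10),
            ("model_c", 10), ("model_b", 10)]
results = [cache.acquire(name, size) for name, size in requests]
print(results)
print(list(cache.resident))
```

In this sketch a request for a model that is already resident is served without a reload, and a request for a new model evicts whichever resident model was used longest ago. The faster a real system can perform that swap, the more customers it can serve from the same GPU.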
If AMD and other GPU manufacturers can establish compatibility layers for standard AI workloads, Runware will be well-positioned to create a hybrid cloud leveraging GPUs from multiple vendors. This strategy will be vital for maintaining lower costs in the competitive AI inference landscape.