
Positron AI Raises $230M for Memory-First Inference

Published: 2.13.2026

Key takeaways

- Positron AI raised an oversubscribed $230M Series B at a $1B+ valuation to scale an energy-efficient AI inference platform and accelerate its roadmap from shipping systems today to next-gen silicon.
- The company’s pitch is “memory-first” inference, arguing that transformer serving is increasingly memory-bound, with energy availability and memory capacity becoming the real scaling limits.
- The real competition isn’t peak FLOPS but tokens-per-watt and tokens-per-dollar. By prioritizing LPDDR-based memory capacity and air-cooled power envelopes, Positron is positioning itself as an inference-focused alternative in a market still dominated by NVIDIA GPUs.

AI hardware is still dominated by training clusters and ever-larger GPUs, but the center of gravity is shifting. As more companies move from building models to deploying them, inference is becoming “always-on infrastructure”: chat, search, copilots, content moderation, recommendation, and emerging agentic workflows that keep models running continuously.

That’s the context behind Positron AI’s latest funding milestone and why it’s drawing attention from both the semiconductor and data center infrastructure ecosystems.

Positron AI announced an oversubscribed $230 million Series B at a post-money valuation exceeding $1 billion, with the round co-led by ARENA Private Wealth, Jump Trading, and Unless. The company also cited strategic investment from Qatar Investment Authority (QIA), Arm, and Helena, alongside participation from existing backers including Valor Equity Partners, Atreides Management, DFJ Growth, Resilience Reserve, Flume Ventures, and 1517.

The company said the funding will accelerate its roadmap from “shipping Atlas systems today” to next-generation “Asimov” silicon, targeting tape-out in late 2026 and production in early 2027, with the announcement made at Web Summit Qatar.

Inference is Running into Power and Memory Walls

In Positron’s framing, the next constraint is whether data centers can physically and economically support the power draw and memory behavior of large-scale inference.

CEO Mitesh Agrawal pointed to “energy availability” as a key bottleneck and described “memory” as the other major limiter for inference scaling. That theme is increasingly consistent with how operators think about deployment reality: rack power ceilings, air-cooling limits, and the cost of feeding ever-growing model weights and KV-cache traffic without turning inference into an energy sink.

What Positron is Building: a Platform, not just a Chip

Positron positions its offering as an inference platform spanning systems available today and new silicon coming next:

1) Atlas: Shipping Transformer Inference Servers Today

Positron markets Atlas as a “Transformer Inference Server” that is shipping today, and it publishes system-level comparisons that emphasize tokens-per-watt and tokens-per-dollar versus GPU-based reference systems.

On its Atlas product page, Positron’s head-to-head example (Llama 3.1 8B, BF16) lists:

- NVIDIA DGX H200: 5900W and 182 tokens/sec/user
- Positron Atlas: 2000W and 280 tokens/sec/user

Positron also lists claimed Perf/Watt (4.54×) and Perf/Dollar (3.08×) multipliers. These are company-published figures and should be treated as such until independently replicated.
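The Perf/Watt multiplier can be checked directly from the four published numbers. The short Python sketch below does only that arithmetic; it assumes "Perf" means tokens/sec/user and "Watt" means the listed system power, which is how the figures appear to be derived. The Perf/Dollar multiplier cannot be reproduced the same way, because no system prices are quoted here.

```python
# Figures as published on Positron's Atlas page (Llama 3.1 8B, BF16).
# The dictionary layout and names are illustrative, not from Positron.
dgx_h200 = {"watts": 5900, "tokens_per_sec_per_user": 182}
atlas = {"watts": 2000, "tokens_per_sec_per_user": 280}


def tokens_per_watt(system):
    """Per-user throughput divided by listed system power."""
    return system["tokens_per_sec_per_user"] / system["watts"]


perf_per_watt_ratio = tokens_per_watt(atlas) / tokens_per_watt(dgx_h200)
print(f"Perf/Watt ratio: {perf_per_watt_ratio:.2f}x")  # prints 4.54x
```

The result matches the claimed 4.54×, so the multiplier is at least internally consistent with the listed power and throughput. It says nothing about whether those inputs hold up under independent testing.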

2) Asimov: “Memory-first” Custom Accelerator Silicon (coming in 2027)

Positron’s next step is Asimov, described as custom AI accelerator silicon “coming in 2027,” with headline specs that prioritize memory capacity and practical bandwidth.

Asimov’s published spec highlights include:

- 864GB to 2.3TB memory per chip
- 2.76 TB/s realizable memory bandwidth
- PCIe Gen6 x32 with CXL
- 400W TDP with air-cooling support
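To put the capacity and bandwidth figures in context, here is a rough back-of-envelope sketch in Python. It rests on assumptions that are not from Positron: decimal units (1 GB = 10^9 bytes), BF16 weights at 2 bytes per parameter, a nominal 8 billion parameters for an "8B" model, and the common rule of thumb that single-stream decoding reads every weight once per generated token. It ignores KV-cache traffic and batching, so treat the throughput number as a loose ceiling for one stream, not a prediction.

```python
GB = 1e9   # assumption: decimal gigabytes
TB = 1e12  # assumption: decimal terabytes
BYTES_PER_PARAM_BF16 = 2


def max_params_in_memory(capacity_bytes, bytes_per_param=BYTES_PER_PARAM_BF16):
    """Parameters that fit if memory held weights only (no KV cache)."""
    return capacity_bytes / bytes_per_param


def decode_tokens_per_sec_ceiling(bandwidth_bytes_per_sec, n_params,
                                  bytes_per_param=BYTES_PER_PARAM_BF16):
    """Memory-bound ceiling: each token requires streaming all weights once."""
    return bandwidth_bytes_per_sec / (n_params * bytes_per_param)


low_params = max_params_in_memory(864 * GB)
high_params = max_params_in_memory(2.3 * TB)
ceiling_8b = decode_tokens_per_sec_ceiling(2.76 * TB, 8e9)

print(f"Weights-only capacity: {low_params / 1e9:.0f}B to {high_params / 1e12:.2f}T params")
print(f"Single-stream decode ceiling, 8B model: {ceiling_8b:.1f} tokens/sec")
```

Under these assumptions a single chip could hold roughly 432 billion to 1.15 trillion BF16 parameters, and a lone 8B-parameter stream would top out near 172 tokens/sec. Batching many users amortizes the weight reads, which is why the per-stream ceiling is not comparable to system-level throughput claims.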

3) Titan: Next-generation Inference System Built around Asimov

Positron also previews Titan, a 4U “next-generation inference system” “coming in 2027,” described as powered by 4× Asimov with 8+TB memory per system and aggressive claims about parameter scale and context windows.

Titan’s page lists:

- 8+ TB Asimov memory plus 3+ TB host memory
- 11.8 TB/s system memory bandwidth
- “Supports up to 16 trillion parameters per server”
- “Supports 10 million+ tokens context window”
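One way to sanity-check the 16 trillion parameter figure is to ask how many bits per parameter the listed memory would allow. The Python sketch below takes the "8+" and "3+" TB figures at their lower bounds and assumes decimal terabytes; the weight precision behind the claim is not stated in the material quoted here, so this is an inference from arithmetic, not a description of Positron's method.

```python
TB = 1e12  # assumption: decimal terabytes

claimed_params = 16e12          # "up to 16 trillion parameters per server"
asimov_memory_bytes = 8 * TB    # "8+ TB", taken at its lower bound
host_memory_bytes = 3 * TB      # "3+ TB", taken at its lower bound


def bits_per_param(memory_bytes, n_params):
    """Average storage budget per parameter if memory held weights only."""
    return memory_bytes * 8 / n_params


accel_only = bits_per_param(asimov_memory_bytes, claimed_params)
with_host = bits_per_param(asimov_memory_bytes + host_memory_bytes, claimed_params)

print(f"Asimov memory only: {accel_only:.1f} bits/param")       # prints 4.0
print(f"Asimov plus host memory: {with_host:.1f} bits/param")   # prints 5.5
```

A budget of about 4 to 5.5 bits per parameter, before any room for KV cache, would suggest the headline figure assumes low-precision weights rather than BF16.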

What “Memory-first Architecture” Means

Most AI accelerators have been marketed around peak compute. Positron is trying to flip the decision rule: start with the observation that transformer inference often spends significant time waiting on memory movement (weights, activations, and KV cache), then design the system so memory capacity and bandwidth don’t starve the compute.

Positron explicitly says Asimov is designed around memory bandwidth and capacity first, with compute “balanced to match.”
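To see why the KV cache is part of the memory argument, the sketch below estimates its size using the standard formula: two tensors (keys and values) per layer, per KV head, per token. The model dimensions are the publicly documented Llama 3.1 8B configuration (32 layers, 8 KV heads, head dimension 128), with BF16 values assumed. The 10 million token length is used purely for sizing; nothing here ties that context length to this particular model.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache added per token; the factor 2 covers keys and values."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


# Llama 3.1 8B dimensions from the public model config; BF16 cache is an assumption.
per_token = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)

context_tokens = 10_000_000  # illustrative sequence length
total_bytes = per_token * context_tokens

print(f"KV cache per token: {per_token / 1024:.0f} KiB")                   # prints 128 KiB
print(f"KV cache at 10M tokens: {total_bytes / 1e12:.2f} TB (decimal)")    # prints 1.31 TB
```

Even for a small model, one very long sequence would need on the order of a terabyte of cache under these assumptions, far more than the roughly 16 GB of BF16 weights. That is the kind of ratio a capacity-first design is meant to address.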

Two details matter for engineers and buyers watching this space:

- A different memory tradeoff (LPDDR over HBM): Positron says it chose commodity LPDDR5x over HBM, arguing HBM brings cost, power, and supply-chain risk—and that its architecture can achieve “comparable realized bandwidth” while delivering significantly higher capacity per chip.
- A roadmap aligned with emerging server I/O: Asimov lists PCIe Gen6 and CXL support, signaling a view of inference nodes as composable, memory-centric infrastructure rather than monolithic GPU islands.

Positron’s narrative is to make inference cheaper and easier to deploy at scale, especially in power-constrained environments.

Data Center Dynamics reports Positron’s Atlas is “currently shipping,” and describes it as fabricated by Intel in the U.S., while reiterating Positron’s focus on inference workloads.

Meanwhile, Positron’s own announcement emphasizes “shipping Atlas systems today” and frames Atlas as an American-fabricated and manufactured system aimed at dependable supply.

Inference Demand is Structural and Energy is the Tax

The makeup of the round is notable: the company highlighted financial trading firm involvement among co-leads, and EE Times reported Positron’s CEO described inbound interest as “insane,” framing the raise as a way to “go on offense” and scale production confidence.

TechCrunch also contextualized the raise within the broader push, by hyperscalers and AI firms, to reduce reliance on a single dominant accelerator supplier, while noting QIA’s growing interest in AI infrastructure themes.
