Yandex Open-Sources YaFF: Zero-Copy Protobuf With Near-Struct Reads

Author: Andrew
Published in: AI

This is the kind of thing that sounds like a nerdy footnote until you realize it can quietly move real money: Yandex just open-sourced a new way to store Protobuf data that basically says, “What if we stop decoding and just read it straight out of memory?”

On paper, that’s brilliant. In practice, it’s also the sort of “small” infrastructure change that can either make your systems feel magically faster or haunt you for years because you optimized the wrong layer.

What’s been shared publicly is pretty straightforward. Yandex released YaFF (Yet another Flat Format), an Apache 2.0-licensed C++ project, at version 0.1.0. It’s a wire format designed around zero-copy reads, with no parsing step. The idea is that your .proto schema stays the source of truth, but the actual bytes are laid out in memory in a way that lets you access fields directly from a buffer.
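To make "access fields directly from a buffer" concrete, here is a minimal C++ sketch of the general zero-copy idea. It is not YaFF's API or byte layout: the `RecordView` and `MakeRecord` names, the 16-byte record, and the field offsets are all assumptions invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixed layout, for illustration only (NOT YaFF's real format):
//   offset 0: uint32_t id
//   offset 8: double   score
constexpr std::size_t kIdOffset = 0;
constexpr std::size_t kScoreOffset = 8;
constexpr std::size_t kRecordSize = 16;

// Non-owning view over a buffer. Each accessor reads a field at a known
// offset; there is no decode pass and no intermediate message object.
class RecordView {
public:
    explicit RecordView(const std::uint8_t* data) : data_(data) {}

    std::uint32_t id() const { return Load<std::uint32_t>(kIdOffset); }
    double score() const { return Load<double>(kScoreOffset); }

private:
    // memcpy of a few bytes avoids alignment and aliasing problems and is
    // typically compiled down to a plain load.
    template <typename T>
    T Load(std::size_t offset) const {
        T value;
        std::memcpy(&value, data_ + offset, sizeof(T));
        return value;
    }

    const std::uint8_t* data_;
};

// Writer for the same hypothetical layout, using the host byte order.
std::vector<std::uint8_t> MakeRecord(std::uint32_t id, double score) {
    std::vector<std::uint8_t> buf(kRecordSize, 0);
    std::memcpy(buf.data() + kIdOffset, &id, sizeof(id));
    std::memcpy(buf.data() + kScoreOffset, &score, sizeof(score));
    return buf;
}
```

Constructing `RecordView view(buf.data())` over a buffer from `MakeRecord(42, 1.5)` and calling `view.id()` touches only the bytes for that one field. The catch is visible in the constants: reader and writer must agree on every offset, which is the rigidity the rest of this piece worries about.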

They also shared benchmarks: reads at 9.79 ns for their “Flat Layout,” compared with 37.30 ns for FlatBuffers and 219.35 ns for standard Protobuf decoding on their stated setup. They claim it’s close to raw C++ struct access (8.14 ns). And they say they already use it in an advertising recommendation system, with 10–20% CPU savings at production scale.
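It helps to turn those figures into ratios. The small C++ sketch below does only that arithmetic on the four numbers quoted above; the constant and function names are mine, not anything from the YaFF repository.

```cpp
// How many times slower `slower_ns` is than `faster_ns`.
constexpr double Ratio(double slower_ns, double faster_ns) {
    return slower_ns / faster_ns;
}

// Per-read timings quoted from Yandex's published benchmark, in nanoseconds.
constexpr double kStructNs = 8.14;       // raw C++ struct access
constexpr double kYaffFlatNs = 9.79;     // YaFF "Flat Layout"
constexpr double kFlatBuffersNs = 37.30; // FlatBuffers
constexpr double kProtobufNs = 219.35;   // standard Protobuf decoding

constexpr double kVsProtobuf = Ratio(kProtobufNs, kYaffFlatNs);
constexpr double kVsFlatBuffers = Ratio(kFlatBuffersNs, kYaffFlatNs);
constexpr double kVsStruct = Ratio(kYaffFlatNs, kStructNs);
```

On these numbers, the Flat Layout read is roughly 22 times cheaper than Protobuf decoding, a bit under 4 times cheaper than FlatBuffers, and about 20% more expensive than a raw struct read. These are ratios from one vendor benchmark on one setup, not a prediction for any other workload.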

If you’ve ever owned a service that spends a depressing amount of time just moving data around, those numbers are hard to ignore. A lot of “backend performance work” is basically paying a tax for your own data formats. Parse, allocate, copy, decode, repeat. And every layer says “it’s only a little overhead,” right up until you add them together and your fleet is burning cash to accomplish nothing but turning bytes into other bytes.

So yes, I buy the core motivation. “Stop parsing” is a pretty clean idea.

But I’m wary of the story people will tell themselves after reading this: that Protobuf is “slow” and YaFF is the fix. That’s the wrong lesson. Protobuf isn’t a mistake; it’s a trade. It’s flexible, it’s portable, it’s widely understood, and it’s survived countless messy real-world schema evolutions. YaFF is basically saying: keep the schema, change the physical layout so reads become cheap. That can be a great trade too, but only if your system actually needs it.

Because here’s the uncomfortable part: the things that make “near-struct speed” possible can also make your system less forgiving.

YaFF offers four layouts—Fixed, Flat, Sparse, and Dynamic (the default)—to balance read speed and schema flexibility. That’s honest, and it’s also a quiet warning label. The moment you have multiple layouts, you have decision pressure. Which layout do we pick for which message? What happens when the schema changes? Who owns that choice? If your team is disciplined, you’ll treat this like a data contract with performance budgets. If your team is normal, you’ll pick whatever looks fast in a benchmark and then get surprised later.

Imagine you’re running a high-traffic service where one hot path reads the same few fields over and over. In that world, shaving CPU with zero-copy reads is not vanity. It can mean fewer machines, lower latency, and more headroom for features. The winners are obvious: the team that owns the bill, the users who get faster responses, the company that doesn’t have to scale hardware as aggressively.

Now imagine a different scenario: a company with lots of services, lots of owners, and schema changes happening every week because product keeps moving. In that world, the “conversion at system boundaries” approach Yandex mentions—two-way conversion with standard Protobuf—starts to look like both a safe ramp and a hidden cost. Safe because you can adopt it internally without breaking everything. Costly because you might end up converting back and forth so often that you lose some of the win, or you create new failure modes where one boundary accidentally becomes the bottleneck.
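One way to reason about that boundary trade-off is a back-of-the-envelope break-even model. The C++ sketch below is purely hypothetical: `BoundaryCost` and both functions are names I made up, and every input is a number you would have to measure in your own system rather than anything Yandex has published.

```cpp
// Hypothetical cost model for one message passing through a service.
struct BoundaryCost {
    double saved_per_read_ns;       // time saved per field read on the flat format
    double cost_per_conversion_ns;  // cost of one Protobuf <-> flat conversion
    double conversions;             // conversions per message (e.g. 2: in and out)
};

// Net time saved per message for a given number of field reads.
// A negative result means conversion cost more than the reads saved.
inline double NetSavingNs(const BoundaryCost& c, double reads) {
    return reads * c.saved_per_read_ns
         - c.conversions * c.cost_per_conversion_ns;
}

// Number of field reads per message at which conversions pay for themselves.
inline double BreakEvenReads(const BoundaryCost& c) {
    return c.conversions * c.cost_per_conversion_ns / c.saved_per_read_ns;
}
```

With made-up inputs of 200 ns saved per read, 5000 ns per conversion, and two conversions per message, the break-even point is 50 reads: a message read 100 times comes out ahead, while one read 10 times is a net loss. The real values will differ, but the shape of the question stays the same: how many reads does each boundary crossing buy?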

And then there’s the human factor. A tool like this makes it tempting to solve organizational slowness with technical speed. If your real problem is too many fields, unclear ownership, and no one pruning old data, a faster wire format just helps you carry more junk at higher speed. You’ll feel great for a quarter and then you’ll be right back where you started, except now the system is more complex.

To be fair, the alternative perspective is strong: if Yandex is already running this in production and claiming 10–20% CPU savings, that’s not hand-wavy. That’s the kind of improvement that can pay for itself quickly, especially in heavy data pipelines. And open-sourcing it means other teams can pressure-test it, not just trust a single company’s internal use.

Still, I’d want to know what the benchmarks don’t say. How sensitive is the win to access patterns? What does it do to write paths? How painful is schema evolution in practice when real teams are adding fields, deprecating fields, and making mistakes under deadline? And how easy is it for a “simple adoption path” to turn into a permanent split-brain world where half your tools speak one format and half speak another?

This feels promising, but it also feels like a sharp tool. Used well, it cuts your CPU bill. Used casually, it cuts your ability to change.

If you were responsible for a large system, would you rather spend effort squeezing 10–20% CPU out of serialization, or spend that effort reducing how much data you move in the first place?
