
Written by Prashant Kumar, Product Strategy Director
Published by FutureNet World on 26 May 2026
Over the past year we’ve built four agents for telecommunications network operations: the RAN Agent, the L3 Agent, the VoLTE / IMS Quality Agent, and the Data Steward. Three were built in partnership with Google Cloud and one developed entirely in-house. Across all four, the lessons that stayed with us had less to do with the models themselves and more to do with the infrastructure around them.
This is a reflection on that infrastructure — the data platform, the tooling, the workflow knowledge, and the architectural decisions we got right, got wrong, and revised.
When we started building the RAN Agent, the working assumption was that model capability was the primary variable. It wasn’t. The primary variable was data access: specifically, the challenge of operating across PM counters with vendor-specific naming conventions, probe data collected on a different rhythm from performance management exports, and configuration state held in OSS systems with limited documentation.
An agent operating across these domains without a validated data layer doesn’t hallucinate in the typical sense. It reasons correctly from poorly understood inputs, which is a more subtle failure mode and harder to catch in testing.
This is the problem the Data Steward is designed to address. Rather than answering operational questions directly, it functions as a metadata orchestrator — maintaining a catalogue of what data exists, where it lives, what it means, and the quality score attached to it. When the VoLTE agent needs to analyse IMS session failures, it requests the appropriate tables and schemas from the Data Steward rather than reaching into raw storage. That separation keeps data semantics consistent across agents and makes quality issues visible rather than silently propagated.
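The separation described above can be illustrated with a minimal sketch. It is not our implementation: the class names, the `resolve` method, the table identifiers, and the 0-to-1 quality scale are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DatasetEntry:
    """One catalogue record: what the data is, where it lives, how good it is."""
    name: str
    location: str         # hypothetical table identifier
    description: str
    schema: dict          # column name -> type
    quality_score: float  # assumed scale: 0.0 (unusable) to 1.0 (validated)


class DataSteward:
    """Hypothetical metadata catalogue that agents query instead of raw storage."""

    def __init__(self):
        self._entries = {}  # domain -> list of DatasetEntry

    def register(self, entry, domains):
        for domain in domains:
            self._entries.setdefault(domain, []).append(entry)

    def resolve(self, domain, min_quality=0.0):
        """Return catalogue entries for a domain, split by quality threshold.

        Low-quality entries are returned as 'flagged' rather than dropped,
        so quality problems stay visible to the requesting agent.
        """
        usable, flagged = [], []
        for entry in self._entries.get(domain, []):
            target = usable if entry.quality_score >= min_quality else flagged
            target.append(entry)
        return {"usable": usable, "flagged": flagged}


# Illustrative usage with made-up table names and scores.
steward = DataSteward()
steward.register(
    DatasetEntry("ims_session_records", "warehouse.ims.sessions",
                 "Per-session IMS signalling outcomes",
                 {"session_id": "STRING", "release_cause": "STRING"}, 0.92),
    ["volte"],
)
steward.register(
    DatasetEntry("probe_sip_events", "warehouse.probe.sip",
                 "SIP events captured by network probes",
                 {"call_id": "STRING", "response_code": "INT"}, 0.55),
    ["volte"],
)
result = steward.resolve("volte", min_quality=0.8)
```

In this sketch the VoLTE agent would receive `ims_session_records` as usable and `probe_sip_events` as flagged, rather than discovering the quality gap after reasoning over the data.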
Model capabilities improve with each release cycle. Data platform quality does not improve automatically. It requires deliberate, sustained investment, and in our experience it accounts for a disproportionate share of the work in getting agents to production.
For the L3 Agent, built independently without managed cloud services, we spent several weeks mapping the actual decision workflow before writing agent code. That meant documenting what a senior engineer examines when a layer-3 issue is escalating, what data is consulted at each decision point, and what constitutes a sufficient answer to proceed. This work felt slow at the time. In retrospect it was the most valuable part of the build.
Agents don’t construct good operational workflows from context. They encode existing ones. Where that institutional knowledge isn’t captured explicitly, the agent fills the gap, but the result rarely matches what experienced engineers actually do. The L3 build reinforced something we now treat as a firm principle: the workflow specification is an input to agent design, not a byproduct of it.
Building without hyperscaler services also imposed a useful constraint. Every architectural decision had to stand on its own merits rather than being absorbed by a managed layer. The resulting agent is more self-contained, and several of the design choices from that build have since informed how we structure the Google Cloud Platform-partnered agents.
None of our agents launched with autonomous capabilities. All of them started as co-pilots, with a human reviewing outputs before any action was taken. This sequencing is partly about safety, but more fundamentally it’s how confidence gets calibrated, both the agent’s confidence in its outputs and the operator’s confidence in the agent.
The RAN Agent surfaced this clearly. There is a meaningful gap between an answer that is correct and an answer that a network engineer will act on. Closing that gap took time and involved improvements to data provenance, tooling consistency, and how the agent surfaces its reasoning, not model changes.
The VoLTE agent, our most mature deployment, now handles specific well-defined workflows with a lighter human touch. For most paths, review still happens before action. Fully closed-loop autonomous operations, where the agent resolves issues end-to-end with human involvement only on exceptions, remain the direction of travel, not the current state. The industry is moving there; we don’t think it’s there yet at the scope telecom operations requires.
A year ago, retrieval-augmented generation was a practical necessity for any knowledge-intensive agent task. Context windows were limited enough that careful selection of what the model could see at any moment was an architectural constraint you designed around. We invested accordingly, building embedding pipelines, vector stores, and retrieval logic to surface the right counter definitions and vendor documentation at query time.
The models available now carry context windows large enough to change that calculus. For a significant portion of our use cases, loading the full relevant context directly — a complete counter dictionary for a given network function, the relevant PM table schemas, sufficient historical context — outperforms retrieval. It’s simpler, the failure modes are more transparent, and the reasoning is easier to inspect.
RAG remains necessary for genuinely large corpora. But the threshold has shifted considerably, and architectural complexity we treated as fixed a year ago is now something we actively reassess. The broader principle is that as model capabilities evolve, decisions made under earlier constraints deserve periodic review.
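One way to make that periodic review concrete is a simple budget check: estimate the size of the relevant corpus and load it in full only when it fits. The sketch below is an assumption-laden illustration; the function names, the four-characters-per-token heuristic, and the `reserve_tokens` parameter are hypothetical, and a real system would use the model's own tokenizer.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate; chars_per_token is an assumed heuristic, not a tokenizer."""
    return -(-len(text) // chars_per_token)  # ceiling division


def plan_context(documents, context_budget_tokens, reserve_tokens=0):
    """Pick 'full_context' when all documents fit in the budget, else 'retrieval'.

    documents: mapping of document name -> text
    reserve_tokens: room kept back for the question and the model's answer
    """
    total = sum(estimate_tokens(text) for text in documents.values())
    available = context_budget_tokens - reserve_tokens
    strategy = "full_context" if total <= available else "retrieval"
    return {
        "strategy": strategy,
        "estimated_tokens": total,
        "available_tokens": available,
    }


# Illustrative usage with placeholder documents.
docs = {"counter_dictionary": "x" * 400, "pm_table_schemas": "y" * 401}
large_window = plan_context(docs, context_budget_tokens=1000, reserve_tokens=200)
small_window = plan_context(docs, context_budget_tokens=300, reserve_tokens=200)
```

Rerunning a check like this when a new model becomes available is a cheap way to notice that a retrieval pipeline has become optional for a given use case.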
An agent’s reliability is bounded by the reliability of its tools. Inconsistent response schemas (a tool that returns a list in one case, a single object in another, and a silent empty result on error) force the agent to make assumptions. Those assumptions fail in proportion to how unusual the input is, which tends to mean exactly the cases that matter most operationally.
Across the agent builds, we’ve converged on a set of tooling standards: consistent response schemas regardless of result size, structured error responses the agent can reason about and potentially recover from, idempotent mutation operations (agents retry on failure, so retries need to be safe), and data provenance included in every data-returning tool response. These aren’t novel principles, but in an agent context they carry more weight than in a conventional API: there’s no human reading an error message and deciding how to proceed.
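A minimal sketch of what these standards might look like in a single tool follows. The envelope fields, the error codes, the `idempotency_key` parameter, and the `ConfigTool` class are hypothetical names chosen for illustration, not a description of our actual tool interfaces.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolError:
    code: str        # machine-readable, e.g. "INVALID_ARGUMENT"
    message: str     # human-readable detail
    retryable: bool  # tells the agent whether a retry can help


@dataclass
class ToolResponse:
    ok: bool
    items: list = field(default_factory=list)       # always a list, even for 0 or 1 results
    error: Optional[ToolError] = None                # set whenever ok is False
    provenance: dict = field(default_factory=dict)   # where the data came from


class ConfigTool:
    """Hypothetical mutation tool that is safe to retry."""

    def __init__(self):
        self._applied = {}    # idempotency_key -> ToolResponse already returned
        self.apply_count = 0  # number of changes actually applied

    def set_parameter(self, idempotency_key, cell_id, parameter, value):
        # A retry with the same key returns the earlier result, not a second change.
        if idempotency_key in self._applied:
            return self._applied[idempotency_key]
        if not cell_id:
            return ToolResponse(
                ok=False,
                error=ToolError("INVALID_ARGUMENT", "cell_id is required", retryable=False),
            )
        self.apply_count += 1
        response = ToolResponse(
            ok=True,
            items=[{"cell_id": cell_id, "parameter": parameter, "value": value}],
            provenance={"source": "hypothetical_oss", "idempotency_key": idempotency_key},
        )
        self._applied[idempotency_key] = response
        return response
```

The point of the envelope is that the agent never has to guess: `items` is always a list, a failure always carries an explicit `error`, and a repeated call with the same key cannot apply the change twice.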
The execution environment — what systems the agent can reach, what actions it can take without approval — is the other half of this. In network operations, the consequence of an incorrect automated action is service degradation, not a failed transaction. That reality shapes permission scoping, action boundaries, and the graduated autonomy model described above.
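Permission scoping of this kind can be sketched as a small gate that sits between the agent and its execution environment. The autonomy levels, impact classes, action names, and decision strings below are assumptions made for illustration; a real deployment would define its own.

```python
from enum import Enum


class Autonomy(Enum):
    COPILOT = 1      # a human approves every change
    SUPERVISED = 2   # low-impact changes run without approval
    CLOSED_LOOP = 3  # in-scope changes run; humans handle exceptions


class Impact(Enum):
    READ_ONLY = 1
    LOW = 2
    SERVICE_AFFECTING = 3


def decide(action, impact, autonomy, allowed_actions):
    """Return 'execute', 'require_approval', or 'deny' for a proposed action."""
    # Anything outside the agent's scoped permissions is refused outright.
    if action not in allowed_actions:
        return "deny"
    # Reading data changes nothing in the network, so it never needs approval.
    if impact is Impact.READ_ONLY:
        return "execute"
    if autonomy is Autonomy.COPILOT:
        return "require_approval"
    if autonomy is Autonomy.SUPERVISED:
        return "execute" if impact is Impact.LOW else "require_approval"
    return "execute"  # CLOSED_LOOP: in-scope actions run without review


# Illustrative usage with made-up action names.
scope = {"read_counters", "reset_alarm", "change_tilt"}
copilot_change = decide("change_tilt", Impact.SERVICE_AFFECTING, Autonomy.COPILOT, scope)
supervised_reset = decide("reset_alarm", Impact.LOW, Autonomy.SUPERVISED, scope)
out_of_scope = decide("restart_node", Impact.SERVICE_AFFECTING, Autonomy.CLOSED_LOOP, scope)
```

Keeping the gate outside the agent means an agent can be promoted from one autonomy level to the next by changing configuration, without touching its reasoning logic.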
After a year, what we have is four agents at varying levels of maturity, a data platform we regard as genuinely differentiated, and a set of architectural principles grounded in real deployments rather than whiteboard design. The work ahead — extending the agent portfolio, deepening autonomous capabilities, expanding the data platform coverage — is clearer for having built what we’ve built.
The agents we have now are not the agents we’ll have in two years. The stack underneath them, built carefully and iteratively, is what makes that progression possible.

