From Research Papers to Production Systems: Tony Montes on Engineering AI Infrastructure for Financial Operations

The Silicon Review
27 May, 2026
Author: Sashindra Suresh

Tony Montes is among a small group of engineers who have built production AI systems at two Y Combinator–backed companies before the age of 25.

His path from dual-degree engineering student at Universidad de los Andes to visiting researcher at Cornell University to CTO of a YC-backed fintech company reflects something relatively rare in AI: a practitioner who has moved fluidly between rigorous academic research and the operational demands of production engineering, without sacrificing depth in either direction.

Montes works in AI systems engineering, focusing on production infrastructure for financial operations and enterprise workflows. His research background includes semantic shift detection, OCR correction using large language models, multimodal systems, and semantic compression, through collaborations with Cornell University and publications at ACL and EMNLP. That research foundation is what makes his production work unusual.

At Zolvo, where he serves as Co-Founder and CTO, Montes leads development of reconciliation and document-processing infrastructure for commercial lenders. The systems handle extraction workflows, discrepancy detection, validation layers, and integrations with legacy financial platforms, including FIS, Finastra, and LoanPro. Much of the work involves messy operational conditions — incomplete records, inconsistent service data, formatting drift, and reconciliation queues that still depend on manual review.

Before Zolvo, Montes worked on production AI systems at ProCibernética and Domu. At Domu, he helped build a voice AI platform that handled more than 100,000 daily calls in regulated U.S. financial environments. The engineering work centered less on experimenting with models and more on reducing latency, stabilizing orchestration pipelines, handling failure states, and maintaining audit visibility under production load.

His current work sits at the intersection of financial infrastructure, reconciliation operations, and cross-border servicing workflows. Commercial lenders working with international businesses often deal with fragmented systems, inconsistent documentation, and slow reconciliation cycles. The infrastructure his team builds is designed to reduce operational bottlenecks rather than automate lending decisions.

In this Q&A, Montes discusses the realities of moving AI systems from research environments into production financial infrastructure.

Q: Tony, you’ve published research at ACL and EMNLP. How did that academic work influence your approach to production AI systems?

Tony Montes: “Research environments force you to pay attention to edge cases early. In projects like ‘Historical Ink: Semantic Shift Detection for 19th Century Spanish’ and our OCR correction work on Latin American newspaper archives, the inputs were rarely clean — degraded scans, inconsistent spelling, missing metadata, and low-confidence OCR outputs.

That changes how you think about evaluation. A benchmark score can look great while hiding failure patterns you only notice once the data distribution shifts.

At Cornell, while working as a visiting researcher in Professor Zhiru Zhang’s group, I co-first-authored ‘Semantic Compression of 3D Objects with Language Models for Open and Collaborative Virtual Worlds’ (arXiv 2025). The work focused on multimodal representations, compression tradeoffs, and inference limits inside collaborative 3D systems.

Oddly enough, a lot of those constraints carried over into enterprise infrastructure later. Financial systems also operate on fragmented records, duplicated transactions, inconsistent formatting, and partially missing fields. Once you start dealing with production environments, the challenge usually stops being model capability alone. It becomes workflow stability when the data stops behaving predictably.”

Q: What led you from academic research into production engineering roles at ProCibernética and Domu?

Tony Montes: “At ProCibernética, I worked on BlooBot, which translated natural-language requests into SQL execution workflows running on Google BigQuery. Generating queries wasn’t the difficult part most of the time. The harder problem was constraining unsafe execution paths and handling schema ambiguity across constantly changing enterprise datasets.

Domu introduced a completely different operational environment. The voice AI stack handled more than 100,000 calls daily across speech recognition, telephony orchestration, reasoning agents, and text-to-speech systems. Once the platform expanded into financial workflows in the U.S., latency became a serious issue because users reacted to delays immediately during live calls.

A lot of our work shifted toward reducing inference lag, restructuring orchestration pipelines, and improving visibility around dropped sessions and transcription failures. In production systems, infrastructure maintenance usually consumes more time than model iteration.”

Q: As CTO and Co-Founder of Zolvo, what AI infrastructure are you currently building, and how is it affecting the profitability and productivity of companies in the United States?

Tony Montes: “Zolvo was accepted into Y Combinator's Spring 2026 batch earlier this year, which gave us the capital and the network to go heavier on infrastructure. We’re building reconciliation and servicing infrastructure for commercial lenders. I designed the document extraction layer using NVIDIA Nemotron-Parse, a state-of-the-art document parsing model, retrained on financial data to process invoices, payment records, proofs of delivery, and servicing documents arriving in inconsistent formats.

One thing we learned fairly quickly was that aggressive automation too early in the pipeline led to more downstream reconciliation exceptions. As a result, deterministic validation rules run first, while model-driven checks run later, after the data has been normalized.

A substantial amount of engineering work involves integrating with the most widely used loan management systems globally, such as FIS and Finastra, as well as more niche ones such as FactorSoft and LoanPro. Some lender systems expose very limited APIs, so parts of reconciliation still rely on scheduled batch synchronization, CSV ingestion, and manual exception review.

The infrastructure runs primarily on Google Cloud and uses Python and Django. Current reconciliation automation rates exceed 86% in production workflows, which means humans need to handle fewer than 14% of cases. That is more than a 5-to-1 reduction in the manual work that was required before, and the automation rate should only continue to increase. However, keeping those numbers stable requires constant monitoring, retraining, and rule adjustments because vendor document formats change frequently.”
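
The ordering Montes describes, deterministic validation first and model-driven checks only after normalization, can be illustrated with a minimal Python sketch. This is not Zolvo's code: the `Record` fields, the two rules, the `model_check` stub, and the 0.8 threshold are all hypothetical and exist only to show the control flow.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical extracted record; real servicing documents carry more fields."""
    invoice_id: str
    amount: str      # raw string as extracted from a document
    currency: str


def check_invoice_id(rec: Record) -> Optional[str]:
    # Deterministic rule: the identifier must be present.
    return None if rec.invoice_id.strip() else "missing invoice_id"


def check_amount(rec: Record) -> Optional[str]:
    # Deterministic rule: the amount must parse as a decimal number.
    try:
        Decimal(rec.amount.strip().replace(",", ""))
    except InvalidOperation:
        return f"unparseable amount: {rec.amount!r}"
    return None


DETERMINISTIC_RULES = [check_invoice_id, check_amount]


def normalize(rec: Record) -> Record:
    # Runs only after the deterministic rules pass, so parsing cannot fail here.
    return Record(
        invoice_id=rec.invoice_id.strip().upper(),
        amount=str(Decimal(rec.amount.strip().replace(",", ""))),
        currency=rec.currency.strip().upper(),
    )


def model_check(rec: Record) -> float:
    # Hypothetical stand-in for a model-driven check; returns a confidence score.
    return 0.95 if rec.currency in {"USD", "EUR"} else 0.40


def process(rec: Record, threshold: float = 0.8):
    # Stage 1: cheap deterministic rules reject malformed input early.
    errors = [msg for rule in DETERMINISTIC_RULES if (msg := rule(rec)) is not None]
    if errors:
        return "rejected", errors
    # Stage 2: normalize, then run the model-driven check on clean data.
    confidence = model_check(normalize(rec))
    if confidence < threshold:
        return "manual_review", [f"low confidence: {confidence:.2f}"]
    return "auto_reconciled", []
```

In this sketch a record such as `Record("inv-1", "1,250.00", "usd")` passes both stages, while one with an unparseable amount is rejected before any model-driven step sees it.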

Q: How does this work relate to broader financial and economic infrastructure in the United States?

Tony Montes: “Commercial lending still depends heavily on manual servicing and reconciliation operations. Things become more complicated when lenders work across jurisdictions, as documentation standards, payment records, and compliance requirements vary significantly.

A surprising amount of friction comes from systems disagreeing rather than from the transactions themselves. Different servicing platforms handle records differently, and reconciliation delays usually start there.

Our infrastructure reduces processing delays across servicing updates, reconciliation, and document verification workflows. If lenders can move through onboarding and servicing more quickly, capital can reach operating businesses more efficiently. When operational cost drops on the lender side, that capacity translates directly into more competitive rates for US borrowers, especially in alternative lending where small businesses have historically paid the highest cost of capital.

Most enterprise systems add operational burden. They require an adoption cycle, a learning curve, sometimes a dedicated team to keep them running. Zolvo integrates directly with the systems you already use, and complements what your team does instead of asking them to learn a new tool. The result is less operational overhead, not more.”

Q: You’ve also served as a judge and reviewer. What role does that play in your work?

Tony Montes: “I served as a judge at HackMIT 2025 and reviewed papers for the LChange workshop associated with EACL 2026 in the United States. Reviewing research exposes you to a large number of systems that perform well experimentally but haven’t been tested under production conditions.

One recurring pattern is that many projects optimize heavily around benchmark metrics while underestimating operational complexity. Once systems start interacting with unreliable inputs, infrastructure bottlenecks, or regulatory requirements, engineering priorities shift pretty quickly.

Judging at HackMIT is just different. MIT is one of the densest concentrations of technical talent and researchers in the world, and Y Combinator pulls heavily from it. Being a judge at the hackathon puts you in a room with some of the most ambitious student builders in the country, often seeing projects that turn into companies a year or two later.”

Q: What are the primary challenges when converting academic AI work into enterprise systems?

Tony Montes: “Reliability under imperfect conditions is usually the hardest part. Financial records arrive partially corrupted, APIs fail unpredictably, schemas drift over time, and compliance requirements change during deployment cycles.

Academic environments tend to optimize around model performance metrics. Production systems optimize around continuity and recoverability rather than around benchmark performance.

At Zolvo, a lot of engineering time goes into retries, audit logging, encryption layers, exception routing, and monitoring. Some reconciliation workflows still require human review because low-confidence outputs can create downstream accounting problems if they propagate automatically.

Legacy infrastructure slows things down, too. Financial institutions often operate multiple systems built over the years with incompatible schemas and very limited interoperability. Integrating around those constraints usually takes longer than training the models.”
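
The retry, audit-logging, and exception-routing concerns mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of Zolvo's system: the function name, the in-memory `audit_log` and `exception_queue` lists, and the backoff parameters are all hypothetical.

```python
import time
from typing import Any, Callable, Optional


def call_with_retries(
    operation: Callable[[], Any],
    *,
    item_id: str,
    audit_log: list,
    exception_queue: list,
    attempts: int = 3,
    base_delay: float = 0.5,
) -> Optional[Any]:
    """Run `operation`, recording every attempt; route the item to a
    manual exception queue if all attempts fail."""
    for attempt in range(1, attempts + 1):
        try:
            result = operation()
        except Exception as exc:
            # Every failure is recorded so the history stays auditable.
            audit_log.append(
                {"item": item_id, "attempt": attempt, "status": "error", "detail": str(exc)}
            )
            if attempt < attempts:
                # Exponential backoff between attempts.
                time.sleep(base_delay * 2 ** (attempt - 1))
        else:
            audit_log.append({"item": item_id, "attempt": attempt, "status": "ok"})
            return result
    # All attempts failed: hand the item to human review instead of dropping it.
    exception_queue.append(item_id)
    return None
```

The design choice illustrated here is that a failed item is never silently discarded: it either succeeds with a logged trail or lands in a queue that a person reviews.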

Q: Your work and systems have been transformative for many companies in the United States. What guidance would you give engineers trying to work between research and production infrastructure?

Tony Montes: “Build complete systems, not isolated demos. Benchmarks reward narrow performance. Production exposes everything they hide: ingestion failures, formatting drift, concurrency bottlenecks, rollback paths nobody tested. You only see those when real data hits real infrastructure.

Track operational metrics alongside model metrics. In deployed systems, latency stability and how cleanly the system recovers from a bad input usually matter more than two points of benchmark improvement.

Research experience helps, but production engineering is a different discipline. It rewards tolerance for incomplete data, comfort with operational tradeoffs, and the patience to keep iterating long after the first version ships.”

Q: What are your final thoughts on the future of AI Engineering in the United States?

Tony Montes: "Honestly, what I think will become a lot more obvious over the next few years, and I don't think enough people are talking about this, is that the model itself is almost becoming the easier part. The harder part is keeping everything around it running properly once it's actually live, in a real environment, with real data that doesn't always behave the way you'd expect. Harness engineering is becoming the actual new AI revolution, and we’re already seeing it in 2026 with systems like Claude Code and Codex.

That's where a huge chunk of my energy has gone. Not necessarily building the most cutting-edge model, but figuring out how you actually operate this thing day to day. What happens when the data gets messy? What happens when the client's requirements shift halfway through? Those are the problems that keep showing up.

I've done a lot of mentoring through hackathons — HackMIT being one of them — and the same thing comes up constantly. Engineers are brilliant, technically sharp, but a lot of them are still thinking about deployment the same way they'd think about a research project. And those are just very different problems. One has a deadline and a paper. The other one never really ends.

Most of what I see in enterprise AI is honestly maintenance. Not the exciting stuff — keeping pipelines stable, managing validation rules, staying compliant under regulatory pressure, handling variability you didn't plan for. That's the bulk of the work. People outside the industry don't see any of it, but it's what determines whether the whole thing actually holds up at scale.

And look — beyond the technical side — what genuinely gets me up in the morning is what this could mean for people. My dad had a small business. He had to make some really tough credit decisions, paid a lot in interest, and some of that still follows him. That's what I'm building toward. Making credit less of a barrier for US small businesses, where access to working capital at competitive rates often determines whether a company hires, expands, or survives. If we can do that here, we can do it anywhere.

As for how I want to be remembered? I'd love for it to be as someone who proved you don't have to spend a decade in an industry or finish a PhD to actually change how it works. If a young team with no grey hair can walk into a high-stakes financial industry, handle serious amounts of client money responsibly, and actually move the needle — I hope that gives other people permission to try. That matters to me more than any specific milestone."