What’s the BUZZ? — AI in Business

Transforming AI Agents into Reliable Enterprise Assistants (Tomas Gogar)

Andreas Welsch Season 4 Episode 7

What if the next member of your team could be an AI agent, designed to think independently, adapt, and collaborate seamlessly with others?

In this episode, host Andreas Welsch engages in an enlightening conversation with Tomas Gogar, CEO and co-founder of Rossum, as they explore the potential of AI agents in the modern workplace. Together, they discuss critical topics such as the challenges of making AI agents enterprise-ready, the importance of reliability and governance in AI deployments, and how these intelligent systems can redefine efficiency and decision-making in businesses.

Key discussions include:

  • How organizations can create robust AI agents that excel in complex environments
  • What makes an AI agent suitable for handling processes that require precision and accountability
  • Why clear guardrails are needed to keep AI operating within safe boundaries while retaining flexibility
  • How to foster collaboration between human employees and AI agents for optimal team dynamics

Whether you’re a business leader exploring AI integration or a tech enthusiast eager to understand AI’s evolving role, this episode is filled with practical insights and innovative strategies.

Don’t miss out on this opportunity to understand how to harness the AI wave effectively! Tune in now to discover how to transform AI hype into measurable business success!

PS: Thank you to the team at Rossum AI for sponsoring this episode.



***********
Disclaimer: Views are the participants’ own and do not represent those of any participant’s past, present, or future employers. Participation in this event is independent of any potential business relationship (past, present, or future) between the participants or between their employers.


Level up your AI Leadership game with the AI Leadership Handbook:
https://www.aileadershiphandbook.com

More details:
https://www.intelligence-briefing.com
All episodes:
https://www.intelligence-briefing.com/podcast
Get a weekly thought-provoking post in your inbox:
https://www.intelligence-briefing.com/newsletter

Andreas Welsch:

Today we'll talk about the three steps to make AI agents enterprise-ready, and who better to talk about it than someone who's actively working on that: Tomas Gogar. Hey Tomas, thank you so much for joining.

Tomas Gogar:

Hello, Andreas. Thanks for having me.

Andreas Welsch:

Fantastic. Hey, why don't you tell us a little bit about yourself, who you are and what you do.

Tomas Gogar:

Okay. I'm Tomas Gogar, currently CEO and one of the founders of Rossum. I'm a computer scientist and AI scientist by background. We started the company after I met my co-founders in the same AI PhD lab, so we really come from the AI space from before it was cool. We focus on automating paperwork for large enterprises.

Andreas Welsch:

I've done quite a bit of work in that space before, and I'm always surprised how much paper there is in a business. I'm excited to see that you're solving a real problem and a real need, and I'm also excited to have you on the show today. And by the way, thanks to the team at Rossum for sponsoring today's episode. Why don't we play a little game to kick things off? What do you say?

Tomas Gogar:

Let's do it.

Andreas Welsch:

Okay, perfect. So this one is called In Your Own Words. When I press the buzzer, the wheels will start spinning, and when they stop, I would love to hear your answer. You have about 60 seconds for it, and let me know what's the first thing that comes to mind. So, are you ready for What's the Buzz? Okay, perfect. Here we go: if AI were a vehicle, what would it be? 60 seconds on the clock. Go!

Tomas Gogar:

I think it would definitely be an electric car. Exciting, a huge opportunity, but still immature in certain ways. And I think it's very accurate: all the manufacturers are focusing on it, knowing that this is the direction, and there's a big race over how to make it real and impactful.

Andreas Welsch:

I love that. And indeed it does seem like it's this big race. If you follow the news and the headlines, it's crazy. I don't know how people are holding on, or if it even still makes sense to hold on, or if you just accept that there's so much stuff flying by. But anyways, great answer. That brings us to the topic of our show today: making AI agents enterprise-ready and building them in ways that are enterprise-grade. This term of enterprise readiness can mean many different things to different people. If you follow the headlines in the news, it seems like AI agents are obviously the big thing of 2025. Everybody's doing it, everybody's been doing it for a long time, whether it's vendors or companies and practitioners. I think when you peel it back, it's a lot more nuanced. Just because you can run an agent on your own machine, or deploy it somewhere in the cloud and tinker around with it, doesn't necessarily mean it's enterprise-grade or ready for production. But I know you at Rossum have been building AI agents and have just come out with your first one. So I'm curious, as a software leader and as a founder, what are you seeing there? What makes agents enterprise-grade? What do you need to get right if you're building them?

Tomas Gogar:

It's a great question. Look, what we are seeing around us is tremendous hype, and I believe the hype comes from the fact that you can build demos extremely quickly, right? Really good-looking demos: you can take an agent framework and build an agent with an LLM deciding, planning, and doing a lot of things, and it looks extremely cool. However, the challenge is that it looks cool in a demo, but what we are seeing is that when you try to deploy it in real-life production scenarios, there are a lot of situations where it doesn't work well. With the current state of LLMs, if you let the LLM drive the workflow of that agent, it's unpredictable, right? Because of the nature of how LLMs are designed and how they work, they are non-deterministic and they can hallucinate. So the challenge is that it looks cool and usually works well in 70% of cases, but it can have disastrous results in the other 30% — and you usually don't know which 30% will go wrong. So what we are seeing on LinkedIn and on social media are those pre-recorded demos, because somebody recorded a scenario where it went well, or cherry-picked situations where it actually works. But for enterprise readiness, it needs to be reliable in the job in which you want to use the agent. And that reliability comes from a couple of properties. For example, the agents need to understand the limits of their knowledge. If you are an employee at work and you don't know something, you are not certain, you have a way to escalate to your boss or to ask your colleagues. But in order to escalate, you need to understand the limits of your knowledge. And that's where LLMs are tremendously bad, because they are like those students that always try to answer.
I think understanding the limits of the knowledge is one of the things. Then, obviously, you need to have certain types of guardrails for that agent to execute within. And we know those guardrails from the standard corporate world. I'm the CEO of a company, and I cannot go and increase my salary tomorrow on my own; I don't have access to that system, and I would probably need approvals from, let's say, my board members. So there is governance. I'm the CEO and co-founder of the company, and I cannot access the data of our customers, for very good reasons. So there are limits on us as humans, and our agents need them as well. We need to build systems so that AI agents have those limits and guardrails around them too. That could be the second one. And the third one is really just having the expertise for the task that you are doing. The expertise needs to be factual as well as procedural: how it is done in my company. Employees know this, so I believe the AI agents need to copy human employees, with all their skills, before they are really enterprise-grade.
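The two kinds of guardrail Tomas distinguishes — hard permission limits and confidence-based escalation — can be sketched in a few lines. This is purely illustrative: the action names, the threshold, and the function are hypothetical assumptions, not Rossum's actual API.

```python
from dataclasses import dataclass

# Hypothetical permission set and threshold for a paperwork agent;
# both are illustrative assumptions, not a real product's config.
ALLOWED_ACTIONS = {"read_invoice", "extract_fields", "post_draft_entry"}
CONFIDENCE_THRESHOLD = 0.9

@dataclass
class ActionRequest:
    action: str
    confidence: float  # agent's self-assessed confidence, 0.0 to 1.0

def execute(request: ActionRequest) -> str:
    # Hard guardrail: like a human without access rights, the agent
    # simply cannot perform actions outside its granted permissions.
    if request.action not in ALLOWED_ACTIONS:
        return "denied: no permission"
    # Soft guardrail: when the agent is not confident enough, it
    # escalates to a human instead of acting autonomously.
    if request.confidence < CONFIDENCE_THRESHOLD:
        return "escalated to human reviewer"
    return f"executed {request.action}"
```

The key design point matches the CEO-salary example: the permission check happens outside the agent, at the API layer, so no amount of hallucination lets the agent bypass it.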

Andreas Welsch:

I have a question there, because I'm really curious. For a long time we said, let's not compare AI and humans, let's not anthropomorphize. Now with agents, I also go into this mode a lot of the time and say: people in a business are expected to do X, Y, Z. Like I said, they're expected to know what they know, and to figure out what are the things they don't know and where they need help. They need to read the code of conduct. They need to know things like IFRS and other accounting standards. Agents will eventually need to have the same or similar knowledge too. So, comparing humans and AI, are we blurring the lines? And is that helpful, or is it harmful in some ways?

Tomas Gogar:

I think that building this one-to-one mapping is actually helpful if you want to really deploy them at scale within corporate structures: model an AI agent as a replacement for a human. But there might be different setups. For example, if I have an AI agent that needs to be supervised because of, let's say, a lack of confidence skills, I can still use that AI agent as a personal assistant to every employee. That's fine; we can say, hey, every employee has an AI assistant. Good. But whenever we try to have those AI agents act autonomously — because I believe that's the holy grail — the best way to plug them into the corporate structure is actually to have them as a virtual employee. That way you don't have to change your corporate structure just because of AI agents. You need to plug them in somewhere, let's say into certain processes, and the best way to do that is as a replacement for, or an extension of, your team.

Andreas Welsch:

Yes, that makes a lot of sense. That resonates. We have these familiar structures of how teams, organizations, and companies are set up, and it helps us to use similar concepts now with these pieces of software that can do more than just follow basic rules. Now, there was something else that you mentioned: we need to teach them, or tell them, what the limits of their knowledge are, and put guardrails in place. How do you do that? Many of us have probably seen these basic prompts: you're a helpful customer service agent, you do X, Y, Z, and if you don't know, then you escalate. Is that how you do it? Or how do you put these guardrails in place? What does it mean?

Tomas Gogar:

I hear you. Some guardrails are simply — I don't want to say physical, because they are actually virtual — but I simply cannot do certain things. I don't have the access rights to this file; I don't have the user rights to do something in my SAP instance, right? So they're very similar guardrails to those on human employees, but they need to be enforced at, let's say, the app layer, the API layer that the agent is using. However, what I was talking about before — the limit of my knowledge — comes from the ability to estimate my own confidence in my decision, and that is something that cannot be solved by a prompt. It can be solved, and we've proved that many times here at Rossum, but for that we need to change how LLMs actually work, on the next generation of gen AI. Because the design of LLMs simply doesn't allow you to come up with something like calibrated confidence scores, if I try to explain it in technical terms. The model just tries to generate the answer, but doesn't assess its confidence in the answer. If you try to prompt it — "tell me how confident you are" — it will not work. That's not what I'm talking about.
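The distinction Tomas draws — a prompted "how confident are you?" versus genuinely calibrated confidence — can be made concrete. A model is calibrated if, among predictions it makes with roughly 90% confidence, about 90% are actually correct. A standard way to measure the gap is Expected Calibration Error (ECE); the sketch below is a generic textbook implementation, not Rossum's method.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Measure how far self-reported confidence is from actual accuracy.

    confidences: predicted confidence per example, each in (0, 1].
    correct: 1 if the corresponding prediction was right, else 0.
    Returns 0.0 for a perfectly calibrated model; larger is worse.
    """
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Examples whose confidence falls into this bin.
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        # Weight each bin's |confidence - accuracy| gap by its size.
        ece += (len(idx) / n) * abs(avg_conf - accuracy)
    return ece
```

An overconfident "student that always tries to answer" scores high ECE: it reports 0.95 confidence while being right far less often than 95% of the time.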

Andreas Welsch:

Awesome, thank you for sharing that. I think that makes it a lot more tangible, seeing that it does come back to basic principles of security, of architecture and design: you simply don't have access to this data. Sounds obvious, but when you talk about needing to have guardrails in place, or needing to tell the agent what to do and what not to do, I think it becomes more nebulous. So it's great to see that it can be as simple and as tangible as that. And by the way, I'm looking at the chat, where folks are joining from Luxembourg, from Dubai, from many other places — so excited to have you with us. Also, if you have a question for Tomas, please put it in the chat and we'll pick it up in a minute or two. That's one of the benefits of having this as a live stream: engaging with both the guest and the audience. Now, we've talked about agents being similar to employees, at a minimum on a conceptual level. As we think about these new capabilities, what does that actually mean? What does it look like on a day-to-day basis, from maybe where we are today to where you see things going in the next couple of quarters, or the next year or two? How will people work with these technologies?

Tomas Gogar:

Every company defines it differently. We at Rossum define an AI agent as a virtual colleague you can delegate work to. A very simple definition. You typically delegate the work to a colleague using natural language, or, in more structured processes, by sharing standard operating procedures and telling them: hey, this is how this particular job is done here. So you delegate the work and you expect them to do their job end to end, and when they are not confident enough, you expect them to escalate to their manager. I do believe, and we are seeing it in our deployments, that you will have corporate structures where you have a manager, and that manager has, let's say, some direct reports, which can be either humans or AI agents. The way it can work is that the AI agents do their part of the job — in our case at Rossum, for example, processing some paperwork tasks — and when they are not confident enough, they escalate, let's say in the first instance, to their human colleague, to their human peer. So you can imagine an AI clerk that is doing some paperwork job, and if it's not confident enough, it escalates to a human clerk. And if the human clerk is not confident enough, they escalate to their manager. And if the manager is not confident enough, they can escalate even higher. So I do see them working side by side, and as an outcome of that, you will have smaller teams that can actually handle more work as a team. In the near future — a couple of quarters or a year — it'll be more like a mixed team. That's what we are seeing in our deployments: the AI is able to handle 60–70% of the cases on its own, and if it can't, it escalates to human colleagues. It learns from them as well. And there are cases where even the human colleague doesn't know, and then they escalate to the manager.
So I see them working side by side.
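The escalation chain just described — AI clerk to human clerk to manager — amounts to a simple routing rule: the task goes to the first member of the chain confident enough to handle it. The roles, the 0.8 threshold, and the function below are illustrative assumptions, not a description of Rossum's product.

```python
# Ordered chain of handlers, from most automated to most senior.
ESCALATION_CHAIN = ["ai_clerk", "human_clerk", "manager"]
CONFIDENCE_THRESHOLD = 0.8

def route_task(confidence_by_role: dict) -> str:
    """Return the role that ends up handling the task.

    Each handler reports its confidence for the task; the first one
    in the chain at or above the threshold takes it. If nobody is
    confident, the top of the chain owns the decision.
    """
    for role in ESCALATION_CHAIN:
        if confidence_by_role.get(role, 0.0) >= CONFIDENCE_THRESHOLD:
            return role
    return ESCALATION_CHAIN[-1]
```

In a deployment where the AI clerk clears 60–70% of cases on its own, most tasks return at the first step and only the ambiguous remainder reaches a human.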

Andreas Welsch:

Now, how do you see that working across different departments? Maybe you have a sales order that needs to be scanned, or a production order, or a bill of lading, in different departments. How does that work? Are agents going to collaborate with each other, maybe even on the same platform, if they know: I have an agent here for sales, and I have one for manufacturing and planning? Do you envision them collaborating and figuring things out?

Tomas Gogar:

Yeah, sure. I think it's needed, because I think the premise is the expertise, right? I'm an agent that does X, and part of my job description might be that if something happens, I need to involve some other team members. And I think the basic premise should be that I don't know who that will be. Will it be a human or another agent? I don't know. That's the way to build it, because that's how you can gradually scale it. I think the analogy to self-driving cars works pretty well here. We could build self-driving cars by rebuilding the whole infrastructure, changing our roads, changing our crossroads, but that's a huge infrastructural change that would need to happen overnight to really have an impact. Impossible. So instead of that, we use intelligent agents — intelligent cars — that can use the existing infrastructure and can live next to the cars driven by humans. And you don't care, because we might be driving behind each other on the highway, and one of us might be in a self-driving car while the other one doesn't know. So I think this is how you can deploy gradually. And the self-driving car analogy works here as well, because first it works on the highway or during parking, and later on it works in a small Italian town where driving is extremely difficult. So I think for gradual deployment you need to kind of copy the corporate structure, and if the job description involves communicating with somebody, whether internally or externally, you cannot assume whether it'll be a human or another AI agent communicating like you would.

Andreas Welsch:

I really like that analogy to self-driving cars and the similar conditions that exist, right? As long as they're as safe as, or maybe even safer than, a human driver, and they meet the standards that authorities and departments of transportation and so on set. Similarly, if we know that agents will behave according to certain rules, that they won't go crazy or break out of their little box, that analogy works really well. One other thing I'm wondering about: do we need something like a second address book for agents? Today, if I'm in a business and I'm stuck, I know I need somebody in procurement. Maybe I'm new to the organization, it's a larger company, I don't know everybody personally, but I can ask around: who should I talk to for this procurement question? Maybe they point me to the right person, or to a generic inbox, or to submitting a ticket, or something like that. But eventually I get to a person, and next time I have a question, I call that person directly. I send them an email and say, hey, you were able to help me last time, can you help me again? Do we need something like that for agents too? Kind of like a registry, or an address book, to know this agent can work with that one, but don't use this one — that sort of thing. There are some protocols and standards being proposed by some of the vendors. How do you see that working, so that agents even know there are other agents around them?

Tomas Gogar:

I think the only difference from humans is that agents will be better at using APIs; that's unnatural for us. But at the same time, I do believe that they will also need to be able to use the channels that humans are using, exactly because of this coexistence. There will be a phase where we will coexist next to each other; we see it in the paperwork jobs. So yes, an agent is good at calling APIs, and for the APIs it probably needs an address book. But for non-API communication and collaboration, it needs to have user interfaces and address books like humans do.

Andreas Welsch:

Yeah, that makes sense. Thanks for sharing. I'm looking at the chat and I see Karina put two messages there. One says: agents don't think or understand their tools for specific tasks, and when we know the limitations of these tools, we can develop safeguards to mitigate them — validation against trusted databases, methods for detecting when an LLM is operating outside its training domain, and so on. How do you approach that at Rossum? I think security was one point that you mentioned, but are there maybe others?

Tomas Gogar:

Yeah, look, guardrails are easy, right? The physical guardrails — you don't have access to this, you have access to that — that's easy. But at Rossum, because we are narrowly focused on a specific type of supply chain paperwork, we've built a different type of LLM. We call it T-LLM, a transactional large language model that is trained specifically to process transactional paperwork. Because it's such a narrow domain, we were able to build it in a way that it can actually assess its own confidence reliably. So the brain behind Rossum has the ability to assess its confidence, and when it's not confident enough, it asks the humans. That's a big safeguard — a big one. We are now releasing functionality that is specifically designed as a reasoning functionality, which is able to intelligently interpret data on the documents, but again with the ability to assess its confidence. This is not easy, and we are able to do it only because our domain is fairly narrow. It's not currently known in the research community how to do that in a generic gen AI setup. Our paperwork task is simpler, and that's what allows us to build a type of AI that can do it. But in general-purpose LLMs, nobody currently has that. And I think that's the next big holy grail to chase, if we really want to live in that dream where we have truly autonomous AI agents that are also very generic, generalistic.

Andreas Welsch:

That's exciting. I love how you described that, because by staying focused, you're able to do that on a very specific task and do it really well. Now, we're getting close to the end of the show, and I was wondering if you can summarize the three key takeaways for our audience from today's episode.

Tomas Gogar:

The three main takeaways. Number one: be a bit skeptical of the overhyped AI demos that you see out there. A demo, especially in the current age of gen AI, can be very far from real production usage. Number two: if you are deploying these AI agents, choose wisely where you are deploying them — either in a setup where you have human supervision, or in a setup where the AI provider can guarantee a calibrated sense of confidence before you really let it run autonomously. And the third one, which goes out to the AI vendors: let's keep innovating on the fundamentals of AI. Let's not just rely on the current LLMs, because there is so much they can't do and will not be able to do.

Andreas Welsch:

Awesome, I love that. Maybe before we really wrap up — I see there's one more question in the chat: in this agentic AI era, will we still need a graphical user interface, or how will that work? Are we going back to MS-DOS and putting our commands in a command prompt? Where do you see that going?

Tomas Gogar:

I think the answer is both. The reality is that there will be a lot of vendors that will fight this change and close their APIs and so on. So we will probably have both the API way as well as the user interface way.

Andreas Welsch:

Alright, Tomas, thank you so much for your insights. It was a pleasure having you on, and hearing and learning about how you are approaching this, where you're seeing the opportunities with agentic AI, and also how you make sure that it stays relevant and responsible as well. So thank you so much.

Tomas Gogar:

Thank you, Andreas.
