What’s the BUZZ? — AI in Business

Automate Your Document Processing With LLMs (Guest: Petr Baudis)

Andreas Welsch

Is your business ready to harness the full potential of AI? Learn how to move beyond the hype and use AI tools like large language models (LLMs) to drive real outcomes in document processing, enterprise integration, and more.

In this episode of What’s the BUZZ?—AI in Business, host Andreas Welsch is joined by guest Petr Baudis, co-founder and CTO of Rossum. Together, they explore how AI leaders can achieve measurable results with automation and large language models. Sponsored by Rossum, this episode takes a deep dive into the strategies and tools that transform business processes and improve ROI.

Discover:

How reliability in AI-powered document processing prevents errors and saves time.

Why adaptability matters, and how intelligent systems can align with your unique workflows.

The benefits of seamless integration in large organizations with complex IT ecosystems.

The conversation also takes a forward-looking view of AI’s role in business, discussing the rise of Artificial General Intelligence (AGI) and its implications for enterprise IT. Petr shares insights on why preparing for AGI today is essential for future-proofing your organization.

Don’t just follow the AI buzz—turn it into actionable results. Tune in to hear Petr Baudis share how Rossum is shaping the future of document automation and how your business can stay ahead with AI. Listen today!



***********
Disclaimer: Views are the participants’ own and do not represent those of any participant’s past, present, or future employers. Participation in this event is independent of any potential business relationship (past, present, or future) between the participants or between their employers.


Level up your AI Leadership game with the AI Leadership Handbook:
https://www.aileadershiphandbook.com

More details:
https://www.intelligence-briefing.com
All episodes:
https://www.intelligence-briefing.com/podcast
Get a weekly thought-provoking post in your inbox:
https://www.intelligence-briefing.com/newsletter

Andreas Welsch:

Today we'll talk about automating your document processing with LLMs. And who better to talk about it than somebody who's actually working on that: Petr Baudis. Hey Petr, thank you so much for joining.

Petr Baudis:

Thank you for having me, Andreas.

Andreas Welsch:

Great. Hey, why don't you tell our audience a little bit about yourself, who you are and what you do?

Petr Baudis:

Absolutely. So yeah, my name is Petr Baudis. I'm a co-founder and CTO of Rossum. I'm a programmer at heart, I would say. I started programming when I was about 10, I've contributed to many, many open source projects, and I've worked on many AI projects as well, as early as 2005 or so. For example, in the area of computer Go I collaborated on some programs which DeepMind later used as their main reference when building AlphaGo. But for the past eight years I've been building Rossum, where the R&D department and the product strategy are on my side. And at Rossum, what we do is we have this vision that when you as a company are processing incoming documents, it shouldn't be open spaces full of people sitting there eight hours a day, five days a week, just retyping pieces of information from pieces of paper into computer forms. This should be automated. Our vision is that one person processes 1 million incoming document-based transactions per year in such a company. Right now it's typically tens of thousands, maybe low tens of thousands. So it's a pretty ambitious vision. And we use advanced AI. We are a deep tech company, and we are focusing on managing this process end to end. So not just capturing data from documents, but everything from the document arriving at the company, through things like approval flows, GL coding, etc., all the way to getting the data in great quality into your SAP, Oracle, or whatever other system there is. Even automatically rejecting the document and sending an email back to the sender if something is wrong with it.

Andreas Welsch:

Awesome. That sounds really exciting. And I remember seeing you a couple of years ago when you were just starting out, and it was super impressive to see what you guys were building. So also, thank you for sponsoring today's episode and bringing this information to many more people. I'm excited to have you on the show. Should we play a little game to kick things off?

Petr Baudis:

All right.

Andreas Welsch:

Okay, so let's see. This one is called In Your Own Words. And when I hit the buzzer, the wheels will start spinning. When they stop, you'll see a sentence and I'd like you to answer with the first thing that comes to mind and why. In your own words. To make it a little more interesting, you'll only have 60 seconds for your answer. Are you ready for What's the BUZZ?

Petr Baudis:

Let's go.

Andreas Welsch:

Okay, so here we go. And if you follow us live, please put the answer in the chat, too. If AI were a movie, what would it be? 60 seconds on the clock.

Petr Baudis:

Alright, a movie. Honestly, the first movie that came to my mind is Terminator, but that's so cliche and I don't really believe in it. But I think Shrek is actually a great analogy. Because the first scene that comes to mind from Shrek is the scene where one of the characters keeps asking, are we there yet? Are we there yet? And the other one says: no, stop asking, please. But on the other hand, they still keep making progress toward their destination. So I think AI is a lot like this. It's been overhyped a lot, and by now many people in the audience have to be asking, are we there yet, finally? And I'm going to say no, not quite yet. There is still a lot of hype in the field, but there's also a lot of amazing progress, and it just takes a bit of a discerning person to pick out the right pieces.

Andreas Welsch:

Wonderful. Thank you. See, I was going to ask, are we there yet? But I'm excited for our conversation and seeing where that leads us, especially in the realm of processing your documents with LLMs. So thank you for the awesome answer. Now, I've been seeing so much about digital transformation and digitization for pretty much the last 10 years. And still it baffles me that even the largest organizations rely on paper-based processes. I was working with a number of Fortune 500s a couple of years ago, and it just blew my mind. I thought everything was automated, everything was digital, that things wouldn't even arrive as a PDF or some kind of scanned document anymore. But purchase orders, invoices, bills of lading, you name it, they still do. Now, the good thing is that there's been even more automation over the years, with machine learning, for example, and other techniques like OCR. But it still comes down to the old question, right? How much does it actually cost me if I want to extract information from my documents, and how much can I save? I remember doing some calculations many years ago: time spent per document multiplied by the number of documents per year. That's your cost, that's your spend, and you compare it with and without a software solution. But it's rarely that simple, right? So I'm curious, what are you seeing? If you're a leader thinking about how to automate your document processing, where should you start? What's the calculation you should be doing?
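For illustration only, the back-of-the-envelope calculation Andreas describes might look like the sketch below. Every number in it, the volumes, rates, and license cost, is a placeholder assumption, not a figure from the episode.

```python
# Rough ROI sketch for automating document data entry.
# All inputs are illustrative assumptions, not real figures.

docs_per_year = 120_000          # incoming documents per year
minutes_per_doc_manual = 5       # average manual handling time per document
hourly_cost = 30.0               # fully loaded hourly cost of a data-entry clerk
automation_rate = 0.80           # share of documents processed without a human touch
software_cost_per_year = 60_000  # assumed annual cost of an IDP platform

manual_hours = docs_per_year * minutes_per_doc_manual / 60
manual_cost = manual_hours * hourly_cost

# After automation, only the non-automated share still needs manual handling.
residual_cost = manual_cost * (1 - automation_rate)
net_savings = manual_cost - residual_cost - software_cost_per_year

print(f"Manual cost per year: {manual_cost:,.0f}")
print(f"Residual manual cost: {residual_cost:,.0f}")
print(f"Net annual savings:   {net_savings:,.0f}")
```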

Petr Baudis:

I think it's most important to think about what you actually want to achieve, and to try to phrase that in somewhat concrete terms. So from my perspective, my advice would be to start with your use case. Machine learning practitioners always think about input and output, right? I want to put this kind of input in, and this is the kind of output I expect. And pretty often I actually hear from people who don't have a clear idea of what they want to get out of the system. We just want to get some extra insights, et cetera. That might be fine for some cases, but if you are really optimizing a business process, you need to think: okay, what's my input? What's the type of data that I'm processing? Is it going to be invoices, or is it going to be, for example, construction charts? You will want a very different system, with pretty different cost and ROI equations, depending on what the use case is.

Another thing is, what's the volume of my data? You might have hundreds of documents per month, super complex, long documents with complex tables, maybe logistics or material analysis, or use cases like that. Or you might be talking about millions of purchase orders or invoices a month. And again, then the cost equation will be different. In the first case, maybe you have just one or two people processing this. In the other case, you probably have a big team of people processing it. But maybe they are all in the same place, or maybe each of them is sitting in a different office, working in a different region, maybe entering the information into a different downstream system. So that's one part of it, the use case: what's the type of data, the volume of data, and what do you want to get out? Do you want to get an SAP business record out, or maybe something else?

Then I think it's good to think a little wider and ask, okay, what's your actual business objective? By that I mean that, okay, you want to capture some data from some documents, or intelligently process documents in general, but probably there is an overall business process that you want to automate. And then it's worth starting from the big picture, from this whole business process. Is it order-to-cash or something like that? Then you want to think about, okay, what's the biggest piece of it I actually can automate? Because maybe you can automate way more than just the document data capture. The more of the process you automate, the more it usually means you are going to use a pretty integrated platform to run it. And the tighter the product, the bigger the piece of the process it covers, the more benefits you usually get. You don't need to switch between systems, you need to think less about integrations, and if those products advertise automation, the more signals and feedback from user activity they get, the higher automation they can achieve.

And then the last piece is, do you want everything at once, or are you willing to think about some sort of roadmap? Maybe at the beginning I don't need full automation, I just want to save 70 percent of my time or my team's time, and I'm still okay with people checking those documents. Maybe at the beginning I don't need a fully fine-tuned integration, and I'm okay with some extra manual steps until I work out all the details, right? If you are willing to think in an agile way like this, you can probably get a better ROI and also de-risk your project, because you can start getting some benefits early. And meanwhile, of course, you should still aim high, from my perspective.
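As a purely illustrative sketch of that kind of partial-automation roadmap (a generic pattern, not a description of Rossum's implementation), one common approach is confidence-based routing: auto-post documents the engine is highly confident about, queue the rest for human review, and widen the automated share as trust grows. The field names and threshold below are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class ExtractedDocument:
    doc_id: str
    fields: dict        # e.g. {"invoice_number": "INV-123", "total": "1250.00"}
    confidence: float   # engine-reported confidence in [0, 1]

# Tune against your own error tolerance; start conservative, raise over time.
CONFIDENCE_THRESHOLD = 0.95

def route(doc: ExtractedDocument) -> str:
    """Send high-confidence documents straight through; queue the rest for a person."""
    if doc.confidence >= CONFIDENCE_THRESHOLD:
        return "auto_post"      # post directly to the downstream system
    return "human_review"       # land in a validation queue for manual checking

docs = [
    ExtractedDocument("A-1", {"total": "100.00"}, confidence=0.99),
    ExtractedDocument("A-2", {"total": "740.00"}, confidence=0.62),
]
for d in docs:
    print(d.doc_id, "->", route(d))
```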

Andreas Welsch:

No, that's a great way to frame it, right? Thinking, first of all, in business terms, and thinking in manageable chunks of what you want to go after. And maybe I can pick up the question from Crispin here in the chat, and I think it's great with your technical background. I remember when I worked with large organizations, again, six, seven years ago, we would ask them for tens of thousands of documents, and they had to be annotated: where's the bounding box, what is this field, and lots of additional data or ground truth to test how our model actually works. And it was a big leap to go from text to tables, and to PDFs and tables in PDFs. So, Crispin here is asking: what happens when pre-processing fails on a PDF or a PowerPoint, or when it's complex, with tables, images, or potentially videos? How do you handle that when it comes to document processing and working with AI on that unstructured data?

Petr Baudis:

Sure. First, you want to design your process and use appropriate tooling so that you don't lose those documents altogether, right? In the worst case, you want to do the same thing you would do if you were handling this manually. Sometimes you just get a PDF that you cannot open on your computer at all, and then what do you do? You write back to the sender: this isn't working for me, I cannot process it, you need to resend it. Something like that, right? And then you figure it out. Actually, if you are using an IDP solution like Rossum that has this end-to-end integration, the platform can do things like this automatically on your behalf if it cannot open those documents. And then you need to think about the quantitative aspect. Is it just one document in 10,000? Is something like this going to happen to me just once a week? In that case, an escalation to a human can be totally fine, right? If it's just a small fraction, what you want is a system or a platform that has this built in and that has a user-friendly process for it, so that it generates minimum extra hassle to have a human take a look and process it manually if need be. If it's just a small fraction of those documents, that can be a totally fine answer, and probably much cheaper than trying to figure out how to tweak the process so that you get literally 100 percent automation.
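As a rough illustration of that escalation pattern, here is a generic sketch (not any vendor's API; the helper functions and exception types are hypothetical stand-ins): try to process the document, and if it cannot be read or extracted, either bounce it back to the sender or put it in a manual queue instead of blocking the pipeline.

```python
class UnreadableDocumentError(Exception):
    """Raised when the file cannot be opened at all."""

class ExtractionError(Exception):
    """Raised when the file opens but data capture fails."""

# Hypothetical stand-ins for real extraction, email, queueing, and ERP integrations.
def extract_data(document_bytes): ...
def send_email(recipient, body): ...
def add_to_manual_queue(document_bytes, reason): ...
def post_to_erp(data): ...

def handle_incoming(document_bytes: bytes, sender_email: str) -> str:
    """Automate the happy path, escalate the rare failures."""
    try:
        data = extract_data(document_bytes)
    except UnreadableDocumentError:
        # Do what a person would do: ask the sender to resend a readable copy.
        send_email(sender_email, "We could not open your document, please resend it.")
        return "rejected"
    except ExtractionError:
        # Rare failures go to a human instead of blocking the whole pipeline.
        add_to_manual_queue(document_bytes, reason="extraction failed")
        return "escalated_to_human"
    post_to_erp(data)
    return "automated"
```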

Andreas Welsch:

I love that answer. Very practical. Now, we talked about this quite a bit before we set up today's episode and topic. And for anybody following the tech field over the last two years, there's no way around LLMs, there's no way around generative AI. There's an abundance of these models in the market, and it feels like every week there's some kind of enhancement or some new one coming out. Vendors are outdoing one another with the capabilities they ship, they look into, they deliver. How would you say AI leaders should use LLMs and bring them to the business? How should they make them tangible?

Petr Baudis:

So first, I think each AI leader should have a lot of personal experience just using LLMs day-to-day, to get a good feel for what their strong points and weak points are, because there are plenty of both, right? If you are an AI leader, an innovation leader, or even just upper management in an enterprise company, and you don't use LLMs at least sometimes, even just for personal purposes, to help you write personal documents or whatever, it doesn't need to be within the compliance constraints of an enterprise context, right? You should try using those LLMs and just see what comes out. You should be willing to experiment with them a little. And I think that's super important, to develop this intuition about the current state of AI, because in some cases it will really amaze you and blow your mind, and in other cases it will make your head ache because it will mislead you with some hallucinations. That's all part of the story, right? But in order to get value from those LLMs, you need to understand their weaknesses as well as their strengths. So that's one part of it.

Then, when it comes to actually using LLMs in real-world business processes within your company, the most important thing is to distinguish between a fancy demo and a tool that you can use reliably, one that can reliably process 98 percent or 99.9 percent of your documents, right? Of course, if you copy-paste a purchase order into ChatGPT and start talking about it, it will already pick up a lot of the information. But then, if it's a purchase order with three pages of line items and you start asking about specific columns, it might not do so well anymore, right? It's this reliability aspect that needs to be really thought out, and you want to find a platform that has a really good story around how it achieves this reliability.

Then another thing is adaptability. Unless you are just a small company using a totally standardized process around some existing off-the-shelf accounting software, you probably have a pretty specific process. And I think this is a super underrated aspect of LLMs: how do you make them adapt to your concrete process? Of course, you can prompt them in a custom way, but that's actually super laborious if you have a somewhat elaborate process or you are a bit bigger company. What you actually want is a system that just watches over your shoulder as you execute the process yourself and learns from it on the fly. An example of what we do at Rossum is our instant learning system, where you just put in a document and mark whatever you need on it. Of course, if it's just an invoice, it will already pick up a lot of things on its own. But then, with the second document you send in, even if you put it in an email, something totally specific, there is a very high chance it's already picking up that information just based on what you showed it. You don't want a system that you set up manually, where you do some AI engine management and run trainings by hand, or which you need to prompt explicitly. You want something that just adapts on the fly. Those solutions are on the market, and you really want that.

And ultimately, there are risks and there are weaknesses, so you want a system that handles those well. You need a system that has a good answer for what to do in case of hallucinations.
How do you defend against prompt injection, and so on? It's pretty easy to find out about those risks, you can just Google around, and again, pick a vendor that has a solid solution. Just ChatGPT with some connector will probably not be the answer.
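One simple guard against the hallucination risk mentioned above, shown here as a generic sketch rather than how any particular product does it, is a grounding check: only accept an extracted field value if it can literally be found in the OCR text of the document, and flag everything else for human review.

```python
import re

def normalize(value: str) -> str:
    """Strip whitespace and thousands separators so '1,250.00' matches '1250.00'."""
    return re.sub(r"[\s,]", "", value).lower()

def grounded_fields(extracted: dict, ocr_text: str) -> dict:
    """Return, per field, whether the extracted value literally appears in the source text."""
    haystack = normalize(ocr_text)
    return {name: normalize(value) in haystack for name, value in extracted.items()}

ocr_text = "Invoice INV-2024-001  Total amount due: 1,250.00 EUR"
extracted = {
    "invoice_number": "INV-2024-001",
    "total": "1,250.00",
    "po_number": "PO-9999",   # hallucinated: not present in the document text
}

for field, ok in grounded_fields(extracted, ocr_text).items():
    print(field, "grounded" if ok else "needs human review")
```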

Andreas Welsch:

Several great points that you made, and they resonate deeply with me. One, I remember again, a couple of years ago when I was working with large organizations on things like document processing, we asked for sample data and we got hundreds of invoices, or hundreds of PDFs, and it wasn't always one invoice per document. Sometimes it was two or three or four in one document that a customer's vendor had sent them. And splitting the document, figuring out what belongs to one document and what's the next one, what's on page one, what's on page three, was a very complex task for a software system, even for a machine learning or now LLM-based system. So yes, there are standard off-the-shelf things that you can use for free or for 20 per user per month. But I think you make an excellent point that, as with standard offerings in general, if somebody has already thought this through for you, there's a big benefit and a big value in that.

Which leads me to the next point: enterprise IT landscapes are heterogeneous. Vendors would love it to be wall-to-wall one company, especially the bigger their footprint is, but the reality is that there's an HR system, a customer experience system, a finance system, a supply chain system, and so on. And usually they're from many different vendors. Makes sense, right? With the cloud, you had a lot of the individual buying centers going out and buying best-of-breed solutions. So you're pretty much left with two choices, right? Either you use each individual vendor's built-in OCR or document processing features, and they will probably deliver different results, or you need to integrate your own solution with all of these individual systems. And that can very well mean a lot more cost and a lot more time than you initially thought. So I'm curious, what are you seeing when it comes to integration? What do some of your customers do, for example?

Petr Baudis:

Yeah, the story can be even wilder. A lot of the biggest enterprises grow essentially through acquisition, and when they acquire another company, that company's SAP instance can survive within the group for many years. Each business unit can have its own Oracle, and of course there are plenty of IT departments working on integrating it all and ideally normalizing everything. But those projects usually take so long that a shared service center within a large enterprise typically just has to deal with an insane number of systems. From our perspective at Rossum, we call this the multinational enterprise problem, or even the many-national enterprise problem, because if you look at an average Fortune 500 enterprise, they operate in 70-plus countries, and they have their supply chain in different countries than where they have their consumers, right? So you need a system that can deal with a wide variety of regions, the file formats that come with those regions, and different standards. If you throw invoicing into the mix, for example, it gets really wild, because each country does it totally differently, even within areas like the EU, right? And then, of course, there are plenty of different downstream ERP systems.

You want to standardize on a kind of shared piece of infrastructure that can handle all those situations and connect to all those different systems. The big advantage of standardizing like this is that you can manage your IT costs much better. And for me, that's part of a broader story that's really important to me, which is pushing back against the status quo of working around legacy IT. Because it feels to me that the world of ERP systems has shifted into a very different, much slower gear in terms of change, adaptation, and adopting best practices than the world of innovative enterprise software like AI-based IDP solutions. So at Rossum we actually gave this a lot of thought, and very recently we had a big launch around formula fields for admins, which is essentially a way to customize your business transformations without knowing programming, using LLMs with just text-based input. It's a pretty impressive capability, at least our customers say so when we show it to them and when they use it to onboard their cases. And from this perspective, you eventually also want to think about a gradual rollout. When you think about a shared piece of infrastructure for your many-national enterprise, it's always best to pilot it in a single country, then add another three, and then go global, so that you really build this trust. It's not worth the risk to jump directly into a global rollout.

Andreas Welsch:

I'm curious. We've talked quite a bit about extracting information from documents, doing that with machine learning and OCR, and now LLMs. Certainly, LLMs have been at the forefront of the news for quite some time, and a lot of the time I see this being about text. And obviously invoices, sales documents, bills of lading, pretty much any of these documents have a lot of text. What are some of the key benefits, would you say, that adding LLMs to a document processing workflow or solution has brought in your case? What's possible now that wasn't possible before?

Petr Baudis:

From my perspective, when we run POCs with our clients, what didn't change is that we usually were the top solution in terms of accuracy and automation potential before, and we still are. But LLMs enable us to keep increasing the accuracy without increasing the machine learning model complexity. One big thing is that large language models are a bit of a unifying factor that lets you push a lot of the complexity into the large language model itself. Our tLLM, the Transactional Large Language Model that we developed here at Rossum, substitutes for about 10 different machine learning models that we used before. So the platform gets much simpler, and that lets us focus on increasing the raw accuracy, the automation, the speed of learning of the system, and the adaptation process, without getting distracted by all the complexity of making sure we support OCR in this language and that language, etc. We can solve all of that in a single place. I think that's a huge advantage, and it also gives LLM-based systems a lot more versatility than has been traditional for IDP systems. But the other aspect is that I think LLMs, over the next five years, will be a huge game changer in how enterprise IT infrastructure gets integrated. They will empower non-programmers a lot more. The process domain experts will be able to do a lot more without relying on developers, and that will speed up change in the enterprise IT ecosystem tremendously.

Andreas Welsch:

I love that. And I'm fully with you that there's so much more change coming, in many good ways, more than we can even comprehend right now. And just seeing what is already possible today and where things are going is super exciting to me. Because then you know how you should plan your vision, how you should think about these things before they become reality, so you're ready when they do. But also, if I look at my Twitter feed, for example, AGI, Artificial General Intelligence, is near. It's almost here, especially if you believe the folks at OpenAI and several other AI labs. What are you seeing? What's your perspective on this? Are we going to have AGI next year?

Petr Baudis:

Probably not next year, even though it's always difficult to predict, right? The most difficult thing is defining what AGI is and agreeing with each other on that, because we just keep shifting goalposts. If we showed someone from a decade ago what current ChatGPT can do, they would say that's totally AGI. We still see all the weaknesses, so we think, okay, we need to go a little further, right? So the goalposts keep shifting a bit. I think it's going to take a little longer, but I am more in the camp of people who are very optimistic about the speed of AI development overall, and I think AGI is more of a this-decade thing than a next-decade thing. But I also think AGI shouldn't be overrated too much, because at the very beginning it's going to be very resource intensive, and it's not that once we get AGI, the world will totally change with a snap of the fingers. I think the world is a lot slower to change than people, especially the deep tech people in AI, think. And even if there were no new big inventions beyond the current GPT-4 and Claude Sonnet, etc., they would still have plenty of world-changing potential over the next 10 years. So I think it's going to take some time, but I'm pretty optimistic about AGI overall. And from my perspective, the angle then is thinking again about your enterprise IT ecosystem story: how do you pick solutions that can leverage this, where AGI can be plugged in, and solutions that can be used by AGI engines of the future? Because it's not that AGI would make all the internet infrastructure and all the enterprise infrastructure obsolete, right? Even if you have a whole company built around AGI agents, it will still need SAP to keep its records. And it might still want to use a solution like Rossum rather than doing everything itself, because that's going to be a lot more robust and a lot cheaper. Generalized intelligence will still cost a lot more than specialized intelligence.

Andreas Welsch:

Awesome. Look, I see we're getting close to the end of the show, and I was wondering if you can summarize the three key takeaways for our audience today. Again, thanks for sponsoring today's episode, and I'm super excited to see what you're already doing today with large language models and what the path ahead looks like as we increase autonomy and automation. So maybe just the three key takeaways before we wrap up.

Petr Baudis:

Okay, three key takeaways. We ended the last question on a little sci-fi note, but let's go back to reality, let's go back to today, right? When you are on the market looking at LLM tools that can help you, I would say: look for a tool that has a great reliability story, so that it cannot be thwarted by low, inconsistent performance, or by prompt injections and hallucinations, etc. Look for a tool that has a great adaptability story, so that it really adapts on the fly to your specific requirements without you spending very long onboarding times or a huge amount of IT resources on it. And look for a tool that can be integrated easily, one that has a good story around how you can spend the minimum possible IT resources to connect it to all the systems you need. Because you want something that works end to end. You don't actually want just a tool, you want a platform where the documents live, which connects to all your systems and solves your business process end to end.

Andreas Welsch:

Wonderful. Thank you so much, Petr. Thank you for joining us today and for sharing your experience with us. I really appreciate it.

Petr Baudis:

Thank you for having me.

Andreas Welsch:

And thanks for those of you in the audience for joining us.
