What’s the BUZZ? — AI in Business

Build Generative AI Applications With Open Source Tech (Guest: Tobias Zwingmann)

November 12, 2023 | Andreas Welsch | Season 2, Episode 20

In this episode, Tobias Zwingmann (AI Advisor & Author) and Andreas Welsch discuss building Generative AI applications with open source technology. Tobias shares his experience weighing proprietary against open source large language models and provides valuable advice for listeners deciding whether to use commercially available LLMs or open models such as Llama 2.

Key topics:
- What types of use cases does Generative AI enable?
- When should you use off the shelf vs open source LLMs?
- What components do you need to set up, train, and use Open Source LLMs?
- What are the downsides and risks of using Open Source components?

Listen to the full episode to hear how you can:
- Understand the four categories of Generative AI use cases
- Evaluate when off the shelf is more advantageous vs open source
- Consider total cost to operate an LLM
- Determine the risk of skill shortages for cutting edge technologies

Watch this episode on YouTube:
https://youtu.be/g9TNndkfdU0


***********
Disclaimer: Views are the participants’ own and do not represent those of any participant’s past, present, or future employers. Participation in this event is independent of any potential business relationship (past, present, or future) between the participants or between their employers.


More details:
https://www.intelligence-briefing.com
All episodes:
https://www.intelligence-briefing.com/podcast
Get a weekly thought-provoking post in your inbox:
https://www.intelligence-briefing.com/newsletter

Andreas Welsch:

Today we'll talk about how you can build Generative AI applications with open source tech. And who better to talk about it than someone who's actively working on that. Tobias Zwingmann. Hey Tobias, thanks for joining.

Tobias Zwingmann:

Hey Andreas, thanks for having me. It's great to be here.

Andreas Welsch:

Hey, why don't you tell our audience a little bit about yourself, who you are and what you do?

Tobias Zwingmann:

Yeah, sure. I'm Tobias. I'm based in Hanover in Germany, and I run my own business, Rapyd.Ai, essentially an AI consultancy helping businesses grow with AI. I help companies from the beginning of figuring out what AI is and what it can do for them, all the way to the implementation side of things: building projects and implementing them hands-on with different technologies. I've been doing that for quite a while now. I'm also active on LinkedIn and writing. I wrote my first book last year with O'Reilly, AI-Powered Business Intelligence, that's the title. It was written before ChatGPT came out, though, but I'm already working on the next one, that much I can tell. And I'm also writing a newsletter called AI for Business Growth, which you can subscribe to at aiforbi.rocks.

Andreas Welsch:

That's awesome. I've been a subscriber for a while, and I really enjoy reading all the nuggets and details you share about how to go from an idea to an implementation. That was one of the main reasons I reached out and invited you to the show. Thanks for being on. Awesome. Hey, for those of you in the audience, if you're just joining the stream, drop a comment in the chat where you're joining us from. I'm curious how global our audience is again today. Tobias, should we play a little game to kick things off?

Tobias Zwingmann:

Absolutely.

Andreas Welsch:

Let's see. This game is called In Your Own Words, and when I hit the buzzer, the wheels will start spinning. And when they stop, you'll see a sentence. I'd like you to answer with the first thing that comes to mind, and why. In your own words. To make it a little more interesting, you'll only have 60 seconds for your answer. Now, are you ready for What's the BUZZ?

Tobias Zwingmann:

I hope so.

Andreas Welsch:

Let's go! Perfect. Here we go. If AI were a song, what would it be? 60 seconds on the clock.

Tobias Zwingmann:

Oh, that's a good one. That's a good one. The first one that comes to mind is Zombie Nation. I don't know why, but that's the soundtrack I associate with AI somehow. A little bit weird, but also very straightforward in a way.

Andreas Welsch:

I think in a way AI has been deemed dead many times, right? With all the AI winters. And it's always coming back.

Tobias Zwingmann:

Always coming back.

Andreas Welsch:

Thank you. That was really on the fly, and a great answer. Perfect. I'm taking a look at the chat and I see folks have joined from all over the place, so that's really exciting: from Amsterdam in the Netherlands, to India, to Toronto in Canada, North Carolina in the US, Sweden. Really awesome. Thank you for joining. Why don't we talk a bit about the actual topic of our episode today? Obviously there's a lot of buzz and hype around Generative AI, and it's been almost a year since we've seen ChatGPT come onto the stage. But I'm curious: what new use cases do you see Generative AI enabling, what categories do you see, and maybe some examples you can share from them?

Tobias Zwingmann:

Personally, I work mostly with B2B companies, especially in departments like sales, marketing, and customer support. And in these areas, I see four super categories of use cases that Generative AI enables, either because they're completely new or because it's just cheaper to do them now than before. The first category, as I call it, is summarization and extension. Given a large text, say a long sales document or a large body of information from your company, it's very easy to summarize it down and extract certain key points or highlights. On the other hand, it's also very easy to expand things: starting with a very broad, high-level marketing idea, expanding that into an outline, and then expanding that outline into actual content, going from a very tiny piece of information to a very long one. And again, this could be anything. It could be a sales transcript. It could be a marketing strategy. It could be anything we normally associate with knowledge work. So that's the first big category that I see. The second one is search and knowledge retrieval. This is especially important in customer support chatbots, where I do some projects: you have all that knowledge available on how to deal with certain support issues or requests that customers bring in, and you just have to match that piece of information with the question the customer is asking in the chatbot. That's the second big use case. The third one is digital asset creation. Especially with multimodality now in Generative AI, this is a very important topic: generating different types of digital content, like video, images, audio, and obviously also text. That's very important, especially in sales and marketing. And then, finally, coding. We don't actually see coding so much in sales in the sense of wanting to code something. But where I do see it a lot is in analytics. When we analyze data coming from different customer touch points or from different tools, CRM tools or analytics tools, it used to be really hard to analyze these things. Now, with the help of LLMs and code, it's very easy to translate natural language queries into SQL queries, or into Python code that finds the top-selling products in the database, or whatever. It's very easy to build those bridges. These are the four super categories that most use cases I work with fall into.
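For illustration of that last category, here is a minimal sketch of translating a natural language question into SQL with an LLM and running it against a local database. The model name, prompt, schema, and sales.db file are assumptions for illustration, not details from the episode; it also assumes the openai Python package (v1+) and an OPENAI_API_KEY environment variable.

```python
# Hypothetical sketch: natural language -> SQL -> result.
import sqlite3

from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY set

SCHEMA = """
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    product TEXT,
    quantity INTEGER,
    price REAL,
    ordered_at TEXT
);
"""

def question_to_sql(question: str) -> str:
    """Ask the model to translate a plain-language question into a single SQLite query."""
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You translate questions into SQLite SQL. "
                        f"Use only this schema:\n{SCHEMA}\n"
                        "Return the SQL query only, no explanations."},
            {"role": "user", "content": question},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip().strip("`")

def run_query(sql: str, db_path: str = "sales.db"):
    """Execute the generated SQL against a local SQLite file."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql).fetchall()

if __name__ == "__main__":
    sql = question_to_sql("What are our top 5 selling products by quantity?")
    print(sql)
    print(run_query(sql))
```

In practice you would validate the generated SQL (for example, restrict it to SELECT statements) before executing it against real data.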

Andreas Welsch:

That's awesome. Thanks for sharing. I think it helps to think along those lines and see what's possible with this new technology. And to your point, not everything is brand new; in many ways, it also helps us do things better than we were able to do them before.

Tobias Zwingmann:

To give you one concrete example of that: we're currently working on a chatbot project that has actually existed in that company for the past couple of years. The new thing with Generative AI is that it's much cheaper to run that chatbot. Before, the chatbot consisted of some kind of AI, but also a very handcrafted rule system in the back end deciding when to pull which kind of information. And it cost immense sums per year to maintain that chatbot at a good quality level. Now, with Generative AI, it's possible to deliver the same experience at just a fraction of the cost. And we're still talking about GPT-4 access here, the expensive version of the model. But it's still much cheaper than the old approach with all the manual work and maintenance involved. I think that's a great use case. Even though it's not completely new, it works just as well out of the box as before, but at a much lower cost.

Andreas Welsch:

When I try to get in touch with my cable provider, or with some retailers when I order something, they're not yet on that latest technology. It gets really frustrating because you sometimes end up in this infinite loop of: is this what you mean? Or: here are the three things I can help you with. And you say, it's actually number four or five that I need, and that's not on your list. Agent, right? Connect me to a real person.

Tobias Zwingmann:

Absolutely.

Andreas Welsch:

There's huge value if that becomes more natural, or if it understands better what you're looking for. Now look, when we talk about these kinds of use cases, or the four big areas where Generative AI can help: on one hand, we see the likes of commercial providers making large language models available. On the other hand, there's a big debate about going open source for transparency and quality; they're just as good, maybe not quite as expensive. When would you say somebody should use off-the-shelf Generative AI compared to open source? What are some of the decisions people should think about, and some of the criteria?

Tobias Zwingmann:

For off-the-shelf solutions, there are two major advantages I see right now. The first is that it's super fast to access these services. Everything is just one API call away, or maybe there's even already a product or software catering to exactly the kind of use case you have in mind, because again, those use cases often come back down to similar topics or categories. So it's fast. The second one is that it's cheap, at least at the beginning, because it's very often usage-based. And this is very good for building fast prototypes, for validating ideas, for building the first proof of concept. With off-the-shelf Generative AI services, or AI services in general, you can probably get to 70, maybe sometimes 80 percent of an application really fast, within a couple of days. Personally, when I do consulting projects, I follow a 20-20 rule: the first prototype shouldn't take longer than 20 days and it shouldn't cost more than $20,000. That's what you can very easily do with off-the-shelf services. But at some point there's a tipping point where you need to figure out, okay, is that really the right way to go in the long run? And if you've validated a concept or an idea, then sometimes it might be a good choice to switch over to open source technology and build a more customized stack. We can talk about open source in a bit. Just one more thought on off-the-shelf services: what I currently see is that enterprises especially tend to use off-the-shelf services just because everything is so new. In enterprises, you need that kind of security and privacy, data security, but also safety in terms of maintaining the solution. And it's sometimes very hard to argue within an internal IT department for using this or that GitHub repository that was released just two months ago to run a full-scale productive application. That's why a lot of companies fall back to these off-the-shelf or enterprise services from professional vendors.
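To make the "one API call away" point concrete, here is a minimal prototype sketch that summarizes a support ticket with a single hosted-LLM call. The model name, prompt, ticket text, and openai (v1+) client usage are assumptions for illustration, not details from the conversation.

```python
# Hypothetical prototype sketch: one API call to an off-the-shelf LLM.
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY set

client = OpenAI()

ticket = "Customer reports that invoices from October are missing in the portal ..."

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Summarize the support ticket in two sentences "
                                      "and suggest the next action."},
        {"role": "user", "content": ticket},
    ],
)
print(response.choices[0].message.content)
```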

Andreas Welsch:

I think that's a great point, right? So basically, take into account the full calculation; don't just look at the cost of inference or fine-tuning, if you will, but also the cost to operate it. And do you have the skills, or can you get access to the skills, to build it and to maintain it? I think that's an important consideration you're pointing out. Yeah. With that thought of, sure, there's open source, there's off the shelf: what would you say in terms of quality? I think there are a number of benchmarks and a number of models being benchmarked against them. How comparable are these two types of models, proprietary and open source?

Tobias Zwingmann:

So far, GPT-4 is still the leading large language model out there. There's no other model, I wouldn't even say any other commercial model, and certainly no open source model, that can compete with GPT-4 in all aspects. There are certain LLMs that provide similar performance on specific tasks, if you want to do sentiment analysis or classify text into very niche categories. But on that broad and universal scale, GPT-4 still has the highest, what we call, cognitive abilities, and also the highest alignment, the ability to follow the instructions given in the prompt. In that sense, I think they are still ahead. But at the same time, we do see that open source is catching up. They are also moving at an incredible pace. We'll see how far it goes. But at least for now, I still see the commercial models, and especially GPT-4, in the lead. Everyone's trying to beat it, and I haven't seen a model that can really outperform GPT-4 in all aspects.

Andreas Welsch:

Thanks for sharing that perspective. Now, with off-the-shelf software and capabilities, you can get pretty far. Maybe to some extent you get everything out of the box, or, depending on what you're looking to do, take something like GPT-4: sure, you can prompt it and get good or even excellent results, depending on what you're looking for. But then on the other side, there's also this point around hallucination, or giving it more context so the information is more relevant to what you're trying to create and generate. What components are needed to set up, train, and overall use open source LLMs, on the other hand?

Tobias Zwingmann:

When we consume an LLM as an API or as an endpoint, there's much more going on than just the raw inference. There are lots of knobs and buttons around it that help make the model safe, align it, make sure we have throttling limits, and allow the model to scale up and down depending on usage. So there are lots of infrastructural components to it. But at the same time, that takes away some flexibility from us if we want to build something with it. And especially with GPT-4, that directly translates into cost. GPT-4 is fine for those augmented use cases where you don't have that many requests going to the model. But as soon as you enter more automated and highly integrated use cases, your OpenAI bill can escalate quickly. And that is normally the point where I say, hey, let's take a look at whether there's any open source solution that helps you tackle the exact problem you want to solve. I'll give you an example. There was this use case recently where we had survey data coming in from an event organizer who sends a feedback survey after each event, and we wanted to classify open-text feedback. GPT-4 can do that super easily. It was pretty good. We also didn't have any hallucinations, because within the prompt we provided the categories, and the LLM only had to match the text input with those categories. So that worked pretty well. But if we repeat that over and over again for all events, and pass it every single survey response, the sum quite adds up. Sometimes it might be better to use an open source model like Llama 2 and try to fine-tune it on that particular classification task. And we already have training data available, because that's what we just did with GPT-4. We can now use that data as training input to try to get Llama 2 to a similar performance level as GPT-4, and then hopefully have some advantages in the overall cost calculation. Because, make no mistake, hosting a model like Llama 2 doesn't come for free either. You still have to host it somewhere. You still need GPU power and everything associated with that. So it's not like GPT-4 costs us X and Llama 2 is free of charge; you still have to consider the whole infrastructure. But at least you can make a deliberate decision. If someone asks, okay, why are we paying so much for that model, isn't there any other way, you can say: we've looked into that, here's the result, and that's why we're doing it this way. I think this is important, and this is how I see the transition from off-the-shelf to more customized models: by fine-tuning and adapting models that cater to a more specific task. Because what you do with those big LLMs is take their general-purpose capability and throw it at a very specific problem, and in many cases they can do that pretty well. But it's literally throwing a lot of computational power at what is often a very small problem, and at some point it might be better to take a smaller, more efficient model and apply it to these more specific and niche use cases.
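As a rough sketch of that fine-tuning step, the snippet below trains a Llama 2 sequence-classification head on GPT-4-labeled survey feedback with Hugging Face Transformers. The file name feedback.csv, the label count, and the hyperparameters are assumptions for illustration; the episode doesn't describe the actual training setup, and in practice you would likely add quantization or LoRA to fit the model on a single GPU.

```python
# Hypothetical sketch: fine-tune Llama 2 as a text classifier on GPT-4-labeled data.
# Assumes access to the gated meta-llama checkpoint and a CSV with "text" and "label" columns.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "meta-llama/Llama-2-7b-hf"
NUM_LABELS = 6  # e.g. the feedback categories previously listed in the GPT-4 prompt

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token  # Llama 2 has no pad token by default

model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_LABELS)
model.config.pad_token_id = tokenizer.pad_token_id

# Survey responses labeled earlier by GPT-4 serve as the training data.
dataset = load_dataset("csv", data_files="feedback.csv")["train"].train_test_split(0.1)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama2-feedback-classifier",
        per_device_train_batch_size=4,
        num_train_epochs=3,
        learning_rate=2e-5,
    ),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```

Comparing the fine-tuned model's accuracy and hosting cost against the running GPT-4 bill is then exactly the kind of deliberate cost decision described above.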

Andreas Welsch:

Great point about the cost. Again, it doesn't come for free. It might be cheaper, but there may also be other components to take into account. I'm just taking a look at the chat, and I see Michael basically confirms what you're saying: open source Generative AI software provides a viable alternative based on cost and political acceptance. I see a question from Willy here. Willy's asking: how do you assess the relevance of European models such as Aleph Alpha's Luminous for companies affected by the GDPR, especially considering that Azure services are now running on European servers? In general, are we in danger of becoming dependent on the one big model, GPT-4?

Tobias Zwingmann:

Oh yeah, that's a good one. I would love to see a European model being developed and hosted here. And I think Aleph Alpha just had another 500 million Series B round recently, so things are going in the right direction. But on the other hand, if you just look at the scale at which companies like OpenAI are operating, and especially the speed at which they're innovating, I really doubt we can compete with that. I'll give you an example. Most of those benchmarks, including those for a lot of European models, still compare themselves with GPT-3.5 and say, hey, we can beat GPT-3.5. But the thing is, GPT-3.5 is literally available free of charge within ChatGPT and at very low cost in the API, so that's not really the benchmark. The benchmark is GPT-4. And when you compare against GPT-4, I really wonder if it's such a clever decision to try to build the next GPT-4 from Europe. Maybe we should invest the fragmented resources we have in Europe, because there isn't a single player, there are multiple players, into more specific models that cater to specific industries or specific use cases. Honestly, I don't believe that the next big foundational model that outperforms everything else will come from Europe. There might still be great use cases, but as Willy says, everything that revolves around GDPR, security, and privacy, these things will get solved, just as happened with cloud computing a decade ago. Back then everyone thought, oh, we can't just give our data to one of the big three cloud providers, right? And here we are. I think a very similar development is happening here.

Andreas Welsch:

Thanks for taking that one. I see there's another one from Josue here in the chat as well, and maybe I can rephrase it a bit. With GPT-4, there's already a relatively cost-effective, high-performing model available that comes out at the top of all these benchmarks. So what are the considerations for using open source? Is it really about the confidential data that you don't want to send to those providers, even if they don't use it for training? Or what are some other considerations you see for going open source?

Tobias Zwingmann:

Yeah, maybe to answer this question we have to zoom out a little bit, because so far we've just talked about open source LLMs. But when we look at the open source ecosystem, there's more than just the open source LLMs. I see three different layers. We have the LLM layer, where we have commercial models like GPT-4 and others, and open source models like Llama 2, Flan, and so on. Then there's the middle or tooling layer, where we have tools like LangChain and LlamaIndex, and things like Hugging Face for fine-tuning models. I think this is where most things are happening right now. And then we have a third layer, which abstracts even more, where you have not only the tool stack but also the whole front-end components stacked on top of that. These are tools like Vercel AI, for example, and Microsoft is also putting out repositories that help you create a ChatGPT-like interface or set up a RAG architecture and things like that. I think this whole ecosystem is what makes these kinds of use cases thrive at the moment, especially if you want more customizability. Yesterday was OpenAI's DevDay, and they introduced the capability of building these custom GPTs, where you essentially get the RAG architecture with knowledge retrieval out of the box. One could say, why do we need this whole RAG ecosystem if you can just get it out of the box from OpenAI? The answer is that there are lots of use cases which need much more customization. For example, one customer I currently work with has the challenge of handling permission levels: which kind of user in an enterprise chat can access which kind of information from the knowledge graph behind it. Ideally, they want to inherit that from, for example, Azure Active Directory or from their SharePoint, so that when a user asks the chatbot a question, they don't get access to all the knowledge in the company, but only to those parts their user account has access to. If you want to build something like this, you're very quickly in those more customized developments, and in these cases tools like LangChain are really useful for building these kinds of functionalities and features quickly. This is where I see the most development potential in the whole LLM ecosystem, at least in the short term, because this is really democratizing LLM development.
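To illustrate the permission-aware retrieval idea without tying it to any specific framework, here is a minimal sketch of a RAG step that filters document chunks by the groups a user belongs to before building the prompt. The embedding layout, group names, and helper functions are made up for illustration; in a real project this logic would typically live in a tool like LangChain or LlamaIndex, and the group membership would come from Azure Active Directory or SharePoint.

```python
# Hypothetical sketch: permission-filtered retrieval for a RAG chatbot.
# Each chunk carries an "allowed_groups" set; a user only retrieves chunks whose
# groups overlap with their own, so answers never include restricted knowledge.
from dataclasses import dataclass

import numpy as np

@dataclass
class Chunk:
    text: str
    allowed_groups: set[str]
    embedding: np.ndarray  # precomputed with whatever embedding model you use

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(question_emb: np.ndarray, chunks: list[Chunk],
             user_groups: set[str], k: int = 3) -> list[Chunk]:
    """Return the top-k most similar chunks the user is allowed to see."""
    permitted = [c for c in chunks if c.allowed_groups & user_groups]
    return sorted(permitted, key=lambda c: cosine(question_emb, c.embedding),
                  reverse=True)[:k]

def build_prompt(question: str, context_chunks: list[Chunk]) -> str:
    """Assemble the grounded prompt that would be sent to the LLM."""
    context = "\n\n".join(c.text for c in context_chunks)
    return (f"Answer using only the context below.\n\nContext:\n{context}\n\n"
            f"Question: {question}")
```

The design choice here is to enforce permissions at retrieval time rather than trusting the LLM to withhold information, which is the behavior described for the customer project above.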

Andreas Welsch:

I think we've covered a lot of ground in this episode so far. I was wondering if you can summarize the top three takeaways for the audience today from our discussion.

Tobias Zwingmann:

At the beginning, I wouldn't overthink the whole question of whether to use off the shelf or open source, whether to do it yourself. Go with whatever is more natural for you. If you're the tech person, if you're a developer and you totally feel you can build it yourself, implementing the next LangChain app, then go with that and build something. But I think in most cases, especially enterprise cases, you're probably better off using an off-the-shelf service at the beginning, in order to get started and figure out not only (a) whether the tech works, but (b) whether what you're building is actually valuable. I would always focus on this. The second takeaway from today is that when you do the whole cost calculation, it's not only about inference cost, as we said, but also about the other hidden costs associated with it. It's not only running the model and the infrastructure cost. Let's say you're a medium-sized business. Maybe you have a small engineering team. Maybe you have one person who knows how to build things in LangChain and who is helping you build the internal chatbots you're currently working on. What happens if that person leaves tomorrow or the day after, right? Who can fill that gap? If you have the resources to fill that gap really quickly, then that's fine. If not, just be aware that this is all really cutting-edge technology, and there isn't a huge labor market of people, like there is for JavaScript, waiting to build something for you with it. It's really at the edge, and it often takes a lot of time to find people who have done this before and built productive applications, not just a small Jupyter notebook prototype.

Andreas Welsch:

We're getting close to the end of the show today. Thank you so much for joining us and for sharing your expertise with us. It was a very insightful discussion and conversation, ranging from off-the-shelf to open source technologies, when to use which, and some of the upsides and downsides of each. I really appreciate you spending your time with us.

Tobias Zwingmann:

Thank you so much. It was my pleasure. Thanks for hosting me.