What’s the BUZZ? — AI in Business

Supercharge Your Retrieval Augmented Generation (RAG) App With A Knowledge Graph (Guest: Anthony Alcaraz)

March 17, 2024 Andreas Welsch Season 3 Episode 8

In this episode, Anthony Alcaraz (Chief Product Officer) and Andreas Welsch discuss supercharging your Retrieval Augmented Generation (RAG) application with a knowledge graph. Anthony shares his experience expanding Generative AI applications and large language models and provides valuable tips for listeners looking to get even more relevant results from their AI applications.

Key topics:
- Identify shortcomings of Retrieval Augmented Generation (RAG)
- Describe a knowledge graph and its purpose
- Learn how to build a knowledge graph
- Clarify when to use a knowledge graph

Listen to the full episode to hear how you can:
- Provide context between data sources with a knowledge graph
- Gather and prepare qualitative data to build your knowledge graph and AI model on
- Treat knowledge graphs as one concept among others in your AI strategy
- Start with the business problem and identify the best data sources and methods for solving it

Watch this episode on YouTube:
https://youtu.be/OSj08YBdQrg

***********
Disclaimer: Views are the participants’ own and do not represent those of any participant’s past, present, or future employers. Participation in this event is independent of any potential business relationship (past, present, or future) between the participants or between their employers.


More details:
https://www.intelligence-briefing.com
All episodes:
https://www.intelligence-briefing.com/podcast
Get a weekly thought-provoking post in your inbox:
https://www.intelligence-briefing.com/newsletter

Andreas Welsch:

Today we'll talk about how you can supercharge your RAG app with a knowledge graph. And who better to talk about it than someone who's doing just that, Anthony Alcaraz. Anthony, thank you so much for joining.

Anthony Alcaraz:

Thank you. Thank you for having me.

Andreas Welsch:

Wonderful. Hey, why don't you tell our audience a little bit about yourself, who you are and what you do?

Anthony Alcaraz:

Yes. Currently I am a Chief AI Officer. What we do is automate CV screening and job matching for human resources processes. We are a startup, and we are raising funds right now. And yes, I work with an engineering team to leverage AI for human resources. I have a somewhat surprising career because I started as a CFO, so Chief Financial Officer, and I fell in love with data. And since the day I coded my first Python program, I couldn't stop. Yes. That's why I'm here.

Andreas Welsch:

Excellent. Thank you so much for being on the show. I'm really looking forward to our conversation and what you'll share. What do you say, should we play a little game to kick things off?

Anthony Alcaraz:

Why not?

Andreas Welsch:

Okay, perfect. All right. This game is called In Your Own Words, and when I hit the buzzer, the wheels will start spinning. When they stop, I'd like for you to answer what's the first thing that comes to mind and why in your own words. Are you ready for What's the BUZZ?

Anthony Alcaraz:

Let's go.

Andreas Welsch:

Alright. Awesome. So here we go. If AI were a movie, what would it be? 60 seconds on the clock. Go!

Anthony Alcaraz:

I am a Tarantino guy. I don't know why, but I thought about Pulp Fiction. Okay. It's creative. You never know what will happen in the scene. So it's generative. It makes you think. It's funny, it's fun, and I hope it's useful. I don't know if Pulp Fiction was useful, but yes, I don't know why I thought about this one.

Andreas Welsch:

And I think it can get pretty messy sometimes as well. Awesome. You did good. I love the answer. Fantastic. Now, if we talk more about the subject of our episode today, I think some of the AI can get pretty messy too, especially if you look into some of the technology, and there's so much terminology being thrown around these days. One of the terms that we've seen emerge in the last couple of months is RAG, Retrieval Augmented Generation, to get better, more tailored results. But then I also see that it has its shortcomings, right? It's supposed to help you with hallucinations and generic output, make that more specific, reduce hallucinations. But where do you see RAG fall short, for example?

Anthony Alcaraz:

Yes, so what RAG is. We currently have models, okay, large language models, that can generate an answer from examples. RAG works like this: you bring the model the right information to generate an answer. That's to make it simple, and you can bring this information in many ways. Where it falls short right now is that in some systems, for example with vector similarity search in a vector database, the information that you give to the model can get messy. You can get maybe misleading information. It's very hard to bring the right information at the right moment to the model. So what you can get is a model that is confused about what you give it. So one of the stakes is to control the food that you will serve to the model. And this talk will be about that: giving the best food that there is within your business data.
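
As a rough illustration of the vector similarity search mentioned here, the sketch below ranks a few documents by cosine similarity to a query. The three-dimensional vectors and the document texts are invented for the example; a real system would obtain its vectors from an embedding model and store them in a vector database.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration.
# A real system would get these vectors from an embedding model.
documents = {
    "Chickens are raised on farms.": [0.9, 0.1, 0.0],
    "Invoices are due in 30 days.": [0.0, 0.2, 0.9],
    "Farm equipment needs maintenance.": [0.6, 0.6, 0.1],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vector, documents, k=2):
    """Return the k document texts most similar to the query vector."""
    ranked = sorted(
        documents,
        key=lambda text: cosine_similarity(query_vector, documents[text]),
        reverse=True,
    )
    return ranked[:k]

query = [1.0, 0.0, 0.0]  # Pretend embedding of "Where do chickens live?"
top = retrieve(query, documents)
print(top)
```

With `k=2`, the second result is only loosely related to the question, which is one way the context handed to the model can get messy.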

Andreas Welsch:

We talked a lot about data quality, fresh development data, and data that is proprietary. But then if you look at the knowledge graph, which is the main topic of our episode today, maybe for folks in the audience who are not that familiar with the topic yet, what does it help you with when you have all these other sources?

Anthony Alcaraz:

Yes, I have a definition, okay? A knowledge graph is an interlinked set of facts that describe real-world entities and their relationships in a machine-readable format. What it does is provide organizing principles over your data to add structure and context, to support your knowledge discovery and reasoning. To make it simple, it's based on a graph structure with nodes that represent entities, for example a chicken, which are connected by relationship edges. A chicken is part of a farm. This is a triplet in a knowledge graph. Actually, you can have billions of such relationships, and all these relationships are connected in some way. So you can apply mathematical and graph algorithms to this structure. And what we found recently is that applying this structure to data is good for the accuracy, transparency, and explainability of retrieval augmented generation systems.
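
The triplet idea can be sketched in a few lines. This is a minimal in-memory illustration, not a graph database; apart from the chicken and the farm, the entities and relation names are made up.

```python
from collections import defaultdict

# Each fact is a (subject, relation, object) triple.
# The entities and relation names below are made up for illustration.
triples = [
    ("chicken", "part_of", "farm"),
    ("farm", "located_in", "village"),
    ("farmer", "owns", "farm"),
]

# Index the triples as an adjacency list so each node can list its outgoing edges.
graph = defaultdict(list)
for subject, relation, obj in triples:
    graph[subject].append((relation, obj))

def neighbors(node):
    """Return the (relation, object) pairs that start at `node`."""
    return graph.get(node, [])

print(neighbors("chicken"))
```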

Andreas Welsch:

Now speaking of knowledge graphs, I'm reminded of things like social networks. And I think many years ago there was this statement that over two or three hops you're connected with the entire world, or something like that, right? That kind of reminds me of that: entities, relationships, and then the closeness or distance between them, right?

Anthony Alcaraz:

It's pretty interesting, because currently we are on LinkedIn. Under the hood, LinkedIn has a huge knowledge graph. Most social media are built on a knowledge graph database, on which AI models are trained.

Andreas Welsch:

Obviously social media is one of the examples, but what are maybe some other examples of using knowledge graphs in a business?

Anthony Alcaraz:

Actually, this kind of relationship between your data can be found in every field. Recently, just today, I had a call with a friend who works on biology data, okay, where they are building a knowledge graph. In my company, for example, we used a knowledge graph for human resources, with skills, job occupations, et cetera. For every kind of data that there is in the world, you have an ontology, so relationship rules. Because behind your business there are rules, there are relationships between your clients, between your suppliers, et cetera. Actually, any data can be translated into a knowledge graph. That's the beauty of it.

Andreas Welsch:

Building on the understanding of what a knowledge graph is and what it can help with, how do you build it? Do you need a large amount of data? Can you do it with a little bit of data? Do you need to be a deep expert in data science? Do you need to have a PhD, or can anybody do it? How do you even build a knowledge graph?

Anthony Alcaraz:

You have a specialty, okay, knowledge graph engineers. You have languages to build knowledge graphs. For Neo4j, for example, that would be Cypher queries. You also have SPARQL queries. So you have a coding language, okay? Before LLMs, it was technical. You needed to know the programming language to build it. You needed people with the skills. Now it's still the case. You need people who know what they are doing. But something interesting is that LLMs can help build a knowledge graph. And how do you build it? Actually, you start from your data. 80% of your data is unstructured. It comes from your emails, from your reports, from all the stuff that you have written and that you use in your everyday business. This can be transformed into a knowledge graph. It can also be your relational database, the numbers that you have in your database. Okay, this can also be translated into a knowledge graph. And right now you can empower your knowledge graph engineers with the capabilities of LLMs. It helps speed up the process and reduce cost, because you can use LLMs with many techniques to extract from text the necessary information to build your knowledge graph.
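
To make the relational-database case concrete, here is a small sketch that turns table rows into triples. The table, column names, and `has_` relation naming are hypothetical choices for the example. Extracting triples from unstructured text with an LLM is not shown, since that depends on the model and API being used.

```python
# Hypothetical rows, as they might come out of a relational "employees" table.
rows = [
    {"id": "e1", "name": "Alice", "department": "Finance", "skill": "Python"},
    {"id": "e2", "name": "Bob", "department": "Finance", "skill": "SQL"},
]

def rows_to_triples(rows, key="id"):
    """Turn each non-key column into a (subject, relation, object) triple."""
    triples = []
    for row in rows:
        subject = row[key]
        for column, value in row.items():
            if column != key and value is not None:
                triples.append((subject, f"has_{column}", value))
    return triples

triples = rows_to_triples(rows)
for triple in triples:
    print(triple)
```

Both employees end up pointing at the same `Finance` value, which is the kind of shared node that connects otherwise separate records.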

Andreas Welsch:

So how does it work together with RAG then? For example, if you upload documents or if you have data in your own data structure already, how does a concept like a knowledge graph enhance even getting better results with retrieval augmented generation?

Anthony Alcaraz:

For a start, I mentioned the stakes of controlling the information that you feed to your artificial intelligence. With a knowledge graph, you have huge control over what is feeding the LLM. For example, you can apply mathematical operations on your knowledge graph. I will give you an example. You can look for a node for a chicken, okay? This species of chicken is part of how many farms? This is a mathematical operation. You can do that to feed your LLM, and the LLM will have this kind of information to answer your customers who eat chicken. This is one example. The second example, and the most interesting one, is that you can train other artificial intelligence models with the data that is present in your knowledge graph. For example, you can train models that look at the most important nodes in your knowledge graph. In social media, for example, there is a phenomenon where a few accounts get most of the likes and most of the comments. It's called a power law. You can have graph algorithms that automatically detect this kind of node, the important influencers. Maybe it's for your marketing campaign: I want to know who I will pay for my marketing, and I will look at the most influential nodes. Okay, all those relationships, those mathematical operations, can help the LLM think, and it will improve accuracy considerably, by a huge percentage, and you can build a system where the LLM actually reasons on your data.
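
Both operations described here can be sketched on a toy graph: counting how many farms a breed is part of, and ranking nodes by how connected they are. The breed, farm, and market names are invented, and plain degree counting is used only as a simple stand-in for the influence-detection algorithms mentioned; it is an assumption, not necessarily the method used in practice.

```python
from collections import Counter

# Made-up triples for illustration: (subject, relation, object).
triples = [
    ("leghorn", "part_of", "farm_a"),
    ("leghorn", "part_of", "farm_b"),
    ("leghorn", "part_of", "farm_c"),
    ("silkie", "part_of", "farm_a"),
    ("brahma", "part_of", "farm_a"),
    ("farm_a", "supplies", "market_1"),
]

def count_objects(triples, subject, relation):
    """Count distinct objects linked to `subject` through `relation`."""
    return len({o for s, r, o in triples if s == subject and r == relation})

def degree_ranking(triples):
    """Rank nodes by degree: the number of edges that touch each node."""
    degree = Counter()
    for s, _, o in triples:
        degree[s] += 1
        degree[o] += 1
    return degree.most_common()

farm_count = count_objects(triples, "leghorn", "part_of")
top_node, top_degree = degree_ranking(triples)[0]

# Facts computed from the graph, ready to be placed in an LLM prompt.
print(f"The leghorn breed is part of {farm_count} farms.")
print(f"Most connected node: {top_node} (degree {top_degree})")
```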

Andreas Welsch:

That's exciting, because a lot of times we hear about large language models and some of their shortcomings and deficiencies, right? On one hand, there's this huge understanding of language, or at least being able to predict the next word in a sentence, which I think is a hotly debated topic. But then if you can feed it more relevant information and context about either your business or your business function that you're trying to solve, then you use the LLM for natural language understanding and generation, but what it actually generates comes from the knowledge graph.

Anthony Alcaraz:

Exactly. I think the most important feature of LLMs right now, the real revolution behind LLMs, is what we call in-context learning. It means that they can learn and leverage something that they didn't see in their training. And they can reason about this new context to give, for example, a personalized answer, et cetera. But to leverage this in-context learning, you need to feed it the proper data, the right data, and the format of the data will be important. I have another point I want to make. Over time, we will have more and more powerful models, okay? For example, yesterday Anthropic released a model family called Claude 3, with three different models. We will have more and more powerful ones. The key differentiator for companies will not be the model that they use. It will not be the AI system that they built. It will be their data, okay? And the quality of their data. And I think if you compare every kind of database that currently exists, the most qualitative data that you can have are knowledge graphs, by their nature, their interconnected nature, and their curated nature when they are built correctly. So I think it's a strategic investment, a strategic asset, to have this kind of data to leverage the future AI models that will appear.
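
One way to picture feeding "the right data in the right format" for in-context learning is to select the facts about an entity and lay them out as context ahead of the question. The company and product names, the prompt wording, and the helper functions below are all hypothetical; the resulting string would be passed to whichever LLM is in use.

```python
# Made-up facts for illustration: (subject, relation, object).
triples = [
    ("acme_corp", "supplies", "widget_x"),
    ("widget_x", "used_by", "customer_42"),
    ("customer_42", "located_in", "Berlin"),
    ("other_co", "supplies", "gadget_y"),
]

def facts_about(triples, entity):
    """Keep only the triples that mention `entity` as subject or object."""
    return [t for t in triples if entity in (t[0], t[2])]

def build_prompt(question, facts):
    """Format retrieved facts as context lines ahead of the question."""
    lines = [f"- {s} {r.replace('_', ' ')} {o}" for s, r, o in facts]
    context = "\n".join(lines)
    return (
        "Answer using only the facts below.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}"
    )

facts = facts_about(triples, "widget_x")
prompt = build_prompt("Who supplies widget_x?", facts)
print(prompt)  # This string would be sent to whichever LLM you use.
```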

Andreas Welsch:

I think that's an important point, right? Because every company has a good amount of data about their business, about their products, about their customers, how they are doing business with each other, what products they're buying, and so on. So putting that in the machine then.

Anthony Alcaraz:

Another point I wanted to make: most of the time, we think of LLMs as chatbots. You have a chatbot, you ask a question, and it answers. I am more interested in automation, in building document understanding and document processing under the hood, in an automated way. LLMs are just one part of the equation, but an important part. And they need to be built into a system that leverages proper data. You can create a lot of value from this, when you automate, when you don't have many errors, et cetera.

Andreas Welsch:

I think that is really powerful together with the data that you mentioned, right? If 80 percent of our data is stored in documents and so on, on one hand it's being able to extract it, but then as a next step, to put it into a relationship with some other data, maybe some sort of data you have in your database or other instructions, and then to reason over it. I'm taking a look at the chat here real quick, and I think there are two interesting questions. The first is about the originators of that approach of using knowledge graphs for better semantic understanding by LLMs. And the second one is: Claude 3 shows some advancements in ontological understanding of finance concepts. Do you know if Anthropic used knowledge graphs to augment the LLM to achieve this? I don't know how much has been published. I haven't gone through all the news in the last 24 hours yet myself. But maybe you have some insights to

Anthony Alcaraz:

So the originators of the approach, I don't know precisely. There is a paper that is a roadmap about unifying LLMs and knowledge graphs. It's from February 23, I believe. Something like that. Actually, there are many people studying the fusion of both approaches. The two approaches come from different fields, because knowledge graphs are one field and NLP is another field. The fusion of both is something recent, I think. And it comes from this idea of combining neural and symbolic, okay? Maybe this notion is too complicated, but it's the neuro-symbolic approach. Okay. And for Claude 3, I don't know. But this question about ontological understanding is interesting, because what you can do is train your LLM, an open source LLM, on a particular ontology. There are papers, for example OntoGPT. There are techniques to make sure that your LLM understands an ontology. Yes. It's something you can use. Yes.

Andreas Welsch:

Yeah. Questions for Anthony, please put them in the chat in a couple of minutes, but thank you for answering those on the spot. There's so many new papers on a daily basis, new announcements like the ones yesterday that already supersede the one from a couple of days ago. It's mind blowing, right? The pace.

Anthony Alcaraz:

I believe we are living through a change, some sort of technological jump. Yes, with the amount of papers and techniques, it's challenging to follow even in a narrow field. So yes, saddle up. I think we are living in an information revolution and an accessibility-of-knowledge revolution, many revolutions at the same time. I wouldn't be afraid, because I am optimistic. I think those technologies will empower people to learn better, in a more personalized way, when they are built right. So I wouldn't be so afraid of what is happening. I think it's for the better. And I am a huge believer in open source.

Andreas Welsch:

There's a lot of advancement and development happening there for sure. And open source definitely gives a good amount of transparency into what's actually happening. Yeah. Maybe one question then: we've talked so much about knowledge graphs, what they are, how they improve your large language model type applications, and so on. But from your perspective, when should people even use a knowledge graph, say in a business? Is that something that you now need to know? Is that the next thing after RAG that everybody now needs to put in? Does it only make sense at a certain amount of data, or a certain complexity of the problem? When should you even use a knowledge graph to begin with?

Anthony Alcaraz:

It's a tough question. If you have a customer or a department that is linked with a finance department, et cetera, you need relationships in your data. You need to connect the dots. A good friend of mine, actually, Jeremy Ravenel, is working on this subject in his company. What you need to think about is that it has been demonstrated that it improves the RAG system. Retrieval augmented generation is better with a knowledge graph, okay? But, like I said, investment in your data will help you train AI models that are performant. It will help you leverage generative AI with no hallucination, or better accuracy, et cetera. So it is a strategic investment to get a proper database that fits your needs. And I think a knowledge graph can serve any use case, whether it is a recommendation system, analytics, or business intelligence. It's a powerful paradigm in databases right now. And it's more easily understood than a relational database, for example, or a vector database, because it's intuitive. It's the way our minds work. We connect concepts and relationships between concepts. So yes, but I am biased.

Andreas Welsch:

I'm glad you are. It makes you the perfect guest for the show today. Now, when you look at large language models themselves and what they're able to do, how are they becoming a source for creating a knowledge graph? And do you need to have more of an agent type framework to even use a knowledge graph, or is that independent of that? What do you say?

Anthony Alcaraz:

Okay, there are many approaches, because your question is how to leverage the information that is in the knowledge graph with the LLM, okay? Yes, you can have an agent system. You have many ways to interact with a knowledge graph. I won't be technical, but there are currently three main ways. And you can imagine each of those ways as a tool, some kind of key for the LLM to interact with the graph. Yes, an agent framework, so an autonomous LLM that thinks about what it's doing and which key to use, can be leveraged. But you can also have a simple system where, for a customer chatbot for example, you know what kind of questions will be asked, so you can constrain the LLM to one part of the graph. Okay, it's very flexible in terms of what types of information you want to feed the LLM. There are many approaches. It will depend on your problem. You need to think first of your problem and then find the solution. In every data science project, this is the approach that works. If you start from the tool and go back to your problem, you will have problems.
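
The simpler option, constraining the LLM to one part of the graph, could look roughly like the sketch below. The triples and the allow-list policy are hypothetical; the point is only that the chatbot's context is built from a filtered view of the graph instead of the whole thing.

```python
# Made-up triples for illustration: (subject, relation, object).
triples = [
    ("order_17", "placed_by", "customer_42"),
    ("order_17", "contains", "widget_x"),
    ("customer_42", "credit_score", "710"),
    ("widget_x", "margin", "38%"),
]

# Hypothetical policy: a customer chatbot may only see order-related relations.
ALLOWED_RELATIONS = {"placed_by", "contains"}

def restrict(triples, allowed_relations):
    """Return the subgraph whose edges use only the allowed relations."""
    return [t for t in triples if t[1] in allowed_relations]

chatbot_view = restrict(triples, ALLOWED_RELATIONS)
print(chatbot_view)
```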

Andreas Welsch:

I love that. Thank you for re-emphasizing that, right? It all starts with a clear point of view. Yes. Awesome. Hey, we're getting close to the end of the show, and I was wondering if you can summarize three takeaways for our audience before we wrap up.

Anthony Alcaraz:

Yes. GenAI can create value for your business, but you need qualitative data. And I believe a knowledge graph is a huge asset to have. The second key takeaway is that GenAI is not everything, okay? You need other AI tools and AI models, and you also need qualitative data to train those models. So it's circular, it goes with what I just told you. The third point would be to plan for this. You need some kind of strategy, and you need to start from your problems. You need to correctly understand your business needs. And you need to build up from these business needs to see what the solution would be in a data science way. And whatever technology appears in the future, okay, you will need data, and qualitative data. So it should be your priority: a data-centric approach. Not a tool-centric approach, not a model-centric approach, but a data-centric approach.

Andreas Welsch:

Sounds like that need for good and clean data is not going away anytime soon.

Anthony Alcaraz:

No. No, no, because one funny fact is that a model is trained at a point in time, okay? If your business is one that evolves a lot, no model will understand it without access to your data. You will need clean data.

Andreas Welsch:

Awesome. Anthony, thank you so much for joining us today and for sharing your experience with us. And for those of you in the audience, thanks.

Anthony Alcaraz:

Thank you. Bye bye.