What’s the BUZZ? — AI in Business

Red-Teaming And Safeguards For LLM Apps (Guest: Steve Wilson)

Andreas Welsch Season 3 Episode 24

In this episode, Steve Wilson (Co-Lead, OWASP Top 10 for LLM Apps, and Author) and Andreas Welsch discuss red-teaming and safeguards for LLM applications. Steve shares his insights on how Generative AI vulnerabilities have evolved from embarrassing to financially risky and provides valuable advice for listeners looking to improve the security of their Generative AI applications.

Key topics:
- One year after OWASP Top 10 for LLM apps, how have LLM security and vulnerabilities evolved?
- How do you build Generative AI safeguards into your app? What’s the impact on cost for checking and regenerating output?
- How can you red-team your LLM apps?
- Which roles or skillsets are best suited to improve security?

Listen to this episode and learn how to:
- Become aware of new ways bad actors can exploit your LLM-based apps and assistants (prompt injection, supply chain attack, data exfiltration, etc.)
- Understand the characteristics of your LLM supply chain and new vulnerabilities
- Consider the implications of using agents on your security

Watch this episode on YouTube:
https://youtu.be/Rpp7u93mDfM

Questions or suggestions? Send me a Text Message.

Support the show

***********
Disclaimer: Views are the participants’ own and do not represent those of any participant’s past, present, or future employers. Participation in this event is independent of any potential business relationship (past, present, or future) between the participants or between their employers.


Level up your AI Leadership game with the AI Leadership Handbook:
https://www.aileadershiphandbook.com

More details:
https://www.intelligence-briefing.com
All episodes:
https://www.intelligence-briefing.com/podcast
Get a weekly thought-provoking post in your inbox:
https://www.intelligence-briefing.com/newsletter

Andreas Welsch:

Today, we'll talk about red teaming and safeguarding for large language models, and who better to talk about it than somebody who's not only actively working on that, but who's actually written the book on it: Steve Wilson. Hey, Steve. Thank you so much for joining.

Steve Wilson:

Hey, Andreas. Thanks for having me today. Always happy to be here.

Andreas Welsch:

That's awesome. Hey, this is the third time you're actually on the show; you're the first guest to be on three times. We talked about large language models last year, and about two weeks ago you joined for the launch of my book, the AI Leadership Handbook. You shared a lot of great advice in a short segment on security, but we said, hey, we want to spend a little more time on this today, because I also feel it needs a lot more attention now that large language models are becoming more or less ubiquitous and embedded in so many more applications. But before I keep stealing your thunder, for those of you in the audience who don't know Steve: Steve, maybe you can introduce yourself and share a bit about who you are and what you do.

Steve Wilson:

Great. I'm the Chief Product Officer at Exabeam, which is a cyber security company that specializes in using AI to find cyber security threats. Last year, I got involved in creating an open source project called the OWASP Top 10 for Large Language Model Applications. OWASP is a large foundation with a few hundred thousand members that's dedicated to building secure code. We did a bunch of research into what the security vulnerabilities are in large language models, because this wasn't something that had gotten a lot of research, and that got a lot of attention. Last year, after that got rolling, I got an email out of the blue from an editor at O'Reilly, and they asked if I wanted to write a book. I've spent the intervening year writing this book, so I was super pleased to be on your launch stream, and today is the launch of my book, so I'm hijacking your stream to return the favor. This is my O'Reilly book. It's called The Developer's Playbook for Large Language Model Security: Building Secure AI Applications. And it's been fun to do, because it lets me get much deeper into a lot of these issues than you can in a top 10 quick-hit list.

Andreas Welsch:

That's awesome. Again, I'm so glad to have you on the show, and folks, also check out Steve's book here on large language model security. First of all, congratulations, right? That's a big day for anybody that's spent a year writing something, pouring your heart and soul into it, going through edits and revisions, and getting that first excitement. Now it's the big day, so I'm excited that we can all celebrate that together with you. Awesome. Perfect. Hey folks, if you're just joining the stream, drop a comment in the chat where you're joining us from. I'm always curious how global our audience is. And also don't forget to subscribe to my newsletter, the AI Memo, and the Intelligence Briefing, so you can stay up to date on how you can run AI projects successfully. And again, this one, the AI Leadership Handbook, is also available now on Amazon. Now, why don't we play a little game to kick things off, right? We've been doing this for a while on the show. This game is called In Your Own Words. When I hit the buzzer, the wheels will start spinning, and when they stop, you'll see a sentence. I'd like for you to answer with the first thing that comes to mind, and why, in your own words. Okay. Let's see. Are you ready?

Steve Wilson:

Let's do it.

Andreas Welsch:

If AI were a fruit, what would it be?

Steve Wilson:

Oh man, the thing that just jumps to mind is strawberry. Did you load that question?

Andreas Welsch:

Oh, that's a good one.

Steve Wilson:

But for folks who haven't been following, the big news the last few months has been OpenAI secretly developing some new supermodel, code-named Strawberry, and they did release an early version of it, which is actually pretty fascinating. I've been playing with that a lot the last few weeks.

Andreas Welsch:

That's an awesome answer. I didn't expect that one, though. Good, perfect. So folks, we obviously skipped your answer this time, but if you have one, please feel free to put it in the chat as well. But now, why don't we jump to the main part of our conversation? We already have a couple of questions prepped to guide it. You mentioned we're about a year after the release of the OWASP Top 10 for large language model apps, and that's how we first got in contact with each other. But I'm curious, where are we? What are you seeing? The threats that you discovered and described last year, are they still somewhat far away, in the distant future? Are they already here? Are all of them here? What are you seeing? How real are these things?

Steve Wilson:

It's interesting. When we go back more than a year now, to when we published the first version of the list, I wouldn't say these things were hypothetical. Many of them were real exploits that we published and discussed, but I'd say the consequences for the people who fell victim to them were mostly embarrassment, sometimes severe embarrassment, but mostly embarrassment. In fact, Mårten Mickos, the CEO over at HackerOne, which is a really well-known security company, put out a list that was a variant of the top 10: the top 10 ways to embarrass yourself with Gen AI. What's been interesting to watch, first, is that the consequences are clearly much more severe now. We've gone from a car dealership in Central California putting up a chat bot that gets tricked into offering to sell people cars for a dollar. I don't think anybody actually got a car for a dollar, so they got embarrassed, but there were no consequences. On the other hand, now we're to the point where we see AI agentry being added to really major applications like Microsoft Copilot and Slack, and those being vulnerable to the things on the top 10 list, like indirect prompt injection. Both of those have come out recently as being vulnerable to that. And we see major companies like Apple putting out these announcements about how they're going to make their Gen AI environment secure, but often I feel like they're doubling down on their traditional security. Apple's take is: I'm building a super-hardened data center to put your data in, and I'm going to run the LLM locally on your device. Both of those are good things to do; there's no downside to that. But if you've got the LLM locally on your device, and that LLM is susceptible to some of these vulnerabilities, and you haven't dealt with that, and it has access to your data, somebody's going to trick it into doing something bad with your data. We can see that coming, and we see those early results from the Microsofts and the Slacks and a lot of other people; it's not limited to them. The other thing that we see out there is that some of the vulnerabilities from the top 10 list are, let's call it, changing rankings. We actually did some re-voting, and the expert group continues to grow. When we shipped the first version, it was 400 people. Now it's 1,400 people.

Andreas Welsch:

Wow.

Steve Wilson:

So people jump in and vote and participate. We did some re-ranking recently to decide where we wanted to focus, and I'd say there are two things that dramatically went up the list in terms of what people are interested in. One of them is what we call supply chain security. This was on the list, but it was near the bottom. It was a hypothetical worry that bad actors could get into your supply chain, where am I getting my models and my training data and my weights, and manipulate that somehow and make it bad for me. It's a well-known general cyber security problem, and it's not hypothetical anymore. When we look at places like Hugging Face, which has become the de facto source to get so much stuff, and Hugging Face is an awesome site and community, there have been a lot of researchers demonstrating there are thousands of poisoned AI models on that site, and we've seen examples where major organizations like Intel and Meta have lost control of their accounts and their keys. So understanding those supply chain concerns is going way up toward the top of the list next time we publish it. The other one I don't think will surprise people: agents. We had one vulnerability class for agents last time, what we called excessive agency, which is basically: are you giving the app more rights or more ability to execute autonomously than it really should have for its capabilities? If you look out in the AI sphere right now, for the last couple of months everybody's been talking about agents. It's suddenly pivoted to that time where people want to unleash a lot more autonomy. But I think we're still in a very dangerous place, where I don't think it's well understood what the security implications are going to be of unleashing that, and ultimately what the safety implications are.

Andreas Welsch:

I was just going to say, especially that part: on the one hand you have the vulnerability of the LLM, and to some extent its gullibility, doing things that you don't intend it to do but somebody else is getting it to do or tricking it into doing. And then you're putting it into workflows that are a little more critical, a little more high risk, a little more customer facing. That certainly can quickly go beyond embarrassing.

Steve Wilson:

And look, people are not moving in small steps here. We see all sorts of examples where people are jumping straight from sales chatbots to medical applications and financial trading. We're getting to the heart of the matter now.

Andreas Welsch:

Now that makes me curious. Especially as you're seeing, and as we are seeing in the industry, that shift toward agents and toward more LLM-based applications of all different sorts and kinds, in all different kinds of industries: how do you build these safeguards into the apps, in and around your LLMs? And doesn't that drive up the cost for checking and regenerating output if it's something that's harmful or something that it's not supposed to generate? How do people deal with that?

Steve Wilson:

So one of the things that I have in the book, ultimately, when you get to the last chapter, is a checklist of things you should be doing. The top 10 list was great for: here are 10 things you need to understand that can go wrong. And I do cover all of those in the book, but the second half of the book is: what do you do about it, right? Just giving me a list of problems is not helping anymore. So I would say one of the things is, people naturally jump to this idea that, if I'm worried about prompt injection, my defense is going to be putting a filter in front of the LLM that's going to look for prompt injection. And that can be one part of the solution, and I do talk about some ways to do that and screen for things. But that's an incredibly hard problem, and I'd say, for the most part, something I don't encourage people to solve by themselves. The first thing to think about, for that part of the approach, is going and getting a guardrails framework. There are commercial ones, there's a whole bunch of cool, burgeoning companies. I could name a few and then I'll get in trouble for the ones I don't, but let's try anyway: you've got Prompt Security and Lasso Security and WhyLabs. All these people are doing great stuff. You've got open source projects from companies as big as Nvidia and Meta that are attacking parts of these problems. I suggest people check those out, because that's going to give you the state of the art for that layer of defense. The other thing that I say, though, the first thing on the checklist, is actually more about what I will call product management than engineering. And that's about carefully selecting what you are going to allow the thing to do and not do, and really closing down your field of view. If you're trying to build ChatGPT and you're OpenAI, there's a tremendous number of things it can do, and we all trip over ChatGPT's guardrails all the time: oh, I can't do that. But then people trick it into doing it, because it's a very large space to defend. It's actually much easier to create what we call an allow list than a deny list. If, as a product manager, I decide this bot is for helping with tax calculations, it's much easier for me to put some guidance in there, build it into that bot, that says: don't answer questions that are not directly about taxes, than to list all of the things it should not do. And when you think about it that way, you unlock some of these things where that's not a huge computational cost. That's just a smart decision.
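
To make the allow-list idea concrete, here is a minimal sketch of what that kind of scoping could look like in code. It is illustrative only: `call_llm` is a hypothetical stand-in for whatever model client you use, and the exact prompt wording and refusal marker are assumptions, not something taken from the conversation or the book.

```python
# A minimal sketch (assumed example) of the "allow list" approach: scope the
# assistant to one job and refuse everything else by default.
# `call_llm` is a hypothetical placeholder for your actual LLM client call.

SYSTEM_PROMPT = (
    "You are a tax-calculation assistant. Only answer questions that are "
    "directly about taxes. For any other request, respond with exactly: "
    "OUT_OF_SCOPE"
)


def call_llm(system_prompt: str, user_input: str) -> str:
    """Placeholder for a real provider call (OpenAI, Gemini, a local model, etc.)."""
    raise NotImplementedError("Wire this up to your LLM provider of choice.")


def answer_tax_question(user_input: str) -> str:
    reply = call_llm(SYSTEM_PROMPT, user_input)
    # Deny by default: anything the model flags as off-topic never reaches the user.
    if "OUT_OF_SCOPE" in reply:
        return "Sorry, I can only help with tax-related questions."
    return reply
```

The point is the product decision, not the code: a narrowly scoped bot gives the model far less surface to defend, and the check itself costs almost nothing.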

Andreas Welsch:

That makes a lot of sense. I think putting it that way and giving it that persona of: this is what you do, this is what you're not supposed to do, and avoid or ignore anything else, right? And again, it comes back to the point that I think we talked about last year. Almost like a zero-trust boundary.

Steve Wilson:

What I tell everybody is you need to treat your LLM as something in between a confused deputy and an enemy sleeper agent. Really, we're used to talking about trust boundaries in our software as: the software is here, it's my software, I trust it, and what I'm trying to do is control what's coming in from the outside. The thing is, at this point, almost by the nature of any of these applications, they're designed to take untrusted input from the outside. And by proxy, I have to stop trusting the LLM. That means I actually put a lot more effort into filtering and checking what comes out than what comes in. It turns out to be, again, a much easier problem to solve and screen for. I can watch for whether it's spitting out code, if I didn't intend it to spit out code. I can watch whether it's spitting out credit card numbers or social security numbers. Those are easy problems to solve versus: is this an attempt to hijack my LLM?
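
As an illustration of that output-side checking, here is a minimal sketch, with assumed patterns and an assumed withholding policy, of screening what the model emits before it reaches the user. The regexes are deliberately simple examples, not production-grade detectors.

```python
import re

# Minimal sketch of output-side screening: treat the LLM's output as untrusted
# and block responses that contain things this assistant should never emit.
# The patterns and the withholding messages are illustrative assumptions only.

CREDIT_CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")  # rough 13-16 digit match
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")           # US SSN format
CODE_FENCE = re.compile(r"`{3}")                     # fenced code in the reply


def screen_output(model_reply: str, allow_code: bool = False) -> str:
    """Return the reply if it passes the checks, otherwise a safe refusal."""
    if CREDIT_CARD.search(model_reply) or SSN.search(model_reply):
        return "[Response withheld: the model output looked like sensitive data.]"
    if not allow_code and CODE_FENCE.search(model_reply):
        return "[Response withheld: code output is not enabled for this assistant.]"
    return model_reply
```

Checks like these are cheap string scans on the output, which is part of why filtering the output side tends to cost far less than trying to recognize every possible injection on the way in.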

Andreas Welsch:

That's great, especially the examples that you give of how to do that and where to do it, right? It's easier, like I said, to do that at the beginning than at the tail end of it. Which triggers another thought, another question here, and I haven't really been able to find good expertise from people who can talk about this in depth and in breadth: how do you actually red team your LLM applications? Meaning, you have something that's untrusted, but how do you go through all these scenarios? What are things you should be testing for? And how can your own security team maybe get more familiar with these techniques and upskill, now that LLMs are here?

Steve Wilson:

It's been a really interesting journey, and I've been exploring this from a few different angles. Personally, with my day job being at Exabeam, we just added a bunch of Gen AI capabilities to our cyber security software that we sell to enterprises. All of a sudden, I wasn't just the author writing about this stuff; I was on the hot seat and responsible for it. When we started putting it out in early access, and frankly got some really great responses from users, I went back to the engineering team and started asking a lot of questions like: hey, what kind of guardrails are we putting on this? What are we doing? There weren't a lot of great resources for red teaming, but it was very easy to trip it up with even basic red teaming at the beginning, because these things are hard. The other part is, from the OWASP side, red teaming has become a huge topic within the expert group. Again, it was: hey, what are all the problems? Now it's: all right, red teaming is a big part of the solution, so how are we going to give people good red teaming guidance? We actually did a panel on that at RSA that was really well received, and it's up on YouTube. If you go look for OWASP Top 10 LLM red teaming, you can find it. What I'll say is interesting is that people coming to this from what I call an AI background, or coming to this from an AI angle, often don't have a lot of cybersecurity experience. And that's one of the tricks: they don't necessarily know where to start. The term red teaming has become pervasive in the community, broadly, due to LLMs. I was watching a clip off cable TV recently where Senator Chuck Schumer from the United States Senate was talking about AI red teaming and LLMs, and my mind just exploded. So it's in the popular culture. What we really see is that there's a tremendous amount of expertise on how to approach this in an obscure corner of your development organization at your enterprise called the AppSec team. These teams sometimes feel unloved, but they're the ones who are charged with building secure software inside your enterprise. They are coming up a learning curve about AI, but they know how to secure things and they know how to pen test them. For a long time, we called that pen testing, and red teaming is a different variant of that, but that's a good place to start with that piece of expertise. The thing about red teaming that's different, and I break this down in the book with a table that compares pen testing and red teaming, is that with AI red teaming you want to attack it from a broad perspective. You're not probing it looking for traditional attacks; you're looking at it from a holistic point of view: security, safety. And it's really interesting to watch these teams that were viewed very narrowly, like an AppSec team in an enterprise, suddenly become much more important, where they're getting asked questions like: how do I test this for safety? Sometimes they're uncomfortable with that. They're like: nobody's asked me this. Well, I've got nobody else to ask; you're the only person here with expertise on this. Let's figure it out. Some people get uncomfortable, some people are really embracing it, and these teams are stepping up, and it's really cool to see them step up and attack this. For red teaming, if you don't have experience with it and you're looking to just get your feet wet, there's an awesome resource from a company called Lakera. They're another one who makes guardrails frameworks. The thing they got famous for originally is that they put out basically a game on the internet called Gandalf, and it is a tool to practice attacking large language models. So if you want to practice your skills at hacking LLMs and develop those red teaming skills in a way that's not illegal or immoral, that's a great place to go get started.

Andreas Welsch:

Cool. I know what I'll be doing once we end the stream. That's for sure. I always love learning something new and getting new resources. Two things, folks, for those of you in the audience, if you have a question for Steve, please put it in the chat. We'll pick it up in a minute or two. And while you're thinking about your question, Steve, you mentioned you wrote a book and today's launch day. What's the name of the book again?

Steve Wilson:

Yep, so the book is called The Developer's Playbook for Large Language Model Security: Building Secure AI Applications. It's from O'Reilly, and again, it's available today. You can get the electronic copies on Amazon or wherever you buy books, and they promised me the hard copies are on a truck somewhere, coming really soon.

Andreas Welsch:

That's awesome. Fantastic. So folks, definitely check that out. While we're waiting for questions to come in, I think another aspect is, as you're thinking about red teaming, it's the usual teams that have been doing this kind of work for a long time, with an additional dimension of: how do we do this for large language models now? How do we upskill? Which roles within these teams do you see as best suited to do that? What skill sets do the people have who do red teaming and who develop these safeguards? Is it the AI scientists? I think you said that's maybe not the best angle. Is it the pure security teams? Is it the English majors who know how to write a crafty sentence and elicit a good response? Where do you start?

Steve Wilson:

Yeah, it's funny. In general, having been immersed in not just the security but the development of Gen AI apps, it's interesting to look at the role of, let's call it, data scientists and machine learning engineers. At Exabeam, we have a bunch of them who've been developing deep machine learning models for high-speed analysis of data for 10 years. But when we added the copilot with Gen AI, we're actually using a lot of things people who watch your show would be familiar with: we're using a private version of Gemini inside Google Cloud, it's an LLM, and we do RAG and prompt engineering, these things that we're all learning about together. The role of the data scientist there is very minimal. If you're a machine learning specialist in the world of large language models, you might be working at somewhere like Google or OpenAI on the next wave of those frontier models, and I have a tremendous amount of respect for that. But when it comes to building these apps, this is really a new skill set that we're all developing together: how do I build effective prompts, and how do I do effective RAG? There are ways to connect that up to existing machine learning technologies, which has been really cool to do at Exabeam. But I'd say what that means is that we're all developing these new skill sets. I think, from a security perspective, those data scientists are the furthest away from doing this directly at your enterprise, although there are a lot of smart ones at places like NVIDIA and Meta working on security guardrails, and I would just take advantage of that. I would not try to employ data scientists to build security guardrails for a new enterprise app.

Andreas Welsch:

Good point that you're making there. So then, what types of skills do you see if it's not the data scientists? There is some knowledge needed to understand what the vulnerabilities are. What's a good skill set that can test for these things? Again, is it more language? Is it more security? Is it something completely different?

Steve Wilson:

I'd say that the term red teaming existed in cyber security long before large language models, and the first place to look is to those security teams that have experience in pen testing and red teaming. There are also great things cropping up out there, companies like HackerOne. HackerOne is a really cool company with a cool business model where they basically run bug bounty programs for large enterprises. What they have is literally a million hackers under contract, and they will come try to hack your stuff. And if they hack your stuff, you pay them, because they will disclose to you how they did it. In the last six to 12 months, they've added AI red teaming to this, and I've seen some amazing feedback on what you're able to do with that. So you're able to actually go out and basically crowdsource your AI red teaming.

Andreas Welsch:

That is indeed an interesting business model as well. And I could see companies that don't necessarily have the bandwidth or the expertise yet, that are still building up that expertise in their own organization, leveraging offerings like the ones you just mentioned to augment their skills or get some external help on this. Fantastic. Steve, we're getting close to the end of the show, and I was wondering if you can summarize the three key takeaways for our audience today before we wrap up.

Steve Wilson:

Yeah, from the topics we've covered, I'd say this is a fast-evolving space. If you're tracking it, you want to start to be aware of some of the newer places where the vulnerabilities are showing up, and these are definitely things like the supply chain. If you're not just using a software-as-a-service model, like Google Gemini, but you're off trying to run your own LLM or trying to use training data from third parties and things like that, really start to understand the characteristics of your supply chain. Then, as you go out and start to think about adding agents to the mix, letting things execute autonomously and seek goals, really think hard about the implications of that. And lastly, please check out the book, The Developer's Playbook for Large Language Model Security.

Andreas Welsch:

Steve, it's been a pleasure having you on. Thank you for sharing your expertise with us.

Steve Wilson:

Thanks, Andreas. Always enjoy it.
