The ultimate guide on prompt injection

The ultimate guide on prompt injection

Remember little Bobby Tables?

He’s all grown up now.

What is prompt injection?

Prompt injection is a general term for a category of techniques designed to cause an LLM (Large Language Model) to produce harmful output. When applications use LLM technology to somehow respond to user input, users can give arbitrary instructions to the LLM, potentially bypassing censorship and revealing sensitive information.

This is roughly analogous to SQL injection, since it works on the same general principle of escaping the limited context of a string literal that user input is supposed to reside in, which gives the user the power to actually execute instructions. SQL injection is for the most part a solved problem at this point though, because we’ve learned to sanitize user inputs and separate them from code instructions.

Prompt injection, however, is a brand new beast. It started becoming well-known only in 2021 and 2022, and only recently with the explosion of AI-driven SaaS tools has it become a serious security concern. Perhaps you’ve heard of the story from 2023 where Chevrolet decided to put chatbots on their dealership websites, and people quickly got it to offer them cars for just $1 by just asking it to. The general public was so unaware of this attack vector that news outlets called the person who originally posted about it a “hacker”. As you’d expect, Chevy isn’t too keen to keep their chatbot’s word… but if they were in Canada they might be forced to. A court in British Columbia recently set a precedent that companies are responsible for the output of AI agents on their website since its appearance implies that the company endorses what the LLM is saying. This was decided in a case where a chatbot on Air Canada’s site misled a customer about the process for getting a flight to a funeral refunded — the customer sued for the price of the fare plus legal fees and won.

How are we to deal with this mind-blowing, potentially legal-action-inducing vulnerability? That’s an excellent question. Given that Algolia’s engineers and our friends across the industry are some of the world’s leading experts on generative AI, we’ve set out to compile the ultimate guide on mitigating the risks associated with prompt injection. Unless otherwise indicated, the information to follow comes from our in-house AI experts, but you’ll see the sources cited when it comes from extensive external research and interviews conducted by an experienced developer and technical author on our blog team.

Do you even need to use an LLM?

Before we get started on solutions, let’s do a little risk analysis. One volunteer organization involved in construction work notes in their internal guidelines that eliminating risks is the first step to safe output. Swapping dangerous ideas for less risky ones comes next, and only then do we get to solutions that involve engineering. Surely you’d agree that removing risks altogether is better than trying to mitigate or lessen them?

With that in mind, be honest about your use case: if it wasn’t the trendy thing to do, would you even be using an LLM? Is it the right tool for the job? Before we get to engineering solutions, examine whether you can remove the risky LLM tech altogether or replace it with a narrower, safer, solution. Consider these questions:

  • Are you using an LLM to answer a specific set of support questions? If so, you might be able to just match queries with an intelligent vector search algorithm. Vectors are the mathematical representations of complex ideas, and they can be generated from words. There is plenty of excellent information on this around the Internet, but for a one-minute crash course, take a look at this beautifully-illustrated short from Grant Sanderson, the mind behind the YouTube channel 3Blue1Brown. The gist is that the vector can be visualized as a physical direction in a space¹. That direction contains meaning, and since you can perform math with vectors, you can quantify analogies. This is the technology that LLMs use under the hood to convert prompts into numbers that can run through the model, but you can skip that step if your use case just requires you to essentially find the closest match to a dataset of potential results. In theory, a trained vector generation model would generate vector embeddings for the questions you’d like answered, and then figure out which one is within a certain distance threshold in that high-dimensional space from the vector embedding of your user’s query, effectively matching it to the correct response.

Biased plug: This vector search algorithm is actually the idea behind Algolia’s main product, NeuralSearch. This article is meant to be educational and not marketing, so instead of extolling the virtues of NeuralSearch here, feel free to read further about it with this blog post and come to your own conclusions. Because we have experience in this though, we’re going to explore these vector-based ideas more in future articles.

  • Are you using an LLM to make input-dependent, limited-choice decisions? If so, you might be able to train a simpler MLP or KAN model with only the output options necessary. When you’ve researched neural networks and seen that scary-looking graph of nodes, this is probably what you were thinking of:

from the paper on Kolmogorov-Arnold Networks released in April 2024

It’s not as scary as it looks, though. Those graphs of nodes actually condense into some fairly straightforward equations if you build it up from first principles. That’s the premise of a very in-depth DIY series from sentdex on YouTube called Neural Networks from Scratch, which was also worked into an interactive book of the same name. The goal was just to understand the root principles of these kinds of networks, since they produce seemingly-complex results from rather simple instructions. In a real application, you’d likely use a framework that handles most of this complex math for you like Tensorflow and Keras or PyTorch. We’ve even built one or two for this blog to use in tandem with legit LLMs. In this use case, the output of these models need only be a few nodes. If the network is trained to make a certain limited-choice decision, the combination of which nodes are on² can determine which choice to pick.

  • Are you using an LLM to connect users with new content or products based on queries entered into a chat box or previous interactions? If so, you might be trying to implement a search or recommendations algorithm. We’ve written about this before, so we’ll suggest you dive into this article and this one for more details, but here’s the gist: don’t reinvent the wheel. Other types of AI have proved their worth in these use cases, and LLMs don’t offer any significant advantages. The data LLMs respond with is hard to analyze and control, since it doesn’t work with structured datasets like product catalogues well. The suggestions are rarely optimally relevant and most users agree that the experience of product discovery inside the context of a chatbot is subpar.
  • Are you using an LLM to perform statistical analysis or mathematical operations? If so, there’s way more accurate and speedy tools for that. A good example of this vast category of use cases is building a chess engine, a famously analytical problem to solve. If you’re curious, here’s a chess YouTuber commentating a “match” between ChatGPT and the leading chess AI StockFish. ChatGPT invents several new (read: illegal) moves, captures its own pieces, and puts itself in checkmate almost immediately. Can you blame it? It’s not the right tool for the job! At the end of the day, LLMs just output the most likely next token in a long list of tokens, so if nothing similar to the analysis you’re trying to get the LLM to perform appears in its training dataset, then it can just produce random nonsense. One solution to this has been to connect ChatGPT to Wolfram Alpha to answer computational questions… but if this isn’t part of an application that strictly needs the conversational interface, why not just use Wolfram Alpha directly, saving costs and reducing potential inaccuracies?³

Despite the cautious tone of the previous section, here at Algolia we’re incredibly optimistic about generative AI (genAI) when it’s used in the right context — LLMs were even used sparingly in the creation of this article. But to use them responsibly, we must understand the risks and plan accordingly. If we don’t need to expose ourselves to the vulnerabilities and costs that come along with LLMs, why should we?

Identifying and lessening risks associated with prompt injection

Say that your use case does require that you use an LLM — what then?

Well, our friends over at Prism Eval mentioned in an interview that while the ideal solution would be that the LLM is trained to not know harmful content in the first place, this is an unreasonable approach. Why? Remember that what counts as harmful can change based on the application. Talking about $1 cars is harmful content for Chevrolet, but we could easily construct a scenario where, say, a student solving a homework problem might talk about $1 cars. There, that conversation would be helpful to the student, not harmful. So if that approach isn’t going to work, what other steps can we take?

Remember how during the COVID-19 pandemic, we were advised of many different precautions we could take to slow the spread of the virus and protect ourselves from infection? None of the individual methods were 100% effective, but they were very effective as a group. Each precaution caught much of the potential for infection that the previous precaution missed, and the next precaution caught even more. This is known as the “Swiss cheese” model for risk prevention.

Let’s apply that model to the risks associated with LLMs: if we can identify specific attack vectors and develop strategies to counteract them, we should be able to stack those strategies up next to each other and drastically increase our coverage.

This is by no means an exhaustive list, and it should be clear that this is still an area of active research, but we’ll focus in on these five categories of solutions.

Continue reading The Ultimate Guide on Prompt Injection


Super cool article! If you are in this situation, you need to check out our open-source products. We have multiple solutions for this problem. https://github.com/openshieldai/openshield

Like
Reply
Alexandru Armasu

Founder & CEO, Group 8 Security Solutions Inc. DBA Machine Learning Intelligence

1mo

Great insights.

To view or add a comment, sign in

Insights from the community

Others also viewed

Explore topics