This AI mistake can destroy any brand

May 3

Most companies using chatbots aren’t even aware of the risks they are exposing themselves to if they forget to add a crucial component.

Imagine this:
You spend your resources on implementing a shiny new chatbot. You’re ready to go live. You even tested it against your FAQ list and were happy with the performance. You’re a few steps away from solving the most time-consuming, low-value tasks of your customer support team.

You deploy the bot. Everything seems okay. Customers are happy — most of their questions are solved in seconds instead of minutes or hours.

The next day, just to be sure, you check your brand’s social media.

You’re horrified because people are sharing screenshots of your chatbot being racist, acting rude, or saying wildly inappropriate or nonsense things.
After a desperate check of recent conversations, you find that people are using jailbreaking methods to break your bot in ways you couldn’t even imagine.

What is Jailbreaking?

It’s the act of using targeted questions to bypass your instructions and prompts. It can be playful or malicious.

Examples of what could happen:

Giving away sensitive details hackers can use to break into your system or access data
Forcing your bot to make false claims
It can even recommend competitor brands to your customers.

Don’t forget: Your bot is the representative of your brand.

All this happens because you forgot to add one crucial component:

A final output check that enforces certain policies on your bot.

Guardrails

One way to do this is to add Guardrails to your code. There are plenty of different options to choose from, for example you can opt to use guardrails.ai or Nvidia’s NeMo.

Think of this layer as your brand’s border control. It will check information coming in and going out of your brand’s domain.

The goal of a Guardrail system is to make sure that your bot isn’t forced out of the expected behaviour by malicious prompting attacks and well targeted messages.

You can define which topics to avoid completely.
Give instructions on how to respond if a certain kind of message is sent to your bot.

Applying a Guardrail layer in your application can give you peace of mind that your brand’s AI representative is doing its job as expected.

Build Your Own Solution

Pros:

More control.
Future-proof - it will grow and change with your brand and your systems.
Flexible - can be changed in any way you need and it can be easily integrated in your existing systems.
Lightweight.

Cons:

Takes more time.
The implementation takes skill and knowledge of AI systems.
You’ll need to define all rules and edge cases manually.

Add a new final node to your app. This node is executed before the final output is shown to the customer.

You decide how to handle problematic outputs:

Want to return a default answer? Easy.
Want to send feedback or provide corrective instructions to the previous LLM step so it changes the output? Sure.

It only takes a few lines of code and a couple of config files to implement your own policies. This way, your bot won’t harm your brand while helping your users.

Conclusion

While AI can provide huge gains in efficiency and cost savings, it’s crucial that no safety steps are skipped during product development and that your bot is constantly monitored.

Interested in implementing safe AI systems in your workflows?

Schedule a call!

Vajk Turi

This AI mistake can destroy any brand

What is Jailbreaking?

Guardrails

Build Your Own Solution

Conclusion

Jailbreaking

AI Blacksmiths Ltd.