Jailbreak (AI) — definition and meaning
An AI jailbreak is a prompt or technique that tricks a model into bypassing its safety guidelines or content restrictions to produce outputs it would normally refuse.
Last reviewed 2026-05-25
Jailbreaking refers to crafting inputs that get around an AI's built-in safety filters — through roleplay framings, hypothetical wrappers, or adversarial phrasing. In the AI-companion space, jailbreaks circulate as ways to push filtered platforms toward content they otherwise block. We're candid about our stance: Bae is SFW by default with a careful, age-verified Spicy track in development, rather than something users have to jailbreak. Designing the limits in deliberately is, in our view, more honest than shipping filters people are expected to defeat.
About jailbreak (ai).
What is jailbreak (ai)?
An AI jailbreak is a prompt or technique that tricks a model into bypassing its safety guidelines or content restrictions to produce outputs it would normally refuse.
How is "jailbreak (ai)" used in AI companion apps?
Jailbreaking refers to crafting inputs that get around an AI's built-in safety filters — through roleplay framings, hypothetical wrappers, or adversarial phrasing. In the AI-companion space, jailbreaks circulate as ways to push filtered platforms toward content they otherwise block. We're candid about our stance: Bae is SFW by default with a careful, age-verified Spicy track in development, rather than something users have to jailbreak. Designing the limits in deliberately is, in our view, more honest than shipping filters people are expected to defeat.
What other terms relate to jailbreak (ai)?
Related terms in the AI companion space include: No-filter AI, System prompt, NSFW. Each has its own glossary entry on /glossary.
- No-filter AI
Refers to AI chatbots without content moderation around adult or sensitive topics.
- System prompt
A system prompt is the hidden instruction given to an AI model before a conversation begins, defining its persona, rules, and behavior. It's how a generic model becomes a specific character.
- NSFW
NSFW stands for 'Not Safe For Work' — a label for content (images, text, links) that's sexual, graphic, or otherwise inappropriate to view in a workplace or public setting.