Security Stop-Press : OpenAI’s ChatGPT Atlas Getting Prompt Injection Protection

OpenAI has said it is strengthening security in its ChatGPT Atlas browser to reduce the risk of prompt injection attacks, warning the threat is unlikely to ever be fully eliminated.

ChatGPT Atlas includes an agent mode that can read web pages and take actions inside a user’s browser, such as clicking links or sending emails. This makes it a more attractive target for prompt injection attacks, where hidden malicious instructions are used to override the user’s intent.

OpenAI has rolled out a security update with a newly adversarially trained model and stronger safeguards. The changes were driven by automated red teaming, using an internally built AI attacker trained with reinforcement learning to identify complex exploits before they appear in the wild.

The company said prompt injection is a long-term AI security challenge, similar to online scams and social engineering. Industry analysts, including Gartner, have warned that AI browsers could pose significant risks if not tightly controlled.

To reduce exposure, OpenAI advises businesses to limit logged-in use of AI agents, carefully review confirmation requests, and give clear, tightly scoped instructions to reduce the impact of hidden or malicious content.

Recent Posts

Archives