When ChatGPT first launched and I tried it for the first time, two thoughts immediately rushed through my mind: one was that I was going to lose my job as a writer, and the other was that it would forever change how vulnerabilities are discovered and exploited.
Today, one of those things is inching closer to reality. A team led by former GitHub Copilot pioneers has announced XBOW, an artificial intelligence system designed to find and exploit vulnerabilities in web applications autonomously. According to their statements, XBOW has demonstrated remarkable proficiency, successfully solving 75% of web security benchmarks without human intervention.

Oege de Moor, founder and CEO of XBOW, is well-known in programming tools and AI-assisted software development. Before founding XBOW, de Moor was a co-founder and CEO of Semmle, a code analysis platform that was acquired by GitHub in 2019. His work at Semmle laid the groundwork for GitHub Copilot, the AI pair programming tool that has gained widespread developer adoption.
“XBOW brings AI to offensive security,” de Moor stated in a blog post announcing the technology. “Today we’re announcing the results of testing XBOW on hundreds of web security benchmarks. Without any human intervention, XBOW correctly finds and exploits the vulnerabilities in most of them.” In other words, you can think of this as “ChatGPT as an actual security expert,” one that not only understands existing vulnerabilities, such as those cataloged as CVEs, but can also develop novel ways to get inside a system.
The system’s capabilities are demonstrated through a series of impressive feats. In one instance, XBOW successfully broke a cryptographic CAPTCHA by exploiting a padding oracle vulnerability in an AES-CBC implementation. Another showcase highlights XBOW’s ability to exploit an Insecure Direct Object Reference (IDOR) vulnerability in a GraphQL API, demonstrating its capacity to navigate complex web architectures.
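For readers unfamiliar with IDOR, the flaw comes down to a lookup that trusts a client-supplied identifier without checking who is asking. The sketch below is only an illustration of that idea, not the benchmark XBOW solved: the `Invoice` type, the `INVOICES` store, and both resolver functions are hypothetical names, and plain functions stand in for GraphQL resolvers.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    id: int
    owner: str
    total: float


# Hypothetical in-memory data store standing in for a real database.
INVOICES = {
    1: Invoice(1, "alice", 120.0),
    2: Invoice(2, "bob", 75.5),
}


def resolve_invoice_vulnerable(current_user, invoice_id):
    """IDOR: returns whatever id the client asks for, ignoring current_user."""
    return INVOICES.get(invoice_id)


def resolve_invoice_checked(current_user, invoice_id):
    """Same lookup, but refuses records the caller does not own."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice.owner != current_user:
        raise PermissionError("invoice not found or not accessible")
    return invoice


# Alice asks for Bob's invoice simply by changing the id in her request.
leaked = resolve_invoice_vulnerable("alice", 2)
print(leaked.owner)
```

An automated tester looking for this class of bug would, roughly speaking, authenticate as one user and then enumerate identifiers that belong to others.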
Perhaps most intriguingly, XBOW has shown its ability to write custom tools when necessary. In one challenging scenario, it wrote a custom implementation of the SHA-256 hashing algorithm to execute a hash length extension attack. Only 649 human users have completed this task on the PentesterLab platform.
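A custom SHA-256 is plausibly needed here because the attack resumes hashing from a known digest, and Python's `hashlib`, for one, does not let you set the internal state. The following is a minimal sketch of the general technique, not XBOW's code: the secret, message, and suffix are made-up values, and the helper names are my own. It forges a valid digest for `secret || message || padding || suffix` knowing only the original digest and the total length.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF


def _icbrt(n):
    """Exact integer cube root by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def _primes(count):
    found, n = [], 2
    while len(found) < count:
        if all(n % p for p in found):
            found.append(n)
        n += 1
    return found


# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
K = [_icbrt(p << 96) & MASK for p in _primes(64)]


def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def _compress(state, block):
    """One SHA-256 compression round over a 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        big_s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[i] + w[i]) & MASK
        big_s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]


def md_padding(total_len):
    """SHA-256 padding for a message that is total_len bytes long."""
    zeros = b"\x00" * ((55 - total_len) % 64)
    return b"\x80" + zeros + struct.pack(">Q", total_len * 8)


def length_extend(orig_digest_hex, orig_len, suffix):
    """Return (glue_padding, forged_digest_hex) without knowing the secret."""
    glue = md_padding(orig_len)
    # The published digest IS the internal state after the original message.
    state = list(struct.unpack(">8I", bytes.fromhex(orig_digest_hex)))
    forged_len = orig_len + len(glue) + len(suffix)
    data = suffix + md_padding(forged_len)
    for i in range(0, len(data), 64):
        state = _compress(state, data[i:i + 64])
    return glue, struct.pack(">8I", *state).hex()


# Hypothetical values for illustration only.
secret = b"hypothetical-secret"
message = b"user=guest"
mac = hashlib.sha256(secret + message).hexdigest()

glue, forged = length_extend(mac, len(secret + message), b"&admin=true")
assert forged == hashlib.sha256(secret + message + glue + b"&admin=true").hexdigest()
```

The attacker only needs the length of the secret, which can be guessed by trying a range of values. Using HMAC instead of a bare `hash(secret || message)` construction closes this hole.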

The team describes XBOW’s approach as “agentic AI,” meaning it pursues high-level goals by executing commands and reviewing their output. While the company is tight-lipped about the specific technologies powering XBOW, citing the need to protect its intellectual property, it says the system goes beyond standard techniques to include proprietary innovations.
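Since XBOW's internals are undisclosed, the loop below is purely a generic sketch of what "executing commands and reviewing their output" can look like, not a description of XBOW. The `choose_next` callback is a hypothetical stand-in for a model, and `scripted_policy` is a toy replacement that runs one harmless command and stops.

```python
import subprocess
import sys


def run_command(argv, timeout=10):
    """Execute one command and return its combined output as text."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return (result.stdout + result.stderr).strip()


def agent_loop(goal, choose_next, max_steps=5):
    """Ask choose_next for a command, run it, record the output, repeat.

    choose_next(goal, history) is a hypothetical stand-in for the model.
    It returns an argv list, or None once it judges the goal reached.
    """
    history = []
    for _ in range(max_steps):
        argv = choose_next(goal, history)
        if argv is None:
            break
        history.append((argv, run_command(argv)))
    return history


def scripted_policy(goal, history):
    """Toy policy: probe once, then stop after seeing the expected output."""
    if history and "ready" in history[-1][1]:
        return None
    return [sys.executable, "-c", "print('ready')"]


history = agent_loop("confirm the target responds", scripted_policy)
for argv, output in history:
    print(argv[-1], "->", output)
```

The `max_steps` cap and the per-command timeout are the kind of guardrails any such loop would want, because the policy decides for itself when to stop.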
Such technology has significant implications for the cybersecurity landscape. Automated vulnerability detection and exploitation could dramatically accelerate the identification and patching of security flaws in web applications, other software, and networks.
Of course, it also raises concerns about potential misuse: in the hands of threat actors, such tools could be heavily abused. Addressing these concerns, de Moor said, “We will only make our technology available to trusted customers in the cloud. It is impossible to run XBOW as a standalone application outside our control.”
It will be interesting to see how their process develops over time. Threat actors can often gain access to companies’ internal operations because of vulnerabilities in third-party software, such as Exim and Outlook. Having a purpose-built and trained “agentic AI” constantly try to break into a system twenty-four hours a day could significantly increase the number of discovered and disclosed vulnerabilities. NIST, which maintains the National Vulnerability Database, will not be happy about this.
Looking ahead, the team said they plan to open-source the novel XBOW benchmarks and hope that the broader security community will adopt them as a “standard” for evaluating and improving security tools. For now, little else is disclosed, and the website itself, outside of examples, only provides a “Join a waitlist” link.