{"id":7140,"date":"2025-01-30T10:00:00","date_gmt":"2025-01-30T16:00:00","guid":{"rendered":"https:\/\/www.darkreading.com\/vulnerabilities-threats\/new-jailbreaks-manipulate-github-copilot"},"modified":"2025-01-30T10:00:00","modified_gmt":"2025-01-30T16:00:00","slug":"new-jailbreaks-allow-users-to-manipulate-github-copilot","status":"publish","type":"post","link":"https:\/\/ddi.mohflo.net\/index.php\/2025\/01\/30\/new-jailbreaks-allow-users-to-manipulate-github-copilot\/","title":{"rendered":"New Jailbreaks Allow Users to Manipulate GitHub Copilot"},"content":{"rendered":"<div class=\"media_block\"><a href=\"https:\/\/i0.wp.com\/eu-images.contentstack.com\/v3\/assets\/blt6d90778a997de1cd\/blt209bf6e85e9311a4\/679a6f884acb5c644e9aa7ca\/GitHub_Copilot-Mykhailo_Polenok-Alamy.jpg?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/01\/new-jailbreaks-allow-users-to-manipulate-github-copilot.jpg?w=640&#038;ssl=1\" class=\"media_thumbnail\"><\/a><\/div>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Researchers have discovered two new ways to manipulate GitHub&#8217;s artificial intelligence (AI) coding assistant, Copilot, enabling the ability to bypass security restrictions and subscription fees, train malicious models, and more.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">The first trick involves <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/www.apexhq.ai\/blog\/blog\/2025-github-copilot-vulnerabilities-technical-overview\">embedding chat interactions inside of 
Copilot code<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">, taking advantage of the AI&#8217;s instinct to be helpful to get it to produce malicious outputs. The second method reroutes Copilot through a proxy server to communicate directly with the OpenAI models it integrates with.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Researchers from Apex deem these issues vulnerabilities. GitHub disagrees, characterizing them as &#8220;off-topic chat responses&#8221; and an &#8220;abuse issue,&#8221; respectively. In response to an inquiry from Dark Reading, GitHub wrote, &#8220;We continue to improve on safety measures in place to prevent harmful and offensive outputs as part of our responsible AI development. Furthermore, we continue to invest in opportunities to prevent abuse, such as the one described in Issue 2, to ensure the intended use of our products.&#8221;<\/span><\/p>\n<h2 class=\"ContentText ContentText_variant_h2 ContentText_align_left\" data-testid=\"content-text\" id=\"Jailbreaking GitHub Copilot\">Jailbreaking GitHub Copilot<\/h2>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;Copilot tries as best as it can to help you write code, [including] everything you write inside a code file,&#8221; Fufu Shpigelman, vulnerability researcher at Apex, explains. 
&#8220;But in a code file, you can also write a conversation between a user and an assistant.&#8221;<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">In the screenshot below, for example, a developer embeds a chatbot prompt within their code, written from the perspective of an end user. The prompt carries ill intent, asking Copilot to write a keylogger. In response, Copilot suggests a safe output denying the request:<\/span><\/p>\n<div><img data-recalc-dims=\"1\" decoding=\"async\" data-testid=\"content-image\" data-component=\"image\" class=\"ContentImage-Image ContentImage-Image_align_left\" data-src=\"https:\/\/ddi.mohflo.net\/wp-content\/uploads\/2025\/01\/new-jailbreaks-allow-users-to-manipulate-github-copilot.png\" src=\"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/01\/new-jailbreaks-allow-users-to-manipulate-github-copilot.png?w=640&#038;ssl=1\" loading=\"lazy\" alt=\"GitHub Copilot code\" title=\"GitHub Copilot code\"><\/p>\n<p class=\"ContentImage-Link\">Source: Apex<\/p>\n<\/div>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">The developer, however, is in full control of this environment. They can simply delete Copilot&#8217;s autocomplete response and replace it with a malicious one.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Or, better yet, they can influence Copilot with a simple nudge. As Shpigelman notes, &#8220;It&#8217;s designed to complete meaningful sentences. 
So if I delete the sentence &#8216;Sorry, I can&#8217;t assist with that,&#8217; and replace it with the word &#8216;Sure,&#8217; it tries to think of how to complete a sentence that starts with the word &#8216;Sure.&#8217; And then it helps you with your malicious activity as much as you want.&#8221; In other words, getting Copilot to write a keylogger in this context is as simple as gaslighting it into thinking it wants to.<\/span><\/p>\n<div><img data-recalc-dims=\"1\" decoding=\"async\" data-testid=\"content-image\" data-component=\"image\" class=\"ContentImage-Image ContentImage-Image_align_left\" data-src=\"https:\/\/ddi.mohflo.net\/wp-content\/uploads\/2025\/01\/new-jailbreaks-allow-users-to-manipulate-github-copilot-1.png\" src=\"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/01\/new-jailbreaks-allow-users-to-manipulate-github-copilot-1.png?w=640&#038;ssl=1\" loading=\"lazy\" alt=\"GitHub Copilot code\" title=\"GitHub Copilot code\"><\/p>\n<p class=\"ContentImage-Link\">Source: Apex<\/p>\n<\/div>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">A developer could use this trick to <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_self\" href=\"https:\/\/www.darkreading.com\/threat-intelligence\/ta547-uses-llm-generated-dropper-infect-german-orgs\">generate malware<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">, or malicious outputs of other kinds, like <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_self\" 
href=\"https:\/\/www.darkreading.com\/threat-intelligence\/chatbot-roadmap-how-to-conduct-a-bio-weapons-attack\">instructions on how to engineer a bioweapon<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">. Or, perhaps, they could use Copilot to embed these sorts of malicious behaviors into their own chatbot, then distribute it to the public.<\/span><\/p>\n<h2 class=\"ContentText ContentText_variant_h2 ContentText_align_left\" data-testid=\"content-text\" id=\"Breaking Out of Copilot Using a Proxy\">Breaking Out of Copilot Using a Proxy<\/h2>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">To generate novel coding suggestions, or to process a response to a prompt (for example, a request to write a keylogger), Copilot enlists help from cloud-based large language models (LLMs) like Claude, Google Gemini, or OpenAI models, via those models&#8217; application programming interfaces (APIs).<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">The second scheme Apex researchers came up with allowed them to plant themselves in the middle of this engagement. First, they modified Copilot&#8217;s configuration, adjusting its &#8220;github.copilot.advanced.debug.overrideProxyUrl&#8221; setting to redirect traffic through their own proxy server. Then, when they asked Copilot to generate code suggestions, their server intercepted the requests it generated, capturing the token Copilot uses to authenticate with OpenAI. 
With the necessary credential in hand, they were able to access OpenAI&#8217;s models without any limits or restrictions, and without having to pay for the privilege.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">And this token isn&#8217;t the only juicy item they found in transit. &#8220;When Copilot [engages with] the server, it sends its <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_self\" href=\"https:\/\/www.darkreading.com\/cloud-security\/chatgpt-exposes-instructions-knowledge-os-files\">system prompt<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">, along with your prompt, and also the history of prompts and responses it sent before,&#8221; Shpigelman explains. Putting aside the <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_self\" href=\"https:\/\/www.darkreading.com\/application-security\/hundreds-of-llm-servers-expose-corporate-health-and-other-online-data\">privacy risk that comes with exposing a long history of prompts<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">, this data contains ample opportunity to abuse how Copilot was designed to work.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">A &#8220;system prompt&#8221; is a set of instructions that defines the character of an AI \u2014 its constraints, what kinds of responses it should generate, etc. 
Copilot&#8217;s system prompt, for example, is designed to block various ways it might otherwise be used maliciously. But by intercepting it en route to an LLM API, Shpigelman claims, &#8220;I can change the system prompt, so I won&#8217;t have to try so hard later to manipulate it. I can just [modify] the system prompt to give me harmful content, or even talk about something that is not related to code.&#8221;<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">For Tomer Avni, co-founder and CPO of Apex, the lesson in both of these Copilot weaknesses &#8220;is not that GitHub isn&#8217;t trying to provide guardrails. But there is something about the nature of an LLM, that it can always be manipulated no matter how many guardrails you&#8217;re implementing. And that&#8217;s why we believe there needs to be an independent security layer on top of it that looks for these vulnerabilities.&#8221;<\/span><\/p>\n<p><a href=\"https:\/\/www.darkreading.com\/vulnerabilities-threats\/new-jailbreaks-manipulate-github-copilot\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Researchers have discovered two new ways to manipulate GitHub&#8217;s 
artificial<\/p>\n","protected":false},"author":12,"featured_media":7141,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[809],"class_list":["post-7140","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","tag-dark-reading"],"featured_image_urls":{"full":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/01\/new-jailbreaks-allow-users-to-manipulate-github-copilot-scaled.jpg?fit=2560%2C1440&ssl=1",2560,1440,false],"thumbnail":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/01\/new-jailbreaks-allow-users-to-manipulate-github-copilot-scaled.jpg?resize=150%2C150&ssl=1",150,150,true],"medium":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/01\/new-jailbreaks-allow-users-to-manipulate-github-copilot-scaled.jpg?fit=300%2C169&ssl=1",300,169,true],"medium_large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/01\/new-jailbreaks-allow-users-to-manipulate-github-copilot-scaled.jpg?fit=640%2C360&ssl=1",640,360,true],"large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/01\/new-jailbreaks-allow-users-to-manipulate-github-copilot-scaled.jpg?fit=640%2C360&ssl=1",640,360,true],"1536x1536":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/01\/new-jailbreaks-allow-users-to-manipulate-github-copilot-scaled.jpg?fit=1536%2C864&ssl=1",1536,864,true],"2048x2048":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/01\/new-jailbreaks-allow-users-to-manipulate-github-copilot-scaled.jpg?fit=2048%2C1152&ssl=1",2048,1152,true],"chromenews-featured":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/01\/new-jailbreaks-allow-users-to-manipulate-github-copilot-scaled.jpg?fit=1024%2C576&ssl=1",1024,576,true],"chromenews-large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/
wp-content\/uploads\/2025\/01\/new-jailbreaks-allow-users-to-manipulate-github-copilot-scaled.jpg?resize=825%2C575&ssl=1",825,575,true],"chromenews-medium":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/01\/new-jailbreaks-allow-users-to-manipulate-github-copilot-scaled.jpg?resize=590%2C410&ssl=1",590,410,true]},"author_info":{"display_name":"Dark Reading","author_link":"https:\/\/ddi.mohflo.net\/index.php\/author\/darkreading\/"},"category_info":"<a href=\"https:\/\/ddi.mohflo.net\/index.php\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","tag_info":"Uncategorized","comment_count":"0","jetpack_featured_media_url":"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/01\/new-jailbreaks-allow-users-to-manipulate-github-copilot-scaled.jpg?fit=2560%2C1440&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts\/7140","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/users\/12"}],"replies":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/comments?post=7140"}],"version-history":[{"count":0,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts\/7140\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/media\/7141"}],"wp:attachment":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/media?parent=7140"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/categories?post=7140"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/tags?post=7140"}],"curies":[{"name":"wp","href":"https
:\/\/api.w.org\/{rel}","templated":true}]}}