{"id":5966,"date":"2024-10-28T14:49:29","date_gmt":"2024-10-28T19:49:29","guid":{"rendered":"https:\/\/www.darkreading.com\/application-security\/chatgpt-manipulated-hex-code"},"modified":"2024-10-28T14:49:29","modified_gmt":"2024-10-28T19:49:29","slug":"mozilla-chatgpt-can-be-manipulated-using-hex-code","status":"publish","type":"post","link":"https:\/\/ddi.mohflo.net\/index.php\/2024\/10\/28\/mozilla-chatgpt-can-be-manipulated-using-hex-code\/","title":{"rendered":"Mozilla: ChatGPT Can Be Manipulated Using Hex Code"},"content":{"rendered":"<div class=\"media_block\"><a href=\"https:\/\/i0.wp.com\/eu-images.contentstack.com\/v3\/assets\/blt6d90778a997de1cd\/blt70292916700b315f\/671fb7d080eaf543b21bf0b4\/Hex-Igor_Stevanovic-Alamy.jpg?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/10\/mozilla-chatgpt-can-be-manipulated-using-hex-code.jpg?w=640&#038;ssl=1\" class=\"media_thumbnail\"><\/a><\/div>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">A new prompt-injection technique could allow anyone to bypass the safety guardrails in OpenAI&#8217;s most advanced language learning model (LLM).<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">GPT-4o, released May 13, is faster, more efficient, and more multifunctional than any of the previous models underpinning <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/www.darkreading.com\/application-security\/100k-infected-devices-leak-chatgpt-accounts-dark-web\" rel=\"noopener\">ChatGPT<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">. It can process multiple different forms of input data in dozens of languages, then spit out a response in milliseconds. It can engage in real-time conversations, analyze live camera feeds, and maintain an understanding of context over extended conversations with users. When it comes to user-generated content management, however, GPT-4o is in some ways still archaic.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Marco Figueroa, generative AI (GenAI) bug-bounty programs manager at Mozilla, demonstrated in a new report how bad actors can leverage the power of GPT-4o while skipping over its guardrails. The key is to essentially distract the model by <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/0din.ai\/blog\/chatgpt-4o-guardrail-jailbreak-hex-encoding-for-writing-cve-exploits\" rel=\"noopener\">encoding malicious instructions in an unorthodox format<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">, and spread them out in distinct steps.<\/span><\/p>\n<h2 class=\"ContentText ContentText_variant_h2 ContentText_align_left\" data-testid=\"content-text\" id=\"Tricking ChatGPT Into Writing Exploit Code\">Tricking ChatGPT Into Writing Exploit Code<\/h2>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">To prevent malicious abuse, GPT-4o analyzes user inputs for any signs of bad language, instructions with ill intent, etc.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">But at the end of the day, Figueroa says, &#8220;It&#8217;s just word filters. That&#8217;s what I&#8217;ve seen through experience, and we know exactly how to bypass these filters.&#8221;<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">For example, he says, &#8220;We can modify how something&#8217;s spelled out \u2014 break it up in certain ways \u2014 and the LLM interprets it.&#8221; GPT-4o might not reject a malicious instruction if it&#8217;s presented with a spelling or phrasing that doesn&#8217;t accord with typical natural language.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Figuring out <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/www.darkreading.com\/cyber-risk\/forget-deepfakes-or-phishing-prompt-injection-is-genai-s-biggest-problem\" rel=\"noopener\">the exact right way to present information<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"> in order to dupe state-of-the-art AI, though, requires lots of creative brain power. It turns out that there&#8217;s a much simpler method for bypassing GPT-4o&#8217;s content filtering: encoding instructions in a format other than natural language.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">To demonstrate, Figueroa arranged an experiment with the goal of getting ChatGPT to do something it otherwise shouldn&#8217;t: write exploit code for a software vulnerability. He picked CVE-2024-41110, a bypass for authorization plug-ins in Docker that earned a &#8220;critical&#8221; 9.9 out of 10 rating in the Common Vulnerability Scoring System (CVSS) this summer.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">To trick the model, he encoded his malicious input in hexadecimal format, and provided a set of instructions for decoding it. GPT-4o took that input \u2014 a long series of digits and letters A through F \u2014 and followed those instructions, ultimately decoding the message as an instruction to research CVE-2024-41110 and write a Python exploit for it. To make it less likely that the program would make a fuss over that instruction, he used some leet speak, asking for an &#8220;3xploit,&#8221; instead of an &#8220;exploit.&#8221;<\/span><\/p>\n<div><img data-recalc-dims=\"1\" decoding=\"async\" data-testid=\"content-image\" data-component=\"image\" class=\"ContentImage-Image ContentImage-Image_align_left\" data-src=\"https:\/\/ddi.mohflo.net\/wp-content\/uploads\/2024\/10\/mozilla-chatgpt-can-be-manipulated-using-hex-code.png\" src=\"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/10\/mozilla-chatgpt-can-be-manipulated-using-hex-code.png?w=640&#038;ssl=1\" loading=\"lazy\" alt title><\/p>\n<p class=\"ContentImage-Link\">Source: Mozilla<\/p>\n<\/div>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">In a minute flat, ChatGPT generated a working exploit similar to, but not exactly like, <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/github.com\/skilfoy\/CVE-2024-4323-Exploit-POC\/blob\/main\/poc.py\" rel=\"noopener\">another PoC already published to GitHub<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">. Then, as a bonus, it attempted to execute the code against itself. &#8220;There wasn&#8217;t any instruction that specifically said to execute it. I just wanted to print it out. I didn&#8217;t even know why it went ahead and did that,&#8221; Figueroa says.<\/span><\/p>\n<h2 class=\"ContentText ContentText_variant_h2 ContentText_align_left\" data-testid=\"content-text\" id=\"What's Missing in GPT-4o?\">What&#8217;s Missing in GPT-4o?<\/h2>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">It&#8217;s not just that GPT-4o is getting distracted by decoding, according to Figueroa, but that it&#8217;s in some sense missing the forest for the trees \u2014 a phenomenon that has been <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/www.darkreading.com\/vulnerabilities-threats\/ai-chatbots-ditch-guardrails-deceptive-delight-cocktail\" rel=\"noopener\">documented in other prompt-injection techniques<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"> lately.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;The language model is designed to follow instructions step-by-step, but lacks deep context awareness to evaluate the safety of each individual step in the broader context of its ultimate goal,&#8221; he wrote in the report. The model analyzes each input \u2014 which, on its own, doesn&#8217;t immediately read as harmful \u2014 but not what the inputs produce in sum. Rather than stop and think about how instruction one bears on instruction two, it just charges ahead.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;This compartmentalized execution of tasks allows attackers to exploit the model&#8217;s efficiency at following instructions without deeper analysis of the overall outcome,&#8221; according to Figueroa.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">If this is the case, ChatGPT will not only need to improve how it handles encoded information but also develop a kind of broader context around instructions split into distinct steps.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">To Figueroa, though, OpenAI appears to have been valuing innovation at the cost of security when developing its programs. &#8220;To me, they don&#8217;t care. It just feels like that,&#8221; he says. By contrast, he&#8217;s had much more trouble trying the same jailbreaking tactics against models by Anthropic, another prominent AI company founded by former OpenAI employees. &#8220;Anthropic has the strongest security because they have built both a prompt firewall [for analyzing inputs] and response filter [for analyzing outputs], so this becomes 10 times more difficult,&#8221; he explains.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Dark Reading is awaiting comment from OpenAI on this story.<\/span><\/p>\n<p><a href=\"https:\/\/www.darkreading.com\/application-security\/chatgpt-manipulated-hex-code\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A new prompt-injection technique could allow anyone to bypass the<\/p>\n","protected":false},"author":12,"featured_media":5967,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[809],"class_list":["post-5966","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","tag-dark-reading"],"featured_image_urls":{"full":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/10\/mozilla-chatgpt-can-be-manipulated-using-hex-code-scaled.jpg?fit=2560%2C1440&ssl=1",2560,1440,false],"thumbnail":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/10\/mozilla-chatgpt-can-be-manipulated-using-hex-code-scaled.jpg?resize=150%2C150&ssl=1",150,150,true],"medium":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/10\/mozilla-chatgpt-can-be-manipulated-using-hex-code-scaled.jpg?fit=300%2C169&ssl=1",300,169,true],"medium_large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/10\/mozilla-chatgpt-can-be-manipulated-using-hex-code-scaled.jpg?fit=640%2C360&ssl=1",640,360,true],"large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/10\/mozilla-chatgpt-can-be-manipulated-using-hex-code-scaled.jpg?fit=640%2C360&ssl=1",640,360,true],"1536x1536":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/10\/mozilla-chatgpt-can-be-manipulated-using-hex-code-scaled.jpg?fit=1536%2C864&ssl=1",1536,864,true],"2048x2048":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/10\/mozilla-chatgpt-can-be-manipulated-using-hex-code-scaled.jpg?fit=2048%2C1152&ssl=1",2048,1152,true],"chromenews-featured":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/10\/mozilla-chatgpt-can-be-manipulated-using-hex-code-scaled.jpg?fit=1024%2C576&ssl=1",1024,576,true],"chromenews-large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/10\/mozilla-chatgpt-can-be-manipulated-using-hex-code-scaled.jpg?resize=825%2C575&ssl=1",825,575,true],"chromenews-medium":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/10\/mozilla-chatgpt-can-be-manipulated-using-hex-code-scaled.jpg?resize=590%2C410&ssl=1",590,410,true]},"author_info":{"display_name":"Dark Reading","author_link":"https:\/\/ddi.mohflo.net\/index.php\/author\/darkreading\/"},"category_info":"<a href=\"https:\/\/ddi.mohflo.net\/index.php\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","tag_info":"Uncategorized","comment_count":"0","jetpack_featured_media_url":"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/10\/mozilla-chatgpt-can-be-manipulated-using-hex-code-scaled.jpg?fit=2560%2C1440&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts\/5966","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/users\/12"}],"replies":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/comments?post=5966"}],"version-history":[{"count":0,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts\/5966\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/media\/5967"}],"wp:attachment":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/media?parent=5966"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/categories?post=5966"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/tags?post=5966"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}