{"id":4208,"date":"2024-06-26T16:23:05","date_gmt":"2024-06-26T21:23:05","guid":{"rendered":"https:\/\/www.darkreading.com\/application-security\/dangerous-ai-workaround-skeleton-key-unlocks-malicious-content"},"modified":"2024-06-26T16:23:05","modified_gmt":"2024-06-26T21:23:05","slug":"dangerous-ai-workaround-skeleton-key-unlocks-malicious-content","status":"publish","type":"post","link":"https:\/\/ddi.mohflo.net\/index.php\/2024\/06\/26\/dangerous-ai-workaround-skeleton-key-unlocks-malicious-content\/","title":{"rendered":"Dangerous AI Workaround: &#8216;Skeleton Key&#8217; Unlocks Malicious Content"},"content":{"rendered":"<div class=\"media_block\"><a href=\"https:\/\/i0.wp.com\/eu-images.contentstack.com\/v3\/assets\/blt6d90778a997de1cd\/bltd44f2c59a8ee2f46\/667c785f84ede17a90097117\/skeletonkey-jefftakespics2-Alamy.jpg?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/06\/dangerous-ai-workaround-skeleton-key-unlocks-malicious-content.jpg?w=640&#038;ssl=1\" class=\"media_thumbnail\"><\/a><\/div>\n<div><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/06\/dangerous-ai-workaround-skeleton-key-unlocks-malicious-content.jpg?w=640&#038;ssl=1\" class=\"ff-og-image-inserted\"><\/div>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">A new type of direct <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/www.darkreading.com\/cyber-risk\/forget-deepfakes-or-phishing-prompt-injection-is-genai-s-biggest-problem\" rel=\"noopener\">prompt injection attack<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"> dubbed &#8220;Skeleton Key&#8221; could allow users to bypass the ethical and safety guardrails built into generative AI models like ChatGPT, Microsoft is warning. It works by providing context around normally forbidden chatbot requests, allowing users to access offensive, harmful, or illegal content.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">For instance, if a user asked for instructions on how to make a dangerous wiper malware that could bring down power plants, most commercial chatbots would first refuse. But, after revising the prompt to note that the request is for &#8220;a safe education context with advanced researchers trained on ethics and safety&#8221; and to provide the requested information with a &#8220;warning&#8221; disclaimer, then it&#8217;s very likely that the AI would then provide the uncensored content.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">In other words, Microsoft found it was possible to convince most top AIs that a malicious request is for perfectly legal, if not noble, causes \u2014 just by telling them that the information is for &#8220;research purposes.&#8221;<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;Once guardrails are ignored, a model will not be able to determine malicious or unsanctioned requests from any other,&#8221; explained Mark Russinovich, CTO for Microsoft Azure, in a <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/06\/26\/mitigating-skeleton-key-a-new-type-of-generative-ai-jailbreak-technique\/\" rel=\"noopener\">post today<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"> on the tactic. &#8220;Because of its full bypass abilities, we have named this jailbreak technique Skeleton Key.&#8221;<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">He added, &#8220;Further, the model&#8217;s output appears to be completely unfiltered and reveals the extent of a model&#8217;s knowledge or ability to produce the requested content.&#8221;<\/span><\/p>\n<h2 class=\"ContentText ContentText_variant_h2 ContentText_align_left\" data-testid=\"content-text\" id=\"Remediation for Skeleton Key\">Remediation for Skeleton Key<\/h2>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">The technique affects multiple genAI models that Microsoft researchers tested, including Microsoft Azure AI-managed models, and those from Meta, Google Gemini, Open AI, Mistral, Anthropic, and Cohere.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;All the affected models complied fully and without censorship for [multiple forbidden] tasks,&#8221; Russinovich noted.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">The computing giant fixed the problem in Azure by introducing new prompt shields to detect and block the tactic, and making a few software updates to the large language model (LLM) that powers Azure AI. It also disclosed the issue to the other vendors affected.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Admins still need to update their models to implement any fixes that those vendors may have rolled out. And those who are building their own AI models can also use the following mitigations, according to Microsoft:<\/span><\/p>\n<div data-component=\"basic-list\" class=\"BasicList BasicList_nestedLevel_0 BasicList_variant_unordered BasicList_limited\">\n<ul data-testid=\"basic-list-unordered\" class=\"BasicList-UnorderedList\">\n<li>\n<div class=\"BasicList-ListItem BasicList-ListItem_variant_unordered\" readability=\"7\"><span data-component=\"icon\" data-name=\"Circle\" class=\"BasicList-ListIcon BasicList-ListIcon_variant_unordered\"><\/span><\/p>\n<div class=\"BasicList-Item\" readability=\"9\">\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Input filtering to identify any requests that contain harmful or malicious intent, regardless of any disclaimers that accompany them.<\/span><\/p>\n<\/div>\n<\/div>\n<\/li>\n<li>\n<div class=\"BasicList-ListItem BasicList-ListItem_variant_unordered\" readability=\"6.5\"><span data-component=\"icon\" data-name=\"Circle\" class=\"BasicList-ListIcon BasicList-ListIcon_variant_unordered\"><\/span><\/p>\n<div class=\"BasicList-Item\" readability=\"8\">\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">An additional guardrail that specifies that any attempts to undermine safety guardrail instructions should be prevented.<\/span><\/p>\n<\/div>\n<\/div>\n<\/li>\n<li>\n<div class=\"BasicList-ListItem BasicList-ListItem_variant_unordered\" readability=\"6\"><span data-component=\"icon\" data-name=\"Circle\" class=\"BasicList-ListIcon BasicList-ListIcon_variant_unordered\"><\/span><\/p>\n<div class=\"BasicList-Item\" readability=\"7\">\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">And output filtering that identifies and prevents responses that breach safety criteria.<\/span><\/p>\n<\/div>\n<\/div>\n<\/li>\n<\/ul>\n<\/div>\n<p><a href=\"https:\/\/www.darkreading.com\/application-security\/dangerous-ai-workaround-skeleton-key-unlocks-malicious-content\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A new type of direct prompt injection attack dubbed &#8220;Skeleton<\/p>\n","protected":false},"author":12,"featured_media":4209,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[809],"class_list":["post-4208","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","tag-dark-reading"],"featured_image_urls":{"full":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/06\/dangerous-ai-workaround-skeleton-key-unlocks-malicious-content-scaled.jpg?fit=2560%2C1440&ssl=1",2560,1440,false],"thumbnail":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/06\/dangerous-ai-workaround-skeleton-key-unlocks-malicious-content-scaled.jpg?resize=150%2C150&ssl=1",150,150,true],"medium":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/06\/dangerous-ai-workaround-skeleton-key-unlocks-malicious-content-scaled.jpg?fit=300%2C169&ssl=1",300,169,true],"medium_large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/06\/dangerous-ai-workaround-skeleton-key-unlocks-malicious-content-scaled.jpg?fit=640%2C360&ssl=1",640,360,true],"large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/06\/dangerous-ai-workaround-skeleton-key-unlocks-malicious-content-scaled.jpg?fit=640%2C360&ssl=1",640,360,true],"1536x1536":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/06\/dangerous-ai-workaround-skeleton-key-unlocks-malicious-content-scaled.jpg?fit=1536%2C864&ssl=1",1536,864,true],"2048x2048":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/06\/dangerous-ai-workaround-skeleton-key-unlocks-malicious-content-scaled.jpg?fit=2048%2C1152&ssl=1",2048,1152,true],"chromenews-featured":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/06\/dangerous-ai-workaround-skeleton-key-unlocks-malicious-content-scaled.jpg?fit=1024%2C576&ssl=1",1024,576,true],"chromenews-large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/06\/dangerous-ai-workaround-skeleton-key-unlocks-malicious-content-scaled.jpg?resize=825%2C575&ssl=1",825,575,true],"chromenews-medium":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/06\/dangerous-ai-workaround-skeleton-key-unlocks-malicious-content-scaled.jpg?resize=590%2C410&ssl=1",590,410,true]},"author_info":{"display_name":"Dark Reading","author_link":"https:\/\/ddi.mohflo.net\/index.php\/author\/darkreading\/"},"category_info":"<a href=\"https:\/\/ddi.mohflo.net\/index.php\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","tag_info":"Uncategorized","comment_count":"0","jetpack_featured_media_url":"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/06\/dangerous-ai-workaround-skeleton-key-unlocks-malicious-content-scaled.jpg?fit=2560%2C1440&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts\/4208","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/users\/12"}],"replies":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/comments?post=4208"}],"version-history":[{"count":0,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts\/4208\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/media\/4209"}],"wp:attachment":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/media?parent=4208"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/categories?post=4208"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/tags?post=4208"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}