{"id":6607,"date":"2024-12-13T15:44:17","date_gmt":"2024-12-13T21:44:17","guid":{"rendered":"https:\/\/www.darkreading.com\/application-security\/generative-ai-breaking-tools-go-open-source"},"modified":"2024-12-13T15:44:17","modified_gmt":"2024-12-13T21:44:17","slug":"generative-ai-security-tools-go-open-source","status":"publish","type":"post","link":"https:\/\/ddi.mohflo.net\/index.php\/2024\/12\/13\/generative-ai-security-tools-go-open-source\/","title":{"rendered":"Generative AI Security Tools Go Open Source"},"content":{"rendered":"<div class=\"media_block\"><a href=\"https:\/\/i0.wp.com\/eu-images.contentstack.com\/v3\/assets\/blt6d90778a997de1cd\/blt70f7629113e2f0fb\/675b51d01f4cab9b998e3192\/Olena_Ivanova-open-source-tools-shutterstock.jpg?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/12\/generative-ai-security-tools-go-open-source.jpg?w=640&#038;ssl=1\" class=\"media_thumbnail\"><\/a><\/div>\n<div><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/12\/generative-ai-security-tools-go-open-source.jpg?w=640&#038;ssl=1\" class=\"ff-og-image-inserted\"><\/div>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Companies deploying generative artificial intelligence (GenAI) models \u2014 especially large language models (LLMs) \u2014&nbsp;should make use of the widening variety of open source tools aimed at exposing security issues, including prompt-injection attacks and jailbreaks, experts say.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">This year, academic researchers, cybersecurity consultancies, and AI security firms released a growing 
number of open source tools, including more resilient prompt injection tools, frameworks for AI red teams, and catalogs of known prompt injections. In September, for example, cybersecurity consultancy Bishop Fox released Broken Hill, a tool for bypassing the restrictions on nearly any LLM with a chat interface.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">The open source tool can be trained on a locally hosted LLM to produce prompts that can be sent to other instances of the same model, causing those instances to disobey their conditioning and guardrails, <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/bishopfox.com\/blog\/brokenhill-attack-tool-largelanguagemodels-llm\">according to Bishop Fox<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">The technique works even when companies deploy additional guardrails \u2014 typically, simpler LLMs trained to detect jailbreaks and attacks, says Derek Rush, managing senior consultant at the consultancy.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;Broken Hill is essentially able to devise a prompt that meets the criteria to determine if [a given input] is a jailbreak,&#8221; he says. 
&#8220;Then it starts changing characters and putting various suffixes onto the end of that particular prompt to find [variations] that continue to pass the guardrails until it creates a prompt that results in the secret being disclosed.&#8221;<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">The pace of innovation in LLMs and AI systems is astounding, but security is having trouble keeping up. Every few months, a new technique appears for circumventing the protections used to limit an AI system&#8217;s inputs and outputs. In July 2023, a group of researchers used a technique known as <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2307.15043\">&#8220;greedy coordinate gradients&#8221; (GCG)<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"> to devise a prompt that could bypass safeguards. In December 2023, a separate group created another method, <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/www.darkreading.com\/cyber-risk\/researchers-show-how-to-use-one-llm-to-jailbreak-another\">Tree of Attacks with Pruning (TAP)<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">, that also bypasses security protections. 
And two months ago, a less technical approach, <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/www.darkreading.com\/vulnerabilities-threats\/ai-chatbots-ditch-guardrails-deceptive-delight-cocktail\">known as Deceptive Delight<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">, was introduced that uses fictionalized relationships to fool AI chatbots into violating their system restrictions.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">The rate of innovation in attacks underscores the difficulty of securing GenAI systems, says Michael Bargury, chief technology officer and co-founder of AI security firm Zenity.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;It&#8217;s an open secret that we don&#8217;t really know how to build secure AI applications,&#8221; he says. &#8220;We are all trying, but we don&#8217;t know how to yet, and we are basically figuring that out while building them with real data and with real repercussions.&#8221;<\/span><\/p>\n<h2 class=\"ContentText ContentText_variant_h2 ContentText_align_left\" data-testid=\"content-text\" id=\"Guardrails, Jailbreaks, and PyRITs\">Guardrails, Jailbreaks, and PyRITs<\/h2>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Companies are erecting defenses to protect their valuable business data, but whether those defenses are effective remains an open question. 
Bishop Fox, for example, has several clients using programs such as PromptGuard and LlamaGuard, which are LLMs programmed to analyze prompts for validity, says Rush.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;We&#8217;re seeing a lot of clients [adopting] these various gatekeeper large language models that try to shape, in some manner, what the user submits as a sanitization mechanism, whether it&#8217;s to determine if there&#8217;s a jailbreak or perhaps it&#8217;s to determine if it&#8217;s content-appropriate,&#8221; he says. &#8220;They essentially ingest content and output a categorization of either safe or unsafe.&#8221;<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Now researchers and AI engineers are releasing tools to help companies determine whether such guardrails are actually working.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">In February 2024, for example, Microsoft released its <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/www.microsoft.com\/en-us\/security\/blog\/2024\/02\/22\/announcing-microsofts-open-automation-framework-to-red-team-generative-ai-systems\/\">Python Risk Identification Toolkit for generative AI (PyRIT)<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">, an AI penetration testing framework for companies that want to simulate attacks against LLMs or AI services. 
The toolkit allows red teams to build an extensible set of capabilities for probing various aspects of an LLM or GenAI system.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Zenity uses PyRIT regularly in its internal research, says Bargury.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;Basically, it allows you to encode a bunch of prompt-injection strategies, and it tries them out on an automated basis,&#8221; he says.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Zenity also has its own open source tool, <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/github.com\/mbrg\/power-pwn?tab=readme-ov-file\">PowerPwn<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">, a red-team toolkit for testing Azure-based cloud services and Microsoft 365. 
Zenity&#8217;s researchers used PowerPwn to <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/labs.zenity.io\/p\/phantom-references-microsoft-copilot\">find five vulnerabilities in Microsoft Copilot<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">.<\/span><\/p>\n<h2 class=\"ContentText ContentText_variant_h2 ContentText_align_left\" data-testid=\"content-text\" id=\"Mangling Prompts to Evade Detection\">Mangling Prompts to Evade Detection<\/h2>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Bishop Fox&#8217;s Broken Hill is an implementation of the GCG technique that expands on the original researchers&#8217; efforts. Broken Hill starts with a valid prompt and begins changing some of the characters to lead the LLM in a direction that is closer to the adversary&#8217;s objective of disclosing a secret, Rush says.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;We give Broken Hill that starting point, and we generally tell it where we want to end up, like perhaps the word &#8216;secret&#8217; being within the response might indicate that it would disclose the secret that we&#8217;re looking for,&#8221; he says.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">The open source tool currently works on more than two dozen GenAI models, according to <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a 
class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/github.com\/BishopFox\/BrokenHill\">its GitHub page<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Companies would do well to use Broken Hill, PyRIT, PowerPwn, and other available tools to explore their AI applications&#8217; vulnerabilities because the systems will likely always have weaknesses, says Zenity&#8217;s Bargury.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;When you give AI data \u2014 that data is an attack vector \u2014 because anybody that can influence that data can now take over your AI if they are able to do prompt injection and perform jailbreaking,&#8221; he says. 
&#8220;So we are in a situation where, if your AI is useful, then it means it&#8217;s vulnerable because in order to be useful, we need to feed it data.&#8221;<\/span><\/p>\n<p><a href=\"https:\/\/www.darkreading.com\/application-security\/generative-ai-breaking-tools-go-open-source\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Companies deploying generative artificial intelligence (GenAI) models \u2014 especially large<\/p>\n","protected":false},"author":12,"featured_media":6608,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[809],"class_list":["post-6607","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","tag-dark-reading"],"featured_image_urls":{"full":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/12\/generative-ai-security-tools-go-open-source.jpg?fit=1920%2C1080&ssl=1",1920,1080,false],"thumbnail":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/12\/generative-ai-security-tools-go-open-source.jpg?resize=150%2C150&ssl=1",150,150,true],"medium":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/12\/generative-ai-security-tools-go-open-source.jpg?fit=300%2C169&ssl=1",300,169,true],"medium_large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/12\/generative-ai-security-tools-go-open-source.jpg?fit=640%2C360&ssl=1",640,360,true],"large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/12\/generative-ai-security-tools-go-open-source.jpg?fit=640%2C360&ssl=1",640,360,true],"1536x1536":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/12\/generative-ai-security-tools-go-open-source.jpg?fit=1536%2C864&ssl=1",1536,864,true],"2048x2048":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/12\/generative-ai-security-tools-go-open-sou
rce.jpg?fit=1920%2C1080&ssl=1",1920,1080,true],"chromenews-featured":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/12\/generative-ai-security-tools-go-open-source.jpg?fit=1024%2C576&ssl=1",1024,576,true],"chromenews-large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/12\/generative-ai-security-tools-go-open-source.jpg?resize=825%2C575&ssl=1",825,575,true],"chromenews-medium":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/12\/generative-ai-security-tools-go-open-source.jpg?resize=590%2C410&ssl=1",590,410,true]},"author_info":{"display_name":"Dark Reading","author_link":"https:\/\/ddi.mohflo.net\/index.php\/author\/darkreading\/"},"category_info":"<a href=\"https:\/\/ddi.mohflo.net\/index.php\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","tag_info":"Uncategorized","comment_count":"0","jetpack_featured_media_url":"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/12\/generative-ai-security-tools-go-open-source.jpg?fit=1920%2C1080&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts\/6607","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/users\/12"}],"replies":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/comments?post=6607"}],"version-history":[{"count":0,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts\/6607\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/media\/6608"}],"wp:attachment":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/media?parent=6607"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\
/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/categories?post=6607"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/tags?post=6607"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}