{"id":4795,"date":"2024-08-09T12:53:19","date_gmt":"2024-08-09T17:53:19","guid":{"rendered":"https:\/\/www.darkreading.com\/cybersecurity-operations\/antrhopic-expanding-our-model-safety-bug-bounty-program"},"modified":"2024-08-09T12:53:19","modified_gmt":"2024-08-09T17:53:19","slug":"anthropic-expanding-our-model-safety-bug-bounty-program","status":"publish","type":"post","link":"https:\/\/ddi.mohflo.net\/index.php\/2024\/08\/09\/anthropic-expanding-our-model-safety-bug-bounty-program\/","title":{"rendered":"Anthropic: Expanding Our Model Safety Bug Bounty Program"},"content":{"rendered":"<div class=\"media_block\"><a href=\"https:\/\/i0.wp.com\/eu-images.contentstack.com\/v3\/assets\/blt6d90778a997de1cd\/blt850556f866500627\/654a5a8e05eb4d040a046894\/325351_DR23_Graphics_General_Large_Text_v1.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/08\/anthropic-expanding-our-model-safety-bug-bounty-program.png?w=640&#038;ssl=1\" class=\"media_thumbnail\"><\/a><\/div>\n<div><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/08\/anthropic-expanding-our-model-safety-bug-bounty-program.png?w=640&#038;ssl=1\" class=\"ff-og-image-inserted\"><\/div>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><span class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_bold\">PRESS RELEASE<\/span><\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">The rapid progression of AI model capabilities demands an equally swift advancement in safety protocols. As we work on developing the next generation of our AI safeguarding systems, we\u2019re expanding our bug bounty program to introduce a new initiative focused on finding flaws in the mitigations we use to prevent misuse of our models.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Bug bounty programs play a crucial role in strengthening the security and safety of technology systems. Our new initiative is focused on identifying and mitigating universal jailbreak attacks. These are exploits that could allow consistent bypassing of AI safety guardrails across a wide range of areas. By targeting universal jailbreaks, we aim to address some of the most significant vulnerabilities in critical, high-risk domains such as&nbsp;<\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.anthropic.com\/news\/frontier-threats-red-teaming-for-ai-safety\">CBRN<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&nbsp;(chemical, biological, radiological, and nuclear) and cybersecurity.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">We\u2019re eager to work with the global community of security and safety researchers on this effort and invite interested applicants to apply to our program and assess our new safeguards.<\/span><\/p>\n<h3 class=\"ContentText ContentText_variant_h3 ContentText_align_left\" data-testid=\"content-text\" id=\"Our approach\">Our approach<\/h3>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">To date, we\u2019ve operated an invite-only bug bounty program in partnership with&nbsp;<\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.hackerone.com\/\">HackerOne<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"> that rewards researchers for identifying model safety issues in our publicly released AI models. The bug bounty initiative we\u2019re announcing today will test our next-generation system we&#8217;ve developed for AI safety mitigations, which we haven\u2019t deployed publicly yet. Here\u2019s how it will work:<\/span><\/p>\n<div data-component=\"basic-list\" class=\"BasicList BasicList_nestedLevel_0 BasicList_variant_unordered BasicList_limited\">\n<ul data-testid=\"basic-list-unordered\" class=\"BasicList-UnorderedList\">\n<li>\n<div class=\"BasicList-ListItem BasicList-ListItem_variant_unordered\" readability=\"7.5\"><span data-component=\"icon\" data-name=\"Circle\" class=\"BasicList-ListIcon BasicList-ListIcon_variant_unordered\"><\/span><\/p>\n<div class=\"BasicList-Item\" readability=\"10\">\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><span class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_bold\">Early Access:<\/span><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&nbsp;Participants will be given early access to test our latest safety mitigation system before its public deployment. As part of this, participants will be challenged to identify potential vulnerabilities or ways to circumvent our safety measures in a controlled environment.<\/span><\/p>\n<\/div>\n<\/div>\n<\/li>\n<li>\n<div class=\"BasicList-ListItem BasicList-ListItem_variant_unordered\" readability=\"12.421805183199\"><span data-component=\"icon\" data-name=\"Circle\" class=\"BasicList-ListIcon BasicList-ListIcon_variant_unordered\"><\/span><\/p>\n<div class=\"BasicList-Item\" readability=\"19.874888293119\">\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><span class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_bold\">Program Scope:<\/span><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&nbsp;We&#8217;re offering bounty rewards up to $15,000 for novel, universal jailbreak attacks that could expose vulnerabilities in critical, high risk domains such as CBRN (chemical, biological, radiological, and nuclear) and cybersecurity. As we\u2019ve&nbsp;<\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.anthropic.com\/research\/many-shot-jailbreaking\">written<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&nbsp;about previously, a jailbreak attack in AI refers to a method used to circumvent an AI system&#8217;s built-in safety measures and ethical guidelines, allowing a user to elicit responses or behaviors from the AI that would typically be restricted or prohibited. A universal jailbreak is a type of vulnerability in AI systems that allows a user to consistently bypass the safety measures across a wide range of topics. Identifying and mitigating universal jailbreaks is the key focus of this bug bounty initiative. If exploited, these vulnerabilities could have far-reaching consequences across a variety of harmful, unethical or dangerous areas. The jailbreak will be defined as universal if it can get the model to answer a defined number of specific harmful questions. Detailed instructions and feedback will be shared with the participants of the program.<\/span><\/p>\n<\/div>\n<\/div>\n<\/li>\n<\/ul>\n<\/div>\n<h3 class=\"ContentText ContentText_variant_h3 ContentText_align_left\" data-testid=\"content-text\" id=\"Get involved\">Get involved<\/h3>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">This model safety bug bounty initiative will begin as invite-only in partnership with HackerOne. While it will be invite-only to start, we plan to expand this initiative more broadly in the future. This initial phase will allow us to refine our processes and respond to submissions with timely and constructive feedback. If you&#8217;re an experienced AI security researcher or have demonstrated expertise in identifying jailbreaks in language models, we encourage you to apply for an invitation through our&nbsp;<\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/forms.gle\/rSKrtJkXMcMCtWYcA\">application form<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&nbsp;<\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><span class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_bold\">by Friday, August 16.<\/span><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&nbsp;We will follow up with selected applicants in the fall.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">In the meantime, we actively seek any reports on model safety concerns to continually improve our current systems. If you&#8217;ve identified a potential safety issue in our current systems, please report it to <a href=\"https:\/\/www.darkreading.com\/cdn-cgi\/l\/email-protection\" class=\"__cf_email__\" data-cfemail=\"5e2b2d3b2c2d3f383b2a271e3f302a362c312e373d703d3133\">[email&nbsp;protected]<\/a> with sufficient details for us to replicate the issue. For more information, please refer to our&nbsp;<\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.anthropic.com\/responsible-disclosure-policy\">Responsible Disclosure Policy.<\/a><\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">This initiative aligns with commitments we\u2019ve signed onto with other AI companies for developing responsible AI such as the&nbsp;<\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.whitehouse.gov\/wp-content\/uploads\/2023\/09\/Voluntary-AI-Commitments-September-2023.pdf\">Voluntary AI Commitments<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&nbsp;announced by the White House and the&nbsp;<\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.mofa.go.jp\/files\/100573473.pdf\">Code of Conduct for Organizations Developing Advanced AI Systems<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&nbsp;developed through the G7 Hiroshima Process. Our goal is to help accelerate progress in mitigating universal jailbreaks and strengthen AI safety in high-risk areas. If you have expertise in this area, please join us in this crucial work. Your contributions could play a key role in ensuring that as AI capabilities advance, our safety measures keep pace.<\/span><\/p>\n<p><a href=\"https:\/\/www.darkreading.com\/cybersecurity-operations\/antrhopic-expanding-our-model-safety-bug-bounty-program\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>PRESS RELEASE The rapid progression of AI model capabilities demands<\/p>\n","protected":false},"author":12,"featured_media":4796,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[809],"class_list":["post-4795","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","tag-dark-reading"],"featured_image_urls":{"full":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/08\/anthropic-expanding-our-model-safety-bug-bounty-program.png?fit=3840%2C2160&ssl=1",3840,2160,false],"thumbnail":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/08\/anthropic-expanding-our-model-safety-bug-bounty-program.png?resize=150%2C150&ssl=1",150,150,true],"medium":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/08\/anthropic-expanding-our-model-safety-bug-bounty-program.png?fit=300%2C169&ssl=1",300,169,true],"medium_large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/08\/anthropic-expanding-our-model-safety-bug-bounty-program.png?fit=640%2C360&ssl=1",640,360,true],"large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/08\/anthropic-expanding-our-model-safety-bug-bounty-program.png?fit=640%2C360&ssl=1",640,360,true],"1536x1536":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/08\/anthropic-expanding-our-model-safety-bug-bounty-program.png?fit=1536%2C864&ssl=1",1536,864,true],"2048x2048":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/08\/anthropic-expanding-our-model-safety-bug-bounty-program.png?fit=2048%2C1152&ssl=1",2048,1152,true],"chromenews-featured":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/08\/anthropic-expanding-our-model-safety-bug-bounty-program.png?fit=1024%2C576&ssl=1",1024,576,true],"chromenews-large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/08\/anthropic-expanding-our-model-safety-bug-bounty-program.png?resize=825%2C575&ssl=1",825,575,true],"chromenews-medium":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/08\/anthropic-expanding-our-model-safety-bug-bounty-program.png?resize=590%2C410&ssl=1",590,410,true]},"author_info":{"display_name":"Dark Reading","author_link":"https:\/\/ddi.mohflo.net\/index.php\/author\/darkreading\/"},"category_info":"<a href=\"https:\/\/ddi.mohflo.net\/index.php\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","tag_info":"Uncategorized","comment_count":"0","jetpack_featured_media_url":"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/08\/anthropic-expanding-our-model-safety-bug-bounty-program.png?fit=3840%2C2160&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts\/4795","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/users\/12"}],"replies":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/comments?post=4795"}],"version-history":[{"count":0,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts\/4795\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/media\/4796"}],"wp:attachment":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/media?parent=4795"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/categories?post=4795"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/tags?post=4795"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}