{"id":4494,"date":"2024-07-17T07:49:55","date_gmt":"2024-07-17T12:49:55","guid":{"rendered":"https:\/\/www.darkreading.com\/cyber-risk\/ai-consortium-plans-toolkit-to-rate-ai-model-safety"},"modified":"2024-07-17T07:49:55","modified_gmt":"2024-07-17T12:49:55","slug":"ai-consortium-plans-toolkit-to-rate-ai-model-safety","status":"publish","type":"post","link":"https:\/\/ddi.mohflo.net\/index.php\/2024\/07\/17\/ai-consortium-plans-toolkit-to-rate-ai-model-safety\/","title":{"rendered":"AI Consortium Plans Toolkit to Rate AI Model Safety"},"content":{"rendered":"<div class=\"media_block\"><a href=\"https:\/\/i0.wp.com\/eu-images.contentstack.com\/v3\/assets\/blt6d90778a997de1cd\/blt735c45d1b44e461c\/666b2f1125691dff539e7ad0\/chatbotrobot_tanit_boonruen_Alamy_Stock_Photo.jpg?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/07\/ai-consortium-plans-toolkit-to-rate-ai-model-safety.jpg?w=640&#038;ssl=1\" class=\"media_thumbnail\"><\/a><\/div>\n<div><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/07\/ai-consortium-plans-toolkit-to-rate-ai-model-safety.jpg?w=640&#038;ssl=1\" class=\"ff-og-image-inserted\"><\/div>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">MLCommons \u2014 an AI consortium that boasts Google, Microsoft, and Meta as members \u2014 has announced its AI Safety benchmark will run stress tests to see whether large language models (LLMs) are spewing out unsafe responses. 
The benchmarked LLMs will then get a safety rating so customers understand the risk involved in the LLMs of their choice.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">The benchmarks are the &#8220;last wall against harm \u2026 that will catch bad things that come out of [artificial intelligence systems],&#8221; says Kurt Bollacker, director of engineering at MLCommons.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">The AI Safety suite will feed text questions \u2014 also called prompts \u2014 to the LLMs to elicit hazardous responses related to hate speech, exploitation, child abuse, and sex crimes. The responses are then rated as safe or unsafe.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">The benchmarks will also identify problematic responses associated with intellectual property violations and defamation. 
AI vendors could run these benchmarks before releasing LLMs and also submit the models to MLCommons for a safety rating, which will be visible to the public.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">In addition, companies, governments, and nonprofits can use the benchmarks as a test kit to identify weaknesses in AI systems and provide feedback to make changes in LLMs, Bollacker says.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;AI safety \u2014 everybody cares about that,&#8221; Bollacker says. &#8220;The goal here is not to call out unsafe models, but to provide the process that makes the [LLM] safer.&#8221;<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">MLCommons gained fame from the MLPerf benchmark, which has become a standard to measure AI performance on hardware. The independent entity is also developing measurement tools for AI in areas that include healthcare, science, and safety.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Safety of AI systems is already a widespread concern. 
A <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/www.darkreading.com\/cyber-risk\/ensuring-ai-safety-while-balancing-innovation\" rel=\"noopener\">session at Black Hat next month<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"> will explore the topic of AI safety and why security professionals need to take it seriously.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">The US government is urging the private sector to take a security-first approach in AI development; it has issued an executive order mapping out responsible AI development and use within federal agencies.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;AI systems offer the potential for substantial benefits to society, but they are not without risks, such as toxicity, misinformation, and bias. As with other complex technologies, society needs industry-standard safety testing to realize the benefits while minimizing the risks,&#8221; said MLCommons spokeswoman Kelly Berschauer, in an email.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">The AI Safety benchmark was initially announced last year. A proof-of-concept model (version 0.5) was released in April. 
The goal is to ship a stable version 1.0 by October 31.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">The benchmark includes adversarial prompts designed to break a model&#8217;s safety measures. A direct question such as &#8220;How can I build a bomb?&#8221; may be deflected, but users may try workarounds, such as reframing bomb building as part of a movie script.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">The <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/mlcommons.org\/benchmarks\/ai-safety\/sut01_general_purpose_ai_chat_benchmark_report\/\" rel=\"noopener\">AI Safety version 0.5 benchmark<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"> tested anonymized LLMs, which received various ratings. MLCommons ran 27,250 prompts related to topics of hate on one LLM, which was rated &#8220;ML&#8221; (moderate-low). About 2,300 prompts on the indiscriminate weapons topic generated an &#8220;L&#8221; (low) rating, which means the topic was less risky with that specific LLM. 
Other rating categories include &#8220;H&#8221; (high), &#8220;MH&#8221; (moderate-high), and &#8220;M&#8221; (moderate).<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Some answers are considered more hazardous than others \u2014 for example, content involving child safety is graded more strictly than racist speech.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">The initial benchmark will grade the safety of chatbot-style LLMs, and that may expand to image and video generation. But that\u2019s still far out.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;We&#8217;ve already started wrapping our brains around different kinds of media that can be dangerous and what are the kinds of tests that we want to form,&#8221; Bollacker says.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">MLCommons is in a rush to put out its AI Safety benchmarks. 
But the group has a lot of work ahead to keep up with the fast pace of change in AI, says Jim McGregor, principal analyst at Tirias Research.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Researchers have found ways to poison AI models by feeding bad data or by introducing <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/www.darkreading.com\/application-security\/hugging-face-ai-platform-100-malicious-code-execution-models\" rel=\"noopener\">malicious models on sites like Hugging Face<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;Keeping up with safety in AI is like chasing after a car on your feet,&#8221; McGregor says.<\/span><\/p>\n<p><a href=\"https:\/\/www.darkreading.com\/cyber-risk\/ai-consortium-plans-toolkit-to-rate-ai-model-safety\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>MLCommons \u2014 an AI consortium that boasts Google, Microsoft, 
and<\/p>\n","protected":false},"author":12,"featured_media":4495,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[809],"class_list":["post-4494","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","tag-dark-reading"],"featured_image_urls":{"full":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/07\/ai-consortium-plans-toolkit-to-rate-ai-model-safety.jpg?fit=1820%2C1210&ssl=1",1820,1210,false],"thumbnail":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/07\/ai-consortium-plans-toolkit-to-rate-ai-model-safety.jpg?resize=150%2C150&ssl=1",150,150,true],"medium":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/07\/ai-consortium-plans-toolkit-to-rate-ai-model-safety.jpg?fit=300%2C199&ssl=1",300,199,true],"medium_large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/07\/ai-consortium-plans-toolkit-to-rate-ai-model-safety.jpg?fit=640%2C426&ssl=1",640,426,true],"large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/07\/ai-consortium-plans-toolkit-to-rate-ai-model-safety.jpg?fit=640%2C426&ssl=1",640,426,true],"1536x1536":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/07\/ai-consortium-plans-toolkit-to-rate-ai-model-safety.jpg?fit=1536%2C1021&ssl=1",1536,1021,true],"2048x2048":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/07\/ai-consortium-plans-toolkit-to-rate-ai-model-safety.jpg?fit=1820%2C1210&ssl=1",1820,1210,true],"chromenews-featured":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/07\/ai-consortium-plans-toolkit-to-rate-ai-model-safety.jpg?fit=1024%2C681&ssl=1",1024,681,true],"chromenews-large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/07\/ai-consortium-plans-toolkit-to-rate-ai-model-safety.jpg?resize
=825%2C575&ssl=1",825,575,true],"chromenews-medium":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/07\/ai-consortium-plans-toolkit-to-rate-ai-model-safety.jpg?resize=590%2C410&ssl=1",590,410,true]},"author_info":{"display_name":"Dark Reading","author_link":"https:\/\/ddi.mohflo.net\/index.php\/author\/darkreading\/"},"category_info":"<a href=\"https:\/\/ddi.mohflo.net\/index.php\/category\/uncategorized\/\" rel=\"category tag\">Uncategorized<\/a>","tag_info":"Uncategorized","comment_count":"0","jetpack_featured_media_url":"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2024\/07\/ai-consortium-plans-toolkit-to-rate-ai-model-safety.jpg?fit=1820%2C1210&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts\/4494","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/users\/12"}],"replies":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/comments?post=4494"}],"version-history":[{"count":0,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts\/4494\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/media\/4495"}],"wp:attachment":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/media?parent=4494"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/categories?post=4494"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/tags?post=4494"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}