{"id":7331,"date":"2025-02-14T09:00:00","date_gmt":"2025-02-14T15:00:00","guid":{"rendered":"https:\/\/www.darkreading.com\/cyber-risk\/open-source-ai-models-pose-risks-of-malicious-code-vulnerabilities"},"modified":"2025-02-14T09:00:00","modified_gmt":"2025-02-14T15:00:00","slug":"open-source-ai-models-perfect-storm-for-malicious-code-vulnerabilities","status":"publish","type":"post","link":"https:\/\/ddi.mohflo.net\/index.php\/2025\/02\/14\/open-source-ai-models-perfect-storm-for-malicious-code-vulnerabilities\/","title":{"rendered":"Open Source AI Models: Perfect Storm for Malicious Code, Vulnerabilities"},"content":{"rendered":"<div class=\"media_block\"><a href=\"https:\/\/i0.wp.com\/eu-images.contentstack.com\/v3\/assets\/blt6d90778a997de1cd\/blt6f54153ad5bf556b\/67ae7562cd588ef0db105419\/robot-Zoonar_GmbH-Alamy.jpg?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/02\/open-source-ai-models-perfect-storm-for-malicious-code-vulnerabilities.jpg?w=640&#038;ssl=1\" class=\"media_thumbnail\"><\/a><\/div>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Attackers are finding more and more ways to post malicious projects to Hugging Face and other repositories for open source artificial intelligence (AI) models, while dodging the sites&#8217; security checks. 
The escalating problem underscores the need for companies pursuing internal AI projects to have robust mechanisms to detect security flaws and malicious code within their supply chains.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Hugging Face&#8217;s automated checks, for example, recently failed to detect malicious code in two AI models hosted on the repository, according to a Feb. 3 analysis published by software supply chain security firm ReversingLabs. The threat actor used a common vector \u2014 data files using the Pickle format \u2014&nbsp;with a new technique, <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/www.reversinglabs.com\/blog\/rl-identifies-malware-ml-model-hosted-on-hugging-face\">dubbed &#8220;NullifAI,&#8221;<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"> to evade detection.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">While the attacks appeared to be proofs-of-concept, their success in being hosted with a &#8220;No issue&#8221; tag shows that companies should not rely on Hugging Face&#8217;s and other repositories&#8217; safety checks for their own security, says Tomislav Pericin, chief software architect at ReversingLabs.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;You have this public repository where any developer or machine learning expert can host their own stuff, and obviously malicious 
actors abuse that,&#8221; he says. &#8220;Depending on the ecosystem, the vector is going to be slightly different, but the idea is the same: Someone&#8217;s going to host a malicious version of a thing and hope for you to inadvertently install it.&#8221;<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Companies are quickly adopting AI, and the majority are also establishing internal projects using open source AI models from repositories \u2014 such as Hugging Face, TensorFlow Hub, and PyTorch Hub. 
Overall, 61% of companies are using models from the open source ecosystem to create their own AI tools, according to a <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/newsroom.ibm.com\/2024-12-19-IBM-Study-More-Companies-Turning-to-Open-Source-AI-Tools-to-Unlock-ROI\">Morning Consult survey<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"> of 2,400 IT decision-makers sponsored by IBM.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Yet many of the components can contain executable code, leading to a variety of security risks, such as code execution, backdoors, prompt injections, and alignment issues \u2014 the latter being how well an AI model matches the intent of the developers and users.<\/span><\/p>\n<h2 class=\"ContentText ContentText_variant_h2 ContentText_align_left\" data-testid=\"content-text\" id=\"In an Insecure Pickle\">In an Insecure Pickle<\/h2>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">One significant issue is that a commonly used data format, <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/docs.python.org\/3\/library\/pickle.html\">known as a Pickle file<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">, is not secure and can be used to execute arbitrary code. 
Despite vocal warnings from security researchers, the <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_self\" href=\"https:\/\/www.darkreading.com\/cloud-security\/critical-bugs-hugging-face-ai-platform-pickle\">Pickle format<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"> continues to be used by many data scientists, says Tom Bonner, vice president of research at HiddenLayer, an AI-focused detection and response firm.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;I really hoped that we&#8217;d make enough noise about it that Pickle would&#8217;ve gone by now, but it&#8217;s not,&#8221; he says. &#8220;I&#8217;ve seen organizations compromised through machine learning models \u2014 multiple [organizations] at this point. So yeah, whilst it&#8217;s not an everyday occurrence such as ransomware or phishing campaigns, it does happen.&#8221;<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">While Hugging Face has explicit checks for Pickle files, the malicious code discovered by ReversingLabs sidestepped those checks by using a different file compression for the data. 
Other research by application security firm Checkmarx found multiple ways to bypass scanners, such as the PickleScan tool used by Hugging Face, that are meant to detect dangerous Pickle files.<\/span><\/p>\n<div readability=\"9\"><p><img data-recalc-dims=\"1\" decoding=\"async\" data-testid=\"content-image\" data-component=\"image\" class=\"ContentImage-Image ContentImage-Image_align_left\" data-src=\"https:\/\/ddi.mohflo.net\/wp-content\/uploads\/2025\/02\/open-source-ai-models-perfect-storm-for-malicious-code-vulnerabilities-1.jpg\" src=\"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/02\/open-source-ai-models-perfect-storm-for-malicious-code-vulnerabilities-1.jpg?w=640&#038;ssl=1\" loading=\"lazy\" alt=\"ReversingLabs example malicious model file\" title=\"ReversingLabs example malicious model file\"><\/p>\n<p class=\"ContentImage-Link\">Despite having malicious features, this model passes security checks on Hugging Face. Source: ReversingLabs<\/p>\n<\/div>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;PickleScan uses a blocklist which was successfully bypassed using both built-in Python dependencies,&#8221; Dor Tumarkin, director of application security research at Checkmarx, <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/checkmarx.com\/blog\/free-hugs-what-to-be-wary-of-in-hugging-face-part-4\/\">stated in the analysis<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">. 
&#8220;It is plainly vulnerable, but by using third-party dependencies such as Pandas to bypass it, even if it were to consider <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><span class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_italic\">all <\/span><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">cases baked into Python, it would still be vulnerable with very popular imports in its scope.&#8221;<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Rather than Pickle files, data science and AI teams should move to Safetensors \u2014 a library for a new data format managed by Hugging Face, EleutherAI, and Stability AI \u2014&nbsp;which <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/huggingface.co\/blog\/safetensors-security-audit\">has been audited for security<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">. 
The Safetensors format is considered much safer than the Pickle format.<\/span><\/p>\n<h2 class=\"ContentText ContentText_variant_h2 ContentText_align_left\" data-testid=\"content-text\" id=\"Deep-Seated AI Vulnerabilities\">Deep-Seated AI Vulnerabilities<\/h2>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Executable data files are not the only threats, however. Licensing is another issue: While pretrained AI models are frequently called &#8220;open source AI,&#8221; they generally do not provide all the information needed to reproduce the AI model, such as code and training data. Instead, they provide the weights generated by the training and are covered by licenses that are not always open source compatible.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Creating commercial products or services from such models can potentially result in violating the licenses, says Andrew Stiefel, a senior product manager at Endor Labs.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;There&#8217;s a lot of complexity in the licenses for models,&#8221; he says. 
&#8220;You have the actual model binary itself, the weights, the training data, all of those could have different licenses, and you need to understand what that means for your business.&#8221;<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Model alignment \u2014 how well its output aligns with the developers&#8217; and users&#8217; values \u2014 is the final wildcard. DeepSeek, for example, allows users to create malware and viruses, <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_self\" href=\"https:\/\/www.darkreading.com\/cyber-risk\/deepseek-fails-multiple-security-tests-business-use\">researchers found<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">. Other models \u2014 such as OpenAI&#8217;s o3-mini model, which boasts more stringent alignment \u2014 have <\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\"><a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_self\" href=\"https:\/\/www.darkreading.com\/application-security\/researcher-jailbreaks-openai-o3-mini\">already been jailbroken by researchers<\/a><\/span><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">These problems are unique to AI systems, and determining how to test for such weaknesses remains a fertile field for researchers, says ReversingLabs&#8217; Pericin.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span 
class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;There is already research about what kind of prompts would trigger the model to behave in an unpredictable way, divulge confidential information, or teach things that could be harmful,&#8221; he says. &#8220;That&#8217;s a whole other discipline of machine learning model safety that people are, in all honesty, mostly worried about today.&#8221;<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">Companies should make sure to understand any licenses covering the AI models they are using. In addition, they should pay attention to common signals of software safety, including the source of the model, development activity around the model, its popularity, and the operational and security risks, Endor&#8217;s Stiefel says.<\/span><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\"><span class=\"ContentText ContentText_variant_bodyNormal\" data-testid=\"content-text\">&#8220;You kind of need to manage AI models like you would any other open source dependencies,&#8221; Stiefel says. 
&#8220;They&#8217;re built by people outside of your organization and you&#8217;re bringing them in, and so that means you need to take that same holistic approach to looking at risks.&#8221;<\/span><\/p>\n<p><a href=\"https:\/\/www.darkreading.com\/cyber-risk\/open-source-ai-models-pose-risks-of-malicious-code-vulnerabilities\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Attackers are finding more and more ways to post malicious<\/p>\n","protected":false},"author":12,"featured_media":7332,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[809],"class_list":["post-7331","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","tag-dark-reading"],"featured_image_urls":{"full":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/02\/open-source-ai-models-perfect-storm-for-malicious-code-vulnerabilities-scaled.jpg?fit=2560%2C1440&ssl=1",2560,1440,false],"thumbnail":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/02\/open-source-ai-models-perfect-storm-for-malicious-code-vulnerabilities-scaled.jpg?resize=150%2C150&ssl=1",150,150,true],"medium":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/02\/open-source-ai-models-perfect-storm-for-malicious-code-vulnerabilities-scaled.jpg?fit=300%2C169&ssl=1",300,169,true],"medium_large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/02\/open-source-ai-models-perfect-storm-for-malicious-code-vulnerabilities-scaled.jpg?fit=640%2C360&ssl=1",640,360,true],"large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/02\/open-source-ai-models-perfect-storm-for-malicious-code-vulnerabilities-scaled.jpg?fit=640%2C360&ssl=1",640,360,true],"1536x1536":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/02\/open-source-ai-models-perfe
ct-storm-for-malicious-code-vulnerabilities-scaled.jpg?fit=1536%2C864&ssl=1",1536,864,true],"2048x2048":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/02\/open-source-ai-models-perfect-storm-for-malicious-code-vulnerabilities-scaled.jpg?fit=2048%2C1152&ssl=1",2048,1152,true],"chromenews-featured":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/02\/open-source-ai-models-perfect-storm-for-malicious-code-vulnerabilities-scaled.jpg?fit=1024%2C576&ssl=1",1024,576,true],"chromenews-large":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/02\/open-source-ai-models-perfect-storm-for-malicious-code-vulnerabilities-scaled.jpg?resize=825%2C575&ssl=1",825,575,true],"chromenews-medium":["https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/02\/open-source-ai-models-perfect-storm-for-malicious-code-vulnerabilities-scaled.jpg?resize=590%2C410&ssl=1",590,410,true]},"author_info":{"display_name":"Dark Reading","author_link":"https:\/\/ddi.mohflo.net\/index.php\/author\/darkreading\/"},"category_info":"<a href=\"https:\/\/ddi.mohflo.net\/index.php\/category\/uncategorized\/\" rel=\"category 
tag\">Uncategorized<\/a>","tag_info":"Uncategorized","comment_count":"0","jetpack_featured_media_url":"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2025\/02\/open-source-ai-models-perfect-storm-for-malicious-code-vulnerabilities-scaled.jpg?fit=2560%2C1440&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts\/7331","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/users\/12"}],"replies":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/comments?post=7331"}],"version-history":[{"count":0,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts\/7331\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/media\/7332"}],"wp:attachment":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/media?parent=7331"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/categories?post=7331"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/tags?post=7331"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}