{"id":2242,"date":"2023-12-21T21:07:18","date_gmt":"2023-12-21T21:07:18","guid":{"rendered":"https:\/\/www.dnsfilter.com\/blog\/thoughts-on-new-gen-ai-category"},"modified":"2023-12-21T21:07:18","modified_gmt":"2023-12-21T21:07:18","slug":"mid-winter-nights-hallucinations-some-thoughts-on-our-new-genai-category","status":"publish","type":"post","link":"https:\/\/ddi.mohflo.net\/index.php\/2023\/12\/21\/mid-winter-nights-hallucinations-some-thoughts-on-our-new-genai-category\/","title":{"rendered":"Mid-Winter Nights Hallucinations: Some Thoughts on Our New GenAI Category"},"content":{"rendered":"<div><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/ddi.mohflo.net\/wp-content\/uploads\/2023\/12\/mid-winter-nights-hallucinations-some-thoughts-on-our-new-genai-category.jpg?w=640&#038;ssl=1\" class=\"ff-og-image-inserted\"><\/div>\n<p><span>AI, LLM, generative content, NLP, big data, neural processing, machine learning, GPT. In 2023 it&#8217;s undeniable that these were some of the most-heard terms from businesses, news outlets, and the social media sphere. Ultimately, this alphabet soup can mean a great deal or very little and, as is often the case, the internet leaned into the trend.<\/span><span><br \/><\/span><span><br \/><\/span><span>Sites popped up everywhere, some reputable and others less so, promising cyberpunk profile pictures, curated dating advice, a quick summary of that book you swore up and down you\u2019d read for book club, business propositions, tweets, essays, marketing releases, and code. The list of capabilities is dizzying when you get right down to it. An abundance of tools, ready for you at the drop of a hat. 
But at what cost?<\/span><\/p>\n<p><!--more--><\/p>\n<h2>Generative AI Content Filtering<\/h2>\n<p>DNSFilter recently added a Generative AI (GenAI) content category, and we want to take some time to discuss it, along with some security considerations for the sites that might fall under it.<\/p>\n<p>First, let\u2019s define what you\u2019ll find in the GenAI category when you toggle it on. We\u2019ve focused primarily on the free and open Generative AI tools that will generate content at a prompt, with a few extra chatbots from more unusual places. \u201cBut my [insert legitimate tool]!\u201d I can hear you say\u2014worry not! Generally, the paid tools that you\u2019ve integrated into your workflows and that your IT or security department has approved are not likely to be found in this category.<\/p>\n<h2>Why focus on free Generative AI tools?<\/h2>\n<p>Let\u2019s look at some basics of training Artificial Intelligence to do what you want it to do. To keep it simple, we will not be touching on the various algorithms, data science and statistical math involved in getting these things to work.<\/p>\n<p>Picture an untrained AI as a freshman in college\u2014let\u2019s call this program \u201cBrian.\u201d&nbsp;<\/p>\n<p>Brian has gone through high school, gotten the basics of some things down, and can ballpark some concepts from \u201ccollege level\u201d questions. This is your first AI framework that you\u2019ve coded to answer a question. Now this question can frankly be anything, and in this scenario we\u2019ve made Brian passionate about writing and understanding poems, so Brian majors in English Literature. <\/p>\n<p>For Brian to write their own poems and understand underlying themes, they\u2019re going to need to spend <em>a lot<\/em> of time in the library reading and interpreting other authors&#8217; works. Doubly so if they want to write in a specific style like iambic pentameter or haiku. 
And then Brian is going to try, and fail, and try, and fail\u2014again and again. Until their professor says \u201cclose\u201d and Brian gets closer to being a proficient haiku writer. Rinse and repeat until 9 out of 10 times Brian can produce a haiku the professor is satisfied with. <\/p>\n<p><em>Good AI training<\/em><\/p>\n<p><em>Is incredibly complex<\/em><\/p>\n<p><em>It snows on Mt Fuji<\/em><\/p>\n<h2>Show me the (copyrighted) data<\/h2>\n<p>Now that you have an idea of how it generally (sometimes) works: What\u2019s the problem? Part of the issue is how much data Brian needs to get even vaguely close to writing a haiku. Where does that data come from? How is it sourced? Maybe (or even probably) this data contains information you didn\u2019t mean to include, such as phone numbers, addresses, and financial information.<\/p>\n<p>In most cases, free tools take data from anywhere they can get their hands on it, and more often than not your prompts are being used to train that AI even more. Public data, paid third-party data, general web scraping, and even libraries of images on the internet are commonly used as training sources. It\u2019s truly a case of \u201conce it\u2019s on the internet, it\u2019s fair game\u201d at its purest.<\/p>\n<p>You may find yourself asking, \u201cIsn\u2019t that piracy?\u201d or \u201cIsn\u2019t that copyright infringement?\u201d&nbsp;<\/p>\n<p><span>Those are exactly the questions lawmakers are trying to answer, and the boundaries they are trying to draw. <\/span><span><br \/><\/span><span><br \/><\/span><span>Another risk with open AI tools in your work environment is that there is zero visibility into where the training data came from. There is a very low chance the output will be mind-blowingly original. More likely than not it is going to feel akin to an off-brand knockoff. 
There is an old trope that humanity has been telling the same seven stories over and over since the dawn of time. AI-generated content takes this quite literally. Mathematically, AI is just finding another variation on the themes that were fed to it in the first place.<\/span><\/p>\n<h2><span>The very real security risks of free Generative AI tools<\/span><\/h2>\n<p>Copyright risks aside, there are genuine security risks in using free Generative AI tools or allowing their use on your network. Remember how indiscriminately some AI engines consume content, without regard for permissions. If you were to connect one of these tools to an internal database, you have to assume that your database is now part of an external training set. This puts you at risk for leaks of your proprietary information, and in some malicious cases the owner of the tool may \u201crun off\u201d with that data itself.&nbsp;<\/p>\n<p>By asking ChatGPT to repeat a word indefinitely, researchers recently discovered that even it can be prone to leaks and faults. After a period of time, an error would cause it to begin displaying unrelated and sensitive data. This exploit is now against its Terms of Service. (For more information on this fascinating bug see <a href=\"https:\/\/www.darkreading.com\/cyber-risk\/researchers-simple-technique-extract-chatgpt-training-data\"><span>here<\/span><\/a>.)<\/p>\n<p>On the other end of malicious use, these tools are not security bastions\u2014they can be just as vulnerable to attacks and exploits as any other tool. It just so happens that their free, open nature increases those odds. When it\u2019s all right there, an interface on a page or a Git repo away, it tends to be open season to try to get it to break, bend, or leak.&nbsp;<\/p>\n<p>Those risks don\u2019t even take into consideration AIs whose purpose is malicious from inception. 
There have been dummy bots that can propagate malware, generate phishing emails, give flat-out misinformation, poison the well for other chatbots, participate in crypto scams and background mining, and commit identity or credential theft, all while imitating a \u201cnormal\u201d bot experience. The list goes on. <\/p>\n<p>Overall, we feel it is a net positive to toggle on the new GenAI content category. Covering your bases against both unintentional leaks and malicious behavior can\u2019t be a bad thing. It\u2019s a new category, so it will keep improving over time. The discussion around AI and generative content is also constantly evolving and is a moving target for both business and security professionals. We may as well give the best effort we can now to prevent the issues of tomorrow.&nbsp;<\/p>\n<p>Found a tool we overlooked, or want to put one up for checking? Send us a message and we\u2019ll look into it.&nbsp;<\/p>\n<p>That\u2019s all from the Intelligence Desk today. Thanks for reading.&nbsp;<\/p>\n<p><a href=\"https:\/\/www.dnsfilter.com\/blog\/thoughts-on-new-gen-ai-category\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI, LLM, generative content, NLP, big data, neural processing, 
machine<\/p>\n","protected":false},"author":8,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[235,588,222],"tags":[236,591,230],"class_list":["post-2242","post","type-post","status-publish","format-standard","hentry","category-ai","category-content-filtering","category-featured","tag-ai","tag-content-filtering","tag-featured"],"featured_image_urls":{"full":"","thumbnail":"","medium":"","medium_large":"","large":"","1536x1536":"","2048x2048":"","chromenews-featured":"","chromenews-large":"","chromenews-medium":""},"author_info":{"display_name":"DNSFilter","author_link":"https:\/\/ddi.mohflo.net\/index.php\/author\/dnsfilter\/"},"category_info":"<a href=\"https:\/\/ddi.mohflo.net\/index.php\/category\/ai\/\" rel=\"category tag\">AI<\/a> <a href=\"https:\/\/ddi.mohflo.net\/index.php\/category\/content-filtering\/\" rel=\"category tag\">Content Filtering<\/a> <a href=\"https:\/\/ddi.mohflo.net\/index.php\/category\/featured\/\" rel=\"category 
tag\">Featured<\/a>","tag_info":"Featured","comment_count":"0","jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts\/2242","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/comments?post=2242"}],"version-history":[{"count":0,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/posts\/2242\/revisions"}],"wp:attachment":[{"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/media?parent=2242"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/categories?post=2242"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ddi.mohflo.net\/index.php\/wp-json\/wp\/v2\/tags?post=2242"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}