{"id":281645,"date":"2024-12-02T12:00:30","date_gmt":"2024-12-02T12:00:30","guid":{"rendered":"https:\/\/www.transcend.org\/tms\/?p=281645"},"modified":"2024-11-30T07:15:14","modified_gmt":"2024-11-30T07:15:14","slug":"meta-powered-military-chatbot-advertised-giving-worthless-advice-on-airstrikes","status":"publish","type":"post","link":"https:\/\/www.transcend.org\/tms\/2024\/12\/meta-powered-military-chatbot-advertised-giving-worthless-advice-on-airstrikes\/","title":{"rendered":"Meta-Powered Military Chatbot Advertised Giving \u201cWorthless\u201d Advice on Airstrikes"},"content":{"rendered":"<div id=\"attachment_281647\" style=\"width: 510px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2024\/11\/Meta-Defense-LLama-military-ai-chatgpt.webp\" ><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-281647\" class=\"wp-image-281647\" src=\"https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2024\/11\/Meta-Defense-LLama-military-ai-chatgpt-1024x512.webp\" alt=\"\" width=\"500\" height=\"250\" srcset=\"https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2024\/11\/Meta-Defense-LLama-military-ai-chatgpt-1024x512.webp 1024w, https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2024\/11\/Meta-Defense-LLama-military-ai-chatgpt-300x150.webp 300w, https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2024\/11\/Meta-Defense-LLama-military-ai-chatgpt-768x384.webp 768w, https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2024\/11\/Meta-Defense-LLama-military-ai-chatgpt-1536x768.webp 1536w, https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2024\/11\/Meta-Defense-LLama-military-ai-chatgpt.webp 2000w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/a><p id=\"caption-attachment-281647\" class=\"wp-caption-text\">Illustration: The Intercept \/ Photo: JACK GUEZ\/AFP via Getty Images<\/p><\/div>\n<blockquote><p><em>The marketing of a new military tech tool powered by Meta\u2019s 
artificial intelligence is \u201cirresponsible\u201d and \u201cclumsy,\u201d experts said. <\/em><\/p><\/blockquote>\n<p><em>24 Nov 2024 <\/em>&#8211; <span class=\"has-underline\">Meta\u2019s in-house<\/span> ChatGPT competitor is being marketed unlike anything that\u2019s ever come out of the social media giant before: a convenient tool for planning airstrikes.<\/p>\n<div class=\"entry-content__content\">\n<p>As it has invested billions into developing machine learning technology it hopes can outpace OpenAI and other competitors, Meta has pitched its flagship large language model<ins>,<\/ins> Llama<ins>,<\/ins> as a handy way of <a href=\"https:\/\/about.fb.com\/news\/2024\/04\/meta-ai-assistant-built-with-llama-3\/\"  target=\"_blank\" rel=\"noopener noreferrer\" aria-describedby=\"targetBlankDescription\">planning vegan dinners<\/a> or weekends away with friends. A provision in Llama\u2019s terms of service previously prohibited military uses, but Meta announced on November 4 that it was joining its chief rivals and getting into the business of war.<\/p>\n<p>\u201cResponsible uses of open source AI models promote global security and help establish the U.S. in the global race for AI leadership,\u201d Meta proclaimed in a blog post by global affairs chief Nick Clegg.<\/p>\n<p>One of these \u201cresponsible uses\u201d is a partnership with Scale AI, a $14 billion machine learning startup and thriving defense contractor. Following the policy change, Scale now uses Llama 3.0 to power a chat tool for governmental users who want to \u201capply the power of generative AI to their unique use cases, such as planning military or intelligence operations and understanding adversary vulnerabilities,\u201d according to a press release.<\/p>\n<p>But there\u2019s a problem: Experts tell The Intercept that the government-only tool, called \u201cDefense Llama,\u201d is being advertised by showing it give terrible advice about how to blow up a building. 
Scale AI defended the advertisement by telling The Intercept its marketing is not intended to accurately represent its product\u2019s capabilities.<\/p>\n<p>Llama 3.0 is a so-called open source model, meaning that users can download it, use it, and alter it, free of charge, unlike OpenAI\u2019s offerings. Scale AI says it has customized Meta\u2019s technology to provide military expertise.<\/p>\n<p>Scale AI touts Defense Llama\u2019s accuracy, as well as its adherence to norms, laws, and regulations: \u201cDefense Llama was trained on a vast dataset, including military doctrine, international humanitarian law, and relevant policies designed to align with the Department of Defense (DoD) guidelines for armed conflict as well as the DoD\u2019s Ethical Principles for Artificial Intelligence. This enables the model to provide accurate, meaningful, and relevant responses.\u201d<\/p>\n<p>The tool is not available to the public, but <a href=\"https:\/\/scale.com\/donovan\/defense-llm\"  target=\"_blank\" rel=\"noopener noreferrer\" aria-describedby=\"targetBlankDescription\">Scale AI\u2019s website provides an example<\/a> of this Meta-augmented accuracy, meaningfulness, and relevance. The case study is in weaponeering, the process of choosing the right weapon for a given military operation. 
An image on the Defense Llama homepage depicts a hypothetical user asking the chatbot: \u201cWhat are some JDAMs an F-35B could use to destroy a reinforced concrete building while minimizing collateral damage?\u201d The Joint Direct Attack Munition, or JDAM, is a hardware kit that converts unguided \u201cdumb\u201d bombs into a \u201cprecision-guided\u201d weapon that uses GPS or lasers to track its target.<\/p>\n<p>Defense Llama is shown in turn suggesting three different Guided Bomb Unit munitions, or GBUs, ranging from 500 to 2,000 pounds with characteristic chatbot pluck, describing one as \u201can excellent choice for destroying reinforced concrete buildings.\u201d<\/p>\n<div id=\"attachment_281650\" style=\"width: 560px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2024\/12\/meta-llama-ai-chatgpt.webp\" ><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-281650\" class=\"wp-image-281650\" src=\"https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2024\/12\/meta-llama-ai-chatgpt.webp\" alt=\"\" width=\"550\" height=\"361\" srcset=\"https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2024\/12\/meta-llama-ai-chatgpt.webp 999w, https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2024\/12\/meta-llama-ai-chatgpt-300x197.webp 300w, https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2024\/12\/meta-llama-ai-chatgpt-768x504.webp 768w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><\/a><p id=\"caption-attachment-281650\" class=\"wp-caption-text\">Scale AI marketed its Defense Llama product with this image of a hypothetical chat.<br \/>Screenshot of Scale AI marketing webpage<\/p><\/div>\n<p>Military targeting and munitions experts who spoke to The Intercept all said Defense Llama\u2019s advertised response was flawed to the point of being useless. Not only does it give bad answers, they said, but it also complies with a fundamentally bad question. 
Whereas a trained human should know that such a question is nonsensical and dangerous, large language models, or LLMs, are generally built to be user friendly and compliant, even when it\u2019s a matter of life and death.<\/p>\n<blockquote><p><em><strong>\u201cIf someone asked me this exact question, it would immediately belie a lack of understanding about munitions selection or targeting.\u201d<\/strong><\/em><\/p><\/blockquote>\n<p>\u201cI can assure you that no U.S. targeting cell or operational unit is using a LLM such as this to make weaponeering decisions nor to conduct collateral damage mitigation,\u201d Wes J. Bryant, a retired targeting officer with the U.S. Air Force, told The Intercept, \u201cand if anyone brought the idea up, they\u2019d be promptly laughed out of the room.\u201d<\/p>\n<p>Munitions experts gave Defense Llama\u2019s hypothetical poor marks across the board. The LLM \u201ccompletely fails\u201d in its attempt to suggest the right weapon for the target while minimizing civilian death, Bryant told The Intercept.<\/p>\n<p>\u201cSince the question specifies JDAM and destruction of the building, it eliminates munitions that are generally used for lower collateral damage strikes,\u201d Trevor Ball, a former U.S. Army explosive ordnance disposal technician, told The Intercept. \u201cAll the answer does is poorly mention the JDAM \u2018bunker busters\u2019 but with errors. For example, the GBU-31 and GBU-32 warhead it refers to is not the (V)1. There also isn\u2019t a 500-pound penetrator in the U.S. arsenal.\u201d<\/p>\n<p>Ball added that it would be \u201cworthless\u201d for the chatbot to give advice on destroying a concrete building without being provided any information about the building beyond it being made of concrete.<\/p>\n<p>Defense Llama\u2019s advertised output is \u201cgeneric to the point of uselessness to almost any user,\u201d said N.R. Jenzen-Jones, director of Armament Research Services. 
He also expressed skepticism toward the question\u2019s premise. \u201cIt is difficult to imagine many scenarios in which a human user would need to ask the sample question as phrased.\u201d<\/p>\n<p>In an emailed statement, Scale AI spokesperson Heather Horniak told The Intercept that the marketing image was not meant to actually represent what Defense Llama can do, but merely \u201cmakes the point that an LLM customized for defense <em>can<\/em> respond to military-focused questions.\u201d Horniak added that \u201cThe claim that a response from a hypothetical website example represents what actually comes from a deployed, fine-tuned LLM that is trained on relevant materials for an end user is ridiculous.\u201d<\/p>\n<p>Despite Scale AI\u2019s claims that Defense Llama was trained on a \u201cvast dataset\u201d of military knowledge, Jenzen-Jones said the artificial intelligence\u2019s advertised response was marked by \u201cclumsy and imprecise terminology\u201d and factual errors, confusing and conflating different aspects of different bombs. \u201cIf someone asked me this exact question, it would immediately belie a lack of understanding about munitions selection or targeting,\u201d he said. Why an F-35? Why a JDAM? What\u2019s the building, and where is it? All of this important context, Jenzen-Jones said, is stripped away by Scale AI\u2019s example.<\/p>\n<p>Bryant cautioned that there is \u201cno magic weapon that prevents civilian casualties,\u201d but he called out the marketing image\u2019s suggested use of the 2,000-pound GBU-31, which was \u201cutilized extensively by Israel in the first months of the Gaza campaign, and as we know caused massive civilian casualties due to the manner in which they employed the weapons.\u201d<\/p>\n<p>Scale did not answer when asked if Defense Department customers are actually using Defense Llama as shown in the advertisement. 
On the day the tool was announced, Scale AI <a href=\"https:\/\/defensescoop.com\/2024\/11\/04\/scale-ai-unveils-defense-llama-large-language-model-llm-national-security-users\/\"  target=\"_blank\" rel=\"noopener noreferrer\" aria-describedby=\"targetBlankDescription\">provided DefenseScoop<\/a> a private demonstration using this same airstrike scenario. The publication noted that Defense Llama \u201cprovided a lengthy response that also spotlighted a number of factors worth considering.\u201d Following a request for comment by The Intercept, the company added a small caption under the promotional image: \u201cfor demo purposes only.\u201d<\/p>\n<p>Meta declined to comment.<\/p>\n<p>While Scale AI\u2019s marketing scenario may be a hypothetical, military use of LLMs is not. In February, DefenseScoop <a href=\"https:\/\/defensescoop.com\/2024\/02\/20\/scale-ai-pentagon-testing-evaluating-large-language-models\/\"  target=\"_blank\" rel=\"noopener noreferrer\" aria-describedby=\"targetBlankDescription\">reported<\/a> that the Pentagon\u2019s AI office had selected Scale AI \u201cto produce a trustworthy means for testing and evaluating large language models that can support \u2014 and potentially disrupt \u2014 military planning and decision-making.\u201d The company, whose LLM software is now augmented by Meta\u2019s massive investment in machine learning, has contracted with the Air Force and Army since 2020. 
Last year, Scale AI <a href=\"https:\/\/www.businesswire.com\/news\/home\/20230510005630\/en\/Scale-AI-Partners-with-XVIII-Airborne-Corps-for-First-LLM-Deployment-to-a-U.S.-Government-Classified-Network\"  target=\"_blank\" rel=\"noopener noreferrer\" aria-describedby=\"targetBlankDescription\">announced<\/a> its system was \u201cthe first large language model (LLM) on a classified network,\u201d used by the XVIII Airborne Corps for \u201cdecision-making.\u201d In October, the White House issued a national security memorandum <a href=\"https:\/\/defensescoop.com\/2024\/10\/24\/national-security-memorandum-artificial-intelligence-dod-odni\/\"  target=\"_blank\" rel=\"noopener noreferrer\" aria-describedby=\"targetBlankDescription\">directing<\/a> the Department of Defense and intelligence community to adopt AI tools with greater urgency. Shortly after the memo\u2019s publication, The Intercept <a target=\"_blank\" href=\"https:\/\/theintercept.com\/2024\/10\/25\/africom-microsoft-openai-military\/?utm_medium=email&amp;utm_source=The%20Intercept%20Newsletter\" >reported<\/a> that U.S. Africa Command had purchased access to OpenAI services via a contract with Microsoft.<\/p>\n<p>Unlike its industry peers, Scale AI has never shied away from defense contracting. In a 2023 interview with the Washington Post, CEO Alexandr Wang, a vocal proponent of weaponized AI, described himself as a \u201cChina-hawk\u201d and said he hoped Scale could \u201cbe the company that helps ensure that the United States maintains this leadership position.\u201d Its embrace of military work has seemingly charmed investors, which <a href=\"https:\/\/techcrunch.com\/2024\/05\/21\/data-labeling-startup-scale-ai-raises-1b-as-valuation-doubles-to-13-8b\/\"  target=\"_blank\" rel=\"noopener noreferrer\" aria-describedby=\"targetBlankDescription\">include<\/a> Peter Thiel\u2019s Founders Fund, Y Combinator, Nvidia, Amazon, and Meta. 
\u201cWith Defense Llama, our service members can now better harness generative AI to address their specific mission needs,\u201d Wang wrote in the product\u2019s announcement.<\/p>\n<p>But the munitions experts who spoke to The Intercept expressed confusion over who, exactly, Defense Llama is marketing to with the airstrike demo, questioning why anyone involved in weaponeering would know so little about its fundamentals that they would need to consult a chatbot in the first place. \u201cIf we generously assume this example is intended to simulate a question from an analyst not directly involved in planning and without munitions-specific expertise, then the answer is in fact much more dangerous,\u201d Jenzen-Jones explained. \u201cIt reinforces a probably false assumption (that a JDAM must be used), it fails to clarify important selection criteria, it gives incorrect technical data that a nonspecialist user is less likely to question, and it does nothing to share important contextual information about targeting constraints.\u201d<\/p>\n<blockquote><p><em><strong>\u201cIt gives incorrect technical data that a nonspecialist user is less likely to question.\u201d<\/strong><\/em><\/p><\/blockquote>\n<p>Bryant agreed. \u201cThe advertising and hypothetical scenario is quite irresponsible,\u201d he explained, \u201cprimarily because the U.S. military\u2019s methodology for mitigating collateral damage is not so simple as just the munition being utilized. That is one factor of many.\u201d Bryant suggested that Scale AI\u2019s example scenario betrayed an interest in \u201ctrying make good press and trying to depict an idea of things that may be in the realm of possible, while being wholly naive about what they are trying to depict and completely lacking understanding in anything related to actual targeting.\u201d<\/p>\n<p>Turning to an LLM for airstrike planning also means sidestepping the typical human-based process and the responsibility that entails. 
Bryant, who during his time in the Air Force helped plan airstrikes against Islamic State targets, told The Intercept that the process typically entails a team of experts \u201cwho ultimately converge on a final targeting decision.\u201d<\/p>\n<p>Jessica Dorsey, a professor at\u00a0Utrecht University School of Law and scholar of automated warfare methods, said consulting Defense Llama seems to entirely circumvent the ostensible legal obligations military planners are supposed to be held to. \u201cThe reductionist\/simplistic and almost amateurish approach indicated by the example is quite dangerous,\u201d she said. \u201cJust deploying a GBU\/JDAM does not mean there will be less civilian harm. It\u2019s a 500 to 2,000-pound bomb after all.\u201d<\/p>\n<p><a target=\"_blank\" href=\"https:\/\/theintercept.com\/2024\/11\/24\/defense-llama-meta-military\/?utm_medium=email&amp;utm_source=The%20Intercept%20Newsletter\" >Go to Original &#8211; theintercept.com<\/a><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>24 Nov 2024 &#8211; The marketing of a new military tech tool powered by Meta\u2019s artificial intelligence is \u201cirresponsible\u201d and \u201cclumsy,\u201d experts 
said.<\/p>\n","protected":false},"author":4,"featured_media":281647,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3078],"tags":[1733,3022,2994,2689,291],"class_list":["post-281645","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence-ai","tag-artificial-intelligence-ai","tag-chatbot","tag-chatgpt","tag-metaverse","tag-military"],"_links":{"self":[{"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/posts\/281645","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/comments?post=281645"}],"version-history":[{"count":3,"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/posts\/281645\/revisions"}],"predecessor-version":[{"id":281651,"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/posts\/281645\/revisions\/281651"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/media\/281647"}],"wp:attachment":[{"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/media?parent=281645"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/categories?post=281645"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/tags?post=281645"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}