{"id":246960,"date":"2023-10-30T12:00:31","date_gmt":"2023-10-30T12:00:31","guid":{"rendered":"https:\/\/www.transcend.org\/tms\/?p=246960"},"modified":"2023-10-27T04:17:47","modified_gmt":"2023-10-27T03:17:47","slug":"managing-ai-risks-in-an-era-of-rapid-progress","status":"publish","type":"post","link":"https:\/\/www.transcend.org\/tms\/2023\/10\/managing-ai-risks-in-an-era-of-rapid-progress\/","title":{"rendered":"Managing AI Risks in an Era of Rapid Progress"},"content":{"rendered":"<div id=\"attachment_240236\" style=\"width: 310px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2023\/07\/artificial-intelligence-ai-logo-Vecteezy.jpg\" ><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-240236\" class=\"size-medium wp-image-240236\" src=\"https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2023\/07\/artificial-intelligence-ai-logo-Vecteezy-300x300.jpg\" alt=\"\" width=\"300\" height=\"300\" srcset=\"https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2023\/07\/artificial-intelligence-ai-logo-Vecteezy-300x300.jpg 300w, https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2023\/07\/artificial-intelligence-ai-logo-Vecteezy-1024x1024.jpg 1024w, https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2023\/07\/artificial-intelligence-ai-logo-Vecteezy-150x150.jpg 150w, https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2023\/07\/artificial-intelligence-ai-logo-Vecteezy-768x768.jpg 768w, https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2023\/07\/artificial-intelligence-ai-logo-Vecteezy-1536x1536.jpg 1536w, https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2023\/07\/artificial-intelligence-ai-logo-Vecteezy.jpg 1920w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-240236\" class=\"wp-caption-text\">Vecteezy<\/p><\/div>\n<blockquote><p>24 Oct 2023 &#8211; <em>In this short consensus paper, we outline risks from upcoming, advanced AI systems. We examine large-scale social harms and malicious uses, as well as an irreversible loss of human control over autonomous AI systems. In light of rapid and continuing AI progress, we propose urgent priorities for AI R&amp;D and governance.<\/em><\/p><\/blockquote>\n<p>In 2019, GPT-2 could not reliably count to ten. Only four years later, deep learning systems can write software, generate photorealistic scenes on demand, advise on intellectual topics, and combine language and image processing to steer robots.<\/p>\n<p>As AI developers scale these systems, unforeseen abilities and behaviors emerge spontaneously without explicit programming<span class=\"citation-number\">[1]<\/span>. Progress in AI has been swift and, to many, surprising.<\/p>\n<p>The pace of progress may surprise us again. Current deep learning systems still lack important capabilities and we do not know how long it will take to develop them.<\/p>\n<p>However, companies are engaged in a race to create generalist AI systems that match or exceed human abilities in most cognitive work<span class=\"citation-number\">[2, 3]<\/span>.<\/p>\n<p>They are rapidly deploying more resources and developing new techniques to increase AI capabilities. 
Progress in AI also enables faster progress: AI assistants are increasingly used to automate programming [4] and data collection [5, 6] to further improve AI systems [7].

There is no fundamental reason why AI progress would slow or halt at the human level. Indeed, AI has already surpassed human abilities in narrow domains such as protein folding and strategy games [8, 9, 10].

Compared to humans, AI systems can act faster, absorb more knowledge, and communicate at far higher bandwidth. Additionally, they can be scaled to use immense computational resources and can be replicated by the millions.

The rate of improvement is already staggering, and tech companies have the cash reserves needed to scale the latest training runs by multiples of 100 to 1,000 soon [11]. Combined with the ongoing growth and automation of AI R&D, we must take seriously the possibility that generalist AI systems will outperform human abilities across many critical domains within this decade or the next.

What happens then?

If managed carefully and distributed fairly, advanced AI systems could help humanity cure diseases, elevate living standards, and protect our ecosystems.

The opportunities AI offers are immense. But alongside advanced AI capabilities come large-scale risks that we are not on track to handle well. Humanity is pouring vast resources into making AI systems more powerful, but far less into safety and mitigating harms. For AI to be a boon, we must reorient; pushing AI capabilities alone is not enough.

We are already behind schedule for this reorientation. We must anticipate the amplification of ongoing harms, as well as novel risks, and prepare for the largest risks *well before they materialize*. Climate change has taken decades to be acknowledged and confronted; for AI, decades could be too long.

## Societal-scale Risks

AI systems could rapidly come to outperform humans in an increasing number of tasks. If such systems are not carefully designed and deployed, they pose a range of societal-scale risks. They threaten to amplify social injustice, erode social stability, and weaken the shared understanding of reality that is foundational to society.

They could also enable large-scale criminal or terrorist activities.

Especially in the hands of a few powerful actors, AI could cement or exacerbate global inequities, or facilitate automated warfare, customized mass manipulation, and pervasive surveillance [12, 13].

Many of these risks could soon be amplified, and new risks created, as companies develop *autonomous AI*: systems that can plan, act in the world, and pursue goals.

While current AI systems have limited autonomy, work is underway to change this [14]. For example, the non-autonomous GPT-4 model was quickly adapted to browse the web [15], design and execute chemistry experiments [16], and use software tools [17], including other AI models [18].
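
To make this mechanism concrete, the sketch below shows how a plain question-answering model can be scaffolded into a rudimentary agent with a simple tool-use loop. It is a minimal illustration only: `query_model`, the `CALL` convention, and the single web-search tool are hypothetical placeholders, not the interface of GPT-4 plugins or any real system.

```python
# Minimal sketch of a tool-use loop that scaffolds a plain language model
# into a rudimentary agent. `query_model`, the CALL convention, and the
# tool set are hypothetical placeholders, not any real system's API.

def query_model(prompt: str) -> str:
    """Placeholder for a call to a language model."""
    raise NotImplementedError

def search_web(query: str) -> str:
    """Placeholder web-search tool."""
    raise NotImplementedError

TOOLS = {"search_web": search_web}

def run_agent(task: str, max_steps: int = 10) -> str:
    # The transcript accumulates the model's actions and their results,
    # so each step can condition on what happened before.
    transcript = (
        "Reply 'CALL <tool> <argument>' to use a tool, or answer directly.\n"
        f"Available tools: {', '.join(TOOLS)}.\n"
        f"Task: {task}\n"
    )
    for _ in range(max_steps):
        reply = query_model(transcript)
        if reply.startswith("CALL "):
            # Assumes a well-formed call; a real agent framework would
            # parse and validate this far more carefully.
            _, tool_name, argument = reply.split(" ", 2)
            result = TOOLS[tool_name](argument)
            transcript += f"{reply}\nRESULT: {result}\n"
        else:
            return reply  # The model chose to stop and answer.
    return "Stopped: step budget exhausted."
```

The point of the sketch is that the loop, not the model, confers autonomy: the model chooses actions, observes their results, and iterates, which is why even a "non-autonomous" model can quickly be turned into an agent.
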
If we build highly advanced autonomous AI, we risk creating systems that pursue undesirable goals. Malicious actors could deliberately embed harmful objectives. Moreover, no one currently knows how to reliably align AI behavior with complex values. Even well-meaning developers may inadvertently build AI systems that pursue unintended goals, especially if, in a bid to win the AI race, they neglect expensive safety testing and human oversight.

Once autonomous AI systems pursue undesirable goals, whether embedded by malicious actors or by accident, we may be unable to keep them in check. Control of software is an old and unsolved problem: computer worms have long been able to proliferate and avoid detection [19]. However, AI is making progress in critical domains such as hacking, social manipulation, deception, and strategic planning [14, 20]. Advanced autonomous AI systems will pose unprecedented control challenges.

To advance undesirable goals, future autonomous AI systems could use undesirable strategies, learned from humans or developed independently, as a means to an end [21, 22, 23, 24]. AI systems could gain human trust, acquire financial resources, influence key decision-makers, and form coalitions with human actors and other AI systems.

To avoid human intervention [24], they could copy their algorithms across global server networks, as computer worms do. AI assistants are already co-writing a large share of computer code worldwide [25]; future AI systems could insert and then exploit security vulnerabilities to control the computer systems behind our communication, media, banking, supply chains, militaries, and governments. In open conflict, AI systems could threaten to use, or actually use, autonomous or biological weapons. AI gaining access to such technology would merely continue existing trends toward automating military activity, biological research, and AI development itself. If AI systems pursued such strategies with sufficient skill, it would be difficult for humans to intervene.

Finally, AI systems may not need to plot for influence if it is freely handed over. As autonomous AI systems become faster and more cost-effective than human workers, a dilemma emerges.

Companies, governments, and militaries might be forced to deploy AI systems widely and cut back on expensive human verification of AI decisions, or risk being outcompeted [26, 27]. As a result, autonomous AI systems could increasingly assume critical societal roles.

Without sufficient caution, we may irreversibly lose control of autonomous AI systems, rendering human intervention ineffective. Large-scale cybercrime, social manipulation, and other highlighted harms could then escalate rapidly.
This unchecked AI advancement could culminate in large-scale loss of life, damage to the biosphere, and the marginalization or even extinction of humanity.

Harms such as misinformation and discrimination from algorithms are already evident today [28]; other harms show signs of emerging [20]. It is vital both to address ongoing harms and to anticipate emerging risks. This is *not* a question of either/or. Present and emerging risks often share similar mechanisms, patterns, and solutions [29]; investing in governance frameworks and AI safety will bear fruit on multiple fronts [30].

## A Path Forward

If advanced autonomous AI systems were developed today, we would not know how to make them safe, nor how to properly test their safety. Even if we did, governments would lack the institutions to prevent misuse and uphold safe practices. That does not, however, mean there is no viable path forward. To ensure a positive outcome, we can and must pursue research breakthroughs in AI safety and ethics and promptly establish effective government oversight.

### Reorienting Technical R&D

We need research breakthroughs to solve some of today's technical challenges in creating AI with safe and ethical objectives. Some of these challenges are unlikely to be solved by simply making AI systems more capable [22, 31, 32, 33, 34, 35]. These include:

- **Oversight and honesty:** More capable AI systems are better able to exploit weaknesses in oversight and testing [32, 36, 37], for example by producing false but compelling output [35, 38].
- **Robustness:** AI systems behave unpredictably in new situations (under distribution shift or adversarial inputs) [39, 40, 34].
- **Interpretability:** AI decision-making is opaque. So far, we can only test large models via trial and error. We need to learn to understand their inner workings [41].
- **Risk evaluations:** Frontier AI systems develop unforeseen capabilities that are only discovered during training or even well after deployment [42]. Better evaluation is needed to detect hazardous capabilities earlier [43, 44] (a toy evaluation harness is sketched after this list).
- **Addressing emerging challenges:** More capable future AI systems may exhibit failure modes we have so far seen only in theoretical models. AI systems might, for example, learn to feign obedience or exploit weaknesses in our safety objectives and shutdown mechanisms to advance a particular goal [24, 45].
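
To make the risk-evaluations item concrete, here is a deliberately toy sketch of a pre-deployment evaluation harness that probes a model on red-line tasks and flags those it passes too often. The task format, the substring-based scorer, the 20% threshold, and `query_model` are all illustrative assumptions; real evaluations of this kind [43, 44] are far more involved.

```python
# Toy sketch of a pre-deployment capability evaluation. The task format,
# scoring rule, threshold, and `query_model` are illustrative assumptions;
# real dangerous-capability evaluations are far more elaborate.

from dataclasses import dataclass

def query_model(prompt: str) -> str:
    """Placeholder for a call to the model under evaluation."""
    raise NotImplementedError

@dataclass
class RedLineTask:
    name: str            # e.g. "autonomous self-replication"
    prompt: str          # scenario probing the capability
    success_marker: str  # naive scoring: does the output contain this?

def evaluate(tasks: list[RedLineTask], trials: int = 10,
             threshold: float = 0.2) -> dict[str, bool]:
    """Return, per task, whether the success rate exceeds the threshold."""
    flags = {}
    for task in tasks:
        successes = sum(
            task.success_marker in query_model(task.prompt)
            for _ in range(trials)
        )
        flags[task.name] = successes / trials > threshold
    return flags

# A deployment gate could then read: if any flag is raised, halt deployment
# and trigger pre-committed safety measures.
```
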
Given the stakes, we call on major tech companies and public funders to allocate at least one-third of their AI R&D budgets to ensuring safety and ethical use, comparable to their funding for AI capabilities. Addressing these problems [34], with an eye toward powerful future systems, must become central to our field.

### Urgent Governance Measures

We urgently need national institutions and international governance to enforce standards that prevent recklessness and misuse. Many areas of technology, from pharmaceuticals to financial systems and nuclear energy, show that society both requires and effectively uses governance to reduce risks. However, no comparable governance frameworks are currently in place for AI.

Without them, companies and countries may seek a competitive edge by pushing AI capabilities to new heights while cutting corners on safety, or by delegating key societal roles to AI systems with little human oversight [26]. Like manufacturers releasing waste into rivers to cut costs, they may be tempted to reap the rewards of AI development while leaving society to deal with the consequences.

To keep up with rapid progress and avoid inflexible laws, national institutions need strong technical expertise and the authority to act swiftly. To address international race dynamics, they need the capacity to facilitate international agreements and partnerships [46, 47]. To protect low-risk uses and academic research, they should avoid undue bureaucratic hurdles for small and predictable AI models. The most pressing scrutiny should fall on AI systems at the frontier: the small number of most powerful systems, trained on billion-dollar supercomputers, which will have the most hazardous and unpredictable capabilities [48, 49].

To enable effective regulation, governments urgently need comprehensive insight into AI development. Regulators should require model registration, whistleblower protections, incident reporting, and monitoring of model development and supercomputer usage [48, 50, 51, 52, 53, 54, 55]. Regulators also need access to advanced AI systems before deployment, to evaluate them for dangerous capabilities such as autonomous self-replication, breaking into computer systems, or making pandemic pathogens widely accessible [43, 56, 57].

For AI systems with hazardous capabilities, we need a combination of governance mechanisms [48, 52, 58, 59] matched to the magnitude of their risks.

Regulators should create national and international safety standards that depend on model capabilities. They should also hold frontier AI developers and owners legally accountable for harms from their models that can be reasonably foreseen and prevented.

These measures can prevent harm and create much-needed incentives to invest in safety. Further measures are needed for exceptionally capable future AI systems, such as models that could circumvent human control.

Governments must be prepared to license their development, pause development in response to worrying capabilities, mandate access controls, and require information security measures robust to state-level hackers, until adequate protections are ready.

To bridge the time until regulations are in place, major AI companies should promptly lay out if-then commitments: specific safety measures they will take if specific red-line capabilities are found in their AI systems. These commitments should be detailed and independently scrutinized.
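
One way to picture such if-then commitments is as a machine-checkable table mapping red-line evaluation findings to pre-committed responses, which independent scrutineers could audit. The triggers and measures below are invented for illustration and do not reflect any company's actual policy.

```python
# Illustrative encoding of if-then commitments: if a red-line capability
# is found, specific pre-committed measures become due. All trigger and
# measure names here are invented for illustration.

IF_THEN_COMMITMENTS = {
    "autonomous_self_replication": ["pause_training", "notify_regulator"],
    "aids_pandemic_pathogen_design": ["restrict_access", "notify_regulator"],
    "defeats_own_shutdown_mechanism": ["pause_training", "lock_down_weights"],
}

def required_measures(eval_flags: dict[str, bool]) -> list[str]:
    """Given evaluation findings, list every measure now owed."""
    owed = set()
    for trigger, fired in eval_flags.items():
        if fired:
            owed.update(IF_THEN_COMMITMENTS.get(trigger, []))
    return sorted(owed)

# If an evaluation flags self-replication, the company has pre-committed
# to pause training and notify its regulator.
assert required_measures({"autonomous_self_replication": True}) == [
    "notify_regulator", "pause_training",
]
```
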
AI may be the technology that shapes this century. While AI capabilities are advancing rapidly, progress in safety and governance is lagging behind. To steer AI toward positive outcomes and away from catastrophe, we need to reorient. There is a responsible path, if we have the wisdom to take it.

### References

1. Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S. et al. (2022). Emergent Abilities of Large Language Models. Transactions on Machine Learning Research. https://openreview.net/pdf?id=yzkSU5zdwD
2. DeepMind (2023). About. https://www.deepmind.com/about
3. OpenAI (2023). About. https://openai.com/about
4. Tabachnyk, M. (2022). ML-Enhanced Code Completion Improves Developer Productivity. Google Research. https://blog.research.google/2022/07/ml-enhanced-code-completion-improves.html
5. OpenAI (2023). GPT-4 Technical Report. arXiv [cs.CL]. http://arxiv.org/pdf/2303.08774.pdf
6. Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A. et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv [cs.CL]. http://arxiv.org/pdf/2212.08073.pdf
7. Woodside, T. and Center for AI Safety (2023). Examples of AI Improving AI. https://ai-improving-ai.safe.ai/
8. Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O. et al. (2021). Highly Accurate Protein Structure Prediction with AlphaFold. Nature, pp. 583–589.
9. Brown, N. and Sandholm, T. (2019). Superhuman AI for Multiplayer Poker. Science, pp. 885–890.
10. Campbell, M., Hoane, A. and Hsu, F. (2002). Deep Blue. Artificial Intelligence, pp. 57–83.
11. Alphabet (2022). Alphabet Annual Report, page 33. https://abc.xyz/assets/d4/4f/a48b94d548d0b2fdc029a95e8c63/2022-alphabet-annual-report.pdf
12. Hendrycks, D., Mazeika, M. and Woodside, T. (2023). An Overview of Catastrophic AI Risks. arXiv [cs.CY]. http://arxiv.org/pdf/2306.12001.pdf
13. Weidinger, L., Uesato, J., Rauh, M., Griffin, C., Huang, P., Mellor, J. et al. (2022). Taxonomy of Risks Posed by Language Models. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pp. 214–229.
14. Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J. et al. (2023). A Survey on Large Language Model based Autonomous Agents. arXiv [cs.AI]. http://arxiv.org/pdf/2308.11432.pdf
15. OpenAI (2023). ChatGPT Plugins. https://openai.com/blog/chatgpt-plugins
16. Bran, A., Cox, S., White, A. and Schwaller, P. (2023). ChemCrow: Augmenting Large Language Models with Chemistry Tools. arXiv [physics.chem-ph]. http://arxiv.org/pdf/2304.05376.pdf
17. Mialon, G., Dessì, R., Lomeli, M., Nalmpantis, C., Pasunuru, R., Raileanu, R. et al. (2023). Augmented Language Models: a Survey. arXiv [cs.CL]. http://arxiv.org/pdf/2302.07842.pdf
18. Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y. et al. (2023). HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face. arXiv [cs.CL]. http://arxiv.org/pdf/2303.17580.pdf
19. Denning, P. (1989). The Science of Computing: The Internet Worm. American Scientist, pp. 126–128.
20. Park, P., Goldstein, S., O'Gara, A., Chen, M. and Hendrycks, D. (2023). AI Deception: A Survey of Examples, Risks, and Potential Solutions. arXiv [cs.CY]. http://arxiv.org/pdf/2308.14752.pdf
21. Turner, A., Smith, L., Shah, R. and Critch, A. (2019). Optimal Policies Tend to Seek Power. Thirty-Fifth Conference on Neural Information Processing Systems. http://arxiv.org/pdf/1912.01683.pdf
22. Perez, E., Ringer, S., Lukošiūtė, K., Nguyen, K., Chen, E. and Heiner, S. (2022). Discovering Language Model Behaviors with Model-Written Evaluations. arXiv [cs.CL]. http://arxiv.org/pdf/2212.09251.pdf
23. Pan, A., Chan, J., Zou, A., Li, N., Basart, S. and Woodside, T. (2023). Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark. International Conference on Machine Learning.
24. Hadfield-Menell, D., Dragan, A., Abbeel, P. and Russell, S. (2017). The Off-Switch Game. Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, pp. 220–227.
25. Dohmke, T. (2023). GitHub Copilot for Business is now available. GitHub Blog. https://github.blog/2023-02-14-github-copilot-for-business-is-now-available/
26. Hendrycks, D. (2023). Natural Selection Favors AIs over Humans. arXiv [cs.CY]. http://arxiv.org/pdf/2303.16200.pdf
27. Chan, A., Salganik, R., Markelius, A., Pang, C., Rajkumar, N. and Krasheninnikov, D. (2023). Harms from Increasingly Agentic Algorithmic Systems. Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pp. 651–666. Association for Computing Machinery.
28. Bommasani, R., Hudson, D., Adeli, E., Altman, R., Arora, S. and von Arx, S. (2021). On the Opportunities and Risks of Foundation Models. arXiv [cs.LG]. http://arxiv.org/pdf/2108.07258.pdf
29. Brauner, J. and Chan, A. (2023). AI Poses Doomsday Risks—But That Doesn't Mean We Shouldn't Talk About Present Harms Too. Time. https://time.com/6303127/ai-future-danger-present-harms/
30. Center for AI Safety (2023). Existing Policy Proposals Targeting Present and Future Harms. https://assets-global.website-files.com/63fe96aeda6bea77ac7d3000/647d5368c2368cc32b359f88/_Policy/%20Agreement/%20Statement.pdf
31. McKenzie, I., Lyzhov, A., Pieler, M., Parrish, A., Mueller, A. and Prabhu, A. (2023). Inverse Scaling: When Bigger Isn't Better. Transactions on Machine Learning Research. http://arxiv.org/pdf/2306.09479.pdf
32. Pan, A., Bhatia, K. and Steinhardt, J. (2022). The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models. International Conference on Learning Representations. https://openreview.net/forum?id=JYtwGwIL7ye
33. Wei, J., Huang, D., Lu, Y., Zhou, D. and Le, Q. (2023). Simple Synthetic Data Reduces Sycophancy in Large Language Models. arXiv [cs.CL]. http://arxiv.org/pdf/2308.03958.pdf
34. Hendrycks, D., Carlini, N., Schulman, J. and Steinhardt, J. (2021). Unsolved Problems in ML Safety. arXiv [cs.LG]. http://arxiv.org/pdf/2109.13916.pdf
35. Casper, S., Davies, X., Shi, C., Gilbert, T., Scheurer, J. and Rando, J. (2023). Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. arXiv [cs.AI]. http://arxiv.org/pdf/2307.15217.pdf
36. Zhuang, S. and Hadfield-Menell, D. (2020). Consequences of Misaligned AI. Advances in Neural Information Processing Systems, Vol 33, pp. 15763–15773.
37. Gao, L., Schulman, J. and Hilton, J. (2023). Scaling Laws for Reward Model Overoptimization. Proceedings of the 40th International Conference on Machine Learning, pp. 10835–10866. PMLR.
38. Amodei, D., Christiano, P. and Ray, A. (2017). Learning from Human Preferences. OpenAI. https://openai.com/research/learning-from-human-preferences
39. Langosco di Langosco, A. and Chan, A. (2022). Goal Misgeneralization in Deep Reinforcement Learning. International Conference on Learning Representations. https://openreview.net/forum?id=q--OykSR2FY
40. Shah, R., Varma, V., Kumar, R., Phuong, M., Krakovna, V., Uesato, J. et al. (2022). Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals. arXiv [cs.LG]. http://arxiv.org/pdf/2210.01790.pdf
41. Räuker, T., Ho, A., Casper, S. and Hadfield-Menell, D. (2023). Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks. 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), pp. 464–483.
42. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F. et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Advances in Neural Information Processing Systems, Vol 35, pp. 24824–24837.
43. Shevlane, T., Farquhar, S., Garfinkel, B., Phuong, M., Whittlestone, J., Leung, J. et al. (2023). Model Evaluation for Extreme Risks. arXiv [cs.AI]. http://arxiv.org/pdf/2305.15324.pdf
44. Koessler, L. and Schuett, J. (2023). Risk Assessment at AGI Companies: A Review of Popular Risk Assessment Techniques from Other Safety-Critical Industries. arXiv [cs.CY]. http://arxiv.org/pdf/2307.08823.pdf
45. Ngo, R., Chan, L. and Mindermann, S. (2022). The Alignment Problem from a Deep Learning Perspective. arXiv [cs.AI]. http://arxiv.org/pdf/2209.00626.pdf
46. Ho, L., Barnhart, J., Trager, R., Bengio, Y., Brundage, M., Carnegie, A. et al. (2023). International Institutions for Advanced AI. arXiv [cs.CY]. DOI: 10.48550/arXiv.2307.04699
47. Trager, R., Harack, B., Reuel, A., Carnegie, A., Heim, L., Ho, L. et al. (2023). International Governance of Civilian AI: A Jurisdictional Certification Approach. https://cdn.governance.ai/International_Governance_of_Civilian_AI_OMS.pdf
48. Anderljung, M., Barnhart, J., Korinek, A., Leung, J., O'Keefe, C., Whittlestone, J. et al. (2023). Frontier AI Regulation: Managing Emerging Risks to Public Safety. arXiv [cs.CY]. http://arxiv.org/pdf/2307.03718.pdf
49. Ganguli, D., Hernandez, D., Lovitt, L., Askell, A., Bai, Y., Chen, A. et al. (2022). Predictability and Surprise in Large Generative Models. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pp. 1747–1764. Association for Computing Machinery.
50. Hadfield, G., Cuéllar, M. and O'Reilly, T. (2023). It's Time to Create a National Registry for Large AI Models. Carnegie Endowment for International Peace. https://carnegieendowment.org/2023/07/12/it-s-time-to-create-national-registry-for-large-ai-models-pub-90180
51. Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B. et al. (2019). Model Cards for Model Reporting. FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 220–229.
52. AI Now Institute (2023). General Purpose AI Poses Serious Risks, Should Not Be Excluded From the EU's AI Act | Policy Brief. https://ainowinstitute.org/publication/gpai-is-high-risk-should-not-be-excluded-from-eu-ai-act
53. AI Incident Database (2023). Artificial Intelligence Incident Database. https://incidentdatabase.ai/
54. Bloch-Wehba, H. (2023). The Promise and Perils of Tech Whistleblowing. Northwestern University Law Review, forthcoming. https://papers.ssrn.com/abstract=4377064
55. Mulani, N. and Whittlestone, J. (2023). Proposing a Foundation Model Information-Sharing Regime for the UK. Centre for the Governance of AI. https://www.governance.ai/post/proposing-a-foundation-model-information-sharing-regime-for-the-uk
56. Mökander, J., Schuett, J., Kirk, H. and Floridi, L. (2023). Auditing Large Language Models: a Three-Layered Approach. AI and Ethics. DOI: 10.1007/s43681-023-00289-2
57. Soice, E., Rocha, R., Cordova, K., Specter, M. and Esvelt, K. (2023). Can Large Language Models Democratize Access to Dual-Use Biotechnology? arXiv [cs.CY]. http://arxiv.org/pdf/2306.03809.pdf
58. Schuett, J., Dreksler, N., Anderljung, M., McCaffary, D., Heim, L., Bluemke, E. et al. (2023). Towards Best Practices in AGI Safety and Governance: A Survey of Expert Opinion. arXiv [cs.CY]. http://arxiv.org/pdf/2305.07153.pdf
59. Hadfield, G. and Clark, J. (2023). Regulatory Markets: The Future of AI Governance. arXiv [cs.AI]. http://arxiv.org/pdf/2304.04914.pdf

### Authors

**Yoshua Bengio,** Mila – Quebec AI Institute, Université de Montréal, Canada CIFAR AI Chair
**Geoffrey Hinton,** University of Toronto, Vector Institute
**Andrew Yao,** Tsinghua University
**Dawn Song,** University of California, Berkeley
**Pieter Abbeel,** University of California, Berkeley
**Yuval Noah Harari,** The Hebrew University of Jerusalem, Department of History
**Ya-Qin Zhang,** Tsinghua University
**Lan Xue,** Tsinghua University, Institute for AI International Governance
**Shai Shalev-Shwartz,** The Hebrew University of Jerusalem
**Gillian Hadfield,** University of Toronto, SR Institute for Technology and Society, Vector Institute
**Jeff Clune,** University of British Columbia, Canada CIFAR AI Chair, Vector Institute
**Tegan Maharaj,** University of Toronto, Vector Institute
class=\"author\"><span class=\"name\"><strong>Frank Hutter<\/strong>,\u00a0<\/span><span class=\"affiliation\">University of Freiburg<\/span><\/p>\n<p class=\"author\"><span class=\"name\"><strong>At\u0131l\u0131m G\u00fcne\u015f Baydin,<\/strong>\u00a0<\/span><span class=\"affiliation\">University of Oxford<\/span><\/p>\n<p class=\"author\"><span class=\"name\"><strong>Sheila McIlrath<\/strong>,\u00a0<\/span><span class=\"affiliation\">University of Toronto, Vector Institute<\/span><\/p>\n<p class=\"author\"><strong><span class=\"name\">Qiqi Gao,\u00a0<\/span><\/strong><span class=\"affiliation\">East China University of Political Science and Law<\/span><\/p>\n<p class=\"author\"><span class=\"name\"><strong>Ashwin Acharya,<\/strong>\u00a0<\/span><span class=\"affiliation\">Institute for AI Policy and Strategy<\/span><\/p>\n<p class=\"author\"><span class=\"name\"><strong>David Krueger,<\/strong>\u00a0<\/span><span class=\"affiliation\">University of Cambridge<\/span><\/p>\n<p class=\"author\"><span class=\"name\"><strong>Anca Dragan,<\/strong>\u00a0<\/span><span class=\"affiliation\">University of California, Berkeley<\/span><\/p>\n<p class=\"author\"><span class=\"name\"><strong>Philip Torr,<\/strong>\u00a0<\/span><span class=\"affiliation\">University of Oxford<\/span><\/p>\n<p class=\"author\"><span class=\"name\"><strong>Stuart Russell,<\/strong>\u00a0<\/span><span class=\"affiliation\">University of California, Berkeley<\/span><\/p>\n<p class=\"author\"><span class=\"name\"><strong>Daniel Kahneman<\/strong>,\u00a0<\/span><span class=\"affiliation\">Princeton University, School of Public and International Affairs<\/span><\/p>\n<p class=\"author\"><span class=\"name\"><strong>Jan Brauner*,<\/strong>\u00a0<\/span><span class=\"affiliation\">University of Oxford<\/span><\/p>\n<p class=\"author\"><strong><span class=\"name\">S\u00f6ren Mindermann*,\u00a0<\/span><\/strong><span class=\"affiliation\">Mila \u2013 Quebec AI Institute<\/span><\/p>\n<p>______________________________________<\/p>\n<p style=\"text-align: center;\"><em><strong>Download PDF file: <\/strong><\/em><strong><em><a target=\"_blank\" href=\"https:\/\/managing-ai-risks.com\/managing_ai_risks.pdf\" >Managing AI Risks<\/a><\/em><\/strong><\/p>\n<p><a target=\"_blank\" href=\"https:\/\/managing-ai-risks.com\/\" >Go to Original &#8211; managing-ai-risks.com<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>24 Oct 2023 &#8211; In 2019, GPT-2 could not reliably count to ten. Only four years later, deep learning systems can write software, generate photorealistic scenes, advise on intellectual topics, and combine language and image processing to steer robots. In this short consensus paper, we outline risks from upcoming, advanced AI systems. 