{"id":238712,"date":"2023-07-10T12:00:41","date_gmt":"2023-07-10T11:00:41","guid":{"rendered":"https:\/\/www.transcend.org\/tms\/?p=238712"},"modified":"2023-07-06T09:57:20","modified_gmt":"2023-07-06T08:57:20","slug":"authors-file-a-lawsuit-against-openai-for-unlawfully-ingesting-their-books","status":"publish","type":"post","link":"https:\/\/www.transcend.org\/tms\/2023\/07\/authors-file-a-lawsuit-against-openai-for-unlawfully-ingesting-their-books\/","title":{"rendered":"Authors File a Lawsuit against OpenAI for Unlawfully \u2018Ingesting\u2019 Their Books"},"content":{"rendered":"<blockquote><p><em>Mona Awad and Paul Tremblay allege that their books, which are copyrighted, were \u2018used to train\u2019 ChatGPT because the Chatbot generated \u2018very accurate summaries\u2019 of the works.<\/em><\/p><\/blockquote>\n<div id=\"attachment_238713\" style=\"width: 410px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2023\/07\/Mona-Awad-Paul-Tremblay.webp\" ><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-238713\" class=\"wp-image-238713\" src=\"https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2023\/07\/Mona-Awad-Paul-Tremblay-1024x614.webp\" alt=\"\" width=\"400\" height=\"240\" srcset=\"https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2023\/07\/Mona-Awad-Paul-Tremblay-1024x614.webp 1024w, https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2023\/07\/Mona-Awad-Paul-Tremblay-300x180.webp 300w, https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2023\/07\/Mona-Awad-Paul-Tremblay-768x461.webp 768w, https:\/\/www.transcend.org\/tms\/wp-content\/uploads\/2023\/07\/Mona-Awad-Paul-Tremblay.webp 1240w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/a><p id=\"caption-attachment-238713\" class=\"wp-caption-text\">Mona Awad and Paul Tremblay.<br \/>Composite: Angela Sterling, Titan Books<\/p><\/div>\n<p><em>5 Jul 2023 &#8211; <\/em>Two authors have filed a lawsuit 
against OpenAI, the company behind the artificial intelligence tool <a target=\"_blank\" href=\"https:\/\/www.theguardian.com\/technology\/2023\/jan\/13\/chatgpt-explainer-what-can-artificial-intelligence-chatbot-do-ai\" >ChatGPT<\/a>, claiming that the organisation breached copyright law by \u201ctraining\u201d its model on novels without the permission of authors.<\/p>\n<p class=\"dcr-1a568om\">Mona Awad, whose books include Bunny and 13 Ways of Looking at a Fat Girl, and Paul Tremblay, author of The Cabin at the End of the World, filed the class action complaint to a San Francisco federal court last week.<\/p>\n<p class=\"dcr-1a568om\">ChatGPT allows users to ask questions and type commands into a <a target=\"_blank\" href=\"https:\/\/openai.com\/blog\/chatgpt\" >chatbot<\/a> and responds with text that resembles human language patterns. The model underlying ChatGPT is trained with data that is publicly available on the internet.<\/p>\n<p class=\"dcr-1a568om\">Yet, Awad and Tremblay believe their books, which are copyrighted, were unlawfully \u201cingested\u201d and \u201cused to train\u201d ChatGPT because the chatbot generated \u201cvery accurate summaries\u201d of the novels, according to the <a target=\"_blank\" href=\"https:\/\/llmlitigation.com\/pdf\/03223\/tremblay-openai-complaint.pdf\" >complaint<\/a>. Sample summaries are included in the lawsuit as <a target=\"_blank\" href=\"https:\/\/llmlitigation.com\/pdf\/03223\/tremblay-openai-complaint-exhibits.pdf\" >exhibits<\/a>.<\/p>\n<p class=\"dcr-1a568om\">This is the first lawsuit against <a target=\"_blank\" href=\"https:\/\/www.theguardian.com\/technology\/chatgpt\" >ChatGPT<\/a> that concerns copyright, according to Andres Guadamuz, a reader in intellectual property law at the University of Sussex. 
The lawsuit will explore the uncertain \u201cborders of the legality\u201d of actions within the generative AI space, he adds.<\/p>\n<p class=\"dcr-1a568om\">Books are ideal for training large language models because they tend to contain \u201chigh-quality, well-edited, long-form prose,\u201d said the authors\u2019 lawyers, Joseph Saveri and Matthew Butterick, in an email to the Guardian. \u201cIt\u2019s the gold standard of idea storage for our species.\u201d<\/p>\n<p class=\"dcr-1a568om\">The complaint said that <a target=\"_blank\" href=\"https:\/\/www.theguardian.com\/technology\/openai\" >OpenAI<\/a> \u201cunfairly\u201d profits from \u201cstolen writing and ideas\u201d and calls for monetary damages on behalf of all US-based authors whose works were allegedly used to train ChatGPT. Though authors with copyrighted works have \u201cgreat legal protection\u201d, said Saveri and Butterick, they are confronting companies \u201clike OpenAI who behave as if these laws don\u2019t apply to them\u201d.<\/p>\n<p class=\"dcr-1a568om\">However, it may be difficult to prove that authors have suffered financial losses specifically because of ChatGPT being trained on copyrighted material, even if the latter turned out to be true. ChatGPT may work \u201cexactly the same\u201d if it had not ingested the books, said Guadamuz, because it is trained on a wealth of internet information that includes, for example, internet users discussing the books.<\/p>\n<p class=\"dcr-1a568om\">OpenAI has become \u201cincreasingly secretive\u201d about its training data, said Saveri and Butterick. In papers released alongside early iterations of ChatGPT, OpenAI gave some clues as to the size of the \u201cinternet-based books corpora\u201d it used as training material, which it called only \u201cBooks2\u201d. 
The lawyers deduce that the size of this dataset \u2013 estimated to contain 294,000 titles \u2013 means the books could only be drawn from shadow libraries such as Library Genesis (LibGen) and Z-Library, through which books can be secured in bulk via torrent systems.<\/p>\n<p class=\"dcr-1a568om\">This case will \u201clikely rest on whether courts view the use of copyright material in this way as \u2018fair use\u2019\u201d, said Lilian Edwards, professor of law, innovation and society at Newcastle University, \u201cor as simple unauthorised copying.\u201d Edwards and Guadamuz both emphasise that a similar lawsuit brought in the UK would not be decided in the same way, because the UK does not have the same \u201cfair use\u201d defence.<\/p>\n<p class=\"dcr-1a568om\">The UK government has been \u201ckeen on promoting an exception to copyright that would allow free use of copyright material for text and data mining, even for commercial purposes,\u201d said Edwards, but the reform was \u201cspiked\u201d after authors, publishers and the music industry were \u201cappalled\u201d.<\/p>\n<p class=\"dcr-1a568om\">Since ChatGPT was launched in November 2022, the publishing industry has been in discussion over how to protect authors from the potential harms of AI technology. Last month, The Society of Authors (SoA) published a <a target=\"_blank\" href=\"https:\/\/www2.societyofauthors.org\/2023\/06\/07\/artificial-intelligence-practical-steps-for-members\/\" >list of<\/a> \u201cpractical steps for members\u201d to \u201csafeguard\u201d themselves and their work. 
Yesterday, the SoA\u2019s chief executive, Nicola Solomon, told the trade magazine the Bookseller that the organisation was \u201cvery pleased\u201d to see authors suing OpenAI, having \u201clong been concerned\u201d about the \u201cwholesale copying\u201d of authors\u2019 work to train large language models.<\/p>\n<p class=\"dcr-1a568om\">Richard Combes, head of rights and licensing at the Authors\u2019 Licensing and Collecting Society (ALCS), said that current regulation around AI is \u201cfragmented, inconsistent across different jurisdictions and struggling to keep pace with technological developments\u201d. He encouraged policymakers to consult <a target=\"_blank\" href=\"https:\/\/www.alcs.co.uk\/news\/our-principles-for-ai-and-authors#:~:text=We%20believe%20that%20AI%20has,effective%20and%20appropriate%20policy%20framework\" >principles<\/a> that the ALCS has drawn up which \u201cprotect the true value that human authorship brings to our lives and, notably in the case of the UK, our economy and international identity\u201d.<\/p>\n<p class=\"dcr-1a568om\">Saveri and Butterick believe that AI will eventually resemble \u201cwhat happened with digital music and TV and movies\u201d and comply with copyright law. \u201cThey will be based on licensed data, with the sources disclosed.\u201d<\/p>\n<p class=\"dcr-1a568om\">The lawyers also noted it is \u201cironic\u201d that \u201cso-called \u2018artificial intelligence\u2019\u201d tools rely on data made by humans. \u201cTheir systems depend entirely on human creativity. 
If they bankrupt human creators, they will soon bankrupt themselves.\u201d<\/p>\n<p class=\"dcr-1a568om\">OpenAI were approached for comment.<\/p>\n<p>___________________________________________<\/p>\n<p style=\"padding-left: 40px;\"><em>Ella Creamer is a freelance politics and culture journalist.<\/em><\/p>\n<p><a target=\"_blank\" href=\"https:\/\/www.theguardian.com\/books\/2023\/jul\/05\/authors-file-a-lawsuit-against-openai-for-unlawfully-ingesting-their-books\" >Go to Original &#8211; theguardian.com<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>5 Jul 2023 &#8211; Mona Awad and Paul Tremblay allege that their books, which are copyrighted, were \u2018used to train\u2019 ChatGPT because the Chatbot generated \u2018very accurate summaries\u2019 of the works.<\/p>\n","protected":false},"author":4,"featured_media":238713,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3078],"tags":[1733,641,3022,2994,651],"class_list":["post-238712","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence-ai","tag-artificial-intelligence-ai","tag-books","tag-chatbot","tag-chatgpt","tag-justice"],"_links":{"self":[{"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/posts\/238712","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/comments?post=238712"}],"version-history":[{"count":1,"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/posts\/238712\/revisions"}],"predecessor-version":[{"id":238714,"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/posts\/238712\/revisions\
/238714"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/media\/238713"}],"wp:attachment":[{"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/media?parent=238712"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/categories?post=238712"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.transcend.org\/tms\/wp-json\/wp\/v2\/tags?post=238712"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}