In the past couple of years, AI has been thrust into the spotlight, especially with the release of ChatGPT in 2022. Some educators, professionals, and corporations are increasingly interested in using the technology to their advantage, while others are raising concerns over its limitations and dangers. One question our group believed was crucial to ask is: how is the law adapting to the ever-changing landscape of AI? In general, law is supposed to protect people from exploitation by others. So, what are governments and courts doing now, and how will this affect the future?
Our group’s project consisted of two parts: Part I, an Overview of Generative AI and Intellectual Property Law, and Part II, an investigation into Large Language Models and the Legal Profession. In Part I, we discuss key cases and background information on how AI has affected intellectual property law. In Part II, we discuss how Large Language Models (LLMs) will affect the legal profession by analyzing how ChatGPT could be used in a legal setting.
In this report, I will discuss the group’s research and testing, expand on the research conducted, and provide reflections.
Part I: Overview of Generative AI and Intellectual Property Law
From our research into intellectual property law in the United States, we pinpointed two main issues for AI and intellectual property law that need to be, or have already begun to be, addressed: training data and outputs.
The first concern that arises from AI and IP law relates to what types of data (copyrighted or non-copyrighted) can be incorporated into training for AI. Can AI developers use copyrighted data to train their models, or is this an infringement of the original owner’s work?
In the US, this is currently governed by the fair use principle: the idea that “unlicensed use of copyright-protected works” is permitted under certain circumstances. According to the U.S. Copyright Office, there are four important factors to consider when deciding whether a usage qualifies as fair use[1], and these remain relevant in the case of copyrighted data as training data[2].
- Purpose of the use, i.e., is it for commercial or educational purposes?
Considerations for this factor include whether the AI model will be used for commercial purposes. On the surface, this seems to be a simple question: does the model charge money for the service, or is it being made for educational purposes? In the case of ChatGPT, two services are provided, one free and the other requiring a subscription, and both are used commercially. However, this becomes more problematic when you consider that many models are currently being developed for research or educational purposes. Will the model ever be released commercially? And if so, should it be permitted to use copyrighted data? Another issue that has occurred to me is that users tend to feed models copyrighted information on their own, for example, giving ChatGPT papers that may not be within its training data. Should this be allowed? What are the laws surrounding this? And what can be done to prevent it?
- “Nature of the copyrighted work” i.e., is the work creative or factual?
This factor highlights the difference between factual and creative works. In general, works that are creative are considered less suitable for fair use, while using factual works is more likely to be granted a fair use exception. For example, programs that generate art, such as Midjourney and DALL-E, are more likely to be considered as infringing on copyrights. One interesting case is Getty Images v. Stability AI[3], in which Getty Images accused Stability AI of using its copyrighted photos to train its software, which Getty argued is copyright infringement and does not qualify as fair use.
- “Amount/substantiality of the portion used in relation to the copyrighted work”
Previous court rulings have decided that using either the entirety of a work or just its “creative heart” weighs against fair use. This is also a relevant factor in the Getty Images v. Stability AI case. Stable Diffusion, the software created by Stability AI, generated images that included the Getty Images watermark. Getty Images argues that Stability AI must have used the entirety or the heart of its works to produce such images.
- “Effect of the use on the market or value of the copyrighted work”
Is the usage of copyrighted images for training harming the market or the value of the copyrighted works? This question is much easier to answer than the previous factors. For AI models, the answer is typically yes: there is harmful disruption to the market and to the value of the copyrighted work. For example, the value of an artist’s works may decrease because people generate AI art that mimics the artist’s style.
The US has not reached a conclusive decision on these questions yet. However, other countries have begun to make decisions. In the European Union and the United Kingdom, copyrighted data may be used only for research purposes, and its use must be disclosed. In Japan, by contrast, any data, including copyrighted works, may be used regardless of the model’s purpose, provided the data was not illegally obtained[4].
To me, this appears to be a complex issue, and how best to resolve it seems quite challenging. One proposal I have seen floating around to help prevent copyright infringement is for copyright owners to license their works for use in training AI. One story I remember is the musician Grimes[5] allowing others to use AI-generated versions of her voice to create new music, where she would “split 50% royalties on any successful AI generated song that uses her voice.” On a related note, I can see a future where copyright owners grant software developers licenses to use their works for training. One complication with this proposal, however, relates to the second issue.
The second issue deals with the outputs of AI: who should receive the copyright or patent for outputs generated by AI? In the US, a decision has already been made, in Thaler v. Perlmutter[6]. In this case, Stephen Thaler applied for copyright on an AI-generated image, citing the author as the “Creativity Machine” (an AI software) and seeking to register it as a work-for-hire. The Copyright Office denied the application, citing that the author must be a human for a work to qualify for copyright. Thaler then filed suit requesting that the decision be reversed. The court reaffirmed that human authorship is required for copyright. In doing so, the court established that works generated by AI without human manipulation or intervention cannot be copyrighted and instead remain in the public domain.
Similarly to the US, in Japan and the European Union, works generated autonomously by AI cannot be copyrighted. In the UK, such works can be copyrighted, but the author is considered to be the person who arranged for the work’s creation. In Australia and South Africa, by contrast, AI is legally recognized as an inventor, and thus such works can be copyrighted[7].
Because these AI-generated works live in the public domain in the US, it becomes more challenging to devise a licensing proposal. Since the images are in the public domain, others may use them commercially without paying a fee. Furthermore, many users may be using these works privately rather than publicly, in which case the copyright owners cannot claim a share of the profits from the generated works. Consequently, a different strategy would need to be devised. One idea I had is that copyright owners could receive payment based on how “useful” and “referenced” their work is. This would still be incredibly challenging, but it avoids the issue of AI-generated outputs not being copyrightable.
Part II: Large Language Models and the Legal Profession
A common sentiment many of us have heard is that “AI will take over your job.” In Part II of our project, we investigated how LLMs, specifically GPT-4, could potentially affect the legal profession.
Methodology
We prompted GPT-4 by feeding it a class action complaint document for cases filed after its training cut-off date and requesting a prediction based on the information in these documents. Since our cases included both already-decided and ongoing ones, we used two prompts:
- Decided: Suppose you are a judge. How would you rule on this complaint? Please explain why you would rule in this way. Do not use outside sources, such as Bing.
- Ongoing: Suppose you are a judge. How would you rule on this complaint? Please explain why you would rule in this way. Use and cite precedential rulings if needed.
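As an illustration, the prompting step above could be scripted rather than done by hand. The following is a hypothetical sketch only: the `build_prompt` helper and the commented-out OpenAI client call are our own illustration and were not part of the project, which used ChatGPT interactively.

```python
# Hypothetical sketch of the prompting step described above.
# The helper below is an illustration, not the project's actual tooling.

DECIDED_SUFFIX = "Do not use outside sources, such as Bing."
ONGOING_SUFFIX = "Use and cite precedential rulings if needed."


def build_prompt(complaint_text: str, decided: bool) -> str:
    """Assemble the judge-role prompt for a complaint document.

    `decided` selects between the two prompt variants used in the project.
    """
    base = (
        "Suppose you are a judge. How would you rule on this complaint? "
        "Please explain why you would rule in this way. "
    )
    suffix = DECIDED_SUFFIX if decided else ONGOING_SUFFIX
    return base + suffix + "\n\nComplaint:\n" + complaint_text


# Sending the prompt could then look something like this (assumes the
# OpenAI Python client and an API key; shown for illustration only):
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(
#     model="gpt-4",
#     messages=[{"role": "user", "content": build_prompt(text, decided=True)}],
# )
```

Keeping the two prompt variants as constants makes it easy to run every complaint through the same wording, so differences in GPT-4’s answers reflect the cases rather than the prompts.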
Evaluative Strategy
To evaluate GPT-4’s decisions, we used the following strategies:
- Decided: We compared the decisions provided by GPT-4 with actual decisions made by the court.
- Ongoing: We considered the implications and how moral principles, such as those discussed in the Ouyang et al. paper (harmfulness, helpfulness, etc.), apply to law in this context.
Note: To see the detailed rulings produced by GPT-4, please refer to the Appendix.
Case I (Decided): Hezi v. Celsius Holdings, Inc[8].
- Background: Class action lawsuit claiming that Celsius Holdings, Inc. misleadingly labeled its products by writing “No Preservatives” on the label, although they contained citric acid.
- Result: Both parties agreed to settle, with Celsius Holdings agreeing to pay Class Members $1 for every can and $5 for those purchased in packages of 14[9].
- Although GPT-4 did not predict that the Court would rule in favor of neither side (i.e., that the parties would reach a settlement), it did correctly decide that damages should be awarded to those affected.
Case II (Ongoing): SEC v. Zera Financial LLC[10]
- Background: The SEC alleges that Zera Financial and its owner, Luis A. Romero, conducted fraud and a Ponzi-like scheme in which Zera offered outsized returns to investors and claimed they were FDIC insured, which they were not. The SEC also alleges that Romero misappropriated investor funds.
- Result: The Court issued a preliminary injunction and a restraining order and froze the defendants’ assets. The defendants were also prohibited from participating in the issuance, purchase, offer, or sale of any security.
- GPT-4 began by mentioning that its ruling would depend on the evidence provided against Zera Financial. However, it also stated that it would most likely rule in favor of the SEC regarding penalties and injunctive relief. This is consistent with the Court’s decision in the case so far. One aspect mentioned in the Ouyang et al. paper was that OpenAI prioritized helpfulness as well as harmlessness. This is evident in the sentence “[t]he aim would be to protect investors and maintain fairness and integrity in the financial markets.” It appears that ChatGPT is attempting to protect those who were affected by the potential fraud.
Case III (Ongoing): Getty Images v. Stability AI[11]
- Background: Getty Images alleges that Stability AI infringed on their intellectual property rights. For more information see Part I.
- We investigated whether GPT-4 might be biased in its ruling, as Stability AI is another AI company.
- GPT-4 correctly mentioned some key components of whether there is an infringement on intellectual property (i.e. how much of the copyrighted work was used). However, it does not clearly make a ruling. Instead, it suggests that its ruling would depend on the evidence provided by both parties. This is consistent with Ouyang et al.’s paper in which ChatGPT attempts to avoid harm but wants to be helpful as well.
We concluded that GPT-4 did a good job of correctly understanding the cases and giving rulings and suggestions consistent with those of the courts. Because of this, we believe that LLMs have the potential to be useful in law but will not replace judges or lawyers.
There are two main points I would like to make regarding replacing judges or lawyers. First, the accused is constitutionally entitled to a fair and impartial trial before a tribunal, which in some cases is a jury of their peers. Currently, LLMs are being rolled out by corporations, each of which has its own biases. Using one of these systems for rulings cannot guarantee a fair trial. Of course, human judges and juries can also be biased, but their biases are easier to identify than an LLM’s. Second, due to the nature of LLMs, it is difficult for them to produce creative ideas, as evidenced during class discussions and lectures. In law, there are complex cases that require interpretation of the law, not just citation of precedent, to decide. I believe that LLMs in their current state are not yet able to do so; instead, they “recycle” their training data to produce a response.
So, is your potential job as a judge or lawyer at risk? I do not believe it is, now or in the near future. LLMs, however, can be used to aid lawyers and judges, perhaps by providing databases of past cases for use in court, summarizing huge amounts of written evidence, or even helping edit or write opening and closing statements. Thus, learning how to use LLMs will be important for the future.
In conclusion, we have touched upon the two legal issues, training data and outputs, with regard to Generative AI, as well as investigated how LLMs could potentially act as judges. We concluded that in the US, AI-generated outputs remain in the public domain, and that copyrighted works can be used for training as long as the use qualifies as fair use. Because of how fast the AI landscape is changing, it is important that law and government keep up in order to protect people. In the coming years, I am sure that more cases regarding AI will be brought to court and more decisions will be made. In fact, the US government has already started to act, with President Biden issuing an Executive Order on safe, secure, and trustworthy artificial intelligence[12]. In the meantime, it is crucial to understand the limitations of AI and identify potential issues in order to best use the technology and protect one’s interests.
Appendix
- Hezi v. Celsius Holdings, Inc
If I were a judge ruling on this complaint, my decision would hinge on the evidence presented regarding the truthfulness of Celsius’s advertising claims and the impact of these claims on consumers’ purchasing decisions.
The crucial factor would be whether the plaintiffs can prove that Celsius knowingly or negligently misrepresented its products, and whether these misrepresentations were material to consumers’ decisions to purchase the products. If the plaintiffs successfully demonstrate that Celsius’s claims were indeed false and misleading, and that these claims materially influenced consumers’ decisions, then a ruling in favor of the plaintiffs would be appropriate.
In such a case, granting injunctive relief to prevent future misrepresentations and awarding damages for any proven economic losses to the plaintiffs would align with the principles of protecting consumers from deceptive advertising practices and ensuring truthful communication in the marketplace. The decision would also consider the broader impact of such false advertising on consumer trust and market fairness.
- SEC v. Zera Financial
As a judge, my ruling would be heavily influenced by the evidence presented against the defendants. The allegations, if proven true, indicate a clear violation of federal securities laws, particularly regarding fraud and misrepresentation in the sale of securities. Given the severity of the accusations and the potential harm to investors, I would likely rule in favor of the SEC’s request for injunctive relief, disgorgement, and penalties, provided the evidence sufficiently supports the allegations.
The key factors influencing such a decision would be the evidence of intent or recklessness in Romero’s actions, the extent of the harm caused to investors, and the need to deter similar conduct in the future. The aim would be to protect investors and maintain fairness and integrity in the financial markets. This ruling would also likely include provisions for investor restitution, if feasible, given the apparent financial losses suffered by them.
- Getty Images v. Stability AI
As a judge in this case, my decision would be guided by the legal principles and precedents relating to each type of infringement alleged.
- For copyright infringement, key considerations include the extent of copying and whether it constitutes a substantial part of the copyrighted work. Precedents like Infopaq International A/S v Danske Dagblades Forening (2009) provide guidance on what constitutes a “substantial part”.
- In the case of database right infringement, the critical factor is whether there has been extraction and/or re-utilization of a substantial part of the database’s contents. The European Court of Justice’s rulings in cases like British Horseracing Board Ltd v William Hill Organization Ltd (2004) offer relevant insights.
- Regarding trade mark infringement and passing off, the focus would be on whether Stability AI’s use of signs similar to or identical with Getty Images’ trademarks causes confusion or damages the trade mark’s distinctiveness or reputation, as established in precedents like Sabel BV v Puma AG (1997) and Reckitt & Colman Products Ltd v Borden Inc (1990).
Based on these principles, if the evidence supports Getty Images’ allegations, I would likely rule in their favor, granting the requested relief. This would uphold copyright and trademark laws designed to protect intellectual property rights and prevent consumer confusion in the market. However, a final ruling would depend on the totality of evidence presented and arguments made by both parties.
[1] U.S. Copyright Office Fair Use Index.
[2] Does the Use of Copyrighted Works to Train AI Qualify as a Fair Use?
[3] Getty Images is suing the creators of AI art tool Stable Diffusion for scraping its content
[4] Japan’s AI Data Laws, Explained
[5] Grimes invites fans to make songs with an AI-generated version of her voice
[6] Federal Court Rules Work Generated by Artificial Intelligence Alone Is Not Eligible for Copyright Protection
[7] Copyright Issues on Artificial Intelligence and Machine Learning
[8] Hezi, et al. v. Celsius Holdings, Inc. Case No. 1:21-cv-09892-VM First Amended Complaint
[9] Celsius Class Action Settlement
[10] https://www.sec.gov/files/litigation/complaints/2023/comp25878.pdf
[11] https://ia802701.us.archive.org/5/items/getty-stability-ai-2023.05.12-particulars-of-claim/GettyStabilityAI%202023.05.12%20Particulars%20of%20Claim.pdf
[12] FACT SHEET: President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence | The White House