US Copyright Office found AI companies breach copyright. Its boss was fired

436 points 319 comments a day ago
jhaile

One aspect that I feel is ignored by the comments here is the geo-political forces at work. If the US takes the position that LLMs can't use copyrighted work or have to compensate all copyright holders, other countries (e.g. China) will not follow suit. This will mean that US LLM companies will either fall behind or be too expensive. Which means China and other countries will probably surge ahead in AI, at least in terms of how useful the AI is.

That is not to say that we shouldn't do the right thing regardless, but I do think there is a feeling of "who is going to rule the world in the future?" that underlies governmental decision-making on how much to regulate AI.

oooyay

Well hell, by that logic average citizens should be able to launder corporate intellectual property because China will never follow suit in adhering to intellectual property law. I'm game if you are.

jowea

Isn't that sort of logic precisely why China doesn't adhere to IP law?

oooyay

Yes, I was being a bit facetious. It was snark intended to point out that corporations don't get to have their cake and eat it too. Either everything is free and there are no boundaries or we live by our own principles.

gruez

>It was snark intended to point out that corporations don't get to have their cake and eat it too.

"have their cake and eat it too" allegations only work if you're talking about the same entity. The copyright maximalist corporations (ie. publishers) aren't the same as the permissive ones (ie. AI companies). Making such characterizations make as much sense as saying "citizens don't get to eat their cake and eat it too", when referring to the fact that citizens are anti-AI, but freely pirate movies.

_aavaa_

Yes they are. Look at what happened when DeepSeek came out. Altman started crying and alleging that DeepSeek was trained on OpenAI model outputs, without an inkling of irony.

gruez

>Altman started crying and alleging that DeepSeek was trained on OpenAI model outputs, without an inkling of irony

Can you link to the exact comments he made? My impression was that he was upset that they broke OpenAI's T&C, and that DeepSeek's claim of being much cheaper to train didn't factor in that it required OpenAI's model to bootstrap the training process. Neither of those directly contradicts the claim that training is copyright infringement.

rubslopes

Another example: Microsoft suing pirated Windows distributors.

r053bud

It’s barely facetious though. What is stopping me from “starting an AI company” (LLC, sure) and torrenting all ebooks (which Facebook did)? As long as I don’t seed, I’m golden.

gruez

>What is stopping me from “starting an AI company” (LLC, sure) and torrenting all ebooks (which Facebook did)? As long as I don’t seed, I’m golden.

Nothing. You don't even need the LLC. I don't think anyone got prosecuted for only downloading; all prosecutions were for distribution. Note that if you're torrenting, even if you stop the moment the download finishes (and thus never get to "seeding"), you're still uploading, and that would count as distribution for the purposes of copyright law.

Pooge

Which is still what Facebook did, if I'm not mistaken. There's no way they torrented and managed to upload less than 1 bit.

FireBeyond

You're right. They claimed they made efforts to minimize seeding, but minimal is not none, as you say.

gruez

You can make a patched torrent client that never uploads any pieces to peers. It'd definitely be within Meta's capability to do so. The real problem is that, unlike in typical torrenting lawsuits, they weren't caught red-handed in the act, so it would be hard to go after them. This might seem unfair, but it's no different from you openly posting on Reddit that you torrent: it'd be tough for rights holders to go after you even with such an admission.
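
To make that claim concrete, here is a minimal sketch of what a "leech-only" peer handler could look like: it accepts incoming piece data but refuses every piece request, so nothing is ever uploaded. All class and method names here are hypothetical illustrations, not any real BitTorrent library's API.

    # Hypothetical sketch of a leech-only peer handler (illustrative names only).

    class PeerConnection:
        """Stand-in for a peer wire connection."""
        def send_choke(self):
            print("choke sent: refusing to serve pieces")

    class LeechOnlyPeerHandler:
        def __init__(self, connection):
            self.connection = connection
            self.pieces = {}

        def on_piece_request(self, piece_index, begin, length):
            # A compliant client would reply with the requested block here.
            # This one keeps the peer choked and drops the request, so no
            # data is ever distributed to other peers.
            self.connection.send_choke()

        def on_piece_received(self, piece_index, data):
            # Downloading still works normally.
            self.pieces[piece_index] = data

    handler = LeechOnlyPeerHandler(PeerConnection())
    handler.on_piece_received(0, b"...block data...")
    handler.on_piece_request(0, begin=0, length=16384)  # refused, never uploaded

The point is only that "never upload" is trivial to implement at the client level, which is what the parent comment asserts; whether it changes the legal exposure is the separate question being argued here.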

breakingcups

> Previously, a Meta executive in charge of project management, Michael Clark, had testified that Meta allegedly modified torrenting settings "so that the smallest amount of seeding possible could occur," which seems to support authors' claims that some seeding occurred. And an internal message from Meta researcher Frank Zhang appeared to show that Meta allegedly tried to conceal the seeding by not using Facebook servers while downloading the dataset to "avoid" the "risk" of anyone "tracing back the seeder/downloader" from Facebook servers. Once this information came to light, authors asked the court for a chance to depose Meta executives again, alleging that new facts "contradict prior deposition testimony."

gruez

>Meta allegedly modified torrenting settings "so that the smallest amount of seeding possible could occur,"

>Meta allegedly tried to conceal the seeding by not using Facebook servers while downloading the dataset to "avoid" the "risk" of anyone "tracing back the seeder/downloader" from Facebook servers

Sounds like they used a VPN, set the upload speed to 1kb/s, and stopped after the download was done. If the average Joe copied that setup there's a 0% chance he'd get sued, so I don't really see a double standard here. If anything, Meta might get additional scrutiny because they're a big enough target that rights holders will go through the effort of suing them.

FireBeyond

> If the average Joe copied that setup there's 0% chance he'd get sued

Citation needed. The RIAA used to just watch torrents and send cease-and-desists to everyone who connected, whether for a minute or for months. It was very much a dragnet, and I highly doubt there was any nuance of "but Your Honor, I only seeded 1MB back so it's all good".

gruez

Did you miss the part about using a VPN?

snozolli

Either everything is free and there are no boundaries or we live by our own principles.

Or C) large corporations (and the wealthy) do whatever they want while you still get extortion letters because your kid torrented a movie.

They really do get to have their cake and eat it too, and I don't see any end to it.

rollcat

Well I always felt rebellious about the contemporary face of "rules for thee but not for me", specifically regarding copyright.

Musicians remain subject to abuse by the recording industry; they're making pennies on each dollar you spend on buying CDs^W^W streaming services. I used to say, don't buy that; go to a concert, buy beer, buy merch, support directly. Nowadays live shows are being swallowed whole through exclusivity deals (both for artists and venues). I used to say, support your favourite artist on Bandcamp, Patreon, etc. But most of these new middlemen are ready for their turn to squeeze.

And now on top of all that, these artists' work is being swallowed whole by yet another machine, disregarding what was left of their rights.

What else do you do? Go busking?

johnnyanmac

We regulate it like we did centuries ago, which led to copyright. If we already have rules, we enforce them. If no one in power wants to, we put in people who will.

In the end this all comes down to needing the people to care enough.

seanmcdirmid

In the long run private IP will eventually become very public despite whatever laws you have; it's been like that since the Stone Age. The American Industrial Revolution was built partially on stolen IP from Britain. The internet has just sped up diffusion. You can stop it if you are willing to cut the line, but legal action is only some friction, and even then only in the short term.

Bjorkbat

I broadly agree in that, sure, unfettered access to copyrighted material will make AI more capable, but more capable of what exactly?

For national security reasons I'm perfectly fine with giving LLMs unfettered access to various academic publications, scientific and technical information, that sort of thing. I'm a little more on the fence about proprietary code, but I have a hard time believing there isn't enough code out there already for LLMs to ingest.

Otherwise though, what is an LLM with unfettered access to copyrighted material better at vs. one that merely has unfettered access to scientific/technical information plus licensed copyrighted material? I would suppose that, besides maybe being a more creative writer, the former is far more capable of reproducing copyrighted works.

In effect, the former is a more capable plagiarism machine than the latter, not necessarily more intelligent, and otherwise doesn't really add any more value. What do we have to gain from condoning it?

I think the argument I'm making is a little easier to see in the case of image and video models. The model that has unfettered access to copyrighted material is more capable, sure, but more capable of what? Capable of making images? Capable of reproducing Mario and Luigi in an infinite number of funny scenarios? What do we have to gain from that? What reason do we have for not banning such models outright? Not like we're really missing out on any critical security or economic advantages here.

Teever

If common culture is an effective substrate for communicating ideas, in the sense that we can use shared pop-culture references to make metaphors that explain complex ideas, then the common culture that large companies have ensnared in excessively long copyrights and trademarks to generate massive profits is a useful thing for an LLM designed to convey ideas to have embedded in it.

If I'm learning about kinematics, maybe it would be more effective to have comparisons to Superman flying faster than a speeding bullet, and no amount of dry textbooks and academic papers will make up for the lack of such a comparison.

This is especially relevant when we're talking about science-fiction which has served as the inspiration for many of the leading edge technologies that we use including stuff like LLMs and AI.

Bjorkbat

Fair point, we use metaphor to explain and understand a variety of topics, and a lot of those metaphors are best understood through pop culture analogies.

A reasonable compromise then is that you can train an AI on Wikipedia, more-or-less. An AI trained this way will have a robust understanding of Superman, enough that it can communicate through metaphor, but it won't have the training data necessary to create a ton of infringing content about Superman (well, it won't be able to create good infringing content anyway. It'll probably have access to a lot of plot summaries but nothing that would help it make a particularly interesting Superman comic or video).

To me it seems like encyclopedias use copyrighted pop culture in a way that constitutes fair use, and so training on them seems fine as long as they consent to it.

1vuio0pswjnm7

The design, manufacture and supply of electronics is far more important than one particular usage, e.g., "LLMs". It has never been a requirement to violate copyrights to produce electronics, or computer software. In fact, arguably there would be no "MicroSoft" were it not for Gates' lobbying for the existence and enforcement of "software copyright". The "Windows" franchise, among others, relies on it. The irony of Microsoft's support for OpenAI is amusing. Copyright enforcement for me but not for thee.

bigbuppo

The real problem here is that AI companies aren't even willing to follow the norms of big business and get the laws changed to meet their needs.

johnnyanmac

This is precisely why we need proportional fees for courts. We can't just let companies treat the law as a cost-benefit analysis. They should live in fear of a court result against their favor.

arp242

I get what you're saying, but this is just a race to the bottom, no?

It's annoying to see the current pushback against China focusing so much on inconsequential matters with so much nonsense mixed in, because I do think we need to push back against China on some things.

therouwboat

If AI is so important, maybe it should be owned by the government and free to use for all citizens.

pc86

Name two non-military things that the government owns and aren't complete dumpster fires that barely do the thing they're supposed to do.

Even (especially?) the military is a dumpster fire but it's at least very good at doing what it exists to do.

pergadad

The government doesn't make tanks, it just shells out gigantic amounts to companies to make them.

That said, there are plenty of successful government actions across the world, where Europe or Japan probably have a good advantage with solid public services. Think streets, healthcare, energy infrastructure, water infrastructure, rail, ...

pc86

We're talking about the US government though

const_cast

There's nothing special about the US government that makes it uniquely shit.

The difference here is that we have people like yourself: those who have zero faith in our government and as such act as double agents or saboteurs. When people such as yourself gain power in the legislature, they "starve the beast", meaning they purposefully deconstruct sections of our government so that they have justification for their ideological belief that our government doesn't work.

You guys work backwards. The foregone conclusion is that government programs never work, and then you develop convoluted strategies to prove that.

azemetre

Medicaid, Medicare, and Social Security are all three programs that have massive approval from US citizens.

Even saying the military is a dumpster fire isn't accurate. The military has enabled trillions of dollars' worth of extraction for the wealthy and elite across the globe.

In no sane world can you call the ability to protect GLOBAL shipping lanes a failure. That one service alone has probably paid for itself thousands of times.

We aren't even talking about things like public education (high school education used to be privatized and something only the elites enjoyed 100 years ago; yes, public high school education isn't even 100 years old), or libraries, or public parks.

---

I really don't understand this "gobermint iz bad" meme you see in tech circles.

I get so much more out of my taxes compared to equivalent corporate bills that it's laughable.

Government is made up of people, and the last 50 years have been the government mostly giving money and establishing programs for the small cohorts that have been hoarding all the wealth. Somehow this is never an issue with the government, however.

I also never understand the arguments from these types, because if you think the government is bad then you should want it to be better. Better mostly means having more money to redistribute and more personnel to run programs, but it's never about these things. It's always attacking the government to make it worse at the expense of the people.

sklargh

Hi. Assuming the US here. Depends on scope of analysis and dumpster fire definition.

1. The National Weather Service. Crown jewel and very effective at predicting the weather and forecasting life threatening events.

2. IRS, generally very good at collecting revenue.

3. National Interagency Fire Service / US Forest Service tactical fire suppression

4. NTSB/US Chemicals Safety Board - Both highly regarded.

5. Medicare - Basically clung to with talons by seniors, revealed preference is that they love it.

6. DOE National Labs

7. NIH (spicy pick)

8. Highway System

There are valid critiques of all of these but I don’t think any of them could be universally categorized as a complete dumpster fire.

Buttons840

Weather Forecasting

nilamo

1) art museums, specifically the Smithsonian, but nearly every major city has a decent one.

2) state parks are pretty rad.

standardUser

The US federal government doesn't run most museums, but it does run the massive parks system with 20k employees (pre-Musk) and that system enjoys extremely high ratings from guests.

bongodongobob

National Weather Service

Library of Congress

National Park Service

U.S. Geological Survey (USGS)

NASA

Smithsonian Institution

Centers for Disease Control and Prevention (CDC)

Social Security Administration (SSA)

Federal Aviation Administration (FAA) air traffic control

U.S. Postal Service (USPS)

zem

post office and USDA (pre trump regime slash-and-burn of course)

lappet

Highways

johnnyanmac

Roads and telecommunication. You can argue they are indeed a dumpster fire, but imagine the alternatives full of tolls and incompatible wavelengths.

bilbo0s

That's a trick question.

I mean, name 2 things anyone owns that aren't dumpster fires?

A long time ago, industrial engineers used to say, "Even Toyota has recalls."

Something being a dumpster fire is so common nowadays that you really need a better reason to argue in support of a given entity's ownership. (Or even non-ownership for that matter.)

bgwalter

The same president that is putting 145% tariffs on China could put 1000% tariffs on Internet chat bots located in China. Or order the Internet cables to be cut as a last resort (citing a national emergency as is the new practice).

I'm not sure at all what China will do. I find it likely that they'll forbid AI at least for minors so that they do not become less intelligent.

Military applications are another matter, one not really related to these copyright issues.

pc86

How exactly does one add a tariff to a foreign-based chat bot?

Ekaros

Build a big firewall. And then fine massively any ISP that allows traffic to reach bad hosts...

bilbo0s

You know that 20 bucks a month a lot of people pay for chatgpt?

Yeah..

you tax it if the "chatgpt" is foreign.

gruez

>Or order the Internet cables to be cut as a last resort (citing a national emergency as is the new practice).

what if they route through third countries?

asddubs

you could apply that same logic to any IP breaches though, not just AI

Ekaros

Your employee steals your source code and sells it to multiple competitors. Why should you have any right to go after those competitors?

johnnyanmac

Because they bought code from someone not authorized to sell it?

This isn't some new phenomenon. We do indeed seize assets from buyers if the seller stole them.

stonogo

Big "Mr. President, we cannot allow a mineshaft gap" energy going on, even if it's difficult for me personally to believe that LLMs contribute in any sense to ruling the world.

hulitu

> One aspect that I feel is ignored by the comments here is the geo-political forces at work. If the US takes the position that LLMs can't use copyrighted work or has to compensate all copyright holders – other countries (e.g. China) will not follow suit.

Oh really? They didn't have any problem coming after people who installed copyrighted Windows. BSA. But now Microsoft turns a blind eye because it suits them.

mattxxx

Well, firing someone for this is super weird. It seems like an attempt to censor an interpretation of the law that:

1. Criticizes a highly useful technology

2. Matches a potentially outdated, strict interpretation of copyright law

My opinion: I think using copyrighted data to train models for sure seems classically illegal. Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against. When I look at the litany of derivative fantasy novels, it's obvious they're not all fully independent works.

Since AI is and will continue to be so useful and transformative, I think we just need to acknowledge that our laws did not accommodate this use-case, and then we should change them.

madeofpalk

> Humans can read a book, get inspiration, and write a new book and not be litigated against

Humans get litigated against for this all the time. There is such a thing as, charitably, being too inspired.

https://en.wikipedia.org/wiki/List_of_songs_subject_to_plagi...

jrajav

If you follow these cases more closely over time you'll find that they're less an example of humans stealing work from others and more an example of typical human greed and pride: old, well-established musicians arguing that younger musicians stole from them for using a chord progression that appeared in dozens of songs before their own "original", or a melody on the pentatonic scale that sounds like many melodies on the pentatonic scale do. It gets ridiculous.

Plus, all art is derivative in some sense, it's almost always just a matter of degree.

johnnyanmac

> art is derivative in some sense, it's almost always just a matter of degree.

Yes, that's why we judge on a case by case basis. The line is blurry.

I think when you're storing copies of such assets in your database that you're well past the line, though.

FireBeyond

To the point that Billy Joel "famously" credited the songwriter for one of his songs ("This Night") as "Billy Joel, Ludwig van Beethoven".

zelphirkalt

The law covers these cases pretty well; it is just that the law has very powerful, extremely rich adversaries, whose greed has gotten the better of them again and again. They could use work released sufficiently long ago to be legally available, or they could take work released as Creative Commons, or they could run a lookup to make sure they never output verbatim copies of inputs, or outputs within a certain string-editing distance of them (depending on output length), or they could have paid people to reach out to all the people whose work they are infringing upon. But they didn't do any of that, of course, because they think they are above the law.

nadermx

I'm confused, so you're saying it's illegal? Because last I checked it's still in the process of going through the courts. And lest we forget, copyright's purpose is to advance the arts and sciences. Fair use is codified into law, which states that each case is judged on a case-by-case basis, hence the litigation to determine whether it is, in fact, legal.

mdhb

It’s so fucking obviously illegal when you think about it rationally for more than a few seconds. We aren’t even talking about “fair use” we are talking about how it works in practice which was Meta torrenting pirated books, never paying anyone a cent and straight up stealing the content at scale.

nadermx

The fact that you are even using the word stealing is telling of your lack of knowledge in this field. Copyright infringement is not stealing[0]. The propaganda of the copyright cartel has gotten to you.

[0] https://en.wikipedia.org/wiki/Dowling_v._United_States_(1985...

johnnyanmac

> Copyright infringement is not stealing

If we can agree that taking away of your time is theft (wage theft, to be precise), we as those who rely on intellect in our careers should be able to agree that the taking of our ideas is also theft.

>moved to the Ninth Circuit Court of Appeals, where he argued that the goods he was distributing were not "stolen, converted or taken by fraud", according to the language of 18 U.S.C. 2314 - the interstate transportation statute under which he was convicted. The court disagreed, affirming the original decision and upholding the conviction. Dowling then took the case to the Supreme Court, which sided with his argument and reversed the convictions.

This just tells me that the definition is highly contentious. Having the Supreme Court reverse a federal ruling already shows misalignment.

hulitu

> The fact you are even using the word stealing, is telling to your lack of knowledge in this field.

I agree. If you can pay the judge, the congress or the president, it is definitely not stealing. It is (the best) democracy (money can buy). /s

nadermx

So when someone steals something from you, you no longer have it. Yet here they paid the judge(s) because the person who's been "robbed" still has their thing?

Intralexical

A test to apply here: If you or I did this, would it be illegal? Would we even be having this conversation?

The law is supposed to be impartial. So if the answer is different, then it's not really a law problem we're talking about.

ashoeafoot

Obviously a revenue-tracking weight should be trained in, allowing the tracking and collection of all value generated from derivative works.

hochstenbach

Humans are not allowed to do what AI firms want to do. That was one of the copyright office arguments: a student can't just walk into a library and say "I want a copy of all your books, because I need them for learning".

Humans are also very useful and transformative.

timdiggerm

Or we could acknowledge that something could be a bad idea, despite its utility

ActionHank

Assuming this means copyright is dead, companies will be very upset, and patents will likely follow.

The hold US companies have on the world will be dead too.

I also suspect that media piracy will be labelled as the only reason we need copyright, an existing agency will be bolstered to address this concern and then twisted into a censorship bureau.

ceejayoz

> Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against.

You're still not gonna be allowed to commercially publish "Hairy Plotter and the Philosophizer's Rock".

anigbrowl

You are if it's parody, cf 'Bored of the Rings'.

WesolyKubeczek

No, but you are most likely allowed to commercially publish "Hairy Potter and the Philosophizer's Rock", a story about a prehistoric community. The hero is literally a hairy potter who steals a rock from a lazy deadbeat dude who is pestering the rest of the group with his weird ideas.

zelphirkalt

Not sure what you are getting at?

dns_snek

The problem with this kind of analysis is that it doesn't even try to address the reasons why copyright exists in the first place. The belief that training LLMs on content without permission should be allowed is incompatible with the belief that copyright is useful; you really have to pick a lane here.

Go back to the roots of copyright and the answers should be obvious. According to the US constitution, copyright exists "To promote the Progress of Science and useful Arts" and according to the EU, "Copyright ensures that authors, composers, artists, film makers and other creators receive recognition, payment and protection for their works. It rewards creativity and stimulates investment in the creative sector."

If I publish a book and tech companies are allowed to copy it, use it for "training", and later regurgitate the knowledge contained within to their customers then those people have no reason to buy my book. It is a market substitute even though it might not be considered such under our current copyright law. If that is allowed to happen then investment will stop and these books simply won't get written anymore.

franczesko

> Piracy refers to the illegal act of copying, distributing, or using copyrighted material without authorization. It can occur in various forms

Processing IP without a license AND offering it as a model for money doesn't seem like an unknown use-case to me.

regularjack

Then they need to be changed for everyone and not just AI companies, but we all know that ain't happening.

SilasX

>My opinion: I think using copyrighted data to train models for sure seems classically illegal. Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against. When I look at the litany of derivative fantasy novels, it's obvious they're not all fully independent works.

Huh? If you agree that "learning from copyrighted works to make new ones" has traditionally not been considered infringement, then can you elaborate on why you think it fundamentally changes when you do it with bots? That would, if anything, seem to be a reversal of classic copyright jurisprudence. Up until 2022, pretty much everyone agreed that "learning from copyrighted works to make new ones" is exactly how it's supposed to work, and would be horrified at the idea of having to separately license that.

Sure, some fundamental dynamic might change when you do it with bots, but you need to make that case in an enforceable, operationalized way.

bitfilped

Sorry but AI isn't that useful and I don't see it becoming any more useful in the near term. It's taken since ~1950 to get LLMs working well enough to become popular and they still don't work well.

jeroenhd

Pirating movies is also useful, because I can watch movies without paying on devices that apps and accounts don't work on.

That doesn't make piracy legal, even though I get a lot of use out of it.

Also, a person isn't a computer so the "but I can read a book and get inspired" argument is complete nonsense.

Workaccount2

It's only complete nonsense if you understand how humans learn. Which we don't.

What we do know though is that LLMs, similar to humans, do not directly copy information into their "storage". LLMs, like humans, are pretty lossy with their recall.

Compare this to something like a search indexed database, where the recall of information given to it is perfect.

zelphirkalt

Well, you don't get to pick and choose in which situations an LLM is considered similar to a human being and in which it's not. If you argue that it, similarly to a human, is lossy, then let's go ahead and get most output checked by organizations and courts for violations of the law and licenses, just like human work is. Oh wait, I forgot, LLMs are run by companies with too much cash to successfully sue them. I guess we just have to live with it then, what a pity.

philipkglass

There are a couple of ways to theoretically prevent copyright violations in output. For closed models that aren't distributed as weights, companies could index perceptual hashes of all the training data at a granular level (like individual paragraphs of text) and check/retry output so that no duplicates or near-duplicates of copyrighted training data ever get served as a response to end users.

Another way would be to train an internal model directly on published works, use that model to generate a corpus of sanitary rewritten/reformatted data about the works still under copyright, then use the sanitized corpus to train a final model. For example, the sanitized corpus might describe the Harry Potter books in minute detail but not contain a single sentence taken from the originals. Models trained that way wouldn't be able to reproduce excerpts from Harry Potter books even if the models were distributed as open weights.
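
As a rough illustration of the first approach, here is a minimal sketch of a paragraph-level near-duplicate check built from hashed word shingles and Jaccard similarity. A real system would use proper perceptual or locality-sensitive hashing over the full training set; the 5-word shingles and 0.5 threshold below are arbitrary assumptions for the example.

    import hashlib

    def shingle_hashes(text, k=5):
        """Hash every k-word shingle of a paragraph into a set of fingerprints."""
        words = text.lower().split()
        shingles = {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}
        return {hashlib.sha1(s.encode()).hexdigest()[:16] for s in shingles}

    def jaccard(a, b):
        return len(a & b) / len(a | b) if (a | b) else 0.0

    # Granular index of fingerprints for copyrighted training paragraphs.
    training_index = [shingle_hashes(p) for p in [
        "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness",
    ]]

    def looks_like_training_data(candidate, threshold=0.5):
        """Flag model output that is a near-duplicate of an indexed paragraph."""
        cand = shingle_hashes(candidate)
        return any(jaccard(cand, indexed) >= threshold for indexed in training_index)

    # A serving layer could check each response and retry generation when this returns True.
    print(looks_like_training_data("It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness"))  # True
    print(looks_like_training_data("A completely unrelated sentence about gardening and weather patterns today."))  # False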

Workaccount2

YouTube built probably the most complex and proactive copyright system any organization has ever seen, for the sole purpose of appeasing copyright holders. There is no reason to believe they won't do the same thing for LLM output.

datavirtue

And everyone here is downloading every show and movie in existence without even a hint of guilt.

encipriano

Why would you feel guilty about using an unlimited resource? You're not stealing.

apercu

>Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against.

Corporations are not humans. (It's ridiculous that they have some legal protections in the US like humans, but that's a different issue). AI is also not human. AI is also not a chipmunk.

Why the comparison?

stevenAthompson

Doing a cover song requires permission, and doing it without that permission can be illegal. Being inspired by a song to write your own is very legal.

AI is fine as long as the work it generates is substantially new and transformative. If it breaks and starts spitting out other peoples work verbatim (or nearly verbatim) there is a problem.

Yes, I'm aware that machines aren't people and can't be "inspired", but if the functional results are the same the law should be the same. Vaguely defined ideas like your soul or "inspiration" aren't real. The output is real, measurable, and quantifiable and that's how it should be judged.

mjburgess

I fear the lack of our ability to measure your mind might render you without many of the legal or moral protections you imagine you have. But go ahead, tear down the law to whatever inanity can be described by the trivial machines of the world's current popular charlatans. Presumably you weren't using society's presumption of your agency anyway.

stevenAthompson

> I fear the lack of our ability to measure your mind might render you without many of the legal or moral protections you imagine you have.

Society doesn't need to measure my mind, they need to measure the output of it. If I behave like a conscious being, I am a conscious being. Alternatively you might phrase it such that "Anything that claims to be conscious must be assumed to be conscious."

It's the only answer to the p-zombie problem that makes sense. None of this is new, philosophers have been debating it for ages. See: https://en.wikipedia.org/wiki/Philosophical_zombie

However, for copyright purposes we can make it even simpler. If the work is new, it's not covered by the original copyright. If it is substantially the same, it isn't. Forget the arguments about the ghost in the machine and the philosophical mumbo-jumbo. It's the output that matters.

mjburgess

In your case, it isn't the output that matters. Your saying "I'm conscious" isn't why we attribute consciousness to you. We would do so regardless of your ability to verbalise anything in particular.

Your radical behaviourism seems an advantage to you when you want to delete one disfavoured part of copyright law, but I assure you, it isn't in your interest. It doesn't universalise well at all. You do not want to be defined by how you happen to verbalise anything, unmoored from your intentions, goals, and so on.

The law, and society, imparts much to you that is never measured and much that is unmeasurable. What can be measured is, at least, extremely ambiguous with respect to the mental states being attributed, because we do not attribute mental states by what people say -- this plays very little role (consider what a mess this would make of watching movies), and of course none at all in the large number of animals which share relevant mental states.

Nothing of relevance is measured by an LLM's output. It is highly unambiguous: the LLM has no mental states, and thus is irrelevant to the law, morality, society and everything else.

It's an obscene sort of self-injury to assume that whatever kind of radical behaviourism is necessary to hype the LLM is the right sort. Hype for LLMs does not lead to a credible theory of minds.

stevenAthompson

> We would do so regardless of your ability to verbalise anything in particular

I don't mean to say that they literally have to speak the words by using their meat to make the air vibrate. Just that, presuming it has some physical means, it be capable (and willing) to express it in some way.

> It's an obscene sort of self-injury to assume that whatever kind of radical behaviourism is necessary to hype the LLM is the right sort.

I appreciate why you might feel that way. However, I feel it's far worse to pretend we have some undetectable magic within us that allows us to perceive the "realness" of other people's consciousness by other than physical means.

Fundamentally, you seem to be arguing that something with outputs identical to a human is not human (or even human like), and should not be viewed within the same framework. Do you see how dangerous an idea that is? It is only a short hop from "Humans are different than robots, because of subjective magic" to "Humans are different than <insert race you don't like>, because of subjective magic."

toast0

> Doing a cover song requires permission, and doing it without that permission can be illegal.

I believe cover song licensing is available mechanically; you don't need permission, you just need to follow the procedures including sending the licensing fees to a rights clearing house. Music has a lot of mechanical licenses and clearing houses, as opposed to other categories of works.

stevenAthompson

> you don't need permission, you just need to follow the procedures

Those procedures are how you ask for permission. As you say, it usually involves a fee but doesn't have to.

toast0

(in the US) Mechanical licenses are compulsory; you don't need permission, you can just follow the forms and pay the fees set by the Copyright Royalty Board (appointed by the Librarian of Congress). You can ask the rightsholder to negotiate a lower fee, but there's no need for consent of the rightsholder if you notify as required (within 30 days of recording and before distribution) and pay the set fees.

stevenAthompson

Thanks for clarifying. Sometimes I forget that HN has a lot of experts floating around who take things in a very literal and legalistic way. I was speaking in more general terms, and missed that you were being very precise with your language.

Compulsory licenses are interesting, aren't they? It just feels wrong. If Metallica doesn't want me to butcher their songs, why should they be forced to allow it?

toast0

They are very interesting. IMHO, it's a nice compromise between making sure the artists are paid for their work, and giving them complete control over their work. Licensing for radio-style play is also compulsory, and terrestrial radio used to not even have to pay the recording artists (I think this changed?), but did have to track and pay to ASCAP.

As a consumer, it would be amazing if there were compulsory licenses for film and TV; then we wouldn't have to subscribe to 70 different services to get to the things we want to see. And there would likely be services that spring up to redistribute media where the rightsholders aren't able to or don't care to; it might be pulled from VHS that fans recorded off of TV in the old days, but at least it'd be something.

skolskoly

Any live band performing a song is subject to mechanical licensing as much as a recording artist. Typically the venue pays it, just like how radio stations pay royalties. This system exists because historically, that's how music reproduction worked. You hire some musicians to play the music you want to hear. Copyright applied to the score, the lyrics, and so on. The 'mechanical' rights had to come later, because recording hadn't been invented yet!

datavirtue

"If it breaks and starts spitting out other peoples work verbatim (or nearly verbatim) there is a problem."

Why is that? Seems all logic gets thrown out the window when invoking AI around here. References are given. If the user publishes the output without attribution, NOW you have a problem. People are being so rabid and unreasonable here. Totally bat shit.

stevenAthompson

> If the user publishes the output without attribution, NOW you have a problem.

I didn't mean to imply that the AI can't quote Shakespeare in context, just that it shouldn't try to pass off Shakespeare as its own or plagiarize huge swathes of the source text.

> People are being so rabid and unreasonable here.

People here are more reasonable than average. Wait until mainstream society starts to really feel the impact of all this.

vessenes

Thank you - a voice of sanity on this important topic.

I understand people who create IP of any sort being upset that software might be able to recreate their IP or stuff adjacent to it without permission. It could be upsetting. But I don't understand how people jump to "Copyright Violation" for the fact of reading. Or even downloading in bulk. Copyright controls, and has always controlled, the creation and distribution of a work. Embedded in the very nature of the copyright notice is the concept that the work will be read.

Reading and summarizing have only ever been controlled in Western countries via state-secrets-type acts or, alternately, non-disclosure agreements between parties. It's just way, way past reality to claim that we have existing laws that cover AI training ingesting information. Not only do we not, such rules would seem insane if you substituted the word "human" for "AI" in most of these conversations.

"People should not be allowed to read the book I distributed online if I don't want them to."

"People should not be allowed to write Harry Potter fanfic in my writing style."

"People should not be allowed to get formal art training that involves going to museums and painting copies of famous paintings."

We just will not get to a sensible societal place if the dialogue around these issues has such a low bar for understanding the mechanics, the societal tradeoffs we've made so far, and is able to discuss where we might want to go, and what would be best.

caconym_

If it was as obvious as you claim, the legal issues would already be settled, and your characterization of what LLMs are doing as "reading and summarizing" is hilariously disingenuous and ignores essentially the entire substance of the debate (which is happening not just on internet forums but in real courts, where real legal professionals and scholars are grappling with how to fit AI into our framework of existing copyright law, e.g.^[1]).

Of course, if you start your thought by dismissing anybody who doesn't share your position as not sane, it's easy to see how you could fail to capture any of that.

^[1] https://arstechnica.com/tech-policy/2025/05/judge-on-metas-a...

datavirtue

Exactly, it is an immense privilege to have your works preserved and promulgated through the ages for instant recall and automated publishing. It's literally what everyone wants. The creators and the consumers. The AI companies are not robbing your money or IP. Period.

jasonlotito

> But I don't understand how people jump to "Copyright Violation" for the fact of reading.

The article specifically talks about the creation and distribution of a work. Creation and distribution of a work alone is not a copyright violation. However, if you take in input from something you don't own, and genAI outputs something, it could be considered a copyright violation.

Let's make this clear; genAI is not a copyright issue by itself. However, gen AI becomes an issue when you are using as your source stuff you don't have the copyright or license to. So context here is important. If you see people jumping to copyright violation, it's not out of reading alone.

> "People should not be allowed to read the book I distributed online if I don't want them to."

This is already done. It's been done for decades. See any case where content is locked behind an account. Only select people can view the content. The license to use the site limits who or what can use things.

So it's odd you would use "insane" to describe this.

> "People should not be allowed to write Harry Potter fanfic in my writing style."

Yeah, fan fiction is generally not legal. However, there are some cases where fair use covers it. Most cases of fan fiction are allowed because the author allows it. But no, generally, fan fiction is illegal. This is well known in the fan fiction community. Obviously, if you don't distribute it, that's fine. But we aren't talking about non-distribution cases here.

> "People should not be allowed to get formal art training that involves going to museums and painting copies of famous paintings."

Same with fan fiction. If you replicate a copyrighted piece of art, if you distribute it, that's illegal. If you simply do it for practice, that's fine. But no, if you go around replicating a painting and distribute it, that's illegal.

Of course, technically speaking, none of this is what gen AI models are doing.

> We just will not get to a sensible societal place if the dialogue around these issues has such a low bar for understanding the mechanics

I agree. Personifying gen AI is useless. We should stick to the technical aspects of what it's doing, rather than trying to pretend it's doing human things when it's 100% not doing that in any capacity. I mean, that's fine for the layman, but anyone with any ounce of technical skill knows that's not true.

Aerroon

>Yeah, fan fiction is generally not legal. However, there are some cases where fair use covers it.

Which is a clear failure of the copyright system. Millions of people are expanding our cultural artifacts with their own additions, but all of it is illegal, because they haven't waited another 100 years.

People are interested in these pieces of culture, but they're not going to remain interested in them forever. At least not interested enough to make their own contributions.

vessenes

> Let's make this clear; genAI is not a copyright issue by itself. However, gen AI becomes an issue when you are using as your source stuff you don't have the copyright or license to. So context here is important. If you see people jumping to copyright violation, it's not out of reading alone.

My proposal is that it's a luddish kneejerk reaction to things people don't understand and don't like. They sense and fear change. For instance here you say it's an issue when AI uses something as a source that you don't have Copyright to. Allow me to update your sentence: "Every paper every scientist or academic wrote that references any copyrighted work becomes an issue". What you said just isn't true. The copyright refers to the right to copy a work.

Distribution: Sure. License your content however you want. That said, in the US a license prohibiting you from READING something just wouldn't be possible. You can limit distribution, copying, etc. This is how journalists can write about sneak previews or leaked information or misfiled court documents released when they should be under seal. The leaking <-- the distribution might violate a contract or a license, but the reading thereof is really not a thing that US law or Common law think they have a right to control, except in the case of the state classifying secrets. As well, here we have people saying "my song in 1983 that I put out on the radio, I don't want AI listening to that song." Did your license in 1983 prohibit computers from processing your song? Does that mean digital radio can't send it out? Essentially that ship has sailed, full stop, without new legislation.

On my last points, I think you're missing my point. Fan fiction is legal if you're not trying to profit from it. It is almost impossible to perfectly copy a painting, although some people are pretty good at it. I think it's perfectly legal to paint a super close copy of, say, Starry Night and sell it as "Starry Night by Jason Lotito." In any event, the discourse right now claims it's wrong for AI to look at and learn from paintings and photographs.

jasonlotito

> My proposal is that it's a luddish kneejerk reaction to things people don't understand and don't like.

Your proposal is moving goal posts.

> Allow me to update your sentence: "Every paper every scientist or academic wrote that references any copyrighted work becomes an issue".

No, I never said that. Fair Use exists.

> Fan fiction is legal if you're not trying to profit from it.

No, it's not.[1] You can make arguments that it should be, but, no.

[1] https://jipel.law.nyu.edu/is-fanfiction-legal/

> I think you're missing my point

I think you got called out, and you are now trying to reframe your original comment so it comes across as having accounted for the things you were called out on.

You think you know what you are talking about, but you don't. But, you rely on the fact that you think you do to lose the money you do.

datavirtue

"However, gen AI becomes an issue when you are using as your source stuff you don't have the copyright or license to."

Absolute horse shit. I can start a 1-900 answer line and use any reference I want to answer your question.

jasonlotito

> Absolute horse shit.

I agree, what followed was.

> I can start a 1-900 answer line and use any reference I want to answer your question

Yeah, that's not what we are talking about. If you think it was, you should probably do some more research on the topic.

wnevets

> Minnesota woman to pay $220,000 fine for 24 illegally downloaded songs [1]

https://www.theguardian.com/technology/2012/sep/11/minnesota... [1]

gruez

How is this relevant?

>The RIAA accused her of downloading and distributing more than 1,700 music files on file-sharing site KaZaA

Emphasis mine. I think most people would agree that whatever AI companies are doing with training AI models is different than sending verbatim copies to random people on the internet.

breakingcups

Well, Facebook torrented the copyrighted material they used for training, which means they distributed all those files too. With the personal approval of Zuck. What is the difference according to you?

Source: https://futurism.com/the-byte/facebook-trained-ai-pirated-bo...

gruez

Addressed this in another comment: https://news.ycombinator.com/item?id=43966888

wnevets

> I think most people would agree that whatever AI companies are doing with training AI models is different than sending verbatim copies to random people on the internet.

I think most artists who had their works "trained by AI" without compensation would disagree with you.

EMIRELADERO

The question is: would that disagreement have the same basis as the news above? I don't think so. Artists that are against GenAI take that stance out of a perceived abstract unfairness of the situation, where the AI companies aren't copy-pasting the works per-se with each generation, but rather "taking" the "sweat of the brow" of those artists. You can agree or not about this being an actual problem, but that's where the main claim is.

wnevets

> would that disagreement have the same basis as the news above?

Yes. An artist's style can and sometimes is their IP.

EMIRELADERO

No it's not? Style has been ruled pretty specifically to be uncopyrightable. Perhaps you could show me some examples?

wnevets

Waits v. Frito-Lay. The court held that his voice and style were part of his brand and thus protected.

https://www.youtube.com/watch?v=k0H_hcRc0MA

EMIRELADERO

That has nothing to do with IP, it's a personality rights claim. The decision explicitly refuses to involve Copyright, saying that voices (and by proxy the styles) are not copyrightable. What mattered there were specific rights of publicity, not IP.

wnevets

> That has nothing to do with IP, it's a personality rights claim.

The US Supreme Court disagrees, the right of publicity and intellectual property law are explicitly linked.

> The broadcast of a performer’s entire act may undercut the economic value of that performance in a manner analogous to the infringement of a copyright or patent. — Justice White

EMIRELADERO

That's just an analog in an opinion, it's not binding. Also, that's just a new IP term then, but we were talking about copyright, not any abstract form of IP.

Again, show me an example where an artist's style was used for copyright infringement in court. Can you produce even one example?

wnevets

All squares are rectangles, but not all rectangles are squares.

All right of publicity laws are intellectual property laws but not all intellectual property laws are right of publicity laws.

All copyright laws are intellectual property laws but not all intellectual property laws are copyright laws.

Right of publicity laws are intellectual property laws because the right of publicity is intellectual property. I don't know how else to articulate this over the internet; maybe it's time to consult an AI?

EMIRELADERO

My point is that the kind of IP at issue in this post and discussion is copyright, not personality rights. If we're talking about the views of the copyright office and how that relates to artists, it's implicit that we're staying in copyright land, because there has never been a case about style-as-IP in visual art.

wnevets

> . If we're talking about the views of the copyright office and how that relates to artists, it's implicit that we're staying in copyright land, because there has never been a case about style-as-IP in visual art.

This article is literally about the copyright office finding AI companies violating copyright law by training their models on copyrighted material. I'm not even sure what you're arguing about anymore.

EMIRELADERO

The Copyright Office is not an authority in this context, it's just an opinion. They did not make any "finding". To a judge they may as well be any other amicus curiae.

My opinion on the matter at hand is this: Artists who complain about GenAI use the hypothetical that you mentioned, where if you can accurately recreate a copyrighted work through specific model usage, then any distribution of the model is a copyright violation. That's why, according to the argument, fair use does not apply.

The real problem with that is that there's a mismatch between the fair use analysis and the actual use at issue. The complaining artists want the fair use inquiry to focus on the damage to the potential market for works in their particular style. That's where the harm is, according to them. However, what they use to even get to that stage is the copyright infringement allegation that I described earlier: that the models contain their works in a fixed manner which can be derived without permission.

Not to mention that this position means putting the malicious usage of the models for outright copyright infringement at the output level above the entire class of new works that can be created through their usage. It's effectively saying, "Because these models can technically be used in an infringing way, they infringe our copyright, and any creative potential these models could help with is insignificant in comparison to that simple fact. Of course, that's not the actual real problem, which is that they output completely new works that compete with our originals, even when they aren't derivatives of, nor substantially similar to, any individual copyrighted work."

Here's a very good article outlining my position in a more articulate way: https://andymasley.substack.com/p/a-defense-of-ai-art

gruez

Studio Ghibli[1] might object to both people pirating their films and AI companies allowing their art style to be duplicated, but that's not the same as saying those two things are the same. Sharing a movie rip on BitTorrent is obviously different than training an AI model that can reproduce the Studio Ghibli style, even to diehard AI opponents.

[1] used purely as an example

hulitu

> Sharing a movie rip on BitTorrent is obviously different than training an AI model that can reproduce the Studio Ghibli style, even to diehard AI opponents.

Ok, how about training AI on leaked Windows source code?

gruez

Arguably different from both, because Microsoft could say it's a trade secret. Note I'm not claiming that because it's different, it must be okay, just that it's unfair to compare torrenting with AI training.

TiredOfLife

I think most people who had even a basic understanding of how AI works would disagree with you.

jofla_net

Who knew all she needed was to change the tempo, pitch, timbre, add/remove lyrics, add/subtract a few notes, rearrange the harmony, put it behind a web portal with a fancy name, claim it had an inspirational muse (or assume all mortal beings are without one in the first place so it doesn't matter), and proceed to make millions off of said process methodically rather than giving it away for free, and she'd be right as rain.

glimshe

You just described pop music making. Change tempo, pitch, add/remove lyrics, etc from prior art.

hulitu

> How is this relevant?

She was training RI (real intelligence). Is that relevant now? Or does she have to be rich and pay some senators to be relevant?

prvc

The released draft report seems merely to be a litany of copyright holder complaints repeated verbatim, with little depth of reasoning to support the conclusions it makes.

bgwalter

The required reasoning is not very deep though: If an AI reads 100 scientific papers and churns out a new one, it is plagiarism.

If a savant has perfect recall, remembers text perfectly and rearranges that text to create a marginally new text, he'd be sued for breach of copyright.

Only large corporations get away with it.

scraptor

Plagiarism is not an issue of copyright law, it's an entirely separate system of rules maintained by academia. The US Copyright Office has no business having opinions about it. If a AI^W human reads 100 papers and then churns out a new one this is usually called research.

palmotea

> Plagiarism is not an issue of copyright law, it's an entirely separate system of rules maintained by academia. The US Copyright Office has no business having opinions about it. If a AI^W human reads 100 papers and then churns out a new one this is usually called research.

If you draw a Venn Diagram of plagiarism and copyright violations, there's a big intersection. For example: if I take your paper, scratch off your name, make some minor tweaks, and submit it; I'm guilty of both plagiarism and copyright violation.

dfxm12

Please argue in good faith. A new research paper is obviously materially different from "rearranging that text to create a marginally new text".

int_19h

"Rearranging text" is not what modern LLMs do though, unless you specifically ask them to.

dfxm12

I didn't make this claim. Feel free to bring a cogent argument to a commenter who did.

gruez

>I didn't make this claim

???

Did you not literally comment the following?

>A new research paper is obviously materially different from "rearranging that text to create a marginally new text".

What did you mean by that, if that's not your claim?

dfxm12

I made that comment, but the bit in quotes is not my claim. I was quoting a grandparent post. If you read from the top, the quotation marks and general flow of the thread should make this clear.

shkkmo

The comment is responding to this line:

> If an AI reads 100 scientific papers and churns out a new one, it is plagiarism.

That is a specific claim that is being directly addressed and pretty clearly qualifies as "good faith".

biophysboy

Having actually done research and published scientific papers, the key limitation is experimentation. Review papers are useful, and AI is useful, but creating new knowledge is more useful. I haven't had much luck using LLMs to extrapolate well beyond their knowledge domain.

scraptor

I certainly don't see much value in AI generated papers myself, I just object to the claim that the mere act of reading a large number of existing papers before writing yours is inherently plagiarism.

ta1243

Only when those papers are referenced

anigbrowl

You were supposed to keep reading past the first sentence, instead of trying to refute the first thing you saw that you found disagreeable. By doing so, you missed the point that plagiarism is substantively different from copyright infringement.

glial

It reminds me of the old joke.

"To steal ideas from one person is plagiarism; to steal from many is research."

slipnslider

Einstein once said "the key to genius is to hide your sources well"

And honestly there is truth to it. Some people (at work, in real life, wherever) might come off as very intelligent, but the moment they say "oh, I just read that relevant fact on reddit/twitter/a news site 5 minutes ago" you realize they are just like you, repeating relevant information they consumed recently.

wizee

Is reading and memorizing a copyrighted text a breach of copyright? I.e. is creating a copy of the text in your mind a breach of copyright, or fair use? Is it a breach of copyright if a digital “mind” similarly memorizes copyrighted text? Or is it only a breach of copyright to output and publish that memorized text?

What about loosely memorizing the gist of a copyrighted text? Is that a breach or fair use? What if a machine does something similar?

This falls under a rather murky area of the law that is not well defined.

aeonik

"Filthy eidetics. Their freeloading had become too much for our society to bear. Something had to be done. We found the mutation in their hippocampus and released a new CRISPR-mRNA-based gene suppression system.

Those who were immune were put under the scalpel."

satanfirst

That's not logical. If the savant has perfect recall and makes minor edits, they are effectively a digital copy and aren't really like a human, a neural network, or, by extension, any other ML model that isn't over-fitted.

tantalor

If AI really could "churn out a new scientific paper" we would all be ecstatically rejoicing in the dawning of an age of AGI. We are nowhere near that.

viraptor

We're relatively close already (https://openreview.net/pdf?id=12T3Nt22av), and we don't need anything even close to AGI to achieve that.

JKCalhoun

My understanding — LLMs are nothing at all like a "savant with perfect recall".

More like a speed-reader who retains a schema-level grasp of what they’ve read.

Maxatar

Plagiarism isn't illegal; it has nothing to do with the law.

shkkmo

Plagiarism is often illegal. If you use plagiarism to obtain a financial or other benefit, that can be fraud.

jobigoud

That further drives the point that the issue is not what the AI is doing but what people using it are doing.

shkkmo

> If a savant has perfect recall, remembers text perfectly and rearranges that text to create a marginally new text, he'd be sued for breach of copyright.

Any suits would be based on the degree the marginally new copy was fair use. You wouldn't be able to sue the savant for reading and remembering the text.

Using AI to create marginally new copies of copyrighted work is ALREADY a violation. We don't need a dramatic expansion of copyright law that says just giving the savant the book to read is a copyright violation.

Plagiarism and copyright are two entirely different things. Plagiarism is about citations and intellectual integrity. Copyright is about protecting economic interests, has nothing to do with intellectual integrity, and isn't resolved by citing the original work. In fact, most of the contexts where you would be accused of plagiarism are places like reporting, criticism, education, or research, where those goals make fair use arguments much easier.

mr_toad

> If a savant has perfect recall

AIs don’t have perfect recall.

nadermx

Not only does it read like a litany[0], it seems like the copyright holders are not happy with how the Meta case is working its way through the courts and are trying to sidestep fair use entirely.

https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

mr_toad

Copyright holders have always hated fair use, and often like to pretend it doesn’t exist.

The average copyright holder would like you to think that the law only allows use of their works in ways that they specifically permit, i.e. that which is not explicitly permitted is forbidden.

But the law is largely the reverse; it only denies use of copyright works in certain ways. That which is not specifically forbidden is permitted.

ls612

That used to be how it worked. Then the DMCA 1201 provisions arrived and so now anything not expressly permitted by the enumerated exceptions is forbidden. Even talking about how it works is punishable as a felony (upheld by SCOTUS in like 2000 or 2001, they basically said the Copyright clause is in the constitution so the government can censor information on how to defeat DRM).

raverbashing

I don't have much spare sympathy here honestly

Workaccount2

I have yet to see someone explain in detail how transformer model training works (showing they understand the technical nitty-gritty and the overall architecture of transformers) and also lay out a case for why it is clearly a violation of copyright.

You can find lots of people talking about training, and you can find lots (way more) of people talking about AI training being a violation of copyright, but you can't find anyone talking about both.

Edit: Let me just clarify that I am talking about training, not inference (output).
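For concreteness, here is a minimal toy sketch of what "training on a text" means mechanically: standard next-token prediction with gradient descent. The model, text, and sizes are made up and vastly smaller than anything real; this is an illustration of the technique being debated, not any company's pipeline.

```python
# Toy next-token-prediction training step (the core loop behind LLM training).
# Everything here (text, model size) is illustrative, not a real pipeline.
import torch
import torch.nn as nn
import torch.nn.functional as F

text = "the quick brown fox jumps over the lazy dog"  # stand-in for scraped training text
vocab = sorted(set(text.split()))
stoi = {w: i for i, w in enumerate(vocab)}
ids = torch.tensor([stoi[w] for w in text.split()])

class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)
    def forward(self, x):
        return self.head(self.emb(x))  # logits over the next token

model = TinyLM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# One training step: predict token t+1 from token t, then nudge the weights.
logits = model(ids[:-1])
loss = F.cross_entropy(logits, ids[1:])  # how surprised the model is by the real next word
opt.zero_grad()
loss.backward()
opt.step()
# What persists is the adjusted weights, not a verbatim copy of the text,
# though passages seen many times can end up effectively memorized.
```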

jfengel

I'm not sure I understand your question. It's reasonably clear that transformers get caught reproducing material that they have no right to. The kind of thing that would potentially result in a lawsuit if you did it by hand.

It's less clear whether taking vast amounts of copyrighted material and using it to generate other things rises to the level of copyright violation or not. It's the kind of thing that people would have prevented if it had occurred to them, by writing terms of use that explicitly forbid it. (Which probably means that the Web becomes a much smaller place.)

Your comment seems to suggest that writers and artists have absolutely no conceivable stake in products derived from their work, and that it's purely a misunderstanding on their part. But I'm both a computer scientist and an artist and I don't see how you could reach that conclusion. If my work is not relevant then leave it out.

gruez

>I'm not sure I understand your question. It's reasonably clear that transformers get caught reproducing material that they have no right to. The kind of thing that would potentially result in a lawsuit if you did it by hand.

Is that a problem with the tool, or the person using it? A photocopier can copy an entire book verbatim. Should that be illegal? Or is it the problem that the "training" process can produce a model that has the ability to reproduce copyrighted work? If so, what implication does that hold for human learning? Many people can recite an entire song's lyrics from scratch, and reproducing an entire song's lyrics verbatim is probably enough to be considered copyright infringement. Does that mean the process of a human listening to music counts as copyright infringement?

empath75

Let's start with I think a case that everyone agrees with.

If I were to take an image, and compress it or encrypt it, and then show you the data file, you would not be able to see the original copyrighted material anywhere in the data.

But if you had the right computer program, you could use it to regenerate the original image flawlessly.

I think most people would easily agree that distributing the encrypted file without permission is still a distribution of a copyrighted work and against the law.

What if you used _lossy_ compression, and could merely reproduce a poor-quality JPEG of the original image? I think that's still copyright infringement, right?

Would it matter if you distributed it with an executable that only rendered the image non-deterministically? Maybe one out of 10 times? Or if the command to reproduce it was undocumented?

Okay, so now we have AI. We can ignore the algorithm entirely and how it works, because it's not relevant. There is a large amount of data that it operates on, the weights of the model and so on. You _can_ with the correct prompts, sometimes generate a copy of a copyrighted work, to some degree of fidelity or another.

I do not think it is meaningfully different from the simpler example, just with a lot of extra steps.

I think, legally, it's pretty clear that it is illegally distributing copyrighted material without permission. I think calling it an "ai" just needlessly anthropomorphizes everything. It's a computer program that distributes copyrighted work without permission. It doesn't matter if it's the primary purpose or not.

I think probably there needs to be some kind of new law to fix this situation, but under the current law as it exists, it seems to me to be clearly illegal.
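As a toy illustration of the first step of this argument (a sketch only, not a claim about how models actually store anything): the intermediate blob is unrecognizable, yet the right program regenerates the original exactly.

```python
# Opaque blob that still regenerates the work: standard-library compression round trip.
# The "copyrighted text" is a made-up placeholder.
import zlib

original = b"Chapter 1. The dragon woke before dawn and remembered the sea. " * 40
blob = zlib.compress(original)      # the prose is not readable anywhere in these bytes
restored = zlib.decompress(blob)    # but the right program recovers it exactly
assert restored == original
print(len(original), "bytes ->", len(blob), "bytes")
```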

Workaccount2

The crux of the debate is a motte and bailey.

AI is capable of reproducing copyrighted works (motte), therefore training on copyrighted works is illegal (bailey).

kevlened

This critique deserves more attention.

Humans are capable of reproducing copyrighted works illegally, but we allow them to train on copyrighted material legally.

Perhaps measures should be taken to prevent illegal reproduction, but if that's impossible, or too onerous, there should be utilitarian considerations.

Then the crux becomes a debate over utility, which often becomes a religious debate.

nickpsecurity

That's just the reproducing part. They also shared copies of scraped web sites, etc without the authors' permission. Unauthorized copying has been widely known to be illegal for a long time. They've already broken the law before the training process even begins.

mr_toad

The model is not compressed data, it’s the compression algorithm. The prompt is compressed data. When you feed it a prompt it produces the uncompressed result (usually with some loss). This is not an analogy by the way, it’s a mathematical equivalence.

You can try and argue that a compression algorithm is some kind of copy of the training data, but that’s an untested legal theory.
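The equivalence being referred to here is the standard prediction-compression duality (Shannon coding / arithmetic coding): a model that assigns probability p to the next symbol can encode that symbol in about -log2(p) bits. A hedged toy sketch, with a made-up character-frequency "model" standing in for a language model:

```python
# Prediction <-> compression: a better predictive model means fewer bits per symbol.
import math
from collections import Counter

text = "an example string that a language model might assign probabilities to"
freq = Counter(text)
prob = {c: n / len(text) for c, n in freq.items()}  # stand-in predictive model P(next char)

# An ideal entropy coder driven by this model would spend about -log2 P(c) bits per character.
bits = sum(-math.log2(prob[c]) for c in text)
print(f"raw: {8 * len(text)} bits, model-coded: about {bits:.0f} bits")
# In this sense the model plays the role of the compression algorithm and the
# encoded stream (or prompt) plays the role of the compressed data.
```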

halkony

> I do not think it is meaningfully different from the simpler example, just with a lot of extra steps.

Those extra steps are meaningfully different. In your description, a casual observer could compare the two JPEGs and recognize the inferior copy. However, AI has become so advanced that such detection is becoming impossible. It is clearly voodoo.

gruez

>Okay, so now we have AI. We can ignore the algorithm entirely and how it works, because it's not relevant. There is a large amount of data that it operates on, the weights of the model and so on. You _can_ with the correct prompts, sometimes generate a copy of a copyrighted work, to some degree of fidelity or another.

Suppose we accept all of the above. What does that hold for human learning?

empath75

If a human were to reproduce, from memory, a copyrighted work, that would be illegal as well, and multiple people have been sued over it, even doing it unintentionally.

I'm not talking about learning. I'm talking about the complete reproduction of a copyrighted work. It doesn't matter how it happens.

gruez

>I'm not talking about learning. I'm talking about the complete reproduction of a copyrighted work. It doesn't matter how it happens.

In that case I don't think there's anything controversial here? Nobody thinks that if you ask AI to reproduce something verbatim, that you should get a pass because it's AI. All the controversy in this thread seems to be around the training process and whether that breaks copyright laws.

empath75

No -- the controversy is also over whether distributing the weights and software is a copyright violation. I believe it is. The copyrighted material is present in the software in some form, even if the process for regenerating it is quite convoluted.

gruez

It's not as clear-cut as you think. The courts have held that both Google thumbnails and Google Books are fair use, even though they're far closer to verbatim copies than an AI model.

const_cast

The reason those are allowed is because they don't compete with the source material. A thumbnail of a movie is never a substitute for a movie.

LLMs seek to be a for-profit replacement for a variety of paid sources. They say "hey, you can get the same thing as Service X for less money with us!"

That's a problem, regardless of how you go about it. It's probably fine if I watch a movie with my friends, who cares. But distributing it over the internet for free is a different issue.

gruez

>The reason those are allowed is because they don't compete with the source material. A thumbnail of a movie is never a substitute for a movie.

>LLMs seek to be a for-profit replacement for a variety of paid sources. They say "hey, you can get the same thing as Service X for less money with us!"

What's an LLM supposed to be a substitute for? Are people using them to generate entire books or news articles, rather than buying a book or an issue of the New York Times? Same goes for movies. No one is substituting Marvel movies with Sora video.

const_cast

> Are people using them to generate entire books or news articles, rather than buying a book or an issue of the new york times?

Yes.

> No one is substituting marvel movies with sora video.

Yeah because sora kind of sucks. It's great technology, but turns out text is just a little bit easier to generate than 3D videos.

Once sora gets good, you bet your ass they will.

nickpsecurity

Whereas, my report showed they were breaking copyright before the training process. Meta was sued for what I said they'd be sued for, too.

Like Napster et al, their data sets make copies of hundreds of GB of copyrighted works without the authors' permission. Ex: The Pile, Common Crawl, RefinedWeb, GitHub Pages. Many copyrighted works on the Internet also have strict terms of use. Some have copyright licenses that say personal use only or non-commercial use.

So, like many prior cases, just posting what isn't yours on HuggingFace is already infringement. Copying it from HF to your training cluster is also infringement. It's already illegal until we get laws like Singapore's that allow training on copyrighted works. Even those have a weakness in the access requirement, which might require following terms of use or licenses in the sources.

Only safe routes are public domain, permissive code, and explicit licenses from copyright holders (or those with sub-license permissions).

So, what do you think about the argument that making copies of copyrighted works violates copyright law? That these data sets are themselves copyright violations?

tensor

If I write a math book, and you read it, then tell someone about the math within it, you are not violating copyright. In fact, you could write your OWN math book, or history book, or whatever, and as long as you're not copying my actual text, you are not violating copyright.

However, when an LLM does the same, people now want it to be illegal. It seems pretty straightforward to apply existing copyright law to LLMs in the same way we apply it to humans. If the actual text they generate is so substantially similar to a source material that it would constitute a copyright violation had a human produced it, then it should be illegal. Otherwise it should not.

edit: and in fact it's not even whether an LLM reproduces text, it's whether someone subsequently publishes that text. The person publishing that text should be the one taking the legal hit.

rrook

That mathematical formulas already cannot be copyrighted makes this a kinda nonsense example?

mr_toad

> It's the kind of thing that people would have prevented if it had occurred to them, by writing terms of use that explicitly forbid it.

The AI companies will likely be arguing that they don’t need a license, so any terms of use in the license are irrelevant.

Workaccount2

My comment is about training models, not model inference.

Most artists can readily violate copyright; that doesn't mean we block them from seeing copyrighted works.

gitremote

The judgement was about model inference, not training.

Workaccount2

>"But making commercial use of vast troves of copyrighted works to produce expressive content"

This can only be referring to training; the models themselves are a rounding error in size compared to their training sets.

gitremote

They never said model training is a violation of copyright. The ruling says model training on copyrighted material for analysis and research is NOT copyright infringement, but the commercial use of the resulting model is:

"When a model is deployed for purposes such as analysis or research… the outputs are unlikely to substitute for expressive works used in training. But making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries."

Workaccount2

The vast trove of copyrighted work has to refer to training. ChatGPT is likely on the order of 5-10 TB in size. (Yes, terabytes.)

There are college kids with bigger "copyright collections" than that...
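For what it's worth, the rough arithmetic behind such size claims is just parameter count times bytes per parameter; the parameter figure below is an unconfirmed public estimate used purely for illustration.

```python
# Back-of-envelope model size: parameters x bytes per parameter.
params = 1.8e12        # rumored parameter count for a GPT-4-class model (unconfirmed)
bytes_per_param = 2    # 16-bit weights
print(f"~{params * bytes_per_param / 1e12:.1f} TB")  # ~3.6 TB, the same order as the 5-10 TB claim
```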

gitremote

No. The paragraph as a whole refers to the "outputs" of vast troves of copyrighted work.

Disk size is irrelevant. If you lossy-compress a copyrighted bitmap image to a small JPEG image and then sell the JPEG, it's still copyright infringement.

nickpsecurity

I won't say it's irrelevant. How much you use is part of fair use considerations. Their huge collections of copyrighted works make them look worse in legal analyses.

belorn

I would also like to see such an explanation, especially one that explains how it differs from the regular transformers found in video codecs. Why is lossy compression a clear violation of copyright, but not generative AI?

jsiepkes

This isn't about training AI on a book, but AI companies never paying for the book at all. As in: They "downloaded the e-book from a warez site" and then used it for training.

xhkkffbf

This is what's most offensive about it. At least buy one friggin copy.

autobodie

I have yet to see someone explain in detail how writing the same words as another person works (showing they understand the technical nitty-gritty and the overall architecture of the human mind) and also lay out a case for why it is clearly a violation of copyright. You can find lots of people talking about reading, and you can find lots (way more) of people talking about plagiarism being a violation of copyright, but you can't find anyone talking about both.

xhkkffbf

A big part of copyright law is protecting the market for the original creator. Not guaranteeing them anything. Just preventing someone else from coming along and copying the creator's work in a way that hurts their sales.

While AIs don't reproduce things verbatim like pirates, I can see how they really undermine the market, especially for non-fiction books. If people can get the facts without buying the original book, there's much less incentive for the original author to do the hard research and writing.

dmoy

Not a ton of expert programmer + copyright lawyers, but I bet they're out there

You can probably find a good number of expert programmer + patent lawyers. And presumably some of those osmose enough copyright knowledge from their coworkers to give a knowledgeable answer.

At the end of the day though, the intersection of both doesn't matter. The lawyers win, so what really matters is who has the pulse on how the Fed Circuit will rule on this

Also in this specific case from the article, it's irrelevant?

kranke155

It doesn’t matter how they work, it only matters what they do.

moralestapia

Because it's a machine that reproduces other people's works, which are copyrighted. Copyright protects the essence of the original work even after it's present in, or turned into, a derivative work.

Some try to make the argument of "but that's what humans do and it's allowed", but that's not a real argument, as it has not been proven, nor is it easy to prove, that machine learning is equivalent to human reasoning. In the absence of evidence, the law assumes NO.

nickpsecurity

I did here, with proofs of infringement:

https://gethisword.com/tech/exploringai/

anhner

because people who understand how training works also understand that it's not a violation of copyright...

elif

Intellectual property law is quickly becoming an institution of hegemonic corporate litigation of the spreading of ideas.

If it's illegal to know the entire contents of a book it is arbitrary to what degree you are able to codify that knowing itself into symbols.

If judges are permitted to rule here it is not about reproduction of commercial goods but about control of humanity's collective understanding.

stevetron

It's amazing the amount of bad deeds coming out of the current administration in support of special interests.

throw0101c

See "Copyright and Artificial Intelligence Part 3: Generative AI Training" (PDF):

* https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

KoolKat23

"But making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries."

I honestly can't see how this directly addresses fair use; it's an odd sweeping statement. It implies that inventing something that borrows a little from many different copyrighted items is somehow not fair use? If it were one-for-one, yes, but it's not; it's basically saying creativity is not fair use. If it's not saying this, and instead refers to competition in the existing market, they're making a statement about the public good, not fair use. That is basically a matter for legislators and what the purpose of copyright is.

Popeyes

Maybe we should review copyright and the length of it.

evanjrowley

If AI companies in the US are penalized for this, then the effect on copyright holders will only be slowed until foreign AI companies overtake them. In such cases the legal recourse will be much slower and significantly limited.

mitthrowaway2

Access to copyrighted materials might make for slightly better-trained models the way that access to more powerful GPUs does. But I don't think it will accelerate foundational advances in the underlying technology. If anything, maybe having to compete under tight constraints means AI companies will have to innovate more, rather than merely push scale.

int_19h

The problem is that regardless of any innovations, scale still matters. If you figure out the technique to, say, make a model that is significantly better given N parameters - where N is just large enough to be the perfect fit for the amount of training data that you have access to - then someone else with access to more data will use the same technique to make a model with >N parameters, and it will be better than yours.
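A rough way to see why data access caps useful model size is the often-cited "Chinchilla" rule of thumb of roughly 20 training tokens per parameter for compute-optimal training; a hedged back-of-envelope sketch, with a made-up token budget:

```python
# Chinchilla-style rule of thumb: ~20 training tokens per parameter (an approximation, not a law).
tokens_available = 2e12            # e.g. ~2 trillion tokens of licensed text (made-up figure)
optimal_params = tokens_available / 20
print(f"~{optimal_params / 1e9:.0f}B parameters")  # ~100B: more data justifies a bigger model
```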

mr_toad

But AI is mostly scale and only a little bit innovation. It’s undergraduate maths and a whole lot of computing power and data. Not being able to train on data on the internet would be a significant handicap.

anigbrowl

Gee, perhaps we should not have done this in the first place. 'Foreigners might copy the irresponsible thing we did so we have do more of it' is not the most brilliant argument.

Molitor5901

Representative Joe Morelle (D-NY), wrote the termination was “…surely no coincidence he acted less than a day after she refused to rubber-stamp Elon Musk’s efforts to mine troves of copyrighted works to train AI models.”

Interesting, but everyone is mining copyrighted works to train AI models.

brador

Lifetime for human copyright, 20 years for corporate copyright. That’s the golden zone.

Zambyte

Zero (0) years for corporate copyright, zero (0) years for human copyright is the golden zone in my book.

umanwizard

Why?

Zambyte

It took me a while to be convinced that copyright is strictly a bad idea, but these two articles were very convincing to me.

https://drewdevault.com/2020/08/24/Alice-in-Wonderland.html

https://drewdevault.com/2021/12/23/Sustainable-creativity-po...

SketchySeaBeast

The first article is saying that "Copyright is bad because of corporations", and I can kind of get behind that, especially the very long term copyrights that have lost the intent, but the second article says that artists will be happier without copyright if we just solve capitalism first. I don't know about you, but that reads to me like "If you wish to make an apple pie from scratch you must first invent the universe".

If an artist produces a work they should have the rights to that work. If I self-publish a novel and then penguin decides that novel is really good and they want to publish it, without copyright they'd just do that, totally swamping me with their clout and punishing my ever putting the work out. That's a bad thing.

int_19h

The problem of "how do artists earn enough money to eat?" is legitimate, but I don't think it's a good idea to solve it by making things that inherently don't work like real property to work like it, just so that we can shove them into the same framework. And this is exactly what copyright does - it takes information, which can be copied essentially for free by its very fundamental nature, and tries to make it scarce through legal means solely so that it can be sold as if it were a real good.

There are two reasons why it's a problem. The first reason is that any such abstraction is leaky, and those leaks are ripe for abuse. For example, in case of copyright on information, we made it behave like physical property for the consumers, but not for the producers (who still only need to expend resources to create a single work from scratch, and then duplicate it for free while still selling each copy for $$$). This means that selling information is much more lucrative than selling physical things, which is a big reason why our economy is so distorted towards the former now - just look at what the most profitable corporations on the market do.

The second reason is that it artificially entrenches capitalism by enmeshing large parts of the economy into those mechanics, even if they aren't naturally a good fit. This then gets used as an argument to prop up the whole arrangement - "we can't change this, it would break too much!".

SketchySeaBeast

The end product being inexpensive is a good thing - it means that the producer can sell it well below the cost it took to produce it, otherwise a novel would cost whatever it takes for Stephen King to live for 3 months.

I feel like you're shoving all information under the same label. The most profitable corporations are trading in information that isn't subject to copyright, and it's facts - how you drive, what you eat, where you live. It's newly generated ideas. Maybe it is in how the data is sorted, but they aren't copyrighting that either.

If we're going to overthrow artificial entrenchments of capitalism, I feel like there's better places to start than a lot of copyright. Does it need changes? Absolutely, there's certainly exploitation, but I still don't see "get rid of copyright entirely" as being a good approach. Weirdly, it's one of the places that people are arguing for that. Sometimes the criminal justice system convicts the wrong person, and there should be reform. It's also often criticized as a measure of control for capitalistic oligarchs. Should step one be getting rid of the legal system entirely?

jasonjayr

But in this idealized copyright-free world, those self-publishing companies could just as easily take Penguin's top sellers and reproduce those.

The thing that'd set apart these companies are the services + quality of their work.

SketchySeaBeast

Is not part of the quality of the work the contents of the book? What are these companies putting within the pages? We've taken the greatest and longest part of the effort and made it meaningless.

Zambyte

> If an artist produces a work they should have the rights to that work.

That would indeed be nice, but as the article says, that's usually not the case. The rights holder and the author are almost never the same entity in commercial artistic endeavors. I know I'm not the rights holder for my erroneously-considered-art work (software).

> If I self-publish a novel and then penguin decides that novel is really good and they want to publish it, without copyright they'd just do that, totally swamping me with their clout and punishing my ever putting the work out. That's a bad thing.

Why? You created influential art and its influence was spread. Is that not the point of (good) art?

noirscape

It may surprise you, but artists need to buy things like food, water and pay for their basic necessities like electricity, rent and taxes. Otherwise they die or go bankrupt.

In our current society, that means they need some sort of means to make money from their work. Copyright, at least in theory, exists to incentivize the creation of art by protecting an artist's ability to monetize it.

If you abolish copyright today, under our current economic framework, what will happen is that people create less art because it goes from a (semi-)viable career to just being completely worthless to pursue. It's simply not a feasible option unless you fundamentally restructure society (which is a different argument entirely.)

Zambyte

Amazing. Have you considered reading the articles I linked? They aren't even that long.

noirscape

I did, and they aren't convincing. The first is an argument about how a popular interpretation still under copyright can subsume the fact that the original work is in the public domain, using Alice in Wonderland as an example. (I also happen to think it's a particularly terrible example - if you want to make this argument, The Little Mermaid is by far the stronger version of it.) It also misidentifies Disney as the copyright boogeyman, which is a pretty common categorical error. (Disney had very little to do with the length of US copyright. The length of copyright is pretty much entirely the product of geopolitics and international agreements, not Disney.) It's an interesting argument, but not one I find particularly convincing for abolishing copyright, at most for shortening the length of it. (Which I do believe is needed.)

The second one is the "just solve capitalism and we can abolish copyright entirely" argument which is... a total non-starter. Yes, in an idealized utopia, we don't need capitalism or copyright and people can do things just because they want to and society provides for the artist just because humans all value art just that much. It's a fun utopic ideal, but there's many steps between the current state of the world and "you can abolish the idea of copyright", and we aren't even close to that state yet.

SketchySeaBeast

> The rights holder and the author are almost never the same entity in commercial artistic endeavors.

There's definitely problems with corporatization of ownership of these things, I won't disagree.

> Why? You created influential art and its influence was spread. Is that not the point of (good) art?

Why do we expect artists to be selfless? Do you think Stephen King is still writing only because he loves the art? You don't simply make software because you love it, right? Should people not be able to make money off their effort?

Zambyte

> You don't simply make software because you love it, right?

I can't speak for Stephen but I absolutely do. I program for fun all the time.

> Should people not be able to make money off their effort?

Is anyone arguing otherwise?

SketchySeaBeast

Removing copyright is removing a lot of the protections that enable creators to get paid for their efforts. How would a novelist make money, and why would someone pay them, if their work is free to be copied at will?

Zambyte

> How would a novelist make money

Maybe selling books? Maybe other jobs? The same way that they made money for thousands of years before copyright, really. Books and other arts did exist before copyright!

> and why would someone pay them, if their work is free to be copied at will?

I don't think it's really a matter of if people will pay them. If their art is good, of course people will pay them. People feel good about paying for an original piece of art.

The question is really more about if people will be able to get obscenely rich over being the original creator of some piece of art, to which the answer is it would indeed be less likely.

SketchySeaBeast

> The same way that they made money for thousands of years before copyright, really.

We didn't have modern novelists a thousand years ago. We didn't have mass production until ~500 years ago, and copyright came in in the 1700s. We didn't have mass-produced pulp fiction like we have today until the 20th century. There is little copyright-less historical precedent to refer to here. Even if we carve out the few hundred years between the printing press and copyright, it's not as though everyone was mass-consuming novels; the literacy rate was abysmal. I wonder what artist yearns for the 1650s.

> If their art is good, of course people will pay them.

You say this as if it were a fact, but that's not axiomatic. Once the first copy is in the wild it's fair game for anyone to copy it as they will. Who is paying them? Should the artists return to the days of needing a wealthy patron? Is patreon the answer to all of our problems?

> Maybe selling books?

But how? To who? A publishing house isn't going to pick them up, knowing that any other publishing house can start selling the same book the minute it shows to be popular, and if you're self publishing and you're starting to make good numbers then the publishing houses can eat you alive.

> The question is really more about if people will be able to get obscenely rich over being the original creator of some piece of art, to which the answer is it would indeed be less likely.

No, the question is if ordinary people could make a living off their novels without copyright. It's very hard today, but not impossible. Without copyright it wouldn't be.

dmonitor

You need some mechanism in place to prevent any joe schmoe from spinning up FreeSteam and rehosting the whole thing.

pitaj

There can be many incentives for people to use official sources: early access, easy updates, live events, etc

dmonitor

There's no reason FreeSteam can't also do that, though. There's no copyright, so just have an extension of the steamapp that changes it to point to your server when downloading games / checking ownership. Piracy stops being a service issue when pirates are allowed to make nice services.

Zambyte

"Early access" doesn't work in this context, but yes for the other means.

zelphirkalt

Just to challenge that idea: Why?

dmonitor

People would use that service instead of Steam, publishers would add annoying DRM to mitigate lost sales, etc etc.

The current illegality of the piracy website prevents them from offering a service as nice as Steam. It has to be a sketchy torrent hub that changes URLs every few months. If it was as easy as changing the url to freesteampowered.com or installing an extension inside the steam launcher, the whole "piracy is a service issue" argument loses all relevance. The industry would become unsustainable without DRM (which would be technically legal to crack, but also more incentivized to make harder to crack).

Zambyte

> publishers would add annoying DRM to mitigate lost sales, etc etc.

People would just delete the malware (DRM) out of the source code that is no longer restricted by copyright.

If your argument is that copyright is good because it discourages DRM, I think you have a very evidently weak argument.

dmonitor

Copyright does discourage DRM. Even the most egregious DRM these days can be bypassed with minimal effort and is mostly just a nuisance. Take away government enforcement of copyright and how profitable your digital product is will be directly tied to how advanced you are in the DRM arms race.

Steam is the classic example of how this is effective. You compete with pirates by offering what they can't: a reliable, convenient service. DRM becomes more of a hindrance than a benefit in this situation.

Allowing pirates to offer reliable convenient pirate websites that are "so easy a normie can do it" would be a disaster for all the creative industries. You would need to radically change the rest of society to prevent a total collapse of people making money off art.

whamlastxmas

Because the concept of owning an idea is really gross. Copyright means I can’t write about whatever I want in my own home, even if I never distribute it or no one ever sees it. I’m breaking the law by privately writing Harry Potter fanfic in my journal or whatever. Copyright is supposed to be about encouraging intangible creative work, and the reality is that it massively stifles it.

redwall_hp

Whole genres of music are based entirely on sampling, and they got screwed by copyright law as it evolved over the 90s and 2000s. Now only people with a sufficiently sized business backing them can truly participate, or they're stuck licensing things on Splice.

And that's not even touching the spurious lawsuits about musical similarity. That's what musicians call a genre...

It makes some sense for a very short term literal right to reproduction of a singular work, but any time the concept of derivative works comes into play, it's just a bizarrely dystopian suppression of art, under the supposition that art is commercial activity rather than an innate part of humanity.

otterley

Copyright doesn’t protect ideas. It protects expression of those ideas.

Consider how many books exist on how to care for trees. Each one of them has similar ideas, but the way those ideas are expressed differ. Copyright protects the content of the book; it doesn’t protect the ideas of how to care for trees.

93po

Disney has a copyright over Moana. I would argue Moana is an idea in the sense that most people mean by "ideas". Moana isn't tangible, it's not a physical good. It's not a plate on my table. It only exists in our heads. If I made a Moana comic book, with an entirely original storyline and original art, all drawn in my own style and not using 3D assets similar to their movies, that would be violating copyright. Moana is an idea, there are a million ways to express the established character Moana, and Moana itself is an idea built on a million things that Disney doesn't have any rights to - history, culture, tropes, etc.

I understand what you're saying, but the way you're framing it isn't what I really have a problem with. I still don't agree with the idea that I can't make my own physical copies of the Harry Potter books, identical word for word. I think people can choose to buy the physical books from the original publisher because they want to support them or like the idea that it's the "true" physical copy. And I'm going to push back on that a million times less than the concept of things like Moana comic books. But still, it's infringing copyright for me to make Moana comic books in my own home, in private, and never show them to anyone. And that's ridiculous.

otterley

> [Moana] only exists in our heads.

Moana and Moana 2 are both animated movies that have already been made. They're not just figures of one's imagination.

> If I made a Moana comic book, with an entirely original storyline and original art and it was all drawn in my own style and not using 3D assets similar to their movies, that is violating copyright

It might be, or it might not. Copyright protects the creation of derivative works (17 USC 101, 17 USC 103, 17 USC 106), but it's the copyright holder's burden to persuade the court that the allegedly infringing work with the character Moana in it is derivative of their protected work.

Ask yourself the question: what is the value of Moana to you in this hypothetical? What if you used a different name for the character and the character had a different backstory and personality?

> I still don't agree with the idea that I can't make my own physical copies of Harry Potters books

You might think differently if you had sunk thousands of hours into creating a new novel and creative work was your primary form of income.

> But still, it's infringing copyright for me to make Moana comic books in my own home, in private, and never showing them to anyone.

It seems unlikely that Disney would go after you for that. Kids do it all the time.

umanwizard

In the world you’re proposing, you would also not be able to make word-for-word copies of Harry Potter books, because Harry Potter wouldn’t exist.

93po

why not? people write fiction all the time and put it on the internet for free. in fact, i'd say there's significantly more unpaid fiction writing in the world than paid.

otterley

People don't copy amateur fiction they can find for free. They copy (or rather, make derivative works of) successful commercial content because it is successful and well known.

umanwizard

Yes, and most of it is awful, whereas Joanne Rowling is talented.

It’s very unlikely that she would (or even could) have devoted herself to writing fiction in her free time as a passion project without hope of monetary reward and without any way to live from her writing for the ten years it took to finish the Potter series.

And even if she had somehow managed, you’d never hear about it, because without publishers to act as gatekeepers it’d have been lost in the mountains of fanfic and whatever other slop amateur writers upload to the internet.

flats

I don’t believe this is true? I’m pretty sure that you’re prohibited from making money from that fan fiction, not from writing it at all. So I don’t understand the claim that copyright “massively stifles” creativity. There are of course examples of people not being able to make money on specific “ideas” because of copyright laws, but that doesn’t seem to me to be “massively stifling” creativity itself, especially given that it also protects and supports many people generating these ideas. And if we got rid of copyright law, wouldn’t we be in that exact place, where people wouldn’t be allowed to make money off of creative endeavors?

I mean, owning an idea is kinda gross, I agree. I also personally think that owning land is kinda gross. But we live in a capitalist society right now. If we allow AI companies to train LLMs on copyrighted works without paying for that access, we are choosing to reward these companies instead of the humans who created the data upon which these companies are utterly reliant for said LLMs. Sam Altman, Elon Musk, and all the other tech CEOs will benefit in place of all of the artists I love and admire.

That, to me, sucks.

Zambyte

> And if we got rid of copyright law, wouldn’t we be in that exact place, where people wouldn’t be allowed to make money off of creative endeavors?

This is addressed in the second article I linked.

SketchySeaBeast

Is it though? All I see is hand-waving.

93po

I will also add: there are tons of examples of companies taking down not-for-profit fanfiction or fan creations. Nintendo is very aggressive about this. The publisher of Harry Potter has also aggressively taken down not-for-profit fanfiction.

> If we allow AI companies to train LLMs on copyrighted works without paying for that access, we are choosing to reward these companies instead of the humans who created the data upon which these companies are utterly reliant for said LLMs.

It's interesting how much parallel there is here to the idea that company owners reap the rewards of their employee's labor when doing no additional work themselves. The fruits of labors should go to the individuals who labor, I 100% agree.

93po

Copyright isn't about distribution, it's about creation. In reality the chances of getting in trouble are basically zero if you don't distribute it - who would know? But technically any creation, even in private, is violating copyright. It doesn't matter if you make money or put it on the internet.

There is fair use, but fair use is an affirmative defense to infringing copyright. By claiming fair use you are simultaneously admitting infringement. The idea that you have to defend your own private expression of ideas based on other ideas is still wrong in my view.

Zambyte

> Copyright isn't about distribution, it's about creation

This is exactly wrong. You can copy all of Harry Potter into your journal as many times as you want legally (creating copies) so long as you do not distribute it.

whamlastxmas

https://en.wikipedia.org/wiki/Copyright_law_of_the_United_St...

"copyright law assigns a set of exclusive rights to authors: to make and sell copies of their works, to create derivative works, and to perform or display their works publicly"

"The owner of a copyright has the exclusive right to do and authorize others to do the following: To reproduce the work in copies or phonorecords;To prepare derivative works based upon the work;"

"Commonly, this involves someone creating or distributing"

https://www.copyright.gov/what-is-copyright/

"U.S. copyright law provides copyright owners with the following exclusive rights: Reproduce the work in copies or phonorecords. Prepare derivative works based upon the work."

https://internationaloffice.berkeley.edu/students/intellectu...

"Copyright infringement occurs when a work is reproduced, distributed, displayed, performed or altered without the creator’s permission."

There are endless legitimate sources for this. Copyright protects many things, not just distribution. It very clearly disallows the creation and production of copyrighted works.

achierius

Well what we're getting is lifetime for corporate, and zero (0) for human. Hope you're happy.

Zambyte

I'm not, because that's not what I asked for.

GuB-42

The issue with lifetime (vs something like lifetime + X years) is that of inheritance.

Assuming you agree with the idea of inheritance, which is another topic, it is unfair to deny inheritance of intellectual property. For example, if your father has built a house, it will be yours when he dies; it won't become a public house. So why would a book your father wrote just before he died become public domain the moment he dies? It is unfair to those who are doing intellectual work, especially older people.

If you want short copyright, it would make more sense to make it 20 years, human or corporate, like patents.

dghlsakjg

Then make it the greater of 20 years or the lifetime for humans.

Comparing intellectual property to real or physical property makes no sense. Intellectual property is different because it is non exclusive. If you are living in your father’s house, no one else can be living there. If I am reading your fathers book, that has nothing to do with whether anyone else can read the book.

GuB-42

That intellectual property is non exclusive doesn't change the inheritance problem.

If you consider it right to get value from the work of your family, and you consider that intellectual work (such as writing a book) to be valuable, then as an inheritor, you should get value from it. And since the way we give value to intellectual work is though copyright, then inheritors should inherit copyright.

If you think that copyright should not exceed lifetime, then the logical consequences would be one of:

- inheritance should be abolished

- intellectual work is less valuable than other forms of work

- intellectual property / copyright is not how intellectual work should be rewarded

There are arguments for abolishing inheritance, it is after all one of the greatest sources of inequality. Essentially, it means 100% inheritance tax in addition to all the work going into the public domain. Problematic in practice.

For the value of intellectual work, well, hard to argue against it on Hacker News without being a massive hypocrite.

And there are alternatives to copyright (i.e. artificial scarcity) for compensating intellectual work like there are alternatives to capitalism. Unfortunately, it often turns out poorly in practice. One suggestion is to have some kind of tax that is fairly distributed between authors in exchange for having their work in the public domain. Problem is: define "fairly".

Note that I am not saying that copyright should last long, you can make copyright 20 years, humans or corporate, inheritable. Simple, gets in the public domain sooner, fairer to older authors, already works for patents. Why insist on "lifetime"?

dghlsakjg

Agreed. I think it should be the greater of 20 years or the lifetime of the original authors.

Ekaros

20 or 25 years from publication. Enough for anyone inheriting it to exploit if they are children. No need to have more. It is not like a house builder keeps getting paid after the house has been built.

MyOutfitIsVague

The issue with that is that inheritance only makes sense for tangible, scarce resources. Having copyright isn't easily analogous to ownership of a physical object, because an object is something you have and if somebody else has it, you can not have and use it.

Copyright is about control. If you know a song and you sing it to yourself, somebody overhears it and starts humming it, they have not deprived you of the ability to still know and sing that song. You can make economic arguments, of deprived profit and financial incentives, and that's fine; I'm not arguing against copyright here (I am not a fan of copyright, it's just not my point at the moment), I'm just saying that inheritance does not naturally apply to copyright, because data and ideas are not scarce, finite goods. They are goods that feasibly everybody in the world can inherit rapidly without lessening the amount that any individual person gets.

If real goods could be freely and easily copied the way data can, we might be having some very interesting debates about the logic and morality of inheriting your parents' house and depriving other people of having a copy.

jagermo

Man, if we just had some Napster fanboy in the Oval Office back then. Lots of laws would not exist.

aurizon

Ned Ludd's heirs win at last - High Court rules the spinning jenny IS ILLEGAL! All machine-made cloth and machines must be destroyed. This is the end of the road for all mechanical ways to make cloth. Get naked, boys 'n girls - this will be fun!

renewiltord

I wonder when general internet sentiment moved from pro-piracy to IP maximalism. Fascinating shift.

wvenable

There's now an entire generation that believes "Intellectual Property" is a real thing.

Instead of the understanding that copyrights and patents are temporary state-granted monopolies meant to benefit society they are instead framed as real perpetual property rights. This framing fuels support for draconian laws and obscures the real purpose of these laws: to promote innovation and knowledge sharing and not to create eternal corporate fiefdoms.

ronsor

AI has made people lose their minds and principles. It's fascinating to observe.

In the meantime, I will continue to dislike copyright regardless of the parties involved.

LexiMax

I think most people have lost their minds over the hypocrisy. For decades people have been raked over the coals for piracy, but now suddenly piracy is okay if your name is Facebook and you're building an AI model.

Either force AI companies to compensate the artists they're being "inspired" by, or let people torrent a copywashed Toy Story 5.

mncharity

I considered adding a reminder above that email used to be a copyright violation. Implied license not yet established; every copy between disk and memory a violation; let alone forwarding; the occasional email footer "LegalisticCo grants you a licence to use this email under the following terms ...". Oh well. And then almost all sharing of images.

lavezzi

Very recently, because historically the majority of people engaging in it weren't looking to profit from piracy.

The general public has been lectured for decades about how piracy is morally wrong, but as soon as startups and corporations are in it for profit, everybody looks away?

Ekaros

Not having massively overfunded corporations exploit artists is not IP minimalism. Private persons stealing something is seen as a tiny evil, but a big corporation exploiting everyone else is an entirely different thing.

CaptainFever

IP minimalism is IP minimalism, regardless of who owns the IP.

anigbrowl

It didn't, you're falsely conflating two quite different things to give cover to a different set of large corporations.

Ukv

No hard data to back this up, but anecdotally I'd place the AI/copyright sentiment shift around mid-late 2022. DALL-E 2 experimentation (e.g: [0]) in early-mid 2022 seemed to just about sneak by unaffected, receiving similar positive/curious reception to previous trends (TalkToTransformer, ArtBreeder, GPT-3/AI Dungeon, etc.), but then Stable Diffusion bore the full brunt of "machine learning is theft" arguments.

[0]: https://x.com/xkcd/status/1552279517477183488

renewiltord

Hmm, "when it got good" then. I think what you're saying makes sense to me.

bongodongobob

Right around the same time struggling artists thought paying $40 for global distribution via Spotify and not getting paid anything for their 100 streams a month was being "ripped off". And I think that is related to influencer culture. Everyone thinks they deserve to be famous and needs someone to blame for their below average art not making them rich.

archagon

It's not that complicated: little guy taking stuff from big corp (then) vs. big corp taking stuff from little guy (now). Similar to the recent debates over permissive open source licenses and corporate exploitation.

As for the zeitgeist, I'm not sure anything has materially changed. Recently, creators have been very upset over Silicon Valley AI companies ingesting their output. Is this really reflective of "general internet sentiment"? Would those same people have supported abolition of copyright in the past? I doubt it.

bgwalter

That is fairly easy to answer: When the infringement shifted from small people taking from Walt Disney to Silicon Valley taking from everyone, including open source authors and small YouTube channels.

I find the shift of some right-wing politicians and companies from "TPB and Megaupload are criminals and their owners must be extradited from foreign countries!" to "Information wants to be free!" much more illuminating.

vharuck

Personally, I'd support an alternative to copyright for letting creators earn living expenses while working or in reward for good works. But it's a terrible thing to offer them the copyright system and then ignore it to use the works they hoped could earn money. And to further use those works to make something that will replace a lot of creative positions they've relied on because copyright only pays off after the work's been done.

Maybe the government should set up a fund to pay all the copyright holders whose works were used to train the AI models. And if it's a pain to track down the rights holders, I'll play a tiny violin.

throwaway1854

Apples and oranges - and also I don't know if anyone is really supporting IP maximalism.

IP maximalism is requiring DRM tech in every computer and media-capable device that won't play anything without checking into a central server and also making it illegal to reverse or break that DRM. IP maximalism is extending the current bonkers time interval of copyright (over 100 years) to forever. If AI concerns manage to get this down to a reasonable, modern timeframe it'll be awesome.

Record companies in the 90s tied the noose around their own necks, which is just as well because they're largely useless now except for supporting geriatric bands. They should have started selling MP3s for 99 cents in 1997; maybe they would have made a couple of dollars before their slide into irrelevance.

The specific thing people don't want, which a few weirdos keep pushing, is AI-generated stuff passed off as new creative material. It's fine for fun and games, but no one wants a streaming service of AI-generated music, even if you can't tell it's AI-generated. And the minute you think you have that cracked - that an AI can create music/art as good as a human's and that humans can't tell - the humans will start making bad music/art in rebellion, it'll be the cool new thing, and the armies of 10 kW GPUs will be wasting their energy on stuff a 1 MHz 8-bit machine could do in the 80s.

thomastjeffery

> The remarks about Musk may refer to the billionaire’s recent endorsement of Twitter founder Jack Dorsey’s desire to “Delete all IP law"...

Yes please.

Delete it for everyone, not just these ridiculous autocrats. It's only helping them in the first place!

hatenberg

Big Tech: We shouldn’t pay, each individual piece of content is worth basically nothing.

Also Big Tech: We added 300,000,000 users' worth of GTM because we trained on the 10 specific Studio Ghibli anime films and are selling their style.

Aerroon

The funny thing is that style is not copyrightable.

_trampeltier

Except it's a rectangle with 4 rounded corners.

mr_toad

Patents are not the same thing as copyright.

nickpsecurity

"Pretraining data us worth basically nothing."

(Raises $10 billion based on estimated worth of the resulting models.)

"We can't share the GPT4 prettaining data or weights because they're trade secrets that generate over a billion in revenue for us."

I'll believe they're worth nothing when (a) nobody is buying AI models or (b) AI companies stop using the copyrighted works to train models they sell. So far, it looks like they're lying about the worth of the training data.

achrono

If anyone was skeptical of the US government being deeply entrenched with these companies in letting this blatant violation of the spirit of the law [1] continue, this should hopefully secure the conclusion.

And for the future, here's one heuristic: if there is a profound violation of the law anywhere that (relatively speaking) is ignored or severely downplayed, it is likely that interested parties have arrived at an understanding. Or in other words, a conspiracy.

[1] There are tons of legal arguments on both sides, but for me it is enough to ask: if this is not illegal and is totally fair use (maybe even because, oh no look at what China's doing, etc.), why did they have to resort to & foster piracy in order to obtain this?

NitpickLawyer

> If anyone was skeptical of the US government being deeply entrenched with these companies in letting this blatant violation of the spirit of the law [1] continue, this should hopefully secure the conclusion.

European here, but why do you think this is so clear cut? There are other jurisdictions where training on copyrighted data has already been allowed by law/caselaw (Germany and Japan). Why do you need a conspiracy in the US?

AFAICT, US copyright law deals with direct reproductions of a copyrighted piece of content (and also carves out some leeway for direct reproduction, like fair use). I think we can all agree by now that LLMs don't fully reproduce "letter-perfect" content, right? What then is the "spirit" of the law that you think was broken here? Isn't this the definition of "transformative work"?

Of note is also the other big case involving books, the one where Google processed mountains of books; they were sued and allowed to continue. How is scanning & indexing tons of books different from scanning & "training" an LLM?

AlotOfReading

Google asserted fair use in that case, which is an admission of (allowed) copyright infringement. They didn't turn books into a "new form"; they provided limited excerpts that couldn't replace the original usage and directly incentivized purchases through normal sales channels, while also providing new functionality.

Contrast that with AI companies:

They don't necessarily want to assert fair use, the results aren't necessarily publicly accessible, the work used isn't cited, users aren't directed to typical sales channels, and many common usages do meaningfully reduce the market for the original content (e.g. AI summaries for paywalled pages).

It's not obvious to me as a non-lawyer that these situations are analogous, even if there's some superficial similarity.

achrono

Let me answer those questions with actual evidence.

To begin with, this very case of Perlmutter getting fired after her office's report is interesting enough, but let's set it aside. [0]

First, plenty of lobbying has been afoot, pushing DC to allow training on this data to continue. No intention to stop or change course. [1]

Next, when regulatory attempts were in fact made to act against this open theft, those proposed rules were conveniently watered down by Google, Microsoft, Meta, OpenAI and the US government lobbying against the copyright & other provisions. [2]

If you still think, "so what? maybe by strict legal interpretation it's still fair use" -- then explain why OpenAI is selectively signing deals with the likes of Conde Nast if they truly believe this to be the case. [3]

Lastly, when did you last see any US entity or person face no punitive action whatsoever despite illegally downloading (and uploading) millions of books & journal articles; do you remember Aaron Swartz? [4]

You might not agree with my assessment of 'conspiracy', but are you denying there is even an alignment of incentives contrary to the spirit of the law?

[0] https://www.reuters.com/legal/government/trump-fires-head-us...

[1] https://techcrunch.com/2025/03/13/openai-calls-for-u-s-gover...

[2] https://www.euronews.com/next/2025/04/30/big-tech-watered-do...

[3] https://www.reuters.com/technology/openai-signs-deal-with-co...

[4] https://cybernews.com/tech/meta-leeched-82-terabytes-of-pira...

whycome

What’s your reading of the spirit of the law?

internet_rand0

Copyright is long overdue for a total rework.

The internet demands it.

The people demand free Megaupload for everybody. Why? Because we can. (We seem to NOT want to, but that should be a politically solvable problem.)

tempeler

I think a new chapter is about to begin. It seems that in the future, many IPs will become democratized; in other words, they will become public assets.

SketchySeaBeast

"Democratized" as in large corporations are free to ingest the IPs and then reinterpret and censor them before they feed their version back to us, with us never having free access to the original source?

rurban

"Democratized" in the meaning of fascistoized, right? Laws do not apply to the cartels, military, executive and secret services.

tempeler

To defend yourself against those who don't play by the rules, it has to be democratized. The world isn't a fair place.

AlexandrB

Public assets as long as you pay your monthly ChatGPT bill.

kmeisthax

They aren't going to legalize, say, publishing Mario fangames or whatever. They're just going to make copyright allow AI training, because AI is what the owner class wants. That's not democratizing IP, that's just prejudicial (dis)enforcement against the creative class.

jobigoud

Millions of pages of fan fic based on existing IP have been written. There is a point where it doesn't really make sense to go after individuals, especially if they make no money out of it.

If we enter a world where anyone can create a new Mario game and thousands of them are released on the public web, it would be impossible for the rights holders to do anything, and it would be a bad PR move to go after individuals doing it for fun.

int_19h

Imagine a world where all models capable of creating a new Mario game from scratch are only available through cloud providers which must implement mandatory filters such that asking "write me a Mario clone" (or anything functionally equivalent) gets you a lecture on don't-copy-that-floppy.

Bad PR? The entire copyright enforcement industry has had bad PR pretty much since easy copying enabled grassroots piracy - i.e. since before computers even. It never stopped them. What are you going to do about it? Vote? But all the mainstream parties are onboard with the copyright lobby.

kmeisthax

Yes, but none of that has anything to do with AI. Or democratization.

The fact that copyright law is easy to violate and hard to enforce doesn't stop Nintendo from burning millions of dollars on legal fees to engage in life-ruining enforcement actions against randos making fangames.

"Democratization" with respect to copyright law would be changing the law to put Mario in the public domain, either by:

- Reducing term lengths to make Mario literally public domain. It's unclear whether or not such an act would survive the Takings Clause of the US Constitution. Perhaps you could get around that by just saying you can't enforce copyrights older than 20 years even though they nominally exist. Which brings us to...

- Adding legal exceptions to copyright to protect fans making fan games. Unlikely, since in the US we have common law, which means our exceptions have to be legislated from the judicial bench, and judges are extremely leery of 'fair use' arguments that basically say 'it is very inconvenient for me to get permission to use the thing'.

- Creating some kind of social copyright system that "just handles" royalty payments. This is probably the most literal interpretation of 'democratize'. I know of few extant systems for this, though - like, technically ASCAP is this, but NOBODY would ever hold up ASCAP as an example of how to do licensing right. Furthermore without legal backing, Nintendo can just hold out and retain traditional "my way or the highway" licensing rights.

- Outright abolishing copyright and telling artists to fend for themselves. This is the kind of solution that would herald either a total system collapse or extreme authoritarianism. It's like the local furniture guy selling sofas at 99% off because the Mafia is liquidating his gambling debts. Sure, I like free shit, but I also know that furniture guy is getting a pair of cement shoes tonight.

None of these are what AI companies talk about. Adding an exception just for AI training isn't democratizing IP, because you can't democratize AI training. AI is hideously memory-hungry and the accelerators you need to make it work are also expensive. I'm not even factoring in the power budget. They want to replace IP with something worse. The world they want is one where there are three to five foundation models, all owned and controlled by huge tech megacorps, and anyone who doesn't agree with them gets cut off.

anigbrowl

I invite you to imagine the howling that will ensue the moment some politician offers legislation requiring commercial LLM operators to publish their weights and training data.

numpad0

Oh yeah. It's the Cultural Revolution all over again.

ahmeni

If only there was some sort of term for fake democracy where you're actually just there to plunder resources.

tempeler

This idea does not belong to me. If lawmakers and regulators allow companies to use these IPs, how can you keep ordinary people away from them? Something created by AI is regarded as if it were created from scratch by human hands. That's reality.

Hoasi

“We used publicly available data” has worked well enough so far. And yet OpenAI just accused China of stealing its content.

sophrocyne

The USCO report was flawed, biased, and hypocritical. A pre-publication of this sort is also extremely unusual.

https://chatgptiseatingtheworld.com/2025/05/12/opinion-why-t...

ceejayoz

What in https://chatgptiseatingtheworld.com/about/ says "ah, yes, trustworthy unbiased analysis" to you? Why should I trust this source's opinion?

Pre-publication reports aren't unusual. https://www.federalregister.gov/public-inspection/current

https://www.federalregister.gov/reader-aids/using-federalreg...

> The Federal Register Act requires that the Office of the Federal Register (we) file documents for public inspection at our office in Washington, DC at least one business day before publication in the Federal Register.

andy99

Two different issues that, while apparently related, need separate consideration. Re the copyright finding: does the US Copyright Office have standing to make such a determination? Presumably not, since various claims about AI and copyright are before the courts. Why did they write this finding?

kklisura

> The Office is releasing this pre-publication version of Part 3 in response to congressional inquiries and expressions of interest from stakeholders

They acknowledge the issue is before courts:

> These issues are the subject of intense debate. Dozens of lawsuits are pending in the United States, focusing on the application of copyright’s fair use doctrine. Legislators around the world have proposed or enacted laws regarding the use of copyrighted works in AI training, whether to remove barriers or impose restrictions

Why did they write the finding: I assume it's because it's their responsibility:

> Pursuant to the Register of Copyrights’ statutory responsibility to “[c]onduct studies” and “[a]dvise Congress on national and international issues relating to copyright,”...

All excerpts are from https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

_heimdall

Given that the issue at hand is related to potential misuse of copyright protected material, it seems totally reasonable for the copyright office to investigate and potentially act to reconcile the issue.

Sure, the courts may find it's outside their jurisdiction, but they should act as they see fit and let the courts settle that later.

bgwalter

The US Supreme Court has complained on multiple occasions that it is forced to do the work of the legislature.

Why could a copyright office not advise Congress/the Senate to enact a law forbidding copyrighted material from being used in AI training? This is literally the politicians' job.

9283409232

Part of Congress's power is to defer to agencies it has created, such as the US Copyright Office.
