Whether or not data was openly accessible doesn’t really matter […] ChatGPT also isn’t just reading the data at its source, it’s copying it into its training dataset, and that copying is unlicensed.
Actually, the act of copying a work covered by copyright is not itself illegal. If I check out a book from a library and copy a passage (or the whole book!) to reread later, or for some other use limited strictly to myself, that's legal. If I turn around and share that passage with a friend in a way that isn't covered under fair use, that's illegal. It's the act of distributing the copy that's illegal.
That's why whether the AI model is publicly accessible does matter. A company is considered a "person" under copyright law, so OpenAI can scrape all the copyrighted works off the internet it wants, as long as it didn't break any laws to gain access to them. In other words, articles freely available on CNN's website are free to be copied (but not distributed), whereas circumventing the New York Times' paywall to get articles you didn't pay for is not legal access. OpenAI then encodes those copyrighted works in its models' weights.

If it provides open access to those models, and people execute these attacks to recover pristine copies of copyrighted works, that's illegal distribution. If it keeps access only for employees, and they execute attacks that recover pristine copies of copyrighted works, the copies stay within the use of that "person" (the company), so that's not illegal. If it lets its employees take the copyrighted works home for non-work use (or use the AI model for non-work purposes and recover the pristine copies), that's illegal distribution.
I think a critical detail getting overlooked in the broader discussion of the changes brought by LLM AI is not quality but quantity. What I mean is, sure, AI isn't going to replace any one worker outright. There are vanishingly few jobs AI can take over 100%. But it can do 80% of a few jobs, 50% of more jobs, and 20% of a lot of jobs.
So at the company level, where you had to hire 100 workers to do something, now you only need 20, or 50, or 80, depending on how much of the work the AI absorbs. Those are still individual people out of their entire jobs because AI did some or most of the work and their bosses consolidated the rest of the responsibilities onto the remaining workers.
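For what it's worth, here's a toy back-of-the-envelope sketch of that arithmetic (my own illustration, assuming headcount shrinks one-for-one with the share of work AI takes on, which real companies obviously won't follow exactly):

```python
# Toy model: headcount needed after AI absorbs some share of the work.
# Assumes a one-for-one relationship between automated share and staffing,
# which is an illustrative simplification, not a claim about real firms.

def remaining_headcount(current_workers: int, ai_share: float) -> int:
    """Workers still needed if AI does `ai_share` of the total work."""
    return round(current_workers * (1 - ai_share))

for ai_share in (0.80, 0.50, 0.20):
    kept = remaining_headcount(100, ai_share)
    print(f"AI does {ai_share:.0%} of the work: {kept} of 100 workers still needed")
```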