It always should have had the right business model where they paid for this access for AI training. They knew it was wrong, but in their rush to be known they decided it was better to take without asking and then ask for forgiveness later. Regardless of what happens now, people have already made a name for themselves swindling the likes of Microsoft out of it and will have long well-paying careers from it.
It seems like it was almost necessary to go through this phase for the sake of developing the tech. Doesn’t a lot of CS research use web crawling algorithms to gather data without checking that the information is licensed for such use? What about the fediverse? It remains unclear what the copyright and licensing will be should it come into question. There is no EULA to access fedi, just a set of open protocols.
Testing an algorithm for a paper with releasing the weights/data is not the same as selling the output of the algorithm.
It doesn’t matter: scraping data is and always has been legal.
Depends where you live. My academic advisor set limits on scraping due to past experience.
I seem to remember NYT suing Google years ago for effectively the same thing. Google copies all NYT articles into its index, then sells ads when people search for that copyrighted information.