25+ yr Java/JS dev
Linux novice - running Ubuntu (no windows/mac)

  • 1 Post
  • 1.64K Comments
Joined 1 year ago
Cake day: June 14th, 2023

  • Individual attention is good. And AI as a tutor can be helpful because it does have infinite patience.

    That being said, genuine curiosity plays an important role and if a kid isn’t curious about a subject, AI isn’t going to help with motivation.

    “What’s much more likely is that (eventually) AI Teachers and AI Doctors are going to be the best we’ve ever had. No human, not even the parents of only children, can lavish the time, expertise, and attention these AIs will give your child.”

    No, that’s pretty unlikely. They have time and attention, but not really expertise. They have good command of straightforward knowledge, but just imagine the shitshow that would be explaining the politics of the American Civil War. Or Vietnam.

    Yeah, AI knows what a gerund is and how to calculate the area of an ellipse, but it will struggle with more philosophical topics that don’t have a clear-cut right or wrong answer.



  • You made a lot of points here. Many I agree with, some I don’t, but I specifically want to address this because it seems to be such a common misconception.

    “It does and it doesn’t discard the original. It isn’t impossible to recreate the original (since all the data it gobbled up gets stored somewhere in some shape or form and can be truthfully recreated, at least judging by a few comments below and news reports). So AI can and does recreate (duplicate or distribute, perhaps) copyrighted works.”

    AI stores original works like a dictionary does. All the words are there, but the order and meaning are completely gone. An original work is possible to recreate by randomly selecting words from the dictionary, but it’s unlikely.

    The thing that makes AI useful is that it understands the patterns words are typically used in. It orders words in the right way far more often than random chance. It knows “It was the best of” has a lot of likely options for the next word, but if it selects “times” as the next word, it’s far more likely to continue with “it was the worst of times,” because that sequence of words is so ubiquitous due to references to the classic story. But over the course of following these word patterns, it will quickly glom onto a different pattern and create a wholly new work from the original “prompt.”

    There are only two cases in which you’d expect an original work to be duplicated: either the training data is far too small and the model is overtrained on that particular work, or the work is the most derivative text imaginable, lacking any flair or originality.

    Adding more training data makes it less likely to recreate any original works.

    I am aware of examples where it was claimed an LLM reproduced entire code functions, including the original comments. That is either a case of overtraining, or far too many people were already copying that code verbatim into their own, thus making that work heavily overrepresented in the training data (same thing, but it was infringing developers who poisoned the data, not researchers using bad training data).

    Bottom line: when created with enough data, no original works are stored in any way that allows faithful reproduction other than by chance so random that it’s similar to rolling dice over a dictionary.

    None of this means AI can do no wrong; I just don’t find the copyright claim compelling.
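
To make the word-pattern argument above a bit more concrete, here is a toy sketch. It is not how an LLM actually works: it is just a counter of which word tends to follow each two-word context, and it always picks the most frequent continuation. The function names (`train`, `generate`) and the example sentences are made up for illustration. The point is only to show the three situations described above: a tiny training set (memorizes the text), a more varied training set (drifts onto a different pattern), and a training set where one text is heavily overrepresented (memorizes it again).

```python
from collections import Counter, defaultdict

ORDER = 2  # number of preceding words used as context (an arbitrary choice)


def train(corpus):
    """Count which word follows each ORDER-word context across all texts."""
    counts = defaultdict(Counter)
    for text in corpus:
        words = text.split()
        for i in range(len(words) - ORDER):
            context = tuple(words[i:i + ORDER])
            counts[context][words[i + ORDER]] += 1
    return counts


def generate(counts, prompt, max_words=12):
    """Extend the prompt by repeatedly appending the most frequent next word."""
    words = prompt.split()
    for _ in range(max_words):
        context = tuple(words[-ORDER:])
        if context not in counts:
            break  # the model has never seen this context, so it stops
        words.append(counts[context].most_common(1)[0][0])
    return " ".join(words)


ORIGINAL = "the quick brown fox jumps over the lazy dog"
OTHERS = [
    "the quick brown bear sleeps in a cave",
    "a quick brown bear eats honey",
]

# 1. Far too little training data: the only pattern available is the original.
tiny = generate(train([ORIGINAL]), "the quick")

# 2. More varied data: "quick brown" is now followed by "bear" more often
#    than "fox", so generation leaves the original after a few words.
varied = generate(train([ORIGINAL] + OTHERS), "the quick")

# 3. The original copied many times over: it dominates the counts again.
overrepresented = generate(train([ORIGINAL] * 5 + OTHERS), "the quick")

print("tiny:           ", tiny)
print("varied:         ", varied)
print("overrepresented:", overrepresented)
```

In this sketch only the first and third cases give back the original sentence word for word. A real model is vastly larger and samples from probabilities instead of always taking the top choice, so treat this as an illustration of the statistics, not as evidence about any particular system.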