• fubo@lemmy.world
    1 year ago

    With how prevalent the AI and data scraping conversation has become

    You realize that “conversation” is fake, right? There is no increased load on Twitter, Reddit, or other web services due to “AI data scraping”. That was made up to distract from the material causes of Twitter’s failure, namely:

    1. most of their engineers were laid off or quit
    2. they don’t pay their bills

    Big tech companies that run search engines already have a copy of all public Web pages, which they use for search engine indexing. They don’t need to make a second copy for AI training; they can just use the same one.

    Google can train Bard with the same copy of the public Web that they use to create Google Search; same with Microsoft, Baidu, or any other big company that runs a search engine.

    And for everyone else, there’s Common Crawl.

    • justdoit@lemm.ee
      1 year ago

      “Fake” in terms of data load, sure, I can see that, but there’s plenty of interest in trying to stave off the “dead internet” by incorporating new systems where bots and AI-generated content aren’t profitable. That’s more what I was referring to.