The truly valuable data is the stuff that was created prior to LLMs; anything after that is tainted by slop. Any verifiable human data would be worth even more, which is why they are simultaneously trying to erode any and all privacy.
They can produce high-quality answers now, but that’s just because they were trained on things written by humans.
Any training on things produced by LLMs will just reproduce the same stuff, or actually even worse, because it will include hallucinations.
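This is the "model collapse" worry, and the core mechanism can be sketched with a toy example. Below, a 1-D Gaussian stands in for a model: each "generation" is fit only to samples drawn from the previous generation's fit, mimicking training each new model on the last model's output. All names and numbers here are illustrative, not from any real training pipeline.

```python
# Toy sketch of model collapse: repeatedly fit a simple model (a 1-D
# Gaussian) to data generated by the previous generation's fit.
# Purely illustrative; parameters are arbitrary.
import math
import random

def fit_gaussian(samples):
    """Maximum-likelihood mean and std of a list of numbers."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, math.sqrt(var)

def run_generations(mu=0.0, sigma=1.0, n_samples=10, n_gens=300, seed=0):
    rng = random.Random(seed)
    history = [(mu, sigma)]
    for _ in range(n_gens):
        # "Train" the next model only on the current model's output.
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        mu, sigma = fit_gaussian(samples)
        history.append((mu, sigma))
    return history

hist = run_generations()
print("initial sigma:", hist[0][1])
print("final sigma:  ", hist[-1][1])
# The fitted spread shrinks across generations: variance (diversity)
# present in the original "human" distribution is progressively lost,
# even though each individual fit looks reasonable on its own data.
```

The shrinking standard deviation is the toy analogue of the complaint above: each generation samples only what the previous one already produces, so the tails of the original distribution disappear and errors compound instead of averaging out.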
For an AI to discover new things and truly innovate, or to learn about existing products, the world, etc., it would need to do something entirely different from what LLMs are doing.
I’m not sure about that. It implies that only humans are able to produce high-quality output. But that seems wrong to me.
Microsoft’s PHI-4 is primarily trained on synthetic data (generated by other AIs). It’s not a future thing; it’s been happening for years.