Anthropic just published new research that successfully identified and mapped millions of human-interpretable concepts, called “features”, within the neural networks of Claude.

Lugh@futurology.today · 6 months ago

Sims@lemmy.ml · 6 months ago

…oc it also opens up for manipulative use by corporations. I.e., we will probably quickly see commercial models that inflate users' egos by exaggerating how amazing their insights are, or that recommend corporate interests, all hidden from the user and just to profit from the $!@ model.

Mapping the Mind of a Large Language Model