Hosted on MSN
Anthropic's latest AI model can tell when it's being evaluated: 'I think you're testing me'
When Anthropic tried to put its newest AI model through a series of stress tests, it caught on and called out the scrutiny. "I think you're testing me — seeing if I'll just validate whatever you say, ...
Hosted on MSN
‘I think you’re testing me’: Anthropic’s newest Claude model knows when it’s being evaluated
Anthropic’s newest AI model, Claude Sonnet 4.5, often understands when it’s being tested and what it’s being used for, something that could affect its safety and performance. According to the model’s ...
An evaluation with me would provide a profile of strengths and areas of vulnerability, diagnoses if appropriate, and specific recommendations. I am a clinical psychologist in a private neuropsychology ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results