AI Evaluation Techniques for Insurers: The Future of Risk Pricing

Gallagher Re argues that insurers urgently need better techniques for evaluating AI systems if they are to price AI-related risks accurately. In its report, "Anthropic’s Fourth Way: Why Restricted AI Models Are a Challenge for Insurers," the firm points out that current evaluation methods are not designed for underwriting: they measure quantifiable performance rather than how a system actually behaves in operation.

Ed Pocock, Global Head of Cyber Security at Gallagher Re, emphasizes the gap between benchmark testing and the way insurers actually assess risk. Effective risk management depends on understanding how often AI models fail and how far those failures are correlated across a portfolio.
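To illustrate why correlation across a portfolio matters as much as the failure rate itself, here is a minimal sketch. It is not taken from the Gallagher Re report. It assumes every insured has the same failure probability and every pair of insureds has the same failure correlation, and the function name and all figures are hypothetical.

```python
import math

def failure_count_std(n_insureds: int, p_fail: float, rho: float) -> float:
    """Standard deviation of the number of failures in a portfolio.

    Illustrative assumptions: each insured fails with probability p_fail,
    and every pair of failure indicators has the same correlation rho.
    Under those assumptions: Var = n * p * (1 - p) * (1 + (n - 1) * rho).
    """
    variance = n_insureds * p_fail * (1 - p_fail) * (1 + (n_insureds - 1) * rho)
    return math.sqrt(variance)

# Hypothetical portfolio: 1,000 insureds, each with a 2% failure probability.
independent = failure_count_std(1000, 0.02, 0.0)    # failures unrelated
shared_model = failure_count_std(1000, 0.02, 0.05)  # mild common dependence
ratio = shared_model / independent
print(f"Spread of failure counts grows by a factor of {ratio:.1f}")
```

In this toy setting the expected number of failures stays the same, but even a small pairwise correlation makes the spread of outcomes roughly seven times wider, which is the kind of effect a benchmark score on its own does not reveal.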

Standard benchmarks assess AI on specified tasks but fail to capture how a system behaves under complex conditions. Gallagher Re warns that they can miss risks such as AI hallucinations and inconsistent responses, and that they overlook concentration risk: the possibility that a single systemic failure affects many insured entities at once.

The report also flags 'benchmark contamination', where models optimized for specific tests may produce artificially high scores, compromising score reliability. This issue could reduce variability among systems and increase systemic risks within the insurance sector.
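One simple way to think about contamination is to compare a model's score on public benchmark items with its score on fresh items it could not have seen. The sketch below is a hypothetical illustration of that idea, not a method described in the report; the function name and the pass/fail data are invented for the example.

```python
def contamination_gap(public_results: list[bool], heldout_results: list[bool]) -> float:
    """Accuracy on public benchmark items minus accuracy on unpublished items.

    A large positive gap is one possible warning sign that the public score
    is inflated. It is a heuristic, not proof of contamination.
    """
    public_acc = sum(public_results) / len(public_results)
    heldout_acc = sum(heldout_results) / len(heldout_results)
    return public_acc - heldout_acc

# Hypothetical pass/fail outcomes (True = correct answer); not real data.
public = [True] * 9 + [False] * 1    # 90% on the published test set
heldout = [True] * 6 + [False] * 4   # 60% on newly written questions
gap = contamination_gap(public, heldout)
print(f"Score gap: {gap:.0%}")
```

The comparison only works if the unpublished items are of similar difficulty to the public ones, which is itself hard to guarantee without access to the model.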

Gallagher Re also examines the trend toward restricted-distribution AI models, such as Anthropic’s Mythos, which are available only to select partners. Such models make it harder to carry out the independent evaluation that accurate risk pricing depends on, and could influence regulatory compliance requirements.

Despite evaluations by institutions like the UK AI Security Institute, Gallagher Re calls for broader access to ensure reliable risk assessments. Pocock warns that without independent evaluation, pricing becomes uncertain and possibly more costly, stalling market evolution.

The firm advocates evaluation methods that reflect how AI systems operate in practice. These methods should incorporate realistic inputs, adversarial scenarios, and continuous monitoring. Gallagher Re describes pioneering efforts by Epoch AI and Artificial Analysis as promising steps towards evaluation techniques that are more robust and harder to game, which it considers essential for transparent and resilient AI development.

Pocock concludes that better evaluations could create market incentives for transparency and robustness. He cautions that relying on brand and scale without comprehensive evaluation may exacerbate concentration risks.