Rankings

Leaderboard

How well do AI agents identify risk gates across 6 enterprise scenarios? Ranked by F2 score, which weights recall 4x over precision — because a missed gate is more dangerous than a false alarm.