3 Comments
Vikrant

In my experience, GLM 5.1 (Ollama Cloud) almost always gave the same results on the same CV.

Dan Kinsky

Appreciate you testing it out. Would you share what numbers you got?

Looking back at my own data, the first 7 gemini-3.1-flash-lite runs were also remarkably consistent: 61, 61, 61, 61, 63, 61, 60. It's not until run 8 that I get my first 48.

For gemma3:4b, it's a similar story. That model makes up open-source contributions, but for projects it starts with 25, 23, 28, 28, 28, 28, 28, and then suddenly drops to 18.

I've now seen a few people mention that a frontier model doesn't show this effect, so I ended up trying out Opus 4.8, and I've gotta say, the data doesn't look that different. I can't embed images in a comment, but I've added a short update section with that data to the article.

Esco Obong

Great work! This is wild stuff. It looks like there was no evaluation done on these prompts at all 🤯