Sam's refuge of logic, reason and rationality...

May 21, 2025

Trying to respond to @scaptal

Yeah, just comparing answers doesn't always work, especially if there are multiple possible right ones. So instead you ask it to show its working, like in a school exam, and then you can judge the reasoning behind it: how efficient it was (if that's the goal), how many sources it used, how good they were, what kind of maths it did, whether it double-checked anything, whether it avoided obvious bias, that kind of thing. Even if we don't tell it what the goal is, and it doesn't fully figure it out, it can still optimise.

I should correct what I said before about using 2X as the higher reward. With a simple multiplier, the model would just figure out that Y is always 2X, 3X, 10X etc. and do the cost/benefit almost immediately. So instead, Y needs to be more desirable but in a less predictable way, maybe using some randomised multiplier, so it can't just (immediately, at least) calculate whether cheating would be fast enough to repeat enough times, in the time available, to be worth more than Y and thus wo...
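To make the fixed versus randomised multiplier point concrete, here's a rough sketch in Python. Everything in it is my own assumption for illustration: the function names, the 2X default, and the 1.5 to 10 range for the random multiplier are made up, and X is just treated as the payoff from one quick cheat. It only shows that a fixed multiplier gives one break-even number, while a random one gives a spread.

```python
import math
import random


def fixed_reward(x, k=2.0):
    """Higher reward Y as a fixed multiple of the cheat reward X (assumed k)."""
    return k * x


def randomised_reward(x, rng, low=1.5, high=10.0):
    """Higher reward Y as a random multiple of X (assumed range low..high)."""
    return rng.uniform(low, high) * x


def cheats_needed(x, y):
    """Smallest number of cheats, each worth x, whose total is more than y."""
    return math.floor(y / x) + 1


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so the run is repeatable
    x = 1.0

    # Fixed multiplier: the break-even is the same every single time.
    fixed = {cheats_needed(x, fixed_reward(x)) for _ in range(1000)}
    print("fixed multiplier break-evens:", sorted(fixed))

    # Random multiplier: the break-even moves around from episode to episode.
    varied = {cheats_needed(x, randomised_reward(x, rng)) for _ in range(1000)}
    print("random multiplier break-evens:", sorted(varied))
```

With the fixed multiplier there is exactly one number to learn. With the random one, the best the model can do is estimate an average over many episodes, which is why I said "immediately at least": it slows the cost/benefit calculation down rather than making it impossible.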
