diz

joined 2 years ago
[–] diz@awful.systems 5 points 2 weeks ago

Well, it did reach for "I double checked it, I'm totally sure now" language.

From the perspective of trying to convince the top brass that they are making good progress towards creating an artificial psychopath - not just an artificial human - it's pretty good.

[–] diz@awful.systems 5 points 2 weeks ago* (last edited 2 weeks ago)

Still seems terminally AI pilled to me, an iteration or two later. "5 digit multiplication is borderline", how is that useful?

I think there's a combination of it being a pinnacle of billions and billions of dollars, and probably theirs firing people for slightest signs of AI skepticism. There's another data point, "reasoning math & code" is released as stable by Google without anyone checking if it can do any kind of math.

edit: imagine that a calculator manufacturer in 1970s is so excited about microprocessors they release an advanced scientific calculator that can't multiply two 6 digit numbers (while their earlier discrete component model could). Outside the crypto sphere, that sort of insanity is new.

[–] diz@awful.systems 7 points 2 weeks ago

Yeah, I'd also bet on the latter. They also added a fold-out button that shows you the code it wrote (folded by default), but you got to unfold it or notice that it is absent.

[–] diz@awful.systems 10 points 2 weeks ago

Oh and also for the benefit of our AI fanboys who can't understand why we would expect something as mundane from this upcoming super-intelligence, as doing math, here's why:

[–] diz@awful.systems 6 points 2 weeks ago* (last edited 2 weeks ago)

Also, I just noticed something really fucking funny:

(arrows are for the sake of people like llllll...)

[–] diz@awful.systems 9 points 2 weeks ago (5 children)

lmao: they have fixed this issue, it seems to always run python now. Got to love how they just put this shit in production as "stable" Gemini 2.5 pro with that idiotic multiplication thing that everyone knows about, and expect what? to Eliza Effect people into marrying Gemini 2.5 pro?

[–] diz@awful.systems 6 points 2 weeks ago* (last edited 2 weeks ago) (2 children)

there was a directive that if it were asked a math question that you can’t do in your brain or some very similar language it should forward it to the calculator module.

The craziest thing about leaked prompts is that they reveal the developers of these tools to be complete AI pilled morons. How in the fuck would it know if it can or can't do it "in its brain" lol.

edit: and of course, simultaneously, their equally idiotic fanboys go "how stupid of you to expect it to use a calculating tool when it said it used a calculating tool" any time you have some concrete demonstration of it sucking ass, while simultaneously the same kind of people are lauding the genius of system prompts half of which are asking it to meta-reason.

[–] diz@awful.systems 11 points 2 weeks ago (3 children)

Thing is, it has tool integration. Half of the time it uses python to calculate it. If it uses a tool, that means it writes a string that isn't shown to the user, which runs the tool, and tool results are appended to the stream.

What is curious is that instead of request for precision causing it to use the tool (or just any request to do math), and then presence of the tool tokens causing it to claim that a tool was used, the requests for precision cause it to claim that a tool was used, directly.

Also, all of it is highly unnatural texts, so it is either coming from fine tuning or from training data contamination.

[–] diz@awful.systems 3 points 2 weeks ago

I don't think we need to go as far as evopsych here... it may just be an artifact of modeling the environment at all - you learn to model other people as part of the environment, you re-use models across people (some people are mean, some people are nice, etc).

Then weather happens, and you got yourself a god of bad weather and a god of good weather, or perhaps a god of all weather who's bipolar.

As far as language goes it also works the other way, we over used these terms in application to computers, to the point that in relation to computers "thinking" no longer means it is actually thinking.

[–] diz@awful.systems 14 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

misinterpreted as deliberate lying by ai doomers.

I actually disagree. I think they correctly interpret it as deliberate lying, but they misattribute the intent to the LLM rather than to the company making it (and its employees).

edit: its like you are watching a TV and ads come on you say that a very very flat demon who lives in the TV is lying, because the bargain with the demon is that you get to watch entertaining content in response to having to listen to its lies. It's fundamentally correct about lying, just not about the very flat demon.

[–] diz@awful.systems 12 points 2 weeks ago* (last edited 2 weeks ago)

Hmm, fair point, it could be training data contamination / model collapse.

It's curious that it is a lot better at converting free form requests for accuracy, into assurances that it used a tool, than into actually using a tool.

And when it uses a tool, it has a bunch of fixed form tokens in the log. It's a much more difficult language processing task to assure me that it used a tool conditionally on my free form, indirect implication that the result needs to be accurate, than to assure me it used a tool conditionally on actual tool use.

The human equivalent to this is "pathological lying", not "bullshitting". I think a good term for this is "lying sack of shit", with the "sack of shit" specifying that "lying" makes no claim of any internal motivations or the like.

edit: also, testing it on 2.5 flash, it is quite curious: https://g.co/gemini/share/ea3f8b67370d . I did that sort of query several times and it follows the same pattern: it doesn't use a calculator, it assures me the result is accurate, if asked again it uses a calculator, if asked if the numbers are equal it says they are not, if asked which one is correct it picks the last one and argues that the last one actually used a calculator. I hadn't ever managed to get it to output a correct result and then follow up with an incorrect result.

edit: If i use the wording of "use an external calculator", it gives a correct result, and then I can't get it to produce an incorrect result to see if it just picks the last result as correct, or not.

I think this is lying without scare quotes, because it is a product of Google putting a lot more effort into trying to exploit Eliza effect to convince you that it is intelligent, than into actually making an useful tool. It, of course, doesn't have any intent, but Google and its employees do.

[–] diz@awful.systems 15 points 2 weeks ago (2 children)

Pretentious is a fine description of the writing style. Which actual humans fine tune.

view more: ‹ prev next ›