BigMuffin69

joined 1 year ago
BigMuffin69@awful.systems 14 points 4 months ago (13 children)

Reposting this for the new week's thread, since it truly is a record of how untrustworthy sammy and co are. Remember how OAI claimed that o3 had displayed superhuman levels on the mega hard Frontier Math exam written by Fields Medalists? Funny/totally not fishy story haha. Turns out OAI had exclusive access to that test for months, funded its creation, and refused to let the test's creators publicly acknowledge this until after OAI did their big stupid magic trick.

From Subbarao Kambhampati via LinkedIn:

"On the seedy optics of “Building an AGI Moat by Corralling Benchmark Creators” #SundayHarangue. One of the big reasons for the increased volume of “AGI Tomorrow” hype has been o3’s performance on the “frontier math” benchmark–something that other models basically had no handle on.

We are now being told (https://lnkd.in/gUaGKuAE) that this benchmark data may have been exclusively available (https://lnkd.in/g5E3tcse) to OpenAI since before o1–and that the benchmark creators were not allowed to disclose this *until after o3*.

That o3 does well on frontier math held-out set is impressive, no doubt, but the mental picture of “o1/o3 were just being trained on simple math, and they bootstrapped themselves to frontier math”–that the AGI tomorrow crowd seem to have–that OpenAI while not explicitly claiming, certainly didn’t directly contradict–is shattered by this. (I have, in fact, been grumbling to my students since o3 announcement that I don’t completely believe that OpenAI didn’t have access to the Olympiad/Frontier Math data before hand… )

I do think o1/o3 are impressive technical achievements (see https://lnkd.in/gvVqmTG9 )

Doing well on hard benchmarks that you had prior access to is still impressive–but doesn’t quite scream “AGI Tomorrow.”

We all know that data contamination is an issue with LLMs and LRMs. We also know that reasoning claims need more careful vetting than “we didn’t see that specific problem instance during training” (see “In vs. Out of Distribution analyses are not that useful for understanding LLM reasoning capabilities” https://lnkd.in/gZ2wBM_F ).

At the very least, this episode further argues for increased vigilance/skepticism on the part of AI research community in how they parse the benchmark claims put out [by] commercial entities."

Big stupid snake oil strikes again.

BigMuffin69@awful.systems 11 points 4 months ago* (last edited 4 months ago) (1 children)

Remember how OAI claimed that o3 had displayed superhuman levels on the mega hard Frontier Math exam written by Fields Medalists? Funny/totally not fishy story haha. Turns out OAI had exclusive access to that test for months, funded its creation, and refused to let the test's creators publicly acknowledge this until after OAI did their big stupid magic trick.

From Subbarao Kambhampati via LinkedIn:

"On the seedy optics of "Building an AGI Moat by Corralling Benchmark Creators" #SundayHarangue. One of the big reasons for the increased volume of "AGI Tomorrow" hype has been o3's performance on the "frontier math" benchmark--something that other models basically had no handle on.

We are now being told (https://lnkd.in/gUaGKuAE) that this benchmark data may have been exclusively available (https://lnkd.in/g5E3tcse) to OpenAI since before o1--and that the benchmark creators were not allowed to disclose this *until after o3*.

That o3 does well on frontier math held-out set is impressive, no doubt, but the mental picture of "o1/o3 were just being trained on simple math, and they bootstrapped themselves to frontier math"--that the AGI tomorrow crowd seem to have--that OpenAI while not explicitly claiming, certainly didn't directly contradict--is shattered by this. (I have, in fact, been grumbling to my students since o3 announcement that I don't completely believe that OpenAI didn't have access to the Olympiad/Frontier Math data before hand.. )

I do think o1/o3 are impressive technical achievements (see https://lnkd.in/gvVqmTG9 )

Doing well on hard benchmarks that you had prior access to is still impressive--but doesn't quite scream "AGI Tomorrow."

We all know that data contamination is an issue with LLMs and LRMs. We also know that reasoning claims need more careful vetting than "we didn't see that specific problem instance during training" (see "In vs. Out of Distribution analyses are not that useful for understanding LLM reasoning capabilities" https://lnkd.in/gZ2wBM_F ).

At the very least, this episode further argues for increased vigilance/skepticism on the part of AI research community in how they parse the benchmark claims put out [by] commercial entities."

Big stupid snake oil strikes again.

BigMuffin69@awful.systems 6 points 4 months ago* (last edited 4 months ago)

Lmaou. "We need to alignment pill the Russian youth." Fast forward to the year 20XX and the haunted alignment-pilled adults are now 'aligning' their bots to the world's top nuclear-armed despot.

tony_soprano_how_could_this_happen.jpg (for some reason awful systems won't let me upload pictures anymore (ノಠ益ಠ)ノ)

Holy Moses in heaven, iirc both Sam and Dario have said that their urge to build the torment nexus came from being inspired by online RAT forums. Maybe alignment 'pilling' youths is counterproductive to human flourishing? As the LWers say, "update your priors fuckheads"

BigMuffin69@awful.systems 3 points 4 months ago

smh they really do be out here believing there's a little man in the machine with goals and desires, common L for these folks

BigMuffin69@awful.systems 1 points 5 months ago

Thank you. My wife is deathly allergic to shrimp, and I live by the motto:

'If they send one of your loved ones to the emergency room, you send 10 of theirs to the deep fryer.'

BigMuffin69@awful.systems 0 points 5 months ago* (last edited 5 months ago) (1 children)

Shared this on a tamer social media site, and a friend commented:

"That's nonsense. The largest charities in the country are Feeding America, Good 360, St. Jude's Children's Research Hospital, United Way, Direct Relief, Salvation Army, Habitat for Humanity etc. etc. Now these may not satisfy the EA criteria of absolutely maximizing bang for the buck, but they are certainly mostly doing worthwhile things, as anyone counts that. Just the top 12 on this list amount to more than the total arts giving. The top arts organization on this list is #58, the Metropolitan Museum, with an income of $347M."

BigMuffin69@awful.systems 3 points 5 months ago

A nice exponential curve depicting the infinite future lives saved by whacking a CEO

BigMuffin69@awful.systems 1 points 6 months ago* (last edited 6 months ago)

The American electorate has just covered itself with gasoline because eggs cost 2 dollars more. Come January, they strike the match. gg. HATE. LET ME TELL YOU HOW MUCH I'VE COME TO HATE YOU SINCE NOVEMBER 5TH. My only consolation is that I'll hopefully get to watch some of the MAGAs/non-voters/vote-your-conscience peeps suffer before the end. But ol' Musky and Peter Thiel will be in their gilded bunkers while the fires consume us all.

BigMuffin69@awful.systems 0 points 6 months ago* (last edited 6 months ago) (2 children)

I know it's Halloween, but this popped up in my feed and was too spooky even for me 😱

As a side note, what are people's feelings about Wolfram? Smart dude for sho, but some of the shit he says just comes across as straight-up pseudoscientific gobbledygook. But can he out-guru Big Yud in a 1v1 on Final Destination (Fox only, no items)? 🤔

BigMuffin69@awful.systems 1 points 7 months ago (1 children)

Actual message I got while renewing my insurance plan last night. Thank you for adding a shitty chatbot which will give me false information about my life-and-death decisions, bravo.

BigMuffin69@awful.systems 0 points 9 months ago (3 children)

but I do believe brains are computers, but only in the broadest sense of what computation could be

Agree. A human brain is capable of executing the steps of a TM with pen and paper, and in that sense the brain is absolutely capable of acting as a computer. But as for all the other processes a brain carries out (breathing, maintaining heart rate, etc.), calling the brain 'a computer' because of them seems like such an abuse of notation as to render the original definition meaningless. We might as well call the moon a computer since it is 'calculating' the effect of a gravitational field on a moon-sized object. What I think many people are really claiming when they say a brain is a computer is that, if only we could identify the correct finite-state deterministic program, there would be no difference between the brain and its implementation in silicon. Personally, I find claims of substrate independence less plausible, but of course many of our dear friends are willing to bite that bullet.

BigMuffin69@awful.systems 1 points 9 months ago* (last edited 9 months ago)

Smh, why do I feel like I understand the theology of their dumb cult better than its own adherents? If you believe that one day AI will foom into a 10 trillion IQ super being, then it makes no difference at all whether your ai safety researcher has 200 IQ or spends their days eating rocks like the average LW user.
