I have been experimenting with the "Casual Photo" generator, with mixed results. I find that if I am very careful, I can avoid extra limbs and weird fingers etc., but once I get too specific with my descriptions, all I get back is cartoons, whereas I really want realistic photographs. For example:
"An editorial photograph of 58 year old tall slim English woman standing in the reception of a high end hotel."
This returns lots of low quality results, but a couple that are actually photographic (in the sense that but for minor details in rendering that don't stand out, it would look like a photograph - the texture of the skin, lighting, colour, etc.). This for example:
Once I add a bit more detail, though, all I get is low quality, cartoon-like results with plastic-looking, unnatural skin and hair, a cross between an android and a 20 year old woman's Instagram posts (despite specifically asking for a late middle-aged woman). For example:
"Beautiful 58 year old white English woman with dark silver hair, large soulful brown eyes who is fully dressed but has slightly saggy boobs and broad hips, other than that, she is of an average build, and has a pleasantly plump face. She is dressed in a quirky way that shows she is artistic and intellectual. She looks kind but also strong. She is standing, smiling slightly, outside a stucco-fronted terraced house in Chelsea, London."

This is exactly the TYPE of woman I want to convey, but this does NOT look anything like a casual photo; it is very easy to tell straight away from the texture of her skin and the overly perfect colouring that she isn't a real person. I would say this is more like a cartoon, even if it isn't full on stomach-turning anime. Note that what bothers me is not the weird hands but the lack of realism, which has been dramatically worse since the terrible "upgrade" a few months ago. That was the best of a bad bunch. Most came out looking like this:

Which is about as realistic as a blow-up doll, as well as the fact that they have made her look about 20, and an Instagram attention-whore at that (goes to the bathroom to vomit).
This also seems to happen as soon as I add other people to the picture, or have more than one instance of Perchance open at once. Is that perhaps because I am using too much server power? Could someone please explain? I am happy to generate things more slowly if that means higher photographic quality.
So the tl;dr version is this:
- When is Perchance going to return to the (relatively) high quality images of several months ago? When the update was done to the silly story-creator thing (for the terminally unimaginative who are incapable of coming up with their own stories), it was said that an upgrade to the photo generator would be done immediately after; instead we have a permanent downgrade.
- Within the limits of what we have now, how can I get higher quality pictures and photos?
- Is there a casual photo option that actually generates a casual photo, as in one that could be taken on a phone, rather than a cartoon image or a heavily filtered teenage Instagram image?
Well... that whole thing is an entire rabbit hole. I'm trying to be as compact as possible, but there are a million videos and plenty of documentation on the matter. An LLM, or the text encoder that sits in front of an image model, takes the inputs and the order of the inputs and tries to "correlate" them with patterns it picked up in training. The first step of this is called "tokenization", and basically it turns "The orange cat is sleeping" into "A + B + C + D + E", where each variable is a "token". A token is often a single word, but not always: tokenizers generally work on sub-word pieces rather than splitting purely on whitespace, so a long or rare word can become several tokens, and depending on how the tokenizer was trained, "The cat" could in principle end up as a single token, leading to a whole other universe of possible results branching from "cat" versus "The cat".

This is why (naively) some people recommend "add as much detail as possible" in the sense of something like "An old lady in Paris, discussing an intellectually difficult topic such as philosophy with a young blonde man", instead of "old lady with blonde young man, discussing, focused, Paris". Both yield different results, but the first one is driven a lot by the context of articles, prepositions, and whatnot, which makes it a nightmare to debug. So again, be very descriptive, but separating things allows for easier "debugging", if you will.

Also, I should mention that repeating a word does have an effect: you'll see that the results from "old lady, scarf, drinking wine" are not the same as those from "old lady, scarf, scarf, scarf, drinking wine". That's why I emphasize that the "grocery list" approach is better, as you can treat generating an image as "building a Lego" and see what each piece does.
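To make the "Lego" idea concrete, here is a minimal sketch in Python of how you could build a grocery-list prompt and then produce one variant per removed piece, so you can paste each one into the generator and see what that piece was contributing. The helper names `build_prompt` and `ablations` are made up for this example; nothing here talks to Perchance, it only assembles text.

```python
def build_prompt(parts):
    """Join prompt fragments into one comma-separated 'grocery list' prompt."""
    return ", ".join(p.strip() for p in parts if p.strip())


def ablations(parts):
    """Yield (removed_fragment, prompt_without_it) for every fragment."""
    for i, removed in enumerate(parts):
        yield removed, build_prompt(parts[:i] + parts[i + 1:])


parts = ["old lady", "scarf", "drinking wine"]

# The full prompt, with every piece in place.
full = build_prompt(parts)

# One variant per removed piece, to test what each piece does.
variants = list(ablations(parts))

# Repeating a fragment is a crude way to give it more weight.
emphasised = build_prompt(["old lady", "scarf", "scarf", "scarf", "drinking wine"])

for removed, prompt in variants:
    print(f"without {removed!r}: {prompt}")
```

Generate a few images for each variant with everything else unchanged, and the differences between batches point at the piece you removed.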
Now, regarding the seed... that's another whole problem. There is a better explanation in a video by Wolfram, but I don't remember which one it was. Pretty much, in my experience the seed locks you into a "potential state" rather than a single output, if that makes sense. So, if you reroll a seeded image, you'll get maybe five very different outputs with some minor variations between them, plus the occasional eldritch abomination where the model mixes them, but no more. So with a seed you can find the exact granny you found once, but you may still need the luck of the draw. The reason for this is actually a bit complex and I'll admit I don't fully get it. As far as I understand, the seed only fixes the starting random noise, so anything else that varies between runs (the prompt, the settings, or nondeterminism on the server side) can still change the result. I recall it also being an issue in other machine-learning models such as random forests, where a seed would not always yield a 1:1 result.
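As a toy illustration of what a seed does on its own, here is a small Python sketch using only the standard library. It is not Perchance's pipeline (I don't know how that is implemented); `initial_noise` is a hypothetical stand-in for the random starting noise an image model would begin from.

```python
import random


def initial_noise(seed, n=4):
    """Return n pseudo-random Gaussian values determined entirely by the seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


same_a = initial_noise(42)
same_b = initial_noise(42)
other = initial_noise(43)

print(same_a == same_b)  # same seed, same starting numbers
print(same_a == other)   # different seed, different starting numbers
```

The point is only that the seed pins down the starting numbers. Whether the final image is reproducible also depends on everything that happens after that step, which this sketch does not model.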
Then again, nothing beats downloading the image! A fun feature that Perchance has is that all images are encoded in base64, so you can right click a generated image, do "Copy Link", take the gargantuan link, paste it into a .txt file, and then pass that gargantuan string of text to a converter to save the image to your drive, or even use it directly in an app or an HTML page!
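If you would rather not rely on an online converter, a few lines of Python can do the same job. This is a minimal sketch that assumes the copied link has the usual `data:<mime type>;base64,<payload>` shape; the function names and the file names in the comments are made up for the example.

```python
import base64


def decode_data_url(data_url):
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    header, sep, payload = data_url.strip().partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    mime_type = header[len("data:"):].split(";")[0]
    return mime_type, base64.b64decode(payload)


def save_data_url(data_url, path):
    """Decode a base64 data URL and write the raw bytes to `path`."""
    mime_type, raw = decode_data_url(data_url)
    with open(path, "wb") as f:
        f.write(raw)
    return mime_type


# Tiny stand-in for the gargantuan link: the word "hello" as a text/plain data URL.
example = "data:text/plain;base64,aGVsbG8="
mime_type, raw = decode_data_url(example)
print(mime_type, raw)

# With a real copied link saved in a .txt file, usage would look like:
#   with open("link.txt") as f:
#       save_data_url(f.read(), "granny.jpg")
```

Pick the file extension to match the mime type that comes back (for example `.jpg` for `image/jpeg`).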