this post was submitted on 13 Apr 2025

Stable Diffusion

Discuss matters related to our favourite AI Art generation technology

It's been a while since I've updated my Stable Diffusion kit, and the technology moves so fast that I should probably figure out what new tech is out there.

Is most everyone still using AUTOMATIC's interface? Any cool plugins people are playing with? Good models?

What's the latest in video generation? I've seen a lot of animated images that seem to retain frame-to-frame adherence very well. Kling 1.6 is out there, but it doesn't appear to be free or local.

top 14 comments
[–] [email protected] 5 points 1 day ago (2 children)

I'm still stuck in the past with SD1.5 on A1111, because my GPU is dogshit and all the other UIs I've tried are either too complicated or too dumbed down.

[–] [email protected] 2 points 7 hours ago (1 children)

There are some really good SD1.5-based models even by current standards, though. Nothing wrong with that.

[–] [email protected] 1 points 2 hours ago* (last edited 2 hours ago)

It's all about getting a good workflow set up. That's why I wish I could make sense of ComfyUI, but alas it still eludes me.

[–] [email protected] 2 points 1 day ago (1 children)

SDXL can be very accommodating with TeaCache, and the current most popular checkpoints (Illustrious) are based on SDXL. There are photo-realistic branches of it now, worth checking out.

[–] [email protected] 3 points 10 hours ago

Don't forget to check out Illustrious's derivative model NoobAI; it can also do furries (if you care for that sorta thing)!

[–] [email protected] 13 points 2 days ago* (last edited 2 days ago)

I don't do video generation.

I've mostly moved away from Automatic1111 to ComfyUI. If you've ever used an image processing program that uses a flowchart style of operations to modify images, it looks kinda like that. Comfy's more work to learn (you need to learn and understand some things that Automatic1111 is doing internally), but:

  • It's much more capable of building up complex images and series of dependent processes that are re-generated when you make a change in a workflow.

  • It can run Flux. Last I looked, Automatic1111 could not. I understand that Forge can, and is a little more like Automatic1111, but I haven't spent time with it. I'd say that Flux and derived models are quite impressive from a natural language standpoint. My experience on SD and Pony-based models meant that most of the prompts I wrote were basically sequences of keywords. With Flux, it's far more natural-language looking, and it can do some particularly neat stuff just from the prompt ("The image is a blended composite with a progression from left to right showing winter to spring to summer to autumn.").

  • It has queuing. It may be that Automatic1111 has since picked it up, but I found the lack of it to be a serious drawback back when I was using it.

  • ComfyUI scales up better if you're using a lot of plugins. In Automatic1111, a plugin adds buttons and UI elements into each page. In Comfy, a plugin almost always just adds more nodes to the node library and doesn't go fiddling with the existing UI.

That being said, I'm out-of-date on Automatic1111. But last I looked, the major selling point for me was the SD Ultimate Upscale plugin, and that's been subsequently ported to ComfyUI.

For me, one major early selling point was a workflow I frequently wanted: (a) generate an image and then (b) perform an SD Ultimate Upscale. In Automatic1111, that required setting up txt2img and SD Ultimate Upscale in img2img, running a txt2img operation to generate an image, waiting until it finished, manually clicking the button to send the image to img2img, and then manually running the upscale operation. If I changed the prompt, I had to go through all of that again, sitting and watching progress bars and clicking the appropriate buttons. With ComfyUI, I just save a workflow that does all of that, and Comfy will rerun everything necessary based on any changes I make (and won't rerun things that aren't affected). I can just disable the upscale portion of the workflow if I don't need that bit. ComfyUI was a higher barrier to entry, but it made more-complex tasks much less time-consuming and cut down on the manual nursemaiding required from me.
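
As a rough sketch of the kind of automation this enables: ComfyUI also exposes a small local HTTP API, so a workflow exported via "Save (API Format)" can be queued from a script instead of from the browser. The file name and the node id holding the prompt below are assumptions about one particular workflow, not anything universal:

```python
# Minimal sketch: queue a saved ComfyUI workflow over its local HTTP API.
# Assumes ComfyUI is running on the default 127.0.0.1:8188 and that
# "upscale_workflow_api.json" was exported via "Save (API Format)".
# The node id "6" holding the positive prompt is specific to your workflow.
import json
import urllib.request

with open("upscale_workflow_api.json") as f:
    workflow = json.load(f)

# Tweak the positive prompt before queueing (node id is workflow-specific).
workflow["6"]["inputs"]["text"] = "a lighthouse at dusk, soft fog, volumetric light"

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # response includes a prompt_id for the queued job
```

Queuing alone already replaces the button-clicking described above; the finished images land in ComfyUI's normal output folder.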

Automatic1111 felt to me like a good, simple first pass at getting a prompt to an image, with some level of extensibility. I think that it (and maybe Forge, haven't poked at that) may be a better introduction to local AI image generation, because the barrier to entry is lower. But ComfyUI feels a lot more like a serious image-manipulation program, something that you'd use to construct elaborate projects.

EDIT: Not exactly what you asked, but since you say that you're trying to come up to speed again, I'd mention [email protected], which this community does not have in the sidebar. I haven't been very active there recently, but you can at least see what sorts of images the small community of users on the Threadiverse are generating, though it's not specific to local generation. A maybe bigger-picture view would be to look at new stuff on civitai.com. I originally used that to see what prompts and plugins and such were used to generate images that I thought were impressive. By default, ComfyUI saves the entire JSON workflow used to generate an image in the image metadata, and ComfyUI will recreate a workflow from that metadata (IIRC it can also auto-download missing nodes) if you drop an image on there. I believe that you can just grab an image that you're impressed with on civitai.com and start working from the point the artist was at.
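
As a rough sketch of what that embedded metadata looks like from the outside: for PNGs saved by ComfyUI's default SaveImage node, the graph is stored in text chunks that Pillow exposes, so you can peek at it from a script. The chunk names ("workflow", "prompt") and the example file name are assumptions based on current default behaviour, and other formats or stripped uploads won't have them:

```python
# Sketch: extract the embedded ComfyUI workflow from a generated PNG.
# Assumes the image was saved by ComfyUI's default SaveImage node, which
# writes the graph into PNG text chunks named "workflow" and "prompt".
import json
from PIL import Image

img = Image.open("ComfyUI_00001_.png")
workflow_json = img.info.get("workflow")   # full editor graph
prompt_json = img.info.get("prompt")       # graph as actually executed

if workflow_json:
    workflow = json.loads(workflow_json)
    print(f"{len(workflow.get('nodes', []))} nodes in the embedded workflow")
else:
    print("No embedded workflow found (stripped metadata or non-ComfyUI image)")
```

Dropping the same file onto the ComfyUI canvas does the equivalent import for you.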

[–] [email protected] 4 points 1 day ago

I like SwarmUI. It's an easy-to-use interface for generation, but it's built on top of ComfyUI. The easy interface can handle most common scenarios, but if you need a really complex workflow, you just click over to another tab and you have direct access to the actual ComfyUI, to build up whatever workflow you want.

[–] [email protected] 1 points 1 day ago

I also moved over to ComfyUI from Automatic1111, for the most part. A1111 is very stable and straightforward, while Comfy is fast-moving but can be a bit challenging to get working sometimes.

For newer video gen techniques, there are a couple of workflows I managed to get working offline in Comfy; these examples helped enormously:

https://comfyanonymous.github.io/ComfyUI_examples/wan/

https://comfyanonymous.github.io/ComfyUI_examples/ltxv/

I still go back to A1111 for some things, and I see that the dev branch on GitHub is still somewhat active, so I'll keep an eye on it. Nice to have choices.

[–] [email protected] 1 points 1 day ago (1 children)

I'm using InvokeAI now. Still on SDXL based models. I've been meaning to try Flux.

[–] [email protected] 2 points 1 day ago* (last edited 1 day ago) (1 children)

I’ve been meaning to try Flux.

My own main irritation with Flux is that it's more limited in terms of generating pornographic material, which is one thing that I'd like to be able to do. Pony Diffusion has been trained on danbooru tags, so Pony-based models can recognize prompt terms like these; there's a vast library of pornographic material tagged with them (including some pretty exotic tags), and that knowledge has been trained into Pony models:

https://danbooru.donmai.us/wiki_pages/tag_groups

There are Flux-based derived models that have been trained on pornographic material, but they don't really have the scope of knowledge that Pony models do.

If one isn't generating pornography, that's not really a concern, though.

Flux also doesn't use negative prompts, which is something to get used to.

It doesn't have numeric prompt term weighting (though as best I can tell, some adjectives, like "very", have a limited, somewhat-similar effect).

However, Flux can do some stuff that left me kinda slack-jawed the first time I saw it, like sticking objects in scenes that are casting soft shadows or have faint reflections. I still don't know how all of this happens internally; I assume that there has to be at least some limited degree of computer vision pre-processing on the training corpus to detect light sources, at a bare minimum. Like, here's an image I generated a while back, around when I started using Flux:

https://lemmy.today/post/18453614

Like, when that first popped out, I was just sitting there staring at it, trying to figure out how the hell the software was able to do that: to incorporate light sources in the scene with backlighting and reflections and such. The only way I can think of to do that (and I've written at least a little computer vision software before, so I'm not completely out of the loop on this) is to pre-process your training corpus, identify light sources, and then separate their contributions from the image. And just to forestall one suggestion: no, this isn't simply me happening to generate something very close to a particular image that the thing was trained on. I've generated plenty of other images that have placed light sources that affect nearby objects.

Here's a (NSFW) image from a Tarot deck I generated, with The Devil trump containing light sources. Same thing.

Another example is an image ("Progression") that I created in Flux using only the prompt: a series of panels with a boy transforming into a girl:

https://lemmy.today/post/18460312

Unlike the Turn of the Seasons image that I generated, also mentioned in another comment here, I did not explicitly specify the content of each panel. I know how to accomplish a somewhat-similar effect with a Stable Diffusion model and plugins, where basically you divide the image into regions and have prompt weighting that is procedurally altered in each frame, and I assume that Flux must somehow be doing something akin to that internally... but Flux figured out how to do all this from a simple natural-language description in the prompt alone, which left me pretty boggled.

[–] [email protected] 2 points 7 hours ago (1 children)

Nice. I've heard it's better with fingers, too.

[–] [email protected] 1 points 6 hours ago* (last edited 6 hours ago)

Yes, though I've seen it also make errors.

The really bad days in my experience were Stable Diffusion 1.5. I mean, at that point, trying to get anything reasonable finger-wise was just horrendous.

After hitting Stable Diffusion XL, I might have to try a couple goes or inpaint or something, but I could usually get something reasonable. Maybe cut out some prompt terms, or see what nonessential prompt terms I could reduce weighting on to give SD more freedom.

[–] [email protected] 4 points 2 days ago* (last edited 1 day ago)

I have the same issue.

I was running Automatic1111 a year ago and then stopped using it regularly. I just picked it back up and have been having some fun with Krita's AI plugin, since inpainting integrated into an image editor is really nice, but of course that isn't a serious single-tool replacement.

I'll probably move to Comfy so I can start playing with video generation, but I'm just putting off learning it.

[–] [email protected] 3 points 1 day ago

ComfyUI all the way. Pixaroma has great tutorials on getting started. After that, you can literally customize everything to what you want.