Persistent Issues with SDXL + IP-Adapter Stability Across RunPod AI Templates (ComfyUI & A1111)
Hey @RunPod Support / Community,
I'm running into persistent critical issues with the hearmeman/comfyui-wanvideo:v9 template when trying to use SDXL + IP-Adapter (specifically with h94 IP-Adapter weights and the LAION CLIP-ViT-H-14 encoder).
Despite correct model placement and extensive troubleshooting, enabling IP-Adapter consistently leads to highly unstable/disfigured outputs or backend crashes, even with simple prompts and base SDXL 1.0 (which works fine without ControlNet/IP-Adapter). The "Resampler size mismatch" error has been resolved by using the correct ~2.5GB ViT-H encoder.
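For anyone who wants to rule out the wrong encoder file, a rough size check could look like the sketch below. It relies only on the "~2.5GB" figure mentioned above; the 10% tolerance and the function names are assumptions made for illustration, and a matching size does not prove the file is the right encoder.

```python
import os

# Assumption: the post only says "~2.5GB"; the tolerance is an arbitrary choice.
EXPECTED_BYTES = 2.5e9
TOLERANCE = 0.10


def looks_like_vit_h_encoder(size_bytes: int) -> bool:
    """Return True if the size is within TOLERANCE of roughly 2.5 GB."""
    return abs(size_bytes - EXPECTED_BYTES) <= TOLERANCE * EXPECTED_BYTES


def check_encoder_file(path: str) -> bool:
    """Return True if `path` exists and its size looks like the ViT-H encoder."""
    return os.path.isfile(path) and looks_like_vit_h_encoder(os.path.getsize(path))
```

Calling `check_encoder_file(...)` with the path where the encoder was placed in the pod (the path depends on the template) gives a quick yes/no before digging into the workflow itself.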
Is anyone else facing issues with stable SDXL + IP-Adapter character consistency? I'm trying to get basic Text-to-Image, Image-to-Image, and eventually Image-to-Video workflows functional.
I'm currently unable to get a usable character output. Any verified workflow examples (JSONs), specific ComfyUI node configurations for this template, or insights into known incompatibilities would be hugely appreciated. I've been trying to resolve this since last Friday.
Thanks!

38 Replies
So you're having a problem with sdxl + ip adapter, it doesn't generate the images you'd hoped for?
does ip adapter actually work for fictional characters like that? non-realism style
Hey, exactly.
Could you take a look at my workflow in that image to check for anything seemingly wrong? Any help and insights would be greatly appreciated.
I suppose so, but it has a great degree of realism though, not photorealistic type but surely realistic.
ill try to make some tonight, what resources have you explored to fix this / get any workflows?
try youtube?
if you dont mind, please share your source image maybe later i can try with it
can i share your screenshot in comfy's discord, maybe they can figure out whats wrong too
I am a beginner. Ever since I started trying to set up workflows for realistic videos and images using common templates such as a1111, forge and comfyui, I have never succeeded at generating videos with a consistent character, and only managed once to get consistent character images with a1111. If you manage to get these workflows working, your guidance with instructions would be appreciated. I have resorted to some videos and the latest AIs (gemini 2.5 pro and gpt 0.3 and 0.4).
Sure

tried the image on the right as input and got that


so its working with photorealistic?
it is not working with any realistic images.
does it work with semi-realistic like that but if the face were bigger?
seems to be working here no?
no, it is not as realistic as I requested, there are some eye issues and it duplicated the character, which I had not requested either.
here are more examples with that input, I had never inputted a prompt soliciting multiple faces.


These ones with the blonde women were from the official a1111 template. I downloaded a well-known checkpoint and I believe I followed all the steps to configure the UI correctly.
I tested extensively with different characters, poses and angles and it did not work.
hearmeman is in this server, maybe he can help? @Hearmeman
If anyone can provide me with step-by-step guidance (pod's terminal commands included) for configuring a template that was proven to work recently, please send it to me. The goal is realistic AI videos and images with a consistent character.
this one works for me (sdxl base)

quite good

that is the model I used.
Can you?
i installed comfyui_ipadapter_plus custom nodes
@yumb
Use my SDXL template
It comes with all the pre requisites for IP adapter.
You’re using a WAN template
Can you send me the link?
So, how do I open this workflow on the template? I just got started last week with this 😆 , is it through the left menu sidebar?
drag in the file you download to your comfyui
Thanks, will have a look.
if you can't do it, use the menus

and thats for img2img only right? Would you mind sending me two for text2img and img2vid?
text2img?
what do you mean, ipadapter always needs at least 1 image as an input, doesn't it?
you can find workflows in like youtube, or other workflow sharing sites
yes, but im also looking for text2img generation, I dont think thats related to ipadapter.
its text2img what i've used
i dont understand, how would that work with ip adapter??
I am not saying it is going to work with it, it is gonna be a separate workflow just for text2img.
I want three workflows, text2img img2img and img2vid
oh, just remove the ip adapter related nodes then
im looking for these
whats preventing you from doing it yourself?
yeaa just try
remove nodes by right-clicking and pressing delete
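The "just remove the ip adapter related nodes" advice can also be done outside the UI on an exported workflow. The following is only a sketch under several assumptions: the workflow was exported in ComfyUI's API format (a dict of node id to `class_type` and `inputs`), IP-Adapter nodes can be recognised by "ipadapter" in their `class_type`, and each of them passes the model through an input named `model`. The function name is made up for illustration.

```python
import copy


def strip_ipadapter_nodes(workflow: dict) -> dict:
    """Return a copy of an API-format workflow with IP-Adapter nodes bypassed."""
    # Assumption: IP-Adapter nodes have "ipadapter" somewhere in class_type.
    removed = {
        nid: node for nid, node in workflow.items()
        if "ipadapter" in node.get("class_type", "").lower()
    }
    kept = {
        nid: copy.deepcopy(node)
        for nid, node in workflow.items() if nid not in removed
    }

    def is_link_to_removed(value) -> bool:
        # Links are [node_id, output_index] pairs in the API format.
        return (isinstance(value, list) and len(value) == 2
                and isinstance(value[0], str) and value[0] in removed)

    def resolve(link):
        # Follow each removed node's own "model" input back to a kept node.
        seen = set()
        while is_link_to_removed(link) and link[0] not in seen:
            seen.add(link[0])
            link = removed[link[0]].get("inputs", {}).get("model")
        return None if is_link_to_removed(link) else link

    for node in kept.values():
        inputs = node.get("inputs", {})
        for name, value in list(inputs.items()):
            if is_link_to_removed(value):
                # Simplification: every link into a removed node is treated
                # as a model link and rewired to the upstream model.
                upstream = resolve(value)
                if upstream is None:
                    del inputs[name]
                else:
                    inputs[name] = upstream
    return kept
```

Nodes that only fed the IP-Adapter, such as the reference image loader, are left in place by this sketch, so it may still be worth opening the result in ComfyUI and tidying up by hand.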
I tried, but as you could see, did not have a lot of success.
img2vid workflows i think need other models (to convert to video)
figure out what you want to use then maybe i can help (not sure)
So I just dropped in the file you uploaded here, uploaded my image, changed the positive prompt and hit enter. I did get a great image, very similar to the magician one you sent, but not as realistic as I was hoping for, even though I am using a realistic checkpoint (realistic5 from civitai). By the way, did you notice anything wrong in the workflow screenshot I uploaded to this discussion?
If you did not check, could you please?
I typed another prompt, "cute woman running", uploaded the blonde girl image, and instead I got a similar image, no girl running.

maybe your prompt?
try other ipadapter models
i think ipadapter should match the style of the source image, so not sure
i'm not quite an expert at this yet, so try to find resources online