Runpod4mo ago
Simon

H100 Replicate VS RunPod

Hi! When I use a Flux model on Replicate, generating 4 images takes about 30 seconds at $0.001525 per second on an H100 (about $0.046 total). With RunPod, generating the same 4 images takes 60 seconds and costs a bit more. How can I achieve the same processing time and cost on RunPod as I do on Replicate? I prefer RunPod because I have more control over the workflow. Thanks!
45 Replies
Unknown User
Unknown User4mo ago
Message Not Public
Simon
SimonOP4mo ago
Actually, the workflow I created on RunPod is the same as on Replicate, but on RunPod it's very inconsistent: it can take anywhere from 45 to 60 seconds, with a per-second cost that's higher than on Replicate. But nothing is different; I'm just using a custom LoRA + Flux.dev. I'm going to try an L40S to see if that helps, thanks.
Unknown User
Unknown User4mo ago
Message Not Public
Simon
SimonOP4mo ago
I'm using https://github.com/runpod-workers/worker-comfyui. The model is already in the container; only the LoRAs are in network storage.
GitHub
GitHub - runpod-workers/worker-comfyui: ComfyUI as a serverless API...
ComfyUI as a serverless API on RunPod. Contribute to runpod-workers/worker-comfyui development by creating an account on GitHub.
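For reference, hitting a worker-comfyui endpoint is just a POST to the RunPod serverless API with the exported workflow JSON. A minimal sketch, assuming a hypothetical endpoint ID, a RUNPOD_API_KEY env var, and a workflow whose LoraLoader node points at a file on the network volume:

```python
# Minimal sketch of calling a worker-comfyui serverless endpoint.
# Assumptions (not from this thread): ENDPOINT_ID is your endpoint's ID,
# RUNPOD_API_KEY is set, and workflow_api.json is a ComfyUI workflow exported
# in API format whose LoraLoader node references a file the worker can see.
import os
import json
import requests

ENDPOINT_ID = "your-endpoint-id"  # hypothetical placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]

with open("workflow_api.json") as f:
    workflow = json.load(f)

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"workflow": workflow}},
    timeout=300,
)
resp.raise_for_status()
print(resp.json())
```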
flash-singh
flash-singh4mo ago
we plan to have public endpoints for flux very soon, will support flux dev and schnell. current times we get for 4090s is about 9 seconds per image right now. it's planned for end of this month, cost is about 50% of replicate
Simon
SimonOP4mo ago
but for flux.dev with a custom lora ?
Unknown User
Unknown User4mo ago
Message Not Public
flash-singh
flash-singh4mo ago
we don't plan to support custom loras yet, that's a maybe for the future
Unknown User
Unknown User4mo ago
Message Not Public
flash-singh
flash-singh4mo ago
the fastest is using H100s which is about 4 seconds but then cost goes up
Unknown User
Unknown User4mo ago
Message Not Public
flash-singh
flash-singh4mo ago
yes, we plan to eventually have pro, won't be end of this month. for now it's flux dev and schnell, others will come in july most likely; july is also when we plan to do video models
Unknown User
Unknown User4mo ago
Message Not Public
flash-singh
flash-singh4mo ago
if you had the choice of faster image gen for higher cost or 2x slower image gen with half the cost, which would you pick?
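Worth noting: at a fixed per-second price, per-image cost is just rate times seconds, so 2x slower at half the rate comes out to the same cost per image; the tradeoff is purely latency. A quick check, using the thread's numbers as illustrative inputs:

```python
# Per-image cost is (price per second) x (seconds per image), so halving the
# rate while doubling the time leaves cost unchanged; only latency differs.
# Numbers below are illustrative: $0.001525/s and ~4 s/image come from this
# thread, not from any price sheet.
def cost_per_image(price_per_sec: float, secs_per_image: float) -> float:
    return price_per_sec * secs_per_image

fast = cost_per_image(0.001525, 4)       # faster GPU at the full rate
slow = cost_per_image(0.001525 / 2, 8)   # half the rate, twice as slow
print(f"fast: ${fast:.6f}/image, slow: ${slow:.6f}/image")  # identical
```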
Unknown User
Unknown User4mo ago
Message Not Public
flash-singh
flash-singh4mo ago
sure thanks
Unknown User
Unknown User4mo ago
Message Not Public
flash-singh
flash-singh4mo ago
@Jason which flux models you plan to use?
Unknown User
Unknown User4mo ago
Message Not Public
flash-singh
flash-singh4mo ago
how fast is 1 image on replicate with flux dev?
Unknown User
Unknown User4mo ago
Message Not Public
flash-singh
flash-singh4mo ago
from what @Simon mentioned, should be about 7-8 seconds
Unknown User
Unknown User4mo ago
Message Not Public
flash-singh
flash-singh4mo ago
that's likely using go_fast, which uses flux schnell
We provide a go_fast flag within the API which toggles a version of flux-schnell optimized for inference. Currently this version is a compiled fp8 quantization with an optimized attention kernel. We’ll update the model and this documentation as we develop further enhancements.
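For comparison, toggling that flag is a one-line change with the replicate Python client; a minimal sketch (model name and the go_fast input as documented for flux-schnell, prompt is a placeholder):

```python
# Minimal sketch of toggling go_fast on Replicate's flux-schnell model.
# Assumes `pip install replicate` and a REPLICATE_API_TOKEN env var.
import replicate

output = replicate.run(
    "black-forest-labs/flux-schnell",
    input={
        "prompt": "a photo of a red fox in the snow",  # placeholder prompt
        "go_fast": True,  # the fp8-quantized, inference-optimized variant per the quote above
    },
)
print(output)
```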
Unknown User
Unknown User4mo ago
Message Not Public
flash-singh
flash-singh4mo ago
if they say it, otherwise no clue. so far the best times for flux dev on H100 are 3-4s; if someone can do it faster on an A100, i'll be skeptical
Unknown User
Unknown User4mo ago
Message Not Public
flash-singh
flash-singh4mo ago
schnell on H100 can do 1s for sure, dev is slower
Unknown User
Unknown User4mo ago
Message Not Public
flash-singh
flash-singh4mo ago
just tried fal, it's 2.89 seconds at 1024x1024, 28 steps
Unknown User
Unknown User4mo ago
Message Not Public
Simon
SimonOP4mo ago
On Replicate I train a model with ostris/flux-dev-trainer, then 1 image is like 9 seconds, 4 -> 30 seconds
Unknown User
Unknown User4mo ago
Message Not Public
Simon
SimonOP4mo ago
30 s * $0.001525/s ≈ $0.046. What I see in Replicate is:
Downloaded weights in 0.64s (very fast)
Loaded LoRAs in 2.02s (also fast, I think they have a cache)
and then 100%|██████████| 28/28 [00:08<00:00, 3.40it/s] with an H100
Unknown User
Unknown User4mo ago
Message Not Public
Simon
SimonOP4mo ago
yes it's very close
flash-singh
flash-singh4mo ago
how big is the lora?
Simon
SimonOP4mo ago
680mb
flash-singh
flash-singh4mo ago
will have to explore loras in future, is that for flux dev or schnell?
Simon
SimonOP4mo ago
Flux Dev. I'm spending $2,000 a month on Replicate; if I can get close to the same response time and price per second, I'll switch to RunPod.
flash-singh
flash-singh4mo ago
are you passing loras using s3 bucket url or some other way?
Unknown User
Unknown User4mo ago
Message Not Public
Simon
SimonOP4mo ago
yes, Replicate stores the LoRA directly on their servers. I'm using https://replicate.com/ostris/flux-dev-lora-trainer/train, and what I read is that they use "fast booting":
Fast booting fine-tunes

Sometimes, we’re able to optimize how a trained model is run so it boots fast. This works by using a common, shared pool of hardware running a base model. In these cases, we only ever charge you for the time the model is active and processing your requests, regardless of whether or not it’s public or private.

Fast booting fine-tunes are labeled as such in the model’s version list. You can also see which versions support the creation of fast booting models when training.
ostris/flux-dev-lora-trainer – Replicate
Fine-tune FLUX.1-dev using ai-toolkit
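The pattern behind that quote is roughly: the base model stays resident on a shared pool of workers, and only the small LoRA is fetched and applied per request, ideally cached so repeats skip the download. A minimal sketch of that caching step, assuming LoRAs arrive as URLs (e.g. presigned S3 links) and a hypothetical loras directory path inside the worker:

```python
# Minimal sketch of the fast-booting idea: the base model stays loaded on a
# shared worker; per request we only fetch the LoRA (e.g. via a presigned S3
# URL) and cache it locally so repeat requests skip the download entirely.
import hashlib
import os
import urllib.request

LORA_DIR = "/comfyui/models/loras"  # assumed path, adjust to your worker image

def ensure_lora(url: str) -> str:
    """Download the LoRA once; return the cached local filename."""
    os.makedirs(LORA_DIR, exist_ok=True)
    name = hashlib.sha256(url.encode()).hexdigest()[:16] + ".safetensors"
    path = os.path.join(LORA_DIR, name)
    if not os.path.exists(path):  # cache hit -> no download, fast "boot"
        urllib.request.urlretrieve(url, path)
    return name  # the workflow's LoraLoader node then references this name
```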
Jassim
Jassim2mo ago
is it not slower putting models in the docker image? i'm using network storage since i was told it's faster that way
Unknown User
Unknown User2mo ago
Message Not Public
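On Jassim's question: files baked into the image sit on the worker's local disk, while a network volume is remote storage, so raw read throughput often favors the image. The simplest way to settle it is to time both paths on your own worker; a rough sketch (both paths are assumptions, adjust to your setup, and note that a warm page cache will skew the numbers):

```python
# Quick-and-dirty read-throughput check: time streaming the same checkpoint
# from the container filesystem vs the network volume. Paths are assumptions;
# run on a cold worker for a fair comparison.
import time

def read_speed(path: str, chunk: int = 1 << 20) -> float:
    start, total = time.time(), 0
    with open(path, "rb") as f:
        while data := f.read(chunk):
            total += len(data)
    return total / (time.time() - start) / 1e6  # MB/s

for p in ("/comfyui/models/checkpoints/flux1-dev.safetensors",          # baked into image
          "/runpod-volume/models/checkpoints/flux1-dev.safetensors"):   # network volume
    print(p, f"{read_speed(p):.0f} MB/s")
```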
