RunPod•4mo ago
ashleyk

24GB PRO availability in RO

I switched from the 24GB tier in RO to 24GB PRO to benefit from the higher availability of the 4090s in RO, but most of my workers are becoming throttled again.
22 Replies
flash-singh
flash-singh•4mo ago
I would mix them, 4090s get relatively high spikes
ashleyk
ashleyk•4mo ago
I've never seen that priority thing actually work, though. Even if all my workers become throttled, it doesn't initialize the 2nd choice; they just stay throttled.
flash-singh
flash-singh•4mo ago
What it will do is pick a GPU that is available and split between the two based on availability.
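To make "split between the two based on availability" concrete, here is a minimal sketch of a proportional split. It is only an illustration of the idea, not RunPod's actual scheduler: the function name `split_workers`, the `free_slots` input, and the largest-remainder rounding are all assumptions made for the example.

```python
def split_workers(max_workers: int, free_slots: dict[str, int]) -> dict[str, int]:
    """Hypothetical illustration: divide workers across GPU tiers in
    proportion to how many free slots each tier currently reports."""
    total = sum(free_slots.values())
    if total == 0:
        # Nothing is free anywhere, so every worker would stay throttled.
        return {tier: 0 for tier in free_slots}

    # Exact (fractional) share per tier, then round down.
    exact = {tier: max_workers * free / total for tier, free in free_slots.items()}
    alloc = {tier: int(share) for tier, share in exact.items()}

    # Hand out the workers lost to rounding, largest remainder first.
    leftover = max_workers - sum(alloc.values())
    by_remainder = sorted(free_slots, key=lambda t: exact[t] - alloc[t], reverse=True)
    for tier in by_remainder[:leftover]:
        alloc[tier] += 1
    return alloc


if __name__ == "__main__":
    print(split_workers(5, {"24GB PRO": 1, "24GB": 3}))
```

Under this assumed model, an endpoint restricted to a single tier has nowhere to go when that tier is full, which matches the advice to mix tiers.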
justin
justin•4mo ago
Just wondering: I'm trying to make a new endpoint and instantly all 5 workers are throttled before initialization, so I had to add new endpoints and hope I can get some unthrottled ones just to initialize. But why would it state high availability if I immediately get throttled on initialization? What does high availability mean then?
JM
JM•4mo ago
@justin how long are you waiting after setting up an endpoint? Initial setup does take a decent amount of time. Also, are you using 10+ max workers?
justin
justin•4mo ago
Been a weird situation. I've been launching endpoints, but when one hits idle and I send a request, it just starts downloading again. So I've been deleting it, thinking maybe I need to wait for all my docker push iterations in the background to settle down; maybe conflicting hashes are causing redownloads. https://discord.com/channels/912829806415085598/1208257003131113502
Usually I wait about 10-20 mins in the background right now and see if it works. I'm trying to solve a bug where my worker works on a GPU pod but something about it crashes on serverless.
And no, I'm just at 3 max workers, so it spins up 5 potential workers. I don't want to spin up 10+ max workers, because I don't have enough limits to waste workers like that.
But yeah, to answer this: usually about 10-20 mins, then I see if it switched from an initializing state to idle.
JM
JM•4mo ago
@justin Use 10, I give you full permission 😊
justin
justin•4mo ago
can i get an upgrade on worker limits at some point haha, but ok
JM
JM•4mo ago
Personally, I like setting max workers to 10, with 1-2 active workers, for the initial setup
justin
justin•4mo ago
I see. Why does that change things? Is it just to capture some good GPUs to initialize?
JM
JM•4mo ago
Then send some requests, check whether they were processed, and if so, remove the active workers
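The "send some requests, check if those processed" step can be scripted. The sketch below is hedged: it assumes the serverless REST routes `POST /v2/{endpoint_id}/run` and `GET /v2/{endpoint_id}/status/{job_id}` with a Bearer API key and the status strings shown, so verify those against the current RunPod docs. `smoke_test`, `call_api`, and the payload shape are hypothetical names for this example, and the max/active worker counts themselves are still set in the endpoint settings, not here.

```python
import json
import time
import urllib.request

# Assumed base URL; confirm against the current RunPod documentation.
BASE_URL = "https://api.runpod.ai/v2"


def call_api(method, url, api_key, body=None):
    """Send one JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method, headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def smoke_test(endpoint_id, api_key, payload, n_requests=3,
               poll_seconds=5.0, max_polls=60, call=call_api, sleep=time.sleep):
    """Submit a few jobs; return True only if every one reaches COMPLETED."""
    run_url = f"{BASE_URL}/{endpoint_id}/run"
    pending = {call("POST", run_url, api_key, {"input": payload})["id"]
               for _ in range(n_requests)}

    for _ in range(max_polls):
        for job_id in list(pending):
            status_url = f"{BASE_URL}/{endpoint_id}/status/{job_id}"
            status = call("GET", status_url, api_key)["status"]
            if status == "COMPLETED":
                pending.discard(job_id)
            elif status in {"FAILED", "CANCELLED", "TIMED_OUT"}:
                return False  # assumed terminal failure states
        if not pending:
            return True
        sleep(poll_seconds)
    return False  # gave up waiting, e.g. workers stayed throttled
```

If `smoke_test(...)` returns `True`, the endpoint processed the test jobs and the active workers can be dropped back to zero; `False` means a job failed or the jobs never finished within the polling window.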
justin
justin•4mo ago
Ah, got it. Good to know, huh
JM
JM•4mo ago
Simply my own opinion of an efficient way of checking a new endpoint. I am far from being an expert though, don't get me wrong haha. What's your endpoint ID? I can check it out
justin
justin•4mo ago
Ah, it finally works. Nah, it's all good xD I just ended up widening the GPU options to not just be 4090s. I guess before I only had it on 24 GB PRO / 4090 because it said high availability and I didn't want to run into an out-of-memory error, but what fixed it just now for me was extending the options
JM
JM•4mo ago
Well, even just 24gb pro should work
justin
justin•4mo ago
interesting
JM
JM•4mo ago
But use more max workers, trust me!
justin
justin•4mo ago
Ok haha, I guess I'm just running out of workers as I deploy more 😭
JM
JM•4mo ago
If you activate flash boot, it doesn't work very well with a small number of max workers
justin
justin•4mo ago
but good to know
JM
JM•4mo ago
It gets exponentially better with more workers. Give me your ID, I will give you more 😂
flash-singh
flash-singh•4mo ago
Our 4090s in EU-RO come in 2x or 3x servers; they fill up easily and cause throttling. 8x servers are better, but sadly 8x 4090 servers are not easy to come by