RunPod · 3mo ago
ACiDGRiM

Serverless capability check

I want to add RunPod to a tier of load-balanced LLM models behind an app like openrouter.ai, but the routing decision will happen in our own infrastructure. When I invoke a serverless instance with my app and a task completes, how am I billed for idle time if the container unloads the model from GPU memory? In other words, I want to reduce costs and improve performance by only loading the model again after an idle timeout, paying only for the small app footprint in storage/memory.
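(For reference, a minimal sketch of the lazy-load pattern being described, assuming the runpod Python SDK's serverless.start entry point; load_model here is a hypothetical stand-in for real weight loading:)

```python
import runpod

# Cached at module scope so a warm worker reuses the model across requests;
# it is only reloaded after a cold start, e.g. once the idle timeout has
# expired and the worker has been torn down.
model = None

def load_model():
    # Hypothetical stand-in for loading real weights onto the GPU
    # (e.g. via transformers or vLLM); returns a dummy callable here.
    return lambda prompt: f"echo: {prompt}"

def handler(job):
    global model
    if model is None:
        model = load_model()  # paid once per cold start, not per request
    prompt = job["input"].get("prompt", "")
    return {"output": model(prompt)}

runpod.serverless.start({"handler": handler})
```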
4 Replies
Solution
ashleyk · 3mo ago
You are charged for the entire time the container is running, including cold start time, execution time, and idle timeout.
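(A rough worked example of that billing model, with purely hypothetical numbers; actual RunPod rates depend on the GPU type you select:)

```python
# Hypothetical per-second rate and timings, purely for illustration.
per_second_rate = 0.00044   # $/s for the chosen GPU (assumed value)
cold_start_s = 20           # worker start + model load
execution_s = 5             # time spent serving the request
idle_timeout_s = 5          # configured idle timeout before scale-down

billable_s = cold_start_s + execution_s + idle_timeout_s
print(f"billed for {billable_s}s, roughly ${billable_s * per_second_rate:.4f}")
```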
ACiDGRiM · 3mo ago
I thought so. Do the containers have the Docker capabilities needed to create a WireGuard interface?
ashleyk · 3mo ago
You can't access the underlying Docker stuff on the host machine, if that's what you're asking.
ACiDGRiM · 3mo ago
I don't mean the Docker socket. I mean I want to create a VPN tunnel to my AWS tenant, rather than dealing with PKI in the container.
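(If it helps, a Linux-only sketch for checking this from inside a running worker: read CapEff from /proc/self/status and test the CAP_NET_ADMIN bit, which the kernel requires before it will let a process create a WireGuard interface. The bit index 12 comes from linux/capability.h.)

```python
# Check whether CAP_NET_ADMIN is in this process's effective capability set.
CAP_NET_ADMIN = 12  # capability number from linux/capability.h

cap_eff = 0
with open("/proc/self/status") as f:
    for line in f:
        if line.startswith("CapEff:"):
            cap_eff = int(line.split()[1], 16)
            break

if cap_eff & (1 << CAP_NET_ADMIN):
    print("CAP_NET_ADMIN present: a kernel WireGuard interface may be possible")
else:
    print("CAP_NET_ADMIN missing: no kernel WireGuard interface from inside the container")
```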