Advice needed: How to synchronize the video and audio segments duration in adaptive streaming

I still have a problem understanding how to synchronize the segment (and partial segment) durations when using the membrane_http_adaptive_stream library. As per the HLS guidelines, for a smooth playback experience on Apple devices each segment in a playlist should have the same duration. What I'm struggling to understand is how to achieve this behaviour. My previous confusion was resolved by sending keyframes at the right interval (GOP size). But now I have a different problem: the video track follows the provided segment_duration and partial_segment_duration parameters perfectly, but my audio track doesn't! Here is my configuration.

For audio:
|> via_in(Pad.ref(:input, :audio),
  options: [
    encoding: :AAC,
    segment_duration: segment_duration(),
    partial_segment_duration: fragment_duration()
  ]
)
For video:
build_sink_input_opts(name, nil) ++
  [{:partial_segment_duration, fragment_duration()}]
end

defp build_sink_input_opts(name, _) do
  [
    encoding: :H264,
    track_name: name,
    segment_duration: segment_duration(),
    max_framerate: 24
  ]
end
And the corresponding values:
defp segment_duration do
  Membrane.Time.milliseconds(2000)
end

defp fragment_duration do
  Membrane.Time.milliseconds(500)
end
I'll attach the playlists in the next message. What could be the issue? Thank you in advance.
22 Replies
varsill · 4mo ago
Hi @odingrail! That looks quite surprising; a mismatch of 325 ms between the desired and the actual segment length is quite large. What is the sampling frequency of your AAC stream? (It affects how much "time" each AAC packet represents, as each packet encodes 1024 samples.)
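As a quick back-of-the-envelope illustration of that point (an Elixir sketch; the two rates are just the common sampling frequencies, not taken from the stream): one AAC frame carries 1024 samples, so its wall-clock duration is samples_per_frame / sample_rate, and every audio partial or segment can only be a whole multiple of that value.

# Duration of one AAC frame at common sampling rates:
# ~23.22 ms at 44.1 kHz and ~21.33 ms at 48 kHz.
samples_per_frame = 1024

for rate <- [44_100, 48_000] do
  frame_ms = samples_per_frame / rate * 1000
  IO.puts("#{rate} Hz -> ~#{Float.round(frame_ms, 2)} ms per AAC frame")
end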
odingrail (OP) · 4mo ago
Hi, @varsill! I was experimenting with this value, setting it to 960 and 1024; the results are pretty much the same. For context, this is an RTSP stream. I generate it with the following command:
ffmpeg -re -stream_loop -1 -i ~/example/bunny_movie.mp4 \
-c:v libx264 \
-preset ultrafast \
-tune zerolatency \
-r 25 \
-vsync cfr \
-x264-params "scenecut=0:open_gop=0:keyint=48:min-keyint=48" \
-force_key_frames "expr:gte(t,n_forced*2)" \
-g 48 \
-b:v 2000k \
-maxrate 2000k \
-bufsize 4000k \
-c:a aac \
-b:a 128k \
-ar 48000 \
-ac 2 \
-af "aresample=48000:async=1:min_hard_comp=0.1:first_pts=0,asetpts=PTS-STARTPTS" \
-fflags +genpts \
-audio_preload 0.5 \
-fflags +genpts+igndts \
-avoid_negative_ts make_zero \
-f rtsp \
-rtsp_transport udp \
-timeout 5000000 \
'rtsp://localhost:8554/app/65649836-a978-481c-aca6-47928e6e7c02'
varsill · 4mo ago
I see, and could you show the part of the pipeline responsible for processing audio? (listing all the elements/bins from the RTSP source to the HLS SinkBin should be enough 😉 )
odingrail (OP) · 4mo ago
The pipeline looks something like this; I hope I'm not missing anything.
odingrail (OP) · 4mo ago
get_child(source_element_name)
|> via_out(Pad.ref(:output, ssrc),
  options: [encoding: :AAC, depayloader: %Membrane.RTP.AAC.Depayloader{mode: :hbr}]
)
|> child(:audio_depayloader_tee, Membrane.Tee.Parallel)
varsill · 4mo ago
And have you considered using muxed_av mode? (You can configure it with the hls_mode option of the SinkBin: https://hexdocs.pm/membrane_http_adaptive_stream_plugin/Membrane.HTTPAdaptiveStream.SinkBin.html.) In this mode both audio and video samples are put in the same .m4s segments, so you won't need to worry about their synchronization.
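For reference, switching the SinkBin to muxed A/V might look roughly like the sketch below; the hls_mode option comes straight from the linked docs, while the :hls_sink_bin name, the "hls_output" directory, and the other fields are placeholders to be swapped for the values in the existing pipeline.

# Sketch only: enable muxed A/V in the HLS sink bin; keep the remaining
# SinkBin options as they are in the current setup.
child(:hls_sink_bin, %Membrane.HTTPAdaptiveStream.SinkBin{
  hls_mode: :muxed_av,
  manifest_module: Membrane.HTTPAdaptiveStream.HLS,
  storage: %Membrane.HTTPAdaptiveStream.Storages.FileStorage{directory: "hls_output"}
})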
odingrail (OP) · 4mo ago
No, I haven't considered it. But what could be the issue in this AAC case? More importantly, what are the best practices to ensure track synchronization?
varsill · 4mo ago
The problem with :separate_av tracks is that there is no mechanism implemented in the Sink (nor the SinkBin) that ensures the target duration for both tracks matches - each track is treated more or less independently. Each segment is built out of as many partials as needed for their total duration to be equal to or higher than segment_duration. At the same time, each partial is made up of some number of AAC packets (whose duration depends on the sampling frequency; for 44.1 kHz it's around 23 ms each). In case these two don't perfectly align, you might end up with a segment whose length exceeds segment_duration.

One potential solution is to choose a slightly lower segment_duration to make up for these alignment mismatches. You could try reducing the desired segment_duration to e.g. 1.8 s, and it should create a segment of 1.834 s duration. Then the TARGET-DURATION tag (whose value is the ceiling of the duration of the longest segment in the playlist) would become 2 s.

The easiest way to ensure synchronization between the tracks is to do it at the container level, so I definitely recommend trying the hls_mode: :muxed_av option, in case nothing prevents you from doing so.
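To make the arithmetic concrete, here is an illustrative Elixir sketch (not the plugin's actual code) of how whole AAC frames at 44.1 kHz can push a 500 ms partial and a 2 s segment past their targets; the exact figures depend on how the Sink rounds, but this is the kind of misalignment a lower segment_duration compensates for.

# Illustrative only: accumulate whole 1024-sample AAC frames into partials of
# >= 500 ms, and partials into a segment of >= 2000 ms, then see the overshoot.
frame_ms = 1024 / 44_100 * 1000                   # ~23.22 ms per AAC frame
frames_per_partial = ceil(500 / frame_ms)         # 22 frames
partial_ms = frames_per_partial * frame_ms        # ~510.8 ms
partials_per_segment = ceil(2000 / partial_ms)    # 4 partials
segment_ms = partials_per_segment * partial_ms    # ~2043.4 ms

IO.puts("partial ~#{Float.round(partial_ms, 1)} ms, segment ~#{Float.round(segment_ms, 1)} ms")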
odingrail (OP) · 4mo ago
Thank you for the advice, @Łukasz Kita!
varsill · 4mo ago
You are welcome! Let me know if the muxed_av mode helps in your case.
odingrail (OP) · 3mo ago
Hey, @varsill! I'm in the process of migrating to :muxed_av. I noticed an issue with multiple cmaf_muxers trying to link to the same audio pad. With :muxed_av, if I have multiple video qualities I'm forced to have N audio pads, right?
varsill · 3mo ago
Hi @odingrail! The SinkBin should take care of spawning an :audio_tee under the hood in :muxed_av mode to distribute the audio track to the muxers corresponding to the appropriate video variants. In fact, it explicitly disallows connecting multiple audio pads in :muxed_av mode: https://github.com/membraneframework/membrane_http_adaptive_stream_plugin/blob/9f0237c2325c4ae4a902e0ba5c6a6acedaf66737/lib/membrane_http_adaptive_stream/sink_bin.ex#L235. But perhaps you came across a bug - could you provide some logs with the stacktrace of the error?
odingrail (OP) · 3mo ago
Here is the stacktrace. It looks like a CMAF muxer linking issue.
odingrail (OP) · 3mo ago
We are using membrane_http_adaptive_stream_plugin 0.18.7. I think the library is trying to link the audio pad more than once. It works when I have only one video quality enabled, and it also works in :separate_av mode. The issue appears only when I have multiple video qualities. The pad mentioned in the exception is the :audio pad; I confirmed this by putting dbg into the library code.
varsill · 3mo ago
Hmm, it looks like there is indeed a bug in the SinkBin - could you try changing the membrane_http_adaptive_stream_plugin dependency in mix.exs to a bugfix branch, {:membrane_http_adaptive_stream_plugin, github: "membraneframework/membrane_http_adaptive_stream_plugin", branch: "varsill/fix_pad_refs"}, and see if it works?
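Spelled out, that override would sit in the deps list of mix.exs roughly like this (other entries elided):

defp deps do
  [
    # ...existing dependencies...
    {:membrane_http_adaptive_stream_plugin,
     github: "membraneframework/membrane_http_adaptive_stream_plugin",
     branch: "varsill/fix_pad_refs"}
  ]
end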
odingrail (OP) · 3mo ago
It is working for me now, without any changes to my pipeline! Thanks for the fix @varsill
varsill · 3mo ago
Great! I think one other thing may have been affected by the bug, so please update your dependency to the GitHub branch once again. Also, we should soon release membrane_http_adaptive_stream_plugin v0.18.8.
odingrail (OP) · 3mo ago
I'm still getting the following issues:
"messages" : [
{
"errorComment" : "The sum of the partial segment durations must match the parent segment duration",
"errorDomain" : "ValidatorErrorDomain",
"errorStatusCode" : -50066,
"errorRequirementLevel" : 1,
"errorDetail" : "Segment duration: 2.0030, Sum of partials: 2.0050"
}
"messages" : [
{
"errorComment" : "The sum of the partial segment durations must match the parent segment duration",
"errorDomain" : "ValidatorErrorDomain",
"errorStatusCode" : -50066,
"errorRequirementLevel" : 1,
"errorDetail" : "Segment duration: 2.0030, Sum of partials: 2.0050"
}
I use a 2-second segment duration and a 0.5-second partial segment duration. I'm also constantly getting segments with inconsistent durations:
muxed_segment_21_source_quality.m4s
#EXT-X-PROGRAM-DATE-TIME:2025-07-14T13:25:09.822Z
#EXTINF:2.00266665,
muxed_segment_22_source_quality.m4s
#EXT-X-PROGRAM-DATE-TIME:2025-07-14T13:25:11.754Z
#EXTINF:2.518666645,
muxed_segment_23_source_quality.m4s
#EXT-X-PROGRAM-DATE-TIME:2025-07-14T13:25:14.247Z
#EXTINF:2.518666645,
muxed_segment_24_source_quality.m4s
#EXT-X-PROGRAM-DATE-TIME:2025-07-14T13:25:17.017Z
#EXTINF:2.00266665,
muxed_segment_25_source_quality.m4s
#EXT-X-PROGRAM-DATE-TIME:2025-07-14T13:25:18.961Z
#EXTINF:2.435999979,
muxed_segment_26_source_quality.m4s
#EXT-X-PROGRAM-DATE-TIME:2025-07-14T13:25:21.359Z
#EXTINF:2.00266665,
With a 3-second segment and 1-second partial segment it's almost there, but mediastreamvalidator is still angry:

Error: The sum of the partial segment durations must match the parent segment duration
--> Detail: Segment duration: 2.9830, Sum of partials: 2.9840
--> Source: http://localhost:4000/hls-video/daa4b2bf-1b14-4aae-b6a3-eacbde51199b/source_quality.m3u8 - b4ba812f-aa3c-41f8-8292-a2c0b458f48f/muxed_segment_0_source_quality.m4s
--> Detail: Segment duration: 2.0030, Sum of partials: 2.0040
--> Source: http://localhost:4000/hls-video/daa4b2bf-1b14-4aae-b6a3-eacbde51199b/source_quality.m3u8 - b4ba812f-aa3c-41f8-8292-a2c0b458f48f/muxed_segment_1_source_quality.m4s
--> Detail: Segment duration: 2.0030, Sum of partials: 2.0040
--> Source: http://localhost:4000/hls-video/daa4b2bf-1b14-4aae-b6a3-eacbde51199b/source_quality.m3u8 - b4ba812f-aa3c-41f8-8292-a2c0b458f48f/muxed_segment_2_source_quality.m4s
--> Detail: Segment duration: 2.0030, Sum of partials: 2.0040
--> Source: http://localhost:4000/hls-video/daa4b2bf-1b14-4aae-b6a3-eacbde51199b/source_quality.m3u8 - b4ba812f-aa3c-41f8-8292-a2c0b458f48f/muxed_segment_3_source_quality.m4s
--> Detail: Segment duration: 2.9940, Sum of partials: 2.9950
--> Source: http://localhost:4000/hls-video/daa4b2bf-1b14-4aae-b6a3-eacbde51199b/source_quality.m3u8 - b4ba812f-aa3c-41f8-8292-a2c0b458f48f/muxed_segment_4_source_quality.m4s

--------------------------------------------------------------------------------
CRITICAL MUST fix issues
--------------------------------------------------------------------------------
Critical: #EXT-X-PART Partial Segment duration 1.002000 exceeds PART-TARGET
--> Source: http://localhost:4000/hls-video/2f04f52c-3040-4a05-a967-bfbe8babe5fc/source_quality.m3u8
varsill · 3mo ago
Hi! That sum-of-partials / segment-duration mismatch seems to be some kind of rounding error. Does it cause trouble with playback? Anyway, I will check that. When it comes to the inconsistent segments produced, I would say it's expected that segment durations differ - the segment_duration option specifies the "minimal segment duration of the regular segments", so if, for example, keyframes are not aligned with that segment_duration, we will end up with longer segments, and their distribution will follow the keyframe distribution. The most worrying error is the one with the partial segment duration exceeding PART-TARGET; I will try to find what's causing that.
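As a rough illustration of that "minimal duration" rule (hypothetical keyframe times, not the plugin's actual algorithm): if a segment may only close on a keyframe at or after segment_duration has elapsed, keyframes every 1.92 s combined with a 2 s minimum stretch each segment to 3.84 s.

# Illustrative only: where can segments close if they must end on a keyframe
# no earlier than segment_min_s after they started?
keyframes_s = [0.0, 1.92, 3.84, 5.76, 7.68, 9.6]   # hypothetical, e.g. keyint=48 at 25 fps
segment_min_s = 2.0

{boundaries, _last_start} =
  Enum.reduce(keyframes_s, {[], 0.0}, fn t, {acc, start} ->
    if t >= start + segment_min_s, do: {[t | acc], t}, else: {acc, start}
  end)

IO.inspect(Enum.reverse(boundaries))   # [3.84, 7.68] -> two 3.84 s segments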
odingrail (OP) · 3mo ago
The partial segment duration exceeding PART-TARGET seems to occur only with the following config: a segment duration of 2 seconds and a partial segment duration of 1 second. With 3-second segments and 1-second partial segments it seems to disappear.
varsill · 3mo ago
I see, I will try to reproduce that bug with the configuration you mentioned. However, with your stream's characteristics (I assume that your video has 24 FPS and a keyframe every 2 seconds, and each AAC frame has a duration of 1024/48000 = 0.021 s), I would go for segment_duration: 1.8s and partial_segment_duration: 0.5s.
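In terms of the helpers from the original question, that suggestion would look like the sketch below (values only; whether 1.8 s suits the final keyframe interval still needs to be checked against the actual stream):

defp segment_duration do
  # ~1.8 s: slightly below the 2 s target, so whole AAC frames/partials round
  # the closed segment up towards ~2 s instead of past it
  Membrane.Time.milliseconds(1800)
end

defp fragment_duration do
  Membrane.Time.milliseconds(500)
end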
