Advice needed: How to synchronize video and audio segment durations in adaptive streaming
I'm still having trouble understanding how to synchronize segment (and partial segment) durations when using the membrane_http_adaptive_stream library.
As per the HLS guidelines, for a smooth playback experience on Apple devices each playlist's segments need to have the same duration. What I'm struggling to understand is how to achieve this behaviour.
My previous confusion was resolved by sending keyframes at the right interval (GOP size). But now I have a different problem: the video track follows the provided `segment_duration` and `partial_segment_duration` parameters perfectly, but my audio track doesn't!
Here is my configuration:
for audio:
for video:
And the corresponding values:
I'll attach the playlists in the next message.
What could be the issue? Thank you in advance.
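For anyone following along: the keyframe fix mentioned above is plain arithmetic. The GOP size (in frames) is the frame rate times the desired keyframe spacing, and it has to come out as a whole number for keyframes to land exactly on segment boundaries. A minimal Python sketch, assuming a constant frame rate; `gop_size` is a hypothetical helper, not part of Membrane or any encoder:

```python
from fractions import Fraction

def gop_size(fps, keyframe_interval_s):
    """Frames between keyframes for a constant-frame-rate stream.

    Raises ValueError when the interval is not a whole number of frames,
    because keyframes could then not land exactly on segment boundaries.
    """
    frames = Fraction(fps) * Fraction(keyframe_interval_s)
    if frames.denominator != 1:
        raise ValueError(f"{keyframe_interval_s} s is not a whole number of frames at {fps} fps")
    return frames.numerator

print(gop_size(24, 2))      # 48
print(gop_size(30, "0.5"))  # 15
```

Durations are passed as integers or strings so `Fraction` keeps them exact; a float such as 0.1 would carry binary rounding error into the check.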
Hi @odingrail!
That's quite surprising: a mismatch of 325 ms between the desired and the actual segment length is huge. What is the sampling frequency of your AAC stream? (It affects how much time each AAC packet represents, as each packet encodes 1024 samples.)
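To put numbers on that: with 1024 samples per packet, the duration of one AAC packet is 1024 divided by the sampling frequency. A quick Python sketch (the `samples_per_frame` parameter is there only so other frame sizes can be plugged in):

```python
# One AAC packet holds a fixed number of PCM samples per channel
# (1024 as stated above), so its duration is samples / sampling rate.
def aac_frame_duration_ms(sample_rate_hz, samples_per_frame=1024):
    """Duration of a single AAC packet in milliseconds."""
    return 1000 * samples_per_frame / sample_rate_hz

for rate in (44_100, 48_000):
    print(f"{rate} Hz -> {aac_frame_duration_ms(rate):.3f} ms per packet")
# 44100 Hz -> 23.220 ms per packet
# 48000 Hz -> 21.333 ms per packet
```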
Hi @varsill! I experimented with this value, setting it to 960 and 1024, and the results are pretty much the same.
For context, this is an RTSP stream. I'm using the following command:
I see, and could you show the part of the pipeline responsible for processing audio? (listing all the elements/bins from the RTSP source to the HLS SinkBin should be enough 😉 )
The pipeline looks something like this; I hope I didn't miss anything.
And have you considered using `:muxed_av` mode (you can configure it with the `hls_mode` option of the SinkBin: https://hexdocs.pm/membrane_http_adaptive_stream_plugin/Membrane.HTTPAdaptiveStream.SinkBin.html)? In this mode both audio and video samples are put in the same .m4s segments, so you won't need to worry about their synchronization.
No, I haven't considered it. But what could be the issue in this AAC case? More importantly, what are the best practices to keep the tracks synchronized?
The problem with `:separate_av` tracks is that there is no mechanism implemented in the Sink (nor in the SinkBin) that ensures the target duration matches for both tracks: each track is treated more or less independently.
Each segment is built from as many partials as are needed for their total duration to be equal to or greater than `segment_duration`. At the same time, each partial is made up of some number of AAC packets, whose duration depends on the sampling frequency (at 44.1 kHz each one is around 23 ms). If these two don't align perfectly, you may end up with a segment whose length exceeds `segment_duration`.
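A simplified model of that behavior in Python. It is not the library's code: it assumes every partial takes whole AAC packets until it reaches `partial_duration`, and every segment takes whole partials until it reaches `segment_duration`, using exact `Fraction` arithmetic so no floating-point noise creeps in. The 2 s / 0.5 s / 44.1 kHz values are only an example:

```python
import math
from fractions import Fraction

def build_segment(segment_duration, partial_duration, packet_duration):
    """Simplified model: a partial is the fewest whole AAC packets lasting at
    least partial_duration; a segment is the fewest whole partials lasting at
    least segment_duration. Returns (list of partial durations, segment duration)."""
    packets_per_partial = math.ceil(partial_duration / packet_duration)
    partial = packets_per_partial * packet_duration
    partials = []
    total = Fraction(0)
    while total < segment_duration:
        partials.append(partial)
        total += partial
    return partials, total

partials, segment = build_segment(
    segment_duration=Fraction(2),            # requested 2 s
    partial_duration=Fraction(1, 2),         # requested 0.5 s
    packet_duration=Fraction(1024, 44100),   # one AAC packet at 44.1 kHz
)
print(f"{len(partials)} partials of {float(partials[0]):.4f} s -> segment of {float(segment):.4f} s")
# 4 partials of 0.5108 s -> segment of 2.0434 s
```

Even in this idealized model the segment overshoots the requested 2 s, because neither a partial boundary nor a segment boundary can fall in the middle of an AAC packet.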
One potential solution is to choose a slightly lower `segment_duration` to make up for these alignment mismatches. You could try reducing the desired `segment_duration` to e.g. 1.8s, and it should create a segment of 1.834s duration.
Then the `EXT-X-TARGETDURATION` tag (whose value is the ceiling of the duration of the longest segment in the playlist) would become 2s.
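The effect on the playlist in a few lines of Python, using the ceiling rule described above. The 2.325 s value is hypothetical (a 2 s request plus the 325 ms mismatch mentioned earlier), and `target_duration` is an illustrative helper:

```python
import math

def target_duration(segment_durations_s):
    """Target duration as described above: the ceiling of the longest
    segment duration in the playlist, in whole seconds."""
    return math.ceil(max(segment_durations_s))

# Hypothetical: 2 s requested, but segments come out 325 ms too long.
print(target_duration([2.325, 2.325]))  # 3
# 1.8 s requested, segments of about 1.834 s as estimated above.
print(target_duration([1.834, 1.834]))  # 2
```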
The easiest way to ensure synchronization between the tracks is to do it at the container level, so I definitely recommend trying the `hls_mode: :muxed_av` option if nothing prevents you from doing so.
Thank you for the advice, @Łukasz Kita!
You're welcome! Let me know if the `:muxed_av` mode helps in your case.
Hey, @varsill! I'm in the process of migrating to `:muxed_av`. I noticed an issue with multiple `cmaf_muxers` trying to link to the same audio pad. With `:muxed_av`, if I have multiple video qualities, I'm forced to have N audio pads, right?
Hi @odingrail!
`SinkBin` should take care of spawning `:audio_tee` under the hood in `:muxed_av` mode to distribute the audio track to the muxers corresponding to the appropriate video variants. In fact, it explicitly disallows connecting multiple audio pads in `:muxed_av` mode (see https://github.com/membraneframework/membrane_http_adaptive_stream_plugin/blob/9f0237c2325c4ae4a902e0ba5c6a6acedaf66737/lib/membrane_http_adaptive_stream/sink_bin.ex#L235). But perhaps you came across a bug. Could you provide some logs with the stacktrace of the error?
Here is the stacktrace. It looks like a CMAF muxer linking issue.
We are using `membrane_http_adaptive_stream_plugin` 0.18.7.
I think the library is trying to link the audio pad more than once.
It works when I have only one video quality enabled.
It also works in `:separate_av` mode.
This issue appears only when I have multiple video qualities.
The pad mentioned in the exception is the `:audio` pad; I checked this by moving my `dbg` into the library code.
Hmm, it looks like there is indeed a bug in the `SinkBin`. Could you try changing the `membrane_http_adaptive_stream_plugin` dependency in `mix.exs` to a bugfix branch, `{:membrane_http_adaptive_stream_plugin, github: "membraneframework/membrane_http_adaptive_stream_plugin", branch: "varsill/fix_pad_refs"}`, and see if it works?
It is working for me now, without any changes to my pipeline!
Thanks for the fix, @varsill!
Great! I think one other thing may have been buggy, so please update your dependency from the GitHub branch once again. Also, we should release `membrane_http_adaptive_stream_plugin` v0.18.8 soon.
I'm still getting the following issues:
I use a 2-second segment duration and a 0.5-second partial segment duration.
I'm also constantly getting inconsistent segments:
With a 3-second segment and a 1-second partial segment it's almost there, but `mediastreamvalidator` is still angry:
Error: The sum of the partial segment durations must match the parent segment duration
--> Detail: Segment duration: 2.9830, Sum of partials: 2.9840
--> Source: http://localhost:4000/hls-video/daa4b2bf-1b14-4aae-b6a3-eacbde51199b/source_quality.m3u8 - b4ba812f-aa3c-41f8-8292-a2c0b458f48f/muxed_segment_0_source_quality.m4s
--> Detail: Segment duration: 2.0030, Sum of partials: 2.0040
--> Source: http://localhost:4000/hls-video/daa4b2bf-1b14-4aae-b6a3-eacbde51199b/source_quality.m3u8 - b4ba812f-aa3c-41f8-8292-a2c0b458f48f/muxed_segment_1_source_quality.m4s
--> Detail: Segment duration: 2.0030, Sum of partials: 2.0040
--> Source: http://localhost:4000/hls-video/daa4b2bf-1b14-4aae-b6a3-eacbde51199b/source_quality.m3u8 - b4ba812f-aa3c-41f8-8292-a2c0b458f48f/muxed_segment_2_source_quality.m4s
--> Detail: Segment duration: 2.0030, Sum of partials: 2.0040
--> Source: http://localhost:4000/hls-video/daa4b2bf-1b14-4aae-b6a3-eacbde51199b/source_quality.m3u8 - b4ba812f-aa3c-41f8-8292-a2c0b458f48f/muxed_segment_3_source_quality.m4s
--> Detail: Segment duration: 2.9940, Sum of partials: 2.9950
--> Source: http://localhost:4000/hls-video/daa4b2bf-1b14-4aae-b6a3-eacbde51199b/source_quality.m3u8 - b4ba812f-aa3c-41f8-8292-a2c0b458f48f/muxed_segment_4_source_quality.m4s
--------------------------------------------------------------------------------
CRITICAL MUST fix issues
--------------------------------------------------------------------------------
Critical: #EXT-X-PART Partial Segment duration 1.002000 exceeds PART-TARGET
--> Source: http://localhost:4000/hls-video/2f04f52c-3040-4a05-a967-bfbe8babe5fc/source_quality.m3u8
Hi! That mismatch between the sum of partials and the segment duration seems to be some kind of rounding error. Does it cause any trouble with playback? Anyway, I will check that.
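One way such a 1 ms discrepancy can appear (a guess consistent with the rounding-error suspicion, not a claim about how the library writes its playlists): if each partial and the whole segment are rounded to milliseconds independently, the rounded partials no longer sum to the rounded segment. The partial durations below are made up to reproduce the 2.0030 vs 2.0040 pattern from the validator output:

```python
from decimal import Decimal, ROUND_HALF_UP

MS = Decimal("0.001")

def to_ms(seconds):
    """Round a duration to millisecond precision, as a playlist writer might."""
    return seconds.quantize(MS, rounding=ROUND_HALF_UP)

# Hypothetical exact partial durations in seconds.
partials = [Decimal("0.5006"), Decimal("0.5006"), Decimal("0.5006"), Decimal("0.5012")]
segment = sum(partials)  # exactly 2.0030

rounded_segment = to_ms(segment)                           # 2.003
sum_of_rounded_partials = sum(to_ms(p) for p in partials)  # 4 x 0.501 = 2.004
print(rounded_segment, sum_of_rounded_partials)  # 2.003 2.004
```

Deriving the segment duration from the already-rounded partials (or rounding cumulative timestamps instead of individual durations) is one way to keep the two values consistent.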
As for the inconsistent segments, I'd say it's expected that segment durations differ: the `segment_duration` option specifies the "minimal segment duration of the regular segments". If, for example, keyframes are not aligned with that `segment_duration`, we will end up with longer segments, and their distribution will follow the distribution of keyframes.
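A simplified Python model of that rule (not the library's implementation): a segment is closed at the first keyframe that is at least `segment_duration` after the segment start. Times are integer milliseconds, and the keyframe timestamps are hypothetical:

```python
def cut_segments(keyframe_times_ms, min_duration_ms):
    """Return segment durations (ms) when a segment may only end on a keyframe
    and must last at least min_duration_ms. A trailing segment that has not
    been closed yet is not included."""
    durations = []
    start = keyframe_times_ms[0]
    for t in keyframe_times_ms[1:]:
        if t - start >= min_duration_ms:
            durations.append(t - start)
            start = t
    return durations

# Keyframes every 2 s with a 2 s minimum: segments of exactly 2 s.
print(cut_segments([0, 2000, 4000, 6000, 8000], 2000))  # [2000, 2000, 2000, 2000]
# Keyframes every 1.5 s: each segment has to wait for the keyframe at 3 s.
print(cut_segments([0, 1500, 3000, 4500, 6000, 7500, 9000], 2000))  # [3000, 3000, 3000]
```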
The most worrying error is the one with the partial segment duration exceeding `PART-TARGET`; I will try to find what's causing that.
The partial segment duration exceeding `PART-TARGET` seems to happen only with the following config: a segment duration of 2 seconds and a partial segment duration of 1 second.
With 3-second segments and 1-second partial segments it seems to disappear.
I see, I will try to reproduce that bug with the configuration you mentioned.
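A Python sketch of the check `mediastreamvalidator` reports (a partial longer than `PART-TARGET`), plus one way a partial can come out longer than a 1 s target: if it has to cover at least 1 s of whole 1024-sample AAC frames at 48 kHz, it ends up at 47 frames, about 1.0027 s. The 24 FPS and 48 kHz figures are assumptions about this stream, whether this is what happens inside the library is not confirmed by anything here, and the helper names are hypothetical:

```python
import math
from fractions import Fraction

def frame_aligned(target, frame_duration):
    """Shortest run of whole frames lasting at least `target` seconds."""
    return math.ceil(target / frame_duration) * frame_duration

def parts_exceeding(part_durations, part_target):
    """Indices of partial segments longer than PART-TARGET."""
    return [i for i, d in enumerate(part_durations) if d > part_target]

part_target = Fraction(1)                                       # 1 s partials
video_part = frame_aligned(part_target, Fraction(1, 24))        # 24 frames at 24 FPS
audio_part = frame_aligned(part_target, Fraction(1024, 48000))  # 47 AAC frames at 48 kHz

print(f"video {float(video_part):.4f} s, audio {float(audio_part):.4f} s")  # video 1.0000 s, audio 1.0027 s
print(parts_exceeding([video_part, audio_part], part_target))              # [1]
```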
However, given your stream's characteristics (I assume your video is 24 FPS with a keyframe every 2 seconds, and each AAC frame lasts 1024/48000 ≈ 0.0213s), I would go for a `segment_duration` of 1.8s and a `partial_duration` of 0.5s.
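The arithmetic behind that suggestion, under the assumptions stated in the reply (24 FPS, a keyframe every 2 s, 48 kHz AAC). This is one reading of it, sketched in Python: a 0.5 s partial is a whole number of video frames but not of AAC frames, four 0.5 s partials are needed to reach 1.8 s and they add up to exactly the 2 s keyframe interval, and the 0.2 s of slack leaves room for small overshoots before that keyframe. `check_config` is a hypothetical helper:

```python
import math
from fractions import Fraction

def check_config(fps, keyframe_interval, aac_frame, segment_duration, partial_duration):
    """Frame and keyframe arithmetic for a candidate segment/partial configuration."""
    partials_needed = math.ceil(segment_duration / partial_duration)
    return {
        "video_frames_per_partial": partial_duration * fps,
        "aac_frames_per_partial": partial_duration / aac_frame,
        "partials_per_segment": partials_needed,
        "partials_total": partials_needed * partial_duration,
        "slack_before_keyframe": keyframe_interval - segment_duration,
    }

report = check_config(
    fps=24,
    keyframe_interval=Fraction(2),
    aac_frame=Fraction(1024, 48000),
    segment_duration=Fraction(18, 10),
    partial_duration=Fraction(1, 2),
)
for name, value in report.items():
    print(name, float(value))
```

The same helper can be rerun with other candidate values (for example a 2 s `segment_duration`, which leaves zero slack) to compare configurations before trying them against the validator.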