Design III: Liquidsoap tomorrow?
12/17/2010 03:16:00 PM
In the previous post we've seen the problems with the current source at the core of liquidsoap. The point of doing so was to highlight the interest of a more principled design.
We could start with a mathematical notion of stream, but we haven't reached that point yet. As I've illustrated before, sources are inherently interactive objects, which is hard to account for.
Instead, I'll keep the same practical notion of source but make one simplification: we'll work sample by sample. We'll have an #is_ready method that tells whether the source can produce a sample for the current instant; a source is always ready when it is in the middle of a track. And we have the #get method, which returns at most one sample, returns none when the current track ends, and should never be called unless the source #is_ready.

As before, a source is passive by default (it doesn't produce data when not asked to) and caching mechanisms ensure that a source gives consistent answers to several observers. Depending on how/when it is used, a source will generate a stream, possibly undefined at some instants, but otherwise containing samples, metadata and end-of-track markers. Note that we need more precision than just time to denote a point in such a stream, in order to tell the position relative to end of tracks -- it is possible to have several consecutive end of tracks at exactly the same point in time.
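As a sketch, the sample-level interface described above could look like this in OCaml; the class type and names are illustrative assumptions, not the actual liquidsoap API:

```ocaml
(* Illustrative sketch of the sample-level interface; the class type
   and names are assumptions, not liquidsoap's actual API. *)
class type source = object
  (* Can the source produce a sample at the current instant?
     Always true in the middle of a track. *)
  method is_ready : bool

  (* At most one sample; [None] signals the end of the current track.
     Must not be called unless [is_ready] holds. *)
  method get : float option
end
```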
Let us detail caching, which isn't totally trivial. What do we cache: a number N of end of tracks, corresponding to partial #get calls, possibly followed by a sample. How do we use that cache: for an operator which already had M end of tracks and performs a #get, we return an end of track if M<N, the sample otherwise.

In the current liquidsoap, this M is passed as part of the frame, but there is no way to track which source has produced which end of track. Hence it is possible that M>N, for example in the case of a switch which has seen many empty tracks before switching to that source. Even when M<N, it is unclear how many end of tracks we should add. I'm not sure whether it is crucial to keep precise track of who produces which end of track.
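A minimal sketch of this caching discipline, keeping the counters M and N from the text (everything else, names included, is an assumption for illustration):

```ocaml
(* Sketch of the per-instant cache: it holds N end of tracks,
   possibly followed by a sample. All names are illustrative. *)
type cache = {
  mutable eots : int;             (* N: end of tracks at this instant *)
  mutable sample : float option;  (* the sample following them, if any *)
}

(* Answer a #get for an operator that already obtained [m] end of
   tracks, pulling from [produce] only when the cache has no answer. *)
let cached_get c produce m =
  if m < c.eots then None                  (* replay a cached end of track *)
  else match c.sample with
    | Some _ as s -> s                     (* replay the cached sample *)
    | None ->
      (match produce () with
       | None -> c.eots <- c.eots + 1; None
       | Some _ as s -> c.sample <- s; s)
```

Note that under this discipline a well-behaved operator only ever reaches m = N, at which point the source is actually pumped.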
What matters is that (1) if you listen continuously to a stream, you get exactly all its content, and (2) we don't have the first problem with sharing. This is because the various operators of a source can only get data at one instant. If I pump it, the source will generate (or find in its cache) an end of track or a sample. It's as simple as that; there's no bad side-effect here. We don't even need to worry about sharing detection, because we can perform caching all the time without changing the stream content -- efficiency isn't an issue for now. (At some point I was worried that sub-instantaneous data in the cache, i.e. end of tracks, could lead to the same problem as before, but that doesn't make sense because this data can't be referred to without first being generated: you can't pump a source "after" it has generated end of tracks, you can only pump it, ignore the end of tracks, and pump it again.)
For our second problem, we would simply cache the result of #is_ready (or rather, rely on the same cache). As with #get, we need to know the number of end of tracks already obtained by the observer to tell it whether the source is ready at that precise point.

Some minor remarks:
- We can't distinguish a source that is in the middle of a track from one just before the beginning of a track. If you don't listen to a source for a while, and the source is ready when you come back to it, there's no way to know whether it is about to start something new or you missed the beginning of the current track. It has never been a problem; I'll assume we can keep living with it.
- The implementation of end-of-track notification is still a "partial" #get, which may be too ad-hoc. The alternative is to attach an end-of-track tag to the last sample. It looks like a meaningless choice to me.
- Note that metadata and end of tracks are treated very differently: we never issue a partial #get because of metadata. This corresponds to a view where end of tracks are "after" the last sample, while metadata is "before" or maybe "inside" samples. We should try to generalize this into a nice notion of event, some being interruptions, some not.
- With Samuel, we have explored other design choices. What I present here isn't his favorite, but it's the most convincing to me. It also has the advantage of being fully defined (at least in my mind).
This new model is not revolutionary, but we gained more than it seems. For example, we can now write an operator that takes two sources and sums them, producing data only when both of them are ready and pumping neither of them otherwise. The reason is that in the old frame-based design we can only get data until either the end of a frame or the end of a track: if one source ends a track, the other will still fill a frame.
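In the sample-by-sample model this sum operator becomes direct. The sketch below is illustrative, not actual liquidsoap code; the interface it assumes is just the #is_ready/#get pair described earlier:

```ocaml
(* Illustrative sketch, not actual liquidsoap code: summing two
   sample-level sources, producing data only when both are ready. *)
class type source = object
  method is_ready : bool
  method get : float option  (* None = end of track *)
end

class sum (a : source) (b : source) = object
  method is_ready = a#is_ready && b#is_ready
  method get =
    (* Only called when [is_ready] holds, i.e. both sources are ready.
       If one of them ends its track anyway, we end ours too; the
       per-instant cache makes the extra pump of the other harmless. *)
    match a#get, b#get with
    | Some x, Some y -> Some (x +. y)
    | _, _ -> None
end
```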
Lifting to frames
The real challenge now is to derive an efficient frame-based model that faithfully reflects the simple sample-based model.
I'll start with the biggest problem: sharing. We can't approximate it anymore, or we'll run into the same problem again. Hence I'm going to propose a two-phase pumping: in a first, abstract phase we do a dry run of the pumping methods to know who pumps whom at which instant. Then we have a precise picture of the sharing and we can do a real run, computing the actual stream content.
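As a toy picture of the first phase (everything here is an assumption, just to fix ideas): the dry run only records who pumps whom at which instant, and the real run then consults that record to know where sharing, and hence caching, actually occurs:

```ocaml
(* Toy sketch of the dry run: sources are named by strings and the
   abstract pump only records (instant, pumper, pumped) triples. *)
let plan : (int * string * string) list ref = ref []

let dry_pump ~now ~pumper ~pumped =
  plan := (now, pumper, pumped) :: !plan

(* The real run can then ask whether a source is shared at a given
   instant, i.e. pumped by more than one operator. *)
let shared ~now ~source =
  List.length
    (List.filter (fun (t, _, p) -> t = now && p = source) !plan) > 1
```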
This method forbids some source behaviors: for example, if a source acts differently based on the actual values of its input samples, it cannot meaningfully take part in the first phase. That would be a problem for a blank-detection source. I believe we can relax this enough to get what liquidsoap currently does. The key is to declare some values as "sparse" or "slow", allowing the change of the value (and hence the reaction of the source) to only take place at the end of the frame. For example, blank detection would set a flag that only takes effect for the next frame.
For most sources I want to automatically generate the #simulated_get method from a simple #get. This is why compilation is needed now.

Now, I believe I can run that simulation thing. The nice thing is that once I bite the bullet and decide we need this, there's no question about being more modest in other places. In particular, the new #is_ready will be asked not only whether a source is ready at a particular point but for how long. This enables a nice frame-based implementation of the precise sum operator described above. Of course, it goes with a #get that takes not only a start but also a stop position.

Plans
There are several things we can do now:
- Compile frame-level sources from sample-level sources written in OCaml. This is only about analysing the source, and it need not be user-friendly. There are a couple of tricky analyses to run. For example, a sine source will have a counter decremented at each sample and reset after an end of track, and we'll have to turn this into the natural code for the frame-level sine.
- Extend liquidsoap scripts with a notation for sources and compile it down to OCaml objects. We'll have to ensure that the user never sees a type error coming from OCaml, but only errors from us, which should be more readable. The main extra challenge is to run liquidsoap code fast. Some of it can be compiled directly to OCaml, but function calls won't match exactly; we'll have to inline them or leave some interpreter calls.
- Do cross-source optimizations on top of the previous work. If the user writes a sum of sine sources, inline it into a source that computes directly the sums.
- Compile to C or even lower level code.
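To illustrate the kind of transformation Step 1 involves, here is a sketch of the sine example: the sample-level writing with its decremented counter, and the frame-level code we would hope to derive from it. The sample rate, track length and all names are assumptions for illustration:

```ocaml
let pi = 4. *. atan 1.
let samplerate = 44100.
let track_len = 44100   (* one-second tracks, assumed for illustration *)

(* Sample-level sine: a counter decremented at each sample and
   reset after an end of track ([None]). *)
let sample_sine freq =
  let remaining = ref track_len in
  fun () ->
    if !remaining = 0 then (remaining := track_len; None)
    else begin
      let t = track_len - !remaining in
      decr remaining;
      Some (sin (2. *. pi *. freq *. float t /. samplerate))
    end

(* The frame-level code we would like to compile it into: fill a
   whole slice of the frame in one loop, stopping at end of track. *)
let frame_sine freq =
  let remaining = ref track_len in
  fun buf off len ->
    let n = min len !remaining in
    let t0 = track_len - !remaining in
    for i = 0 to n - 1 do
      buf.(off + i) <- sin (2. *. pi *. freq *. float (t0 + i) /. samplerate)
    done;
    remaining := !remaining - n;
    if !remaining = 0 then remaining := track_len;
    n  (* samples written; n < len signals an end of track *)
```

The analysis would have to recognize the per-sample counter pattern in the first form and hoist it into the loop bound of the second.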
I gave a rather precise proposal of what Step 1 could look like. Still we can discuss some points, try to generalize a few things. My proposition doesn't look as clean as I would like in some places, but it has one great advantage: I'm sure we can program liquidsoap sources with that model, and even do some things currently impossible.
Step 2 presents even more design choices: should users write sources in an OO style like we do? should they work with an explicit notion of time? should time be discrete or continuous?
Labels: design, liquidsoap