Dealing With Codec Delay in Live Interactive Sessions by Dave Immer

Introduction:

More and more producers and talent are hooking up via ISDN audio codecs, enabling live, real-time creative work and production to take place regardless of the physical distance between the parties involved. But no matter how sophisticated and powerful the hardware and software enabling this exchange are, two issues will always need to be dealt with.

Issue # 1

There will always be a need for both sides to correctly route their local audio paths when either or both sides of the transmission are talking to the other via the codecs. This issue is really about a feedback loop that manifests itself as a "slap-back" effect, making it very confusing for the talent to talk/read/perform while listening to it. It grows out of the fact that each end needs to monitor and send audio using the same transmission path. How do we eliminate this annoying echo condition?

Generally, for those using a mixer to route the send and receive signals to and from the codec and other equipment (which is most users doing interactive production), the answer is: you need to create two separate and independent "mixes," one for yourself and one for the far end. Make sure you are not sending the far-end audio back to your local codec (unless requested to do so). This can be accomplished by feeding your codec from pre-fader auxiliary sends only. Create your monitor mix (headphones or speakers), including the codec audio from the far end, using the main faders (or gain pots) through your main mix output. Send only those sources that you want the far end to hear through the pre-fader aux sends, and connect the aux-send output to the codec line input. If you are dealing with stereo, you will need a stereo aux bus (or two mono aux buses).

An alternative approach is to use multitrack or submix buses in a pre-fader mode. Yet another is to use the main mix bus as the codec send and an aux/monitor bus as your monitor mix. Whichever you choose, use the signal path with the least circuitry to feed your codec. The main thing to understand is that you need two separate and independent mixes: one for yourself and one for the far end.
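
To make the two-mix idea concrete, here is a minimal sketch in Python (purely illustrative; the function, source names and gains are hypothetical, since a real console does this routing with its aux sends rather than software). It mixes the same local sources into two buses: a codec send that excludes the far-end return, and a monitor mix that includes it.

    import numpy as np

    def build_mixes(local_sources, codec_return, send_gains, monitor_gains,
                    return_gain=1.0):
        """Build the two independent mixes described above.

        local_sources : list of equal-length 1-D arrays (mics, playback, etc.)
        codec_return  : 1-D array, the audio arriving from the far end
        send_gains    : per-source gains for the pre-fader aux send (codec feed)
        monitor_gains : per-source gains for your own monitor mix
        """
        # Codec send: local sources only -- never the far-end return,
        # otherwise the far end hears itself back as a slap-back echo.
        codec_send = sum(g * s for g, s in zip(send_gains, local_sources))

        # Monitor mix: local sources plus the far-end return, for your ears only.
        monitor_mix = sum(g * s for g, s in zip(monitor_gains, local_sources))
        monitor_mix = monitor_mix + return_gain * codec_return

        return codec_send, monitor_mix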

Issue # 2

Depending on the coding algorithm employed, the circuit mileage between the two parties, and the amount of routing latency, when recording live talent performing with a guide mix originating at the producer's end, the real-time performance of the remote talent will arrive a little later than the producer's local mix, making it sound "out of sync." How do we eliminate this time differential in the audio signal monitored at the producer's end?

First you need to determine the amount of delay for the round-trip audio. This can be found by asking the far end to "fold back" your signal. The engineer on your side then taps a talk-back mic with a pencil or a finger and listens both to the folded-back return from the far end and to the tapping routed locally through a digital delay. He adjusts the digital delay until the two signals "sync up," then reads the delay amount in milliseconds from the delay unit. Another way to determine the round-trip time, using a hard-disk recorder, is to record your tap direct to one track and the folded-back tap to another track, then look at the two waveforms in the tracks window and subtract the time of the local tap from the time of the folded-back tap.
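
If the two taps end up as tracks on a hard-disk recorder or DAW, the subtraction can also be done programmatically. Below is a minimal sketch, assuming the two taps have been exported as mono WAV files at the same sample rate (the file names are hypothetical); it finds the round-trip time by cross-correlating the two tracks.

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import correlate

    # Hypothetical file names: the tap recorded directly to one track, and the
    # same tap after its round trip through both codecs ("folded back").
    rate_a, local_tap = wavfile.read("local_tap.wav")
    rate_b, folded_tap = wavfile.read("folded_back_tap.wav")
    assert rate_a == rate_b, "both tracks must share the same sample rate"

    # Cross-correlate the two mono tracks; the lag of the peak is the round trip.
    corr = correlate(folded_tap.astype(float), local_tap.astype(float), mode="full")
    lag_samples = int(np.argmax(corr)) - (len(local_tap) - 1)

    round_trip_ms = 1000.0 * lag_samples / rate_a
    print(f"Round-trip delay: {round_trip_ms:.1f} ms")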

To hear both near- and far-end elements in sync, use a hard-disk recorder (or two tape machines SMPTE-locked and offset) and create a duplicate guide track, for the talent only, that is slid to the "left" so that it plays earlier (by the total delay amount determined using one of the methods above). The producer listens only to the original, un-offset guide track (which plays slightly later than the talent's guide track). Then, when a take is rehearsed or recorded and played back, it is already in sync with the producer's guide track.
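
Here is a minimal sketch of the offset itself (the file names and the measured delay are hypothetical), making an early copy of the guide track that can be routed to the talent only:

    import numpy as np
    from scipy.io import wavfile

    round_trip_ms = 240.0  # whatever was measured for this particular connection

    rate, guide = wavfile.read("producer_guide.wav")   # hypothetical file name
    offset = int(round(rate * round_trip_ms / 1000.0))

    # Slide the copy "to the left": drop the first `offset` samples and pad the
    # tail with silence so the length is unchanged. This early copy feeds the
    # talent only; the producer keeps monitoring the original.
    pad = np.zeros((offset,) + guide.shape[1:], dtype=guide.dtype)
    talent_guide = np.concatenate([guide[offset:], pad])
    wavfile.write("talent_guide_early.wav", rate, talent_guide)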

When dealing with live performers on both ends, with today's codec processor speeds, usually only one performing side will be able to listen and react to the other: the producer and performer on the listen/react end will hear the wide-area ensemble/duet in sync, in real time. The performer on the other end will not hear the live performance of the opposite end. (For this reason it's best to put the primary rhythm creators at the other end.) But the producer on the other end (in a sufficiently isolated control booth) could route the local performance through a digital delay set to the round-trip amount and thereby also listen to the wide-area ensemble. There are some situations where, due to the nature of the material, the flexibility of the performers, or the use of a predictive coding method, both sides could listen and react (albeit more or less late).

Keep in mind that while the time differential will not change over the course of an established connection, it will vary from one "call" to another. This means that if you hang up and then establish a new connection, the delay amount will be different and you will have to determine the time differential anew using one of the methods above.

With delay-free linear (uncompressed) audio over a fiber-optic link optimized for media, the only latency factor that remains is the propagation speed of the light signal, which is around 186,000 miles per second. Stated on the time scale we are dealing with, that is 186 miles per millisecond. The speed of sound at sea level is roughly 1 foot per millisecond. So a 10 ms delay between two musicians would be roughly equivalent to them standing 10 feet apart, while an equivalent 10 ms delay over the aforementioned fiber-optic link would allow the two musicians to be 1,860 miles apart.
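
As a quick sanity check on that arithmetic, using the rounded figures from the text:

    LIGHT_MILES_PER_MS = 186.0  # ~186,000 miles per second, as stated above
    SOUND_FEET_PER_MS = 1.0     # speed of sound at sea level, roughly

    delay_ms = 10.0
    print(delay_ms * SOUND_FEET_PER_MS, "feet apart, acoustically")      # 10.0
    print(delay_ms * LIGHT_MILES_PER_MS, "miles apart, over the fiber")  # 1860.0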

Conclusion

When using ISDN codecs, it takes a little while to get used to accommodating mixing consoles at both ends and working with the time differentials imposed by coding delay, circuit distance, and routing latency, but once you get comfortable with the routine, these kinds of live interactive sessions will go much more smoothly.