Martin Geddes

Hypervoice and the "smart pipe" minefield

One of the key questions surrounding hypervoice is how to transport the voice media. Hypervoice is complex, since it requires not only reliable real-time audio (just like a normal phone or conference call) but also the sharing of pristine recorded audio; furthermore, it needs complex signalling and integration of all the digital gestures and responses (e.g. taking notes, sharing slides).

The pressing concern is: what are the right tools for the job of hypervoice media transport, both right now and in the near future? In particular, what is the role of the network operator in a hypervoice world? Does hypervoice mean that telcos finally have a good excuse for a “smart pipe” strategy?

As we shall see, there is a role for network operators, but the “smart pipe” framing is unhelpful since it means radically different things to different people. It would be good to keep an open mind about delivery mechanisms in the face of high uncertainty on how the hypervoice technology market will evolve.

For background file transfer, TCP/IP over the public Internet is perfectly adequate for most purposes. However, reliable real-time communications has traditionally been associated with dedicated voice networks. We clearly need future broadband networks to also deliver voice reliably, just as the old telephony network has done.

The means for delivering reliable voice on broadband vary. One approach is the "traditional" telco way, which is to re-create old-fashioned circuits on top of IP. This is inefficient, as you end up using complex protocols to reserve bandwidth on your network for "empty" lanes, and your application becomes enmeshed with dependencies on the network signalling systems. Furthermore, you often miss the efficiency gain of statistical multiplexing. However, despite these drawbacks, it is what is happening for Unified Communications and VoLTE (voice on 4G). Some people would call this a "smart pipe", some not.
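To make the statistical multiplexing point concrete, here is a rough sketch that compares reserving one channel per caller with sharing a pool sized for the rare busy moment. Everything in it is an illustrative assumption rather than a measurement: the number of callers, the 40% activity figure, the overflow target, the independence of callers, and the helper name `channels_needed` are all hypothetical.

```python
from math import comb


def channels_needed(n_sources: int, p_active: float, overflow_target: float) -> int:
    """Smallest channel count c with P(active sources > c) <= overflow_target.

    Assumption: each source is active independently with probability
    p_active, so the number active at once is Binomial(n_sources, p_active).
    """
    cumulative = 0.0
    for c in range(n_sources + 1):
        # Add P(exactly c sources are active) to the running total.
        cumulative += comb(n_sources, c) * p_active**c * (1 - p_active) ** (n_sources - c)
        if 1.0 - cumulative <= overflow_target:
            return c
    return n_sources


# Hypothetical figures, chosen only for illustration.
N_SOURCES = 100          # callers sharing one link
P_ACTIVE = 0.4           # fraction of time each caller is actually sending audio
OVERFLOW_TARGET = 0.001  # tolerated chance that demand exceeds the shared pool

reserved = N_SOURCES  # circuit style: one reserved channel per caller
shared = channels_needed(N_SOURCES, P_ACTIVE, OVERFLOW_TARGET)

print(f"Reserved circuits: {reserved} channels")
print(f"Shared pool:       {shared} channels")
```

The gap between the two numbers is the capacity held in the "empty" lanes under these assumptions. Real voice traffic is burstier and more correlated than this simple model, so treat the result as an illustration of the principle, not as a dimensioning rule.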

Another approach is the "polyservice network", which I am advocating. It is a very new idea that few people know about and even fewer really understand. It works differently, trading loss and delay across flows to get the desired outcomes. Some would call this a "smart pipe", and I would want to discourage them.

The PSTN is really a "dumb (bit)pipe" as it just relays a bitstream. Yet some people think it was a "smart pipe" because of things like call forwarding and audio processing. The "smart pipe" is also historically about preserving the flow of money via the call termination fee regime; in contrast the Internet is “dumb” about money as well as delivery. So it's a complex and confusing area!

My recommendation is to avoid the “smart pipe” term entirely, and position the delivery problem differently. The smart vs. dumb pipe distinction is a false dichotomy. Why so?

Telcos are always going to be in the business of making reliable voice delivery work. The Internet, as currently constructed, is fundamentally unsuited to the task of delivering all of society's real-time communications needs as an “over the top” application. The Internet is by design reliably unreliable, because it can’t sufficiently isolate the flows, and this is not going to change. In the next decade it won't replace the PSTN, or be used for mass business voice use (e.g. contact centres or unified communications). That said, we will use the Internet for all kinds of voice services, many highly innovative and useful, but it will just be one of many mechanisms we use.

*The hunt for a "smart pipe" is flawed from the very beginning, because networks are not pipes in any sense.* This mistake matters: networks are statistically multiplexed systems with complex emergent behaviours. By birthright telcos have a role in configuring that multiplexing to isolate voice flows from other flows, so that they can be delivered reliably without experiencing excessive contention.

Hypervoice is naturally aligned with the Internet, since that is what you would typically use to transfer recorded audio and associated metadata, as well as deliver the presentation layer of any cloud services. However, that doesn't mean you need to deliver the original real-time audio over the Internet, nor does it mean you need to re-design the Internet to be a reliable voice transport.

In the long term (10+ years) we can expect new internet architectures that do offer reliable voice delivery. We can also expect technologies like WebRTC and its successors to bring native voice capability into all Web browsers. Furthermore, we can expect browsers to become as adept at handling hypervoice conversations as hypertext documents.

That future technology territory remains to be explored and discovered. We’ve barely begun to consider the nature of demand for new rich hypermedia delivery; jumping to conclusions about the nature of supply is premature.