Cybertelecom
Federal Internet Law & Policy
An Educational Project
VoIP Notes
Latency
NIST, Security Considerations for VoIP Systems 800-58
p. 16 (April 2004)
"Latency in VOIP refers to the time it takes for a
voice transmission to go from its source to its destination. Ideally,
we would like to keep latency as low as possible but there are
practical lower bounds on the delay of VOIP. The ITU-T Recommendation
G.114 [2] set forth a number of time constraints on one-way latency.
The upper bound is 150 ms. for one-way traffic. This corresponds to the
current latency bound experienced in domestic calls across PSTN lines
in the continental United States [3]. For international calls, a delay
of up to 400 ms. was deemed tolerable [4], but since most of the added
time is spent routing and moving the data over long distances, we
consider here only the domestic case and assume our solutions are
upwards compatible in the international realm."
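As a rough illustration of the bounds quoted above, the sketch below sums a one-way delay budget and checks it against the 150 ms and 400 ms figures. The stage names and millisecond values in the example budget are hypothetical and chosen only for illustration.

```python
# One-way latency bounds quoted above (ITU-T G.114 as cited in NIST 800-58).
DOMESTIC_BOUND_MS = 150
INTERNATIONAL_BOUND_MS = 400


def one_way_latency_ms(components):
    """Sum the per-stage delays (in ms) along a one-way voice path."""
    return sum(components.values())


def within_bound(components, bound_ms=DOMESTIC_BOUND_MS):
    """Return True if the total one-way delay does not exceed the bound."""
    return one_way_latency_ms(components) <= bound_ms


# Hypothetical budget: every figure here is an assumption, not a measurement.
example_budget = {
    "codec_and_packetization": 30,
    "network_transit": 70,
    "jitter_buffer": 40,
}

if __name__ == "__main__":
    print(one_way_latency_ms(example_budget), within_bound(example_budget))
```

Any delay added later in the path (for example by a security device) has to fit inside whatever headroom such a budget leaves.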
Jitter
NIST, Security Considerations for VoIP Systems 800-58
p. 17 (April 2004)
Jitter refers to non-uniform packet delays. It is
often caused by low bandwidth situations in VOIP and can be
exceptionally detrimental to the overall QoS. Variations in delays can
be more detrimental to QoS than the actual delays themselves [8].
Jitter can cause packets to arrive and be processed out of sequence.
RTP, the protocol used to transport voice media, is based on UDP so
packets out of order cannot be reassembled at the protocol level.
However, RTP allows applications to do the reordering using the
sequence number and timestamp fields. The overhead in reassembling
these packets is non-trivial, especially when dealing with the tight
time constraints of VOIP.
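The reordering step described above can be sketched as follows. This is a minimal illustration, not a real RTP stack: the `Packet` class and the `reorder` helper are hypothetical names, and the sketch assumes all packets in a batch lie within half the 16-bit sequence space of a known starting sequence number.

```python
from dataclasses import dataclass

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide and wrap around


@dataclass
class Packet:
    seq: int        # RTP sequence number (0..65535)
    timestamp: int  # RTP timestamp, in sampling-clock units
    payload: bytes


def seq_distance(base_seq, seq):
    """Forward distance from base_seq to seq, modulo 2**16."""
    return (seq - base_seq) % SEQ_MOD


def reorder(packets, base_seq):
    """Return packets sorted into playout order, starting from base_seq.

    Sorting by modular distance keeps the order correct across the
    65535 -> 0 wraparound, under the half-range assumption noted above.
    """
    return sorted(packets, key=lambda p: seq_distance(base_seq, p.seq))


if __name__ == "__main__":
    batch = [Packet(s, 0, b"") for s in (65535, 1, 65534, 0)]
    print([p.seq for p in reorder(batch, base_seq=65534)])
```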
When jitter is high, packets arrive at their destination
in spurts. This situation is analogous to uniform road traffic coming
to a stoplight. As soon as the stoplight turns green (bandwidth opens
up), traffic races through in a clump. The general prescription to
control jitter at VOIP endpoints is the use of a buffer, but such a
buffer has to release its voice packets at least every 150 ms (usually
a lot sooner given the transport delay) so the variations in delay must
be bounded. The buffer implementation issue is compounded by the
uncertainty of whether a missing packet is simply delayed an
anomalously long amount of time, or is actually lost. If jitter is
particularly erratic, then the system cannot use past delay times as an
indicator for the status of a missing packet. This leaves the system
open to implementation specific behavior regarding such a packet.
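One way to picture the buffer behavior described above is a fixed-delay playout buffer. The sketch below is a simplification under stated assumptions: the function name is hypothetical, send and arrival times are taken to be on a common clock, and a packet that misses its playout instant is simply treated as lost, which is one possible implementation-specific choice rather than a prescribed one.

```python
def playout_schedule(arrivals, buffer_delay_ms):
    """Classify packets as played or discarded by a fixed-delay buffer.

    arrivals: list of (send_time_ms, arrival_time_ms) tuples.
    Each packet is due for playout at send_time + buffer_delay_ms; a
    packet arriving after that instant is too late and is discarded.
    """
    played, discarded = [], []
    for send_ms, arrive_ms in arrivals:
        deadline_ms = send_ms + buffer_delay_ms
        if arrive_ms <= deadline_ms:
            played.append((send_ms, arrive_ms))
        else:
            discarded.append((send_ms, arrive_ms))
    return played, discarded


if __name__ == "__main__":
    # Hypothetical arrival pattern: the third packet is badly delayed.
    sample = [(0, 40), (20, 65), (40, 130), (60, 105)]
    print(playout_schedule(sample, buffer_delay_ms=60))
```

A larger `buffer_delay_ms` discards fewer packets but spends more of the overall latency budget, which is the trade-off the passage describes.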
Jitter can also be controlled at the nexuses of the VOIP
network by using routers, firewalls, and other network elements that
support QoS. These elements process and pass along time urgent traffic
like VOIP packets sooner than less urgent data packets. Unfortunately,
not all network components were designed with QoS in mind. An example
of a network element that does not implement this QoS demand is a
crypto-engine, which ignores Type of Service (ToS) bits in an IP header
and other indicators of packet urgency (see 8.7). Another method for
reducing delay variation is to pattern network traffic to diminish
jitter by making as efficient use of the bandwidth as possible.
Unfortunately, this constraint is at odds with some security measures
in VOIP. Chief among these is IPsec, whose processing requirements may
increase latency, thus limiting effective bandwidth and contributing to
jitter. Effective bandwidth is compromised when packets are expanded
with new headers. In normal IP traffic, this problem is negligible
since the change in the size of the packet is very small compared with
the packet size. Because VOIP uses very small packets, even a minimal
increase is important because the increase accrues across all the
packets, and VOIP sends a very high volume of these small packets.
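The effect of added headers on small packets can be shown with simple arithmetic. In the sketch below the IPv4, UDP, and RTP header sizes are the standard minimum values; the 20-byte voice payload, the 1500-byte data packet, and the 40 bytes of added security overhead are assumptions chosen for illustration, since the real overhead depends on the IPsec mode and algorithms in use.

```python
IPV4_HEADER_BYTES = 20  # minimum IPv4 header, no options
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12   # fixed RTP header, no CSRC entries or extensions

# Assumptions for illustration only.
ASSUMED_VOICE_PAYLOAD_BYTES = 20
ASSUMED_DATA_PACKET_BYTES = 1500
ASSUMED_SECURITY_OVERHEAD_BYTES = 40


def voice_packet_bytes(payload_bytes):
    """Size of an RTP/UDP/IPv4 voice packet carrying the given payload."""
    return IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES + payload_bytes


def relative_growth(packet_bytes, added_bytes):
    """Fractional size increase when added_bytes are appended to a packet."""
    return added_bytes / packet_bytes


if __name__ == "__main__":
    voice = voice_packet_bytes(ASSUMED_VOICE_PAYLOAD_BYTES)
    print(relative_growth(voice, ASSUMED_SECURITY_OVERHEAD_BYTES))
    print(relative_growth(ASSUMED_DATA_PACKET_BYTES, ASSUMED_SECURITY_OVERHEAD_BYTES))
```

With these assumed figures the 60-byte voice packet grows by about 67 percent, while the 1500-byte data packet grows by under 3 percent.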
The window of delivery for a VOIP packet is very small, so
it follows that the acceptable variation in packet delay is even
smaller. Thus, although we are concerned with security, the utmost care
must be given to assuring that delays in packet deliveries caused by
security devices are kept uniform throughout the traffic stream.
Implementing devices that support QoS and improving the efficiency of
bandwidth with header compression allows for more uniform packet delay
in a secured VOIP network.
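To judge whether delays are in fact uniform, delay variation has to be measured. The sketch below follows the running interarrival jitter estimate defined for RTP in RFC 3550 (section 6.4.1); the function name and the use of millisecond floats instead of RTP timestamp units are simplifications made for this example.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate in the style of RFC 3550, section 6.4.1.

    Both lists hold per-packet times in the same units (here ms).
    For consecutive packets i and j:
        D = (R_j - R_i) - (S_j - S_i)
    and the estimate is smoothed as J = J + (|D| - J) / 16.
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter


if __name__ == "__main__":
    # Constant transit delay: the estimate stays at zero.
    print(interarrival_jitter([0, 20, 40], [50, 70, 90]))
```

A stream whose packets are all delayed by the same amount yields an estimate of zero, which matches the point above that uniform delay matters more than the delay itself.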
Packet Loss
NIST, Security Considerations for VoIP Systems 800-58
p. 18 (April 2004)
VOIP is exceptionally intolerant of packet loss.
Packet loss can result from excess latency, where a group of packets
arrives late and must be discarded in favor of newer ones. It can also
be the result of jitter, that is, when a packet arrives after its
surrounding packets have been flushed from the buffer, it is useless.
VOIP-specific packet loss issues exist in addition to the packet loss
issues already associated with data networks; these are the cases where
a packet is not delivered at all. Compounding the packet loss problem
is VOIP’s reliance on RTP, which is based on the unreliable
UDP, and thus does not guarantee packet delivery. Unfortunately, the
time constraints do not allow for a reliable protocol such as TCP to be
used to deliver media. By the time a packet could be reported missing,
retransmitted, and received, the time constraints for QoS would be well
exceeded. The good news is that VOIP packets are very small, containing
a payload of only 10-50 bytes [5], which is approximately 12.5-62.5 ms,
with most implementations tending toward the shorter range. The loss of
such a minuscule amount of speech is not discernable or at least not
worthy of complaint for a human VOIP user. The bad news is these
packets are usually not lost in isolation. Bandwidth congestion and
other such causes of packet loss tend to affect all the packets being
delivered around the same time. So although the loss of one packet is
fairly inconsequential, probabilistically the loss of one packet means
the loss of several packets, which severely degrades the quality of
service in a VOIP network.
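The byte and millisecond ranges quoted above can be related by simple arithmetic. The 6.4 kbit/s rate used below is inferred from those two ranges (10 bytes per 12.5 ms); it is not stated in the source, and other codecs would give different durations for the same payload size.

```python
# Assumed codec bit rate, inferred from 10 bytes <-> 12.5 ms in the text.
ASSUMED_BITRATE_BPS = 6400


def payload_duration_ms(payload_bytes, bitrate_bps=ASSUMED_BITRATE_BPS):
    """Milliseconds of speech carried by a payload at a constant bit rate."""
    return payload_bytes * 8 * 1000 / bitrate_bps


def speech_lost_ms(consecutive_lost_packets, payload_bytes,
                   bitrate_bps=ASSUMED_BITRATE_BPS):
    """Total speech lost when several equal-sized packets drop in a row."""
    return consecutive_lost_packets * payload_duration_ms(payload_bytes, bitrate_bps)


if __name__ == "__main__":
    print(payload_duration_ms(10), payload_duration_ms(50))
    print(speech_lost_ms(8, 10))  # a burst of eight small packets
```

The second function shows why bursts matter: one lost 10-byte packet costs 12.5 ms of speech at this rate, but eight in a row cost 100 ms.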
Lifeline Service
- An 8-hour power backup is best practice
- Note that there is no requirement that customers use
phones that have power backup (in other words, all phones can be
cordless and go offline in a power outage)
- Backup Power Reemerges As Issue for Cable VoIP Service, Cablenews
10/10/03
Interconnection with PSTN SS7
Protocols
H.323 ITU
NIST, Security Considerations for VoIP Systems,
800-58 p. 22 (April 2004)
4 H.323
H.323 is the ITU specification for audio and video
communication across packetized networks. H.323 is actually an umbrella
standard, encompassing several other protocols, including H.225, H.245,
and others. It acts as a wrapper for a suite of media control
recommendations by the ITU. Each of these protocols has a specific role
in the call setup process, and all but one are made to dynamic ports.
Figure 4 provides an overview of the H.323 call setup process.
4.1 H.323 Architecture
An H.323 network is made up of several endpoints
(terminals), a gateway, and possibly a gatekeeper, Multipoint control
unit, and Back End Service. The gatekeeper is often one of the main
components in H.323 systems. It provides address resolution and
bandwidth control. The gateway serves as a bridge between the H.323
network and the outside world of (possibly) non-H.323 devices. This
includes SIP networks and traditional PSTN networks. This brokering can
add to delays in VOIP, and hence there has been a movement towards the
consolidation of at least the two major VOIP protocols [see 11]. A
Multipoint Control Unit is an optional element that facilitates
multipoint conferencing and other communications between more than two
endpoints. Gatekeepers are an optional but widely used component of a
VOIP network that perform several network optimization tasks [see 12].
If a gatekeeper is present, a Back End Service (BES) may exist to
maintain data about endpoints, including their permissions, services,
and configuration [13].
Generally, there are different types of H.323 calls
defined in the H.323 standard:
- Gatekeeper routed call with gatekeeper routed H.245
signaling
- Gatekeeper routed call with direct H.245 signaling
- Direct routed call with gatekeeper
- Direct routed call without gatekeeper
An H.323 VOIP session is initiated (depending on the
call model used) by either a TCP or a UDP (if RAS is the starting
point) connection with an H.225 signal. In the case of UDP this signal
contains the Registration, Admission, and Status (RAS) protocol that
negotiates with the gatekeeper and obtains the address of the endpoint
it is attempting to contact. Then a “Q.931-like” protocol
(still within the realm of H.225) is used to establish the call itself
and negotiate the addressing information for the H.245 signal. (This is
done via TCP; Q.931 actually encapsulates the H.225 Call Signaling
messages.) This “setup next” procedure is common throughout
the H.323 progression where one protocol negotiates the configuration
of the next protocol used. In this case, it is necessary because H.245
has no standard port [7]. While H.225 simply negotiates the
establishment of a connection, H.245 establishes the channels that will
actually be used for media transfer. Once again, this is done over TCP.
In a time-urgent situation, the H.245 message can be embedded within
the H.225 message (H.245 tunneling), but the speed of a call setup is
usually a QoS issue that vendors and customers are willing to concede
for better call quality. H.323 also offers Fast Connect. Here, a call
may be set up using one roundtrip. The SETUP and the CONNECT messages
piggyback the necessary H.245 signaling elements.
H.245 must establish several properties of the VOIP
call. These include the audio codecs that will be used and the logical
channels for the transportation of media. The
“OpenLogicalChannel” signal also brokers the RTP and RTCP
ports. Overall, 4 connections must be established because the logical
channels (RTP and RTCP) are unidirectional. Each one-way pair must
also be on adjacent ports. After H.245 has established all the
properties of the VOIP call and the logical channels, the call may
begin.
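The four one-way channels described above can be enumerated with a short sketch. The function names and port numbers are hypothetical, and the even-RTP, next-odd-RTCP layout is a common convention assumed here; the text itself only requires that each pair sit on adjacent ports.

```python
def rtp_rtcp_pair(rtp_port):
    """Return (rtp_port, rtcp_port) with RTCP on the adjacent port.

    Assumes the common convention of an even RTP port with RTCP on the
    next higher (odd) port.
    """
    if rtp_port % 2:
        raise ValueError("this sketch expects an even RTP port")
    return rtp_port, rtp_port + 1


def media_channels(caller_rtp_port, callee_rtp_port):
    """List the four one-way channels of a two-party call.

    Each entry is (direction, protocol, destination_port): media sent
    toward a party arrives on that party's RTP/RTCP port pair.
    """
    channels = []
    for direction, rtp_port in (("caller->callee", callee_rtp_port),
                                ("callee->caller", caller_rtp_port)):
        rtp, rtcp = rtp_rtcp_pair(rtp_port)
        channels.append((direction, "RTP", rtp))
        channels.append((direction, "RTCP", rtcp))
    return channels


if __name__ == "__main__":
    for channel in media_channels(5004, 6000):  # hypothetical ports
        print(channel)
```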
The preceding described the complicated VOIP setup
process based on H.323, although the complexities have been somewhat
reduced with version 4 of H.323. The H.323 suite has different
protocols associated with more complex forms of communication including
H.332 (large conferences), H.450.1, H.450.2, and H.450.3 (supplementary
services), H.235 (security), and H.246 (interoperability with circuit
switched services) [14]. Authentication may also be performed at each
point in the call setup process using symmetric keys or some prior
shared secret [15]. The use of these extra protocols and/or security
measures adds to the complexity of the H.323 setup process. We shall
see that this complexity is paramount in the incompatibility of H.323
with firewalls and NATs.
Inter-Asterisk eXchange (IAX)
Megaco/H.248
MGCP
SIP
NIST, Security Considerations for VoIP Systems, 800-58 p. 34 (April
2004)
SIP is the IETF specified protocol for
initiating a two-way communication session. It is considerably simpler
than H.323 [14][12] when simple calls are to be performed. SIP is text
based, thereby avoiding the ASN.1 parsing issues that exist with the
H.323 protocol suite, provided S/MIME is not used as part of SIP’s
inherent security measures. Also, SIP is an application level
protocol, that is, it exists independently from the protocol layer it
is transported across. It can be based in TCP, UDP, or a number of
different IP protocols. UDP may be used to decrease overhead and
increase speed and efficiency, or TCP may be used if SSL/TLS is
incorporated for security services. Unlike H.323, only one port is used
in SIP (note that H.323 may also be used in a way that uses only one
port – direct routed calls). The default value for this port
is 5060.
5.1 SIP Architecture
The architecture of a SIP network is different from
the H.323 structure. A SIP network is made up of end points, a proxy
and/or redirect server, location server, and registrar. A diagram is
provided in Figure 5. In the SIP model, a user is not bound to a
specific host (nor is this the case in H.323, where the gatekeeper
provides address resolution). The user initially reports their location to a
registrar, which may be integrated into a proxy or redirect server.
This information is in turn stored in the external location server.
Messages from endpoints must be routed through either
a proxy or redirect server. The proxy server intercepts messages from
endpoints or other services, inspects their “To:” field,
contacts the location server to resolve the username into an address
and forwards the message along to the appropriate end point or another
server. Redirect servers perform the same resolution functionality, but
the onus is placed on the end points to perform the actual
transmission. That is, Redirect servers obtain the actual address of
the destination from the location server and return this information to
the original sender, which then must send its message directly to this
resolved address (similar to H.323 direct routed calls with
gatekeeper).
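The difference between the two server roles can be sketched in a few lines. Everything here is hypothetical: the in-memory dictionary stands in for the location server, the address is a documentation-range example, and the `send` callback stands in for actual network transmission.

```python
# Stand-in for the location server: username -> current address.
LOCATION_SERVER = {"alice": "192.0.2.10:5060"}  # hypothetical entry


def resolve(username):
    """Look up a user's current address, or None if not registered."""
    return LOCATION_SERVER.get(username)


def proxy_server(to_user, message, send):
    """Proxy behavior: resolve the user and forward the message itself."""
    address = resolve(to_user)
    if address is not None:
        send(address, message)
    return address


def redirect_server(to_user):
    """Redirect behavior: resolve the user and hand the address back.

    The original sender is then responsible for transmitting its
    message directly to the returned address.
    """
    return resolve(to_user)


if __name__ == "__main__":
    proxy_server("alice", "INVITE ...", lambda addr, msg: print("forwarded to", addr))
    print("sender should contact", redirect_server("alice"))
```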
The SIP protocol itself is modeled on the three way
handshake method implemented in TCP (see Figure 6). We will consider
the setup here when a proxy server is used to mediate between
endpoints. The process is similar with a redirect server, but with the
extra step of returning the resolved address to the source endpoint.
During the setup process, communication details are negotiated between
the endpoints using Session Description Protocol (SDP), which contains
fields for the codec used, caller’s name, etc. If Bob wishes
to place a call to Alice he sends an INVITE request to the proxy server
containing SDP info for the session, which is then forwarded to
Alice’s client by Bob’s proxy, possibly via her
proxy server. Eventually, assuming Alice wants to talk to Bob, she will
send an “OK” message back containing her call preferences in
SDP format. Then Bob will respond with an “ACK”. SIP provides
for the ACK to contain SDP instead of the INVITE, so that an INVITE may
be seen without protocol specific information. After the “ACK”
is received, the conversation may commence along the RTP / RTCP ports
previously agreed upon. Notice that all the traffic was transported
through one port in a simple (text) format, without any of the
complicated channel / port switching associated with H.323. Still, SIP
presents several challenges for firewalls and NAT. These difficulties
are discussed in the next section.
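To make the text-based format concrete, the sketch below builds a simplified INVITE carrying an SDP body, in the general shape defined by RFC 3261 and RFC 4566. It is an illustration only: the helper names, hosts, tags, and port number are hypothetical, and a real user agent needs considerably more than this to interoperate.

```python
CRLF = "\r\n"


def build_sdp(user, host, ip, rtp_port):
    """Build a minimal SDP body offering one PCMU audio stream."""
    lines = [
        "v=0",
        f"o={user} 0 0 IN IP4 {host}",
        "s=-",
        f"c=IN IP4 {ip}",
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP 0",
        "a=rtpmap:0 PCMU/8000",
    ]
    return CRLF.join(lines) + CRLF


def build_request(method, target_uri, headers, body=""):
    """Assemble a SIP request: start line, headers, blank line, body."""
    all_headers = dict(headers)
    if body:
        all_headers["Content-Type"] = "application/sdp"
    all_headers["Content-Length"] = str(len(body.encode("utf-8")))
    head = [f"{method} {target_uri} SIP/2.0"]
    head += [f"{name}: {value}" for name, value in all_headers.items()]
    return CRLF.join(head) + CRLF + CRLF + body


# Hypothetical values for Bob calling Alice.
sdp = build_sdp("bob", "bobpc.example.org", "192.0.2.1", 49170)
invite = build_request(
    "INVITE",
    "sip:alice@example.com",
    {
        "Via": "SIP/2.0/UDP bobpc.example.org;branch=z9hG4bK-example",
        "Max-Forwards": "70",
        "To": "Alice <sip:alice@example.com>",
        "From": "Bob <sip:bob@example.org>;tag=example",
        "Call-ID": "example-call-1@bobpc.example.org",
        "CSeq": "1 INVITE",
        "Contact": "<sip:bob@bobpc.example.org>",
    },
    body=sdp,
)

if __name__ == "__main__":
    print(invite)
```

The `m=audio` line is where the RTP port for the media stream is advertised, which is the information a firewall or NAT would need to see in order to open the right port.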
Protocols: Papers
- [12] K. Siddiqui, M. Kamran, S. Tajammul, Comparison of
H.323 and SIP for IP Telephony Signaling. In Proceedings of IEEE 4th
International Multioptics Conference, Lahore, Pakistan, Dec. 2001.
- VoIP Protocols
- VoIP - Voice Over Internet Protocol, SORG