SIP


Chapter 1. SIP



1.1. Purpose of SIP

SIP stands for Session Initiation Protocol. It is an application-layer control protocol that has been developed within the IETF. The protocol has been designed with easy implementation, good scalability, and flexibility in mind.

The specification is available in the form of several RFCs; the most important one is RFC3261, which contains the core protocol specification. The protocol is used for creating, modifying, and terminating sessions with one or more participants. By a session we understand a set of senders and receivers that communicate, together with the state kept in those senders and receivers during the communication. Examples of sessions include Internet telephone calls, distribution of multimedia, multimedia conferences, and distributed computer games.

SIP is not the only protocol that the communicating devices will need. It is not meant to be a general-purpose protocol. The purpose of SIP is just to make the communication possible; the communication itself must be achieved by other means (and possibly another protocol). The two protocols most often used along with SIP are RTP and SDP. RTP is used to carry the real-time multimedia data (including audio, video, and text); it makes it possible to encode and split the data into packets and transport such packets over the Internet. Another important protocol is SDP, which is used to describe the parameters of a session.

SIP has been designed in conformance with the Internet model. It is an end-to-end oriented signalling protocol, which means that all the logic is stored in end devices (except routing of SIP messages).

SIP is a significant divergence from the regular PSTN (Public Switched Telephone Network), where all the state and logic is stored in the network and end devices (telephones) are very primitive. The aim of SIP is to provide the same functionality that the traditional PSTN has, but the end-to-end design makes SIP networks much more powerful and open to the implementation of new services that can hardly be implemented in the traditional PSTN.

SIP is based on the HTTP protocol, which in turn inherited the format of its message headers from RFC822. HTTP is probably the most successful and widely used protocol in the Internet, and SIP tries to combine the best of both. In fact, HTTP can be classified as a signalling protocol too, because user agents use it to tell an HTTP server which documents they are interested in. SIP is used to carry the description of session parameters; the description is encoded into a document using SDP. Both protocols (HTTP and SIP) use the RFC822 encoding of message headers, which has proven to be robust and flexible over the years.




1.2. SIP URI

SIP entities are identified using SIP URIs (Uniform Resource Identifiers). A SIP URI has the form sip:username@domain, for instance, sip:joe@company.com. As we can see, a SIP URI consists of a username part and a domain name part delimited by the @ (at) character. SIP URIs are similar to e-mail addresses; it is, for instance, possible to use the same URI for e-mail and SIP communication.
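
The basic sip:username@domain form can be taken apart with a few lines of code. The following is a minimal sketch only: the names SipUri and parse_sip_uri are hypothetical, and ports, passwords, URI parameters, and headers are deliberately not handled.

```python
from typing import NamedTuple


class SipUri(NamedTuple):
    """Hypothetical container for the two parts of a basic SIP URI."""
    username: str
    domain: str


def parse_sip_uri(uri: str) -> SipUri:
    """Split a simple sip:username@domain URI into username and domain."""
    scheme, colon, rest = uri.partition(":")
    if not colon or scheme.lower() != "sip":
        raise ValueError(f"not a SIP URI: {uri!r}")
    username, at, domain = rest.partition("@")
    if not at or not username or not domain:
        raise ValueError(f"expected sip:username@domain, got {uri!r}")
    return SipUri(username, domain)


joe = parse_sip_uri("sip:joe@company.com")
print(joe.username, joe.domain)
```

A real implementation would have to follow the full URI grammar of the SIP specification; the sketch only mirrors the two-part structure described above.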




1.3. SIP Network Elements

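Although user agents can send SIP messages directly to each other, a typical SIP network will contain more than one type of SIP element. The basic SIP elements are user agents, proxy servers, registrars, and redirect servers.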

Note that the elements, as presented in this section, are often only logical entities. It is often profitable to co-locate them together, for instance, to increase the speed of processing, but that depends on a particular implementation and configuration.




1.3.1. User Agents

End points that use SIP to find each other and to negotiate session characteristics are called user agents. User agents usually, but not necessarily, reside on a user's computer in the form of an application. This is currently the most widely used approach, but user agents can also be cellular phones, PSTN gateways, PDAs, automated IVR systems, and so on.

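User agents are often described in terms of two parts: User Agent Server (UAS) and User Agent Client (UAC). UAS and UAC are logical entities only; each user agent contains a UAC and a UAS. The UAC is the part of the user agent that sends requests and receives responses. The UAS is the part of the user agent that receives requests and sends responses.

Because a user agent contains both a UAC and a UAS, we often say that a user agent behaves like a UAC or a UAS. For instance, the caller's user agent behaves like a UAC when it sends an INVITE request and receives responses to the request. The callee's user agent behaves like a UAS when it receives the INVITE and sends responses.

The roles are swapped when the callee sends a request of its own: in that case the callee's user agent behaves like a UAC and the caller's user agent behaves like a UAS.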



Figure 1-1. UAC and UAS





Figure 1-1 shows three user agents and one stateful forking proxy. Each user agent contains a UAC and a UAS. The part of the proxy that receives the INVITE from the caller in fact acts as a UAS. When forwarding the request statefully, the proxy creates two UACs, one for each branch.

When a callee later sends a request of its own, the user agent that was previously a UAS becomes a UAC, and vice versa.




1.3.2. Proxy Servers

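In addition to user agents, SIP allows creation of an infrastructure of network hosts called proxy servers. User agents can send messages to a proxy server. Proxy servers are very important entities in the SIP infrastructure.

The most important task of a proxy server is to route session invitations "closer" to the callee. An invitation usually traverses a set of proxies until it reaches one that knows the current location of the callee.

There are two basic types of SIP proxy servers: stateless and stateful.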




1.3.2.1. Stateless Servers

Stateless servers are simple message forwarders. They forward messages independently of each other. Although messages are usually arranged into transactions (see Section 1.5), stateless proxies do not take care of transactions.

Stateless proxies are simple and faster than stateful proxy servers. They can be used as simple load balancers, message translators, and routers. One of the drawbacks of stateless proxies is that they are unable to absorb retransmissions of messages or to perform more advanced routing, for instance, forking or recursive traversal.




1.3.2.2. Stateful Servers

Stateful proxies are more complex. Upon reception of a request, stateful proxies create a state and keep the state until the transaction finishes. Some transactions, especially those created by INVITE, can last quite long (until callee picks up or declines the call). Because stateful proxies must maintain the state for the duration of the transactions, their performance is limited.

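The ability to associate SIP messages with transactions gives stateful proxies capabilities that stateless proxies lack, such as forking.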

Stateful proxies can absorb retransmissions because they know, from the transaction state, if they have already received the same message (stateless proxies cannot do the check because they keep no state).
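
The difference can be illustrated with a small sketch. Everything here is hypothetical (the class name, the method names, and the shape of the transaction key); it only shows why remembered transaction state is what makes absorbing a retransmission possible.

```python
class StatefulProxySketch:
    """Toy model of a stateful proxy: it remembers active transactions."""

    def __init__(self):
        # transaction key -> state; the key format is an assumption
        self.transactions = {}

    def receive_request(self, transaction_key):
        if transaction_key in self.transactions:
            # Same transaction seen before: treat it as a retransmission.
            return "absorbed"
        self.transactions[transaction_key] = "proceeding"
        return "forwarded"

    def finish(self, transaction_key):
        # State is kept only until the transaction finishes.
        self.transactions.pop(transaction_key, None)


def stateless_receive_request(transaction_key):
    """A stateless proxy keeps nothing, so every copy is forwarded."""
    return "forwarded"


proxy = StatefulProxySketch()
key = ("hypothetical-branch-1", "INVITE")
print(proxy.receive_request(key))      # first copy of the request
print(proxy.receive_request(key))      # retransmitted copy
print(stateless_receive_request(key))  # stateless proxy cannot tell
```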

Stateful proxies can perform more complicated methods of finding a user. It is, for instance, possible to try to reach the user's office phone first and, when he doesn't pick up, to redirect the call to his cell phone. Stateless proxies can't do this because they have no way of knowing how the transaction targeted at the office phone finished.

Most SIP proxies today are stateful because their configuration is usually very complex. They often perform accounting, forking, and some sort of NAT traversal aid, and all of these features require a stateful proxy.




1.3.2.3. Proxy Server Usage

Typically, each entity (a company, for instance) has its own SIP proxy server which is used by all user agents in the entity. Let's suppose that there are two companies, A and B, and each of them has its own proxy server. Figure 1-2 shows how a session invitation from employee Joe in company A will reach employee Bob in company B.



Figure 1-2. Session Invitation





Joe's user agent sends the invitation for sip:bob@b.com to his company's SIP proxy server, proxy.a.com. The proxy server figures out that user sip:bob@b.com is in a different company, so it will look up B's SIP proxy server and send the invitation there. B's proxy server can either be preconfigured at proxy.a.com, or the proxy will use DNS SRV records to find B's proxy server. The invitation reaches proxy.b.com. The proxy knows that Bob is currently sitting in his office and is reachable through the phone on his desk, so it forwards the invitation to the IP address of that phone.
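
The decision that proxy.a.com makes can be sketched as follows. This is an illustration under assumptions: the function name, the return format, and the preconfigured table are hypothetical, no DNS query is actually performed, and the SRV name assumes UDP transport (the _sip._udp.<domain> form).

```python
def choose_next_hop(request_uri, local_domain, preconfigured):
    """Decide where a proxy should send an invitation.

    Returns a (method, target) pair; both values are illustrative only.
    """
    domain = request_uri.partition("@")[2]
    if domain == local_domain:
        # The callee belongs to this proxy's own domain.
        return ("local-lookup", domain)
    if domain in preconfigured:
        # The foreign proxy was configured by hand.
        return ("preconfigured", preconfigured[domain])
    # Otherwise the proxy would query DNS for this SRV record name.
    return ("dns-srv", f"_sip._udp.{domain}")


print(choose_next_hop("sip:bob@b.com", "a.com", {"b.com": "proxy.b.com"}))
print(choose_next_hop("sip:bob@b.com", "a.com", {}))
```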




1.3.3. Registrar

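We mentioned that the SIP proxy at proxy.b.com knows Bob's current location, but we have not yet said how a proxy can learn the current location of a user. Bob's user agent (SIP phone) must register with a registrar. The registrar is a special SIP entity that receives registrations from users, extracts information about their current location (IP address and port), and stores it in a location database.

Figure 1-3 shows a typical SIP registration. A REGISTER message containing the Address of Record sip:jan@iptel.org and the contact address sip:jan@1.2.3.4:5060, where 1.2.3.4 is the IP address of the phone, is sent to the registrar. The registrar stores this binding in the location database and, if the registration was successful, replies with a 200 OK.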



Figure 1-3. Registrar Overview





Each registration has a limited lifespan. The Expires header field or the expires parameter of the Contact header field determines how long the registration is valid. The user agent must refresh the registration within the lifespan, otherwise it will expire and the user will become unavailable.
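
A location database with expiring registrations might look roughly like the sketch below. The class and method names are hypothetical, only one contact per Address of Record is kept, and time is passed in explicitly so that the behavior is easy to follow.

```python
class LocationDatabaseSketch:
    """Toy location database: Address of Record -> (contact, expiry time)."""

    def __init__(self):
        self._bindings = {}

    def register(self, address_of_record, contact, expires, now):
        # A refresh simply overwrites the binding with a new expiry time.
        self._bindings[address_of_record] = (contact, now + expires)

    def lookup(self, address_of_record, now):
        binding = self._bindings.get(address_of_record)
        if binding is None:
            return None
        contact, expires_at = binding
        if now >= expires_at:
            # The registration was not refreshed in time.
            del self._bindings[address_of_record]
            return None
        return contact


db = LocationDatabaseSketch()
db.register("sip:jan@iptel.org", "sip:jan@1.2.3.4:5060", expires=120, now=0)
print(db.lookup("sip:jan@iptel.org", now=60))   # still registered
print(db.lookup("sip:jan@iptel.org", now=121))  # expired, user unavailable
```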




1.3.4. Redirect Server

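An entity that receives a request and sends back a reply containing a list of the current locations of a particular user is called a redirect server.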

The originator of the request then extracts the list of destinations and sends another request directly to them. Figure 1-4 shows a typical redirection.



Figure 1-4. SIP






1.4. SIP Messages

Communication using SIP (often called signalling) comprises a series of messages. Messages can be transported independently by the network. Usually each of them is transported in a separate UDP datagram. Each message consists of a "first line", a message header, and a message body. The first line identifies the type of the message. There are two types of messages: requests and responses.

A typical SIP request looks like this:



INVITE sip:7170@iptel.org SIP/2.0
Via: SIP/2.0/UDP 195.37.77.100:5040;rport
Max-Forwards: 10
From: "jiri" <sip:jiri@iptel.org>;tag=76ff7a07-c091-4192-84a0-d56e91fe104f
To: <sip:jiri@bat.iptel.org>
Call-ID: d10815e0-bf17-4afa-8412-d9130a793d96@213.20.128.35
CSeq: 2 INVITE
Contact: <sip:213.20.128.35:9315>
User-Agent: Windows RTC/1.0
Proxy-Authorization: Digest username="jiri", realm="iptel.org", 
  algorithm="MD5", uri="sip:jiri@bat.iptel.org", 
  nonce="3cef753900000001771328f5ae1b8b7f0d742da1feb5753c", 
  response="53fe98db10e1074b03b3e06438bda70f"
Content-Type: application/sdp
Content-Length: 451

v=0
o=jku2 0 0 IN IP4 213.20.128.35
s=session
c=IN IP4 213.20.128.35
b=CT:1000
t=0 0
m=audio 54742 RTP/AVP 97 111 112 6 0 8 4 5 3 101
a=rtpmap:97 red/8000
a=rtpmap:111 SIREN/16000
a=fmtp:111 bitrate=16000
a=rtpmap:112 G7221/16000
a=fmtp:112 bitrate=24000
a=rtpmap:6 DVI4/16000
a=rtpmap:0 PCMU/8000
a=rtpmap:4 G723/8000
a=rtpmap:3 GSM/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16



The URI on the first line, sip:7170@iptel.org, is called the Request-URI and contains the URI to which the request is addressed.

A SIP request can contain one or more Via header fields which are used to record the path of the request. They are later used to route SIP responses back along the same path.

The From and To header fields identify the caller and the callee of the invitation (just like in SMTP, where they identify the sender and recipient of a message). The From header field contains a tag parameter which serves as a dialog identifier and will be described in Section 1.6.

The Call-ID header field is a dialog identifier and its purpose is to identify messages belonging to the same call. Such messages have the same Call-ID identifier. CSeq is used to maintain the order of requests. Because requests can be sent over an unreliable transport that can re-order messages, a sequence number must be present in the messages so that the recipient can identify retransmissions and out-of-order requests.
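
The structure of a message (first line, header fields, empty line, body) can be made concrete with a short parsing sketch. The function name parse_message is hypothetical, the sample is a shortened version of the INVITE above, and compact header names, multiple header fields with the same name, and Content-Length checks are ignored.

```python
def parse_message(raw):
    """Split a SIP message into first line, header fields, and body."""
    # The header is separated from the body by an empty line.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    first_line = lines[0]
    headers = []
    for line in lines[1:]:
        if line[:1] in (" ", "\t") and headers:
            # A line starting with whitespace continues the previous field.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
    return first_line, headers, body


raw = (
    "INVITE sip:7170@iptel.org SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 195.37.77.100:5040;rport\r\n"
    "Call-ID: d10815e0-bf17-4afa-8412-d9130a793d96@213.20.128.35\r\n"
    "CSeq: 2 INVITE\r\n"
    "Content-Length: 3\r\n"
    "\r\n"
    "v=0"
)
first_line, headers, body = parse_message(raw)
fields = dict(headers)
sequence_number, method = fields["CSeq"].split()
print(first_line)
print(fields["Call-ID"], int(sequence_number), method)
```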

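The Contact header field contains the IP address and port on which the sender is awaiting further requests.

The message header is delimited from the message body by an empty line. The body of the INVITE request above contains a description of the session encoded in SDP.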




1.4.1. SIP Requests

We have described what an INVITE request looks like and said that the request is used to invite a callee to a session. Other important requests are:

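  • ACK --This request acknowledges the receipt of a final response to an INVITE.
  • BYE --This request is used to tear down an established session.
  • CANCEL --This request is used to cancel a request that has not yet been answered with a final response, typically a pending INVITE.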
  • REGISTER --The purpose of a REGISTER request is to let the registrar know the current location of the user. Information about the current IP address and port on which a user can be reached is carried in REGISTER messages. The registrar extracts this information and puts it into a location database. The database can later be used by SIP proxy servers to route calls to the user.

The listed requests usually have no message body because it is not needed in most situations (but can have one). In addition to that many other request types have been defined but their description is out of the scope of this document.




1.4.2. SIP Responses

When a user agent or proxy server receives a request, it sends a reply. Every request must be replied to, except ACK requests, which trigger no replies.

A typical reply looks like this:



SIP/2.0 200 OK
Via: SIP/2.0/UDP 192.168.1.30:5060;received=66.87.48.68
From: sip:sip2@iptel.org
To: sip:sip2@iptel.org;tag=794fe65c16edfdf45da4fc39a5d2867c.b713
Call-ID: 2443936363@192.168.1.30
CSeq: 63629 REGISTER
Contact: <sip:sip2@66.87.48.68:5060;transport=udp>;q=0.00;expires=120
Server: Sip EXpress router (0.8.11pre21xrc (i386/linux))
Content-Length: 0
Warning: 392 195.37.77.101:5060 "Noisy feedback tells:  
    pid=5110 req_src_ip=66.87.48.68 req_src_port=5060 in_uri=sip:iptel.org 
    out_uri=sip:iptel.org via_cnt==1"



As we can see, responses are very similar to the requests, except for the first line. The first line of response contains protocol version (SIP/2.0), reply code, and reason phrase.

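The reply code is an integer number from 100 to 699 and indicates the type of the response. There are six classes of responses:

  • 1xx are provisional responses.
    Typically proxy servers send responses with code 100 when they start processing an INVITE and user agents send responses with code 180 (Ringing) which means that the callee's phone is ringing.
  • 2xx responses are positive final responses.
    A UAC may receive several 200 messages to a single INVITE request. This is because a forking proxy (described later) can fork the request so that it reaches several UASs, and each of them may accept the invitation.
  • 3xx responses are used to redirect the caller.
  • 4xx are negative final responses.
  • 5xx responses mean that the problem is on the server's side.
  • 6xx responses mean that the request cannot be fulfilled at any server.

The reason phrase is a human-readable description of the result of the processing.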

The request to which a particular response belongs is identified using the CSeq header field. In addition to the sequence number this header field also contains method of corresponding request. In our example it was REGISTER request.
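
As a small illustration of the first line of a response, the sketch below splits a status line and maps the reply code to its class. The function names and the wording of the class labels are hypothetical; only the 1xx to 6xx grouping itself comes from the protocol.

```python
RESPONSE_CLASSES = {
    1: "provisional",
    2: "positive final",
    3: "redirection",
    4: "negative final",
    5: "server failure",
    6: "global failure",
}


def parse_status_line(line):
    """Split 'SIP/2.0 200 OK' into version, reply code, and reason phrase."""
    version, code_text, reason = line.split(" ", 2)
    code = int(code_text)
    if not 100 <= code <= 699:
        raise ValueError(f"reply code out of range: {code}")
    return version, code, reason


def response_class(code):
    """Name the class of a reply code based on its first digit."""
    return RESPONSE_CLASSES[code // 100]


def is_final(code):
    """Everything except 1xx ends the transaction's wait for a response."""
    return code >= 200


version, code, reason = parse_status_line("SIP/2.0 200 OK")
print(version, code, reason, response_class(code), is_final(code))
print(response_class(180), is_final(180))
```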




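1.5. SIP Transactions

Although SIP messages are sent independently over the network, they are usually arranged into transactions by user agents and certain types of proxy servers. Therefore SIP is said to be a transactional protocol.

A transaction is a sequence of SIP messages exchanged between SIP network elements. A transaction consists of one request and all responses to that request.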

If a transaction was initiated by an INVITE request then the same transaction also includes ACK, but only if the final response was not a 2xx response. If the final response was a 2xx response then the ACK is not considered part of the transaction.

As we can see, this is quite asymmetric behavior: ACK is part of transactions with a negative final response but is not part of transactions with a positive final response. The reason for this separation is the importance of delivering all 200 OK messages. Not only do they establish a session, but a 200 OK can also be generated by multiple entities when a proxy server forks the request, and all of them must be delivered to the calling user agent. Therefore user agents take responsibility in this case and retransmit 200 OK responses until they receive an ACK. Also note that only responses to INVITE are retransmitted!

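SIP entities that have a notion of transactions are called stateful. A stateful entity must be able to associate each incoming message with an existing transaction, and for that it needs a transaction identifier.

In the previous SIP specification, RFC2543, the transaction identifier was calculated as a hash of all important message header fields (including To, From, Request-URI, and CSeq). This proved to be very slow and complex; during interoperability tests such transaction identifiers used to be a common source of problems.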

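In the new RFC3261 the way of calculating transaction identifiers was completely changed. Instead of complicated hashing of important header fields, a SIP request now carries the transaction identifier directly, in the branch parameter of the topmost Via header field.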
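
In RFC3261 the identifier travels in the branch parameter of the topmost Via header field, and RFC3261-style branch values start with the string z9hG4bK. The sketch below extracts it; the function names, the sample branch value, and the simplified key (branch, sent-by, method) are assumptions made for illustration, not a complete matching procedure.

```python
MAGIC_COOKIE = "z9hG4bK"  # prefix of RFC3261-style branch values


def via_branch(via_value):
    """Return the branch parameter of a Via header field value, if any."""
    for param in via_value.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "branch":
            return value.strip()
    return None


def transaction_key(via_value, method):
    """Build a simplified transaction key, or None for old-style requests."""
    branch = via_branch(via_value)
    if branch is None or not branch.startswith(MAGIC_COOKIE):
        # No usable branch: an RFC2543-style sender, needs the old matching.
        return None
    sent_by = via_value.split(";")[0].split()[-1]
    # An ACK to a negative final response belongs to the INVITE transaction.
    if method == "ACK":
        method = "INVITE"
    return (branch, sent_by, method)


# The branch value below is made up for the example.
new_style = "SIP/2.0/UDP 192.168.1.30:5060;branch=z9hG4bK74bf9"
old_style = "SIP/2.0/UDP 195.37.77.100:5040;rport"
print(transaction_key(new_style, "INVITE"))
print(transaction_key(old_style, "INVITE"))
```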

Figure 1-5 shows what messages belong to what transactions during a conversation of two user agents.



Figure 1-5. SIP






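1.6. SIP Dialogs

Transactions that are related to each other, such as the INVITE that establishes a session and the BYE that later tears it down, belong to the same dialog. A dialog represents a peer-to-peer SIP relationship between two user agents. A dialog persists for some time and it is a very important concept for user agents. Dialogs facilitate proper sequencing and routing of messages between SIP endpoints.

Dialogs are identified using Call-ID, From tag, and To tag; messages in which these three identifiers are the same belong to the same dialog. One could also say that a dialog is a sequence of transactions. Figure 1-6 extends Figure 1-5 to show which messages belong to the same dialog.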



Figure 1-6. SIP





Some messages establish a dialog and some do not. This makes it possible to express the relationship between messages explicitly and also to send messages that are not related to any other messages outside a dialog. The latter is easier to implement because the user agent doesn't have to keep any dialog state.

For instance, an INVITE message establishes a dialog, because it will later be followed by a BYE request which will tear down the session established by the INVITE. This BYE is sent within the dialog established by the INVITE.

But if a user agent sends a MESSAGE request, such a request doesn't establish any dialog. Any subsequent messages (even MESSAGE) will be sent independently of the previous one.




1.6.1. Dialogs Facilitate Routing

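We have said that dialogs are also used to route messages between user agents. Let's describe this in a little more detail.

A caller knows the SIP address of the callee, but the address says nothing about the current location of the callee, so the caller does not know to which host to send the request. Therefore the INVITE request is sent to a proxy server.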

The request will be sent from proxy to proxy until it reaches one that knows current location of the callee. This process is called routing. Once the request reaches the callee, the callee's user agent will create a response that will be sent back to the caller. Callee's user agent will also put Contact header field into the response which will contain the current location of the user. The original request also contained Contact header field which means that both user agents know the current location of the peer.

Because the user agents know location of each other, it is not necessary to send further requests to any proxy--they can be sent directly from user agent to user agent. That's exactly how dialogs facilitate routing.

Further messages within a dialog are sent directly from user agent to user agent. This is a significant performance improvement because proxies do not see all the messages within a dialog, they are used to route just the first request that establishes the dialog. The direct messages are also delivered with much smaller latency because a typical proxy usually implements complex routing logic. Figure 1-7 contains an example of a message within a dialog (BYE) that bypasses the proxies.



Figure 1-7. SIP






1.6.2. Dialog Identifiers

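We have already shown that dialog identifiers consist of three parts, Call-ID, From tag, and To tag, but it is not that clear why dialog identifiers are created exactly this way and who contributes which part.

Call-ID is the so-called call identifier. It must be a unique string that identifies a call. A call consists of one or more dialogs: when a proxy forks a request, several user agents may answer it, and each of them establishes a separate dialog with the caller. All such dialogs belong to the same call and have the same Call-ID.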

From tag is generated by the caller and it uniquely identifies the dialog in the caller's user agent.

To tag is generated by a callee and it uniquely identifies, just like From tag, the dialog in the callee's user agent.

This hierarchical dialog identifier is necessary because a single call invitation can create several dialogs and caller must be able to distinguish them.
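
A short sketch shows how one invitation can yield two dialogs. The helper names are hypothetical, the Call-ID and From tag are taken from the INVITE example earlier in the chapter, and the two To tags are made up to stand for two user agents reached through a forking proxy.

```python
def tag_of(header_value):
    """Return the tag parameter of a From or To header field value."""
    for param in header_value.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "tag":
            return value.strip()
    return None


def dialog_id(call_id, from_value, to_value):
    """Combine Call-ID, From tag, and To tag into one dialog identifier."""
    return (call_id, tag_of(from_value), tag_of(to_value))


call_id = "d10815e0-bf17-4afa-8412-d9130a793d96@213.20.128.35"
from_value = '"jiri" <sip:jiri@iptel.org>;tag=76ff7a07-c091-4192-84a0-d56e91fe104f'
# Two answers to the same forked INVITE; the To tags are invented.
answer_a = "<sip:jiri@bat.iptel.org>;tag=aaa111"
answer_b = "<sip:jiri@bat.iptel.org>;tag=bbb222"

dialogs = {
    dialog_id(call_id, from_value, answer_a),
    dialog_id(call_id, from_value, answer_b),
}
print(len(dialogs))  # same call, but the To tags keep the dialogs apart
```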




1.7. Typical SIP Scenarios

This section gives a brief overview of typical SIP scenarios that usually make up the SIP traffic.




1.7.1. Registration

Users must register themselves with a registrar to be reachable by other users. A registration comprises a REGISTER message followed by a 200 OK sent by registrar if the registration was successful. Registrations are usually authorized so a 407 reply can appear if the user didn't provide valid credentials. Figure 1-8 shows an example of registration.



Figure 1-8. REGISTER Message Flow






1.7.2. Session Invitation

A session invitation consists of one INVITE request which is usually sent to a proxy. The proxy immediately sends a 100 Trying reply to stop retransmissions and forwards the request further.

All provisional responses generated by callee are sent back to the caller. See 180 Ringing response in the call flow. The response is generated when callee's phone starts ringing.



Figure 1-9. INVITE Message Flow





A 200 OK is generated once the callee picks up the phone and it is retransmitted by the callee's user agent until it receives an ACK from the caller. The session is established at this point.
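
The retransmission of the 200 OK can be sketched as a timing schedule. The sketch assumes the RFC3261 default timers T1 = 0.5 s and T2 = 4 s: the interval starts at T1, doubles after each retransmission until it reaches T2, and the user agent gives up after 64*T1 without an ACK. The function name is hypothetical and the treatment of the exact end boundary is a simplification.

```python
def ok_retransmit_times(t1=0.5, t2=4.0):
    """Seconds after the first 200 OK at which it would be sent again."""
    times = []
    elapsed = 0.0
    interval = t1
    # Stop once the next retransmission would fall past 64*T1.
    while elapsed + interval <= 64 * t1:
        elapsed += interval
        times.append(elapsed)
        interval = min(interval * 2, t2)
    return times


schedule = ok_retransmit_times()
print(schedule)

# Receiving an ACK simply cuts the schedule short.
ack_arrives_at = 2.0
print([t for t in schedule if t < ack_arrives_at])
```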




1.7.3. Session Termination

Session termination is accomplished by sending a BYE request within the dialog established by the INVITE. BYE messages are sent directly from one user agent to the other unless a proxy on the path of the INVITE request indicated that it wishes to stay on the path by using record routing (see Section 1.7.4).

The party wishing to tear down a session sends a BYE request to the other party involved in the session. The other party sends a 200 OK response to confirm the BYE and the session is terminated. See Figure 1-10, left message flow.




1.7.4. Record Routing

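All requests sent within a dialog are by default sent directly from one user agent to the other; only requests outside a dialog traverse SIP proxies. This approach makes a SIP network more scalable because only a small number of SIP messages hit the proxies.

There are certain situations, however, in which a SIP proxy needs to stay on the path of all further messages. For instance, proxies controlling a NAT box need to stay on the path of BYE requests.

The mechanism by which a proxy can inform user agents that it wishes to stay on the path of all further messages is called record routing. Such a proxy inserts a Record-Route header field, which contains the address of the proxy, into SIP messages. Messages sent within a dialog will then traverse all SIP proxies that put a Record-Route header field into the message.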

The recipient of the request receives a set of Record-Route header fields in the message. It must mirror all the Record-Route header fields into responses because the originator of the request also needs to know the set of proxies.



Figure 1-10. BYE Message Flow (With and without Record Routing)





The left message flow of Figure 1-10 shows how a BYE (a request within the dialog established by the INVITE) is sent directly to the other user agent when there is no Record-Route header field in the message. The right message flow shows how the situation changes when the proxy puts a Record-Route header field into the message.




1.7.4.1. Strict versus Loose Routing

The way record routing works has evolved. Record routing according to RFC2543 rewrote the Request-URI. That means the Request-URI always contained the URI of the next hop (which can be either the next proxy server that inserted a Record-Route header field or the destination user agent). Because of that it was necessary to save the original Request-URI as the last Route header field. This approach is called strict routing.

Loose routing, as specified in RFC3261, works in a slightly different way. The Request-URI is no longer overwritten; it always contains the URI of the destination user agent. If there are any Route header fields in a message, then the message is sent to the URI from the topmost Route header field. This is a significant change: the Request-URI doesn't necessarily contain the URI to which the request will be sent.
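
The two approaches can be compared in a few lines. This is a simplified sketch: the function names and all URIs are made up for the example, and details such as the lr parameter and the handling of mixed strict and loose hops are left out.

```python
def build_strict(target, route_set):
    """Strict routing: the Request-URI is rewritten to the next hop and
    the original target is saved as the last Route header field."""
    if not route_set:
        return target, []
    return route_set[0], route_set[1:] + [target]


def build_loose(target, route_set):
    """Loose routing: the Request-URI always stays the destination."""
    return target, list(route_set)


def next_hop(request_uri, routes, loose):
    """Where the request is actually sent."""
    if loose and routes:
        return routes[0]  # topmost Route header field wins
    return request_uri


target = "sip:bob@1.2.3.4"  # hypothetical contact of the callee
route_set = ["sip:proxy.a.com", "sip:proxy.b.com"]

strict_uri, strict_routes = build_strict(target, route_set)
loose_uri, loose_routes = build_loose(target, route_set)
print(strict_uri, strict_routes, next_hop(strict_uri, strict_routes, loose=False))
print(loose_uri, loose_routes, next_hop(loose_uri, loose_routes, loose=True))
```

In both cases the request goes to the first proxy; the difference is where that address is carried and whether the destination stays visible in the Request-URI.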

Because the transition from strict routing to loose routing would break backwards compatibility and older user agents wouldn't work, it is necessary to make loose routing backwards compatible. The backwards compatibility unfortunately adds a lot of overhead and is often a source of major problems.




1.7.5. Event Subscription And Notification

The SIP specification has been extended to support a general mechanism allowing subscription to asynchronous events.

The mechanism is used mainly to convey information on presence (willingness to communicate) of users. Figure 1-11 shows the basic message flow.



Figure 1-11. Event Subscription And Notification





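A user agent interested in an event sends a SUBSCRIBE request to a SIP server. The server then sends a NOTIFY request to the user agent every time the event to which the user agent subscribed changes.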

Note that the first NOTIFY message in Figure 1-11 is sent regardless of any event that triggers notifications.

Subscriptions--as well as registrations--have limited lifespan and therefore must be periodically refreshed.




1.7.6. Instant Messages

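Instant messages are sent using the MESSAGE request. MESSAGE requests do not establish a dialog. The text of the instant message is carried in the body of the request.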



Figure 1-12. Instant Messages