VOD Online

Saturday, 2 July 2011

RTMP Protocol

====== RTMP Protocol [DRAFT] ======

{{tag>documentation rtmp}}

===== Introduction =====

RTMP is the protocol used by the Flash Player to deliver real-time objects, video, and audio to clients over a binary TCP connection or a polling HTTP tunnel.

The protocol is a container for data packets, which may be [[:documentation:amf|AMF]] or raw audio/video data like that found in [[:flv|FLV]] files.

A single connection is capable of multiplexing many net streams using different channels. Within these channels packets are split up into fixed size body chunks.

RTMPT is essentially an HTTP wrapper around the RTMP protocol, sent as POST requests from the client to the server. Because HTTP connections are not persistent, RTMPT requires clients to poll periodically for updates in order to be notified of events generated by the server or by other clients.

[[http://www.joachim-bauch.de/tutorials/red5/SPEC-RTMPT.html|This document]] by Joachim Bauch describes the RTMPT tunneling protocol as implemented by the Red5 Open Source Flash Server, in the hope of helping other people write software that makes use of RTMPT.

===== Connection =====

Sample ActionScript for connecting and playing a stream:


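  // Video object already placed on the stage (placeholder name)
  var videoInstance:Video = your_video_instance;
  var nc:NetConnection = new NetConnection();
  // Connect to the application "myapp" on the local RTMP server
  var connected:Boolean = nc.connect("rtmp://localhost/myapp");
  // Create a stream on that connection and attach it to the video object
  var ns:NetStream = new NetStream(nc);
  videoInstance.attachVideo(ns);
  // Start playing the named FLV
  ns.play("flvName");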


The default port is 1935.


===== Handshake =====

Client -> Server : Sends the handshake request. This is not a protocol packet but a single byte (0x03) followed by 1536 bytes. The content of these bytes does not seem to be vital for the protocol, but it is not random either. ((It looks random, but there is a pattern in there, although I don't think it carries any data. If you lay the bytes out in an editor (wrapped at about 20 bytes), you see columns running down with bytes in sequence, and what looks like random content between those columns. I spent a night or two scratching my head trying to work it out, but in the end gave up. Five bucks to the person who finds ET's message home hidden in the handshake. I did find a tiny amount of data right at the start: a counter that goes up, which seemed to be the client's uptime (time since system start).)) ((It was suggested on [[http://tech.slashdot.org/comments.pl?sid=1101223&cid=26567551|Slashdot]] that this is most likely a bandwidth test. The bytes are constructed so that they will not compress much in transit, giving a more accurate result. The uptime counter also hints at measuring bandwidth with this data block.))

Server -> Client : Sends the handshake response. This is not an RTMP packet but a single byte (0x03) followed by two 1536-byte chunks (3072 raw bytes in total). The second chunk is a copy of the bytes the client sent in its handshake request. The first chunk can be anything; null bytes work, as the content does not seem to matter.

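Client -> Server : Sends 1536 raw bytes, which are the second 1536-byte chunk of the server-generated handshake. (Adobe's published RTMP specification instead describes this echo as a copy of the first, server-generated chunk, so an implementation may want to accept either.)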

At this point the handshake is complete, and all further packets are RTMP packets.
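The server side of the exchange above can be sketched in a few lines of Python. This is a minimal illustration under the assumptions stated in the text (a 0x03 byte, a free-form first chunk, and an echo of the client's bytes as the second chunk); the function and constant names are hypothetical and not taken from any real server.

```python
HANDSHAKE_SIZE = 1536  # size of each handshake chunk, as described above
VERSION_BYTE = 0x03    # leading byte sent by both sides


def build_server_response(client_request: bytes) -> bytes:
    """Build the handshake response for a raw client handshake request."""
    if len(client_request) != 1 + HANDSHAKE_SIZE:
        raise ValueError("expected 1 + 1536 bytes from the client")
    if client_request[0] != VERSION_BYTE:
        raise ValueError("unexpected leading byte")
    client_chunk = client_request[1:]
    # First chunk: null bytes, since its content reportedly does not matter.
    server_chunk = bytes(HANDSHAKE_SIZE)
    # Second chunk: the client's own 1536 bytes, echoed back unchanged.
    return bytes([VERSION_BYTE]) + server_chunk + client_chunk
```

A real server would then read the client's final 1536 bytes before treating the connection as carrying RTMP packets.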

Client -> Server : send the Connect RTMP packet.

Server -> Client : Server responds

...and so on...





===== RTMP Datatypes =====

| 0x01 | Chunk Size | changes the chunk size for packets |
| 0x02 | Unknown | anyone know this one? |
| 0x03 | Bytes Read | send every x bytes read by both sides |
| 0x04 | Ping | ping is a stream control message, has subtypes |
| 0x05 | Server BW | the servers downstream bw |
| 0x06 | Client BW | the clients upstream bw |
| 0x07 | Unknown | anyone know this one? |
| 0x08 | Audio Data | packet containing audio |
| 0x09 | Video Data | packet containing video data |
| 0x0A - 0x0E | Unknown | anyone know? |
| 0x0F | Flex Stream | Stream with variable length |
| 0x10 | Flex Shared Object | Shared object with variable length |
| 0x11 | Flex Message | Shared message with variable length |
| 0x12 | Notify | an invoke which does not expect a reply |
| 0x13 | Shared Object | has subtypes |
| 0x14 | Invoke | like remoting call, used for stream actions too. |
| 0x16 | FLV Data | [FMS3] Set of one or more FLV tags, as documented on the [[:flv]] page. Each tag will have an 11 byte header - [1 byte Type][3 bytes Size][3 bytes Timestamp][1 byte timestamp extension][3 bytes streamID], followed by the body, followed by a 4 byte footer containing the size of the body. |
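As an illustration of the 11-byte tag header described for type 0x16, here is a hedged Python sketch. It assumes the multi-byte fields are big-endian and that the extension byte carries the upper 8 bits of the timestamp, as in FLV files; neither detail is stated on this page, and the names ''FlvTagHeader'' and ''parse_flv_tag_header'' are made up for the example.

```python
from typing import NamedTuple


class FlvTagHeader(NamedTuple):
    tag_type: int
    body_size: int
    timestamp: int
    stream_id: int


def parse_flv_tag_header(data: bytes) -> FlvTagHeader:
    """Parse [1 Type][3 Size][3 Timestamp][1 extension][3 streamID]."""
    if len(data) < 11:
        raise ValueError("need at least 11 bytes")
    tag_type = data[0]
    body_size = int.from_bytes(data[1:4], "big")
    # Assumption: the extension byte holds the upper 8 bits of the timestamp.
    timestamp = (data[7] << 24) | int.from_bytes(data[4:7], "big")
    stream_id = int.from_bytes(data[8:11], "big")
    return FlvTagHeader(tag_type, body_size, timestamp, stream_id)
```

The body of ''body_size'' bytes and the 4-byte footer would follow immediately after these 11 bytes.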

FIXME luke to add structure for rtmp types

===== Shared Object DataTypes =====


| 0x01 | Connect |
| 0x02 | Disconnect |
| 0x03 | Set Attribute |
| 0x04 | Update Data |
| 0x05 | Update Attribute |
| 0x06 | Send Message |
| 0x07 | Status |
| 0x08 | Clear Data |
| 0x09 | Delete Data |
| 0x0A | Delete Attribute |
| 0x0B | Initial Data |

FIXME joachim can you add details and structure for diff types of shared object








===== RTMP Packet Structure =====

FIXME (documentation started, not yet complete)

RTMP packets consist of a header followed by a variable-length body, which is sent in chunks with a default size of 128 bytes. The header comes in one of four sizes: 12, 8, 4, or 1 byte(s).

The two most significant bits of the first byte of the packet (which also counts as the first byte of the header) determine the length of the header. They can be extracted by ANDing the byte with the mask 0xC0. The possible header lengths are specified in the table below:

| Bits | Header Length |
| 00 | 12 bytes |
| 01 | 8 bytes |
| 10 | 4 bytes |
| 11 | 1 byte |
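The table can be turned directly into a lookup. The following Python sketch masks the first byte with 0xC0, shifts the two bits down, and maps them to a header length; the names are illustrative only.

```python
# Header length in bytes, keyed by the two most significant bits.
HEADER_LENGTHS = {0b00: 12, 0b01: 8, 0b10: 4, 0b11: 1}


def header_length(first_byte: int) -> int:
    """Return the RTMP header length encoded in the first byte."""
    return HEADER_LENGTHS[(first_byte & 0xC0) >> 6]
```

For example, a first byte of 0x43 has the top bits 01, so the header that follows is 8 bytes long.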

The shorter header versions omit fields. An omitted field is assumed to have the same value as the last time it was explicitly included in a header.

A full 12-byte header breaks down as follows:

The first byte holds the header size and the object id. The first two bits are the size of the header and the following 6 bits are the object id, which allows at most 64 distinct object ids. This byte is always sent, whatever the size of the header.

The next three bytes are the time stamp. This is a big-endian integer and it is sent whenever the header size is 4 bytes or larger.

The next three bytes are the length of the object body, as a big-endian integer. This length counts only the body (typically AMF) carried in the RTMP packet, not any RTMP headers, so strip the RTMP headers before comparing byte counts against it. These bytes are sent whenever the header size is 8 bytes or more.

The next single byte is the content type. The content types are listed in another section of this page. This byte is only included when the header is 8 bytes or longer.

The final 4 bytes of the header are the stream id. This is a 32-bit integer that is little-endian encoded. These bytes are only included when the header is a full 12 bytes. (As mentioned in Mick's Breakdown of RTMP linked below, it is possible that the stream id defines which NetStream/NetConnection object is the source or target of the message.)
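Putting the field descriptions together, a header parser might look like the Python sketch below. It follows the layout given above (big-endian timestamp and length, little-endian stream id) and returns only the fields that are actually present; the function name and dictionary keys are assumptions made for the example.

```python
import struct
from typing import Dict


def parse_header(data: bytes) -> Dict[str, int]:
    """Parse one RTMP header, returning only the fields it carries."""
    if not data:
        raise ValueError("empty input")
    size = {0: 12, 1: 8, 2: 4, 3: 1}[data[0] >> 6]
    if len(data) < size:
        raise ValueError("truncated header")
    fields = {"header_size": size, "object_id": data[0] & 0x3F}
    if size >= 4:
        fields["timestamp"] = int.from_bytes(data[1:4], "big")
    if size >= 8:
        fields["length"] = int.from_bytes(data[4:7], "big")
        fields["content_type"] = data[7]
    if size == 12:
        # The stream id is the one little-endian field in the header.
        (fields["stream_id"],) = struct.unpack("<I", data[8:12])
    return fields
```

A 1-byte header therefore yields nothing but the object id, and every other field has to come from earlier headers.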

As mentioned above, in cases where the header is not a full 12 bytes, any missing fields are assumed to be unchanged from the last header sent with this object ID. For example, if a four-byte header is sent without a length field, and the last object that was sent for the current stream has terminated (i.e., there are zero bytes remaining in the last object's transmission), then it should be assumed that the length of the object that follows will be the same as the length of the object that was last transmitted. So if a packet is sent whose payload is 32 bytes, and then immediately afterwards another packet is transmitted with the same object ID but without a length field, it is assumed that the length of this next packet is also 32 bytes.
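One way to model this inheritance is to remember the last known fields per object ID and overlay whatever the new header explicitly carries. The Python sketch below is only an illustration of that idea; ''HeaderCache'' and its field names are hypothetical.

```python
from typing import Dict


class HeaderCache:
    """Remembers the most recent header fields for each object ID."""

    def __init__(self) -> None:
        self._last: Dict[int, Dict[str, int]] = {}

    def resolve(self, object_id: int, explicit: Dict[str, int]) -> Dict[str, int]:
        """Fill in fields missing from a short header with earlier values."""
        merged = dict(self._last.get(object_id, {}))
        merged.update(explicit)  # explicitly sent fields always win
        self._last[object_id] = merged
        return dict(merged)
```

With this, a 32-byte packet sent with a full header followed by a four-byte header on the same object ID resolves to a length of 32 again, matching the example above.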

Also, check out the document below. It's mostly accurate, although where it talks about AMF it really means RTMP. AMF is used in RTMP, but it's not covered in the following document.

Mick's Breakdown of RTMP: [[http://osflash.org/_media/rtmp_spec.jpg|JPG]] / [[http://www.acmewebworks.com/Downloads/openCS/TheAMF.pdf|PDF]]




===== Streaming =====

FIXME need more infos

For a basic publish cycle, this is what happens:

Client->Server : sends a CreateStream request //( is it a single RTMP packet ?)//

The createstream request is a single AMF0 function call (remote method invocation) whose high-level equivalent function signature would be "createstream(double ClientStream, NULL)". The ClientStream variable starts at 1 and is incremented by one for every stream that is created in a connection. It is NOT used by the server to route data for multimedia streams.

Server->Client : sends a response with a streamIndex number

The response the server sends back is also an AMF0 call, this time targeted at the client-side function "_result(double ClientStream, NULL, double ServerStream)". The ClientStream value is the one the client supplied in its createstream request, and the ServerStream value is generated by the server to identify the stream over which data will be routed. Most servers that work with Flash clients appear to simply increment a value starting from one, just as the client does, but the ClientStream does not have to match the ServerStream.
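To make the request and response concrete, here is a hedged Python sketch that encodes both calls as AMF0 values. It assumes the standard AMF0 markers (0x00 for a number, 0x02 for a string, 0x05 for null) and the camel-case method name "createStream" used by Flash clients; the helper names are invented for the example, and the bytes would still need to be wrapped in an RTMP packet of type 0x14.

```python
import struct

AMF0_NUMBER = 0x00
AMF0_STRING = 0x02
AMF0_NULL = 0x05


def amf0_string(value: str) -> bytes:
    """Marker, 16-bit big-endian length, then the UTF-8 bytes."""
    raw = value.encode("utf-8")
    return bytes([AMF0_STRING]) + struct.pack(">H", len(raw)) + raw


def amf0_number(value: float) -> bytes:
    """Marker followed by a 64-bit big-endian double."""
    return bytes([AMF0_NUMBER]) + struct.pack(">d", value)


def encode_create_stream(client_stream: float) -> bytes:
    """createStream(ClientStream, NULL) as sent by the client."""
    return (amf0_string("createStream") + amf0_number(client_stream)
            + bytes([AMF0_NULL]))


def encode_create_stream_result(client_stream: float, server_stream: float) -> bytes:
    """_result(ClientStream, NULL, ServerStream) as sent by the server."""
    return (amf0_string("_result") + amf0_number(client_stream)
            + bytes([AMF0_NULL]) + amf0_number(server_stream))
```

Note that both stream values travel as doubles here, even though they are small integers.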

Client->Server : does a publish //(what does it mean in this context?)//

Yet another AMF0 call, this time to a server method "publish(double 0, NULL, "resource_name", "options")". The first variable has always been zero in the sessions I have captured, but I have no idea what it is intended to reflect. Ditto for the NULL in the second parameter position. The resource_name parameter is a string that identifies the resource to which the stream is to be attached. For FMS-like servers, this appears to simply be a file name in a directory on the disk, although for real-time video stream reflection it is probably irrelevant as long as the server can associate it between connections.

Client->Server : send the audio video packets (the packets are sent from the source as indicated on the streamIndex via the same channel as the publish request)

Now, this is what's really confusing about this god-awful RTMP protocol. The authors of the protocol were either intentionally attempting to obfuscate it to prevent compatibility-related reverse engineering with packet sniffers (which is all that is legal), or they were simply using some sort of higher-level representation for the protocol that resulted in very inconsistent low-level data going out the wire (much more likely). The way that AV data is associated with a stream once a publish() command has succeeded, for example, is by virtue of an integer field in the RTMP packet header. That's where the hilarity begins.

The server tells the client what stream to use by sending a double-precision floating-point value in its _result() call to the client, as described above. Fine so far. The client, for live streams at least, then sends an RTMP-level set_buffer(S, T) command back to the server, where I'm using 'S' to represent the stream being targeted and 'T' as the buffering time desired. Both of these parameters are sent as 32-bit big-endian integers.

Finally, the server replies with an RTMP-level reset(S) command, also using a big-endian integer to identify the stream, and an AMF0-formatted status message of the following form:

[C:-:S](RMI) onStatus(0, NULL, struct {
"level" => "status",
"code" => "NetStream.Publish.Start",
"description" => " is now published.",
"clientid" =>
});

where I'm using the shorthand [C:T:S] to indicate the channel, time-index and stream values in the RTMP header, respectively. Here the stream identifier is sent in the header instead of as a call parameter (this is also how actual raw AV data is tagged). Guess what: that stream field in the header is a LITTLE-ENDIAN 32-bit integer.

So, the RTMP protocol uses 64-bit floats, 32-bit little-endian integers, and 32-bit big-endian integers to identify the exact same stream in three different contexts, in back-to-back messages sent between client and server.
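To see how different the three representations are on the wire, the Python sketch below packs the same stream number each way. The function names are illustrative, and the double is shown without the AMF0 type marker that would precede it in a real call.

```python
import struct


def stream_as_double(stream: int) -> bytes:
    """64-bit big-endian float, as carried in the _result() call."""
    return struct.pack(">d", float(stream))


def stream_as_big_endian(stream: int) -> bytes:
    """32-bit big-endian integer, as in set_buffer(S, T) and reset(S)."""
    return struct.pack(">I", stream)


def stream_as_little_endian(stream: int) -> bytes:
    """32-bit little-endian integer, as in the RTMP header stream field."""
    return struct.pack("<I", stream)
```

For stream 1 these give ''3f f0 00 00 00 00 00 00'', ''00 00 00 01'' and ''01 00 00 00'' respectively, so a packet trace shows three unrelated-looking byte patterns for one stream.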

If anyone is still interested, I may write up an annotated packet trace and post a PDF of it here. I wound up going back to the protocol analyzer because it turned out to be easier than trying to read the RED5 source :-) I'm just not a Java aficionado--actually, I'm writing up a real-time, media-only reflector in Erlang for an application that I cannot make public, but I am willing and able to provide any information I glean about RTMP back to the community (although it may not be much, in contrast to what RED5 appears to be capable of doing, which is why it was easier to analyze the wire rather than read the source). --ovrlod3

The PDF would be great to help on my Erlang version of this -SimpleEnigma

//(this needs a more detailed description I think, maybe adding an introduction explaining in general how streams are identified and that RTMP allows stream interleaving over the same connection)//

===== Useful Links =====

- [[http://wiki.gnashdev.org/wiki/index.php/RTMP|RTMP on Gnash dev wiki]]
