[MUD-Dev] Networking architecture overview

Brian Hook brianhook at pyrogon.com
Wed Oct 17 10:56:01 New Zealand Daylight Time 2001

So I finally called up a friend of mine (John Carmack) that really,
really understands real-time networking and asked him to core dump on me
how he has implemented his networking.  The answer is not what I
expected, but it's a damn elegant solution.

His original implementation of networking, back in 1995 (for QuakeTest),
used TCP.  This was actually fine for LAN play, because the largish
packets (8K) wouldn't fragment on a LAN, but would fragment on the net.

His next iteration involved using UDP with multipled reliable and
unreliable data, pretty much what we've discussed on here.  Different
messages had different requirements, and the network layer would manage
this.  However there were always difficult to find bugs that would crop
up as a result of mixing reliable and unreliable data, e.g. guaranteed
messages that referenced entities that were altered through unreliable

The final iteration (Quake3), where he feels he pretty much "got it
right", uses a radically different approach.  There are no reliable
packets of any kind.  In fact, there is only a single packet type -- the
client's necessary game state.  The server sends sequenced game state
updates that are delta compressed from the last acknowledged game state
the client received.  This "just works".  Dropped packets are handled
very gracefully, and specific commands are never acknowledged, just last
known game states.  Obviously the client is sending traffic back to the
server ACKing the last game state received.

The client's receive logic is basically:

if ( newState.sequence < lastState.sequence )
else if ( newState.sequence > lastState.sequence )
   lastState = deltaUncompress( lastState, newState );
   ackServer( lastState.sequence );

The client then extrapolates movement and whatever other information it
needs based on the last game state it received.

On the server side, it's simple as well:

deltaCompressState( client.lastAckState, newState, &compressedState );
sendToClient( client, compressedState );

So the server never sits around waiting for an acknowledgement.  As a
result, latencies are much lower than if you have code that sits there
twiddling its thumbs waiting for a synchronous ACK.

There are two downsides to this implementation.  The big one is that it
soaks more bandwidth, because the server is constantly pumping new state
to the client instead of just sending new state when it doesn't get an
ACK.  Also, the amount of data it's sending grows with the number of
dropped packets since the delta grows as a result.  

The other downside is that the server has to do a lot of buffering,
specifically of the last acked state to the client (so it can delta
compress) along with a list of all the previous states it has sent to
the client (back to the last acked one).  This is necessary so it can
rebase its delta comparisons on any of the game states that the client
has acked.

For example, let's say the client last ACKed sequence 14.  The server is
sending out new game states with incrementing sequences.  It may be up
to sequence 20, when it gets an ack from the client that it received
sequence 17.  The server has to have the state that was sent as part of
sequence 17 in order to rebase the delta compression, so if game states
are large enough or the buffers are long enough, this can grow fairly
high (to the tune of several hundred K per client).

Addressing a question from someone else about "are there advantages to
using multiple ports?".  There is one significant advantage that I don't
think was mentioned, and that's that the OS buffers incoming data _per
port_.  So if you have a chance of a port buffer overflow, then multiple
ports may not be a bad idea.  

Speaking of port buffer overflows, John doesn't think this is a problem.
People that spin a second thread just to block on incoming ports are
probably just doing extra work that doesn't need to be done.
Effectively a thread that pumps the data ports is duplicating the exact
same work the network stack is doing in the OS.  Instead, John just
pumps the network stack at the top of the event loop.  Goes to show that
brute force sometimes "just works" =)  On a listen server (i.e. you're
hosting a server on your client), where frame times can be several dozen
milliseconds, there is a chance that you'll get a buffer overrun.  In
that case there are some options in Q3 to minimize buffer sizes, etc. to
reduce the likelihood of a buffer overrun.

One nice thing about his networking architecture, however, is that in
the case of a buffer overrun -- it just keeps working.  There is NO
difference to his client and server between a dropped packet and a
buffer overrun; as a matter of fact, this may actually be masking some
real buffer overruns, but so far he's pretty sure that he's not even
come close to hitting an overrun situation.  Of course, the dynamics of
an FPS are different than those of an MMOG (fewer clients, much higher
bandwidth per client), but it's interesting nonetheless.

One interesting note on NAT: apparently some routers will randomly
switch the client's port when behind a NAT.  This is very rare, but
causes lost connections "randomly" for many users.  It took forever to
track this down, but once it was discovered he solved it by having a
client randomly generate a unique client-id (16-bits) at connection time
that is appended to every incoming packet.  That way multiple clients
with the same IP can work just fine, even if their ports get re-assigned
mid-session.  Humorously enough, he never bothered handling the (so rare
as to be insignificant) case where two clients behind the same firewall
randomly generate the exact same ID.

Aside from application level delta compression, Q3 also does per-packet
Huffman compression.  Works great, simple to implement, and the upside
to something more aggressive is very low.

He has done some half-hearted work on encryption, but basically
app/stream-level encryption is pointless because of the sophistication
of hackers.  In the future he'll probably rely on higher level stuff (a
la Punkbusters for CounterStrike) instead of cute bit-twiddling.

Finally, I asked about packet size.  He thinks the notion of a 512-byte
optimal MTU is pretty much at least 5 years out of date.  He sees almost
no problems with a max packet size of 1400 bytes, and fragmentation just
isn't an issue.  During the course of typical activity, the actual
payload size is often smaller than the UDP header size (!), and even
during hectic activity the packet size is on the order of several
hundred bytes.  The large packets (up to 1.4K) are typically when a
large amount of game state must be sent at once, specifically at
connection time when there is no state to delta against and when there
is a lot of startup initialization going on.

So this architecture basically obviates the need to even have a
discussion about UDP vs. TCP and unreliable vs. reliable and
out-of-order vs. in-order.  It's all unreliable UDP delta compressed
against the last known state.

Damn that's smooth.


MUD-Dev mailing list
MUD-Dev at kanga.nu

More information about the MUD-Dev mailing list