ceo at grexengine.com
Mon Jun 30 11:21:41 New Zealand Standard Time 2003
Peter "Pietro" Rossmann wrote:
> As I wrote earlier, I have been thinking about an architecture
> that would scale "pretty well", or almost without limit.
> I have drawn this basic design (very initial, many subsystems are
> Functionally, the client connects to a Connection Manager (CM), which
> in turn connects to the Cell Handler (CH). The CM can send the
> client a handover request, telling it to connect to another CM. This
> way, CM redundancy is achieved. Ideally, the client would have
> connections open to 2 CMs at all times, but I think this would be
> overkill :)
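For concreteness, the CM handover described above might be sketched like this. It is a minimal Python sketch, not Pietro's design: the names `ConnectionManager`, `Client` and `request_handover` are hypothetical, there is no real networking, and it assumes a "make-before-break" ordering in which the client attaches to the new CM before the old one forgets it.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass(eq=False)
class ConnectionManager:
    """Hypothetical Connection Manager (CM): tracks the clients it serves."""
    name: str
    clients: Set[str] = field(default_factory=set)


@dataclass(eq=False)
class Client:
    """Hypothetical client holding one CM connection at a time."""
    client_id: str
    cm: Optional[ConnectionManager] = None

    def connect(self, cm: ConnectionManager) -> None:
        cm.clients.add(self.client_id)
        self.cm = cm

    def handle_handover(self, target: ConnectionManager) -> None:
        # Make-before-break (an assumption): attach to the new CM first,
        # then detach from the old one, so the client is never unattached.
        old = self.cm
        self.connect(target)
        if old is not None and old is not target:
            old.clients.discard(self.client_id)


def request_handover(source: ConnectionManager, client: Client,
                     target: ConnectionManager) -> None:
    """Called by the client's current CM to move the client to `target`."""
    if client.cm is not source:
        raise ValueError("only the client's current CM may request a handover")
    client.handle_handover(target)


if __name__ == "__main__":
    cm_a, cm_b = ConnectionManager("cm-a"), ConnectionManager("cm-b")
    player = Client("player-1")
    player.connect(cm_a)
    request_handover(cm_a, player, cm_b)
    print(player.cm.name, sorted(cm_a.clients), sorted(cm_b.clients))
    # prints: cm-b [] ['player-1']
```

Note that even this toy version already has to decide who is allowed to initiate a handover and in which order the two connections change; those are exactly the details the rest of this reply is about.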
There's nothing *wrong* with your approach (although even at this
high level there are a couple of alternatives you could go with; I'm
not too keen personally on clients having additional open
connections at all times, but it's perfectly workable), but one
problem with distributed-system design is that the devil's in the
details. There'll be nothing for people to comment on - for or against -
until you've delved quite a lot deeper.
The primary things I'd be evaluating if you had a more detailed
design would be:
1 - How much each stage of processing adds to the mean RTT
(round trip time). Is the RTT heading towards taking a long
time? (if so, you need to cut stages, or find a way of offering
alternative paths through the system, so that the average path is
shorter).
2 - How it copes with each failure-mode. Bear in mind that any
computer anywhere could "fail" at any moment (and also that any
internet-connected client could appear to fail for an
arbitrarily long time without actually doing so, anything up to
2 minutes). So, it becomes critically important where you are
storing state in the individual machines - and even more
critical if you start talking about load-balancing.
3 - Protocols (high-level ones, not byte-stuffing
stuff)...what's the algorithm (for example) for load balancing?
How does it initiate, what data does it send, does it require
the servers to synchronise in advance, or to always be
synchronised (so that all the state is shared)? Can it cut in
partway through a request, or are outstanding requests lost? The
network overhead for a permanently-synchronized load-balancing
pair can be massive...
4 - Overload: unless you only want to support very low numbers
of players (*low for a distributed system*), one of the critical
questions is "what happens when the system gets
overloaded?". Does it just fall over and die? Can it corrupt
data in the process? Does it gracefully slow to a halt, but
actually remain running (albeit very, very, very slowly)?
Obviously, the last of these is the "ideal" case; all (!) you
have to do is remove some of the connected clients (recall that
this is compared to it *crashing*, so forcibly disconnecting
clients is not so bad :)), and the server will carry on happily
without further intervention.
5 - ...and what's the *intrinsic* overhead of your system, and
how does it scale with additional players? If you've got any
algorithm anywhere in your system that is worse than log-linear,
you've got a problem. If you've got any that are worse than
quadratic, you most likely have a very BIG problem. But even
linear-cost algorithms can be devastating, if the constant of
proportionality is too high (what's the RAM consumption per
client? You only have limited RAM, no matter what you
do. Processor time is infinite (it may take a long time to
execute, but you'll get there eventually), but RAM isn't).
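To make point 1 above concrete, here is a rough Python model of mean RTT. It assumes, purely for illustration, that each stage costs its one-way hop latency twice plus its processing time once, and that traffic splits between alternative paths in known proportions. The stage figures (including the "db" stage behind the CH) are made up, not measurements.

```python
from typing import List, Tuple

# (one-way hop latency into the stage, processing time in the stage), in ms.
Stage = Tuple[float, float]


def path_rtt_ms(stages: List[Stage]) -> float:
    """RTT of one path: each hop is crossed twice (request and reply),
    and each stage processes the request once."""
    return sum(2 * hop + proc for hop, proc in stages)


def mean_rtt_ms(paths: List[Tuple[float, List[Stage]]]) -> float:
    """Traffic-weighted mean RTT; `paths` is a list of (share of requests, stages)."""
    total_share = sum(share for share, _ in paths)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("path shares must sum to 1")
    return sum(share * path_rtt_ms(stages) for share, stages in paths)


if __name__ == "__main__":
    cm = (40.0, 2.0)   # internet hop to the CM dominates (illustrative)
    ch = (1.0, 5.0)    # LAN hop to the Cell Handler
    db = (1.0, 10.0)   # hypothetical persistence stage behind the CH
    full_path = [cm, ch, db]
    short_path = [cm, ch]  # e.g. answered by the CH without touching db
    print(path_rtt_ms(full_path))                                        # 101.0
    print(round(mean_rtt_ms([(0.8, short_path), (0.2, full_path)]), 1))  # 91.4
```

With these invented numbers, sending 80% of requests down the shorter path cuts the mean from 101 ms to about 91 ms; the single internet hop still dominates, which is why adding server-side stages is cheap only up to a point.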
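For point 2 above, one common way to cope with clients that only *appear* to have failed is to distinguish "suspect" from "presumed dead", and to keep server-side state until the latter. A minimal Python sketch follows. The thresholds are assumptions (the 120 s figure just echoes the "up to 2 minutes" in point 2), and timestamps are passed in so the logic can be checked without a real clock.

```python
from enum import Enum


class PeerState(Enum):
    ALIVE = "alive"
    SUSPECT = "suspect"              # silent, but may well still be connected
    PRESUMED_DEAD = "presumed_dead"  # silent long enough to reclaim its state


SUSPECT_AFTER_S = 10.0   # hypothetical threshold
DEAD_AFTER_S = 120.0     # echoes the "up to 2 minutes" in point 2


def classify(last_heard_s: float, now_s: float,
             suspect_after_s: float = SUSPECT_AFTER_S,
             dead_after_s: float = DEAD_AFTER_S) -> PeerState:
    """Classify a peer by how long it has been silent."""
    silence = now_s - last_heard_s
    if silence < suspect_after_s:
        return PeerState.ALIVE
    if silence < dead_after_s:
        return PeerState.SUSPECT
    return PeerState.PRESUMED_DEAD


def may_discard_state(last_heard_s: float, now_s: float) -> bool:
    # Only free a client's session state once it is presumed dead; a SUSPECT
    # client that reappears must find its state exactly where it left it.
    return classify(last_heard_s, now_s) is PeerState.PRESUMED_DEAD
```

The awkward consequence is that every machine holding per-client state has to hold it for the full dead-after window, which feeds directly into the RAM question in point 5.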
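The closing remark of point 3 above is easy to put rough numbers on. The Python sketch below compares a permanently synchronised pair (every state change mirrored to the partner) with transferring state only when clients are actually moved. The update rate, message sizes and player counts are illustrative assumptions, and protocol headers and retransmissions are ignored.

```python
def continuous_sync_bytes_per_s(players: int, updates_per_s: float,
                                bytes_per_update: int) -> float:
    """Bandwidth needed to keep a partner server permanently in sync."""
    return players * updates_per_s * bytes_per_update


def on_demand_transfer_bytes(players_moved: int, state_bytes_per_player: int) -> int:
    """One-off cost of shipping full state for the clients being rebalanced."""
    return players_moved * state_bytes_per_player


if __name__ == "__main__":
    mirror = continuous_sync_bytes_per_s(2000, 10, 50)
    burst = on_demand_transfer_bytes(200, 20_000)
    print(mirror)  # 1000000 bytes every second, for as long as the pair runs
    print(burst)   # 4000000 bytes, once per rebalancing
```

Under these made-up numbers, on-demand transfer is cheaper whenever rebalancing happens less often than every four seconds. The price is that requests in flight for a moving client must be forwarded or retried, which is the "are outstanding requests lost?" question from point 3.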
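The "ideal" overload behaviour in point 4 above, shedding clients until the server is back within capacity, could look like the Python sketch below. The policy (disconnect the most expensive clients first, so as few as possible are dropped) and the per-client cost figures are assumptions; a real server might prefer to drop idle or most-recently-connected clients instead.

```python
from typing import Dict, List


def clients_to_shed(load_by_client: Dict[str, float], capacity: float) -> List[str]:
    """Return ids to disconnect so the remaining load fits within `capacity`.

    Greedy: drop the most expensive clients first (ties keep insertion order,
    since Python's sort stays stable with reverse=True).
    """
    total = sum(load_by_client.values())
    shed: List[str] = []
    for client_id, cost in sorted(load_by_client.items(),
                                  key=lambda item: item[1], reverse=True):
        if total <= capacity:
            break
        shed.append(client_id)
        total -= cost
    return shed


if __name__ == "__main__":
    loads = {"a": 5.0, "b": 1.0, "c": 3.0, "d": 2.0}
    print(clients_to_shed(loads, capacity=7.0))  # ['a']
    print(clients_to_shed(loads, capacity=2.5))  # ['a', 'c', 'd']
```

The important property is that the server decides *who* goes, rather than letting queues grow until something corrupts or crashes; the greedy choice itself is deliberately simple.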
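The arithmetic behind point 5 above is worth doing early. The Python sketch below shows (a) how a fixed RAM budget caps the client count however fast the CPU is, and (b) how much more work is done when the player count doubles, for different complexities. The 512 MiB / 64 MiB / 200 KiB figures are illustrative assumptions, not numbers from any particular server.

```python
import math


def max_clients(ram_bytes: int, fixed_overhead_bytes: int, per_client_bytes: int) -> int:
    """Hard ceiling on concurrent clients imposed by memory alone."""
    return (ram_bytes - fixed_overhead_bytes) // per_client_bytes


def doubling_factor(cost, n: int) -> float:
    """Ratio of work at 2n players to work at n players, for a cost function."""
    return cost(2 * n) / cost(n)


if __name__ == "__main__":
    MiB, KiB = 1024 * 1024, 1024
    print(max_clients(512 * MiB, 64 * MiB, 200 * KiB))  # 2293
    n = 1024
    for label, cost in [("n log n", lambda k: k * math.log2(k)),
                        ("n^2", lambda k: k ** 2),
                        ("n^3", lambda k: k ** 3)]:
        print(label, round(doubling_factor(cost, n), 2))
    # prints: n log n 2.2, then n^2 4.0, then n^3 8.0
```

Doubling the players roughly doubles an n log n cost but multiplies a quadratic one by four and a cubic one by eight, and no amount of patience gets you past the memory ceiling in the first line.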
Ahem. I've glossed over a few points there, and made some gross
generalizations, but I hope it will give you a flavour of the
questions you need to ask (and answer). This is off the top of my
head ... I've probably omitted something important, so don't take
it as a thorough checklist.
MUD-Dev mailing list
MUD-Dev at kanga.nu