workload simulation (was: Re: [MUD-Dev] MMORPG/MMOG Server design)
adam at grexengine.com
Sat Feb 22 23:33:11 New Zealand Daylight Time 2003
Mike Shaver wrote:
> All this (fascinating) talk about server design leads me to ask a
> question I never get tired of:
> Are large-scale server designers doing anything to simulate
> workload, in order to test their algorithmic changes and
> hardware capabilities? I know that the Shadowbane server crew
> have been hard at work on improvements to their server
> architecture, with very impressive reduction in lag
> (esp. timer-based), so maybe they have some tips to share. How
> to make sure that a given change to the server won't reintroduce
> those problems, or similar ones? Do you just try to get a few
> thousand testers logged in at once and see how she holds up, or
> is there some logging+replay system used to verify new builds?
> I suppose, actually, that this question extends to other parts of
> game design. Are people running "simulations of their
> simulations", to validate play-balance changes? Again to pick on
> the Shadowbane guys, a recent build saw one power (a
> health-draining/transfer spell) spiral way out of balance when the
> set of attributes that affected was changed. Do designers
> generally have systems set up to compute such effects before
> play-test begins ("given these character stats taken from our
> player base, what damage/mana cost/hit-rate/etc. will we have for
> Power X after these changes?")? Any best-practices to share?
I believe you're talking about three problems:
- 1. simulating a "heavily loaded" system (to run continuously in
parallel with all other simulations)
- 2. stimulating (note the extra letter) the "emergent behaviour"
problems that only occur due to the complex interaction of
- 3. tracking, logging, and examining that emergent behaviour to
try and find out why undesirable effects happened.
Number one is, well, either "very hard" or "pretty easy". If you
need to test anything that depends upon the realism of the
connections (e.g. testing your front-facing socket-listening stuff),
then it's hard. Good example of a problem that I know some games
failed to discover (because they didn't do this kind of testing
until too late):
1. Some event causes a lot of clients to disconnect
simultaneously (e.g. a "hiccup" on a minor backbone connection)
2. They all try to reconnect simultaneously.
3. Every Server involved receives a DDOS attack (in effect) and
at least one falls over - they had provisioned for a sensible
value for "maximum peak connection attempts in one second"; this
was way above sensible.
(In this example, they could also have avoided the problem
altogether by doing some failure-mode analysis, but then that's
what testing is there for :).
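
To put rough numbers on the reconnect storm in that example, here is
a minimal Python sketch. Everything in it is an assumption made for
illustration: the client count, the provisioned limit, the 60-second
window, and the helper name `peak_attempts_per_second`. The jittered
case is only there for contrast; it is not something the example
above says any game actually did.

```python
import random
from collections import Counter

def peak_attempts_per_second(n_clients, jitter_window_s, seed=0):
    """Return the worst one-second bucket of reconnect attempts.

    Each client waits a uniformly random delay of up to
    jitter_window_s seconds before reconnecting; a window of 0
    means every client retries immediately.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    buckets = Counter()
    for _ in range(n_clients):
        delay = rng.uniform(0, jitter_window_s) if jitter_window_s > 0 else 0.0
        buckets[int(delay)] += 1  # bucket attempts by whole second
    return max(buckets.values())

PROVISIONED_PER_SECOND = 500   # assumed "sensible" provisioning figure
CLIENTS_DROPPED = 10_000       # assumed size of the backbone hiccup

storm = peak_attempts_per_second(CLIENTS_DROPPED, jitter_window_s=0)
spread = peak_attempts_per_second(CLIENTS_DROPPED, jitter_window_s=60)
print("no jitter  :", storm, "attempts in the worst second")
print("60s jitter :", spread, "attempts in the worst second")
print("storm exceeds provisioning:", storm > PROVISIONED_PER_SECOND)
```

With no delay, every dropped client lands in the same second, which
is the "DDOS in effect" case described above.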
In the easy case, there are many ways of simulating heavy load on a
server, on a sliding scale from "get x thousand machines to connect
only once simultaneously" through to "get 1 machine to open x
thousand connections simultaneously" - although for large X the
latter case is not feasible due to the client-simulator not being
able to exert much actual in-game load. This isn't a problem: you
just don't go below e.g. 5 client-simulators and x/5 thousand
connections each.
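
The split between simulator machines and connections per machine can
be sketched as below. This is only an illustration: the function
name `plan_simulators` and the per-machine ceiling of 2000
connections are assumptions, and the floor of 5 machines is just the
"e.g. 5" figure from above.

```python
def plan_simulators(total_connections, min_simulators=5, max_per_simulator=2000):
    """Split total_connections across client-simulator machines.

    Uses at least min_simulators machines, and more if needed so that
    no machine holds more than max_per_simulator connections.
    Returns a list with one connection count per machine.
    """
    needed = -(-total_connections // max_per_simulator)  # ceiling division
    machines = max(min_simulators, needed)
    base, extra = divmod(total_connections, machines)
    # The first `extra` machines take one more connection each.
    return [base + 1 if i < extra else base for i in range(machines)]

plan = plan_simulators(10_000)
print(len(plan), "simulators, connections each:", plan)
```

The per-machine ceiling is the number you would have to measure: it
is the point past which one simulator can no longer exert realistic
in-game load on each of its connections.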
Number 2, AFAIAA, is an NP-complete problem. In other words, if
anyone can come up with a solution that is more efficient than "try
every possible interaction in turn" then they'll win a Nobel prize
for mathematics. No, seriously :). [All Nobel-wannabes please note:
this isn't quite true unless the solution is a generic one; if your
solution depends on game-specific knowledge to reduce the workload,
I'm afraid you won't win anything].
In effect, if there's a better way of testing number 2, it's only
because of a particular feature of the design document of *your*
game.
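
For a feel of what "try every possible interaction in turn" costs,
here is a toy Python sketch. The effect names, the multipliers, and
the balance cap are all invented, and real interactions are far
messier than multiplying numbers together. It only shows that the
generic approach checks every subset, so the work doubles with each
effect you add.

```python
from itertools import combinations
from math import prod

# Hypothetical damage multipliers; names and numbers are invented.
EFFECTS = {
    "drain_spell": 1.5,
    "int_buff": 1.4,
    "spirit_buff": 1.3,
    "focus_item": 1.2,
    "weakness_debuff": 1.25,
}
BALANCE_CAP = 3.0  # assumed design limit on the combined multiplier

def find_violations(effects, cap):
    """Try every non-empty subset of effects; collect those over the cap."""
    violations = []
    checked = 0
    names = sorted(effects)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            checked += 1
            total = prod(effects[name] for name in combo)
            if total > cap:
                violations.append((combo, total))
    return checked, violations

checked, violations = find_violations(EFFECTS, BALANCE_CAP)
print("combinations checked:", checked)  # 2**n - 1 for n effects
print("combinations over the cap:", len(violations))
```

Five effects is 31 subsets; fifty effects is about 10^15, which is
where game-specific knowledge starts to be the only way out.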
Number 3 is, I believe, similar to number 2 (although I'm not so
sure on this one: it's a bit less intrinsically obvious).
So. In conclusion, if you want such tools they are in fact very easy
to write :). But they're going to have to do the slowest possible
search of all possible outcomes (and probably generate terrifying
amounts of data); there are no "clever" improvements possible. Unless
there's a specific speedup available to your game in particular...
The GrexEngine has some decent tools for this - but they
cheat. Essentially, part of the development-environment has a
runtime component that has to also be embedded into the main
runtime-system. Now, because this component was around at
development-time, it's able to take advantage of special knowledge
particular to your game (as described above as a possible short-cut)
and use that knowledge to simplify the search process.
The "knowledge of the game" that the component has comes in two
forms:
- Deduced knowledge. (development-time optimizations;
"compile"-like processes which essentially pre-assess various
data and behaviours, and store a summary; etc)
- Human-dictated knowledge. (at development time, the tool
either suggests constraints to the developer, or the developer
adds his/her own constraints, feeding into the deduced
knowledge. These constraints are things that are not
mathematically deducible, but a human can predict.)
An example of a constraint above might be "no player can ever move
directly upwards, except when on a ladder". For many games, this
might be an intrinsically obvious constraint, e.g. for a
maze-searching game it's actually undesirable for the constraint to
be broken (but it's not mathematically deducible; the tool needs a
designer's decision). Of course, it completely disallows jumping, so
for a 3D-platform game it would completely suck.
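
As a rough picture of how a human-dictated constraint like the
ladder rule shrinks the search, here is a small Python sketch. It is
not how the GrexEngine tools work internally; the 4x4x4 world, the
single ladder square, and the names `allowed` and
`count_transitions` are all assumptions for the sake of the example.

```python
from collections import deque

SIZE = 4                   # assumed 4x4x4 toy world
LADDERS = {(0, 0, 0)}      # assumed: squares from which climbing is allowed
UP = (0, 0, 1)
MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), UP, (0, 0, -1)]

def allowed(pos, move, use_constraint):
    """The designer's rule: upward moves only while on a ladder."""
    if use_constraint and move == UP:
        return pos in LADDERS
    return True

def count_transitions(use_constraint, start=(0, 0, 0)):
    """Breadth-first search; returns (states reached, transitions examined)."""
    seen = {start}
    queue = deque([start])
    transitions = 0
    while queue:
        pos = queue.popleft()
        for move in MOVES:
            nxt = tuple(p + d for p, d in zip(pos, move))
            if not all(0 <= c < SIZE for c in nxt):
                continue  # outside the world
            if not allowed(pos, move, use_constraint):
                continue  # pruned by the constraint
            transitions += 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen), transitions

print("unconstrained:", count_transitions(False))
print("ladder rule  :", count_transitions(True))
```

The constrained search reaches fewer states and examines fewer
transitions, but only because a designer promised that the rule
holds; if the game later adds jumping, the promise (and the
shortcut) is gone.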