(svn r26370) -Doc: Desync debugging and command replaying.
parent
e9f7103ad7
commit
c6ce57e8a7
@ -0,0 +1,256 @@
|
||||
Some explanations about Desyncs
|
||||
Last updated: 2014-02-23
|
||||
------------------------------------------------------------------------
|
||||
|
||||
|
||||
Table of contents
|
||||
-----------------
|
||||
1.0) Desync theory
|
||||
* 1.1) OpenTTD multiplayer architecture
|
||||
* 1.2) What is a Desync and how is it detected
|
||||
* 1.3) Typical causes of Desyncs
|
||||
2.0) What to do in case of a Desync
|
||||
* 2.1) Cache debugging
|
||||
* 2.2) Desync recording
|
||||
3.0) Evaluating the Desync records
|
||||
* 3.1) Replaying
|
||||
* 3.2) Evaluation the replay
|
||||
* 3.3) Comparing savegames
|
||||
|
||||
|
||||
1.1) OpenTTD multiplayer architecture
|
||||
---- --------------------------------
|
||||
OpenTTD has a huge game state, which changes all of the time.
|
||||
The savegame containts the complete game state at a specific point
|
||||
in time. But this state changes completely each tick: Vehicles move
|
||||
and trees grow.
|
||||
|
||||
However, most of these changes in the gamestate are deterministic:
|
||||
Without a player interfering a vehicle follows it's orders always
|
||||
in the same way, and trees always grow the same.
|
||||
|
||||
In OpenTTD multiplayer synchronisation works by creating a savegame
|
||||
when clients join, and then transfering that savegame to the client,
|
||||
so it has the complete gamestate at a fixed point in time.
|
||||
|
||||
Afterwards clients only receive 'commands', that is: Stuff which is
|
||||
not predictable, like
|
||||
- player actions
|
||||
- AI actions
|
||||
- GameScript actions
|
||||
- Admin Port command
|
||||
- rcon commands
|
||||
- ...
|
||||
|
||||
These commands contain the information on how to execute the command,
|
||||
and when to execute it. Time is measured in 'network frames'.
|
||||
Mind that network frames to not match ingame time. Network frames
|
||||
also run while the game is paused, to give a defined behaviour to
|
||||
stuff that is executing while the game is paused.
|
||||
|
||||
The deterministic part of the gamestate is run by the clients on
|
||||
their own. All they get from the server is the instruction to
|
||||
run the gamestate up to a certain network time, which basically
|
||||
says that there are no commands scheduled in that time.
|
||||
|
||||
When a client (which includes the server itself) wants to execute
|
||||
a command (i.e. a non-predictable action), it does this by
|
||||
- calling DoCommandP resp. DoCommandPInternal
|
||||
- These functions first do a local test-run of the command to
|
||||
check simple preconditions. (Just to give the client an
|
||||
immediate response without bothering the server and waiting for
|
||||
the response.) The test-run may not actually change the
|
||||
game state, all changes must be discarded.
|
||||
- If the local test-run succeeds the command is sent to the server.
|
||||
- The server inserts the command into the command queue, which
|
||||
assigns a network frame to the commands, i.e. when it shall be
|
||||
executed on all clients.
|
||||
- Enhanced with this specific timestamp, the command is send to all
|
||||
clients, which execute the command simultaneously in the same
|
||||
network frame in the same order.
|
||||
|
||||
1.2) What is a Desync and how is it detected
|
||||
---- ---------------------------------------
|
||||
In the ideal case all clients have the same gamestate as the server
|
||||
and run in sync. That is, vehicle movement is the same on all
|
||||
clients, and commands are executed the same everywhere and
|
||||
have the same results.
|
||||
|
||||
When a Desync happens, it means that the gamestates on the clients
|
||||
(including the server) are no longer the same. Just imagine
|
||||
that a vehicle picks the left line instead of the right line at
|
||||
a junction on one client.
|
||||
|
||||
The important thing here is, that noone notices when a Desync
|
||||
occurs. The desync client will continue to simulate the gamestate
|
||||
and execute commands from the server. Once the gamestate differs
|
||||
it will increasingly spiral out of control: If a vehicle picks a
|
||||
different route, it will arrive at a different time at a station,
|
||||
which will load different cargo, which causes other vehicles to
|
||||
load other stuff, which causes industries to notice different
|
||||
servicing, which causes industries to change production, ...
|
||||
the client could run all day in a different universe.
|
||||
|
||||
To limit how long a Desync can remain unnoticed, the server
|
||||
transfers some checksums every now and then for the gamestate.
|
||||
Currently this checksum is the state of the random number
|
||||
generator of the game logic. A lot of things in OpenTTD depend
|
||||
on the RNG, and if the gamestate differs, it is likely that the
|
||||
RNG is called at different times, and the state differs when
|
||||
checked.
|
||||
|
||||
The clients compare this 'checksum' with the checksum of their
|
||||
own gamestate at the specific network frame. If they differ,
|
||||
the client disconnects with a Desync error.
|
||||
|
||||
The important thing here is: The detection of the Desync is
|
||||
only an ultimate failure detection. It does not give any
|
||||
indication on when the Desync happened. The Desync may after
|
||||
all occured long ago, and just did not affect the checksum
|
||||
up to now. The checksum may have matched 10 times or more
|
||||
since the Desync happend, and only now the Desync has spiraled
|
||||
enough to finally affect the checksum.
|
||||
|
||||
1.3) Typical causes of Desyncs
|
||||
---- -------------------------
|
||||
Desyncs can be caused by the following scenarios:
|
||||
- The savegame does not describe the complete gamestate.
|
||||
- Some information which affects the progression of the
|
||||
gamestate is not saved in the savegame.
|
||||
- Some information which affects the progression of the
|
||||
gamestate is not loaded from the savegame.
|
||||
- The gamestate does not behave deterministic.
|
||||
- Cache mismatch: The game logic depends on some cached
|
||||
values, which are not invalidated properly. This is
|
||||
the usual case for NewGRF-specific Desyncs.
|
||||
- Undefined behaviour: The game logic performs multiple
|
||||
things in an undefined order or with an undefined
|
||||
result.
|
||||
- The gamestate is modified when it shall not be modified.
|
||||
- The test-run of a command alters the gamestate.
|
||||
- The gamestate is altered by a player or script without
|
||||
using commands.
|
||||
|
||||
|
||||
2.1) Cache debugging
|
||||
---- ---------------
|
||||
Desyncs which are caused by inproper cache validation can
|
||||
often be found by enabling cache validation:
|
||||
- Start OpenTTD with '-d desync=2'.
|
||||
- This will enable validation of caches every tick.
|
||||
That is, cached values are recomputed every tick and compared
|
||||
to the cached value.
|
||||
- Differences are logged to 'commands-out.log' in the autosave
|
||||
folder.
|
||||
|
||||
Mind that this type of debugging can also be done in singleplayer.
|
||||
|
||||
2.2) Desync recording
|
||||
---- ----------------
|
||||
If you have a server, which happens to encounter Desyncs often,
|
||||
you can enable recording of the gamestate alterations. This
|
||||
will later allow the replay the gamestate and locate the Desync
|
||||
cause.
|
||||
|
||||
There are two levels of Desync recording, which are enabled
|
||||
via '-d desync=2' resp. '-d desync=3'. Both will record all
|
||||
commands to a file 'commands-out.log' in the autosave folder.
|
||||
|
||||
If you have the savegame from the start of the server, and
|
||||
this command log you can replay the whole game. (see Section 3.1)
|
||||
|
||||
If you do not start the server from a savegame, there will
|
||||
also be a savegame created just after a map has been generated.
|
||||
The savegame will be named 'dmp_cmds_*.sav' and be put into
|
||||
the autosave folder.
|
||||
|
||||
In addition to that '-d desync=3' also creates regular savegames
|
||||
at defined spots in network time. (more defined than regular
|
||||
autosaves). These will be created in the autosave folder
|
||||
and will also be named 'dmp_cmds_*.sav'.
|
||||
|
||||
These saves allow comparing the gamestate with the original
|
||||
gamestate during replaying, and thus greatly help debugging.
|
||||
However, they also take a lot of disk space.
|
||||
|
||||
|
||||
3.1) Replaying
|
||||
---- ---------
|
||||
To replay a Desync recording, you need these files:
|
||||
- The savegame from when the server was started, resp.
|
||||
the automatically created savegame from when the map
|
||||
was generated.
|
||||
- The 'commands-out.log' file.
|
||||
- Optionally the 'dmp_cmds_*.sav'.
|
||||
Put these files into a safe spot. (Not your autosave folder!)
|
||||
|
||||
Next, prepare your OpenTTD for replaying:
|
||||
- Get the same version of OpenTTD as the original server was running.
|
||||
- Uncomment/enable the define 'DEBUG_DUMP_COMMANDS' in
|
||||
'src/network/network_func.h'.
|
||||
- Put the 'commands-out.log' into the root save folder, and rename
|
||||
it to 'commands.log'.
|
||||
- Run 'openttd -D -d desync=3 -g startsavegame.sav'.
|
||||
This replays the server log and creates new 'commands-out.log'
|
||||
and 'dmp_cmds_*.sav' in your autosave folder.
|
||||
|
||||
3.2) Evaluation the replay
|
||||
---- ---------------------
|
||||
The replaying will also compare the checksums which are part of
|
||||
the 'commands-out.log' with the replayed gamestate.
|
||||
If they differ, it will trigger a 'NOT_REACHED'.
|
||||
|
||||
If the replay succeeds without mismatch, that is the replay reproduces
|
||||
the original server state:
|
||||
- Repeat the replay starting from incrementally later 'dmp_cmds_*.sav'
|
||||
while truncating the 'commands.log' at the beginning appropriately.
|
||||
The 'dmp_cmds_*.sav' can be your own ones from the first reply, or
|
||||
the ones from the original server (if you have them).
|
||||
(This simulates the view of joining clients during the game.)
|
||||
- If one of those replays fails, you have located the Desync between
|
||||
the last dmp_cmds that reproduces the replay and the first one
|
||||
that fails.
|
||||
|
||||
If you have the original 'dmp_cmds_*.sav', you can also compare those
|
||||
savegames with your own ones from the replay. You can also comment/disable
|
||||
the 'NOT_REACHED' mentioned above, to get another 'dmp_cmds_*.sav' from
|
||||
the replay after the mismatch has already been detected.
|
||||
See Section 3.2 on how to compare savegames.
|
||||
If the saves differ you have located the Desync between the last dmp_cmds
|
||||
that match and the first one that does not. The difference of the saves
|
||||
may point you in the direction of what causes it.
|
||||
|
||||
If the replay succeeds without mismatch, and you do not have any
|
||||
'dmp_cmd_*.sav' from the original server, it is a lost case.
|
||||
Enable creation of the 'dmp_cmd_*.sav' on the server, and wait for the
|
||||
next Desync.
|
||||
|
||||
Finally, you can also compare the 'commands-out.log' from the original
|
||||
server with the one from the replay. They will differ in stuff like
|
||||
dates, and the original log will contain the chat, but otherwise they
|
||||
should match.
|
||||
|
||||
3.2) Comparing savegames
|
||||
---- -------------------
|
||||
The binary form of the savegames from the original server and from
|
||||
your replay will always differ:
|
||||
- The savegame contains paths to used NewGRF files.
|
||||
- The gamelog will log your loading of the savegame.
|
||||
- The savegame data of AIs and the Gamescript will differ.
|
||||
Scripts are not run during the replay, only their recorded commands
|
||||
are replayed. Their internal state will thus not change in the
|
||||
replay and will differ.
|
||||
|
||||
To compare savegame more semantically, there exist some ugly hackish
|
||||
tools at:
|
||||
http://devs.openttd.org/~frosch/texts/zpipe.c
|
||||
http://devs.openttd.org/~frosch/texts/printhunk.c
|
||||
|
||||
The first one decompresses OpenTTD savegames. The second one creates
|
||||
a textual representation of an uncompressed savegame, by parsing hunks
|
||||
and arrays and such. With both tools you need to be a bit careful
|
||||
since they work on stdin and stdout, which may not deal well with
|
||||
binary data.
|
||||
|
||||
If you have the textual representation of the savegames, you can
|
||||
compare them with regular diff tools.
|
Loading…
Reference in New Issue