Last week, I promised to write a bit more about how multiplayer could work (assuming I find funding for it).
Even the basics of multiplayer are complex, so this update turned out to be a lot of words - and it's a bit more technical than normal too. If that's not your jam, feel free to skip past the multiplayer theory-crafting.
(After the words re multiplayer, you can play with some sort-of-working oil - yes, we have oil now).
Paperwork Progress
The ScreenAustralia "give me money to make a cool game please" application (see Boil and Toil) is done! It took another 4 days to finish off, at least half of which was making a 2 minute video - phew.
If it's successful, then I should be able to have a crack at making the prototype multiplayer!
Multiplayer Maybe
Let us begin this section with the traditional ritual necessary when writing anything that could be constituted as a guarantee of a feature in a game:
What follows is all theory, no multiplayer code has been written yet, and most especially, this is not a promise that the game will have online multiplayer!
Basics
Disclaimer number 2: there are whole books on this topic and I am not (yet) well qualified to teach this. What follows is my understanding, somewhat simplified for a "interested, technical but not necessarily programmer" audience. Follow links below to learn more.
Networking a game (making a game multiplayer over a network, whether internet or home network) is making it seem like each player playing the game has the same view of the game state (where is Bob, is that enemy alive, etc) as other players.
This is difficult because:
- Mutable state: each player can change the game's state in some way (e.g. by moving or killing an enemy), so the state changes need to be transferred across the network to the running copy of each player's game (the player's game "client"), except...
- Trust: on PC, players can hack their copy of the game to do whatever they want to get an unfair advantage (i.e. cheat), such as seeing through walls or pretending their player is always alive - so state changes may need to be generated from (and/or relayed through) a trusted running copy of the game (a "server").
- Parallelism: each client (and possibly a server) is running the game's simulation in parallel, and the game's simulation will typically take a variable amount of time up to 16ms (1 second spread over 60 updates a second - but varied based on computer speed) to do one update (a "tick"), which would be okay if it weren't for...
- Latency: signals don't travel instantly across networks, so it can take anywhere from 2ms to 250ms for data ("packets") from one player to travel to another player (e.g. 60-80ms "round-trip" for east to west coast and back again for Australia or USA), which is compounded by...
- Unreliability: 0-2% of the time (but typically not uniformly distributed, and more often if wireless signals are involved), packets will be lost ("dropped") so that the data doesn't arrive. And you can't just always send duplicate data ("just in case some data is lost") due to...
- Bandwidth: computers can only download and upload a certain amount of data per second, which for players is commonly limited by1 their internet speed (e.g. 50/25mbps means downloading 6.25MB/s and uploading ~3.1MB/s).
So, to spell it out with a naive example:
- Alice and Bob are in the same city and so are 5ms (10ms round-trip) away from each other. Alice connects her running game as a client to Bob's running game (Bob's game is both client and server in this context).
- Trust: what if Bob hacked his copy of the game so that his character jumps higher? What if Bob's game collects Alice's network address and tries to attack it with other hacking tools?
- The game begins at T+0ms and Alice immediately jumps. Her game simulates one tick and takes 15ms, so at T+15 sends the state of Alice's character in the air to Bob.
- Parallelism: what if Bob also jumps? What if Bob does some action that stops Alice from jumping?
- It would take until T+20ms for Bob to see that Alice's character is in the air and send an acknowledgement to Alice saying "yes your character jumped".
- Latency: Bob's simulation is now halfway through tick #2 so Alice's character jumping will only be visible in Bob's tick #3.. but the position of the character will be as it was from Alice's tick #1, because the data Bob received is 2 ticks old!
- Unreliability: what if an internet quirk meant that first packet (from Alice, saying she jumped) was actually dropped so it doesn't arrive at Bob's computer - Alice would have to wait for a missing acknowledgement from Bob (T+15+5ms plus time for Bob's simulation to tick, plus 5ms for packet to arrive from Bob, plus some error bar to allow for a small delay), then retransmit (another 5ms). Bob's simulation will have moved on several ticks during this time, so now Bob and Alice are out of sync.
You want to design your network code ("netcode") to mitigate as many of these problems as is necessary for your game.
Netcode Designs
The way Glenn Fiedler2 sees it, there are roughly 3 netcode approaches to make multiplayer work:3
- State Synchronization: the server periodically sends entities' state to clients, where clients run the usual game logic to simulate things locally based on the information they do have. Clients send either player inputs to the server, or sometimes directly send some state for entities they control ("own") to the server.
- This is what most game engines that claim to support multiplayer - such as Unity - seem to do. It's quick to get started with, but can be buggy (each client has a slightly different view of the world & it's easy to forget to replicate some state for certain entities!) and is prone to cheating (as clients have full information & can be hacked to modify state for any entities they have ownership over).
- Snapshot Interpolation: the server periodically sends entities' state to clients, but instead of running game logic, clients instead interpolate between the last 2 states received (and the current time) to estimate the current state for all entities. Clients send inputs only to the server, so in effect they act sort of like a "remote keyboard, mouse and screen": they see only what they should see and just send keyboard & mouse input to the server.
- This is what any serious competitive online shooter does, as it's necessary to prevent (many but not all forms of) cheating. It can be network-bandwidth intensive though, so it is often paired with non-player cloud-hosted ("dedicated") servers.
- Lockstep Determinism: all clients and the server (who is often a client too) exchange inputs with each other (possibly relayed through the server), and attempt to simulate the game in such a way that they all get exactly the same outputs. To make this work, all game logic must be 100% deterministic, and you need to have a way of smoothing over the fact that inputs from different players are going to arrive at different times.
- This is what (all?) real time strategy games do (including Factorio, mostly4), as well as (all?) online fighters.
Netcode Designs That Probably Won't Work
This game will have thousands millions of "entities" (i.e. the atoms) whose state must be kept in sync for 2-4 clients. For both State Synchronization and Snapshot Interpolation, my back-of-the-envelope calculations suggest that even with clever programming tricks to reduce the amount of data sent, it would take somewhere between 2 and 20MB/s. 5
That's 16 to 160mbps - and for the server, that's 16-160mbps of upload bandwidth, per-connected-player! I want players to be hosting their own servers, so I think it is unlikely that either of the first 2 approaches would work well.
A Netcode Design That Might Work
That leaves Lockstep Determinism, and requires finding a way to smooth over the fact that inputs won't be applied immediately. In strategy games, the game can delay each step of the simulation by the amount of time it takes for messages to travel between players - for example, give an order to a unit & the game plays a sound effect for that client so they think it's happening, but the unit doesn't actually start moving until the server says that it has received the "please move this unit" message.
We don't have that luxury because our game is an action game, so any delay above say 30-50ms in moving your character or shooting (casting spells) would feel bad & cause the game to pause if any "button X was pressed" messages to clients are lost. I think we also can't fake our movement & spell-casting locally on each client because player movement & spells can dramatically impact the rest of the level - such as by jumping in a pool of water to splash that water onto some nearby fire to extinguish it.
Rollback Netcode
What I think we could do, maybe, is use a technique commonly used by online fighters: "Deterministic Lockstep Networking with Rollback".
There's an excellent writeup of rollback networking by Ars Technica here that I highly recommend, but the short version of it6 is this:
- Simulate ("tick") the game in lockstep.
- When Alice presses a button on simulation step ("tick") X, Alice('s client) sends that input plus "it was on tick X" to Bob('s client).
- When Bob('s client) receives that input, maybe while Bob is about to simulate tick X+4 or so, Bob rewinds the entire game's state to what it was at the start of tick X, applies the input from Alice, and re-simulates what would have happened from ticks X up to X+4 (including buttons pressed by Bob between X and X+4)
- Now tick X+4 is about to be ticked - and Bob sees that Alice is now 3 ticks into jumping.
In short, it gives a clean and principled way to make a game multiplayer that shouldn't entail too much extra effort on my part: once the multiplayer machinery is in place (and the game is deterministic), multiplayer should just work.
But! The catch is that I need to make the game run fast enough that it can roll-back and re-simulate all the ticks that it needs to. And the larger the geographical distance (and hence round-trip time) between 2 players is, the faster it needs to be able to roll-back & re-simulate each tick (because more ticks may have passed before a copy of the game gets input from the distant player).
And doing fast rollbacks & re-simulations might not be possible because the game is very CPU7 intensive. And more players mean more rollbacks8, so the rollbacks & re-simulations need to be consistently fast.
But I think there's a better than even chance that it would be possible and would work well. 9
That's all assuming that the aforementioned free money from ScreenAustralia materializes - without further funding, it's pretty unlikely I'll be able to afford to do the extra work10 to get online play working.
Burning Oil
Oil is a liquid that's less dense than water, so it floats on top - and fire can consume it, so it burns:
Problem 1: if you have two bodies of water and oil rush towards each other, it's pretty unnatural:
You can see the spreading happens a little and then just stops: this is because the oil atoms are "put to sleep" (their movement isn't simulated) to save on processing power when they're surrounded by other atoms. The atom sleep system isn't clever enough to realize when atoms could actually still move by displacing other atoms.
Problem 2: Oil burns kinda slowly:
This is because fire only spreads "every so often", and that amount is tuned to look okay for wood and coal, but looks off for oil.
My silicon sage tells me that oil burns faster than water because it has a lower ignition temperature and releases more energy when it combusts - which unfortunately brings us back to implementing temperature.
So both of those problems are still on the todo list.
Playable web build
Here, tell me how oil is misbehaving:
Small control changes, to account for the increasing number of elements:
- 1 to 0 now select elements.
- Shift plus 1 to 0 set your brush size.
Although, outside the context of game networking, actually it's not that unheard of for the limit to be the router's processing power or (far more commonly) wifi connection quality also. Wifi users especially are often shocked to find out that their "slow internet" is a lot faster when they set up in the same room as their modem (if curious, google "5ghz wifi wall signal quality").
Glenn Fiedler became well known amongst game programmers for his article Fix Your Timestep (which I have of course done!), but in the last decade he wrote a lot about how to network physics simulations. I highly recommend his GDC 2015 talk: Physics for Game Programmers: Networking for Physics Programmers, which shows neat demos of each of the 3 netcode architectures.
These 3 ways are the rough ways, but you can sort of blend them. I'm also skipping over an enormous amount of detail like input prediction and lag compensation that are needed to make State Synchronization and Snapshot Interpolation work.
Factorio's multiplayer architecture is discussed in #41 - Back to the MP through to #52 - Ups and Downs, #76 - MP inside out, #83 Hide the latency #147 Multiplayer rewrite, #149 Deep down in multiplayer, #302 - The multiplayer megapacket, and probably a few more I haven't found yet. The short version is that the entire game simulation runs in deterministic lockstep, but player inputs that haven't been applied by everyone (because they haven't had time to make it across the network to everyone else yet) are locally tacked onto the game state each time the game logic runs, which is used to hide the latency for various things like moving the player and opening UI windows.
Happy to share my back of the envelope bandwidth calculations to anyone that's interested/curious - just ask.
Looking for more on rollback networking? There's a GDC article here from the GGPO website. GGPO is - from what I can gather - sort of the "standard" implementation of all the reusable rollback machinery, but I would be using GGRS instead as it's a native library for Rust and its control flow isn't based around handling a bunch of callbacks (which is often hard to follow, and often also kind of a painful pattern with Rust).
The "more players -> more rollbacks" relationship is why I assume most big name commercial games don't go down this route. Either that, or I'm way underestimating A) the difficulty of making the game logic deterministic B) the difficulty of fast state snapshop & restores C) the difficulty of making the game simulation fast enough or D) all of A to C :)
The game is CPU intensive, so could we offload some of the processing to the graphics card instead, via compute shaders or similar? Well, leaving aside that I am currently targeting OpenGL ES3 that I think doesn't support compute shaders (todo: check that at some point)... GPU-offloading could be done, but it would be tricky because floating point calculations across GPUs are not guaranteed to be deterministic - and we can't give up determinism to gain speed.
So e.g. if we want to handle all atom movement in parallel, it could be possible to rewrite the atom movement logic to use integer (or fixed point) maths, but I'm still not sure how to avoid having atoms clobber each others' state while also allowing closely-packed blocks atoms to move in the same direction (as would happen each time a spell is cast). So in short: things will definitely be easier if I don't have to go there.
And if deterministic lockstep multiplayer was implemented, then that process of making the game logic 100% deterministic should also make replays, automatic prevention of some kinds of cheats, and reproduction of bugs much easier (or almost free) to implement. That last one would be great: I've always admired that the Factorio bug report process is usually "here is a save-game that shows the bug", and that's enough for the developers to reproduce & fix the exact problem.
What extra work would be required to make deterministic lockstep with rollback networking work? Well, the game is already mostly deterministic, but there's a few things I'd need to audit to make sure it is 100% deterministic. And then there would be a lot of performance optimization that I'd need to do - though I'd want to do a lot of it anyway.