Rollback Performance | Slow Rush Studios

I simulated what game performance would be like in a rollback-networked online multiplayer world, and the answer was "Terrible".

So I made it a lot less terrible!

And I incorporated some feedback on last week's Lightning spell too.

Stunning Changes

After last week's update, the Discord brain trust told me it felt wrong to have zapped enemies still run towards you and attack.

I agreed - so I fixed it!

Now enemies (and players) who get zapped by the Lightning spell get stunned, and can't move or attack.

The new visual zapping effect is hand drawn per character,¹ which I think makes this the first "animation" in the whole game.

Multiple Players and Game Design

It's arguably a little premature to think about networked multiplayer when the game isn't particularly fun to play yet.

But I've been kicking around designs for how the larger game should be structured, and some of those designs work much better with other players.²

And it's much easier to play with other players if you don't have to convince them to leave their own ~~cave~~ home.

So "can we do online multiplayer?" affects not just marketability but also the design of the game itself, which is why I spent some time investigating that question.

Online Multiplayer Feasibility

The tentative plan for online multiplayer is this:

1. Each game sends its player's input (e.g. buttons pressed) for a simulation-step to all other connected players.
2. Once the game on one computer has input for all players, it "ticks" (runs for one simulation-step) the game simulation & snapshots (saves) the result.
   1. If the game didn't get an input from a remote player in time, it predicts what their input would have been, and uses that instead.
   2. If the game later receives a previously-missing input that differs from an old predicted input, after say 3 ticks of further simulation, the game rolls back (restores) its game state to the state 3 simulation steps ago, re-ticks those 3 simulation steps with the corrected input, then continues again.
3. (Repeat from step 1, until someone rage quits.)
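To make the plan above concrete, here's a minimal two-player sketch of the predict / snapshot / roll back loop. It is not the game's actual code: `World`, `Session`, `advance` and `receive_late` are hypothetical names, the "world" is just a number, and snapshots are kept forever rather than only for the last few ticks.

```rust
/// Hypothetical per-tick input for one player (a bitmask of pressed buttons).
type Input = u8;

/// Toy stand-in for the game world; the real one holds atoms, bodies and so on.
#[derive(Clone, Debug, PartialEq)]
struct World {
    tick: usize,
    state: u64,
}

impl World {
    /// Deterministically advance one simulation step.
    fn step(&mut self, local: Input, remote: Input) {
        let inputs = ((local as u64) << 8) | remote as u64;
        self.state = self.state.wrapping_mul(31).wrapping_add(inputs);
        self.tick += 1;
    }
}

/// A two-player rollback session as seen from one machine.
struct Session {
    world: World,
    snapshots: Vec<World>, // snapshots[t] = world just before tick t ran
    local: Vec<Input>,     // local input used on tick t
    remote: Vec<Input>,    // remote input used on tick t (real or predicted)
    confirmed: Vec<bool>,  // true once remote[t] is the real input
    rollbacks: usize,
}

impl Session {
    fn new() -> Self {
        Session {
            world: World { tick: 0, state: 0 },
            snapshots: Vec::new(),
            local: Vec::new(),
            remote: Vec::new(),
            confirmed: Vec::new(),
            rollbacks: 0,
        }
    }

    /// Run one tick. `remote` is `None` if that input has not arrived yet.
    fn advance(&mut self, local: Input, remote: Option<Input>) {
        // Prediction: assume the remote player repeats their last known input.
        let predicted = self.remote.last().copied().unwrap_or(0);
        let used = remote.unwrap_or(predicted);
        self.snapshots.push(self.world.clone());
        self.local.push(local);
        self.remote.push(used);
        self.confirmed.push(remote.is_some());
        self.world.step(local, used);
    }

    /// A late remote input arrived. Roll back only if the prediction was wrong.
    fn receive_late(&mut self, tick: usize, input: Input) {
        let was_wrong = self.remote[tick] != input;
        self.remote[tick] = input;
        self.confirmed[tick] = true;
        if !was_wrong {
            return;
        }
        self.rollbacks += 1;
        let now = self.world.tick;
        self.world = self.snapshots[tick].clone(); // restore the old state
        for t in tick..now {
            if !self.confirmed[t] {
                self.remote[t] = self.remote[t - 1]; // re-predict from the corrected input
            }
            self.snapshots[t] = self.world.clone();
            self.world.step(self.local[t], self.remote[t]);
        }
    }
}
```

The property worth checking is that once every real input has arrived, the world is identical to one that was simulated with the real inputs from the start, and that a correct prediction costs no rollback at all.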

This approach is so cool!

It uses very little network bandwidth, it resists cheating, and it doesn't result in multiplayer-only game-breaking glitches from lag or temporary packet loss.

But... always saving the world state, doing a rollback and then re-simulating the world N times takes a whole lot of processing power - in a game design that's already very heavily using the CPU.

Is it too much?

Prediction Accuracy

I was tempted to performance test immediately, but if you paid attention you've probably realized that you only need to resimulate the world when:

Remote players' input doesn't arrive (packet loss³)
AND
The predicted input didn't match what the remote player actually input.

We don't have control over packet loss, but what about input prediction?

Well, input prediction is super simple:

If a player was pressing a button, assume they'll keep pressing that button.

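Apparently even the fastest players only manage about 5 button presses per second, so at 60 ticks per second a button's state is unchanged on most ticks and this prediction is usually right.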
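As a rough sanity check on that claim, here's a small sketch that measures how often "assume nothing changed" would be wrong for a recorded input sequence. `misprediction_rate` and `record_button` are hypothetical helpers, and the recording (evenly spaced presses, each held for a fixed number of ticks) is made up for illustration.

```rust
/// Fraction of ticks whose input differs from the previous tick's input,
/// i.e. how often "assume nothing changed" would have guessed wrong.
fn misprediction_rate<T: PartialEq>(inputs: &[T]) -> f64 {
    if inputs.len() < 2 {
        return 0.0;
    }
    let wrong = inputs.windows(2).filter(|pair| pair[0] != pair[1]).count();
    wrong as f64 / (inputs.len() - 1) as f64
}

/// Hypothetical recording of one second of a single button at `tick_hz` ticks
/// per second: pressed `presses` times, each press held for `hold_ticks` ticks.
fn record_button(tick_hz: usize, presses: usize, hold_ticks: usize) -> Vec<bool> {
    let mut ticks = vec![false; tick_hz];
    for press in 0..presses {
        let start = press * tick_hz / presses;
        let end = (start + hold_ticks).min(tick_hz);
        for held in &mut ticks[start..end] {
            *held = true;
        }
    }
    ticks
}
```

For example, `misprediction_rate(&record_button(60, 5, 6))` counts 9 changes across 59 tick-to-tick comparisons, or roughly 15%, for a player mashing one button 5 times a second.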

But what about analog inputs, such as thumb-stick position on a gamepad?

I was worried that even the slightest movement would cause a mis-prediction...

And I was right! Here's a little visualization I built:

Red and (occasional) green lines show movement and aiming thumb-stick positions on gamepad. Flashing red circle indicates a misprediction. Text shows total and last 10 seconds number and % of input mispredictions.

You can see even tiny changes in thumb-stick movement cause mispredictions, and over 50% of inputs end up mispredicted (& hence would cause world resimulation) - not good!

So I tried snapping⁴ those analog inputs:

I snap the movement input to one of 8 directions with 10 possible magnitudes.
I snap the aim input to either⁵ 2 or 256 possible directions with only 2 possible magnitudes.
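Here's a minimal sketch of what snapping the movement stick might look like. The names and details are assumptions rather than the game's real code: I'm assuming stick axes in the range -1 to 1, rounding to the nearest direction and magnitude step, and counting "centered" as one of the 10 magnitudes.

```rust
use std::f32::consts::TAU;

/// Movement input after snapping.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SnappedMove {
    direction: u8, // 0..8, counter-clockwise in 45 degree steps, 0 = +x
    magnitude: u8, // 0..10, 0 = stick centered, 9 = fully tilted
}

/// Snap a raw stick position to one of 8 directions and 10 magnitudes.
fn snap_move(x: f32, y: f32) -> SnappedMove {
    let length = (x * x + y * y).sqrt().min(1.0);
    let magnitude = (length * 9.0).round() as u8;
    if magnitude == 0 {
        // A centered stick has no meaningful direction.
        return SnappedMove { direction: 0, magnitude: 0 };
    }
    let angle = y.atan2(x).rem_euclid(TAU); // 0 up to a full turn
    let direction = (angle / TAU * 8.0).round() as u8 % 8;
    SnappedMove { direction, magnitude }
}
```

Two slightly different raw readings, such as `snap_move(0.55, 0.0)` and `snap_move(0.57, 0.03)`, snap to the same value, so a small wobble of the thumb-stick no longer counts as a changed input.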

That drastically reduces mispredictions, at least for normal inputs:⁶

Inputs snapped: notice green movement input only extends in 8 directions. Mispredictions stay under 20%.

Okay, 20% is much better than 50%, but in a 4 player game there are a total of 3 other players who can each cause rollbacks.

So we're going to have to deal with a fair few rollbacks - what does that mean?

Performance Matters

Well, if we want to support playing with anyone anywhere in the world,⁷ the game must support rollback and re-simulations of up to 7 ticks.⁸

To see what that felt like right now, I added a debug option to do an extra 7 world-clones and re-simulations each frame:

Here's what 7 extra world-clones and re-simulations each frame feels like.
(Artificially slowed down to 25FPS; real experience was a lot jerkier.)

My performance immediately dropped from 60 FPS (frames per second) to an eye-watering 30 FPS, with sustained dips to 20 FPS. Ouch.

To get back to 60 FPS, the game must simulate the whole game world 8 times in ~16ms. That's ~2ms per tick, on average.

Only 2ms to simulate the whole game world?! Geez, how long are we taking right now?

Screenshot of profiler showing it takes about 3.6ms on average to simulate the game world

Starting point: it takes 3.6ms on average to simulate the game world for one tick.
Graph shows two peaks; #1 is steady state, and #2 is barrels falling and exploding.
(Graph's x-axis is 'time taken' and y-axis is 'count of times it took that long'.)

And that's from a special benchmark where I wasn't even moving or casting spells at enemies! (Just standing still, with a couple barrels exploding in the background.)

Let's try improving performance to cut that time in half.

Moving Body Atom Writes

A while back I optimized the "time to clone the world" performance by breaking the world into copy-on-write chunks of atoms: each chunk would be shared with the "saved" versions of past ticks' game worlds, right up until the chunk needed to be written to by the current game world, at which point it would get copied.
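A minimal sketch of the copy-on-write idea, using reference-counted chunks. This is an illustration under assumptions, not the game's implementation: the `World` and `Chunk` types, the one-byte-per-atom layout and the chunk size are all made up.

```rust
use std::rc::Rc;

const CHUNK_ATOMS: usize = 64 * 64; // hypothetical chunk size

#[derive(Clone)]
struct Chunk {
    atoms: Vec<u8>, // hypothetical: one byte per atom
}

#[derive(Clone)]
struct World {
    chunks: Vec<Rc<Chunk>>,
}

impl World {
    fn new(chunk_count: usize) -> Self {
        let chunks = (0..chunk_count)
            .map(|_| Rc::new(Chunk { atoms: vec![0; CHUNK_ATOMS] }))
            .collect();
        World { chunks }
    }

    /// Cheap snapshot: clones only the `Rc` pointers, not the atoms.
    fn snapshot(&self) -> World {
        self.clone()
    }

    /// Write one atom. `Rc::make_mut` copies the chunk only if a snapshot
    /// still shares it; otherwise it mutates in place.
    fn set_atom(&mut self, chunk: usize, index: usize, value: u8) {
        Rc::make_mut(&mut self.chunks[chunk]).atoms[index] = value;
    }

    /// Number of chunks still shared with `other` (i.e. not yet copied).
    fn shared_with(&self, other: &World) -> usize {
        self.chunks
            .iter()
            .zip(&other.chunks)
            .filter(|&(a, b)| Rc::ptr_eq(a, b))
            .count()
    }
}
```

After a snapshot, all chunks are shared; writing a single atom copies just the one chunk it lives in, and the saved world keeps the old contents.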

That worked really well in my initial benchmarking, but in a level with lots of moving bodies, it performed quite badly - even if the bodies lay still!

Why?

Each moving body is composed of atoms, and each tick a moving body would remove its atoms, simulate its rigid body physics to find where it moves to, then put its atoms back into the new positions dictated by its new position. Those writes would cause whole chunks of atoms to be copied each simulation tick, even if the body wasn't even moving.

I optimized that by only removing and re-placing the atoms if the moving body, you know, actually moved. Duh.⁹
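Stripped of all the edge cases, the fix is an early return. The sketch below is a heavy simplification with hypothetical types: bodies sit at integer grid positions with no rotation, and a write counter stands in for "this chunk might need copying".

```rust
/// Hypothetical atom grid that counts writes; each write stands in for
/// "the chunk containing this cell may need to be copied".
struct Grid {
    width: usize,
    cells: Vec<bool>,
    writes: usize,
}

impl Grid {
    fn new(width: usize, height: usize) -> Self {
        Grid { width, cells: vec![false; width * height], writes: 0 }
    }

    fn set(&mut self, x: usize, y: usize, filled: bool) {
        self.cells[y * self.width + x] = filled;
        self.writes += 1;
    }
}

/// Hypothetical moving body: a grid position plus the offsets of its atoms.
struct Body {
    pos: (usize, usize),
    offsets: Vec<(usize, usize)>,
}

/// Apply the physics result `new_pos`, touching the grid only on real movement.
fn place_body(grid: &mut Grid, body: &mut Body, new_pos: (usize, usize)) {
    if new_pos == body.pos {
        return; // resting body: no atom writes, so shared chunks stay shared
    }
    for &(dx, dy) in &body.offsets {
        grid.set(body.pos.0 + dx, body.pos.1 + dy, false); // remove old atoms
    }
    body.pos = new_pos;
    for &(dx, dy) in &body.offsets {
        grid.set(body.pos.0 + dx, body.pos.1 + dy, true); // write new atoms
    }
}
```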

Terrain Collider Reuse

Each moving body is only able to collide with the world because each chunk of atoms is processed into a physics "collider".

Even after earlier optimizations to skip chunks devoid of moving bodies, over half of each tick was being spent on collider creation!

Screenshot of profiler showing time spent creating colliders

Each vertical bar is a frame; light white is time spent on creating those colliders; dark white is everything else.

After some moderately successful attempts to optimize the processing itself, I hit on the obvious approach: if the atoms that contribute to colliders¹⁰ haven't changed, then keep the old collider around.

New: purple-outlined chunks had their colliders re-used instead of regenerated. (Red chunks are skipped, and green chunks are when colliders get regenerated, such as when rockets or explosions damage terrain.)
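One way to sketch the collider reuse described above is a per-chunk cache that is only invalidated when a grounded atom is added or removed. Everything here is an assumption for illustration: the `Atom` variants, the rule for what counts as grounded, and a "collider" that is just a count of solid cells instead of real geometry.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Atom {
    Empty,
    Water,
    Rock,
}

impl Atom {
    /// Assumption: only settled solids contribute to the collider.
    fn is_grounded(self) -> bool {
        matches!(self, Atom::Rock)
    }
}

/// Hypothetical collider: a count of solid cells standing in for real geometry.
#[derive(Clone, Debug, PartialEq)]
struct Collider {
    solid_cells: usize,
}

struct Chunk {
    atoms: Vec<Atom>,
    collider: Option<Collider>, // None = needs regenerating
    rebuilds: usize,            // how many times the collider was generated
}

impl Chunk {
    fn new(atoms: Vec<Atom>) -> Self {
        Chunk { atoms, collider: None, rebuilds: 0 }
    }

    fn set_atom(&mut self, index: usize, atom: Atom) {
        let old = self.atoms[index];
        if old == atom {
            return;
        }
        self.atoms[index] = atom;
        // Only adding or removing a grounded atom can change the collider.
        if old.is_grounded() || atom.is_grounded() {
            self.collider = None;
        }
    }

    /// Return the cached collider, regenerating it only if it was invalidated.
    fn collider(&mut self) -> &Collider {
        if self.collider.is_none() {
            self.rebuilds += 1;
            let solid_cells = self.atoms.iter().filter(|a| a.is_grounded()).count();
            self.collider = Some(Collider { solid_cells });
        }
        self.collider.as_ref().unwrap()
    }
}
```

In this sketch, water sloshing through a chunk never triggers a rebuild, while an explosion that removes rock does.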

That led to another small win: physics engines normally put moving bodies "to sleep" if they haven't moved in a while, which reduces physics simulation costs.

With the world's colliders previously being recreated each tick, moving bodies weren't ever able to sleep - but now they can!

And that led to an even better optimization: I realized that if a chunk only contained sleeping moving bodies, then it could be skipped entirely!

Sleeping (stationary) rigid bodies are ignored for purposes of determining which chunks to generate colliders for.
Notice chunks are outlined in red, until they (are about to) contain a character or an in-motion moving body.
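Put together, the per-chunk decision could look something like the sketch below. The types and field names are hypothetical, and it leaves out two things the real game has to deal with: chunks that are about to contain a character or body, and waking a sleeping body when the terrain under it changes.

```rust
/// What to do with a chunk's collider this tick.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ChunkAction {
    Skip,       // red outline: nothing awake needs to collide with it
    Reuse,      // purple outline: keep last tick's collider
    Regenerate, // green outline: grounded atoms changed
}

struct Body {
    sleeping: bool,
}

struct ChunkInfo {
    has_character: bool,
    bodies: Vec<Body>,
    grounded_atoms_changed: bool,
}

fn plan(chunk: &ChunkInfo) -> ChunkAction {
    // Sleeping bodies are ignored when deciding whether a collider is needed.
    let needs_collider = chunk.has_character || chunk.bodies.iter().any(|b| !b.sleeping);
    if !needs_collider {
        ChunkAction::Skip
    } else if chunk.grounded_atoms_changed {
        ChunkAction::Regenerate
    } else {
        ChunkAction::Reuse
    }
}
```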

End Result

Finally, after all that plus a few less interesting tweaks, let's see how we did:

Screenshot of profiler showing that game_step now runs in

End point: simulating one tick dropped from 3.6ms to 0.9ms (on average). That's a 4X speedup!

Yay, 0.9ms is under our 2ms goal!

There are three big caveats to this performance test result though:

I'm benchmarking on a pretty fast computer.¹¹
This test level is 4-8 times smaller than I want levels to be.
There wasn't much happening in it; a few explosions, but no spells or other combat.
- If I actually play the game with the 7-extra-world-simulations debug option, I see 50-60 FPS with occasional brief dips to 30 FPS - so further work is required!

I have some other tricks up my sleeve,¹² but we'll have to see if they're enough.

Still, even if rollback-based online multiplayer doesn't work out, these changes will still make the game perform better for everyone!

Playable web build

You can try out the tweaked lightning spell below!

The rollback simulation is available in the web build too; press F2 then scroll down to the Rollback section and set Logic Repeat to (say) 7. Performance on the web build is a bit worse than on desktop, mind.

Press F1 for help, including to see keyboard/mouse controls. Mobile devices probably won't work! By playing you agree to our Privacy Policy and Terms of Service.

¹

It's a fine artisanal sprite.

Initially I was thinking I'd procedurally (i.e. write code to) generate the zap effect by drawing a white outline around the sprite and turning the normal pixels black.

But I think it looks better to have the characters switch into a cartoon-ish getting-zapped pose.

The best of both worlds might be to combine the new pose sprite with drawing a white outline procedurally, so the outline can dynamically crackle - but that seemed like a bit too much polish to bother with at this point.

²

A trivial example is any game built around defeating other players.

But some less extreme examples are Left 4 Dead, Payday 2, Deep Rock Galactic and other similar co-op shooters - having someone around to revive you or rescue you makes those games a lot more fun.

It's a spectrum, and to some extent clever design can mitigate missing players (e.g. Helldivers 2 has a drop-pod-based respawning that works fine in singleplayer) - but you can't e.g. have a player crack a safe while their buddy holds off a wave of enemies if there's usually no buddy playing.

⁷

"Play with anyone in the world" sounds unrealistic, but as a small indie I can't rely on any kind of large player numbers - so the only active online game for you to drop into might be hosted half a world away.

If you restrict online multiplayer to the "invite friends from the same continent as you" use case then you get a less strict requirement of about 8ms per tick.

But I prefer aiming for the worst-case because it gives some extra buffer against strings of lost packets, so you can still play with your mate Ivan who insists that Wifi networks are just as good as wired ethernet networks.

(You are so very wrong Ivan!)

⁸

The maths is hard to follow unless you understand rollback networking, but for those who do:

The furthest any places on earth are away from each other is ~320ms round trip time, which is 160ms one way.
A 60hz tick rate gives ~16.6ms per tick, during which all world-(re-)simulation, world-saving and world-restoring must happen.
Usually there are 3 ticks of "input delay" in a rollback implementation (meaning everyone's inputs are only applied 3 ticks after they're issued), so that's ~50ms gone from 160ms, leaving 110ms.
110ms divided by 16.6ms per tick gives ~6.6, which rounds up to 7 ticks that might have been wrong and need to be re-simulated, meaning at most 8 ticks will need to be resimulated (7 + present tick).
A framerate of 60hz gives 16ms per frame, assuming we reserve 0.6ms for other things like file I/O, input polling & rendering.
Divide 16ms per frame by 8 ticks to get the 2ms per tick necessary to maintain a 60hz framerate.
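The arithmetic in the steps above can be written as one small function. The function and parameter names are made up; the numbers fed into it (160ms one way, 3 ticks of input delay, 60hz, 0.6ms reserved) are the ones from this footnote.

```rust
/// Returns (ticks to simulate per frame in the worst case, time budget per tick in ms).
fn rollback_budget(
    one_way_ms: f64,
    input_delay_ticks: u32,
    tick_hz: f64,
    reserved_ms: f64,
) -> (u32, f64) {
    let tick_ms = 1000.0 / tick_hz;
    // Latency that input delay does not hide must be covered by rollback.
    let uncovered_ms = (one_way_ms - input_delay_ticks as f64 * tick_ms).max(0.0);
    let rollback_ticks = (uncovered_ms / tick_ms).ceil() as u32;
    let ticks_per_frame = rollback_ticks + 1; // re-simulated ticks + the present tick
    let budget_ms = (tick_ms - reserved_ms) / ticks_per_frame as f64;
    (ticks_per_frame, budget_ms)
}
```

Calling `rollback_budget(160.0, 3, 60.0, 0.6)` gives 8 ticks per frame and a budget of roughly 2ms per tick.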

In practice, it's complicated further by packet loss tending to happen for a few packets in a row, but you can approximate that by pretending you have fewer input delay frames in this calculation.

³

The phenomenon whereby data sent over the internet sometimes just doesn't arrive.

Common internet data transmission protocols have built-in ways of coping with this based on retransmitting data, but in rollback networking you just send the last N frames' worth of input each time, so that even if you miss one frame's input, you'll get it as long as the next N-1 packets weren't also lost.
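A minimal sketch of that redundancy scheme, with hypothetical names (`InputPacket`, `make_packet`, `receive`) and a one-byte input per tick:

```rust
use std::collections::BTreeMap;

type Input = u8;

/// Hypothetical packet: inputs for consecutive ticks starting at `first_tick`.
#[derive(Clone, Debug)]
struct InputPacket {
    first_tick: u32,
    inputs: Vec<Input>,
}

/// Build a packet carrying the newest `n` inputs from `history` (index = tick).
fn make_packet(history: &[Input], n: usize) -> InputPacket {
    let start = history.len().saturating_sub(n);
    InputPacket {
        first_tick: start as u32,
        inputs: history[start..].to_vec(),
    }
}

/// Merge a packet into the inputs known so far; returns the ticks newly learned.
fn receive(known: &mut BTreeMap<u32, Input>, packet: &InputPacket) -> Vec<u32> {
    let mut new_ticks = Vec::new();
    for (offset, &input) in packet.inputs.iter().enumerate() {
        let tick = packet.first_tick + offset as u32;
        if known.insert(tick, input).is_none() {
            new_ticks.push(tick);
        }
    }
    new_ticks
}
```

With `n = 3`, losing the packets sent on ticks 2 and 3 is harmless: the packet sent on tick 4 carries the inputs for ticks 2, 3 and 4, so the receiver fills the gap without any retransmission request.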

⁴

Also known as discretizing or quantizing; basically transforming from a continuous input domain to a discrete output range.

⁵

Initially I always snapped aim to one of 256 possible directions. Unfortunately, when you use a mouse and your character jumps, the direction from your character to your mouse cursor changes rapidly, causing 60%-ish mispredictions!

So now while holding a spell casting button you still get the full 256 possible aim directions, but in the normal case the game only records "aim to left or aim to right" (to face the character the right way).
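Here's a rough sketch of that two-mode aim snapping. The enum and function names are assumptions, the 2 aim magnitudes are left out for brevity, and "aim to the right" is simply taken to mean a non-negative x component.

```rust
use std::f32::consts::TAU;

/// Aim input after snapping.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SnappedAim {
    /// Not casting: only which way the character faces is recorded.
    Facing { right: bool },
    /// Casting: one of 256 directions, counter-clockwise from +x.
    Precise { direction: u8 },
}

fn snap_aim(x: f32, y: f32, casting: bool) -> SnappedAim {
    if !casting {
        return SnappedAim::Facing { right: x >= 0.0 };
    }
    let angle = y.atan2(x).rem_euclid(TAU); // 0 up to a full turn
    let step = (angle / TAU * 256.0).round() as u32 % 256;
    SnappedAim::Precise { direction: step as u8 }
}
```

While not casting, a jump that sweeps the aim from `(1.0, 0.2)` to `(1.0, 0.9)` produces the same snapped value throughout, so it causes no mispredictions; while casting, the same sweep still changes the recorded direction.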

⁶

Aggressively adverse inputs (such as repeatedly rolling the aiming thumb-stick around in its socket while casting) are unfortunately still able to reach misprediction rates of 90% and above.

I tried snapping aiming to fewer angles which does help somewhat, but - at least with the current spell-casting system - it feels really bad to not be able to point beam or rock spells at enemies precisely.

⁹

It was not "duh". It was actually a major pain in the butt with a lot of edge cases to solve, but you don't want to read about those.

¹⁰

Only "Grounded" atoms are considered. TLDR: Liquid and gas atoms are never included in collider calculations, and sand atoms that haven't settled into a resting place yet aren't included either.

¹¹

I upgraded to a 16 core/32 thread AMD Ryzen 9 7950X last year so my Rust code would compile much faster.

Now, sure, the game is 100% single threaded right now so the core count doesn't matter much.

But this CPU has a pretty high clock speed and an enormous amount of L2/L3 cache, which does really help.

¹²

Multithreading the atom simulation and world collider generation is a big one, but difficult-to-impossible to do for the playable web build, so I've been putting it off.
Similarly, rendering is happening in the same thread as the game simulation right now, and rendering takes 3-10ms depending on window size & zoom level, so we often drop frames because of that.
Characters (players and enemies) have their movement physics simulated via the rigid body physics engine, so they force collider generation to happen for any chunk of atoms containing a character. It should be a lot more performant to implement my own character physics directly based on atoms instead, as [Noita](https://store.steampowered.com/app/881100/Noita/) seems to have done... but it'll also be very fiddly.