Balancing Numbers

There’s an issue I’ve noticed comes up a lot in game design, particularly when it comes to balancing items and abilities against each other: The differentiation and conflation of absolute and relative values.

Here’s the first place I noticed it: I haven’t really played Crusader Kings 2 myself, but I was watching someone else play it and he was talking about one of the traits an army’s general could have, ‘experimental’. This trait makes it so that in each encounter the commanded army fights at 50-150% strength, or +/-50% of its normal strength. This seems reasonable at first: You’re gambling, and if you win you get 50 and if you lose you pay 50. However, the problem with this approach becomes clear if you experiment with increasing the numbers: Say +/-100%. With these values, on an exceptionally lucky roll you’d be at 200% power, and able to fight an army twice your size on even terms, which is a pretty nice bonus to be sure. However, on an exceptionally unlucky roll you’d be at 0% power, and would lose to an ‘army’ of three asthmatic soldiers with one sword between them. These are not even close to equivalent, and the same is true, less dramatically, of the original numbers: a 50% power army is only an even match for an army half its size, so a normal enemy is effectively twice as strong as you, while a 150% power army is an even match for an army 3/2 its size, so you are only one and a half times as strong as a normal enemy.
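To put numbers on that asymmetry, here’s a rough sketch in Python. It assumes the simplest possible model, where an army’s fighting power scales linearly with its strength multiplier; the function name `swing_asymmetry` is just something I made up for illustration, not anything from the game.

```python
def swing_asymmetry(swing):
    """Compare the best and worst rolls of a +/-swing strength modifier.

    Returns (advantage, disadvantage): how many times stronger than a
    normal army you are on the best roll, and how many times stronger a
    normal army is than you on the worst roll. Assumes power scales
    linearly with the multiplier, which is a simplification.
    """
    best = 1 + swing
    worst = 1 - swing
    advantage = best
    # At 0% strength any enemy at all is infinitely stronger than you.
    disadvantage = float("inf") if worst == 0 else 1 / worst
    return advantage, disadvantage


for swing in (0.5, 1.0):
    advantage, disadvantage = swing_asymmetry(swing)
    print(f"+/-{swing:.0%}: best roll {advantage}x, worst roll {disadvantage}x against you")
```

Under that model the +/-50% trait gives you a 1.5x edge at best and a 2x deficit at worst, and at +/-100% the deficit is unbounded.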

It’s easy to gravitate towards tradeoffs that are balanced in absolute terms, because they’re simple and easy to understand. 50% is 50%, right? Conversely, calculating the relative benefits/tradeoffs, while not exactly advanced mathematics, is a non-trivial step. Rather than merely looking at how each upgrade or drawback relates to the norm, you have to start looking at how it relates to every other upgrade and drawback, which can become overwhelming.

It’s important to keep in mind the relative vs absolute benefits of a given upgrade or downgrade at all times, because the gap between the two can lead to unintuitive results. At first glance, it seems like the engine upgrades in FTL give you diminishing returns, starting at 5% dodge chance per upgrade and eventually going down to a piddling 3%, with the final upgrade popping back up to 4% for whatever reason. However, what matters is how many fewer hits you take after the upgrade compared to before it, and the more dodge you already have, the fewer hits are left for each extra point to remove. Measured that way, it turns out that the last 4% is actually one of the most valuable of all the upgrades.
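Here’s a quick sketch of that calculation. The dodge table is my assumption, chosen to match the 5%, 3% and 4% steps described above, and it ignores every other source of evasion in the game, so treat the exact figures as illustrative.

```python
# Assumed dodge chance (percent) at engine levels 1 through 8.
# These values are an assumption consistent with the steps described
# in the text, not something read out of the game's data files.
DODGE_BY_LEVEL = [5, 10, 15, 20, 25, 28, 31, 35]


def relative_hit_reduction(dodge_before, dodge_after):
    """Fraction of previously landing hits that the upgrade removes."""
    hits_before = 100 - dodge_before
    hits_after = 100 - dodge_after
    return (hits_before - hits_after) / hits_before


for level in range(1, len(DODGE_BY_LEVEL)):
    before = DODGE_BY_LEVEL[level - 1]
    after = DODGE_BY_LEVEL[level]
    reduction = relative_hit_reduction(before, after)
    print(f"level {level} -> {level + 1}: +{after - before}% dodge, "
          f"{reduction:.1%} fewer hits taken")
```

With these assumed values, the final upgrade removes 4 out of every 69 hits that were still landing (about 5.8%), while the first removes 5 out of 95 (about 5.3%).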

Another striking example lies in Diablo 2, in which the Necromancer class has a curse called Amplify Damage which reduces the damage resistance of enemies by 100%. In early areas, this has the effect of doubling damage: A useful ability to be sure, but roughly equivalent to the other, mutually exclusive, curses at his disposal. However, in later areas, where enemies have some existing resistance to damage, this curse becomes incredibly powerful: An enemy with a base resistance of 95% takes only 5% of the damage dealt to it, but with the curse applied it takes 105%, more than 20 times what it would normally. Many other skills in Diablo 2 have diminishing returns built in as they increase in level, while those which boost damage passively do so at a consistent rate of 5% per level, and can afford to, because the returns diminish naturally as the level increases. After all, the difference in scale between 200% damage and 205% damage is only half the difference between 100% damage and 105% damage.
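The arithmetic here is easy to check with a small sketch. This assumes the simplified model described above, where damage taken is just 100% minus resistance and the curse subtracts a flat 100 points of resistance; the real game may well have caps and special cases that this ignores, and the function names are my own.

```python
def damage_taken(resistance_pct):
    """Fraction of incoming damage that gets through a given resistance."""
    return (100 - resistance_pct) / 100


def curse_multiplier(base_resistance_pct, reduction_pct=100):
    """How many times more damage an enemy takes once its resistance is lowered."""
    before = damage_taken(base_resistance_pct)
    after = damage_taken(base_resistance_pct - reduction_pct)
    return after / before


def relative_gain(damage_before_pct, damage_after_pct):
    """Relative increase when total damage goes from one percentage to another."""
    return damage_after_pct / damage_before_pct - 1


print(curse_multiplier(0))    # unresisted enemy: damage doubles
print(curse_multiplier(95))   # heavily resistant enemy: roughly 21x
print(relative_gain(100, 105), relative_gain(200, 205))
```

The same flat 5 points is worth about a 5% increase at 100% damage but only about 2.5% at 200%, which is the natural diminishing return mentioned above.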

That’s not to say that all tradeoffs should be based on relative values. If a player has the ability to gamble resources at even odds, with the outcome being either losing half their resources or doubling them, it will almost always be in their interest to gamble: a loss costs them 50% of what they had while a win gains them 100%, so recovering from a loss takes only half as much time as a win saves them. As a general rule of thumb, if the player is making a big one-time decision, such as a strategic decision that will change their abilities or a one-shot opportunity to gamble on a resource that is extremely scarce, the benefits should probably be scaled relative to the penalties. However, when this is a frequently made decision where the stakes are low and renewable, the investment and payout should be balanced in absolute terms, a penny for a penny, to keep the system from being game-able.
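As a minimal sketch of why the halve-or-double gamble is exploitable, here’s the expected value of one bet. The even 50/50 odds are my assumption, and `expected_multiplier` is a hypothetical helper, not something from any particular game.

```python
def expected_multiplier(outcomes):
    """Expected resource multiplier for a list of (probability, multiplier) pairs."""
    return sum(probability * multiplier for probability, multiplier in outcomes)


# Balanced in relative terms: halve or double, assumed to be at even odds.
relative_gamble = [(0.5, 0.5), (0.5, 2.0)]

# Balanced in absolute terms: lose 50% or gain 50%, at the same odds.
absolute_gamble = [(0.5, 0.5), (0.5, 1.5)]

print(expected_multiplier(relative_gamble))
print(expected_multiplier(absolute_gamble))
```

The relatively balanced bet pays out 1.25 times your stake on average, while the penny-for-a-penny version breaks even, which is what you want for something the player can repeat freely.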

The other big concern is thresholding. Not all bonuses or penalties are equal: if the most common or troublesome enemy you face has 100hp and you do 99 damage, a 2% damage upgrade becomes far more valuable than a 1% damage upgrade, because the first lets you kill that enemy in one hit while the second still leaves you needing two. This can get really tricky, and requires paying a lot of attention to the specifics of the numbers.
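A threshold like this is easy to check mechanically. The sketch below assumes damage is not rounded and that hits needed is just health divided by damage, rounded up; real games often round damage in their own ways, so the function is illustrative only.

```python
import math


def hits_to_kill(enemy_hp, base_damage, bonus_pct):
    """Number of hits needed to kill, assuming unrounded damage per hit."""
    damage = base_damage * (100 + bonus_pct) / 100
    return math.ceil(enemy_hp / damage)


for bonus in (0, 1, 2):
    print(f"+{bonus}% damage: {hits_to_kill(100, 99, bonus)} hit(s) to kill")
```

With 99 base damage against 100hp, the 1% upgrade brings you to 99.99 damage and still takes two hits, while the 2% upgrade brings you to 100.98 and takes one.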

This is something of a contentious example, but I think this is a big problem with the weapon The Enforcer in Team Fortress 2. This is a weapon for the spy which has a +20% damage bonus at the cost of firing 20% slower. On paper this doesn’t seem so bad – possibly even slightly underpowered, since it should have 96% of the damage output of the normal revolver which it is an alternative to – but in practice this ends up being a big problem for a few reasons. First, the damage bonus for this weapon makes it just powerful enough that at close range two shots will do 132 damage, enough to kill 4 out of 9 classes, since the standard is 125 hp. So first we have a thresholding problem, which makes it clearly superior to the stock revolver. Second, in terms of actual practical use, it’s extremely uncommon for a spy to actually fire their revolver as quickly as possible, usually trying to pick shots carefully to make sure they land, so the speed penalty is barely a factor. Third, after the stock revolver, it’s in competition with the spy’s other revolvers, all of which have damage penalties to make up for their other abilities. So, relative to the Diamondback and Ambassador, which both have a -15% damage penalty, it does +41% damage. Compared to the L’Etranger, which has a -20% damage penalty, it does +50% damage.
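The comparisons in that paragraph can be reproduced with a short sketch. It takes the percentages as stated above at face value and reads “firing 20% slower” as firing at 80% of the normal rate, which is the reading that gives the 96% figure; the function names are made up for illustration.

```python
def sustained_output(damage_bonus_pct, fire_rate_penalty_pct):
    """Damage over time relative to stock, reading 'X% slower' as (100 - X)% fire rate."""
    return (100 + damage_bonus_pct) * (100 - fire_rate_penalty_pct) / 10000


def relative_damage(bonus_a_pct, bonus_b_pct):
    """Per-shot damage advantage of weapon A over weapon B, as a fraction."""
    return (100 + bonus_a_pct) / (100 + bonus_b_pct) - 1


print(f"vs stock, sustained: {sustained_output(20, 20):.0%}")
print(f"vs a -15% revolver, per shot: {relative_damage(20, -15):+.0%}")
print(f"vs a -20% revolver, per shot: {relative_damage(20, -20):+.0%}")
```

Two modifiers that each look modest against the stock weapon compound into a much larger gap when the weapons are compared directly against each other.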

So the point I’m trying to make is that numbers are tricky. You can’t balance a game based on percentage increases and decreases without looking at how each option is positioned relative to every other option and where it sits in the greater design of the game. Pragmatically, most of these situations have a way of working themselves out, either via balance patches or players finding ways around the problem or simple word-of-mouth spreading that this option which seems useless is actually quite strong. Still, it’s something worth considering.