Board Game Math: Probability and Why Dice Feel Unfair

Q: How many playtests are needed to statistically validate board game balance?

The minimum number of playtests for statistically meaningful balance data depends on the number of variables being tested and the acceptable margin of error. For a 2-player game with 2 asymmetric factions, 30 games provides a baseline sample for detecting win rate imbalances larger than 15% at 80% confidence. For a 4-player game with 6 factions, the combination space is much larger and 30 games is insufficient; you would need 150+ games to get meaningful data on each faction pair. In practice, most indie publishers cannot run this volume of blind playtests. The practical approach is: use math to verify expected values and check for obvious dominance, use playtesting to find outliers and edge cases the math misses, and use community feedback post-release to identify balance issues that survived both stages.

Every board game mechanic has a mathematical identity. A dice roll has an expected value and a variance. A card draw has a probability distribution. A resource trade has an exchange rate that can be expressed as a ratio. Designers who understand this math make better decisions than designers who work by feel — not because math replaces intuition, but because intuition frequently disagrees with reality in ways that testing alone is slow to correct.

This article covers the mathematical concepts that matter most for board game design and play: probability distributions, expected value, variance, and the psychological gap between what the math says and what players experience. Whether you are designing a game or just trying to understand why your dice sessions feel so catastrophically unlucky, the framework here will change how you think about randomness in games.

Why Math Matters in Game Design

A game designer who has not calculated the expected value of their game's core action economy does not know whether their game works. This sounds harsh, but it is functionally true. If the expected income from the best available action is 4 resources per round and the cost of the victory-condition action is 30 resources, the designer needs to know whether that income rate is achievable over the game's typical duration — before playtesting, not after six sessions wondering why no one ever wins.
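The check described above is plain arithmetic and can be sketched in a few lines. The income of 4 and the cost of 30 come from the example in the text; the 10-round game length and the function name `rounds_to_afford` are hypothetical, chosen only for illustration.

```python
import math

def rounds_to_afford(cost: float, income_per_round: float) -> int:
    """Rounds needed to accumulate `cost` at a constant expected income."""
    if income_per_round <= 0:
        raise ValueError("income_per_round must be positive")
    return math.ceil(cost / income_per_round)

# Figures from the example in the text: 4 resources/round, victory action costs 30.
needed = rounds_to_afford(cost=30, income_per_round=4)

# Hypothetical game length, assumed only for illustration.
typical_game_rounds = 10

print(f"Rounds needed: {needed}")
print(f"Reachable in a {typical_game_rounds}-round game: {needed <= typical_game_rounds}")
```

If the number of rounds needed is close to or above the typical game length, the victory condition is effectively out of reach before any playtest is run.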

Math and playtesting are complementary tools, not alternatives. Math tells you what the theory predicts. Playtesting tells you whether human behavior matches the theory. Most of the time, they diverge — not because the math is wrong, but because players do not always choose the theoretically optimal action. The gap between theoretical optimal play and actual human play is itself a design variable: a game where only optimal play produces interesting decisions is a worse game than one where suboptimal play creates interesting situations too.

Every mechanic has an expected value, and designers must know it. When a Neutronium: Parallel Wars player gains income from Nuclear Ports, they are receiving a precisely calculated expected value per port per round. When they choose to attack rather than build, they are making a decision that has computable expected outcomes under different scenarios. The designer who knows these numbers can make meaningful balance decisions; the designer who does not is guessing.

The critical asymmetry is that randomness feels unfair even when it is balanced. Six fair coin flips come up all heads (1/2)^6 = 1/64 of the time, or about 1.6%: rare, but far from impossible. When that happens to a player in a game, they experience it as the game being broken, not as a normal statistical event. Understanding why this happens, and how designers can structure randomness to feel less punishing while maintaining the same underlying probabilities, is the most practically valuable application of game design math.
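Streaks also become much more likely once a session contains many flips rather than exactly six. The sketch below computes the chance of seeing at least one run of six heads somewhere in a longer sequence of fair flips. The function name `prob_run_of_heads` and the 100-flip session length are assumptions made for illustration.

```python
def prob_run_of_heads(flips: int, run_length: int, p_heads: float = 0.5) -> float:
    """Probability of at least one run of `run_length` heads in `flips` flips."""
    # streak[s] = probability that the current streak is s heads long
    # and no full run has been seen yet.
    streak = [1.0] + [0.0] * (run_length - 1)
    seen_run = 0.0
    for _ in range(flips):
        new_streak = [0.0] * run_length
        for length, prob in enumerate(streak):
            new_streak[0] += prob * (1 - p_heads)      # tails resets the streak
            if length + 1 == run_length:
                seen_run += prob * p_heads             # heads completes the run
            else:
                new_streak[length + 1] += prob * p_heads
        streak = new_streak
    return seen_run

exactly_six = prob_run_of_heads(flips=6, run_length=6)
long_session = prob_run_of_heads(flips=100, run_length=6)

print(f"Six heads in exactly six flips: {exactly_six:.2%}")
print(f"A run of six heads somewhere in 100 flips: {long_session:.1%}")
```

The gap between the two numbers is the point: an event that is rare on any single attempt becomes common across a whole evening of rolling.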

Dice Probability 101

The single d6 is the most common randomization tool in board games and also one of the most misunderstood. A standard d6 produces a uniform distribution: each face (1 through 6) has a 1/6 probability of occurring, and the expected value is 3.5. Players intuitively understand this, but they often fail to understand what it means for repeated rolls over a session.

The single d6 versus 2d6 distinction is foundational to understanding why different dice mechanics feel different. A single d6 has a flat probability distribution — every outcome from 1 to 6 is equally likely. Two d6 summed produce a bell curve: 7 is the most likely result (probability 6/36 = 16.7%), while 2 and 12 each have probability 1/36 = 2.8%. The 2d6 distribution concentrates outcomes near the middle and makes extreme results rare. This is why Catan, which uses 2d6 for resource production, feels less punishing on individual rolls than single-die systems — the distribution naturally limits extreme outcomes.

2d6 Probability Distribution
Sum: 2 → 1/36 = 2.8%
Sum: 3 → 2/36 = 5.6%
Sum: 4 → 3/36 = 8.3%
Sum: 5 → 4/36 = 11.1%
Sum: 6 → 5/36 = 13.9%
Sum: 7 → 6/36 = 16.7% ← most likely
Sum: 8 → 5/36 = 13.9%
Sum: 9 → 4/36 = 11.1%
Sum: 10 → 3/36 = 8.3%
Sum: 11 → 2/36 = 5.6%
Sum: 12 → 1/36 = 2.8%
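The 2d6 distribution can be reproduced by enumerating all 36 equally likely outcomes. This is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely (die1, die2) outcomes and count each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
distribution = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

for total, prob in distribution.items():
    print(f"Sum {total:2d}: {counts[total]}/36 = {float(prob):.1%}")

# Probability-weighted average of the sums.
expected_value = sum(total * prob for total, prob in distribution.items())
print(f"Expected value: {float(expected_value)}")
```

The same enumeration works for any pair of dice by changing the two ranges, which makes it a quick check before committing a mechanic to a prototype.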

Custom dice with non-standard face distributions give designers precise control over probability profiles that standard dice cannot provide. A die with the faces [0, 0, 0, 1, 1, 2] has a very different character than a d6: it produces zero 50% of the time, one 33% of the time, and two 17% of the time, with an expected value of 0.67. Neutronium: Parallel Wars uses custom D6 dice with color-coded faces: blue faces represent standard combat outcomes, red faces represent critical results, and green faces represent special ability triggers. The distribution of face types — not just the number of faces — determines the probability of each outcome. A die with three blue faces, two red faces, and one green face produces blue outcomes 50% of the time, red 33%, and green 17%. The designer can adjust these ratios by changing face count rather than creating mathematically complex resolution systems.
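Both custom dice described above can be checked with the same small helper. This is a sketch: `face_probabilities` is a hypothetical helper name, and the color-coded die uses the 3/2/1 face split given as an example in the text, not a confirmed production die.

```python
from collections import Counter
from fractions import Fraction

def face_probabilities(faces):
    """Map each distinct face to its probability on a fair die."""
    counts = Counter(faces)
    return {face: Fraction(n, len(faces)) for face, n in counts.items()}

# Numeric custom die from the text.
numeric_die = [0, 0, 0, 1, 1, 2]
numeric_probs = face_probabilities(numeric_die)
expected_value = sum(face * p for face, p in numeric_probs.items())

# Color-coded die with the example split of three blue, two red, one green.
color_die = ["blue"] * 3 + ["red"] * 2 + ["green"]
color_probs = face_probabilities(color_die)

print({face: f"{float(p):.0%}" for face, p in numeric_probs.items()})
print(f"Expected value: {float(expected_value):.2f}")
print({face: f"{float(p):.0%}" for face, p in color_probs.items()})
```

Changing one entry in either face list and re-running shows immediately how a single face swap shifts every probability.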

Exploding dice are dice that are rolled again whenever they show their maximum value, with the new result added to the total. If a d6 explodes on every 6, including 6s on the re-rolls, its expected value E satisfies E = 3.5 + (1/6 × E), which gives E = 3.5 × 6/5 = 4.2. (Allowing only a single extra roll gives 3.5 + (1/6 × 3.5) = 4.083.) The open-ended nature creates theoretically unbounded results: a lucky sequence of explosions can produce very high totals, which produces the "feeling lucky" moments that some games deliberately cultivate. The tradeoff is high variance and the occasional game-defining lucky roll.
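The expected value of an exploding die depends on whether explosions may chain. The sketch below computes it both for a capped number of re-rolls and for the open-ended case, so the two can be compared. `exploding_ev` and its `max_explosions` parameter are hypothetical names introduced for illustration.

```python
from fractions import Fraction
from typing import Optional

def exploding_ev(sides: int = 6, max_explosions: Optional[int] = None) -> Fraction:
    """Expected total of a die that is re-rolled and added on its top face.

    max_explosions=None means explosions may chain without limit.
    """
    base = Fraction(sides + 1, 2)      # plain expected value, 3.5 for a d6
    p_explode = Fraction(1, sides)     # chance of rolling the top face
    if max_explosions is None:
        # Solve E = base + p_explode * E for E.
        return base / (1 - p_explode)
    # Finite geometric sum: base * (1 + p + p^2 + ... + p^max_explosions).
    return base * sum(p_explode ** k for k in range(max_explosions + 1))

one_reroll = exploding_ev(max_explosions=1)   # at most one extra roll
open_ended = exploding_ev()                   # explosions can chain

print(f"One extra roll at most: {float(one_reroll):.3f}")
print(f"Open-ended explosions:  {float(open_ended):.3f}")
```

The difference between the capped and open-ended values is small for a d6, but it grows for dice with fewer sides, where the top face comes up more often.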

Bounded dice are the opposite philosophy: keeping outcomes within a fixed range to constrain variance. Dice pool systems where you roll multiple dice and keep only the best N results (such as D&D 5E's advantage mechanic, which rolls two d20 and keeps the higher) mathematically reduce variance while maintaining probabilistic feel. Taking the higher of two d6 rolls shifts the expected value from 3.5 to 4.47, a 28% improvement, and lowers the variance from 2.92 to 1.97, while cutting the probability of a result of 2 or less from 33% to 11%.
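The effect of keeping the higher of two d6 can be verified exactly by enumeration. This is a minimal sketch; `mean_and_variance` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

def mean_and_variance(outcomes):
    """Exact mean and variance of a list of equally likely outcomes."""
    n = len(outcomes)
    mean = Fraction(sum(outcomes), n)
    variance = Fraction(sum(x * x for x in outcomes), n) - mean ** 2
    return mean, variance

single = list(range(1, 7))
best_of_two = [max(a, b) for a, b in product(range(1, 7), repeat=2)]

single_mean, single_var = mean_and_variance(single)
best_mean, best_var = mean_and_variance(best_of_two)

# Chance of a result of 2 or less under each scheme.
low_single = Fraction(sum(x <= 2 for x in single), len(single))
low_best = Fraction(sum(x <= 2 for x in best_of_two), len(best_of_two))

print(f"1d6:           mean {float(single_mean):.2f}, variance {float(single_var):.2f}")
print(f"higher of 2d6: mean {float(best_mean):.2f}, variance {float(best_var):.2f}")
print(f"P(result <= 2): {float(low_single):.0%} vs {float(low_best):.0%}")
```

Swapping `max` for `min` models a disadvantage roll with the same code.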

Expected Value in Resource Games

Resource accumulation games — Euros, engine builders, economic strategies — are built on expected value calculations that the designer must understand precisely even if they never appear explicitly in the rulebook. When a player chooses between two actions, they are (consciously or not) comparing the expected value of those actions over the relevant time horizon.

Neutronium: Parallel Wars's Nuclear Port income system is an explicit example of designed expected value. The income formula establishes that a player with N Nuclear Ports receives income at a rate that scales non-linearly with N. The specific formula — 1 port yields 2 Neutronium units per round; 10 ports yields 220 Nn per round — is not accidental. It is the designer's explicit statement that port accumulation should produce exponential rather than linear returns, because exponential returns create the coalition threshold that drives the game's competitive dynamics.

Nuclear Port Income Scaling (Neutronium: Parallel Wars)
1 port → 2 Nn/round (base)
2 ports → 5 Nn/round
3 ports → 9 Nn/round
5 ports → 20 Nn/round
7 ports → 42 Nn/round ← coalition threshold
10 ports → 220 Nn/round (runaway potential)
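One way to sanity-check a scaling table like this is to look at income per port and at the growth between listed rows. The sketch below uses only the six values listed in the table; it does not assume or reconstruct the underlying formula, which is not given here.

```python
# Income values exactly as listed in the table (ports -> Nn per round).
port_income = {1: 2, 2: 5, 3: 9, 5: 20, 7: 42, 10: 220}

rows = sorted(port_income.items())
per_port = {ports: income / ports for ports, income in rows}

for ports, income in rows:
    print(f"{ports:2d} ports: {income:3d} Nn/round, {per_port[ports]:.1f} per port")

# Average income added per extra port between consecutive listed rows.
for (p0, i0), (p1, i1) in zip(rows, rows[1:]):
    print(f"{p0} -> {p1} ports: +{(i1 - i0) / (p1 - p0):.1f} Nn per added port")

# Linear scaling would keep income per port constant; here it rises at every step.
ratios = list(per_port.values())
superlinear = all(a < b for a, b in zip(ratios, ratios[1:]))
print(f"Income per port strictly increasing: {superlinear}")
```

The jump in marginal income after the 7-port row is what makes that row a natural place for a threshold.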

This formula is intentional game design expressed as mathematics. The gap between 7-port income (42 Nn/round) and 10-port income (220 Nn/round) is the economic argument for why coalitions form at the 7-port threshold rather than waiting until 9 or 10 ports. At 7 ports, the player has enough income to be threatening — but coalition action can still be decisive before the income advantage becomes mathematically insurmountable. A designer who arrived at these numbers through playtesting alone might get them approximately right; a designer who understood the exponential function from the beginning could specify the threshold precisely.

The broader principle: when exponential scaling is intentional game design, the designer must document the scaling function and verify that the thresholds it creates are where they want them. If the coalition threshold should be at 6 ports rather than 7, the income formula needs to be adjusted — which requires knowing what the formula is, not just observing that "the game feels balanced."

Variance and Player Perception

Variance is the measure of how much actual outcomes spread around the expected value. High variance means individual results can differ dramatically from the expectation; low variance means results cluster tightly around the average. For game designers, variance is a control knob that affects both the mathematical fairness of the game and the subjective experience of playing it.

The key psychological insight: high variance feels bad even when it is mathematically balanced. A coin flip is perfectly fair — 50/50, expected value exactly equal for both players — but playing a game where every decision is resolved by coin flip feels arbitrary and unrewarding. Players need to feel that their decisions matter, which means they need the causal connection between good decisions and good outcomes to be perceivable within the game session. High variance severs that connection.

The Catan number-token problem illustrates this clearly. In Catan, 7 is the most likely 2d6 result (16.7%), but no hex carries a 7; rolling it activates the robber instead. The most productive number tokens are 6 and 8 (13.9% each), and the least productive are 2 and 12 (2.8% each). Experienced players know to prioritize resources on 6s, 8s, 5s, and 9s, the high-probability hexes. But in any given session, a player who correctly places their initial settlements on these hexes can still be significantly outperformed by a player with lower-probability placements if the actual dice rolls deviate from expected values. This is not unfair; it is normal statistical variation. But it feels unfair because the relationship between the decision (good placement) and the outcome (frequent resource income) is obscured by variance.
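How often does a weaker placement keep up with a stronger one purely through dice variance? The sketch below estimates this with a small simulation. It is heavily simplified and every parameter is an assumption: one player produces on 6 and 8, the other on 4 and 10, a session lasts 60 rolls, and the robber, trading, and building are ignored.

```python
import random

def simulate_session(rolls, rng):
    """Count production rolls for two hypothetical players in one session."""
    strong = weak = 0
    for _ in range(rolls):
        total = rng.randint(1, 6) + rng.randint(1, 6)
        if total in (6, 8):        # stronger placement: 10/36 chance per roll
            strong += 1
        elif total in (4, 10):     # weaker placement: 6/36 chance per roll
            weak += 1
    return strong, weak

rng = random.Random(42)            # fixed seed so the run is repeatable
sessions = 5_000
rolls_per_session = 60             # assumed session length, for illustration

upsets = 0
for _ in range(sessions):
    strong, weak = simulate_session(rolls_per_session, rng)
    if weak >= strong:
        upsets += 1
upset_rate = upsets / sessions

print(f"Weaker placement ties or out-produces the stronger one in {upset_rate:.1%} of sessions")
```

Under these assumptions the better placement still wins most sessions, but the upset rate is high enough that a regular group will see it happen often.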

The design solutions for managing perceived unfairness from variance include: mitigation mechanics (rerolls, resource banks, catch-up mechanisms that activate on bad luck runs), decision points that remain meaningful even after bad luck (so a player who rolls poorly still has interesting choices), and variance that favors trailing players (catch-up via variance: the leading player wants stable, predictable income; trailing players benefit from high-variance approaches that can close the gap quickly, even though the expected value is the same).
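The claim that trailing players benefit from variance at equal expected value can be made concrete. In this sketch all numbers are hypothetical: a safe action always scores 2 points, a risky action scores 0 or 4 with equal chance, and the trailing player needs 8 points in the 3 turns that remain.

```python
from fractions import Fraction
from itertools import product

# Hypothetical actions with the same expected value of 2 points per turn.
safe_action = {2: Fraction(1)}                          # always 2 points
risky_action = {0: Fraction(1, 2), 4: Fraction(1, 2)}   # 0 or 4, equally likely

def expected_value(action):
    return sum(points * prob for points, prob in action.items())

def chance_to_reach(action, turns, target):
    """Exact probability that `turns` uses of `action` total at least `target`."""
    total_prob = Fraction(0)
    for combo in product(action.items(), repeat=turns):
        points = sum(p for p, _ in combo)
        prob = Fraction(1)
        for _, pr in combo:
            prob *= pr
        if points >= target:
            total_prob += prob
    return total_prob

turns_left, points_needed = 3, 8    # assumed end-game situation
safe_chance = chance_to_reach(safe_action, turns_left, points_needed)
risky_chance = chance_to_reach(risky_action, turns_left, points_needed)

print(f"EV per turn: safe {expected_value(safe_action)}, risky {expected_value(risky_action)}")
print(f"Chance to reach {points_needed} in {turns_left} turns: safe {safe_chance}, risky {risky_chance}")
```

With identical expected values, the safe action can never close the gap while the risky one does so half the time, which is why offering a high-variance option works as a catch-up mechanism.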

Kingmaker moments from dice — where a random roll determines which player wins or loses in the final round — are the most damaging variance outcomes for player satisfaction. The solution is not eliminating dice but structuring the late game so that dice outcomes affect the path to victory rather than determining it outright. When multiple players have viable winning positions going into the final round, a lucky roll is satisfying for the winner but does not feel illegitimate to the losers — because the losers also had a path to win that could have been enabled by their own lucky rolls.

Balance Testing with Math

The MEQA framework (Measurability, Engagement, Quality, Accessibility) provides a structured approach to game balance testing. The Measurability pillar — the M in MEQA — is where mathematics enters the design process formally: before playtesting begins, the designer defines what "balanced" means in measurable terms.

For a game with asymmetric factions like Neutronium: Parallel Wars, measurable balance means: each faction should achieve a win rate within a defined tolerance band across a sufficient sample of games at comparable skill levels. If the target is a 50% win rate (pure balance) with an acceptable range of ±10 percentage points, then a faction winning 42% of games is within tolerance and a faction winning 63% is not. But achieving this standard requires knowing the target before testing, rather than declaring post hoc that observed win rates are "close enough."
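Writing the tolerance band down as code before testing is one way to commit to it. A minimal sketch, where the function name `within_tolerance` and the faction labels are placeholders and the win rates are the two from the example in the text:

```python
def within_tolerance(win_rate: float, target: float = 0.50, tolerance: float = 0.10) -> bool:
    """True if win_rate lies within `tolerance` (as a fraction) of `target`."""
    return abs(win_rate - target) <= tolerance

# Win rates from the example in the text; faction names are placeholders.
observed = {"faction_a": 0.42, "faction_b": 0.63}

for faction, rate in observed.items():
    status = "within tolerance" if within_tolerance(rate) else "out of tolerance"
    print(f"{faction}: {rate:.0%} -> {status}")
```

Fixing `target` and `tolerance` in a script before the first session makes it harder to move the goalposts once results come in.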

Defining metrics before playtesting changes what you observe. If you know you are measuring win rate per faction, you track faction assignments and outcomes across sessions. If you know you are measuring average game length, you record timestamps. These decisions must be made before the first playtest session, because retrospective metrics are unreliable — memory is selective and humans naturally remember sessions that support existing beliefs.

Sample size requirements for balance conclusions are often larger than designers expect. For a 2-player game with 2 factions, 30 games provides baseline data for detecting imbalances larger than 15% at 80% confidence. For 4-player games with 6 factions, the combination space is much larger: 30 games gives you approximately 5 games per faction pair — barely sufficient for detecting extreme imbalance, and insufficient for detecting subtle advantages. Indie publishers rarely have the resources for rigorous statistical validation; the practical approach is to use math to verify expected values, playtesting to catch outliers, and community feedback post-release to identify surviving issues.
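A rough feel for these sample sizes comes from the normal approximation to the binomial. The sketch below is a simplification under stated assumptions: each game is treated as an independent trial of a single faction's win rate near 50%, and the 4-player, 6-faction pairing structure is not modeled. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(games: int, confidence: float = 0.80, win_rate: float = 0.5) -> float:
    """Half-width of a two-sided normal-approximation interval for a win rate."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(win_rate * (1 - win_rate) / games)

def games_needed(margin: float, confidence: float = 0.80, win_rate: float = 0.5) -> int:
    """Smallest sample size whose margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * win_rate * (1 - win_rate) / margin ** 2)

for n in (30, 150):
    print(f"{n} games: win rate known to within {margin_of_error(n):.1%} either way at 80% confidence")
print(f"Games for a 5-point margin: {games_needed(0.05)}")
```

Because the margin shrinks only with the square root of the sample size, halving it takes roughly four times as many games.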

For the full framework — including how Measurability integrates with the other MEQA pillars — see the MEQA game balance framework guide, which covers the complete approach to defining, measuring, and achieving balance across game systems.

The income scaling formula in Neutronium connects directly to the mechanics detail at /mechanics/nuclear-port-scaling, where the exponential function is documented alongside the design reasoning for each threshold value.

Probability Tools for Designers

Several tools make game design math accessible without requiring advanced statistical training. These are the ones that work in practice.

AnyDice (anydice.com) is the standard dice probability calculator for game designers. It accepts a compact dice notation (for example: output 2d6, output d4+d8, or output [highest 2 of 3d6]) and returns probability distributions, expected values, and cumulative probabilities. For any mechanic involving dice, AnyDice should be the first tool consulted. Its output graphs make distributions immediately legible and comparable: enter two output lines in the same program to see how their distributions differ.

Spreadsheet simulations (Google Sheets, Excel) handle calculations that AnyDice cannot: resource accumulation over multiple rounds, income with multiple sources, expected game length under different strategic assumptions. A basic spreadsheet model of a game's economy — with columns for each turn, rows for each resource type, and formulas representing the game's core income and spending mechanics — takes 2–3 hours to build and reveals balance issues that would take 20+ playtests to discover empirically.

Monte Carlo simulation is the most flexible tool: running a game's mechanics thousands of times computationally to estimate the statistical distribution of outcomes. Its results are approximate, and they tighten as the number of simulated runs grows. For designers with programming background, Python with NumPy is sufficient for most game simulation needs. For designers without programming background, there are visual Monte Carlo tools and even spreadsheet-based simulations that produce meaningful results with limited technical knowledge. Monte Carlo is most valuable for games with complex interdependencies where analytical calculation is difficult: when multiple random events interact, simulation often produces more reliable distribution estimates than error-prone manual calculation.
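As a minimal illustration of the Monte Carlo approach, the sketch below estimates the outcome of a made-up contested roll in which two random mechanics interact. The mechanic is hypothetical and not taken from any published game: the attacker rolls an open-ended exploding d6, the defender keeps the higher of two d6, and the attacker wins only on a strictly higher total. It uses the standard library `random` module rather than NumPy to stay dependency-free.

```python
import random

def exploding_d6(rng):
    """Roll a d6; every 6 adds another roll (open-ended)."""
    total = 0
    while True:
        roll = rng.randint(1, 6)
        total += roll
        if roll != 6:
            return total

def higher_of_two_d6(rng):
    """Roll two d6 and keep the higher result."""
    return max(rng.randint(1, 6), rng.randint(1, 6))

def estimate_attacker_win_rate(trials, seed=0):
    """Monte Carlo estimate of P(exploding d6 > higher of two d6)."""
    rng = random.Random(seed)      # fixed seed so the estimate is repeatable
    wins = sum(exploding_d6(rng) > higher_of_two_d6(rng) for _ in range(trials))
    return wins / trials

win_rate = estimate_attacker_win_rate(trials=100_000)
print(f"Estimated attacker win rate: {win_rate:.1%}")
```

For a mechanic this small the answer can also be worked out by hand, which makes it a useful test case: if the simulation and the hand calculation disagree, one of them contains a bug.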

When to trust math versus when to playtest: use math to verify theoretical balance and catch obvious design errors before investing in playtesting. Use playtesting to discover how human psychology interacts with the math — the places where the optimal strategy differs from what players actually do, and the places where the math predicts balance but the experience feels unfair. Both are necessary. Neither is sufficient alone.

Frequently Asked Questions

Why do dice feel unfair in board games even when the probability is balanced?

Dice feel unfair because people weigh bad outcomes more heavily than good ones. Psychological research on loss aversion suggests that losses are felt roughly twice as strongly as equivalent gains, so a bad dice roll tends to loom larger in memory than an equally good one. When you roll poorly three times and well three times in a session, you leave the table feeling unlucky, because the losses were more emotionally salient than the wins. Additionally, high variance means individual sessions can diverge significantly from the expected average: a "fair" dice system can produce a run of six low rolls in a row purely by chance, which feels manipulated even though it is within normal statistical variation.

What is expected value in board games?

Expected value (EV) in board games is the average outcome of a probabilistic event calculated across all possible outcomes, weighted by their probability. For a standard d6, the expected value is (1+2+3+4+5+6)/6 = 3.5. Designers use expected value to ensure that different strategic choices offer comparable return on investment — if one action has a much higher expected value than alternatives, rational players will always choose it, eliminating meaningful decision points. Good game design means giving players choices where the expected values are close enough that other factors (risk tolerance, current game state, opponent behavior) determine the optimal choice.

How do board game designers control randomness?

Board game designers control randomness through several techniques: dice pool mechanics that reduce variance (rolling multiple dice and choosing the best result), custom dice with non-standard face distributions for precise probability control, card draw from shuffled decks for pseudo-randomness that trends toward expected outcomes over time, and mitigation mechanics (rerolls, resource banks) that let skilled players reduce bad luck impact without eliminating randomness. The designer's goal is not to eliminate randomness but to make it feel responsive to skill.

How many playtests are needed to statistically validate board game balance?

For a 2-player game with 2 asymmetric factions, 30 games provides a baseline for detecting win rate imbalances larger than 15% at 80% confidence. For a 4-player game with 6 factions, the combination space requires 150+ games for meaningful data on each faction pair. In practice, most indie publishers use math to verify expected values and catch obvious dominance, playtesting to find outliers and edge cases, and community feedback post-release to identify balance issues that survived both stages. The combination of all three produces more reliable balance than any single approach.
