Fantasy Butler

Engine methodology

How we test the engine

The 91.5% top-3 and 57.5% win-rate numbers come from 600 simulated drafts. Here is the setup, the scoring rules, the caveats, and what the numbers actually mean before you trust them.


What we built before Fantasy Butler

DraftButler is the simulation engine that started this whole project. It was built first because fantasy baseball is the hardest sport for a manual manager: ten scoring categories, twenty-three roster slots, batters and pitchers on different distributions, and a draft where a single mistake compounds for six months.

If an engine could win at fantasy baseball, it could be extended to football, basketball, and every other format. So baseball was the proving ground. The 91.5% top-3 and 57.5% win-rate numbers cited across the site come from this engine, in this format. Fantasy Butler, the autonomous agent that handles waivers, lineups, FAAB, and roster operations, is being built on the same approach, extended to football.

This page is the receipt for those numbers.

The simulation setup

The engine was tested across 600 complete drafts. Each draft used the same setup:

Parameter     Value
Format        12-team snake draft
Roster        23 rounds (14 starters + 9 bench)
Scoring       Roto 5x5 (10 category standings)
Player pool   663 players with both projections and qualifying actual stats
Projections   STEAMER (FanGraphs)
Season        2025 (backtested against actual season results)
Opponents     11 ADP-based bots

The engine drafted from one of the 12 seat positions. The 11 other seats were filled by simulated drafters following Average Draft Position (ADP): on each turn, a bot picks from the three best available players by ADP, taking the first, second, or third with probability 60%, 25%, or 15%. This produces opponents who behave like a typical public-league field that mostly follows consensus.
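The opponent behavior described above can be sketched in a few lines. This is a minimal illustration, not the production simulator: the function names, the placeholder player names, and the choice to truncate the weights when fewer than three players remain are all assumptions made for the example.

```python
import random

# Weights from the setup above: top 3 available players by ADP.
ADP_WEIGHTS = (0.60, 0.25, 0.15)


def snake_order(num_teams: int, num_rounds: int) -> list[int]:
    """Return the 0-based seat that picks at each turn of a snake draft."""
    order: list[int] = []
    for rnd in range(num_rounds):
        seats = range(num_teams)
        # Every second round runs in the opposite direction.
        order.extend(reversed(seats) if rnd % 2 else seats)
    return order


def adp_bot_pick(available_by_adp: list[str], rng: random.Random) -> str:
    """Pick one of the top 3 available players, weighted 60/25/15.

    `available_by_adp` must be sorted best ADP first. If fewer than three
    players remain, the weights are truncated (an assumption of this sketch).
    """
    candidates = available_by_adp[:3]
    weights = ADP_WEIGHTS[: len(candidates)]
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(42)  # fixed seed so bot behavior is repeatable
pool = [f"player_{i:03d}" for i in range(1, 664)]  # 663 placeholders in ADP order
order = snake_order(num_teams=12, num_rounds=23)

# Simulate the first round with every seat played by a bot.
first_round = []
for seat in order[:12]:
    pick = adp_bot_pick(pool, rng)
    pool.remove(pick)
    first_round.append((seat, pick))
```

With 12 teams and 23 rounds, `order` holds 276 turns, so fewer than half of the 663-player pool is drafted in any one simulation.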

Every draft ran to completion. Every roster was scored against actual 2025 season results, using starter-only category standings. The bench was excluded from scoring, mirroring how roto leagues actually work.
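For readers unfamiliar with roto scoring, here is a small sketch of how category standings turn into a final placement. It assumes each team's category totals have already been computed from starters only, that ERA and WHIP are the lower-is-better categories (the usual 5x5 convention, not stated on this page), and that tied teams split the points for the places they occupy. The function name and the three-team example data are hypothetical.

```python
def roto_points(
    team_totals: dict[str, dict[str, float]],
    lower_is_better: frozenset[str] = frozenset({"ERA", "WHIP"}),
) -> dict[str, float]:
    """Award 1 point for worst through N points for best in each category."""
    teams = list(team_totals)
    n = len(teams)
    points = {team: 0.0 for team in teams}
    categories = list(next(iter(team_totals.values())))

    for cat in categories:
        # Sort worst to best. For lower-is-better stats the highest value is worst.
        ranked = sorted(
            teams,
            key=lambda t: team_totals[t][cat],
            reverse=cat in lower_is_better,
        )
        i = 0
        while i < n:
            # Find the run of teams tied with the team at position i.
            j = i
            while j + 1 < n and team_totals[ranked[j + 1]][cat] == team_totals[ranked[i]][cat]:
                j += 1
            # Tied teams share the average of the points for positions i..j.
            shared = sum(range(i + 1, j + 2)) / (j - i + 1)
            for team in ranked[i : j + 1]:
                points[team] += shared
            i = j + 1
    return points


# Hypothetical starter-only totals for a 3-team, 2-category league.
totals = {
    "engine": {"HR": 250, "ERA": 3.40},
    "bot_a": {"HR": 230, "ERA": 3.90},
    "bot_b": {"HR": 230, "ERA": 3.60},
}
points = roto_points(totals)
standings = sorted(points, key=points.get, reverse=True)
```

In a full 12-team, 10-category league the same procedure gives each team between 10 and 120 points, and the final placement is the rank of that total.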

What the numbers mean

Top-3 rate (91.5%). In a 12-team league, finishing in the top 3 means 1st, 2nd, or 3rd place at the end of the season. Random chance puts a team in the top 3 exactly 25% of the time (3 in 12). The engine finished top-3 in 549 of 600 drafts.

Win rate (57.5%). Finishing 1st outright. Random chance is 8.3% (1 in 12). The engine finished 1st in 345 of 600 drafts. That is roughly 6.9 times better than random.

Average finish (1.82). The mean placement across all 600 drafts. A perfect engine that always wins would score 1.00. An average team scores 6.50. The engine's 1.82 means its average finish is just better than 2nd place, even though its single most common finish is 1st (57.5% of drafts).

Floor (0 finishes in 10th or worse). Across 600 drafts, the engine never finished in the bottom three positions. The worst finish was 9th.
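The arithmetic behind these figures is easy to check from the counts given above. The snippet below only restates those counts; the variable names are ours.

```python
DRAFTS = 600
TEAMS = 12
top3_count, win_count = 549, 345  # counts reported above

top3_rate = top3_count / DRAFTS   # 0.915
win_rate = win_count / DRAFTS     # 0.575

random_top3 = 3 / TEAMS           # 0.25
random_win = 1 / TEAMS            # about 0.083

top3_lift = top3_rate / random_top3   # about 3.66x random
win_lift = win_rate / random_win      # about 6.9x random

# Mean placement of a team with no edge in a 12-team league.
average_finish_baseline = (1 + TEAMS) / 2  # 6.5
```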

Three-way comparison

The same simulation framework was run against two other strategies on the same opponents:

Strategy                           Top-3 Rate   Win Rate   Avg Finish
Fantasy Butler engine              91.5%        57.5%      1.82
Pure ADP (follow consensus)        25.5%        11.2%      6.81
Yahoo pre-season expert rankings   0.0%         0.0%       11.54

Pure ADP performs roughly at chance, which is what you would expect, since it is what most of the field is doing. Yahoo's expert rankings finish at or near last place (average finish 11.54) when applied without in-draft adjustments, because flat ranked lists ignore positional scarcity, category balance, and starter slot requirements.

How the engine actually decides

The engine evaluates each available player on a small number of dimensions:

  • Per-category strength. Standardized z-scores for each scoring category, so home runs and stolen bases can be compared on the same scale.
  • Positional scarcity. When a position is running out across the league, the engine raises the priority of remaining players at that position so you do not end up without a catcher.
  • Category balance. If your team is weak in stolen bases, the engine nudges toward players who fill that gap.
  • Draft strategy. Six game-theory factors that account for the draft as a game, not just a list, including: how soon a player is likely to be drafted by someone else, whether you are reaching too early, whether you have the right mix of pitchers and hitters, whether this pick fills a starting slot, and how much pressure your remaining roster holes are creating.
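To make the first and third dimensions concrete, here is a minimal sketch of per-category z-scores and a need-weighted sum. It is an illustration under stated assumptions, not the engine's actual formula: the population standard deviation, the `need` multipliers, the function names, and the three example players are all hypothetical.

```python
from statistics import mean, pstdev


def category_z_scores(
    projections: dict[str, dict[str, float]],
) -> dict[str, dict[str, float]]:
    """Standardize each category so different stats share one scale."""
    players = list(projections)
    categories = list(next(iter(projections.values())))
    z: dict[str, dict[str, float]] = {p: {} for p in players}
    for cat in categories:
        values = [projections[p][cat] for p in players]
        mu, sigma = mean(values), pstdev(values)
        for p in players:
            # A category with no spread carries no information.
            z[p][cat] = 0.0 if sigma == 0 else (projections[p][cat] - mu) / sigma
    return z


def balanced_score(z_row: dict[str, float], need: dict[str, float]) -> float:
    """Sum z-scores, scaling up categories where the roster is weak."""
    return sum(z_row[cat] * need.get(cat, 1.0) for cat in z_row)


# Hypothetical projections for three hitters in two categories.
projections = {
    "slugger": {"HR": 40, "SB": 5},
    "balanced": {"HR": 20, "SB": 10},
    "speedster": {"HR": 0, "SB": 45},
}
z = category_z_scores(projections)

even = {"HR": 1.0, "SB": 1.0}      # no particular roster need
needs_sb = {"HR": 1.0, "SB": 1.5}  # roster is short on stolen bases
```

With even weights the slugger scores higher than the speedster; once stolen bases are weighted up, the order flips. That is the kind of nudge the category-balance bullet describes.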

The weights on those factors are not guesses. They were fit via systematic search: 300 different weight combinations were each tested across 120 drafts, and the top 5 combinations were re-tested across 600 drafts each. The combination that performed best on the larger sample is what runs in production.
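The two-stage shape of that search (a cheap screen over many candidates, then a larger confirmation run on a shortlist) can be sketched as follows. The sketch assumes random sampling of candidates and a single "higher is better" score per run; the page does not say how candidates were generated or which metric ranked them. `toy_evaluate` is a stand-in for running real drafts, with noise that shrinks as the number of drafts grows.

```python
import random
from math import sqrt
from typing import Callable

Weights = tuple[float, ...]


def two_stage_search(
    evaluate: Callable[[Weights, int], float],
    sample_weights: Callable[[], Weights],
    candidates: int = 300,
    screen_drafts: int = 120,
    finalists: int = 5,
    confirm_drafts: int = 600,
) -> Weights:
    """Screen many weight sets on a small sample, then confirm the best few."""
    pool = [sample_weights() for _ in range(candidates)]
    # Stage 1: rank every candidate on the smaller, noisier sample.
    screened = sorted(pool, key=lambda w: evaluate(w, screen_drafts), reverse=True)
    shortlist = screened[:finalists]
    # Stage 2: re-test only the shortlist on the larger sample.
    return max(shortlist, key=lambda w: evaluate(w, confirm_drafts))


rng = random.Random(7)


def sample_weights() -> Weights:
    # Hypothetical: 10 weights drawn uniformly from [0, 1].
    return tuple(rng.uniform(0.0, 1.0) for _ in range(10))


def toy_evaluate(weights: Weights, num_drafts: int) -> float:
    # Stand-in objective: best at 0.5 everywhere, plus sampling noise
    # that decreases with the square root of the number of drafts.
    true_quality = -sum((w - 0.5) ** 2 for w in weights)
    return true_quality + rng.gauss(0.0, 1.0 / sqrt(num_drafts))


best = two_stage_search(toy_evaluate, sample_weights)
```

The point of the second stage is that a candidate can top a 120-draft screen partly by luck; re-testing the shortlist on 600 drafts reduces the chance of shipping a lucky set of weights.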

Honest caveats

The numbers above describe performance against ADP-based simulated opponents. Real human opponents are smarter than ADP bots: they adjust, they remember last year, they target specific players, they read team needs. Expect lower absolute win rates in real leagues against good opponents.

What we believe holds

  • The relative advantage (engine over ADP, ADP over flat expert rankings) should persist regardless of opponent strength.
  • The engine's worst-case behavior (never finishing in the bottom three) is structural. It comes from filling starter slots before bench stacking, and from balancing pitchers against hitters early. That should carry over.

What might shift

  • Specific top-3 and win rates may compress. A 91.5% top-3 against bots could become 70% against a sharp league.
  • Auction draft formats, dynasty formats, and non-roto formats use different math. These results are for standard 12-team roto snake drafts.

Replicability

The simulation framework records every pick, every roster, every category standing, and every final placement. If you want to audit the methodology before trusting the numbers, the underlying simulation logs are available on request. Write us at hello@fantasybutler.com with the subject line "Methodology audit" and we will share the full run output.

The simulation is deterministic given fixed projections and a fixed random seed for opponent behavior. Re-running the same setup with the same seed reproduces the same results exactly; re-running with a different seed should produce results that agree within statistical noise.
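As a rough guide to how large that statistical noise is, the sketch below shows that a seeded generator repeats itself exactly and estimates the sampling error on the headline rates. The error estimate treats the 600 drafts as independent trials, which is an assumption of this example (every draft shares the same player pool and season), so read it as an approximation.

```python
import random
from math import sqrt

# Two generators with the same seed produce the same sequence.
a, b = random.Random(2025), random.Random(2025)
same_sequence = [a.random() for _ in range(5)] == [b.random() for _ in range(5)]


def binomial_standard_error(rate: float, n: int) -> float:
    """Standard error of a proportion observed over n independent trials."""
    return sqrt(rate * (1 - rate) / n)


se_top3 = binomial_standard_error(0.915, 600)  # about 0.011
se_win = binomial_standard_error(0.575, 600)   # about 0.020
```

Under that assumption, a re-run with a different seed would be expected to land within roughly one to two percentage points of the top-3 rate and two to four points of the win rate.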

Football extension

Fantasy Butler, the football autonomous agent, is being architected on the same approach as DraftButler. We expect the engine that wins at roto baseball to generalize to football's weekly head-to-head format, with adaptations for:

  • Weekly lineup setting instead of single-draft optimization
  • Position-locked scoring slots (QB, RB, WR, TE, FLEX, K, DST) instead of category accumulation
  • Waiver wire and FAAB decisions throughout the season instead of just at the draft
  • Approval mode or full auto, depending on how much delegation you want

The same z-score-plus-context-plus-strategy approach applies. The weights and the optimizer will be refit for football specifically. We will publish football methodology numbers here once the football engine has its own simulation track record.

Until then, the baseball numbers above are what we have validated.

Last updated

Stats reflect the post-roster-correction re-optimization (April 1, 2026). Engine version: 10-parameter strategy, starter-only roto scoring. Earlier 84.2% top-3 / 47.5% win-rate figures (pre-correction) are superseded.