Hockey Data Centre
Advanced hockey analytics for casual fans and supernerds alike. No paywalls.
xG values are generated by a custom XGBoost model trained on ~1.5 million shots from 2015 to 2025. Inputs include shot location, distance, angle, game state, player strength state, shift data and others, totalling 40+ numeric, categorical and ordinal features. On unseen shots, the model achieves close to state-of-the-art performance among open-source models (MSE/Brier: 0.04287, LogLoss: 0.15970, AUC: 0.83716), but training is still a work in progress and the shot feature set is expanding. Alternative feature representations and model architectures, such as simple MLPs, are next. Other models (win likelihood, entertainment value, defensive prowess) are in development, as are YouTube videos with data stories.
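For readers curious what the three reported metrics measure, here is a minimal sketch in plain Python. It is not the site's evaluation code: the function names and the five toy shots are made up for illustration, and the pairwise AUC loop is only practical for small samples.

```python
import math


def brier_score(goals, xg):
    """Mean squared error between predicted goal probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(goals, xg)) / len(goals)


def log_loss(goals, xg, eps=1e-15):
    """Average negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(goals, xg):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(goals)


def roc_auc(goals, xg):
    """Probability that a random goal is ranked above a random non-goal (ties count half)."""
    pos = [p for y, p in zip(goals, xg) if y == 1]
    neg = [p for y, p in zip(goals, xg) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one goal and one non-goal")
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = goal, 0 = no goal, with made-up xG predictions.
toy_goals = [0, 0, 1, 0, 1]
toy_xg = [0.05, 0.10, 0.30, 0.02, 0.08]

toy_brier = brier_score(toy_goals, toy_xg)
toy_logloss = log_loss(toy_goals, toy_xg)
toy_auc = roc_auc(toy_goals, toy_xg)
print(f"Brier: {toy_brier:.5f}  LogLoss: {toy_logloss:.5f}  AUC: {toy_auc:.5f}")
```

Lower is better for Brier and LogLoss, which reward well-calibrated probabilities; higher is better for AUC, which only measures how well goals are ranked above non-goals.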
Data is automatically refreshed nightly from the NHL API. Model inference and playoff simulation are run nightly after incorporating the latest games.