NFL球员PCA分析

NFL球员PCA分析

Author

Li Zongzhang

Published

October 17, 2025

安装包

install.packages("tidyverse")

install.packages("factoextra")

install.packages("MASS")

install.packages("psych")

加载包

library(tidyverse)

library(factoextra)

library(psych)

数据文件

NFL Play Statistics dataset

点击下载数据文件: NFL.xlsx

第1步 评估数据是否适合做PCA

library(readxl)

data <- read_excel("NFL.xlsx")

#提取data中的第5至12列,保存为combine

combine <- data[, 5:12]

library(psych)

KMO(combine)

Kaiser-Meyer-Olkin factor adequacy

Call: KMO(r = combine)

Overall MSA = 0.87

MSA for each item =

height weight forty vertical bench broad_jump three_cone

0.80 0.80 0.91 0.91 0.74 0.89 0.91

shuttle

0.92

bartlett.test(combine)

Bartlett test of homogeneity of variances

data: combine

Bartlett's K-squared = 89371, df = 7, p-value < 2.2e-16

第2步 估计PCA

#方法一:调用prcomp函数

combine.pr <- prcomp(combine, scale = TRUE)

combine.pr

Standard deviations (1, .., p=8):

[1] 2.3679065 0.9227977 0.7890378 0.6134782 0.4681091 0.3717784 0.3483419

[8] 0.2526648

Rotation (n x k) = (8 x 8):

PC1 PC2 PC3 PC4 PC5

height 0.2913200 -0.36243690 -0.78456426 0.201411503 -0.10676615

weight 0.3982567 -0.23599332 -0.08084940 0.006795590 0.19904888

forty 0.3967636 0.08177256 -0.02806160 0.007186135 0.47533576

vertical -0.3467039 -0.37295634 -0.11888605 -0.570750251 0.50876800

bench 0.2433913 -0.73405585 0.56077575 0.100768662 -0.16400782

broad_jump -0.3707226 -0.29425658 -0.20956737 -0.282435248 -0.38806157

three_cone 0.3779002 0.12306906 0.06454279 -0.546224235 0.08716904

shuttle 0.3733848 0.16307221 -0.02114816 -0.495272480 -0.52830179

PC6 PC7 PC8

height 0.04131390 -0.25540971 -0.22209123

weight -0.03852389 0.23485745 0.82634941

forty -0.10347494 0.61777481 -0.46557052

vertical 0.36727384 -0.07881529 -0.02938450

bench 0.01827782 -0.06658600 -0.21361784

broad_jump -0.51246785 0.48890122 0.00708534

three_cone -0.56866804 -0.45335440 -0.05483418

shuttle 0.51465642 0.20678970 -0.03889983

#方法二:计算combine的相关系数矩阵的特征值和特征向量

combine %>% cor() %>% eigen()

eigen() decomposition

$values

[1] 5.60698132 0.85155552 0.62258070 0.37635549 0.21912616 0.13821921 0.12134208

[8] 0.06383951

$vectors

[,1] [,2] [,3] [,4] [,5] [,6]

[1,] -0.2913200 -0.36243690 0.78456426 -0.201411503 -0.10676615 0.04131390

[2,] -0.3982567 -0.23599332 0.08084940 -0.006795590 0.19904888 -0.03852389

[3,] -0.3967636 0.08177256 0.02806160 -0.007186135 0.47533576 -0.10347494

[4,] 0.3467039 -0.37295634 0.11888605 0.570750251 0.50876800 0.36727384

[5,] -0.2433913 -0.73405585 -0.56077575 -0.100768662 -0.16400782 0.01827782

[6,] 0.3707226 -0.29425658 0.20956737 0.282435248 -0.38806157 -0.51246785

[7,] -0.3779002 0.12306906 -0.06454279 0.546224235 0.08716904 -0.56866804

[8,] -0.3733848 0.16307221 0.02114816 0.495272480 -0.52830179 0.51465642

[,7] [,8]

[1,] 0.25540971 0.22209123

[2,] -0.23485745 -0.82634941

[3,] -0.61777481 0.46557052

[4,] 0.07881529 0.02938450

[5,] 0.06658600 0.21361784

[6,] -0.48890122 -0.00708534

[7,] 0.45335440 0.05483418

[8,] -0.20678970 0.03889983

combine.pr$sdev^2

[1] 5.60698132 0.85155552 0.62258070 0.37635549 0.21912616 0.13821921 0.12134208

[8] 0.06383951

#查看主成分的方差贡献率、累计方差贡献率

summary(combine.pr)

Importance of components:

PC1 PC2 PC3 PC4 PC5 PC6 PC7

Standard deviation 2.3679 0.9228 0.78904 0.61348 0.46811 0.37178 0.34834

Proportion of Variance 0.7009 0.1064 0.07782 0.04704 0.02739 0.01728 0.01517

Cumulative Proportion 0.7009 0.8073 0.88514 0.93218 0.95957 0.97685 0.99202

PC8

Standard deviation 0.25266

Proportion of Variance 0.00798

Cumulative Proportion 1.00000

#查看第1主成分载荷

combine.pr$rotation[,1:2]

PC1 PC2

height 0.2913200 -0.36243690

weight 0.3982567 -0.23599332

forty 0.3967636 0.08177256

vertical -0.3467039 -0.37295634

bench 0.2433913 -0.73405585

broad_jump -0.3707226 -0.29425658

three_cone 0.3779002 0.12306906

shuttle 0.3733848 0.16307221

PC1含义解释

PC1对所有变量的系数(载荷)多数为正,主要由体重(weight)和速度/敏捷相关(forty、three_cone、shuttle)指标主导。

broad_jump和vertical为负系数,表明跳跃能力好的个体PC1得分低。

PC1可以理解为“体型+速度/敏捷主成分”,与跳跃能力呈负相关。

PC2含义解释

PC2主要由卧推(bench),其次是vertical、height、broad_jump,且系数均为负,说明这些能力高的人在PC2得分低。

PC2可以理解为“力量+弹跳能力逆向主成分”。

第3步 确定保留的主成分的个数

# 计算各个主成分的方差

pr.var <- combine.pr$sdev^2

pr.var

[1] 5.60698132 0.85155552 0.62258070 0.37635549 0.21912616 0.13821921 0.12134208

[8] 0.06383951

# 计算各个主成分的方差贡献率

pve <- pr.var/sum(pr.var)

pve

[1] 0.700872665 0.106444440 0.077822587 0.047044437 0.027390770 0.017277401

[7] 0.015167760 0.007979939

#绘制各个主成分的方差贡献率

plot(pve, xlab = "Principal Component",

ylab = "Proportion of Variance Explained",

ylim = c(0, 1), type = "b")

# 绘制累计方差贡献率

plot(cumsum(pve), xlab = "Principal Component",

ylab = "Cumulative Proportion of Variance Explained",

ylim = c(0, 1), type = "b")

# 绘制碎石图

library(factoextra)

fviz_eig(combine.pr)

第4步 可视化

# 绘制主成分1和主成分2的分组散点图

combine.pcscore <- cbind(data,combine.pr$x)

combine.pcscore %>% ggplot(aes(PC1,PC2, col= position))+

geom_point()

combine.pcscore %>% ggplot(aes(PC2,PC3, col= position))+

geom_point()

#Graph of individuals.

#Individuals with a similar profile are grouped together.

fviz_pca_ind(combine.pr,

col.ind = "cos2", # Color by the quality of representation

geom = c("point"),

gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),

repel = TRUE # Avoid text overlapping

)

# Graph of variables.

# Positive correlated variables point to the same side of the plot.

# Negative correlated variables point to opposite sides of the graph.

fviz_pca_var(combine.pr,

col.var = "contrib", # Color by contributions to the PC

gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),

repel = TRUE # Avoid text overlapping

)

#Biplot of individuals and variables

fviz_pca_biplot(combine.pr, repel = TRUE,

geom = c("point"),

col.var = "#2E9FDF", # Variables color

col.ind = "#696969" # Individuals color

)

table(data$position)

C CB DE DT EDGE FB FS ILB LB LS OG OL OLB OT QB RB

115 311 279 253 8 77 123 130 1 2 232 1 240 273 12 245

S SS TE WR

7 107 194 275

缩写

英文全称

中文解释

C

Center

中锋,进攻线中央负责开球给四分卫

CB

Cornerback

角卫,防守组主要负责盯防外接手

DE

Defensive End

防守端锋,防守线两侧,冲传和防跑

DT

Defensive Tackle

防守截锋,防守线中间,堵截跑动

EDGE

Edge Rusher (OLB/DE)

边锋冲传手,专注于冲击四分卫

FB

Fullback

近卫跑卫,主要负责掩护和短码冲球

FS

Free Safety

游动安全卫,防守后场,负责深区

ILB

Inside Linebacker

内线卫,防守中路,兼顾跑动和传球

LB

Linebacker

线卫,防守二线,分内外线卫

LS

Long Snapper

长传手,专门负责长开球(开球、射门)

OG

Offensive Guard

进攻护锋,中锋两侧,保护四分卫

OL

Offensive Lineman

进攻线球员总称,包括C、OG、OT等

OLB

Outside Linebacker

外线卫,负责边路防守和冲传

OT

Offensive Tackle

进攻截锋,进攻线两侧,保护四分卫

QB

Quarterback

四分卫,进攻组织核心,传球主力

RB

Running Back

跑卫,负责持球冲跑或接球

S

Safety

安全卫,泛指FS和SS

SS

Strong Safety

强侧安全卫,靠近防跑,兼顾传球

TE

Tight End

近端锋,既能接球也参与阻挡

WR

Wide Receiver

外接手,主要负责接球推进

table(data$position) %>%

as.data.frame() %>%

arrange(desc(Freq))

Var1 Freq

1 CB 311

2 DE 279

3 WR 275

4 OT 273

5 DT 253

6 RB 245

7 OLB 240

8 OG 232

9 TE 194

10 ILB 130

11 FS 123

12 C 115

13 SS 107

14 FB 77

15 QB 12

16 EDGE 8

17 S 7

18 LS 2

19 LB 1

20 OL 1

combine.pcscore %>%

filter(position %in% c("CB","DE","WR")) %>%

ggplot(aes(PC1, fill= position))+

geom_histogram() +

facet_wrap(~position, ncol = 1)

PC1: “体型+速度/敏捷主成分”

DE Defensive End 防守端锋,防守线两侧,冲传和防跑

CB Cornerback 角卫,防守组主要负责盯防外接手

WR Wide Receiver 外接手,主要负责接球推进

第5步 针对主成分得分的进一步分析

不同位置球员的主成分得分比较

# 绘制PC1主成分得分按位置分组的箱线图

combine.pcscore %>%

ggplot(aes(x = fct_reorder(position, PC1, .fun = median, .desc = TRUE),

y = PC1, fill = position)) +

scale_fill_brewer(palette = "Set3") +

geom_boxplot() +

theme(axis.text.x = element_text(angle = 45, hjust = 1)) +

labs(title = "不同位置球员PC1主成分得分分布", x = "position", y = "PC1得分")

# 绘制PC2主成分得分按位置分组的箱线图

combine.pcscore %>%

ggplot(aes(x = fct_reorder(position, PC2, .fun = median, .desc = TRUE),

y = PC2, fill = position)) +

scale_fill_brewer(palette = "Pastel1") +

geom_boxplot() +

theme(axis.text.x = element_text(angle = 45, hjust = 1)) +

labs(title = "不同位置球员PC2主成分得分分布", y = "PC2得分")

# 方差分析:不同位置的PC1得分是否有显著差异

anova_pc1 <- aov(PC1 ~ position, data = combine.pcscore)

summary(anova_pc1)

Df Sum Sq Mean Sq F value Pr(>F)

position 19 14133 743.9 1046 <2e-16 ***

Residuals 2865 2037 0.7

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# 方差分析:不同位置的PC2得分是否有显著差异

anova_pc2 <- aov(PC2 ~ position, data = combine.pcscore)

summary(anova_pc2)

Df Sum Sq Mean Sq F value Pr(>F)

position 19 497.8 26.198 38.33 <2e-16 ***

Residuals 2865 1958.1 0.683

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

基于主成分得分的聚类分析

# k-means聚类(以2类为例)

set.seed(123)

km <- kmeans(combine.pcscore[, c("PC1", "PC2")], centers = 3)

combine.pcscore$cluster <- as.factor(km$cluster)

# 可视化聚类结果

ggplot(combine.pcscore, aes(PC1, PC2, color = cluster)) +

geom_point() +

labs(title = "基于主成分得分的球员聚类")

聚类与位置的对应关系

prop.table(table(combine.pcscore$position, combine.pcscore$cluster), 1)

1 2 3

C 0.930434783 0.000000000 0.069565217

CB 0.000000000 0.996784566 0.003215434

DE 0.132616487 0.021505376 0.845878136

DT 0.790513834 0.000000000 0.209486166

EDGE 0.000000000 0.000000000 1.000000000

FB 0.051948052 0.116883117 0.831168831

FS 0.000000000 0.991869919 0.008130081

ILB 0.007692308 0.153846154 0.838461538

LB 0.000000000 1.000000000 0.000000000

LS 0.000000000 0.000000000 1.000000000

OG 0.969827586 0.000000000 0.030172414

OL 1.000000000 0.000000000 0.000000000

OLB 0.004166667 0.262500000 0.733333333

OT 0.945054945 0.000000000 0.054945055

QB 0.083333333 0.250000000 0.666666667

RB 0.000000000 0.881632653 0.118367347

S 0.000000000 1.000000000 0.000000000

SS 0.000000000 0.962616822 0.037383178

TE 0.051546392 0.061855670 0.886597938

WR 0.000000000 0.956363636 0.043636364

1号聚类(Cluster 1)

主要特征: 高度集中于内线和进攻线球员: OL(进攻线): 100% OG(进攻护锋): 97% OT(进攻截锋): 95% C(中锋): 93% DT(防守截锋): 79%

这些位置典型特征是体型大、力量强,符合PCA能力分型的“体型/力量主导”类别。

2号聚类(Cluster 2)

主要特征: 集中于速度型和二线防守球员: CB(角卫): 99.7% FS(游动安全卫): 99% LB(线卫): 100% S(安全卫): 100% SS(强侧安全卫): 96% WR(外接手): 96% RB(跑卫): 88%

这些位置通常身材相对较小,速度、灵活性、爆发力强,对应“敏捷/速度主导”能力分型。

3号聚类(Cluster 3)

主要特征: 集中于力量+爆发型或多面手球员: EDGE(冲传手): 100% LS(长传手): 100% DE(防守端锋): 85% FB(近卫跑卫): 83% ILB(内线卫): 84% OLB(外线卫): 73% TE(近端锋): 89% QB(四分卫): 67% DE、FB、TE、QB、OLB、ILB等为多功能或力量/爆发型球员,显示这些球员的身体素质在主成分空间中更接近第三类。

这些位置球员兼具力量、体型和一定灵活性,属于“力量/多面手型”分型。

Copyright © 2022 世界杯比赛规则|年世界杯|世界杯精彩进球集锦站|13691688.com All Rights Reserved.