Predicting Elite NBA Lineups Using Individual Player Order Statistics

Susan E. Martonosi, Martin Gonzalez, Nicolas Oshiro

1: Harvey Mudd College

2: University of San Francisco

Abstract:

NBA team managers and owners try to acquire high-performing players. An important consideration

in these decisions is how well the new players will perform in combination with their teammates. Our objective is to

identify elite ﬁve-person lineups, which we deﬁne as those having a positive plus-minus per minute (PMM). Using

individual player order statistics, our model can identify an elite lineup even if the ﬁve players in the lineup have

never played together, which can inform player acquisition decisions, salary negotiations, and real-time coaching

decisions. We combine seven classiﬁcation tools into a unanimous consent classiﬁer (all-or-nothing classiﬁer, or ANC)

in which a lineup is predicted to be elite only if all seven classiﬁers predict it to be elite. In this way, we achieve high

positive predictive value (i.e., precision), the likelihood that a lineup classiﬁed as elite will indeed have a positive

PMM. We train and test the model on individual player and lineup data from the 2017-18 season and use the model

to predict the performance of lineups drawn from all 30 NBA teams’ 2018-19 regular season rosters. Although the

ANC is conservative and misses some high-performing lineups, it achieves high precision and recommends positionally

balanced lineups.

Keywords: Basketball; point differential; lineups; classification.

1 Introduction

The strategy of professional basketball continues to evolve, leading NBA team managers and coaches to continuously

seek new players who have the relevant skills to win the annual championship. While individual player performance is a

signiﬁcant contributor towards team success, we are interested in the overarching factors contributing to high-performing

teams and the collection of players that comprise them. This leads us to develop a model to predict high-performing

ﬁve-person lineups, including those which have never played together before, using individual player order statistics.

This model can help general managers considering trades and free agents evaluate how potential incoming players

might perform in combination with existing team members. Coaches can also use this model as an in-game aid to see

which player sitting on the bench may have a positive impact as a substitute.

Much of the work in the literature focuses on methods for predicting the outcomes of basketball games, tournaments, or seasons. Ozmen quantifies the incremental change in win probability per unit change in several game statistics [ ]. Loeffelholz et al. use neural networks to predict the outcomes of NBA games more successfully than expert prediction [ ]. Lin uses historical team data with real-time updating during a game to predict the game’s outcome [ ]. Shen et al. and Hua present methods for predicting the results of the NCAA “March Madness” collegiate tournament [ ], as do several articles in a 2015 special issue of this journal [ ]. Gumm et al. and Zimmerman et al. use machine learning approaches on historical data to predict outcomes of NCAA and NBA playoffs [ ]. Vaz de Melo et al. forecast NBA outcomes using network analysis [ ], while Cheng et al. use the maximum entropy principle to predict the outcome of NBA playoffs [ ]. Ruiz and Cruz present a model to predict the win probability of teams participating in the NCAA March Madness Tournament by combining a Poisson factorization method with a model borrowed from soccer which quantifies a team’s attack and defense coefficients [27].

A body of work examines the role of individual players in determining team success. Several papers use centrality

metrics and other methods from the ﬁeld of complex social networks to identify prominent players and evaluate team

performance [

]. Deshpande and Jensen quantify an individual player’s contribution to their team’s overall

win probability [

]. However, it has also been suggested that an individual’s contribution towards a team’s success, as

measured by real Plus-Minus, cannot be disentangled from the inﬂuence of other players [

]. While statistics such

as Box Plus Minus (BPM) attempt to measure an individual player’s contributions to the team relative to that of their

teammates, BPM does not give an indication of how a ﬁve-player lineup will perform together.

Our paper builds upon work that examines the effectiveness of five-person lineups in the NBA. For instance, Maymin et al. calculate the synergies of each NBA team by comparing their 5-player lineups’ effectiveness to the “sum-of-the-parts” [ ]. Robertson uses a graphical model to capture the series of events that occur during a possession and estimate the probabilities of game-play events given the players on the floor [ ]. Kalman and Bosch redefine traditional basketball positions using model-based clustering and identify how interactions between these new positions result in lineups with the highest net-rating [ ]. Pelechrinis presents LinNet, which exploits the dynamics of a directed network that captures the performance of lineups against a specific opponent lineup [ ]. In addition, Sisneros and Van Moer use point differential (plus-minus) to measure the contribution of individual players towards a team’s success, and predict the winning percentage of a given team [ ]. Oh et al. simulate game play using a probabilistic network model; however, a limitation of their work is the inability to predict the performance of a lineup using players from outside the team [22].

arXiv:2303.04963v1 [stat.AP] 9 Mar 2023

Our work contributes towards this general discussion by developing a model that can predict the performance of unseen lineups, as measured by the predicted sign of their point differential per minute. While most of the work we have encountered focuses on guiding coaching decisions for in-game substitutions, our work goes further and can also be used as a tool for free agency decisions, trade negotiations, and calling up players from the NBA G League.

Our model identiﬁes lineups with a high probability of contributing a positive point differential, also known as

plus-minus, per minute (PMM). A positive PMM indicates that the lineup generally contributes positively to the team’s

relative score, whereas a negative PMM indicates that the lineup generally detracts from the team’s relative score. Our

objective is to predict elite lineups, which we deﬁne to be those having a high likelihood of contributing a positive

PMM. We combine seven classiﬁcation tools into a unanimous consent classiﬁer, which we call the all-or-nothing

classiﬁer (ANC). The ANC predicts a lineup to be elite only if all seven subclassiﬁers predict it to be elite. In this way,

we achieve high precision, the likelihood that a lineup classified as elite will indeed have a positive PMM. Each tool takes

as input the individual player order statistics for the ﬁve players on a proposed lineup, capturing the lineup’s offensive

and defensive capabilities, and classiﬁes the lineup as either elite or not. The subclassiﬁers used are a decision tree,

random forest, boosting, a support vector machine, k-nearest neighbors, logistic regression, and linear discriminant

analysis.

We train and test the model on individual player data from the 2017-18 NBA regular season purchased from

BigDataBall.com and supplemented with hustle stats from NBA.com. Hustle stats are included since they quantify a

player’s defensive contribution to the lineup. To train the models, we use a random sample of 712 lineups with at least

25 minutes of playing time. We then test the model on a random sample of 176 lineups having at least 25 minutes of

playing time. We achieve a precision of 86.7% on the testing set, indicating that lineups classiﬁed as elite using the

ANC are highly likely to contribute a positive PMM to the team. We then train the model on the full 2017-18 NBA

regular season data and use the trained model to predict elite lineups for all 30 NBA teams during the 2018-19 regular

season. We ﬁnd that the classiﬁer achieves a high precision of 76.9% (compared to 62.1% prevalence of elite lineups),

even when used to make predictions from one season to the next.

This paper is structured as follows. In Section 2, we provide an overview of the methodology used in our

classiﬁcation framework. Section 3 describes the datasets we use to train and test our model and outlines the steps taken

to clean and merge the data. Section 4 presents results of applying our model to a test data set as well as to making

predictions for the 2018-19 regular season, and we compare our predictions against actual season outcomes. We present

suggestions for future work and our conclusions in Section 5.

2 Methodology

In this section, we describe the classiﬁcation problem our method solves, the all-or-nothing classiﬁer (ANC) we develop

to solve it, and the method of cross-validation we use to tune hyperparameters to achieve a reliably high positive

predictive value, or precision.

2.1 Classifying the Plus-Minus Per Minute of an Unseen Lineup

The performance of a ﬁve-person lineup is often summarized in their plus-minus per minute (PMM) metric. This

measures the cumulative point differential (points earned minus points scored by the opponent) accrued by the lineup

each time they appear together on the court, divided by their total playing time together as a lineup. Lineups with

positive PMM contribute to the team’s net score per unit time, while lineups with negative PMM give up more points

than they earn per unit time. Existing methods for predicting the performance of unseen lineups, such as the adjusted plus-minus, attempt to account for a player’s individual contribution to the score after controlling for the other players on the court [ ]. However, these methods are based solely on fitting the observed plus-minus to indicators of who is or is not on the court; they do not incorporate other statistics about individual player performance.

Table 1: Example individual player statistics for Lineup 1 [1].

Player                        Jaylen Brown   Kyrie Irving   Marcus Morris   Terry Rozier   Jayson Tatum
Field Goals per Minute        0.17           0.28           0.18            0.15           0.16
Defensive Blocks per Minute   0.01           0.01           0.01            0.01           0.02

Table 2: Example individual player statistics for Lineup 2 [2].

Player                        James Harden   Chris Paul     Eric Gordon     Nene Hilario   Clint Capela
Field Goals per Minute        0.26           0.20           0.19            0.18           0.22
Defensive Blocks per Minute   0.02           0.01           0.01            0.02           0.07
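The PMM definition used in this subsection can be illustrated with a minimal Python sketch. The paper's own tooling is R; the stint format, the function name, and the numbers below are hypothetical and serve only to show the arithmetic.

```python
def plus_minus_per_minute(stints):
    """Return cumulative point differential divided by total minutes.

    `stints` is a list of (points_for, points_against, minutes) tuples,
    one per appearance of the lineup on the court (hypothetical format).
    """
    total_diff = sum(pf - pa for pf, pa, _ in stints)
    total_minutes = sum(m for _, _, m in stints)
    if total_minutes <= 0:
        raise ValueError("lineup has no recorded playing time")
    return total_diff / total_minutes


# Made-up stints for one lineup: (points for, points against, minutes).
example_stints = [(12, 8, 5.0), (6, 9, 3.0), (10, 7, 4.5)]
pmm = plus_minus_per_minute(example_stints)
is_elite = pmm > 0  # label used in the paper: elite iff PMM is strictly positive
```

Here the lineup outscores its opponents by 4 points over 12.5 minutes, so its PMM is positive and it would be labeled elite.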

The method we pursue is to predict an unseen lineup’s PMM using individual player statistics. For a useful primer on basketball statistics, see Kubatko et al. [ ]. Let $L$ be the set of all five-person lineups, and let $P$ be the set of players. For each lineup $l \in L$, let $p(l) \subseteq P$ be the set of five players included in lineup $l$. For each player $p \in P$, let $s_{p,1}, \ldots, s_{p,n}$ be the collection of statistics observed on a per-minute basis, e.g., field goals per minute played, or defensive rebounds per minute. We thus wish to predict the plus-minus per minute of lineup $l$, denoted $\mathrm{PMM}(l)$, as a function of the values of $s_{p,1}, \ldots, s_{p,n}$ for each player $p \in p(l)$:

$$\mathrm{PMM}(l) = f\left(s_{p,i} \mid p \in p(l),\ i \in \{1, \ldots, n\}\right).$$

Earlier work demonstrates the challenge of predicting $\mathrm{PMM}(l)$ directly [ ]. Thus, in this work we focus on predicting the sign of $\mathrm{PMM}(l)$, classifying a lineup $l$ as elite if $\mathrm{PMM}(l) > 0$.

We make the assumption that the connection between PMM and player statistics is distributional. For each lineup and each statistic, we sort the values of the statistic from smallest to largest for the five players on that lineup. We then use these order statistics as predictors in the model. This is effectively predicting $\mathrm{sign}(\mathrm{PMM})$ using the 20th, 40th, 60th, 80th, and 100th percentiles of the distributions of each statistic for the five players on the lineup. For instance, one player might contribute highly to field goal percentage, while another player contributes highly to blocks, and all of these factors contribute to the overall PMM experienced by the lineup.

As a small example, consider the two player statistics of field goals per minute (FGM) and defensive blocks per minute (DBM) in the context of Lineup 1 (Jaylen Brown, Kyrie Irving, Marcus Morris, Terry Rozier, Jayson Tatum of the Boston Celtics) and Lineup 2 (James Harden, Chris Paul, Eric Gordon, Nene Hilario, Clint Capela of the Houston Rockets), during the 2017-18 regular season. We list each player’s values for the two statistics in Tables 1 and 2 [ ]. For each lineup, we sort the five values of FGM from smallest to largest and the five values of DBM from smallest to largest to obtain the ten predictors $FGM_{(1)}, FGM_{(2)}, \ldots, FGM_{(5)}, DBM_{(1)}, DBM_{(2)}, \ldots, DBM_{(5)}$, where the subscript $(i)$ represents the $i$-th smallest value of the five players (known as the $i$-th order statistic). For the two lineups given, this results in the two rows of the data set shown in Table 3. Thus, our set of predictors is the order statistics of the five players on the lineup, for each player performance statistic collected. By using the order statistics for both offensive and defensive metrics, we can capture the overall playing profile of the lineup.
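The construction of order-statistic predictors can be sketched in a few lines of Python. This is an illustration rather than the authors' R code; the function name and the dictionary layout are assumptions, while the numeric values are those listed in Tables 1 and 2.

```python
def order_statistic_features(lineup_stats):
    """Sort each statistic across the five players, smallest to largest.

    `lineup_stats` maps a statistic name to the five players' values.
    Returns a flat dict such as {"FGM_(1)": ..., ..., "DBM_(5)": ...}.
    """
    features = {}
    for stat_name, values in lineup_stats.items():
        if len(values) != 5:
            raise ValueError("a lineup must have exactly five players")
        for rank, value in enumerate(sorted(values), start=1):
            features[f"{stat_name}_({rank})"] = value
    return features


# Per-minute values from Tables 1 and 2 (player order as listed there).
lineup_1 = {"FGM": [0.17, 0.28, 0.18, 0.15, 0.16],
            "DBM": [0.01, 0.01, 0.01, 0.01, 0.02]}
lineup_2 = {"FGM": [0.26, 0.20, 0.19, 0.18, 0.22],
            "DBM": [0.02, 0.01, 0.01, 0.02, 0.07]}

row_1 = order_statistic_features(lineup_1)
row_2 = order_statistic_features(lineup_2)
```

Because the values are sorted within each statistic, the resulting row no longer records which player produced which value; only the lineup's distribution of each statistic is retained.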

2.2 Choice of Classiﬁer

For our classifier to be an effective decision support tool for personnel decisions, we want to be confident that lineups flagged as elite by our classifier will indeed perform well. Most classification frameworks strive to maximize the overall accuracy of the classifier, which is the probability that the classification given by the model matches the true class of the datum. However, in the case of identifying elite basketball lineups, we are more interested in achieving a high positive predictive value, or precision. This is the conditional probability that a lineup is actually elite (has a positive PMM) given that the classifier predicts it to be elite.
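The difference between accuracy and precision can be made concrete with a short Python sketch. The labels below are made up for illustration, and the helper name is hypothetical.

```python
def precision_and_accuracy(actual, predicted):
    """Return (precision, accuracy) for boolean 'elite' labels.

    Precision is None when nothing is predicted elite (0/0 is undefined).
    """
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    correct = sum(1 for a, p in pairs if a == p)
    predicted_pos = true_pos + false_pos
    precision = true_pos / predicted_pos if predicted_pos else None
    return precision, correct / len(pairs)


# Made-up labels: a conservative classifier flags only two lineups.
actual    = [True, True, True, True, False, False, False, False]
predicted = [True, True, False, False, False, False, False, False]
precision, accuracy = precision_and_accuracy(actual, predicted)
```

In this toy case the classifier misses half of the truly elite lineups, which lowers its accuracy, yet every lineup it does flag is elite, so its precision is perfect. This is the trade-off the paper accepts.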

Table 3: Example player order statistics for the two lineups given in Tables 1 and 2. These order statistics serve as ten predictors in the classification framework.

Lineup   FGM(1)   FGM(2)   FGM(3)   FGM(4)   FGM(5)   DBM(1)   DBM(2)   DBM(3)   DBM(4)   DBM(5)
1        0.15     0.16     0.17     0.18     0.28     0.01     0.01     0.01     0.01     0.02
2        0.18     0.19     0.20     0.22     0.26     0.01     0.01     0.02     0.02     0.07

Figure 1: The decision process used by the all-or-nothing classiﬁer to categorize lineups.

To enhance the precision of the classifier, we combine seven commonly used classifiers (which we will refer to as subclassifiers) into an all-or-nothing classifier (ANC), in which a lineup is classified as elite if and only if all seven subclassifiers predict it to be so. Consensus classifiers have successfully been used in other applications, including computational biology [ ]. The seven subclassifiers are a decision tree, a random forest, boosting, a support vector machine, $k$-nearest neighbors, logistic regression, and linear discriminant analysis. Each of these algorithms casts a vote on the classification of a given lineup based on its individual player order statistics. A schematic of the ANC is shown in Figure 1.
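The voting rule can be sketched as follows in Python. The function name is hypothetical; the optional `num_votes` argument mirrors the numVotes parameter that the paper tunes in Section 2.3.8, and the votes shown are invented.

```python
def anc_predict(votes, num_votes=None):
    """Combine subclassifier votes (True = elite) into one prediction.

    By default every subclassifier must agree (unanimous consent);
    `num_votes` relaxes this to "at least num_votes say elite".
    """
    required = len(votes) if num_votes is None else num_votes
    return sum(bool(v) for v in votes) >= required


# Hypothetical votes from seven subclassifiers for two lineups.
unanimous = [True] * 7
one_dissent = [True] * 6 + [False]
```

With the default rule, `anc_predict(unanimous)` is true and `anc_predict(one_dissent)` is false; passing `num_votes=6` would accept the second lineup as well.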

2.3 Parameter Tuning

Each of these subclassification methods has a set of parameters governing the algorithm. Rather than tuning each subclassifier individually, we tune the parameters as an ensemble to achieve a reliably high precision of the overall ANC classifier. We use 10-fold cross validation on a training set of lineups. (The data are described in greater detail in Section 3.) The parameters associated with each subclassifier and with the overall ANC classifier are described briefly here. Then we provide pseudocode outlining the ensemble tuning procedure.

2.3.1 Decision Tree Parameters

A decision tree iteratively determines threshold values of the predictor variables along which to split the classification of a lineup as elite or not. Common practice is to build a full tree and then prune it according to a cost-complexity tradeoff. The cost-complexity parameter cp in R’s rpart package for building and pruning decision trees represents the percentage reduction in inter-branch variability required to justify each subsequent split. We use cp = −1 to build the initial complete tree, and then tune the value of cp used to prune the tree. Additionally, we use the weights argument to weight the observations by the total number of minutes played by each lineup so that lineups having low total playing time have less influence on the fitting of the tree. To prioritize achieving a high positive predictive value, we tune the false positive element of the loss matrix option in parms to more heavily penalize misclassifying a not elite lineup as elite. All other parameters are assigned the default value given in the rpart function of R.

2.3.2 Random Forest Parameters

In a random forest classifier, many decision trees are built on bootstrapped samples of the training data, and the subset of predictors considered in each splitting decision is randomly selected from the full set of predictors. The classifications yielded by each tree are used as votes in the forest classification of an individual observation. The parameter that governs this voting in a binary classifier is the cutoff vector $(c, 1 - c)$. If we observe a fraction $p$ of trees classifying a lineup as elite and a fraction $1 - p$ of trees classifying the lineup as not elite, the forest classification would be the category achieving the maximum of $(p/c, (1 - p)/(1 - c))$. In other words, $c$ represents the proportion of trees yielding an elite classification at which the forest classifier would be ambivalent between the two groups. Because we seek to maximize the precision of the ANC, we tune our random forest using values of $c \geq 0.5$. We also tune the parameter ntree governing the number of trees in the forest. All other parameters are assigned the default value given in the randomForest function in R’s randomForest package, including the parameter mtry, governing the number of predictors included in each subset, whose default value is the square root of the number of predictors.
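The effect of the cutoff $c$ on the forest's vote can be illustrated with a small Python sketch of the rule described above. The function name is hypothetical, and the handling of exact ties is an assumption of this sketch.

```python
def forest_vote(p_elite, c):
    """Classify from the fraction of trees voting elite and the cutoff c.

    The winning class maximizes (p/c, (1 - p)/(1 - c)); ties are resolved
    as "not elite" here, which is an assumption of this sketch.
    """
    if not 0 < c < 1:
        raise ValueError("cutoff c must lie strictly between 0 and 1")
    elite_score = p_elite / c
    not_elite_score = (1 - p_elite) / (1 - c)
    return "elite" if elite_score > not_elite_score else "not elite"
```

For example, a lineup that 60% of trees call elite is classified elite at $c = 0.5$ but not elite at $c = 0.7$, since raising $c$ demands a larger share of elite votes.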

2.3.3 Boosting Parameters

The boosting algorithm iteratively builds a classification tree such that subsequent splits are themselves determined by decision trees fit to the residuals (misclassifications) from earlier splits. The three parameters tuned in our classifier are the number of trees to build (referred to as mfinal in R’s boosting function of the adabag package), the depth of each tree (maxdepth) and the cost complexity value (cp) used to prune each tree. Additionally, we weight the observations by the total number of minutes played by each lineup to de-emphasize lineups with low playing times. All other parameters are assigned the default value given in the boosting function of the adabag package.

2.3.4 Support Vector Machine Parameters

Our support vector machine (SVM) subclassifier uses a radial basis kernel and tunes two parameters, cost and gamma. The cost parameter represents the penalty associated with misclassification, and gamma controls the rate of decay of influence an individual support vector has on the classification of points a given distance away. Smaller gamma reduces the rate of decay, indicating that the support vector can influence the classification of points farther away; larger gamma has the opposite effect. Other tunable parameters are set to the default used in the svm function of R’s e1071 package.
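To make the role of gamma concrete, the sketch below evaluates the standard radial basis kernel $\exp(-\gamma \lVert u - v \rVert^2)$ in Python for the three gamma values listed in Table 4. The two points are hypothetical, and the snippet illustrates only the kernel, not the SVM fit.

```python
import math


def rbf_kernel(u, v, gamma):
    """Radial basis kernel exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


# Similarity between two hypothetical standardized points 2 units apart.
point_a, point_b = (0.0, 0.0), (2.0, 0.0)
similarities = {g: rbf_kernel(point_a, point_b, g) for g in (0.01, 0.1, 1)}
```

The kernel value shrinks as gamma grows, so with larger gamma a support vector has little influence on a point at this distance.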

2.3.5 K-Nearest Neighbors Parameters

The $k$-nearest neighbors subclassifier labels a lineup according to the most common label of its $k$ closest lineups in the predictor space. Thus, we tune the value of $k$, using the knn function of package class.

2.3.6 Logistic Regression Parameters

Logistic regression predicts the log odds, $LO$, of a lineup being elite as a linear function of the predictors. We fit the model using the function glm in the R package stats. Then the classification of the lineup is determined by calculating the estimated probability of elite classification, $(1 + e^{-LO})^{-1}$, and deeming a lineup elite if the estimated probability of being elite exceeds a stated threshold. A natural default threshold is 50%; however, given our interest in maximizing precision, we tune the threshold for values greater than or equal to 50%, essentially requiring stronger evidence before classifying a lineup as elite. The parameter thresh_glm refers to one minus the desired probability threshold.
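The thresholding step can be sketched in Python as follows. The function name and the log-odds value are hypothetical; the three settings tried are the thresh values from Table 4, which correspond to probability thresholds of 50%, 75%, and 95%.

```python
import math


def logistic_is_elite(log_odds, thresh_glm):
    """Deem a lineup elite if its estimated probability exceeds 1 - thresh_glm."""
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability > 1.0 - thresh_glm


# Hypothetical fitted log odds; the implied probability is about 0.82.
log_odds = 1.5
decisions = {t: logistic_is_elite(log_odds, t) for t in (0.5, 0.25, 0.05)}
```

The same lineup passes the 50% and 75% thresholds but fails the 95% threshold, which shows how a small thresh value makes the logistic subclassifier much more conservative.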

2.3.7 Linear Discriminant Analysis Parameters

Linear discriminant analysis, as implemented in the lda function in the MASS package of R, does not rely on tuning parameters. However, it does permit us to weight the observations by the total number of minutes played by each lineup.

2.3.8 ANC Parameters

The tunable parameter for the overall ANC classifier is the number of subclassifiers identifying a lineup as elite required for the ANC to classify the lineup as elite. We refer to this as numVotes. While we hypothesize that we will get the highest precision by requiring all seven classifiers to agree (as the name “all-or-nothing classifier” suggests), we verify this assumption by tuning the number of votes required before a lineup is classified as elite.

2.3.9 Ensemble Tuning Procedure

Rather than tune each subclassifier individually, we wish to select the best combination of parameters across all subclassifiers that yields a reliably high precision of the ANC in 10-fold cross validation. We perform grid search over the combination of parameters for all seven subclassifiers and the numVotes parameter. For each parameter combination, we compute the ANC classification on each fold and calculate the precision achieved on that fold. We then average the precision over the folds for each parameter combination. This gives us a good metric for the expected precision of the ANC for each combination of tuning parameters. However, in addition to maximizing average precision, we also seek a classifier that performs robustly. That is, of those parameter combinations achieving a relatively high precision, we will favor those for which the standard deviation of precision across the folds is low (i.e., it is a robust classifier), and for which the precision on any one fold is not too low (i.e., it is a reliable classifier). We can then examine these metrics across all combinations of parameters and select parameter values that yield a sufficiently high average precision along with a high minimum precision and a low standard deviation. The pseudocode for this procedure is outlined in Algorithm 1.

Algorithm 1 Pseudocode for parameter tuning in the ANC.

Split the data into training and testing sets
Split the training set into 10 folds for cross-validation
for each fold k do
    Standardize the fold-training data by subtracting the mean and dividing by the standard deviation of each numerical variable over the 9 folds not in k. Scale and shift the left-out fold by the fold-training data’s means and standard deviations to prevent data leakage.
    for each subclassifier s do
        for each combination c of tuning parameters do
            Fit a model to the 9 folds not in k.
            Predict the classes of the observations in the k-th fold.
            Store these predictions as preds[k,s,c].
        end for
    end for
    for each combination c of tuning parameters, including numVotes do
        Count the number of subclassifiers s for which preds[k,s,c] is elite.
        The ANC classification for the observations in the k-th fold is elite if this number is at least as large as numVotes, and not elite otherwise.
        Compute the confusion matrix comparing the ANC classification to the known classification for the observations in the k-th fold, and store precision[k,c].
    end for
end for
for each combination c of tuning parameters, including numVotes do
    Calculate the average, minimum and standard deviation of precision over the 10 folds.
end for
Choose the combination of parameters that achieves a reliably high precision (high average value, high minimum value, low standard deviation).
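The vote-counting and summary steps of Algorithm 1 can be sketched in Python for a single parameter combination. This is a simplified illustration under stated assumptions: model fitting is omitted and the subclassifier predictions are made up, only two folds and three subclassifiers are shown (the paper uses ten and seven), the function names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import statistics


def fold_precision(actual, sub_preds, num_votes):
    """Precision of the vote-count rule on one fold (None if undefined).

    `sub_preds` holds one list of boolean predictions per subclassifier.
    """
    votes_per_lineup = [sum(votes) for votes in zip(*sub_preds)]
    flagged = [count >= num_votes for count in votes_per_lineup]
    n_flagged = sum(flagged)
    if n_flagged == 0:
        return None  # no lineup classified elite: precision is 0/0
    true_pos = sum(1 for a, f in zip(actual, flagged) if a and f)
    return true_pos / n_flagged


def summarize(precisions):
    """Average, minimum, and sample standard deviation across folds."""
    if any(p is None for p in precisions):
        return None
    return (statistics.mean(precisions), min(precisions),
            statistics.stdev(precisions))


# Two made-up folds, three made-up subclassifiers.
folds = [
    {"actual": [True, False, True, True],
     "sub_preds": [[True, True, True, False],
                   [True, False, True, False],
                   [True, True, True, True]]},
    {"actual": [False, True, True, False],
     "sub_preds": [[True, True, False, False],
                   [True, True, True, False],
                   [True, True, False, False]]},
]
precisions = [fold_precision(f["actual"], f["sub_preds"], num_votes=3)
              for f in folds]
summary = summarize(precisions)
```

Repeating this for every parameter combination yields one (average, minimum, standard deviation) triple per combination, from which a reliably precise configuration can be chosen.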

Because we tune all possible combinations of parameters for the seven classifiers and the ANC simultaneously, the search space grows exponentially with the number of distinct values tested for each parameter. Thus, we use a coarse grid, selecting a few distinct values for each parameter, with values informed by preliminary testing not reported here. The specific values tested are given in Table 4.

Table 4: Parameter values used in grid search.

Subclassifier                    Parameter                            Values Tested
Decision Tree                    cp (cost complexity)                 (-1, 0.01, 0.05); -1 refers to full tree
                                 loss (misclassification penalty)     (1, 1.5, 2)
Random Forest                    c (cutoff)                           (0.5, 0.7)
                                 ntree (number of trees)              (100, 500)
Boosting                         mfinal (number of trees)             (100, 500)
                                 maxdepth (depth of each tree)        (1, 2, 3)
                                 cp (cost complexity)                 (0.01, 0.05)
Support Vector Machine           cost (misclassification penalty)     (0.1, 1, 10)
                                 gamma (influence decay)              (0.01, 0.1, 1)
K-Nearest Neighbors              k (number of neighbors)              (3, 5, 7)
Logistic Regression              thresh (1-probability threshold)     (0.05, 0.25, 0.5)
All-or-Nothing Classifier (ANC)  numVotes (agreement required)        (1, 2, 3, 4, 5, 6, 7)
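To give a sense of the size of this search space, the following Python sketch counts the combinations implied by Table 4. It assumes the grid is the full Cartesian product of the listed values; the dictionary layout and variable names are hypothetical, and the resulting counts are computed here rather than reported in the paper.

```python
import math

# Values tested per parameter, transcribed from Table 4.
grid = {
    "decision_tree": {"cp": [-1, 0.01, 0.05], "loss": [1, 1.5, 2]},
    "random_forest": {"c": [0.5, 0.7], "ntree": [100, 500]},
    "boosting": {"mfinal": [100, 500], "maxdepth": [1, 2, 3],
                 "cp": [0.01, 0.05]},
    "svm": {"cost": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    "knn": {"k": [3, 5, 7]},
    "logistic": {"thresh": [0.05, 0.25, 0.5]},
    "anc": {"numVotes": [1, 2, 3, 4, 5, 6, 7]},
}

# Subclassifier settings to evaluate on each fold when every
# subclassifier is handled separately (the ANC itself is not a model).
settings_per_fold = sum(math.prod(len(v) for v in params.values())
                        for name, params in grid.items() if name != "anc")

# Joint combinations when every setting is crossed with every other.
joint_combinations = math.prod(len(v) for params in grid.values()
                               for v in params.values())
```

Under this assumption, only 40 subclassifier settings need to be evaluated per fold, but crossing them yields 244,944 joint combinations to score, which is why storing per-subclassifier predictions and combining them afterwards, as in Algorithm 1, keeps the search tractable.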

With the model framework in place, we now describe our data.

3 Data and Implementation

The novelty of this work lies in predicting lineup performance using the sorted individual statistics of the players

comprising the lineup. To do this requires merging data from two sources, cleaning the data to ensure lineups and

players are matched correctly, ﬁltering the data based on minutes played to reduce noise, and splitting the data into

training and testing sets. This process is described here.

3.1 Data Sources

We wish to make predictions and compare those predictions to actual outcomes for the 2018-19 regular NBA season,

the most recent season unaffected by the Covid-19 pandemic. We train and test our model using player and lineup

statistics from the 2017-18 NBA regular season, using data from several sources. We purchased play-by-play data

from BigDataBall.com that contains, for every play in each game of the season, the ten people on the court, which play

occurred, and the clock time. From this source, we are able to recreate per minute point differentials (PMM) for each

lineup. While we could also use this play-by-play data to compute individual player statistics, we instead obtained

player box and hustle statistics directly from NBA.com for simplicity and accuracy.

The 28 individual player statistics used to form our 140 order statistic predictors are listed in Appendix A. We focus

on statistics that offer direct measurements of play rather than statistics such as the Box Plus-Minus (BPM) which itself

is an aggregation of many statistics and is therefore less interpretable.

3.2 Data Cleaning and Filtering

Because we are merging data from two sources, NBA.com and BigDataBall.com, we must ﬁrst clean both data sets

to impose consistent player naming, for instance in the use of punctuation, nicknames or sufﬁxes. Then we match

individual player statistics to lineups, sorting each statistic from smallest to largest among the players in the lineup; this

creates our vectors of order statistics which we use as predictors.

After matching players and lineups, any players appearing in only one of the two data sets (and the lineups in which they appear) are discarded. Only one player is discarded in this way: Ty Lawson, who has playoff statistics appearing in the NBA.com data but no regular season information in the BigDataBall.com stints data. Therefore, lineups including Lawson are not used for training or testing.

Next, we ﬁlter out individual players (and the lineups to which they belong) having fewer than 50 minutes of playing

time during the season. This is to ensure that the individual player statistics, when adjusted per minute, are estimated

with low variance. Of the 540 players represented in the raw data, 483 meet this playing time threshold. Likewise, we

filter out lineups, composed of these 483 players, having fewer than 25 minutes of playing time together. Of the over

14,000 distinct NBA lineups appearing in the 2017-18 season in which each player had at least 50 minutes of playing

time, 888 have at least 25 minutes of lineup playing time. It is worth noting that some previous work recommends

requiring a higher playing time threshold for lineups to improve model accuracy. However, doing so would dramatically

reduce our dataset. For example, requiring 50 minutes of lineup playing time would yield only 374 lineups on which to

train and test. Given the large number of order statistics used as predictors, doing so runs the risk of overﬁtting the

model. Instead, to mitigate the inﬂuence of lineups with shorter playing time, we use lineup playing time as a weight in

the decision tree, linear discriminant analysis, and boosting classifiers.
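The two playing-time filters can be sketched in Python as follows. The thresholds are those stated above; the data structures, function name, and the placeholder players "A" through "G" are hypothetical.

```python
MIN_PLAYER_MINUTES = 50
MIN_LINEUP_MINUTES = 25


def filter_lineups(player_minutes, lineup_minutes):
    """Keep lineups whose players and whose shared time meet both thresholds.

    `player_minutes` maps player name -> season minutes; `lineup_minutes`
    maps a frozenset of five player names -> minutes played together.
    """
    eligible = {p for p, m in player_minutes.items()
                if m >= MIN_PLAYER_MINUTES}
    return {lineup: minutes
            for lineup, minutes in lineup_minutes.items()
            if lineup <= eligible and minutes >= MIN_LINEUP_MINUTES}


# Made-up data: player F falls below 50 minutes, and the lineup with G
# has played fewer than 25 minutes together.
player_minutes = {"A": 900, "B": 800, "C": 700, "D": 600,
                  "E": 500, "F": 40, "G": 300}
lineup_minutes = {frozenset("ABCDE"): 30.0,
                  frozenset("ABCDF"): 60.0,
                  frozenset("ABCDG"): 10.0}
kept = filter_lineups(player_minutes, lineup_minutes)
```

Only the first lineup survives both filters. In the paper, the retained lineup minutes are additionally used as observation weights in several subclassifiers.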

3.3 Data Splitting

From these 888 lineups, we use an 80%-20% split to create a training set of 712 lineups and a testing set of 176 lineups. In both sets, a lineup is given the label elite if its PMM is strictly positive; otherwise it is labeled as not elite. Our training set includes 375 lineups labeled as elite and 337 lineups labeled as not elite; our testing set includes 95 lineups labeled as elite and 81 labeled as not elite.

4 Results

We now describe the results of tuning the ANC parameters and applying the trained ANC to the test data. Once satisﬁed

with the predictive power of our trained ANC, we interpret the tuned subclassiﬁers to understand which predictor

variables appear most important in predicting lineup quality. We then use the ANC to make predictions about lineup

combinations for the 2018-19 NBA team rosters.

4.1 Parameter Tuning Results

Common practice in parameter tuning is to choose the combination of parameters that achieves the best metric when averaged over the ten cross-validation folds. In our case, that would be the parameter combination that achieves maximum average precision. However, in our initial analysis of the parameter tuning results, we found that the combination of parameters for which the average precision is highest often has high variability in precision across the ten folds. This implies a classifier that occasionally and unpredictably will perform poorly on certain data sets. Figure 2(a) shows, for each parameter combination used in our tuning process, the average precision achieved over ten folds against the worst-case (minimum) precision over ten folds. We exclude the results from parameter combinations in which the precision was undefined in at least one fold. Upon further investigation, these combinations arise exclusively when numVotes = 7 and the logistic regression parameter thresh = 0.05. In such cases, the ANC is so conservative that some folds have zero lineups classified as elite, yielding an undefined value (0/0) when computing precision. Moreover, those folds attaining a numerical value for precision have such a small number of lineups classified as elite that the standard deviation in precision is quite high. We prefer a parameter combination that achieves a satisfyingly high average precision and a high worst-case precision over the ten folds. Specifically, we choose a parameter combination that lies on the efficient frontier of average precision and worst-case precision, shown in red in Figure 2(a). Additionally, although we are more interested in precision than overall accuracy, Figure 2(b) plots the average precision against the average accuracy attained over ten folds in all parameter combinations, along with the efficient frontier (in red) of average precision and average accuracy.
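The notion of an efficient frontier used here can be sketched in Python: a parameter combination lies on the frontier if no other combination is at least as good in both criteria. The function name and the (average, minimum) precision pairs below are made up for illustration and are not the paper's results.

```python
def efficient_frontier(points):
    """Return the points not dominated in both coordinates (higher is better).

    Each point is (average_precision, minimum_precision) for one
    parameter combination.
    """
    frontier = []
    for candidate in points:
        dominated = any(
            other[0] >= candidate[0] and other[1] >= candidate[1]
            and other != candidate
            for other in points)
        if not dominated:
            frontier.append(candidate)
    return sorted(set(frontier))


# Made-up (average, minimum) precision pairs for five combinations.
results = [(0.82, 0.45), (0.78, 0.60), (0.74, 0.55),
           (0.70, 0.68), (0.66, 0.40)]
frontier = efficient_frontier(results)
```

Three of the five hypothetical combinations are on the frontier; choosing among them is a judgment about how much average precision to give up for a better worst case.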

The blue point in Figure 2(a) corresponds to parameter combinations achieving an average precision of 79.8% and

a worst-case precision of 50%. Figure 2(b) shows this point in blue on the plot of average precision versus average

accuracy; we see that these parameter combinations also lie on the efﬁcient frontier of precision and accuracy, attaining

an average accuracy of 53.2%. We choose the optimized parameter combination to train the final model from among

this set, as given in Table 5. The values of k (KNN), cp (decision tree), loss (decision tree), ntree (random forest),

thresh (logistic regression), and numVotes (ANC) are uniquely determined. However, the parameter tuning was not

sensitive to the values of mfinal (boosting), maxdepth (boosting), cost (SVM), or gamma (SVM). For these parameters,

we select the values yielding the second best outcome in the efficient frontier shown in Figure 2, corresponding to an

average precision of 77.0%, a minimum precision of 66.7%, and an average accuracy of 59.3%.

Figure 2: Results of the parameter tuning. For each combination of parameters, we plot (a) the average precision

achieved by the ANC over ten folds versus the minimum precision of that classifier on the ten folds; and (b) average

precision versus average accuracy. In both plots, the efficient frontier is shown in red, and the combination of minimum

and average precision achieved by our optimized parameter combination is plotted as the blue diamond.

We note that the performance of the classifier appears most sensitive to the number of votes required from the

seven subclassifiers for a lineup to be predicted as elite. Average precision begins to drop off considerably when

requiring fewer than six out of the seven subclassifiers to agree, and seven is preferable to six. Additionally, the ANC’s

performance appears sensitive to the choice of the cost complexity parameter for the decision tree, the probability

threshold for logistic regression, and the cutoff parameter for the random forest. The precision of the ANC appears

robust to the choice of the remaining parameters.
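The voting rule governed by numVotes can be illustrated with a short sketch. The function name and the 0/1 encoding of subclassifier outputs are assumptions made for illustration; only the threshold logic is taken from the text.

```python
def anc_predict(votes, num_votes=7):
    """Return True (elite) when at least `num_votes` subclassifiers vote elite.

    `votes` holds one 0/1 prediction per subclassifier (1 = elite).
    With seven subclassifiers and num_votes=7 this is unanimous consent.
    """
    return sum(votes) >= num_votes


unanimous = [1, 1, 1, 1, 1, 1, 1]
one_dissent = [1, 1, 1, 0, 1, 1, 1]

print(anc_predict(unanimous))                 # unanimous agreement
print(anc_predict(one_dissent))               # a single dissent blocks the elite label
print(anc_predict(one_dissent, num_votes=6))  # a relaxed rule accepts it
```

Lowering num_votes admits more lineups as elite, which is consistent with the drop in average precision reported when fewer than six votes are required.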

4.2 Testing Results

We fit our ANC to the full, standardized training set. We shift and scale the testing data using the training data’s

means and standard deviations to avoid data leakage, and we apply our trained ANC to the testing data. The testing

data contains 176 lineups, of which 95 lineups are labeled as elite. The confusion matrix is given in Table 6. Of

15 lineups predicted to be elite, 13 have a true label of elite, indicating a strictly positive PMM. This

corresponds to a precision of 86.7%. The model has an overall accuracy of 52.3%, which we tolerate because our focus

is on predicting elite lineups.
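The leakage-free scaling step can be sketched as follows: the shift and scale are estimated on the training rows only and then reused unchanged on the testing rows. The helper names and toy data are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def fit_scaler(train_rows):
    """Estimate (mean, standard deviation) per column from training rows only."""
    columns = list(zip(*train_rows))
    return [(mean(col), stdev(col)) for col in columns]  # sample SD: an assumption


def apply_scaler(rows, scaler):
    """Standardize rows using previously fitted training statistics."""
    return [[(x - m) / s for x, (m, s) in zip(row, scaler)] for row in rows]


train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # toy training predictors
test = [[4.0, 20.0]]                             # toy testing predictors

scaler = fit_scaler(train)                 # never sees the testing rows
train_std = apply_scaler(train, scaler)
test_std = apply_scaler(test, scaler)      # reuses the training mean and SD
print(test_std)
```

Fitting the scaler on the combined data instead would let information about the testing distribution influence the model inputs, which is the leakage the text guards against.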

Additionally, we can compare the known PMM for lineups predicted to be elite against those that are not. Figure

3 provides comparative boxplots for the average point differential per minute of lineups predicted (or not) to be elite.

The mean PMM of predicted elite lineups is +0.30, which is statistically significantly higher than the mean PMM of

predicted not elite lineups of −0.00049 (p = 0.00013). Thus, while the ANC does not predict PMM directly, it does a good

job partitioning the lineups into an elite group predicted to have a positive PMM and a not elite group predicted

to have a negative PMM. It is worth noting that there is overlap in the distributions, and the ANC misses some very

good lineups (e.g., the Toronto Raptors’ Delon Wright, DeMar DeRozan, Fred VanVleet, Jakob Poeltl, and Pascal Siakam,

which had 40.0 minutes of playing time and a PMM of 1.00). Nonetheless, the distribution of PMM of elite lineups

is generally higher than the distribution of PMM of not elite lineups, and the two lineups having negative PMM

that were misclassified by the ANC as elite have only moderately negative PMMs. This points to the ANC having

good judgment about elite lineups.

Table 5: Tuned parameter values used in final ANC.

Subclassifier                     Parameter                            Chosen Value
Decision Tree                     cp (cost complexity)                 0.05
                                  loss (misclassification penalty)     1
Random Forest                     c (cutoff)                           0.7
                                  ntree (number of trees)              500
Boosting                          mfinal (number of trees)             500
                                  maxdepth (depth of each tree)        3
                                  cp (cost complexity)                 0.01
Support Vector Machine            cost (misclassification penalty)     1
                                  gamma (influence decay)              1
K-Nearest Neighbors               k (number of neighbors)              7
Logistic Regression               thresh (1−probability threshold)     0.25
All-or-Nothing Classifier (ANC)   numVotes (agreement required)        7

Table 6: Confusion matrix for the tuned ANC applied to the test data set. Of the 15 lineups predicted to be elite, 13

have a true label of elite, corresponding to a precision of 86.7%.

                   Predicted Elite   Predicted Not Elite
True Elite         13                82
True Not Elite     2                 79

One might also wonder whether the complete set of ﬁve-player order statistics is required by the ANC to achieve

high precision. Analysis of a simpler model, using only the ﬁrst order statistics (i.e., the minimum of each individual

player statistic on the lineup), is presented in Appendix B. The simpler model achieves a testing precision of only 75%

compared to the ANC’s testing precision of 86.7%.

4.3 Interpretation of the Subclassiﬁers

After training the ANC using the optimized parameter values given in Table 5, we now try to interpret the classiﬁcations

given by each subclassifier to understand which variables are most important in determining lineup performance. The

k-nearest neighbor and support vector machine classifiers are not particularly interpretable, so we focus primarily on the other five

subclassiﬁers.

By far, the most consistently important predictor of elite classification over the five interpretable subclassifiers is the

smallest plus-minus per minute of the five players on the lineup, PMM(1). This is not surprising because if the lineup’s

worst plus-minus is high, then all five players’ plus-minuses must be high, indicating a high level of play across any

lineups in which the players took part.

In the decision tree, PMM(1) is the only branch of the tree that remains after pruning. Elite classification requires

standardized PMM(1) > 0.32.

Table 7 gives the estimated coefficients of the 24 standardized predictors identified by backwards stepwise logistic

regression. The three most important predictors in the model, both in terms of coefficient magnitude and statistical

significance, are PMM(1), the lowest plus-minus per minute among the five players; FG3A(5), the highest rate of 3-point

field goals attempted per minute among the five players; and SCREENASSISTS(3), the median rate per minute of

screens that led to baskets. Additionally, we notice that a variety of offensive and defensive statistics, including hustle

statistics, are selected by the model to predict lineup performance.

Figure 3: Boxplots of lineup point differential per minute (PMM) by ANC-predicted label.

Table 7: Estimated coefficients of logistic regression model. Significance codes: ***: 0.001, **: 0.01, *: 0.05, .: 0.1

Predictor               Estimate   p-value
(Intercept)              0.12970   0.08068  .
FGM(3)                   0.17638   0.08454  .
FGA(4)                  -0.16022   0.10885
FG3M(5)                  0.43471   0.04551  *
FG3A(5)                 -0.60677   0.00666  **
FG3PCT(2)                0.22172   0.01398  *
FG3PCT(3)               -0.24260   0.01370  *
FTA(4)                   0.21057   0.02255  *
FTPCT(1)                 0.19298   0.02183  *
FTPCT(3)                -0.22849   0.03709  *
FTPCT(4)                 0.27984   0.02016  *
FTPCT(5)                 0.20627   0.04374  *
DREB(2)                  0.17627   0.03486  *
TOV(4)                  -0.15526   0.06852  .
BLK(5)                   0.13514   0.10880
(5)                      0.20912   0.01139  *
PFD(1)                  -0.19257   0.01774  *
PMM(1)                   0.85462   < 2e-16  ***
CONTESTEDSHOTS(4)       -0.19241   0.04944  *
CONTESTEDSHOTS2PT(1)     0.19000   0.04680  *
CONTESTEDSHOTS3PT(5)     0.20103   0.03562  *
SCREENASSISTS(2)        -0.17644   0.06542  .
SCREENASSISTS(3)         0.30631   0.00984  **
SCREENASSISTS(5)         0.23956   0.01057  *
BOXOUTS(3)              -0.26324   0.02719  *

For the cases of linear discriminant analysis, boosting, and random forests, predictor importance can be assessed

graphically, as shown in Figure 4. Figure 4(a) shows the coefficient of each standardized predictor in the linear

discriminant analysis (LDA) model, plotted against quantiles of the normal distribution. The predictors in red have

coefficients that are particularly large in magnitude, indicating their influence in classifying lineups as elite or not

elite. In addition to PMM(1), statistics related to contested shots carry large importance in the LDA model. Figure

4(b) shows predictor importance for boosting, measured as reduction in the Gini impurity index, plotted against

quantiles of the normal distribution. Points in red have very large importance and correspond to the predictors PMM(1),

FTPCT(2) (second-lowest free throw percentage), and FG3PCT(3) (median 3-point field goal percentage). Figure 4(c)

shows predictor importance for the random forest subclassifier, measured as reduction in the Gini impurity index, plotted

against quantiles of the normal distribution. Predictors with especially high importance are all five players’ plus-minuses

and the second smallest 3-point field goal percentage (FG3PCT(2)). We conclude that PMM(1) is consistently the most

important predictor of lineup performance, followed to a lesser extent by three-point field goal percentages and other

metrics.

4.4 Predictions for the 2018-19 Season

Having a classiﬁer that attains a high precision in identifying lineups with positive PMM on the 2017-18 season data,

we now use the ANC to identify promising lineups for the thirty teams participating in the 2018-19 NBA regular season.

We refit the ANC using the combined training and testing data, standardized, from 2017-18. We shift and scale the

2018-19 regular season data using the means and standard deviations from 2017-18 to avoid data leakage, and we use

the fitted ANC to predict 2018-19 lineup performance. We can then compare our predictions against actual performance

during this season.

Figure 4: Predictor importance for (a) linear discriminant analysis, (b) boosting, and (c) random forest subclassifiers.

Predictors listed in red are influential in that subclassifier’s determination of elite lineups.

The 2018-19 regular season data is gathered as follows. For a given team roster, we generate all possible ﬁve-player

lineup combinations and their individual player order statistics for the metrics used in the ANC (listed in Appendix A).

We exclude any lineups wherein any of the players had fewer than 50 minutes of playing time in the 2017-18 regular

season. Of the 530 players listed on 2018-19 regular season rosters of the thirty NBA teams, 392 had sufficient

playing time during the 2017-18 regular season and are included in the analysis. We can then predict which of these

possible lineups will have a positive PMM. This part of the process is interesting for two reasons. First, for rebuilding

teams, such as the 2018-19 Los Angeles Lakers, we can predict which combinations of newly traded and existing

players are likely to play well together. Second, we can compare the relative strength of NBA teams by measuring the

depths of their benches in terms of number of elite lineups.
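The lineup-generation step can be sketched as below: enumerate every five-player combination of eligible players and, for each metric, sort the five individual values so that metric(1) is the smallest and metric(5) the largest. The roster, the player labels, and the single PMM metric are hypothetical; the actual metrics are those listed in Appendix A.

```python
from itertools import combinations

MIN_MINUTES = 50  # players with fewer 2017-18 minutes are excluded


def lineup_order_statistics(roster, metrics):
    """Map each eligible five-player lineup to its per-metric order statistics.

    `roster` maps a player name to a dict holding 'minutes' plus one value
    per metric. The feature 'PMM(1)' is the smallest individual PMM on the
    lineup and 'PMM(5)' is the largest.
    """
    eligible = sorted(p for p, s in roster.items() if s["minutes"] >= MIN_MINUTES)
    features = {}
    for lineup in combinations(eligible, 5):
        row = {}
        for metric in metrics:
            values = sorted(roster[p][metric] for p in lineup)
            for rank, value in enumerate(values, start=1):
                row[f"{metric}({rank})"] = value
        features[lineup] = row
    return features


# Hypothetical seven-player roster; P7 lacks sufficient playing time.
roster = {
    "P1": {"minutes": 900, "PMM": 0.10},
    "P2": {"minutes": 700, "PMM": 0.05},
    "P3": {"minutes": 650, "PMM": -0.02},
    "P4": {"minutes": 1200, "PMM": 0.20},
    "P5": {"minutes": 300, "PMM": 0.00},
    "P6": {"minutes": 80, "PMM": 0.15},
    "P7": {"minutes": 30, "PMM": 0.50},
}
features = lineup_order_statistics(roster, ["PMM"])
print(len(features))  # six eligible players give C(6, 5) = 6 lineups
print(features[("P1", "P2", "P3", "P4", "P5")])
```

Because the features depend only on individual statistics, a lineup receives a prediction even if its five players have never shared the court.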

Table 8 gives the number of predicted elite lineups for each team having more than zero elite predictions. The

six teams having the largest number of predicted elite lineups are the Golden State Warriors, the Houston Rockets,

the Indiana Pacers, the Boston Celtics, the Utah Jazz, and the Charlotte Hornets. The remaining teams each have

four or fewer predicted elite lineups, and fourteen out of the thirty teams have zero predicted elite lineups. By

prioritizing precision, the ANC is a conservative predictor of elite lineups.

4.4.1 Comparing 2018-19 Lineup Predictions To Realized Performance

We now compare the predictions made by the ANC to actual performance of lineups used during the 2018-19 regular

season.

We begin by examining the confusion matrix. For any given roster, the vast majority of lineups (predicted to be

elite or not) may never be used, and the lineups used in the 2018-19 regular season occasionally involve players who

did not have enough 2017-18 playing time to be included in ANC predictions. Thus, to construct the confusion matrix,

we restrict our focus to those lineups appearing in the 2018-19 regular season data for which the ANC produced a prediction. On

the left of Table 9, we have the confusion matrix calculated on all 2018-19 regular season lineups for which an ANC

prediction was obtained; on the right of Table 9, we have the confusion matrix calculated on only those 2018-19 regular

season lineups having at least 25 minutes of playing time for which an ANC prediction was obtained. The purpose of

this restriction is to focus on lineups for which PMM is well-estimated. In the unrestricted case, we see that 67.4%

of those lineups predicted to be elite by the ANC experienced a positive PMM during the 2018-19 regular season,

compared to a 58.1% overall prevalence of elite lineups. When we restrict our focus to those lineups having at least 25

minutes of playing time, the precision increases to 76.9% compared to a prevalence of 62.1%. We conclude that using

2017-18 individual player order statistics in the ANC to predict 2018-19 lineup performance yields high-precision, if

somewhat conservative, predictions of lineups achieving positive PMM.

Table 8: Number of ANC-predicted elite lineups for the 2018-19 NBA regular season for the 16 teams having

nonzero elite predictions.

Team                      Number of Predicted Elite Lineups
Golden State Warriors     127
Houston Rockets           62
Indiana Pacers            30
Boston Celtics            25
Utah Jazz                 18
Charlotte Hornets         14
Oklahoma City Thunder     4
Denver Nuggets            4
Portland Trail Blazers    3
New Orleans Pelicans      3
Minnesota Timberwolves    3
San Antonio Spurs         2
Miami Heat                2
Washington Wizards        1
Toronto Raptors           1
Philadelphia 76ers        1

Table 9: Confusion matrix for the tuned ANC applied to the 2018-19 regular season data. The unrestricted case on the

left includes all 2018-19 regular season lineups having ANC predictions. Of the 46 lineups predicted to be elite, 31

have a true label of elite, corresponding to a precision of 67.4%. The restricted case on the right includes only those

ANC-predicted lineups having at least 25 minutes of playing time during the 2018-19 regular season. Of the 26 lineups

predicted to be elite, 20 have a true label of elite, corresponding to a precision of 76.9%.

                   Unrestricted                            Restricted
                   Predicted Elite   Predicted Not Elite   Predicted Elite   Predicted Not Elite
True Elite         31                397                   20                201
True Not Elite     15                294                   6                 129
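The precision and prevalence figures quoted from Table 9 follow directly from the four cells of each confusion matrix, as the short sketch below shows. The function name is hypothetical, and returning NaN when no lineup is predicted elite is a convention assumed here for the undefined 0/0 case.

```python
import math


def summarize(tp, fn, fp, tn):
    """Compute precision, prevalence, and accuracy from confusion-matrix cells.

    tp: predicted elite, truly elite     fn: predicted not elite, truly elite
    fp: predicted elite, not elite       tn: predicted not elite, not elite
    """
    total = tp + fn + fp + tn
    predicted_elite = tp + fp
    return {
        # Precision is undefined (0/0) when nothing is predicted elite.
        "precision": tp / predicted_elite if predicted_elite else math.nan,
        "prevalence": (tp + fn) / total,
        "accuracy": (tp + tn) / total,
    }


unrestricted = summarize(tp=31, fn=397, fp=15, tn=294)  # Table 9, left
restricted = summarize(tp=20, fn=201, fp=6, tn=129)     # Table 9, right

for name, stats in [("unrestricted", unrestricted), ("restricted", restricted)]:
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

Comparing precision with prevalence is the relevant benchmark here: a classifier that labeled lineups elite at random would be expected to attain a precision equal to the prevalence.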

The 26 lineups predicted to be elite by the ANC in the restricted case are listed in Tables 10 (positive 2018-19

PMM) and 11 (non-positive 2018-19 PMM), along with their 2018-19 lineup playing time. Of the 20 lineups that the

ANC predicted to be elite and ultimately were elite in 2018-19, further investigation reveals that six involved

players who were newly acquired between the 2017-18 and 2018-19 seasons, as noted in the final column of Table 10.

We can also compare the realized 2018-19 lineup PMM of lineups predicted to be elite by the ANC to

that of lineups predicted to be not elite. Figure 5 gives boxplots of observed PMM during the 2018-19 regular season for

lineups for which an ANC prediction was given. Figure 5(a) shows this in the unrestricted case, while Figure 5(b)

restricts to lineups that had at least 25 minutes of playing time in the 2018-19 regular season. In both cases, we see

that the PMMs of lineups predicted by the ANC to be elite exhibit lower variance than those predicted to be not

elite. Moreover, when we restrict consideration to those lineups having at least 25 minutes of playing time, for which

the PMM is more precisely estimated, we see that the distribution of PMM for those predicted to be elite lies nearly

entirely in the positive range. A one-sided test of the means suggests that the mean PMM for those lineups predicted to

be elite is higher than the mean PMM for those lineups predicted to be not elite (p = 0.090).
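The text does not state which one-sided test of means was used. As one possible illustration, and explicitly a swapped-in technique rather than the authors' procedure, a one-sided permutation test compares the observed difference in mean PMM with the differences obtained after randomly relabeling lineups. The PMM values below are hypothetical.

```python
import random


def one_sided_permutation_p(elite, not_elite, n_perm=10000, seed=0):
    """Estimate P(mean difference >= observed) under random relabeling."""

    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)
    observed = mean(elite) - mean(not_elite)
    pooled = list(elite) + list(not_elite)
    k = len(elite)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical realized PMM values for the two predicted groups.
elite_pmm = [0.30, 0.40, 0.50, 0.60, 0.35]
other_pmm = [-0.20, -0.10, 0.00, -0.30, -0.15, 0.05]

p_value = one_sided_permutation_p(elite_pmm, other_pmm, n_perm=2000)
print(p_value)
```

A small p-value indicates that a difference in means at least as large as the observed one rarely arises when the elite and not elite labels are assigned at random.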

Next we explore the relationship between ANC predictions and overall team bench strength and performance.

Table 10: 2018-19 lineups predicted to be elite by the ANC classifier that had positive PMM (restricted to those lineups

having at least 25 minutes of playing time, as in Table 9). Also noted are those lineups involving newly acquired players

that had not played for the team during the 2017-18 season.

Team                    Lineup                                                         Minutes  PMM    Note
                                                                                       Played
Boston Celtics          A. Horford, K. Irving, M. Smart, J. Brown, J. Tatum            56       0.3
Charlotte Hornets       M. Williams, N. Batum, K. Walker, J. Lamb, C. Zeller           593      0.16
                        M. Williams, N. Batum, K. Walker, J. Lamb, W. Hernangomez      34       0.03   Willy Hernangomez played for the NY Knicks in 2017-18.
Golden State Warriors   K. Durant, S. Curry, K. Thompson, D. Green, K. Looney          313      0.39
                        K. Durant, S. Curry, D. Cousins, K. Thompson, D. Green         268      0.29   DeMarcus Cousins played for the New Orleans Pelicans in 2017-18.
                        A. Iguodala, K. Durant, S. Curry, K. Thompson, D. Green        178      0.69
                        A. Iguodala, K. Durant, S. Curry, K. Thompson, K. Looney       141      0.17
                        A. Iguodala, K. Durant, S. Curry, K. Thompson, J. Bell         36       0.73
                        A. Iguodala, S. Curry, D. Cousins, K. Thompson, D. Green       29       0.77   DeMarcus Cousins played for the New Orleans Pelicans in 2017-18.
                        A. Iguodala, K. Durant, S. Curry, D. Green, K. Looney          25       0.8
Houston Rockets         C. Paul, P. Tucker, E. Gordon, J. Harden, C. Capela            420      0.15
                        C. Paul, P. Tucker, J. Harden, A. Rivers, C. Capela            30       0.07   Austin Rivers played for the LA Clippers in 2017-18.
Indiana Pacers          T. Young, D. Collison, B. Bogdanovic, V. Oladipo, M. Turner    555      0.1
                        T. Young, D. Collison, B. Bogdanovic, V. Oladipo, D. Sabonis   133      0.13
                        T. Young, C. Joseph, B. Bogdanovic, V. Oladipo, M. Turner      29       0.03
                        D. Collison, B. Bogdanovic, V. Oladipo, M. Turner, D. Sabonis  26       0.39
Minnesota Timberwolves  T. Gibson, R. Covington, A. Wiggins, T. Jones, K. Towns        77       0.32   Robert Covington played for the Philadelphia 76ers in 2017-18.
Oklahoma City Thunder   R. Westbrook, P. George, S. Adams, A. Abrines, J. Grant        88       0.21
Utah Jazz               D. Favors, R. Gobert, J. Ingles, R. O’Neale, D. Mitchell       107      0.18
                        K. Korver, T. Sefolosha, R. Rubio, R. Gobert, D. Mitchell      26       0.7    Kyle Korver played for the Cleveland Cavaliers in 2017-18.

Table 11: 2018-19 lineups predicted to be elite by the ANC classifier that had non-positive PMM (restricted to those

lineups having at least 25 minutes of playing time, as in Table 9). Also noted are those lineups involving newly acquired

players that had not played for the team during the 2017-18 season.

Team                    Lineup                                                         Minutes  PMM    Note
                                                                                       Played
Boston Celtics          A. Horford, K. Irving, A. Baynes, J. Brown, J. Tatum           25       0
Golden State Warriors   K. Durant, S. Curry, J. Jerebko, K. Thompson, D. Green         45       -0.22  Jonas Jerebko played for the Utah Jazz in 2017-18.
                        K. Durant, S. Curry, K. Thompson, D. Green, J. Bell            26       -0.38
Houston Rockets         G. Green, P. Tucker, E. Gordon, J. Harden, C. Capela           66       -0.32
                        C. Anthony, C. Paul, P. Tucker, E. Gordon, C. Capela           45       -0.11  Carmelo Anthony played for the Oklahoma City Thunder in 2017-18.
Indiana Pacers          T. Young, T. Evans, D. Collison, B. Bogdanovic, D. Sabonis     28       -0.43  Tyreke Evans played for the Memphis Grizzlies in 2017-18.

Figure 5: Boxplots of 2018-19 lineup point differential per minute (PMM) by ANC-predicted label. (a) Unrestricted

case. (b) 2018-19 lineups are restricted to those having at least 25 minutes of playing time.

Figure 6: Number of lineups on each team achieving a positive PMM during the 2018-19 regular season versus the

number of lineups predicted by the ANC to be elite on each team. (a) Unrestricted case. (b) 2018-19 lineups are

restricted to those having at least 25 minutes of playing time.

Figure 6 plots, for each of the thirty NBA teams, the number of lineups used in the 2018-19 regular season that had

a positive PMM versus the number of lineups the ANC predicted would be elite for that team. Figure 6(a) does

this for the unrestricted case, while Figure 6(b) restricts the counts of positive-PMM lineups to those having at least 25

minutes of playing time in 2018-19. We first note that of the six teams predicted to have a relatively large number

of elite lineups (GSW, HOU, IND, BOS, UTA, and CHA), Boston and Golden State, and to a lesser

extent Houston, also have a relatively large number of lineups achieving positive PMM during the 2018-19 season in

the unrestricted case. When we restrict our focus to those lineups having at least 25 minutes of playing time, for which

the PMM is more precisely estimated, we see in Figure 6(b) that Boston, and to a lesser extent Indiana and Golden

State, have a robust bench of elite lineups.

It is worth noting that the ANC misses some teams that ended up having many lineups with positive PMMs. For

example, in Figure 6(a) we see that none of the Milwaukee Bucks’ lineups were predicted to be elite by the ANC,

and yet 47 of the team’s lineups achieved positive PMM, 19 of which experienced at least 25 minutes of playing time; this

performance is on par with that of GSW. For the case of Milwaukee, 2018-19 turned out to be an unprecedented season

with the hiring of Head Coach Mike Budenholzer. The team achieved its best regular season record in several decades

and the best regular season record in the NBA overall, won the Central Division, and reached the Eastern Conference

Finals [ ]. At the end of that season, Coach Budenholzer was selected to coach the East team in the 2019 NBA All-Star

Game, was named NBA Coach of the Year, and was awarded by his peers the National Basketball Coaches Association’s Coach

of the Year Award [ ]. We speculate that while the ANC does a good job capturing general lineup ability, it is unable

to account for changes to coaching staff and style.

Likewise, Figure 6(b) shows that the team having the highest number of lineups with positive PMM, when restricted

to lineups with at least 25 minutes of playing time during 2018-19, is the Los Angeles Clippers, for which the ANC also

predicts zero elite lineups. In this case, 59 of the 66 lineups used by the team during the 2018-19 season involved players lacking sufficient

individual playing time in 2017-18 to be considered by the ANC. Restricting consideration to the 36 lineups that had at

least 25 minutes of lineup playing time in 2018-19, 34 involved players with insufficient individual playing time in

2017-18 to be included in ANC predictions. Teams with a large number of rookie players, players who moved up from

the G League, and players lacking playing time in the previous season will not obtain many predictions from the ANC.

Figure 7: Distribution of number of distinct positions exhibited on actual lineups, versus lineups predicted to be elite

or not elite by the ANC.

4.4.2 Other aspects of ANC performance

We now discuss how ANC performance relates to player position and team pace.

Balance of Positions. The ANC considers only player order statistics in its predictions of lineup performance and

ignores other player information such as position. A natural question, then, is whether the lineups predicted to be

elite by the ANC exhibit a balance of positions. Using player position information [ ], we define the positions to

be Center, Power Forward, Point Guard (either a dedicated point guard or a point guard / shooting guard combination

player), Shooting Guard (either a dedicated shooting guard, point guard / shooting guard combination player, or small

forward / shooting guard combination player), and Small Forward (either a dedicated small forward, or a small forward

/ shooting guard combination player). We tally the number of distinct positions reflected in a given lineup. Those

lineups having more distinct positions are more balanced than those having fewer distinct positions. Figure 7 compares

the distribution of distinct positions between actual lineups, and those predicted elite or not elite by the ANC.
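The position tally can be sketched as a set union over the positions each player covers. Under the definitions above, a combination player counts toward both of his positions; treating the tally as the size of that union is our reading of the text and should be taken as an assumption, as should the position labels used as dictionary keys.

```python
# Positions covered by each player label (labels are hypothetical shorthand).
POSITION_SETS = {
    "C": {"C"},
    "PF": {"PF"},
    "PG": {"PG"},
    "SG": {"SG"},
    "SF": {"SF"},
    "PG/SG": {"PG", "SG"},  # point guard / shooting guard combination
    "SF/SG": {"SF", "SG"},  # small forward / shooting guard combination
}


def distinct_positions(lineup_labels):
    """Count the distinct positions covered by the five players' labels."""
    covered = set()
    for label in lineup_labels:
        covered |= POSITION_SETS[label]
    return len(covered)


print(distinct_positions(["PG", "SG", "SF", "PF", "C"]))        # fully balanced
print(distinct_positions(["PG/SG", "PG/SG", "SF/SG", "C", "C"]))
print(distinct_positions(["C", "C", "C", "C", "C"]))            # single position
```

Tallying this count over every lineup in a group gives the kind of distribution that Figure 7 compares.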

We see that ANC-predicted-elite lineups more often exhibit four or five distinct positions than those

predicted to be not elite. Lineups that saw actual playtime in 2018-19 were slightly more balanced overall than

those predicted elite by the ANC, but even 13.4% of actual lineups used three or fewer distinct positions, among

which many had positive PMM. This is indicative of a general trend in the NBA to move away from rigid positional

play [19]. We conclude that the ANC-predicted-elite lineups exhibit sufficient positional balance.

Additionally, the order statistics used by the ANC capture sufficient information about positional roles such that

the ANC is unlikely to recommend a lineup of five players of the same position. To demonstrate this, we generated

fictitious rosters of 2018-19 players all holding the same position; we limited the analysis to pure positions, omitting

hybrid positions such as point guard / shooting guard combination. As a proxy for player quality, we focused on players

having at least 2100 minutes of playing time during the 2017-18 season (this corresponds roughly to the top quartile of

playing time for each position). For each position, we sampled 200 lineups uniformly at random from the possible

5-player lineups and used the ANC to predict whether or not that lineup would have a positive PMM. As summarized in

Table 12, there was not a single lineup predicted to be elite in this manner. Thus, the order statistics used by the

ANC appear to be capturing the contributions of different positions on the court.

Table 12: ANC predictions on lineups composed of a single position for 2018-19 players having at least 2100 minutes

of 2017-18 playing time.

Position          Players   Lineups Sampled   Predicted elite
Shooting Guard    22        200               0
Power Forward     14        200               0
Center            11        200               0
Point Guard       15        200               0
Small Forward     19        200               0

Figure 8: Number of lineups on each team predicted by the ANC to be elite versus team pace.
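The single-position sampling experiment can be sketched as follows. The player labels are placeholders, and drawing the 200 lineups without replacement is an assumption, since the text says only that they were sampled uniformly at random.

```python
import random
from itertools import combinations


def sample_single_position_lineups(players, n_samples=200, seed=0):
    """Draw distinct five-player lineups uniformly at random from one position group."""
    all_lineups = list(combinations(sorted(players), 5))
    rng = random.Random(seed)
    # Sampling without replacement is an assumption of this sketch.
    return rng.sample(all_lineups, min(n_samples, len(all_lineups)))


# Eleven placeholder centers, matching the smallest group in Table 12.
centers = [f"Center{i:02d}" for i in range(11)]
sampled = sample_single_position_lineups(centers)

print(len(list(combinations(centers, 5))))  # C(11, 5) = 462 possible lineups
print(len(sampled))
```

Each sampled lineup would then be converted to order statistics and passed to the fitted classifier; even the smallest position group offers more than 200 possible lineups, so the sample size is always attainable.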

Team Pace. The ANC relies on per-minute player statistics rather than per-possession statistics. Because teams are

known to vary in their pace, a natural question is whether the ANC predicts more elite lineups for fast-paced teams

than slow-paced teams. Figure 8 shows the number of elite-predicted lineups for each team versus the team’s pace,

as reported by NBA.com [ ]. We observe no trend between pace and the propensity of the ANC to label lineups as

elite. We leave for future work the incorporation of team pace into the ANC when predicting the performance of

lineups having combinations of players from different teams.

4.4.3 Case Study of Golden State Warriors and Los Angeles Lakers

To shed more light on the ANC predictions for the 2018-19 regular season, we use as a case study the Golden State

Warriors (NBA champions in the two preceding years), and the Los Angeles Lakers, whose acquisition of LeBron James

during the 2018 off-season led many to hope at the time that this once powerful team would be staging a comeback.

The predictions from our ANC model are summarized in Table 13. The Los Angeles Lakers had fifteen players

on their roster during this season with sufficient 2017-18 individual playing time to be included in ANC predictions,

yielding 3003 possible five-person lineups. Of these, not a single lineup is predicted to be elite. The Golden State

Warriors also had fifteen players with sufficient 2017-18 individual playing time on their roster in 2018-19, yielding

3003 possible five-person lineups. In contrast to the Lakers, 127 of these lineups are predicted to be elite.

To understand why the Lakers have no lineups predicted to be elite, we examine in Table 14 the individual player PMM

from the 2017-18 season, which was used to train the model, for both the Lakers and Warriors rosters. We see that

the Lakers had several players on the roster with negative PMM. Given the importance assigned by the subclassifiers to

PMM(1), this offers some insight into why the Lakers are not predicted by the ANC to have any elite lineup. The

lack of predicted elite lineups is also consistent with the team’s ultimately poor performance during that season.

Table 13: Sample elite lineups predicted by the ANC for the 2018-19 rosters of the Los Angeles Lakers and the

Golden State Warriors.

Team                    Number of          Number of Predicted   Sample Elite Lineups
                        Possible Lineups   Elite Lineups
Los Angeles Lakers      3003               0                     None
Golden State Warriors   3003               127                   D. Cousins, S. Curry, K. Durant, D. Green, A. Iguodala
                                                                 S. Curry, K. Durant, J. Jerebko, K. Looney, K. Thompson
                                                                 K. Durant, D. Green, A. Iguodala, K. Looney, K. Thompson
                                                                 S. Curry, J. Bell, K. Durant, K. Looney, K. Thompson
                                                                 D. Cousins, A. Iguodala, K. Looney, S. Livingston, K. Thompson

Entering the 2018-19 season as two-time defending champions, the Golden State Warriors, unsurprisingly, have a

large proportion of total lineup combinations classified as elite. A sample of the 127 predicted elite Warriors

lineups is given in Table 13; it illustrates a range of lineups that the coaching staff could rely upon to optimize the use of

its players whenever the “Hamptons Five” (Stephen Curry, Klay Thompson, Andre Iguodala, Kevin Durant, Draymond

Green) were not all on the court. Additionally, the sheer number of elite lineups Golden State offers is an indicator

of the depth of the team and is consistent with its record at the time.

We now compare the predictions of the ANC based on 2017-18 performance to realized performance during the

2018-19 regular season, for the Los Angeles Lakers and the Golden State Warriors.

Table 15 gives, for the Lakers and Warriors respectively, the confusion matrix showing numbers of lineups predicted

elite or not elite versus their true performance (positive PMM versus nonpositive PMM), for those lineups that

had at least 25 minutes of playing time during the 2018-19 regular season [ ]. We see that of the nine GSW lineups

predicted to be elite by the ANC based on 2017-18 data, seven ultimately had a positive PMM in 2018-19,

corresponding to a precision of 77.8%. (The precision cannot be calculated for the Lakers lineups because no lineups

were predicted to be elite by the ANC.)

Table 14: Individual player plus-minus per minute (PMM) during the 2017-18 regular season for members of the Los

Angeles Lakers and Golden State Warriors 2018-19 rosters. Excluded from the table are LAL players Isaac Bonga,

Jemerrio Jones, Scott Machado, Sviatoslav Mykhailiuk, Moritz Wagner, and Johnathan Williams, and GSW players

Jacob Evans and Marcus Derrickson, who did not have enough data from the 2017-18 NBA regular season; and LAL

player Ivica Zubac, who was traded mid-season to the Los Angeles Clippers and is listed by NBA.com on that team’s

roster.

Los Angeles Lakers                             Golden State Warriors
Player                      Individual PMM     Player               Individual PMM
Lonzo Ball                  -0.01              Jordan Bell           0.20
Michael Beasley             -0.08              Andrew Bogut         -0.13
Reggie Bullock               0.01              Quinn Cook           -0.05
Kentavious Caldwell-Pope    -0.02              DeMarcus Cousins      0.05
Alex Caruso                 -0.01              Stephen Curry         0.30
Tyson Chandler              -0.21              Kevin Durant          0.15
Josh Hart                   -0.05              Draymond Green        0.14
Andre Ingram                 0.31              Andre Iguodala        0.17
Brandon Ingram              -0.06              Jonas Jerebko         0.05
LeBron James                 0.03              Damian Jones          0.00
Kyle Kuzma                  -0.05              Damion Lee           -0.06
Mike Muscala                -0.07              Shaun Livingston      0.10
JaVale McGee                 0.06              Kevon Looney          0.13
Rajon Rondo                  0.01              Alfonzo McKinnie     -0.22
Lance Stephenson            -0.06              Klay Thompson         0.16

Table 15: Confusion matrices for Los Angeles Lakers and Golden State Warriors, respectively, ANC predictions based

on 2017-18 data compared to 2018-19 actuals, for lineups that had at least 25 minutes of playing time during the

2018-19 regular season and whose players had at least 50 minutes of playing time during the 2017-18 regular season.

Los Angeles Lakers
                      Predicted Elite   Predicted Not Elite
True Class Elite                    0                    12
True Class Not Elite                0                     9

Golden State Warriors
                      Predicted Elite   Predicted Not Elite
True Class Elite                    7                    12
True Class Not Elite                2                     4

Tables 19 and 20 in Appendix C give realized PMM for all five-person lineups for the Lakers and Warriors, respectively, that had at least 25 minutes of playing time during the 2018-19 regular season, along with the ANC prediction. ‘−’ denotes lineups for which no ANC prediction is given. We confirm that while the ANC consistently predicts low-PMM lineups as not elite, it is a conservative classifier, occasionally missing lineups that actualized high PMMs the following season. For example, the highest performing lineup of the two teams, GSW’s McKinnie, Green, Looney, Livingston and Curry, is predicted by the ANC to be not elite. This not elite prediction could be due to Alfonzo McKinnie’s negative individual PMM in 2017-18, and the relatively high actual PMM could be due to imprecision caused by this lineup having only 28 minutes of playing time during the 2018-19 season.

5 Conclusion

We have developed an all-or-nothing classifier that predicts, based on individual player order statistics, the performance of a five-person basketball lineup. Because the classifier is tuned to achieve high precision, the lineups predicted elite by the ANC achieve a higher average plus-minus per minute of playing time than lineups predicted to be not elite, both in same-season predictions on the withheld testing data set and in next-season predictions. We also showed that teams having a large number of predicted-elite lineups tend to have a large number of lineups that achieve high PMM, demonstrating that the ANC can be an indicator of bench depth. By incorporating both offensive and defensive individual player order statistics, the ANC captures positional play and recommends lineups that are consistent in positional balance with those used in actual play.
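The all-or-nothing rule itself is simple to state in code. The following Python sketch assumes each subclassifier has already produced a Boolean vote (True for elite); the function name `all_or_nothing` is hypothetical, and the sketch shows only the voting step, not the training of the subclassifiers.

```python
from typing import Sequence


def all_or_nothing(votes: Sequence[bool]) -> bool:
    """Unanimous-consent vote: predict elite only if every
    subclassifier voted elite."""
    if not votes:
        raise ValueError("at least one subclassifier vote is required")
    return all(votes)


# Seven subclassifiers, as in the ANC.
unanimous = [True] * 7
one_dissent = [True] * 6 + [False]

print(all_or_nothing(unanimous))    # True  -> predicted elite
print(all_or_nothing(one_dissent))  # False -> predicted not elite
```

A single dissenting vote is enough to withhold the elite label, which is why the classifier favors precision at the cost of missing some high-performing lineups.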

While the lineups classified as elite by the ANC do have a statistically higher PMM than lineups classified as not elite, the ANC occasionally misses very high-performing lineups. One possibility for future work is to tune the PMM threshold at which a lineup is labeled as elite. Our current threshold is zero, but we might be interested in identifying and characterizing lineups whose PMM is some amount higher than that; the classifier might work better at identifying more extreme cases. Additionally, evidence suggests the ANC loses its predictive power for teams that undergo changes to coaching staff and style and for teams having a large number of novice players. Future work could examine whether past-year data from other leagues (e.g., the G League or NCAA) can be used to make predictions for those players without sufficient NBA history.

Nonetheless, because the ANC prioritizes precision, lineups predicted by the ANC to be elite often actualize positive PMM. We conclude that this classifier can be included among decision-support tools to inform player acquisitions and substitutions.

Acknowledgement

This material is based upon work supported by the National Science Foundation under Grant No. DMS-1757952. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. The authors would also like to acknowledge financial support from Harvey Mudd College. The authors thank Isys Johnson, Lucius Bynum, and Robert Gonzalez for their contributions to earlier phases of this work and to the code base, portions of which were adapted and used in this paper. Lastly, the authors thank the anonymous reviewers and editors whose feedback greatly improved the analysis.

References

[1] Basketball-Reference.com (2021a). 2017-18 Boston Celtics roster and stats. Accessed at https://www.basketball-reference.com/teams/BOS/2018.html on 7 June 2021.

[2] Basketball-Reference.com (2021b). 2017-18 Houston Rockets roster and stats. Accessed at https://www.basketball-reference.com/teams/HOU/2018.html on 7 June 2021.

[3] Basketball-Reference.com (2022a). 2017-18 NBA Player Stats: Per Game. Accessed at https://www.basketball-reference.com/leagues/NBA_2018_per_game.html on 7 April 2022.

[4] Basketball-Reference.com (2022b). 2018-19 NBA Player Stats: Per Game. Accessed at https://www.basketball-reference.com/leagues/NBA_2019_per_game.html on 10 May 2022.

[5] Bendl, J., J. Stourac, O. Salanda, A. Pavelka, E. Wieben, J. Zendulka, J. Brezovsky, and J. Damborsky (2014). PredictSNP: Robust and accurate consensus classifier for prediction of disease-related mutations. PLoS Comput Biol 10(1).

[6] Bynum, L. E. J. (2018). Modeling subset behavior: Prescriptive analytics for professional basketball data. Senior thesis (Claremont: Harvey Mudd College).

[7] Cheng, G., Z. Zhang, M. Kyebambe, and K. Nasser (2016). Predicting the outcome of NBA playoffs based on the maximum entropy principle. Entropy 18, 450.

[8] Clemente, F., F. Martins, D. Kalamaras, and R. Mendes (2015). Network analysis in basketball: Inspecting the prominent players using centrality metrics. Journal of Physical Education and Sport 15, 212–217.

[9] Deshpande, S. K. and S. T. Jensen (2016). Estimating an NBA player’s impact on his team’s chances of winning. Journal of Quantitative Analysis in Sports 12(2), 51–72.

[10] Ghimire, S., J. A. Ehrlich, and S. D. Sanders (2020). Measuring individual worker output in a complementary team setting: Does regularized adjusted plus minus isolate individual NBA player contributions? PLoS ONE 15(8), e0237920.

[11] Glickman, M. and J. Sonas (2015). Introduction to the NCAA men’s basketball prediction methods issue. Journal of Quantitative Analysis in Sports 11, 1–3.

[12] Gumm, J., G. Hu, and A. Barrett (2015). A machine learning strategy for predicting March Madness winners. In Proc. of the 16th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pp. 1–6. IEEE.

[13] Hua, S. (2015). Comparing several modeling methods on NCAA March Madness. Ph.D. Dissertation (North Dakota State University).

[14] Kalman, S. and J. Bosch (2020). NBA lineup analysis on clustered player tendencies: A new approach to the positions of basketball & modeling lineup efficiency of soft lineup aggregates. 42 analytics (2020). In Proceedings of the 14th MIT Sloan Sports Analytics Conference, Boston, MA, USA.

[15] Kubatko, J., D. Oliver, K. Pelton, and D. T. Rosenbaum (2007). A starting point for analyzing basketball statistics. Journal of Quantitative Analysis in Sports 3(3).

[16] Lin, R. (2017). Mason: Real-time NBA matches outcome prediction. Ph.D. Dissertation (Arizona State University).

[17] Loeffelholz, B., E. Bednar, and K. Bauer (2009). Predicting NBA games using neural networks. Journal of Quantitative Analysis in Sports 5(1), 1–15.

[18] Maymin, A., P. Maymin, and E. Shen (2013). NBA chemistry: Positive and negative synergies in basketball. International Journal of Computer Science in Sport 12(2), 4–23.

[19] McMahon, I. (2018). How (and why) position-less lineups have taken over the NBA playoffs. The Guardian. Accessed at https://www.theguardian.com/sport/blog/2018/may/01/how-and-why-position-less-lineups-have-taken-over-the-nba-playoffs on 11 May 2022.

[20] NBA.com (2019a). NBA advanced stats: Stats home/lineups/traditional. Accessed at https://www.nba.com/stats/lineups/traditional/?Season=2018-19&SeasonType=Regular%20Season&sort=MIN&dir=1&PerMode=Totals on 20 May 2021.

[21] NBA.com (2019b). NBA advanced stats: Stats home/teams/advanced. Accessed at https://www.nba.com/stats/teams/advanced/?sort=W&dir=-1&Season=2018-19&SeasonType=Regular%20Season on 9 May 2022.

[22] Oh, M., S. Keshri, and G. Iyengar (2015). Graphical models for basketball match simulation. In Proc. of the 2015 MIT Sloan Sports Analytics Conference, Volume 2728.

[23] Ozmen, M. U. (2016). Marginal contribution of game statistics to probability of winning at different levels of competition in basketball: Evidence from the Euroleague. International Journal of Sports Science and Coaching 11, 98–107.

[24] Pelechrinis, K. (2019). LinNet: Probabilistic lineup evaluation through network embedding. In U. Brefeld, E. Curry, E. Daly, B. MacNamee, A. Marascu, F. Pinelli, M. Berlingerio, and N. Hurley (Eds.), Machine Learning and Knowledge Discovery in Databases, pp. 20–36. Springer International Publishing.

[25] Ribeiro, J., P. Silva, R. Duarte, K. Davids, and J. Garganta (2017). Team sports performance analysed through the lens of social network theory: Implications for research and practice. Sports Medicine 47, 1–8.

[26] Robertson, M. (2017). An analysis of NBA spatio-temporal data. M.S. Dissertation (Duke University).

[27] Ruiz, F. J. and F. Perez-Cruz (2015). A generative model for predicting outcomes in college basketball. Journal of Quantitative Analysis in Sports 11(1), 39–52.

[28] Shen, G., D. Gao, Q. Wen, and R. Magel (2016). Predicting results of March Madness using three different methods. Journal of Sports Research 3, 10–17.

[29] Sisneros, R. and M. Van Moer (2013). Expanding plus-minus for visual and statistical analysis of NBA box-score data. 1st IEEE Workshop on Sports Data Visualization.

[30] Vaz de Melo, P., V. Almeida, A. Loureiro, and C. Faloutsos (2012). Forecasting in the NBA and other team sports: Network effects in action. ACM Trans. Knowl. Discov. Data 6, 13.

[31] Wäsche, H., G. Dickson, A. Woll, and U. Brandes (2017). Social network analysis in sport research: an emerging paradigm. European Journal for Sport and Society 14, 1–28.

[32] Wikipedia (2021). 2018–19 Milwaukee Bucks season. Accessed at https://en.wikipedia.org/wiki/2018%E2%80%9319_Milwaukee_Bucks_season on 4 May 2022.

[33] Wikipedia (2022). Mike Budenholzer. Accessed at https://en.wikipedia.org/wiki/Mike_Budenholzer on 4 May 2022.

[34] Winston, W. L. (2012). Mathletics: How Gamblers, Managers, and Sports Enthusiasts Use Mathematics in Baseball, Basketball, and Football. Princeton University Press.

[35] Zimmermann, A., S. Moorthy, and Z. Shi (2013). Predicting NCAAB match outcomes using ML techniques - some results and lessons learned. In MLSA@PKDD/ECML.

A Individual Player Statistics Used as Predictors in ANC

Table 16: Individual player statistics used as predictors in ANC.

FGM Field goals made per minute

FGA Field goals attempted per minute

FGPCT Field goal percentage

FG3M Three-point ﬁeld goals made per minute

FG3A Three-point ﬁeld goals attempted per minute

FG3PCT Three-point ﬁeld goals percentage

FTM Free throws made per minute

FTA Free throws attempted per minute

FTPCT Free throw percentage

OREB Offensive rebounds per minute

DREB Defensive rebounds per minute

AST Assists per minute

TOV Turnovers per minute

STL Steals per minute

BLK Blocks per minute

BLKA Blocks attempted per minute

PF Personal fouls per minute

PTS Points earned per minute

PFD Personal fouls drawn per minute

PMM Plus-Minus per minute

CONTESTEDSHOTS Shots contested per minute

CONTESTEDSHOTS2PT Two-point shots contested per minute

CONTESTEDSHOTS3PT Three-point shots contested per minute

CHARGESDRAWN Charges Drawn per minute

DEFLECTIONS Passes deﬂected per minute

LOOSEBALLSRECOVERED Loose balls recovered per minute

SCREENASSISTS Screens that led to baskets per minute

BOXOUTS Box outs per minute

Table 17: Tuned parameter values used in simple model based on ﬁrst order statistics.

Subclassifier                    Parameter                             Chosen Value
Decision Tree                    cp (cost complexity)                  -1
Decision Tree                    loss (misclassification penalty)      1
Random Forest                    c (cutoff)                            0.7
Random Forest                    ntree (number of trees)               100
Boosting                         mfinal (number of trees)              500
Boosting                         maxdepth (depth of each tree)         3
Boosting                         cp (cost complexity)                  0.01
Support Vector Machine           cost (misclassification penalty)      0.1
Support Vector Machine           gamma (influence decay)               0.01
K-Nearest Neighbors              k (number of neighbors)               5
Logistic Regression              thresh (1−probability threshold)      0.25
All-or-Nothing Classifier (ANC)  numVotes (agreement required)         7

Table 18: Confusion matrix for the simple model based on ﬁrst order statistics applied to the test data set. Of the 12

lineups predicted to be elite, nine have a true label of elite, corresponding to a precision of 75.0%.

                      Predicted Elite   Predicted Not Elite
True Class Elite                    9                    86
True Class Not Elite                3                    78

B Comparison of ANC to Simpler Model

One might also wonder whether the complete set of five-player order statistics is required by the ANC to achieve high precision. In this section, we analyze a simpler model that uses only the first order statistics (i.e., the lineup’s minimum) of each individual player metric used by the ANC.
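To make the distinction between the two feature sets concrete, the Python sketch below computes the order statistics of one metric for a five-person lineup and then keeps only the first (the minimum). The PMM values are the 2017-18 individual values from Table 14; the function names and the dictionary layout are illustrative assumptions, not the authors' code.

```python
from typing import Dict, List


def order_statistics(lineup: Dict[str, Dict[str, float]]) -> Dict[str, List[float]]:
    """For each metric, return the players' values sorted ascending.
    Index 0 is the first order statistic (the lineup minimum) and the
    last index is the lineup maximum."""
    metrics = next(iter(lineup.values())).keys()
    return {m: sorted(stats[m] for stats in lineup.values()) for m in metrics}


def first_order_only(lineup: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Feature set of the simpler model: only each metric's minimum."""
    return {m: values[0] for m, values in order_statistics(lineup).items()}


# Individual 2017-18 PMM values from Table 14 (one metric shown for brevity).
lineup = {
    "Stephen Curry": {"PMM": 0.30},
    "Klay Thompson": {"PMM": 0.16},
    "Kevin Durant": {"PMM": 0.15},
    "Draymond Green": {"PMM": 0.14},
    "Andre Iguodala": {"PMM": 0.17},
}

print(order_statistics(lineup))  # {'PMM': [0.14, 0.15, 0.16, 0.17, 0.3]}
print(first_order_only(lineup))  # {'PMM': 0.14}
```

The full model sees all five sorted values per metric, whereas the simpler model sees only the weakest player's value on each metric.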

We tune the simple model parameters as in Section 4.1, using ten-fold cross-validation. The parameter combination that lies on the efficient frontier of average precision and worst-case precision over the folds on the training data is given in Table 17. This combination achieved an average precision of 86.5%, minimum precision of 57.1% and average accuracy of 51.8% on the training data. When the performance was insensitive to a parameter value, the value was chosen to match that used in the ANC.
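One way to read "efficient frontier" here is as the set of parameter combinations that no other combination beats on both average precision and worst-case precision across folds. The Python sketch below illustrates that selection under this assumption. Candidate "A" uses the precision values reported above for the chosen combination; candidates "B" and "C" are hypothetical values included only to show how dominated combinations are filtered out.

```python
from typing import List, Tuple

# (label, average precision over folds, minimum precision over folds)
Candidate = Tuple[str, float, float]


def efficient_frontier(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates that are not dominated: no other candidate is at
    least as good on both criteria and strictly better on one."""
    frontier = []
    for name, avg, worst in candidates:
        dominated = any(
            a >= avg and w >= worst and (a > avg or w > worst)
            for _, a, w in candidates
        )
        if not dominated:
            frontier.append((name, avg, worst))
    return frontier


candidates = [
    ("A", 0.865, 0.571),  # values reported in the text
    ("B", 0.800, 0.500),  # hypothetical; dominated by A
    ("C", 0.820, 0.600),  # hypothetical; lower average, better worst case
]

print([name for name, _, _ in efficient_frontier(candidates)])  # ['A', 'C']
```

When more than one combination survives, a further rule is needed to pick among them; the tie-breaking described in the text (matching the ANC's value when performance is insensitive) is one such rule.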

Having tuned the parameters, we fit the first order statistic model to the full, standardized training set, as described earlier, and apply the trained model to the testing data. The confusion matrix is given in Table 18.

Of twelve lineups predicted to be elite, nine have a true label of elite, indicating a strictly positive PMM. The simpler model achieves a testing precision of only 75% compared to the ANC’s testing precision of 86.7%.

C Actual Lineup Performance for LAL and GSW Case Study

Table 19: Actual lineup performance compared to ANC predictions for the Los Angeles Lakers during the 2018-19 season, for all lineups having at least 25 minutes of playing time. ‘−’ denotes lineups for which no ANC prediction is given.

Los Angeles Lakers

Lineup / Minutes Played / Actual PMM / ANC Prediction

R. Rondo, K. Caldwell-Pope, B. Ingram, I. Zubac, J. Hart 25 0.68 −

L. James, B. Ingram, I. Zubac, L. Ball, K. Kuzma 55 0.36 −

T. Chandler, L. James, K. Caldwell-Pope, L. Ball, K. Kuzma 39 0.36 Not Elite

L. James, J. McGee, L. Ball, K. Kuzma, J. Hart 133 0.31 Not Elite

L. James, R. Rondo, J. McGee, K. Caldwell-Pope, K. Kuzma 47 0.23 Not Elite

T. Chandler, L. Stephenson, K. Caldwell-Pope, B. Ingram, J. Hart 37 0.21 Not Elite

T. Chandler, B. Ingram, L. Ball, K. Kuzma, J. Hart 36 0.14 Not Elite

T. Chandler, L. James, B. Ingram, L. Ball, K. Kuzma 61 0.13 Not Elite

L. James, R. Rondo, J. McGee, K. Caldwell-Pope, B. Ingram 31 0.13 Not Elite

B. Ingram, I. Zubac, L. Ball, K. Kuzma, J. Hart 39 0.13 −

L. James, J. McGee, K. Caldwell-Pope, L. Ball, K. Kuzma 34 0.12 Not Elite

L. James, J. McGee, R. Bullock, B. Ingram, K. Kuzma 73 0.11 Not Elite

T. Chandler, K. Caldwell-Pope, B. Ingram, K. Kuzma, J. Hart 31 0.10 Not Elite

T. Chandler, K. Caldwell-Pope, B. Ingram, L. Ball, K. Kuzma 45 0.04 Not Elite

T. Chandler, L. James, L. Ball, K. Kuzma, J. Hart 66 0.02 Not Elite

L. James, R. Rondo, J. McGee, B. Ingram, K. Kuzma 43 0.00 Not Elite

L. James, J. McGee, B. Ingram, L. Ball, K. Kuzma 234 0.00 Not Elite

L. James, R. Rondo, R. Bullock, B. Ingram, K. Kuzma 62 -0.05 Not Elite

J. McGee, K. Caldwell-Pope, M. Muscala, A. Caruso, J. Jones 31 -0.06 −

L. James, K. Caldwell-Pope, L. Ball, K. Kuzma, J. Hart 25 -0.16 Not Elite

L. James, R. Rondo, J. McGee, R. Bullock, K. Kuzma 62 -0.21 Not Elite

R. Rondo, K. Caldwell-Pope, B. Ingram, I. Zubac, K. Kuzma 29 -0.24 −

L. James, L. Stephenson, L. Ball, K. Kuzma, J. Hart 31 -0.25 Not Elite

R. Rondo, M. Beasley, K. Caldwell-Pope, B. Ingram, I. Zubac 25 -0.28 −

J. McGee, K. Caldwell-Pope, B. Ingram, L. Ball, J. Hart 25 -0.32 Not Elite

L. James, R. Rondo, B. Ingram, I. Zubac, K. Kuzma 33 -0.43 Not Elite

J. McGee, B. Ingram, L. Ball, K. Kuzma, J. Hart 83 -0.47 Not Elite

R. Rondo, J. McGee, K. Caldwell-Pope, A. Caruso, M. Wagner 27 -1.31 −

Ivica Zubac was traded to the Los Angeles Clippers and was not included in ANC predictions for the Lakers.

Jemerrio Jones did not have data from the 2017-18 NBA regular season.

Moritz Wagner did not have data from the 2017-18 NBA regular season.

Jacob Evans did not have data from the 2017-18 NBA regular season.

Table 20: Actual lineup performance compared to ANC predictions for the Golden State Warriors during the 2018-19 season, for all lineups having at least 25 minutes of playing time. ‘−’ denotes lineups for which no ANC prediction is given.

Golden State Warriors

Lineup / Minutes Played / Actual PMM / ANC Prediction

A. McKinnie, D. Green, K. Looney, S. Livingston, S. Curry 28 1.01 Not Elite

A. Iguodala, D. Green, K. Durant, K. Looney, S. Curry 25 0.80 Elite

A. Iguodala, D. Cousins, D. Green, K. Thompson, S. Curry 29 0.77 Elite

A. Iguodala, J. Bell, K. Durant, K. Thompson, S. Curry 36 0.73 Elite

A. Iguodala, D. Green, K. Durant, K. Thompson, S. Curry 178 0.69 Elite

A. Iguodala, K. Durant, K. Looney, K. Thompson, Q. Cook 35 0.63 Not Elite

A. McKinnie, J. Jerebko, K. Durant, K. Looney, S. Curry 26 0.61 Not Elite

D. Green, K. Durant, K. Looney, K. Thompson, S. Curry 313 0.39 Elite

D. Cousins, D. Green, K. Durant, K. Thompson, S. Curry 268 0.29 Elite

A. Bogut, D. Green, K. Durant, K. Thompson, S. Curry 83 0.27 Not Elite

A. Iguodala, D. Cousins, D. Green, K. Thompson, S. Livingston 67 0.24 Not Elite

D. Jones, J. Jerebko, K. Durant, K. Thompson, Q. Cook 29 0.20 Not Elite

A. McKinnie, A. Iguodala, K. Durant, K. Looney, S. Curry 48 0.19 Not Elite

A. Iguodala, K. Durant, K. Looney, K. Thompson, S. Curry 141 0.17 Elite

A. Iguodala, J. Jerebko, K. Durant, K. Looney, K. Thompson 47 0.13 Not Elite

A. McKinnie, A. Iguodala, J. Jerebko, K. Looney, S. Curry 27 0.11 Not Elite

D. Jones, D. Green, K. Durant, K. Thompson, S. Curry 142 0.11 Not Elite

A. Iguodala, D. Green, J. Jerebko, S. Livingston, S. Curry 54 0.07 Not Elite

A. Iguodala, D. Cousins, K. Thompson, Q. Cook, S. Livingston 39 0.05 Not Elite

A. Iguodala, D. Green, J. Jerebko, K. Thompson, S. Livingston 26 0.00 Not Elite

D. Lee, J. Jerebko, K. Looney, K. Thompson, S. Livingston 30 0.00 Not Elite

D. Green, J. Jerebko, K. Durant, K. Thompson, S. Curry 45 -0.22 Elite

A. Iguodala, D. Jones, K. Durant, K. Thompson, Q. Cook 77 -0.33 Not Elite

D. Green, J. Bell, K. Durant, K. Thompson, S. Curry 26 -0.38 Elite

A. McKinnie, D. Cousins, D. Green, K. Durant, S. Curry 32 -0.56 Not Elite

A. McKinnie, J. Evans, J. Jerebko, J. Bell, Q. Cook 37 -0.57 −