Swish or Miss: The Role of Data Bias in NCAA Basketball Predictions


The 2023 college basketball season has crowned two unexpected champions, with the LSU women’s and UConn men’s teams hoisting trophies in Dallas and Houston, respectively.

I say unexpected because, before the season began, neither one of these teams was thought of as a title contender. Both were given 60-1 odds to win the whole thing, and media and coaches polls weren’t giving them much respect.

Still, teams have been proving rankings and polls wrong since they first came around in the 1930s. And being atop rankings doesn’t guarantee success.

Since the expansion of the men’s basketball tournament in 1985, only six teams ranked preseason No. 1 in the AP Poll have won the title. It’s almost more of a curse than a blessing at that point.

How many of these rankings and polls are out there?

Even though we have access to a plethora of well-regarded rankings from individual journalists like ESPN’s Charlie Creme and Jeff Borzello, Big Ten Network’s Andy Katz, and Fox Sports’ John Fanta, three polls are the most widely recognized.

The chief among them is the aforementioned AP Top 25 Poll, compiled from a group of 61 sports journalists from across the country.

Then you have the USA Today Coaches Poll consisting of 32 Division I head coaches, one from each of the conferences that receive an automatic bid to the NCAA tournament. And the newest addition is the Student Media Poll, run out of Indiana University. This is a poll of student journalist voters who cover sports at their university daily.

These three groups will all look at teams with similar criteria, particularly before a single game is played. Without anyone scoring a point, media and coaches alike have to use the data that is accessible and make their early predictions.

Here are some of the most common:

Previous season results

It makes sense, right? Whoever was best last season will likely be just as good. Well, between graduation, the transfer portal, and the world of one-and-done basketball, many rosters undergo significant overhauls in the offseason.

When a team hits the top of the preseason rankings, odds are it has retained most of its key players. North Carolina, which went on to miss the NCAA tournament entirely, was selected No. 1 in all three preseason polls after finishing as the runner-up in 2022 and returning four starters.

Experience

Veterans are crucial in any sport. But in a sport with such a long season to get through, upwards of 30 games a year, experience matters even more.

Iowa women’s basketball made its longest-ever run in the tournament this year. Beyond the talent on the team, the Hawkeyes’ starting five had played 92 games together as starters. That’s unheard of in today’s game.

It’s no surprise a team like that can make a deep run and it’s a big reason Iowa was picked between No. 4 and No. 6 ahead of the season.

Strong recruiting class

Basketball is, arguably, the collegiate sport where a freshman can have the most impact. Limited roster spots and the rise of pro-ready players have seen many first-years become instant superstars.

And it shows in the polls. Eight of the top-10 men’s recruiting classes were represented in all three preseason polls.

The star factor

Big-time players are a major reason we watch college basketball. The top four men’s teams going into the season featured four of the biggest names in the sport (Armando Bacot-North Carolina, Drew Timme-Gonzaga, Marcus Sasser-Houston, and Oscar Tshiebwe-Kentucky).

The reigning national player of the year Aliyah Boston’s South Carolina was almost a unanimous No. 1 in the preseason women’s polls, garnering 85 of 88 possible first-place votes across the three polls.

Where do polls differ?

Journalists and coaches who are responsible for rankings use some combination of these factors while adding reasoning of their own.

A journalist or student journalist who covers the Big 12 on a day-to-day basis might rank a team from that conference differently because they see all of that team’s highs and lows. If a national media member only pays attention after a big win, they could easily overrate that team.

For instance, Kevin McNamara had UConn the highest of anyone in the preseason AP Poll at 15. McNamara covers sports in New England based out of Providence, Rhode Island. Providence men’s basketball is in the Big East with UConn. It’s likely he would’ve seen more of the Huskies than his counterparts and looks all the wiser because of it.

On the other side, a coach might be inclined to rank a team higher if that team beat their own squad. It makes the coach’s team look better if a loss is to a stronger team while also using the rationale, “Well, they must be good if they beat us!”

Although we’re all working with a lot of the same data when looking at these teams, it’s not always a total consensus. Each person who votes on these polls brings their own experience and biases and puts their own weight on different factors.

Even as we’ve moved further into analytics-led polling, the predictions aren’t much more successful. KenPom has become the gold standard for statistics-based basketball rankings. It ranks all 363 Division I men’s teams by adjusted efficiency margin, which is based on offensive and defensive efficiency per 100 possessions and each team’s possessions per game.

KenPom was, rightfully, more wary of North Carolina, ranking it No. 9 preseason. But, it had UConn as low as anyone, at 27.
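As a rough sketch of the kind of metric described above (simplified; KenPom’s actual system also adjusts for opponent strength and venue, which this example does not), a raw efficiency margin can be computed as net points per 100 possessions. The season totals below are hypothetical:

```python
def raw_efficiency_margin(points_for, points_against, possessions):
    """Raw efficiency margin: net points per 100 possessions.

    A simplified stand-in for an adjusted efficiency margin; the
    real metric additionally adjusts for opponent strength.
    """
    offensive_eff = 100 * points_for / possessions      # points scored per 100 possessions
    defensive_eff = 100 * points_against / possessions  # points allowed per 100 possessions
    return offensive_eff - defensive_eff

# Hypothetical season totals: 2,800 points scored, 2,450 allowed,
# over 2,100 possessions.
margin = raw_efficiency_margin(2800, 2450, 2100)  # ≈ +16.67 per 100 possessions
```

A positive margin means a team outscores opponents on a per-possession basis, which is why the metric is less distorted by pace of play than raw point differential.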

Where were our champions ranked preseason?

LSU- Coaches No. 14, AP No. 16, Student No. 17

UConn- Received votes but unranked in all three

Needless to say, no one was prepping a victory parade in Storrs or Baton Rouge off of the early poll releases. But, as I said early on, teams have been proving rankings and polls wrong since they first came around.

These champions expose some of the misconceptions pollsters had about their teams and what it takes to win a championship.

As the BI space evolves, organizations must take into account the bottom-line cost of amassing analytics assets.
The more assets you have, the greater the cost to your business. There are the hard costs of keeping redundant assets, i.e., cloud or server capacity. Accumulating multiple versions of the same visualization not only takes up space, but BI vendors are also moving to capacity pricing: companies now pay more if they have more dashboards, apps, and reports. Earlier, we spoke about dependencies. Keeping redundant assets increases the number of dependencies and therefore the complexity. This comes with a price tag.
The implications of asset failures differ, and the business’s repercussions can be minimal or drastic.
Different industries have distinct regulatory requirements to meet. The impact may be minimal if a report the sales or marketing department uses for an end-of-year close has a mislabeled column. On the other hand, if a healthcare or financial report does not meet the needs of a HIPAA or SOX compliance report, the company and its C-suite may face severe penalties and reputational damage. Another example is a report that is shared externally: during an update of the report specs, row-level security was incorrectly applied, which gave people access to personal information.
The complexity of assets influences their likelihood of encountering issues.
The last thing a business wants is for a report or app to fail at a crucial moment. If you know a report is complex and has a lot of dependencies, then the probability of failure caused by IT changes is high, and a change request should take that into account. Dependency graphs become important. If it is a straightforward sales report that totals sales by salesperson and account, any changes made do not have the same impact, even if the report fails. BI operations should treat these reports differently during change.
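As an illustrative sketch of why dependency graphs matter here (all asset names and edges below are hypothetical), one simple risk signal is the number of transitive downstream dependents each asset has: the more assets sit downstream, the more careful the change management should be.

```python
from collections import defaultdict, deque

def downstream_counts(edges):
    """Count transitive downstream dependents for each asset.

    edges: list of (upstream, downstream) pairs, e.g.
    ("sales_table", "sales_dashboard") means the dashboard
    depends on the table.
    """
    graph = defaultdict(list)
    nodes = set()
    for up, down in edges:
        graph[up].append(down)
        nodes.update((up, down))

    counts = {}
    for node in nodes:
        # Breadth-first traversal to collect everything downstream.
        seen = set()
        queue = deque(graph[node])
        while queue:
            current = queue.popleft()
            if current not in seen:
                seen.add(current)
                queue.extend(graph[current])
        counts[node] = len(seen)
    return counts

# Hypothetical BI estate: one shared table feeds several assets.
edges = [
    ("warehouse", "sales_table"),
    ("sales_table", "sales_dashboard"),
    ("sales_table", "finance_report"),
    ("finance_report", "board_pack"),
]
risk = downstream_counts(edges)
# "warehouse" has the most downstream dependents here, so a change
# to it carries the widest blast radius.
```

In practice, a count like this would be one input among several; asset complexity and the criticality of each downstream consumer matter as well.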
Not all reports and dashboards fail the same; some reports may lag, definitions might change, or data accuracy and relevance could wane. Understanding these variations aids in better risk anticipation.

Marketing uses several reports for its campaigns – standard analytic assets often delivered through marketing tools. Finance has very complex reports converted from Excel to BI tools while incorporating different consolidation rules. The marketing reports have a different failure mode than the financial reports. They, therefore, need to be managed differently.

It’s time for the company’s monthly business review. The marketing department proceeds to report on leads acquired per salesperson. Unfortunately, half the team has left the organization, and the data fails to load accurately. While this is an inconvenience for the marketing group, it isn’t detrimental to the business. However, a failure in financial reporting for a human resource consulting firm with thousands of contractors, containing critical and complex calculations about sickness, fees, hours, etc., has major implications and needs to be managed differently.

Acknowledging that assets transition through distinct phases allows for effective management decisions at each stage. When a new visualization is released, its information drives broad use and adoption.
Think back to the start of the pandemic. COVID dashboards were quickly put together and released to the business, showing pertinent information: how the virus spreads, the demographics affected, risks to the business, etc. At the time, this was relevant and served its purpose. As we moved past the pandemic, COVID-specific information became obsolete, and the reporting was integrated into regular HR reporting.
Reports and dashboards are crafted to deliver valuable insights for stakeholders. Over time, though, the worth of assets changes.
When a company opens its first store in a certain area, there are many elements it needs to understand: other stores in the area, traffic patterns, product pricing, which products to sell, etc. Once the store has been operational for some time, those specifics are not as important, and it can adopt the standard reporting. The tailor-made analytic assets become irrelevant and no longer add value to the store manager.