Alright so today I wanted to compare some women’s basketball stats, right? Specifically looking at how players from HB stacked up against PS players this season. Seems simple on paper. Yeah, no.
First thing I did? Grabbed my laptop and tried to find the official stats pages. Big mistake. Took me forever just to navigate those confusing league websites. Seriously, why make it so hard? Clicked through like ten pages before I found any actual player data. Felt like running through mud.
Then I thought, “Okay, maybe dump all this into Excel.” Opened up a fresh sheet, started typing everything manually. Player names, points per game, rebounds, assists – the usual stuff. Got through maybe three players before my eyes started crossing. This was gonna take all damn day. I ain’t got time for that.
Scrapped the manual plan fast. Started searching for faster ways. Found a couple fan sites with tables, tried copy-pasting directly into Excel. Total mess. Stuff landed in wrong columns, numbers got jumbled, formatting went crazy. Had to clean it up line by line. So damn frustrating. Excel and website tables just hate each other, I swear.
- Tried highlighting cells and using Paste Special
- Messed with the Text to Columns tool
- Even downloaded a sample CSV to see if the structure would be similar
Nothing worked smoothly. One site kept blocking my copy attempts with popups. Another had stats spread over multiple tiny tables. Pure headache material. Almost threw my mouse.
Finally remembered some basic web scraping stuff. Fired up Chrome’s Inspect tool, stared at the HTML gibberish for a bit. Found the table tags after poking around, used Inspect > Copy > Copy outerHTML on the whole stats table. Pasted that raw mess into a text editor first. Looked terrifying with all those angle brackets everywhere.
Took that HTML dump straight into pandas with Python. Wrote a quick script to read it:
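import pandas as pd

# read_html returns a list of DataFrames, one per <table> found in the file
tables = pd.read_html('stats_*')
player_df = tables[0]
# write the first table out as a spreadsheet
player_df.to_excel('clean_*')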
Boom. Suddenly had a clean spreadsheet with names and stats in nice columns. Magic. Why didn’t I start with this? Still, some cleanup needed – weird symbols, extra spaces. Used simple .replace() calls and similar fixes on the DataFrame.
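For anyone curious, here is a minimal sketch of that kind of cleanup. The column names and sample rows are made up for illustration, and it assumes the junk is footnote asterisks, non-breaking spaces and stray whitespace. It uses pandas' .str.replace() and .str.strip() string methods, so treat it as one way to do it, not the exact fixes from that day.

```python
import pandas as pd

# Made-up rows standing in for the scraped table; real column names will differ.
player_df = pd.DataFrame({
    "Player": ["  Player A* ", "Player B\xa0", "Player C "],
    "PPG": ["18.4", " 12.1", "9.7*"],
})

def clean_text(col: pd.Series) -> pd.Series:
    """Drop footnote asterisks, swap non-breaking spaces, trim whitespace."""
    return (
        col.astype(str)
        .str.replace("*", "", regex=False)
        .str.replace("\xa0", " ", regex=False)
        .str.strip()
    )

player_df["Player"] = clean_text(player_df["Player"])
# Once the text is clean, the stat column can be turned into real numbers.
player_df["PPG"] = pd.to_numeric(clean_text(player_df["PPG"]))
print(player_df)
```

Converting the stat columns to numbers matters, because otherwise sorting and averaging treat them as text.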
Ended up with usable data to compare HB and PS top performers side-by-side. Not perfect, but way better than spending five hours typing. Learned my lesson: avoid manual grunt work like the plague next time. Jump straight to pandas when dealing with messy web tables. Saved my sanity.
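The side-by-side comparison can be sketched roughly like this. Big assumptions: the cleaned data has a "Team" column with HB and PS labels, and "top performer" means highest points per game. Every name and number below is invented for the example.

```python
import pandas as pd

# Hypothetical cleaned data: the "Team" column and all numbers are made up.
stats = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C", "Player D"],
    "Team": ["HB", "HB", "PS", "PS"],
    "PPG": [18.4, 12.1, 16.0, 9.5],
    "RPG": [7.2, 4.0, 8.1, 3.3],
    "APG": [3.1, 5.4, 2.2, 4.8],
})

# Top scorer per team: sort by PPG, then keep the first row for each team.
top = (
    stats.sort_values("PPG", ascending=False)
    .groupby("Team")
    .head(1)
    .set_index("Team")
    .sort_index()
)

# Transpose so each team becomes a column, side by side.
side_by_side = top.T
print(side_by_side)
```

Change head(1) to head(3) and drop the transpose if you want the top three per team as rows.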