Cal Field Hockey Database Management System

From normalized schema to analytics and ML‑driven insights

Project Overview

Collaborated with Cal Club Field Hockey to design and implement a comprehensive database solution that modernized data management across the club. The system spans player information, event management, merchandise tracking, and predictive analytics for player evaluation.

Database Relation and Schema

The schema is normalized to Boyce-Codd normal form (BCNF) to preserve data integrity and minimize redundancy, which also keeps queries simple and predictable. Relations were first written out in detail, then validated with an EER model before being implemented in MySQL Workbench.
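As a rough illustration of what a BCNF check involves, the sketch below computes attribute closures and flags any non-trivial functional dependency whose determinant is not a superkey. The relation and attribute names (`player_id`, `event_id`, `event_date`, `attended`) are hypothetical and are not taken from the club's actual schema.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A functional dependency: (determinant attributes, dependent attributes).
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    """Small helper to build a functional dependency."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Return every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation: Set[str], fds: List[FD]) -> List[FD]:
    """List non-trivial dependencies whose determinant is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency, always allowed
        if closure(lhs, fds) >= relation:
            continue  # determinant is a superkey, so BCNF holds for this FD
        violations.append((lhs, rhs))
    return violations


# Hypothetical wide relation that mixes event facts with attendance facts.
wide = {"player_id", "event_id", "event_date", "attended"}
wide_fds = [
    fd({"player_id", "event_id"}, {"attended"}),
    fd({"event_id"}, {"event_date"}),  # event_id alone is not a key here
]
wide_violations = bcnf_violations(wide, wide_fds)

# Decomposing into two relations removes the violation.
event = {"event_id", "event_date"}
event_fds = [fd({"event_id"}, {"event_date"})]
attendance = {"player_id", "event_id", "attended"}
attendance_fds = [fd({"player_id", "event_id"}, {"attended"})]
```

In this example the wide relation fails because `event_id` determines `event_date` without being a key; splitting it into an event relation and an attendance relation leaves both in BCNF.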

Synthetic Data Generation

To protect sensitive information while demonstrating functionality, I generated synthetic data modeled on patterns in the real data. Logical constraints kept the values plausible (e.g., goals never exceed shots), so demos behave like the real system without exposing private data.
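One way to enforce a constraint such as "goals never exceed shots" is to sample each statistic conditionally on the one that bounds it. The sketch below uses only the standard library; the rates, ranges, and field names are placeholder assumptions, not the distributions fitted to the club's data.

```python
import random
from dataclasses import dataclass


@dataclass
class SyntheticStatLine:
    player_id: int
    shots: int
    shots_on_goal: int
    goals: int
    assists: int


def binomial(rng: random.Random, n: int, p: float) -> int:
    """Count successes in n independent trials with success probability p."""
    return sum(rng.random() < p for _ in range(n))


def generate_stat_line(
    rng: random.Random,
    player_id: int,
    max_shots: int = 40,            # assumed season ceiling
    on_goal_rate: float = 0.5,      # assumed share of shots on goal
    conversion_rate: float = 0.25,  # assumed share of on-goal shots scored
    max_assists: int = 10,          # assumed season ceiling
) -> SyntheticStatLine:
    """Sample one stat line so that goals <= shots_on_goal <= shots."""
    shots = rng.randint(0, max_shots)
    # Each later stat is drawn from the earlier one, so it can never exceed it.
    shots_on_goal = binomial(rng, shots, on_goal_rate)
    goals = binomial(rng, shots_on_goal, conversion_rate)
    assists = rng.randint(0, max_assists)
    return SyntheticStatLine(player_id, shots, shots_on_goal, goals, assists)


rng = random.Random(42)  # fixed seed so the demo data is reproducible
season = [generate_stat_line(rng, pid) for pid in range(1, 201)]
```

Because the ordering is built into the sampling itself, no generated row needs to be rejected or repaired afterwards.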

Machine Learning Pipeline

  • Performance metrics: goals, assists, minutes played, ratings
  • Derived features: per‑90 stats, consistency scores
  • Modeling: Random Forest (100 estimators) with 5‑fold cross‑validation
  • Output: Player ranking and recognition support
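The derived features listed above could be computed along these lines. This is a minimal sketch: the per-90 formula is the standard rate normalization, while the consistency score shown here (one over one plus the coefficient of variation of match ratings) is an assumed definition for illustration, and the dictionary keys are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def per_90(total: float, minutes_played: float) -> float:
    """Scale a season total to a rate per 90 minutes played."""
    if minutes_played <= 0:
        return 0.0  # no playing time, so no meaningful rate
    return total * 90.0 / minutes_played


def consistency_score(ratings: List[float]) -> float:
    """Assumed definition: 1 / (1 + coefficient of variation of ratings).

    Returns a value in (0, 1]; 1.0 means every rating was identical.
    """
    if len(ratings) < 2:
        return 0.0  # not enough matches to judge consistency
    avg = mean(ratings)
    if avg <= 0:
        return 0.0
    return 1.0 / (1.0 + pstdev(ratings) / avg)


def derive_features(player: Dict) -> Dict[str, float]:
    """Build the feature row for one player from raw season totals."""
    minutes = player["minutes_played"]
    return {
        "goals_per_90": per_90(player["goals"], minutes),
        "assists_per_90": per_90(player["assists"], minutes),
        "consistency": consistency_score(player["ratings"]),
    }


# Hypothetical player record, not real club data.
example_player = {
    "goals": 6,
    "assists": 3,
    "minutes_played": 540,
    "ratings": [7.0, 7.0, 7.0],
}
features = derive_features(example_player)
```

Rows like `features` would then be stacked into the feature matrix passed to the Random Forest and its 5-fold cross-validation.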

Results & Impact

Delivered a functional, production‑ready database replacing spreadsheets and paper workflows. The system enables efficient entry, retrieval, and analytics across the club, while the ML model provides an objective perspective on player evaluation and awards.

Technologies Used

  • MySQL Workbench
  • Python
  • Pandas & NumPy
  • Scikit‑Learn
  • Git & GitHub

Key Achievements

  • 62 normalized tables
  • BCNF normalization
  • 92.5% model accuracy (internal validation)
  • Production‑ready deployment

Demo

  • Database schema overview within MySQL Workbench
  • The simplified EER diagram for the database
  • The database relations written out before implementation
  • Historical data visualization: awards ML model performance table
  • Player ranking: model tested on players never seen during training/testing
  • Synthetic data generation for assists
  • Distribution of real data vs synthetic data for goals
  • Synthetic data scoring: distribution of real data vs synthetic data for goals scored
  • Synthetic data heatmap: distribution of real data vs synthetic data for shots
  • Distribution of real data vs synthetic data for shots on goal
  • Weighted scoring model for player evaluation

Project Details

Duration: 4 months
Team Size: 8 members
Role: Database Designer & ML Engineer
Client: Cal Club Field Hockey