Body Performance
Intelligence
Predict your class · Estimate your jump

Enter your physiological measurements and fitness test scores to instantly receive your predicted performance class (A–D) and estimated broad jump distance — powered by a Random Forest model trained on 13,393 real fitness evaluations.

13,393 Records
74.26% RF Accuracy
0.78 Regression R²
A
Elite Performer — Top tier fitness across all metrics
B
Above Average — Strong performance with minor gaps
C
Average — Room for targeted improvement
D
Below Average — Needs consistent training focus

Participant Profile

11 inputs required
Simulation mode: predictions use approximated RF decision boundaries (the real model runs in Colab)
01 — Demographics
Age: 35
02 — Body Composition
Height: 170 cm
Weight: 70 kg
Body fat: 20%
BMI: 24.2 (Normal)
03 — Cardiovascular
Diastolic: 80 mmHg
Systolic: 120 mmHg
04 — Fitness Test Results
Grip force: 35 kg
Sit and bend forward: 20 cm
Sit-ups: 35
Broad jump: 160 cm
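The BMI readout in the Body Composition group can be reproduced from the height and weight inputs. The sketch below is a minimal illustration, not the dashboard's own code: the function names are hypothetical, and the category cut-offs are assumed to be the standard WHO adult bands, since the page does not show its thresholds.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kg divided by height in metres squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    # Assumed cut-offs (standard WHO adult bands); the dashboard's own
    # thresholds are not shown on the page.
    if value < 18.5:
        return "Underweight"
    if value < 25:
        return "Normal"
    if value < 30:
        return "Overweight"
    return "Obese"


value = bmi(weight_kg=70, height_cm=170)
print(f"BMI: {value:.1f} ({bmi_category(value)})")  # BMI: 24.2 (Normal)
```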

Performance Profile Radar

Model Performance

Leaderboard
# | Model | Accuracy | Precision | Recall | F1 | CV Mean ± Std
1 | Random Forest | 74.26% | 74.71% | 74.26% | 74.12% | 73.32% ± 0.78
2 | Neural Network (MLP) | 74.06% | 75.19% | 74.06% | 74.16% | 74.00% ± 1.12
3 | SVM (RBF Kernel) | 71.62% | 72.12% | 71.62% | 71.57% | 70.90% ± 0.69
4 | Decision Tree | 65.17% | 66.79% | 65.17% | 65.32% | 64.75% ± 0.80
5 | Logistic Regression | 62.36% | 62.05% | 62.36% | 61.97% | 61.67% ± 0.83
6 | KNN (k=11) | 61.84% | 63.81% | 61.84% | 61.91% | 62.08% ± 0.72
R | Linear Regression (OLS) | R² 77.88% | N/A | N/A | N/A | RMSE 18.80

Split Stability Analysis

80/20 · 70/30 · 50/50

Accuracy Across Train/Test Splits ALL 6 MODELS · 3 RATIOS

Grouped bar chart — each cluster is one split ratio · Y-axis starts at 55% for readability
Most Stable: RF. Accuracy changes by only 1.90% between the 80/20 and 50/50 splits, and RF stays the top performer regardless of training data volume.
70/30 Is the Best Split: All 6 models peak or improve at 70/30 compared with 80/20, which suggests 70/30 is the best of the three ratios tested for this dataset.
KNN Most Sensitive: KNN degrades consistently as the training set shrinks, consistent with distance-based methods needing more data.
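The three split ratios can be illustrated with a plain shuffled index split. This is a sketch under stated assumptions: the helper name, the fixed seed, and the use of an unstratified shuffle are all hypothetical choices, since the page does not say how the splits were drawn.

```python
import random


def train_test_split_indices(n_rows, train_fraction, seed=42):
    """Shuffle row indices and cut them into train and test index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(n_rows * train_fraction)
    return indices[:n_train], indices[n_train:]


N_ROWS = 13_393  # dataset size reported on this page
for train_fraction in (0.8, 0.7, 0.5):
    train_idx, test_idx = train_test_split_indices(N_ROWS, train_fraction)
    print(f"{train_fraction:.0%} train: {len(train_idx)} rows, test: {len(test_idx)} rows")
```

Reusing the same seed for every ratio keeps the comparison about data volume rather than about which rows happened to land in the test set.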

K-Fold Cross-Validation

5-Fold · Generalisation

CV Mean ± Std Dev 5-Fold · Full Dataset

Lower std = more consistent across folds · RF's cross-validated accuracy (73.32% ± 0.78) shows that its leaderboard result does not depend on one lucky split
Finding: SVM has the lowest std (±0.69%) — most consistent across folds. Neural Network has the highest std (±1.12%) — sensitive to fold composition. Random Forest achieves 73.32% ± 0.78% CV, confirming its robustness over any single split result.
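A "CV Mean ± Std" figure is the mean and spread of one score per fold. The sketch below shows the mechanics only: the fold scores are hypothetical, the folds are contiguous and unshuffled, and the population standard deviation is an assumption, since the page does not state which variant was used.

```python
from statistics import mean, pstdev


def k_fold_indices(n_rows, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, test
        start += size


def summarize(fold_scores):
    """Return the mean and (population) standard deviation of fold scores."""
    return mean(fold_scores), pstdev(fold_scores)


# Hypothetical per-fold accuracies, for illustration only.
fold_scores = [0.72, 0.74, 0.73, 0.75, 0.73]
cv_mean, cv_std = summarize(fold_scores)
print(f"CV: {cv_mean:.2%} ± {cv_std:.2%}")
```

Every row appears in exactly one test fold, which is why the averaged score is less dependent on any single partition than a one-off split.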

Classification Models

Detailed Analysis

Random Forest

Bagging ensemble of 200 decision trees; best overall classifier.
Accuracy: 74.26%
F1 Score: 74.12%
Highest accuracy; robust to outliers and less prone to overfitting than a single tree.
Most stable between the 80/20 and 50/50 splits.
Less interpretable than a single tree.

Neural Network

Multi-layer perceptron (128, 64) modeling non-linear decision boundaries.
Accuracy: 74.06%
F1 Score: 74.16%
Near-top accuracy, within 0.2% of Random Forest.
Sensitive to data volume; accuracy drops significantly on the 50/50 split.
Least interpretable (black-box).

Support Vector Machine

Hyperplane separation using the RBF (Gaussian) kernel.
Accuracy: 71.62%
F1 Score: 71.57%
Very effective in high-dimensional feature spaces.
Captures non-linear structure well with C=10.
Requires feature scaling, because the RBF kernel depends on distances between samples.
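The scaling requirement can be sketched as z-score standardization, which matches the "Standard Scaler" step named in the methodology. The helper names and height values below are hypothetical; the key point is that the mean and standard deviation are learned from the training data only and then reused on the test data.

```python
from statistics import mean, pstdev


def fit_scaler(train_column):
    """Learn the mean and standard deviation from training data only."""
    return mean(train_column), pstdev(train_column)


def transform(column, mu, sigma):
    """Apply z = (x - mu) / sigma using statistics learned on the training set."""
    return [(x - mu) / sigma for x in column]


# Hypothetical height values in cm, for illustration only.
train_height = [160.0, 170.0, 180.0]
test_height = [175.0]

mu, sigma = fit_scaler(train_height)
print(transform(train_height, mu, sigma))
print(transform(test_height, mu, sigma))
```

Without this step, a feature measured in large units (such as broad jump in cm) would dominate the distance computation over a feature with a small numeric range.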

Logistic Regression

Linear multinomial classification optimized via L-BFGS.
Accuracy: 62.36%
F1 Score: 61.97%
Highly interpretable through feature coefficient analysis.
Outputs well-calibrated class probabilities.
Struggles with non-linear class separations.

Decision Tree

Recursive partitioning that chooses splits to maximize node purity (Gini).
Accuracy: 65.17%
F1 Score: 65.32%
Fully interpretable branching; feature scaling is not required.
Prone to high variance and overfitting at depths beyond max_depth=8.
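The Gini criterion mentioned for the tree can be written out in a few lines. This is a minimal sketch with hypothetical labels and function names: a node's impurity is 1 minus the sum of squared class proportions, and a candidate split is scored by the size-weighted impurity of its two children.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of a candidate split."""
    total = len(left) + len(right)
    return (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / total


# Hypothetical class labels at one node, for illustration only.
node = ["A", "A", "B", "B"]
print(gini_impurity(node))                     # 0.5
print(split_impurity(["A", "A"], ["B", "B"]))  # 0.0 (a perfectly pure split)
```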

K-Nearest Neighbors

Instance-based learner using Minkowski distance (k=11).
Accuracy: 61.84%
F1 Score: 61.91%
Simple lazy learner with no training phase.
Lowest performer overall, likely because distances become less informative across the 11 mixed-scale features.
Drops sharply on smaller data splits.

Confusion Matrix & Error Analysis

Random Forest · 70/30 Split

RF Performance by Class True vs Predicted

Interactive 4×4 grid revealing specific misclassification boundaries
True \ Predicted | A | B | C | D
A | 882 (87.8%) | 94 | 22 | 6
B | 136 | 603 (60.1%) | 240 | 25
C | 35 | 175 | 660 (65.7%) | 135
D | 8 | 24 | 170 | 802 (79.9%)
High B↔C Confusion
The most common error is confusion between Class B and Class C (240 + 175 = 415 errors). These intermediate fitness levels have highly overlapping feature distributions, which makes them hard to separate cleanly.
Excellent Extreme Accuracy
The model rarely makes catastrophic errors. Only 6 Class A participants were misclassified as D, and only 8 Class D participants as A. The extreme fitness tiers are highly distinct.
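The diagonal percentages and the B↔C error count can be recomputed directly from the grid. The sketch below copies the counts from the confusion matrix in this section; the function name is hypothetical. Each percentage is a per-class recall: the diagonal count divided by its row total.

```python
CLASSES = ["A", "B", "C", "D"]

# Rows are true classes, columns are predicted classes (counts from this section).
CONFUSION = [
    [882, 94, 22, 6],
    [136, 603, 240, 25],
    [35, 175, 660, 135],
    [8, 24, 170, 802],
]


def per_class_recall(matrix):
    """Recall for each class: diagonal count divided by its row total."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


recall = per_class_recall(CONFUSION)
for name, value in zip(CLASSES, recall):
    print(f"Class {name}: {value:.1%}")

b, c = CLASSES.index("B"), CLASSES.index("C")
b_c_errors = CONFUSION[b][c] + CONFUSION[c][b]
print(f"B<->C confusions: {b_c_errors}")  # 240 + 175 = 415
```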
Per-Class F1 Scores
A: 0.78 (Best)
B: 0.63 (Lowest)
C: 0.69 (Moderate)
D: 0.86 (Outstanding)

Regression Models

Broad Jump Prediction

RF Regressor

Averages the predictions of 200 regression trees.
R² Score: 0.7842
RMSE / MAE: 18.57 / 13.82 cm
Best fit; captures interactions between fitness metrics.
Resilient to outliers.

Neural Network Regressor

MLP (128, 64) non-linear regression for jump prediction.
R² Score: 0.7837
RMSE / MAE: 18.59 / 13.88 cm
Its R² is close to that of linear regression (0.7788), which suggests a largely linear association between the inputs and jump distance.
Does not fully capture the explosive-power extremes of peak performers.

SVR (RBF Kernel)

Support Vector Regression with an ε-insensitive tube (ε=0.1).
R² Score: 0.7796
RMSE / MAE: 18.76 / 13.92 cm
Produces smooth, continuous predictions, unlike tree-based models.
Highly sensitive to the choice of the hyperparameter C.

Linear Regression

Ordinary Least Squares (OLS) fit of linear associations between the variables.
R² Score: 0.7788
RMSE / MAE: 18.80 / 14.12 cm
Near-instant training; coefficients show each feature's global effect directly.
Assumes linearity, whereas fitness data often show non-linear effects.
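The three regression metrics reported on each card follow from the residuals between true and predicted jump distances. The sketch below uses hypothetical values and a hypothetical function name; it illustrates the definitions and does not reproduce the reported scores.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired true and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    r2 = 1 - ss_res / ss_tot
    rmse = sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


# Hypothetical broad jump distances in cm, for illustration only.
y_true = [150.0, 180.0, 210.0, 240.0]
y_pred = [160.0, 170.0, 220.0, 230.0]
r2, rmse, mae = regression_metrics(y_true, y_pred)
print(f"R2={r2:.4f}  RMSE={rmse:.2f} cm  MAE={mae:.2f} cm")
```

RMSE penalizes large misses more heavily than MAE, which is why the reported RMSE values sit above the MAE values for every model.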

Feature Importance

Permutation Importance

Top Predictors RF · Permutation

sit_and_bend_forward_cm: 0.258
sit-ups counts: 0.231
age: 0.132
weight_kg: 0.071
body_fat_%: 0.058
gripForce: 0.050

Lower Predictors Ranked 7–11

gender: 0.050
broad_jump_cm: 0.028
height_cm: 0.010
systolic: 0.006
diastolic: −0.002
Key Finding: Flexibility (sit-and-bend) and core endurance (sit-ups) have a combined permutation importance of 0.489 (0.258 + 0.231), far ahead of body composition metrics.
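The arithmetic behind the key finding can be checked from the listed values. Permutation importances need not sum to 1, so the combined raw score of the top two features and their share of the summed scores are different quantities. The sketch below uses the eleven values from this section; the dictionary name is hypothetical.

```python
# Permutation importances as listed in this section (RF classifier).
IMPORTANCE = {
    "sit_and_bend_forward_cm": 0.258,
    "sit-ups counts": 0.231,
    "age": 0.132,
    "weight_kg": 0.071,
    "body_fat_%": 0.058,
    "gripForce": 0.050,
    "gender": 0.050,
    "broad_jump_cm": 0.028,
    "height_cm": 0.010,
    "systolic": 0.006,
    "diastolic": -0.002,
}

top_two = IMPORTANCE["sit_and_bend_forward_cm"] + IMPORTANCE["sit-ups counts"]
total = sum(IMPORTANCE.values())

print(f"Combined importance of top two: {top_two:.3f}")          # 0.489
print(f"Sum over all 11 features:       {total:.3f}")            # 0.892
print(f"Top-two share of that sum:      {top_two / total:.1%}")  # 54.8%
```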

Dataset Overview

Body Performance
13,393 Total Records
12 Feature Columns
11 Input Features
4 Performance Classes
0 Missing Values
~3,348 Records per Class

ML Pipeline

End-to-End
1. Load: CSV · 13,393 rows
2. Audit: Physiological Laws
3. EDA: Bivariate Analysis
4. Prep: IQR Capping
5. Split: 80/20 · 70/30 · 50/50
6. Train: 6 classifiers · 3 regressors
7. Evaluate: Gini · Permutation
8. Deploy: APEX Dashboard
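The IQR capping step in the Prep stage can be sketched as clipping values to the usual Tukey fences rather than dropping rows. The 1.5 multiplier, the function name, and the sample values are assumptions for illustration; the page does not state the exact settings used.

```python
from statistics import quantiles


def iqr_cap(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] instead of dropping outliers."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


# Hypothetical measurements with one extreme value, for illustration only.
raw = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_cap(raw))
```

Capping keeps the row count unchanged, which matters when every record carries a class label worth retaining.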

Data Quality Audit

Physiological Laws

Blood Pressure Constraint

Enforced the physiological law that systolic pressure must exceed diastolic pressure. Records where the diastolic (resting) reading was greater than or equal to the systolic (beating) reading were flagged as illogical and removed to ensure data integrity.

Duplicate Rectification

Identified and removed exact duplicate rows. This prevents data leakage, in which identical participants appear in both the training and test sets and the model "memorizes" them, artificially inflating accuracy.
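Both audit rules can be expressed as a single pass over the records. This is a minimal sketch, not the team's notebook code: the function name and sample rows are hypothetical, and only the `systolic` and `diastolic` column names come from the schema.

```python
def audit(rows):
    """Drop rows violating systolic > diastolic, then drop exact duplicates."""
    seen = set()
    clean = []
    for row in rows:
        if row["systolic"] <= row["diastolic"]:
            continue  # physiologically impossible reading
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        clean.append(row)
    return clean


# Hypothetical records, for illustration only.
rows = [
    {"age": 35, "systolic": 120, "diastolic": 80},
    {"age": 35, "systolic": 120, "diastolic": 80},  # exact duplicate
    {"age": 41, "systolic": 70, "diastolic": 95},   # violates the BP law
    {"age": 52, "systolic": 135, "diastolic": 85},
]
print(len(audit(rows)))  # 2
```

Running the audit before the train/test split is what keeps a duplicated participant from landing on both sides of it.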

Column Definitions

Schema
Column | Type | Description | Valid Range | ML Role
age | INT | Participant age in years | 18 – 80 | Feature
gender | CAT | Biological sex (M or F) | M / F | Feature (encoded)
height_cm | FLOAT | Standing height in centimetres | 100 – 220 | Feature
weight_kg | FLOAT | Body weight in kilograms | 20 – 250 | Feature
body fat_% | FLOAT | Body fat percentage | 3 – 65% | Feature
diastolic | INT | Diastolic blood pressure | 40 – 130 mmHg | Feature
systolic | INT | Systolic blood pressure | 70 – 200 mmHg | Feature
gripForce | FLOAT | Hand grip strength | 0 – 70 kg | Feature (high importance)
sit_and_bend_forward_cm | FLOAT | Flexibility: sit-and-reach test | −25 – 200 cm | Feature (top importance)
sit-ups counts | INT | Number of sit-ups completed | 0 – 80 | Feature (high importance)
broad jump_cm | FLOAT | Standing broad jump distance | 50 – 300 cm | Feature + Regression Target
class | CAT | Performance band, A (best) to D (worst) | A / B / C / D | Classification Target

Executive Summary

Final Report
03_Gharieb_Team
Body Performance
Final Analytics

The Gharieb Team from the Military Technical College presents a body performance intelligence system. Our methodology follows a 5-stage pipeline (Data Cleaning, EDA, Multi-Split Modeling, Cross-Validation, and Production Deployment) to classify fitness tiers (A–D), reaching a best accuracy of 74.26%.

13,393 Records Analyzed
6 Classifiers Trained
3 Regression Models
74.26% Best Accuracy (RF)

Through rigorous statistical auditing (Physiological BP Laws, IQR Outlier Capping, and Permutation Importance analysis), we have engineered a robust engine that captures the non-linear relationship between physical metrics and athletic performance grades.

Key System Insights

Feature Analysis
#1

Body Composition Dominance

Predictor: FAT %

Body fat percentage is a strong indicator of performance class. Individuals in Class 'A' consistently exhibit significantly lower fat levels, making it a reliable physiological marker of peak fitness grades, although its permutation importance in the Random Forest (0.058) ranks below flexibility and sit-ups.

#2

Strength-Agility Synergy

Indicators: JUMP / GRIP

Our analytics indicate that broad jump distance, sit-ups, and grip force tend to rise together. A high score in one usually signals high scores in the others, suggesting a shared explosive power and endurance factor.

#3

Gender Threshold Scaling

Condition: M/F Balanced

Distributions across Classes A-D are exceptionally balanced between genders. This indicates that our grading criteria effectively scale according to biological sex standards, ensuring fair and accurate classification for all participants.

#4

Blood Pressure Indifference

Predictor: Low Corr.

While vital for health monitoring, systolic and diastolic blood pressures showed minimal correlation with performance class, and both have near-zero permutation importance (0.006 and −0.002). This suggests that resting blood pressure adds little predictive signal for performance grading.

Methodology & Results

Performance
Random Forest Classifier
74.26%
Classification Accuracy
Precision: 74.71%
Recall: 74.26%
F1 Score: 74.12%
Training Split: 70 / 30
Cross-Validation: 5-fold
Best Classifier
RF Regressor (Jump)
0.7842
R² Score
Task: Broad Jump (cm)
RMSE: 18.57 cm
MAE: 13.82 cm
Best Regressor
Methodology
Gharieb 5S
Audit Pipeline
1. Audit: BP logic + Duplicates
2. EDA: Bivariate Correlation
3. Prep: Standard Scaler + IQR
4. Model: Grid Search + 5-Fold
5. Deploy: Interactive Dashboard
5-Stage Flow

Strategic Roadmap

Future Work

Ensemble Stacking

Combine RF, SVM, and MLP into a single meta-model to minimize residual variance.

SHAP Integration

Use SHAP (Shapley-value, game-theoretic) explanations to attribute each individual prediction to its input features.

Deep Regression

Use deeper Keras or PyTorch network architectures to push Broad Jump R² beyond the current best of 0.7842.

Grid Search Pro

Execute exhaustive hyperparameter optimization across all split ratios simultaneously.

Project Artifacts

Downloads
Final Report (PDF)
02_Snipers_Team_Body_Performance_Final_Report · 1.4 MB
Final ML Notebook
03_Snipers_Team_ML_Notebook_Final — Full training pipeline
Presentation Slides (PPTX)
04_Snipers_Team_Presentation · 17-slide Final Deck · 1.8 MB
Task Distribution Report
01_Snipers_Team_Tasks_Distribution_Report · 217 KB
Analytics & Classification Notebook
Body_Performance_Analytics_&_Intelligent_Classification_System · 7.7 MB
Google Drive — Full Project
Complete submission: all reports, notebooks, datasets & documentation
GitHub Repository
GhariebML/apex-performance-app — Source code & notebooks
