24 Project: Grade Analysis Tool
Make sure you’ve completed:
- All of Part I and Part II
- Chapter 10: Working with Data
- Understanding of CSV files and data processing
You should be ready to:
- Process real-world data files
- Calculate statistics and insights
- Handle messy, imperfect data
- Create meaningful reports
Starter code and notebooks for all projects are available on GitHub with “Open in Colab” buttons. See books.borck.education.
24.1 Project Overview
This project combines everything you’ve learned about data processing to create a real tool for analysing performance data. You’ll analyse grade data from CSV files, calculate statistics, identify trends, and generate actionable insights.
This is where programming becomes genuinely useful - solving real problems with real data.
24.2 The Problem to Solve
Educators need to understand their students’ performance! Your grade analyzer should:
- Read grade data from CSV files
- Calculate class averages, medians, and ranges
- Identify struggling students
- Find grade distribution patterns
- Generate progress reports
- Handle missing or invalid data gracefully
24.3 Architect Your Solution First
Before writing any code or consulting AI, design your grade analyzer:
1. Understand the Data
What might a gradebook CSV look like?
StudentID,Name,Quiz1,Quiz2,MidTerm,Project1,Quiz3,Final,Attendance
001,Alice Johnson,85,92,88,91,89,94,95
002,Bob Smith,78,65,82,79,81,77,88
003,Charlie Brown,91,88,94,96,93,89,100
004,Diana Prince,,85,90,88,87,92,92
005,Eve Wilson,45,52,48,65,58,61,75
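To see how such a file looks from Python's side, here is a minimal sketch that parses two of the rows with the standard library's `csv.DictReader`. The `SAMPLE` string and the `io.StringIO` wrapper are assumptions made so the snippet runs without a file on disk.

```python
import csv
import io

# Assumption: a two-student excerpt of the gradebook above, held in a string
SAMPLE = """StudentID,Name,Quiz1,Quiz2,MidTerm,Project1,Quiz3,Final,Attendance
003,Charlie Brown,91,88,94,96,93,89,100
004,Diana Prince,,85,90,88,87,92,92
"""

# DictReader uses the header row as keys, so each data row becomes a dict
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

for row in rows:
    # Every value arrives as a string; an empty cell is the empty string ''
    print(row["StudentID"], row["Name"], repr(row["Quiz1"]))
```

Two details matter for the design: IDs such as 004 keep their leading zeros because nothing is converted automatically, and Diana's missing Quiz1 arrives as an empty string rather than a number.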
2. Design Your Analysis Features
Plan what insights you’ll generate:
- [ ] Individual student summaries
- [ ] Class performance statistics
- [ ] Grade distribution analysis
- [ ] Improvement/decline trends
- [ ] Missing assignment identification
- [ ] At-risk student alerts
3. Identify Data Challenges
Real gradebook data has problems:
- [ ] Missing grades (empty cells)
- [ ] Invalid entries (“absent”, “N/A”, “103%”)
- [ ] Inconsistent formatting
- [ ] Extra or missing columns
- [ ] Student names with special characters
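One of these challenges, extra or missing columns, can be caught before any grades are read. The sketch below is one possible approach, not a required design: `check_columns` and `REQUIRED_COLUMNS` are hypothetical names, and it assumes that `StudentID` and `Name` are the only non-assignment columns.

```python
# Assumption: these two identifying columns are required; all others are assignments
REQUIRED_COLUMNS = ("StudentID", "Name")

def check_columns(fieldnames, required=REQUIRED_COLUMNS):
    """Return (missing, assignments) for a list of CSV header names."""
    # Strip stray whitespace such as ' Name' caused by inconsistent formatting
    names = [name.strip() for name in fieldnames if name is not None]
    missing = [col for col in required if col not in names]
    assignments = [name for name in names if name not in required]
    return missing, assignments

missing, assignments = check_columns(["StudentID", " Name", "Quiz1", "Final"])
print(missing)       # []
print(assignments)   # ['Quiz1', 'Final']
```

With `csv.DictReader`, the header list is available as `reader.fieldnames`, so a check like this could run right after the file is opened.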
24.4 Implementation Strategy
Phase 1: Basic Data Loading
- Read CSV file safely
- Handle missing values
- Convert grades to numbers
- Validate data ranges
Phase 2: Core Analytics
- Calculate averages per student
- Compute class statistics
- Identify grade distributions
- Generate basic reports
Phase 3: Advanced Insights
- Trend analysis (improvement/decline)
- Correlation between assignments
- At-risk student identification
- Visual data representation
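For the visual representation item in Phase 3, a plotting library is not strictly necessary. As a rough sketch, a text bar chart can be built with plain string repetition. `text_bar_chart` is a hypothetical helper, and the counts passed to it are made up for illustration.

```python
def text_bar_chart(counts, width=30):
    """Return one line per category, with bars scaled to the largest count."""
    largest = max(counts.values(), default=0)
    lines = []
    for label, count in counts.items():
        # Scale each bar relative to the largest category; avoid dividing by zero
        bar_length = round(count / largest * width) if largest else 0
        lines.append(f"{label} | {'#' * bar_length} {count}")
    return lines

# Assumption: letter-grade counts for a hypothetical class of 25 students
for line in text_bar_chart({"A": 6, "B": 11, "C": 5, "D": 2, "F": 1}, width=22):
    print(line)
```

Returning a list of lines instead of printing inside the function keeps the helper easy to test and lets the caller decide where the chart goes.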
24.5 AI Partnership Guidelines
Effective Prompts for This Project
✅ Good Learning Prompts:
"I'm analysing grade data from CSV. Some cells are empty or contain
'N/A'. Show me how to safely convert grade values to numbers,
handling these edge cases."
"I have a list of student grade dictionaries. How do I calculate
the median grade for the class? Show me both sorted and
statistics module approaches."
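As a point of comparison for the second prompt, here is a rough sketch of the two median approaches it mentions, alongside the mean and standard deviation from the standard library's `statistics` module. The `grades` list is made up for illustration, and `manual_median` is a hypothetical helper that assumes a non-empty list.

```python
import statistics

# Assumption: a short list of already-cleaned grades for illustration
grades = [85.0, 92.5, 78.0, 61.0, 88.0]

def manual_median(values):
    """Median by sorting: the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

print(statistics.mean(grades))     # arithmetic mean
print(manual_median(grades))       # sorted-list approach
print(statistics.median(grades))   # same result from the standard library
print(statistics.pstdev(grades))   # population standard deviation (divides by n)
print(statistics.stdev(grades))    # sample standard deviation (divides by n - 1)
```

Writing the manual version first and then checking it against `statistics.median` is a useful way to confirm you understand what the library call does.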
❌ Avoid These Prompts:
- “Build a complete grade analysis system”
- “Create a machine learning model for grade prediction”
- “Add database integration and web interface”
AI Learning Progression
Data Cleaning Phase: Handling messy data
"My CSV has grades like '85', '92.5', 'N/A', '', and '102'. How do I clean and validate these values?"
Statistics Phase: Mathematical analysis
"I need to calculate mean, median, and standard deviation for a list of grades. Show me simple implementations."
Pattern Recognition: Finding insights
"How can I compare a student's recent grades to their earlier grades to detect improvement or decline?"
24.6 Requirements Specification
Functional Requirements
Your grade analyzer must:
- Data Processing
  - Read standard CSV grade files
  - Handle missing or invalid grades
  - Support multiple assignment types
  - Validate grade ranges (0-100)
- Statistical Analysis
  - Calculate student averages
  - Compute class statistics (mean, median, mode)
  - Find grade distributions
  - Identify outliers
- Reporting Features
  - Individual student reports
  - Class summary statistics
  - At-risk student alerts
  - Grade trend analysis
- Error Handling
  - Graceful handling of bad data
  - Clear error messages
  - Data validation warnings
  - Missing file handling
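Two of the statistical requirements, the mode and outlier identification, can be sketched with the standard library alone. The version below is one possible reading of "outlier" (a value more than `z_threshold` standard deviations from the mean), which is a common rule of thumb and not the only definition. `find_outliers` and the sample averages are assumptions for illustration, and `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

def find_outliers(averages, z_threshold=2.0):
    """Return values more than z_threshold standard deviations from the mean."""
    if len(averages) < 2:
        return []  # Not enough data to measure spread
    mean = statistics.mean(averages)
    spread = statistics.pstdev(averages)
    if spread == 0:
        return []  # All values identical, so nothing stands out
    return [x for x in averages if abs(x - mean) / spread > z_threshold]

# Assumption: hypothetical student averages, one of them far below the rest
averages = [84, 86, 88, 85, 87, 86, 30]
print(statistics.multimode(averages))  # most common value(s), as a list
print(find_outliers(averages))
```

Because a single extreme value also inflates the standard deviation, small classes may need a lower threshold or a different rule, such as one based on quartiles.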
Learning Requirements
Your implementation should:
- [ ] Use file I/O for CSV processing
- [ ] Demonstrate data cleaning techniques
- [ ] Apply statistical calculations
- [ ] Show real-world data handling
- [ ] Include comprehensive error handling
24.7 Sample Interaction
Here’s how your analyzer might work:
📊 GRADE ANALYSIS TOOL 📊
════════════════════════════
Loading grades from 'class_grades.csv'...
✅ Found 25 students with 7 assignments each
CLASS SUMMARY
═════════════
Total Students: 25
Assignments: Quiz1, Quiz2, MidTerm, Project1, Quiz3, Final, Attendance
Overall Class Statistics:
- Average: 84.2%
- Median: 86.0%
- Highest: 91.4% (Alice Johnson)
- Lowest: 52.3% (Eve Wilson)
- Standard Deviation: 12.4
ASSIGNMENT BREAKDOWN
═══════════════════
Quiz1: Avg 82.1% | Range: 45-98
Quiz2: Avg 79.8% | Range: 52-97
MidTerm: Avg 85.3% | Range: 48-96
Project1: Avg 87.2% | Range: 65-98
Quiz3: Avg 84.6% | Range: 58-95
Final: Avg 83.9% | Range: 61-94
Attendance: Avg 91.2% | Range: 75-100
AT-RISK STUDENTS
═══════════════
⚠️ Eve Wilson (Student ID: 005)
- Current Average: 52.3%
- Missing: 0 assignments
- Trend: Improving (+8% from early to recent grades)
- Recommendation: Schedule tutoring session
GRADE DISTRIBUTION
═════════════════
A (90-100): 6 students (24%)
B (80-89): 11 students (44%)
C (70-79): 5 students (20%)
D (60-69): 2 students (8%)
F (0-59): 1 student (4%)
INDIVIDUAL REPORTS
═════════════════
[Showing top 2 students]
1. Alice Johnson (ID: 001)
Average: 91.4% | Grade: A
Strongest: Final (94%), Quiz2 (92%)
Needs work: Quiz1 (85%)
2. Charlie Brown (ID: 003)
Average: 91.3% | Grade: A
Strongest: Project1 (96%), MidTerm (94%)
Needs work: Final (89%)
[Full reports available - press Enter to see all students]
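Report lines like the assignment breakdown stay aligned more easily when the widths are set in the f-string format specification instead of with hand-typed spaces. The following is a small sketch of that idea. `format_assignment_row` and its `name_width` parameter are hypothetical, and the numbers are copied from the sample output purely for illustration.

```python
def format_assignment_row(name, average, low, high, name_width=12):
    """Format one line of the assignment breakdown with aligned columns."""
    label = f"{name}:"
    # :<{name_width} left-aligns the label; :5.1f fixes the width of the average
    return f"{label:<{name_width}}Avg {average:5.1f}% | Range: {low:.0f}-{high:.0f}"

# Assumption: values taken from the sample output for illustration
print(format_assignment_row("Quiz1", 82.1, 45, 98))
print(format_assignment_row("Attendance", 91.2, 75, 100))
```

If assignment names can be longer than `name_width`, computing the width from the longest name in the file would be a sensible refinement.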
24.8 Development Approach
Step 1: Safe CSV Reading
Start with robust file handling:
import csv

def load_grades(filename):
    """Load grades from a CSV file with error handling."""
    students = []
    try:
        # newline='' is what the csv module expects for files it reads;
        # utf-8 keeps names with special characters intact
        with open(filename, 'r', newline='', encoding='utf-8') as file:
            reader = csv.DictReader(file)
            for row in reader:
                students.append(row)
    except FileNotFoundError:
        print(f"Error: Could not find file '{filename}'")
        return None
    except Exception as e:
        print(f"Error reading file: {e}")
        return None
    print(f"Loaded {len(students)} student records")
    return students

Step 2: Data Cleaning Functions
Handle messy real-world data:
def clean_grade(grade_str):
    """Convert a grade string to a float, returning None for unusable values."""
    if not grade_str or grade_str.strip() == "":
        return None
    # Remove common non-numeric characters
    cleaned = grade_str.strip().replace('%', '')
    # Handle common text values
    if cleaned.lower() in ['n/a', 'na', 'absent', 'missing']:
        return None
    try:
        grade = float(cleaned)
        # Validate range
        if 0 <= grade <= 100:
            return grade
        else:
            print(f"Warning: Grade {grade} outside valid range")
            return None
    except ValueError:
        print(f"Warning: Could not parse grade '{grade_str}'")
        return None

def clean_student_grades(student):
    """Clean all grades for a student."""
    cleaned = {}
    cleaned['name'] = student.get('Name', 'Unknown')
    cleaned['id'] = student.get('StudentID', 'Unknown')
    # Get all assignment columns (skip Name and StudentID).
    # DictReader stores surplus cells under the key None, so skip that too.
    assignment_columns = [col for col in student.keys()
                          if col is not None and col not in ['Name', 'StudentID']]
    cleaned['assignments'] = {}
    for assignment in assignment_columns:
        grade = clean_grade(student.get(assignment, ''))
        cleaned['assignments'][assignment] = grade
    return cleaned

Step 3: Statistical Analysis
Build your analysis toolkit:
def calculate_student_average(student):
    """Calculate the average grade for a student."""
    grades = [g for g in student['assignments'].values() if g is not None]
    if not grades:
        return None
    return sum(grades) / len(grades)

def calculate_class_statistics(students):
    """Calculate class-wide statistics from the student averages."""
    all_averages = []
    for student in students:
        avg = calculate_student_average(student)
        if avg is not None:
            all_averages.append(avg)
    if not all_averages:
        return None
    all_averages.sort()
    n = len(all_averages)
    stats = {
        'count': n,
        'mean': sum(all_averages) / n,
        'median': all_averages[n//2] if n % 2 == 1 else
                  (all_averages[n//2-1] + all_averages[n//2]) / 2,
        'min': min(all_averages),
        'max': max(all_averages)
    }
    # Calculate the population standard deviation (divides by n)
    mean = stats['mean']
    variance = sum((x - mean) ** 2 for x in all_averages) / n
    stats['std_dev'] = variance ** 0.5
    return stats

Step 4: Trend Analysis
Identify patterns in performance:
def analyze_student_trend(student):
    """Analyse whether a student is improving or declining."""
    grades = []
    assignments = student['assignments']
    # Get grades in chronological order (assumes column order matches time order)
    for assignment, grade in assignments.items():
        if grade is not None:
            grades.append(grade)
    if len(grades) < 3:  # Need enough data points
        return "Insufficient data"
    # Compare an early window with a late window of the same size.
    # Each window holds one third of the grades plus one, so with very
    # few grades the two windows can overlap.
    third = len(grades) // 3
    early_avg = sum(grades[:third+1]) / (third+1)
    late_avg = sum(grades[-third-1:]) / (third+1)
    improvement = late_avg - early_avg
    if improvement > 5:
        return f"Improving (+{improvement:.1f}%)"
    elif improvement < -5:
        return f"Declining ({improvement:.1f}%)"
    else:
        return "Stable"

24.9 Advanced Features
Grade Distribution Analysis
def analyze_grade_distribution(students):
    """Analyse how grades are distributed."""
    # Range labels used when printing each letter grade
    grade_ranges = {'A': '90-100', 'B': '80-89', 'C': '70-79',
                    'D': '60-69', 'F': '0-59'}
    distribution = {'A': 0, 'B': 0, 'C': 0, 'D': 0, 'F': 0}
    for student in students:
        avg = calculate_student_average(student)
        if avg is not None:
            if avg >= 90:
                distribution['A'] += 1
            elif avg >= 80:
                distribution['B'] += 1
            elif avg >= 70:
                distribution['C'] += 1
            elif avg >= 60:
                distribution['D'] += 1
            else:
                distribution['F'] += 1
    total = sum(distribution.values())
    if total > 0:
        for grade in distribution:
            count = distribution[grade]
            percentage = (count / total) * 100
            print(f"{grade} ({grade_ranges[grade]}): {count} students ({percentage:.1f}%)")
    return distribution

At-Risk Student Identification
def identify_at_risk_students(students, threshold=70):
    """Find students who might need help."""
    at_risk = []
    for student in students:
        avg = calculate_student_average(student)
        if avg is not None and avg < threshold:
            # Count missing assignments
            missing_count = sum(1 for g in student['assignments'].values()
                                if g is None)
            trend = analyze_student_trend(student)
            at_risk.append({
                'student': student,
                'average': avg,
                'missing_assignments': missing_count,
                'trend': trend
            })
    # Lowest averages first
    return sorted(at_risk, key=lambda x: x['average'])

24.10 Real-World Data Challenges
Challenge 1: Extra Credit Handling
def handle_extra_credit(grade):
    """Handle grades over 100% properly."""
    if grade > 100:
        return min(grade, 110)  # Cap at 110%
    return grade

Challenge 2: Different Grading Scales
def normalize_grade(grade, scale='100'):
    """Convert different grading scales to a 100-point scale."""
    if scale == '4.0':
        return (grade / 4.0) * 100
    elif scale == 'letter':
        letter_to_number = {'A': 95, 'B': 85, 'C': 75, 'D': 65, 'F': 50}
        # Unrecognised letters map to 0
        return letter_to_number.get(grade.upper(), 0)
    return grade

24.11 Testing with Sample Data
Create test data to verify your analyzer:
def create_sample_data():
    """Generate sample grade data for testing."""
    # The CSV rows start at column 0 so no leading spaces end up in the file
    sample_csv = """StudentID,Name,Quiz1,Quiz2,MidTerm,Project1,Final
001,Alice Johnson,85,92,88,91,94
002,Bob Smith,78,,82,79,77
003,Charlie Brown,91,88,94,96,89
004,Diana Prince,N/A,85,90,88,92
005,Eve Wilson,45,52,48,65,61"""
    with open('sample_grades.csv', 'w') as f:
        f.write(sample_csv)

24.12 Practice Extensions
Extension 1: Progress Tracking
- Compare current grades to previous semesters
- Track improvement over time
- Generate progress charts
Extension 2: Assignment Analysis
- Identify which assignments are most difficult
- Find correlations between different assignments
- Suggest which assignments to review
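As a starting point for this extension, the sketch below ranks assignments by class average and computes a Pearson correlation between two of them. It assumes records shaped like the output of `clean_student_grades` (a dict with an `assignments` mapping where missing grades are `None`). `average_by_assignment`, `pearson`, and the sample records are hypothetical.

```python
def average_by_assignment(students):
    """Map each assignment name to its class average, skipping missing grades."""
    totals = {}
    for student in students:
        for name, grade in student["assignments"].items():
            if grade is not None:
                totals.setdefault(name, []).append(grade)
    return {name: sum(grades) / len(grades) for name, grades in totals.items()}

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # A constant column has no defined correlation
    return cov / (var_x * var_y) ** 0.5

# Assumption: cleaned records for a tiny hypothetical class
students = [
    {"name": "Alice", "assignments": {"Quiz1": 85.0, "Final": 94.0}},
    {"name": "Bob", "assignments": {"Quiz1": 78.0, "Final": 77.0}},
    {"name": "Charlie", "assignments": {"Quiz1": 91.0, "Final": 89.0}},
    {"name": "Eve", "assignments": {"Quiz1": 45.0, "Final": None}},
]

averages = average_by_assignment(students)
hardest = min(averages, key=averages.get)  # lowest class average
print(hardest, round(averages[hardest], 2))

# Correlate two assignments using only students who have both grades
pairs = [(s["assignments"]["Quiz1"], s["assignments"]["Final"])
         for s in students
         if s["assignments"]["Quiz1"] is not None
         and s["assignments"]["Final"] is not None]
quiz_scores, final_scores = zip(*pairs)
print(pearson(quiz_scores, final_scores))
```

A low class average is only a rough proxy for difficulty, and a correlation from a handful of students says very little, so treat both numbers as prompts for a closer look, not as conclusions.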
Extension 3: Class Comparison
- Compare multiple class sections
- Identify teaching effectiveness
- Benchmark against standards
24.13 Common Pitfalls and Solutions
Pitfall 1: Assuming Clean Data
Problem: Real data is messy with missing values
Solution: Always validate and clean first
Pitfall 2: Division by Zero
Problem: Calculating averages with no valid grades
Solution: Check for empty lists before dividing
Pitfall 3: Hardcoded Column Names
Problem: Code breaks when CSV format changes
Solution: Dynamically detect assignment columns
Pitfall 4: No Data Validation
Problem: Grades of 150% or -20% silently distort calculations
Solution: Validate ranges and handle outliers
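Pitfalls 2 and 4 can be guarded against in a single small function. The sketch below is one way to do it. `safe_average` is a hypothetical helper, and the default 0 to 100 range is an assumption that would change if extra credit is allowed.

```python
def safe_average(grades, low=0, high=100):
    """Average the valid grades; return None when there are none."""
    # Pitfall 4: keep only values inside the expected range
    valid = [g for g in grades if g is not None and low <= g <= high]
    # Pitfall 2: check for an empty list before dividing
    if not valid:
        return None
    return sum(valid) / len(valid)

print(safe_average([80, 90, None]))   # 85.0
print(safe_average([150, -20]))       # None: every value was out of range
print(safe_average([]))               # None: nothing to average
```

Returning `None` instead of 0 matters here: a student with no valid grades has an unknown average, not a failing one.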
24.14 Reflection Questions
After completing the project:
- Data Quality: What surprised you about real-world data?
- Statistics Understanding: Which calculations were most insightful?
- Error Handling: How did you make your code robust?
- User Value: How would someone actually use this tool?
24.15 Next Project Preview
Excellent work! Next, you’ll build a Weather Dashboard that pulls live data from APIs, creating a real-time application that connects to the internet. You’ll see how external data sources make programs dynamic and current!
Your grade analyzer proves you can turn raw data into actionable insights - a skill valuable in any field! 📊