/ 2023 · Final Year Project · Research to production

Automated Summary Evaluator (FYP)

An LLM-powered system that scores student summaries on content and wording. Training data came from multiple NGOs, and the system is deployed through CI/CD pipelines with Docker so it runs consistently in real-world use.

/ The Problem

Educators hand-scoring summaries written by students with ADHD struggled with inconsistent grades and long turnaround times. NGOs needed an automated way to grade content and wording at scale.

/ The Approach

Curated labelled summary datasets from multiple NGOs, then trained LLMs as regressors to score content and wording on a continuous scale. Wrapped the model in a CI/CD + Docker pipeline so grading runs identically in research and production.
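The regressor idea can be sketched in a few lines. This is a minimal illustration, not the project's training code: it assumes each summary has already been reduced to a fixed-size embedding (in practice, a pooled LLM output) and trains only a linear head with two continuous outputs, content and wording, under mean squared error. The embeddings, labels, and every function name below are hypothetical toy values.

```python
import random


def init_head(dim, n_targets, seed=0):
    """Create small random weights and zero biases for a linear regression head."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_targets)]
    bias = [0.0] * n_targets
    return weights, bias


def predict(weights, bias, features):
    """Return one continuous score per target, here [content, wording]."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def mse(preds, targets):
    """Mean squared error across the targets of a single example."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def dataset_loss(weights, bias, embeddings, labels):
    """Average per-example MSE over the whole dataset."""
    losses = [mse(predict(weights, bias, x), y) for x, y in zip(embeddings, labels)]
    return sum(losses) / len(losses)


def train_head(weights, bias, embeddings, labels, lr=0.05, epochs=500):
    """Fit the head in place with plain stochastic gradient descent on MSE."""
    n_targets = len(bias)
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):
            preds = predict(weights, bias, x)
            for k in range(n_targets):
                # d(MSE)/d(pred_k) for an MSE averaged over n_targets outputs
                grad = 2 * (preds[k] - y[k]) / n_targets
                for j in range(len(x)):
                    weights[k][j] -= lr * grad * x[j]
                bias[k] -= lr * grad
    return weights, bias


# Toy data (assumption): 3-dim "pooled embeddings" with [content, wording] labels.
EMBEDDINGS = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [1.0, 1.0, 0.0], [0.5, 0.5, 1.0]]
LABELS = [[0.8, -0.2], [-0.4, 0.6], [0.5, 0.3], [0.1, 0.4]]

weights, bias = init_head(dim=3, n_targets=2)
initial_loss = dataset_loss(weights, bias, EMBEDDINGS, LABELS)
train_head(weights, bias, EMBEDDINGS, LABELS)
final_loss = dataset_loss(weights, bias, EMBEDDINGS, LABELS)
content_score, wording_score = predict(weights, bias, EMBEDDINGS[0])
```

The point of the sketch is that the outputs are unbounded real numbers trained against numeric labels, rather than class probabilities. A full setup would likely fine-tune the underlying model as well, instead of fitting a head on frozen features.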

/ The Impact

Graded summaries in seconds instead of minutes. The regressors were trained on curated datasets of summaries by students with ADHD, gathered from multiple NGOs, and shipped via GitHub Actions CI/CD and Docker for reproducible deployment.
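One way a pipeline can guard reproducible grading is a golden-file check that runs on every build. The sketch below is an assumption about how such a check could look, not the project's actual test suite: `grade` is a hypothetical stand-in for the real model call, and `find_drift` compares fresh scores against stored reference scores within a small tolerance.

```python
import math


def grade(summary):
    """Hypothetical stand-in for the deployed model: deterministic toy scores."""
    words = summary.split()
    unique_words = {w.lower() for w in words}
    return {
        "content": len(words) / 10,
        "wording": len(unique_words) / max(len(words), 1),
    }


def find_drift(grade_fn, golden, tol=1e-6):
    """Return (summary, key, expected, actual) for every score outside tolerance."""
    drifted = []
    for summary, expected in golden.items():
        actual = grade_fn(summary)
        for key in ("content", "wording"):
            if not math.isclose(actual[key], expected[key], abs_tol=tol):
                drifted.append((summary, key, expected[key], actual[key]))
    return drifted


# Reference scores, recorded once from a known-good run (toy examples here).
SAMPLES = ["The cat sat on the mat", "Plants turn sunlight into energy"]
GOLDEN = {s: grade(s) for s in SAMPLES}

# An empty list means the current build scores the samples the same way.
drift = find_drift(grade, GOLDEN)
```

A tolerance is used instead of exact equality because floating-point results from a real model can differ slightly between machines even when the container image is the same.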

/ Highlights
  • LLMs trained as regressors to score content and wording continuously
  • Datasets curated and merged from multiple partner NGOs
  • GitHub Actions CI/CD + Docker for reproducible grading runs
  • Cut grading turnaround from minutes per summary to seconds
/ Stack
  • Python
  • LLMs
  • CI/CD
  • Docker
  • GitHub Actions