Automated Summary Evaluator (FYP)
An LLM-powered system that scores student summaries on content and wording. Datasets were sourced from multiple NGOs, and the system is deployed through CI/CD pipelines with Docker for consistent real-world execution.
Educators who hand-scored summaries written by students with ADHD faced inconsistent scores and long turnaround times. NGOs needed an automated way to grade content and wording at scale.
Curated labelled summary datasets from multiple NGOs, then trained LLMs as regressors to score content and wording on a continuous scale. Wrapped the model in a CI/CD + Docker pipeline so grading runs identically in research and production.
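A minimal sketch of what "LLM as regressor" can look like, not the project's actual code: a pooled text representation feeds a linear head with two unbounded outputs (content, wording), trained against the labelled scores with mean squared error. The embeddings, weights, and target scores below are hypothetical values chosen for illustration; in practice the embeddings would come from the language model and the weights would be learned.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(token_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Average per-token embeddings into one fixed-size summary vector."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[i] for tok in token_embeddings) / n for i in range(dim)]


def regression_head(
    pooled: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> Tuple[float, float]:
    """Linear layer with two outputs and no activation: (content, wording)."""
    scores = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]
    return scores[0], scores[1]


def mse_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over the two score dimensions."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# Hypothetical inputs: two tokens with 3-dimensional embeddings.
token_embeddings = [[0.2, 0.4, 0.0], [0.6, 0.0, 0.2]]
# Hypothetical head parameters: one weight row per score dimension.
weights = [[1.0, 0.5, 0.0], [0.0, 1.0, 2.0]]
bias = [0.1, -0.2]

pooled = mean_pool(token_embeddings)
content, wording = regression_head(pooled, weights, bias)
# Hypothetical human-assigned scores for the same summary.
loss = mse_loss((content, wording), (0.5, 0.3))
```

Because the head has no activation or class bins, the scores stay on a continuous scale, which is the property the approach above relies on.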
Trained LLMs as regressors on curated datasets of summaries by students with ADHD, sourced from multiple NGOs. Shipped via GitHub CI/CD and Docker for reproducible deployment. Graded summaries in seconds instead of minutes.
- LLMs trained as regressors to score content and wording continuously
- Datasets curated and merged from multiple partner NGOs
- GitHub Actions CI/CD + Docker for reproducible grading runs
- Cut turnaround from minutes per summary to seconds
- Python
- LLMs
- CI/CD
- Docker
- GitHub Actions