Case study
Academic Data Platform (Japan Learning Center)
Architected pipelines and analytics surfaces that processed 10K+ student records monthly, combining NestJS and Next.js services with Kafka, Airflow, PySpark, and LangChain on OpenAI.
- Role: Software Architect
- Published
- Tags: edtech · data-pipeline · etl · ai · analytics
- Student records: 10K+ / month, monthly academic data processing volume
- Architecture team: 12, cross-functional delivery for pipelines and BI
Problem
Academic data was fragmented, and teams lacked a reliable way to turn operational records into usable insights. Manual reporting slowed decision-making and made it hard to understand learning patterns across classrooms.
Solution
As software architect, I designed a platform that combined NestJS services and a Next.js analytics front end with Kafka, Airflow, and PySpark pipelines. Processed datasets landed in AWS S3 before being loaded into MySQL-backed stores for reporting, while LangChain on OpenAI supported classification and automated insight summaries surfaced in BI dashboards.
Architecture decisions
- Kafka decoupled ingestion from batch and serving paths so upstream changes did not destabilize downstream consumers.
- Airflow orchestrated the PySpark workloads, with their dependencies, retries, and schedules declared explicitly.
- S3 acted as a durable lake-style landing zone before relational serving in MySQL, keeping heavy transforms off transactional paths.
- LangChain structured the prompts and tooling around OpenAI models, making classification repeatable and generating the narrative insights fed into dashboards.
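Kafka's decoupling comes from every consumer reading the same append-only log at its own pace. The sketch below is an in-memory stand-in written for illustration, not the Kafka client API: `Topic`, `publish`, and `poll` are hypothetical names, and the per-group offsets play the role of Kafka consumer-group offsets.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Topic:
    """In-memory stand-in for a Kafka topic: an append-only event log."""

    events: list = field(default_factory=list)
    # Committed offset per consumer group; each group advances independently.
    offsets: dict = field(default_factory=lambda: defaultdict(int))

    def publish(self, event: dict) -> int:
        """Append an event and return its offset in the log."""
        self.events.append(event)
        return len(self.events) - 1

    def poll(self, group: str, max_records: int = 100) -> list:
        """Return unread events for one group and advance only that group's offset."""
        start = self.offsets[group]
        batch = self.events[start : start + max_records]
        self.offsets[group] = start + len(batch)
        return batch


# Usage: the serving path stays current while the batch path lags behind.
records = Topic()
for student_id in range(5):
    records.publish({"student_id": student_id, "score": 70 + student_id})

serving_batch = records.poll("serving")               # all 5 events
nightly_batch = records.poll("batch", max_records=2)  # first 2 events only
```

Because each group's offset is independent, a slow or paused batch job only falls behind; it neither blocks nor rewinds the serving path.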
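Explicit retries and dependency ordering, as described for Airflow, can be illustrated without Airflow itself. This is a minimal sketch under stated assumptions: `run_dag` and `run_with_retries` are hypothetical helpers, the task names are placeholders for PySpark jobs, and `retries` / `retry_delay` only mirror the names of Airflow's operator arguments (Airflow takes `retry_delay` as a `timedelta`; here it is seconds).

```python
import time
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_with_retries(task: Callable[[], None], retries: int, retry_delay: float) -> int:
    """Run a task, retrying up to `retries` extra times; return the attempt that succeeded."""
    attempt = 0
    while True:
        attempt += 1
        try:
            task()
            return attempt
        except Exception:
            if attempt > retries:
                raise  # retries exhausted: surface the last error
            time.sleep(retry_delay)


def run_dag(
    tasks: Dict[str, Callable[[], None]],
    upstream: Dict[str, Set[str]],
    retries: int = 2,
    retry_delay: float = 0.0,
) -> list:
    """Run every task after its upstream tasks finish; return the execution order."""
    # Include every task as a node, treating tasks without entries as roots.
    graph = {name: upstream.get(name, set()) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        run_with_retries(tasks[name], retries, retry_delay)
    return order


calls = {"extract": 0}


def extract() -> None:
    # Hypothetical ingestion step that fails once with a transient error.
    calls["extract"] += 1
    if calls["extract"] == 1:
        raise ConnectionError("transient upstream failure")


order = run_dag(
    tasks={"extract": extract, "transform": lambda: None, "load": lambda: None},
    upstream={"transform": {"extract"}, "load": {"transform"}},
)
```

The transient failure in `extract` is absorbed by one retry, and `transform` only runs once `extract` has succeeded.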
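The landing-zone decision writes raw data once to durable storage and loads only compact, query-ready tables into the relational store. The sketch assumes a local directory in place of S3 and SQLite in place of MySQL; the `month=` partition layout, the `class_summary` table, its columns, and the sample records are all hypothetical.

```python
import json
import sqlite3
import tempfile
from collections import defaultdict
from pathlib import Path


def partition(lake_root: Path, month: str) -> Path:
    """Hypothetical raw-zone prefix, e.g. raw/student_records/month=2024-01/."""
    return lake_root / "raw" / "student_records" / f"month={month}"


def land_raw(lake_root: Path, month: str, records: list) -> Path:
    """Write raw records as JSON Lines under a month partition (stand-in for S3)."""
    part = partition(lake_root, month)
    part.mkdir(parents=True, exist_ok=True)
    path = part / "part-0000.jsonl"
    path.write_text("\n".join(json.dumps(r) for r in records) + "\n", encoding="utf-8")
    return path


def load_class_summary(lake_root: Path, month: str, conn: sqlite3.Connection) -> None:
    """Aggregate one landed partition and upsert a small summary table for BI reads."""
    totals = defaultdict(lambda: [0, 0.0])  # class_id -> [record count, score sum]
    for path in sorted(partition(lake_root, month).glob("*.jsonl")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.strip():
                rec = json.loads(line)
                totals[rec["class_id"]][0] += 1
                totals[rec["class_id"]][1] += rec["score"]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS class_summary ("
        "month TEXT, class_id TEXT, records INTEGER, avg_score REAL, "
        "PRIMARY KEY (month, class_id))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO class_summary VALUES (?, ?, ?, ?)",
        [(month, cid, n, s / n) for cid, (n, s) in totals.items()],
    )
    conn.commit()


# Usage with a temporary directory standing in for the bucket.
with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)
    land_raw(lake, "2024-01", [
        {"class_id": "A", "score": 80},
        {"class_id": "A", "score": 90},
        {"class_id": "B", "score": 70},
    ])
    conn = sqlite3.connect(":memory:")
    load_class_summary(lake, "2024-01", conn)
    summary = conn.execute(
        "SELECT class_id, records, avg_score FROM class_summary ORDER BY class_id"
    ).fetchall()
```

Rerunning the load for the same month replaces rows keyed on `(month, class_id)` instead of duplicating them; in MySQL the equivalent upsert would typically use `INSERT ... ON DUPLICATE KEY UPDATE`.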
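Repeatable classification mostly comes from fixing the prompt and constraining what counts as a valid answer. The sketch below does not use LangChain; it shows the same idea in plain Python. The label set, template wording, and `classify` helper are hypothetical, and `llm` is assumed to be any callable that sends a prompt to a model (for example, an OpenAI chat call wrapped in a function) and returns text.

```python
from typing import Callable

# Hypothetical label set; the project's actual taxonomy is not described.
LABELS = ("on_track", "needs_support", "at_risk")

PROMPT_TEMPLATE = (
    "Classify the student's recent academic activity into exactly one label.\n"
    "Allowed labels: {labels}\n"
    "Activity summary: {summary}\n"
    "Answer with the label only."
)


def build_prompt(summary: str) -> str:
    """Render the same fixed template for every record so runs are comparable."""
    return PROMPT_TEMPLATE.format(labels=", ".join(LABELS), summary=summary.strip())


def classify(summary: str, llm: Callable[[str], str], fallback: str = "unclassified") -> str:
    """Call the model and accept only an allowed label; anything else maps to the fallback."""
    raw = llm(build_prompt(summary))
    label = raw.strip().strip(".").lower()
    return label if label in LABELS else fallback


# Usage with a stub in place of a real OpenAI call.
def stub_llm(prompt: str) -> str:
    return " At_Risk.\n"


label = classify("Missed three of the last four quizzes.", stub_llm)  # "at_risk"
```

Rejecting out-of-vocabulary answers keeps free-form model output from leaking into dashboard filters; a low sampling temperature is a common complement.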
Impact
- Processed 10K+ student records monthly for the learning center client.
- Delivered BI dashboards plus AI-assisted classification and insights on top of standardized pipelines.
- Coordinated delivery with an architecture-focused team of 12 across data, backend, and analytics surfaces.