Work case

Academic Data Platform (Japan Learning Center)

Architected data pipelines and analytics dashboards that handled 10K+ student records monthly, combining NestJS and Next.js services with Kafka, Airflow, PySpark, and LangChain on OpenAI.

Role: Software Architect
Published: 2025-03-01
Tags: edtech · data-pipeline · etl · ai · analytics

Student records

10K+ / month

Monthly academic data processing volume

Architecture team

12

Cross-functional delivery for pipelines and BI

Problem

Academic data was fragmented, and teams lacked a reliable way to turn operational records into usable insights. Manual reporting slowed decision-making and made it hard to understand learning patterns across classrooms.

Solution

Academic data platform architecture

As software architect, I shaped a platform that combined NestJS services and a Next.js analytics experience with Kafka, Airflow, and PySpark pipelines. Processed datasets landed in AWS S3 and MySQL-backed stores for reporting, while LangChain on OpenAI supported classification and automated insight summaries surfaced in BI dashboards.
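To illustrate the classification step mentioned above, here is a minimal sketch in plain Python rather than LangChain itself. It assumes a fixed label set and a prompt template, and it accepts a model answer only when it matches one of the allowed labels. The labels, the template wording, the `needs_review` fallback, and the `fake_model` stand-in are all hypothetical and are not taken from the project.

```python
from typing import Callable

# Hypothetical label set; the real categories are not stated in this write-up.
LABELS = ("on_track", "needs_support", "at_risk")

# Hypothetical prompt wording, kept in one template so every call is phrased the same way.
PROMPT_TEMPLATE = (
    "Classify the student record into exactly one of: {labels}.\n"
    "Record: {record}\n"
    "Answer with the label only."
)


def build_prompt(record: str) -> str:
    """Fill the shared template with the allowed labels and one record."""
    return PROMPT_TEMPLATE.format(labels=", ".join(LABELS), record=record)


def classify(
    record: str,
    model: Callable[[str], str],
    fallback: str = "needs_review",
) -> str:
    """Ask the model for a label and accept it only if it is in LABELS."""
    raw = model(build_prompt(record))
    label = raw.strip().lower()
    return label if label in LABELS else fallback


def fake_model(prompt: str) -> str:
    # Stand-in for the real LLM call (assumption): returns a messy but valid
    # label for one kind of record and free text for everything else.
    return " At_Risk \n" if "absent" in prompt else "I am not sure."


flagged = classify("absent 6 of 10 sessions", fake_model)
unclear = classify("attended all sessions", fake_model)
```

Normalizing the answer and falling back to a review label keeps free-form model output out of the reporting tables, which is one way to make classification repeatable enough for dashboards.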

Architecture decisions

Kafka decoupled ingestion from batch and serving paths so upstream changes did not destabilize downstream consumers.
Airflow orchestrated PySpark workloads and dependencies with explicit retries and scheduling.
S3 acted as a durable lake-style landing zone before relational serving in MySQL, keeping heavy transforms off transactional paths.
LangChain structured prompts and tooling around OpenAI, giving repeatable classification and narrative insights that fed into dashboards.
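The orchestration decision above (dependencies with explicit retries) can be sketched with the Python standard library alone. This is not Airflow code and not the project's DAG; it only shows the idea of running tasks in dependency order and retrying a failed task a bounded number of times. The task names, the `run_pipeline` helper, and the retry count are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(
    tasks: Dict[str, Callable[[], None]],
    upstream: Dict[str, Set[str]],
    max_retries: int = 2,
) -> List[str]:
    """Run tasks in dependency order, retrying each failed task up to max_retries times."""
    completed: List[str] = []
    # static_order() yields a task only after all of its upstream tasks.
    for name in TopologicalSorter(upstream).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break
            except Exception:
                if attempt == max_retries:
                    raise  # retries exhausted: stop so downstream tasks never run
        completed.append(name)
    return completed


# Hypothetical tasks, for illustration only.
attempts = {"transform": 0}


def extract() -> None:
    pass  # e.g. read a batch from the landing zone


def transform() -> None:
    attempts["transform"] += 1
    if attempts["transform"] < 2:
        raise RuntimeError("transient failure")  # succeeds on the second attempt


def load() -> None:
    pass  # e.g. write aggregates to the serving store


order = run_pipeline(
    tasks={"extract": extract, "transform": transform, "load": load},
    upstream={"transform": {"extract"}, "load": {"transform"}},
)
```

In this sketch a transient failure in `transform` is absorbed by one retry, and `load` starts only after `transform` has succeeded. A scheduler such as Airflow adds timing, backoff, and visibility on top of the same ordering rule.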

Impact

Processed 10K+ student records monthly for the learning center client.
Delivered BI dashboards plus AI-assisted classification and insights on top of standardized pipelines.
Coordinated delivery with a 12-person, architecture-focused team spanning data, backend, and analytics.