Analytics & Insights
Overview
Analytics & Insights covers the data and visualization tools that instructors use to understand learner behavior — engagement rates, video watch times, problem attempts, completion patterns, and course-level aggregations.
Open edX analytics has changed substantially: from an in-house stack (edX Insights, backed by a Hadoop/Hive data pipeline) to the community-developed Aspects platform (ClickHouse + Apache Superset). Most active deployments are migrating to Aspects.
Current State (2026)
• Aspects: The current-generation analytics platform. Event data flows from the LMS to ClickHouse via `event-routing-backends`, and Apache Superset dashboards give instructors rich, near-real-time views
• Legacy Insights: The old edX Insights product (Python + Hadoop + Hive + Django) is effectively deprecated for the community; it may still run in some older deployments
• LMS instructor tab: The legacy instructor analytics tab in the LMS provides basic aggregate stats (enrollment count, grade distribution) — still available but not being enhanced
• Event tracking: `event-tracking` captures browser and server events; `event-routing-backends` routes them to ClickHouse and other backends
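To make the routing step concrete, here is a minimal Python sketch of how a tracking event could be reshaped into an xAPI-style statement before it is stored. It does not use the real `event-routing-backends` API: the function `to_xapi_statement`, the two-entry verb mapping, the sample event, and the `https://lms.example.com` root are all assumptions made for illustration.

```python
from typing import Any, Optional

# Illustrative mapping only (assumption): the real mapping in
# event-routing-backends covers many more events and may differ.
VERB_BY_EVENT_NAME = {
    "play_video": ("https://w3id.org/xapi/video/verbs/played", "played"),
    "problem_check": ("http://adlnet.gov/expapi/verbs/attempted", "attempted"),
}


def to_xapi_statement(
    event: dict[str, Any],
    lms_root: str = "https://lms.example.com",  # hypothetical LMS root URL
) -> Optional[dict[str, Any]]:
    """Reshape a tracking event into a simplified xAPI-style statement.

    Returns None for event names that have no verb mapping, which mimics
    a router that silently skips events it does not know how to transform.
    """
    verb = VERB_BY_EVENT_NAME.get(event.get("name", ""))
    if verb is None:
        return None
    verb_id, verb_display = verb
    context = event.get("context", {})
    return {
        "actor": {
            "objectType": "Agent",
            "account": {"homePage": lms_root, "name": str(context.get("user_id"))},
        },
        "verb": {"id": verb_id, "display": {"en": verb_display}},
        # Simplification (assumption): the course is used as the object;
        # a fuller transformer would point at the specific video or problem.
        "object": {"id": f"{lms_root}/course/{context.get('course_id')}"},
        "timestamp": event.get("time"),
    }


# Hypothetical sample event, shaped loosely like a tracking log entry.
sample_event = {
    "name": "play_video",
    "time": "2026-01-15T10:00:00+00:00",
    "context": {"user_id": 42, "course_id": "course-v1:Demo+CS101+2026"},
}

statement = to_xapi_statement(sample_event)
```

Returning `None` for unmapped events keeps the sketch short; a production router would more likely log or count the events it drops.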
Architecture
• Event pipeline: Browser/server → `event-tracking` → `openedx-platform` Celery → `event-routing-backends` → ClickHouse (Aspects)
• Aspects stack: The Tutor plugin (`tutor-contrib-aspects`, part of the `openedx-aspects` project) installs ClickHouse + Superset; `aspects-dbt` transforms raw events into analytics models
• Superset dashboards: Pre-built dashboards for enrollment, engagement, completion, and assessment analytics; instructors access them via embedded Superset
• Real-time vs. batch: ClickHouse provides near-real-time analytics (seconds to minutes), whereas the old Hadoop pipeline delivered results in hours to days
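The transformation step, where raw events become analytics models, can be illustrated with a small Python sketch. In Aspects this work is done by `aspects-dbt` against ClickHouse; the code below is only an in-memory analogy. The function name `daily_active_learners`, the flat event fields (`course_id`, `actor_id`, `timestamp`), and the sample data are assumptions, not the actual Aspects schema.

```python
from collections import defaultdict
from datetime import datetime


def daily_active_learners(events: list[dict[str, str]]) -> dict[tuple[str, str], int]:
    """Count distinct learners per (course, day) from raw event rows.

    This mirrors the kind of roll-up an analytics model performs:
    many raw rows in, one aggregated row per course and day out.
    """
    learners_by_course_day: dict[tuple[str, str], set[str]] = defaultdict(set)
    for event in events:
        # Truncate the ISO 8601 timestamp to a calendar date.
        day = datetime.fromisoformat(event["timestamp"]).date().isoformat()
        learners_by_course_day[(event["course_id"], day)].add(event["actor_id"])
    return {key: len(ids) for key, ids in learners_by_course_day.items()}


# Hypothetical raw events (field names are assumptions for this sketch).
COURSE = "course-v1:Demo+CS101+2026"
raw_events = [
    {"course_id": COURSE, "actor_id": "learner-1", "timestamp": "2026-01-15T10:00:00+00:00"},
    {"course_id": COURSE, "actor_id": "learner-1", "timestamp": "2026-01-15T10:05:00+00:00"},
    {"course_id": COURSE, "actor_id": "learner-2", "timestamp": "2026-01-15T11:30:00+00:00"},
    {"course_id": COURSE, "actor_id": "learner-1", "timestamp": "2026-01-16T09:00:00+00:00"},
]

engagement = daily_active_learners(raw_events)
```

Using a set per (course, day) key means repeated events from the same learner count once, which is the usual definition of a daily active learner.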
History
Origin
• Year introduced: ~2013 (basic analytics from early edX)
• Initial implementation: Django-rendered analytics pages in the LMS instructor tab; basic enrollment and grade stats
• Context: Instructors needed visibility into how learners were engaging with their courses; data was also used by edX researchers for learning science
Key Milestones
• Basic instructor analytics tab in LMS
• edX Insights launched (Hadoop/Hive pipeline)
• edX Insights begins to stagnate after the 2U acquisition
• Aspects project initiated by community
• Aspects becomes the recommended analytics approach
Open Questions
- Who built edX Insights and what was the original data pipeline architecture?
- What drove the decision to build a Hadoop/Hive pipeline rather than using simpler approaches?
- Who initiated the Aspects project and what was the community process?
- How does the event schema in `event-tracking` compare to industry standards (xAPI, Caliper)?
- What analytics questions do instructors most commonly ask that the platform struggles to answer?
- How was learning analytics research (Learning Sciences) connected to the platform's analytics infrastructure?