Mining for Data Gold: Ph.D. Student Digs Into Research on MOOCs

September 26, 2018

by Karen Brooks

“If we can tell who is likely to drop out, then we can introduce interventions to support them so they stay.”

Nearly eighty million people have enrolled in a massive open online course (MOOC) since the web-based learning platform burst onto the education scene in 2011. But 90 percent of them have never actually completed one.

 Juan Miguel “Miggy” Andres wants to figure out why the MOOC attrition rate is so high—and what can be done to reduce it. A Ph.D. student in Penn GSE’s Teaching, Learning, and Teacher Education program, Andres explores data on MOOC student behavior to identify signs that someone will or won’t follow a course through to its end.

At Penn GSE, doctoral student Juan Miguel “Miggy” Andres has found the opportunity to pursue groundbreaking research and learning outside of his comfort zone. Photo by Ginger Fox Photography.

 “MOOCs are like Netflix shows—there are so many to choose from, and they cover a vast variety of topics,” Andres says of the approximately ten thousand courses offered by eight hundred-plus universities worldwide. Regardless of their content, he says, “at some point, many students seem to reach a place where they stop engaging.”

Research has shown that completing a MOOC benefits learners, whether they are conventional students or full-time professionals looking to boost their work performance. Andres therefore aims to find ways to get participants to stick with their courses.

Raised in the Philippines, where he earned his bachelor’s and master’s degrees in computer science, Andres began pursuing his doctoral degree in 2015 at Teachers College, Columbia University. He transferred to Penn GSE a year later alongside his mentor, Associate Professor Ryan Baker, who founded the Penn Center for Learning Analytics (PCLA)—a laboratory dedicated to investigating and improving teaching and learning methods through data.

Dr. Baker credits Andres with revolutionizing MOOC research by applying the techniques of data mining—a field in which researchers develop computer algorithms to find hidden patterns in large quantities of data—on a scale not seen before. “Miggy has conducted the largest-scale analyses of MOOC data in history,” says Baker. “His technical infrastructure has allowed us to take findings from small-scale studies and see if they can be generalized across a rich diversity of learners and content, which can help instructors design future MOOCs.”

The infrastructure Baker references is the MOOC Replication Framework (MORF), a software system that pinpoints patterns in massive quantities of research about MOOCs. Andres began developing MORF after he, Baker, and their colleagues noticed discrepancies among previous MOOC studies and wanted to determine which findings they could reproduce. They also sought to expand the scope of data being analyzed. Prior to MORF’s conception, most MOOC research was conducted by instructors who could access data only from their own courses.

With MORF, Andres has analyzed over one hundred data sets about the MOOCs that Penn offers for free to learners around the world through the Coursera and edX platforms. He has also analyzed data from the MOOCs of several other institutions, including Columbia and the University of Edinburgh in Scotland. “MORF supplies the computational power for researchers to ingest other people’s data sets while keeping everything privacy protected,” says Andres.

Andres is using MORF to dissect students’ interactions in discussion forums, responses to pre-course surveys, and other behaviors, in search of ways to predict who is at risk of abandoning a course. “If we can tell who is likely to drop out, then we can introduce interventions to support them so they stay,” he says. Those interventions might involve sending check-in emails to participants who have failed to log in to a course for a certain amount of time or offering badges as incentives for communicating in discussion forums.

A plan for interventions will help instructors direct their attention effectively when faced with the vast numbers of students that MOOCs enroll, notes PCLA Associate Director Dr. Jaclyn Ocumpaugh. “What Miggy is doing will help instructors expend their resources in a way that’s beneficial to both individual students and the broader community of participants,” she says.

As a student at Penn GSE, Andres has found not only the opportunity to pursue groundbreaking research, but also the chance to learn outside of his comfort zone. While undertaking his course work, he surprised himself by discovering that he enjoyed his Education, Culture, and Society (ECS) class most of all.

“In computer science, everything is defined by concrete rules. ECS challenged me because the concepts were abstract and up for interpretation. I was completely out of my element and thought I would make a fool of myself the day I had to lead a discussion. But my anxiety ended up turning into excitement, and by the time it was my turn, I couldn’t wait to get started,” recalls Andres, who credits Professor Sigal Ben-Porath with drawing him into the subject matter.

With his course work now complete, Andres is conducting new analyses using a second, more robust version of MORF that the PCLA team recently built in collaboration with partners at the University of Michigan. Baker predicts an “incredibly bright future” for Andres, who expects to earn his Ph.D. by spring 2020.

“He is an excellent learner and an adaptive thinker who can succeed anywhere,” Baker says. “In five years I see Miggy being a faculty member at one of the strongest universities in the world.”

This article originally appeared in the Spring 2018 issue of The Penn GSE Magazine.