Flipped classrooms and MOOCs (Massive Open Online Courses) have become more prevalent since the outbreak of COVID-19, and online learning will inevitably bring further changes for students in the post-COVID era. It is more than an alternative to on-site study or a form of distance learning that eases geographical limitations and emergencies: compared with traditional classrooms, online learning blurs the boundaries of learning in time and space. Thanks to the internet, online learners have access to a broad network of learning resources, which makes for a more sustainable and affordable experience. However, problems remain regarding students’ perseverance with online learning before, during, and after specific time points such as emergencies. Although previous exploratory research (e.g. Esperanza et al., 2016; Subramaniam & Muniandy, 2017) found that students in flipped classrooms are more engaged in specific courses, few studies examine whether the observed positive effects of online learning are sustained regardless of real-world events. Spitzer et al. (2021) found that the proportion of students active online during and after the school closures caused by COVID lockdowns dropped significantly compared with the previous three years. It is therefore reasonable to question perseverance with online learning in the flipped classroom, given the inevitable periods when students are away from school for a week or longer. This data analysis project uses survival analysis methods to explore Estonian high school pupils’ engagement with online learning in an elective on digital products and technologies.
Codesters Club, launched and supported by the Riesenkampff Foundation, has provided flipped classrooms for Estonian high school students (gymnasium-level pupils) for around three years. Unlike traditional classrooms and computer science courses, Codesters Club abandons the rigid structure in which teachers require students to follow prescribed directions; instead, the learning programme covers practical skills in programming and design as well as soft skills, both of which are critical to delivering digital products through a combination of teamwork and individual development. Codesters Club has an online learning platform where students learn independently, prepare before live classes, and complete and submit assignments. Students regularly need other software such as Figma for their homework; however, all the learning materials are on the Codesters Club website. Apart from reviewing materials, students should also read the feedback mentors give on their homework. Without analysing online user data, we cannot know whether this website satisfies user needs or how students participate in independent learning outside the classroom.
## New names:
## • `` -> `...7`
## Rows: 233942 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): Time, User full name, Affected user, Event context, Component, Even...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## 'data.frame': 233942 obs. of 6 variables:
## $ date : Date, format: "2022-11-20" "2022-11-01" ...
## $ time : 'times' num 14:18:23 17:00:33 17:00:33 17:00:33 15:33:48 ...
## ..- attr(*, "format")= chr "h:m:s"
## $ User full name: chr "a0b59c6639358d444032192698d6c1c1b30045ad25e1cff7c72178a89a6ff159" "7820c9d5c3f8f07d5715bd01282d182f515276e519b4339beea1e3065a2c67bd" "7820c9d5c3f8f07d5715bd01282d182f515276e519b4339beea1e3065a2c67bd" "7820c9d5c3f8f07d5715bd01282d182f515276e519b4339beea1e3065a2c67bd" ...
## $ Affected user : chr NA "994634cbd0db5d22951af6c644a64780ee45208c57d70c9aa73d3201e6437fea" "994634cbd0db5d22951af6c644a64780ee45208c57d70c9aa73d3201e6437fea" "994634cbd0db5d22951af6c644a64780ee45208c57d70c9aa73d3201e6437fea" ...
## $ Event context : chr "Course: C0202 | Solo Work Evaluation" "Course: C0202 | Solo Work Evaluation" "Course: C0202 | Solo Work Evaluation" "Course: C0202 | Solo Work Evaluation" ...
## $ Event name : chr "Log report viewed" "Group member added" "Role assigned" "User enrolled in course" ...
Because of technical issues, the data for previous years are missing, so the data set covers raw data on time, users, and events in 2022, from mid-May to the end of November. Every column is stored as character data, including the date and time. The popular package lubridate could not parse the timestamps, which are stored in irregular forms. I eventually found the packages flipTime and chron, which work well with such irregular formats. Finally, I added two new variables, date and time, ahead of the other columns.
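The cleaning step can be illustrated with a minimal sketch. It is written in base R rather than with flipTime or chron, and the two raw timestamp layouts are assumptions, since the report does not show the original strings; `parse_mixed_time` is a hypothetical helper, not a function from any package.

```r
# Hypothetical raw timestamps in two different layouts; the layouts in the
# real log are not shown in the report, so these are assumptions.
raw_time <- c("2022-11-20 14:18:23", "1/11/22, 17:00:33", NA)

# Try each candidate format in turn; an element keeps the result of the
# first format that parses it.
parse_mixed_time <- function(x, formats, tz = "UTC") {
  out <- .POSIXct(rep(NA_real_, length(x)), tz = tz)
  for (fmt in formats) {
    todo <- is.na(out) & !is.na(x)
    if (!any(todo)) break
    out[todo] <- as.POSIXct(strptime(x[todo], format = fmt, tz = tz))
  }
  out
}

stamp <- parse_mixed_time(
  raw_time,
  formats = c("%Y-%m-%d %H:%M:%S", "%d/%m/%y, %H:%M:%S")
)

# Split the parsed timestamp into the two new variables, date and time.
parsed <- data.frame(
  date = as.Date(stamp, tz = "UTC"),
  time = format(stamp, "%H:%M:%S"),
  stringsAsFactors = FALSE
)
print(parsed)
```

Strings that match none of the candidate formats stay `NA`, so they can be inspected separately instead of being silently misparsed.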
## [1] 305
## [1] FALSE
## [1] 280
## [1] FALSE
## [1] FALSE
The sorted data set contains three kinds of data: dates (YMD), times (h:m:s), and character data. I then used the package dplyr to transform the data. The first goal was to filter the data on students from September to November; other tasks included categorising the data by the three grades and adding a new column with every student’s active hours (for each student, the active hours on a given date equal the latest logged time minus the earliest logged time on that date).
The result shows that 305 users, including mentors, have been online since 5 September 2022. After transforming the data, I extracted the data of the 280 student users.
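The active-hours rule (latest minus earliest logged time per student and date) can be sketched as follows. The sketch uses base R's `aggregate` to stay dependency-free, whereas the report used dplyr; the toy log and the column names `user`, `hour`, and `active_hours` are assumptions for illustration.

```r
# Toy log with hypothetical users; each row is one logged event.
log_df <- data.frame(
  user = c("u1", "u1", "u1", "u2", "u2"),
  date = as.Date(c("2022-09-05", "2022-09-05", "2022-09-06",
                   "2022-09-05", "2022-09-05")),
  time = c("14:00:00", "16:30:00", "09:00:00", "10:00:00", "10:45:00"),
  stringsAsFactors = FALSE
)

# Convert "h:m:s" strings to decimal hours.
to_hours <- function(hms) {
  parts <- do.call(rbind, strsplit(hms, ":", fixed = TRUE))
  as.numeric(parts[, 1]) + as.numeric(parts[, 2]) / 60 +
    as.numeric(parts[, 3]) / 3600
}
log_df$hour <- to_hours(log_df$time)

# Active hours per user and date: latest minus earliest logged time.
active <- aggregate(
  hour ~ user + date,
  data = log_df,
  FUN = function(h) max(h) - min(h)
)
names(active)[names(active) == "hour"] <- "active_hours"
print(active)
```

A user with a single logged event on a date gets 0 active hours under this rule, because the earliest and latest times coincide.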
A further transformation divided the students into three grades; I used the str_detect function to detect keywords in the column “Event context”. The result shows 122 10th-grade students, 94 11th-grade students, and 65 12th-grade students.
## [1] 122
## [1] FALSE
## `summarise()` has grouped output by 'date'. You can override using the
## `.groups` argument.
## [1] 94
## [1] FALSE
## `summarise()` has grouped output by 'date'. You can override using the
## `.groups` argument.
## [1] 65
## [1] FALSE
## `summarise()` has grouped output by 'date'. You can override using the
## `.groups` argument.
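The keyword-based grading step might look like the sketch below. The report does not state which keywords identify each grade, so the course-code patterns in `grade_patterns` are purely hypothetical, as are the toy rows; base R's `grepl` stands in for `stringr::str_detect`, which performs the same test with the arguments in the opposite order.

```r
# Hypothetical event contexts; only the "C0202 | Solo Work Evaluation"
# string appears in the report, the others are made up.
events <- data.frame(
  user = c("u1", "u2", "u3", "u3"),
  context = c("Course: C0101 | Intro",
              "Course: C0202 | Solo Work Evaluation",
              "Course: C0303 | Final Project",
              "Course: C0303 | Final Project"),
  stringsAsFactors = FALSE
)

# Assumed mapping from a keyword in the context to a grade label.
grade_patterns <- c("10th" = "C01", "11th" = "C02", "12th" = "C03")

events$grade <- NA_character_
for (g in names(grade_patterns)) {
  hit <- grepl(grade_patterns[[g]], events$context, fixed = TRUE)
  events$grade[hit] <- g
}

# Count distinct students per grade.
students_per_grade <- tapply(
  events$user, events$grade,
  function(u) length(unique(u))
)
print(students_per_grade)
```

Counting distinct users rather than rows matters here, because one student generates many log rows.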
This scatter plot visualises how many hours 135 students spent learning online. It is apparent that the number of 11th- and 12th-grade students who should have been active had dropped sharply by the end of November (the 30th). Another noticeable pattern is that the density of points thins out before the beginning of November: an inactive period occurred around the mid-term among students regardless of grade. We can assume that fewer students stayed active as the mid-term, during which students do not need to go to school, approached. The change in the proportion of active students might differ significantly between grades because of their different course plans.
Based on these descriptive observations, I revised and narrowed the research questions:
Q1: How does 10th-grade students’ perseverance with the online learning environment differ from that of 11th-grade students before the mid-term?
Q2: How do 11th-grade students differ from 12th-grade students in their perseverance with the online learning environment during the same period?
This project uses survival analysis to evaluate students’ online engagement. Survival analysis is a data analysis approach for time-to-event data: the outcome variable is the time until an event occurs. Widely used in medical studies, survival analysis serves to estimate and compare hazard ratios and survival probabilities across patients. Several studies (e.g. Bacca-Acosta & Avila-Garzon, 2020; Spitzer et al., 2021) have also applied this method to assess online learners’ engagement. Specifically, this project evaluates students’ engagement (survival probabilities) and dropout rates (hazard ratio) before a specific time point, the mid-term.
In this project, the survival function S(t) is the probability of remaining active until the first day of this semester’s mid-term (24th of October). Thus, I define survival time as the number of active days that passed from the first day of registration until the event (24th of October), and death as a student’s last activity before that date. With the package dplyr, I built a table that codes students who were still active during the mid-term as 0 (censored) and students who dropped out before the mid-term as 1 (failure). This table is the foundation of the survival analysis. I also installed and loaded the packages survival and survminer, which provide the necessary functions. The function survfit computes a Kaplan-Meier (KM) estimate of the survival curve for censored, single-event data. I then used the function ggsurvplot to draw the KM curves. To compare KM curves, I used the log-rank test, a large-sample chi-square test that provides an overall comparison of KM curves; its mechanics resemble those of the chi-square goodness-of-fit test in that it compares observed with expected counts of failures in each group. Thus, the null hypothesis is that there is no difference between the KM curves; the alternative hypothesis is that the KM curves differ between the grades being compared.
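The workflow described above can be sketched on a toy table. The column names `Active_days`, `d`, and `Grade` follow the report's table, but every value below is made up for illustration, and the object names `toy`, `km_fit`, and `lr` are assumptions.

```r
library(survival)

# Toy table with the same columns as the report's table; values are made up.
# d = 1 means the student dropped out before the mid-term, d = 0 is censored.
toy <- data.frame(
  Active_days = c(5, 8, 12, 15, 20, 3, 4, 6, 7, 9),
  d           = c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1),
  Grade       = rep(c("10th", "11th"), each = 5),
  stringsAsFactors = FALSE
)

# Kaplan-Meier estimate of the survival curve for each grade.
km_fit <- survfit(Surv(Active_days, d) ~ Grade, data = toy)
print(summary(km_fit))

# Median active days per grade, read from the fitted curves.
km_median <- summary(km_fit)$table[, "median"]
print(km_median)

# Log-rank test comparing the two curves.
lr <- survdiff(Surv(Active_days, d) ~ Grade, data = toy)
p_value <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
print(lr)
```

With survminer installed, `survminer::ggsurvplot(km_fit, data = toy)` would draw the corresponding KM curves.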
## # A tibble: 90 × 4
## `User full name` Activ…¹ d Grade
## <chr> <int> <dbl> <chr>
## 1 0105d9485b17c868769dc414241ebd8a29e05f92b6ebb9f111d4b49c… 22 0 10th
## 2 1a6422c85cd5983bc843ecb135449a43c5094082326409e7da4eeeeb… 11 0 10th
## 3 39c0998ea93c8d091f85932ddb2f0e74cce50afc3fda3fc7579a1d06… 11 0 10th
## 4 3cf538d5bb4ff4151cb9da5e18f626b2548d3f1141d5170badf76666… 10 0 10th
## 5 49045e6ab98a7be1859a29365e22e62b7818caaf088d2389b530f9eb… 21 0 10th
## 6 6c608918b5af80abf30cb69a6c655868f6a006a85f98849675027be4… 13 0 10th
## 7 6c9decc9623be1422c894cd2ad73b6ae6991bb6babddf8034bad4b74… 21 0 10th
## 8 7699d131b788f3b99f22355184d4e429ca6c8576d772a48c3970dd84… 14 0 10th
## 9 8881739a3bbd5b23a3cad4d19c0435588e72357951309886eea65730… 27 0 10th
## 10 f5bc72a7f3d8b4091bb378441e8f212b992490422b1f4d0154149dc3… 23 0 10th
## # … with 80 more rows, and abbreviated variable name ¹Active_days
## Call: survfit(formula = Surv(active_table_10V11$Active_days, active_table_10V11$d) ~
## active_table_10V11$Grade, data = active_table_10V11)
##
## active_table_10V11$Grade=10th
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 5 45 2 0.956 0.0307 0.8972 1.000
## 6 43 1 0.933 0.0372 0.8632 1.000
## 7 42 2 0.889 0.0468 0.8017 0.986
## 8 40 1 0.867 0.0507 0.7728 0.972
## 10 39 3 0.800 0.0596 0.6913 0.926
## 11 35 3 0.731 0.0664 0.6123 0.874
## 12 30 2 0.683 0.0703 0.5578 0.835
## 13 28 2 0.634 0.0733 0.5054 0.795
## 14 25 2 0.583 0.0757 0.4522 0.752
## 15 22 5 0.451 0.0783 0.3205 0.634
## 16 17 3 0.371 0.0768 0.2474 0.557
## 17 14 2 0.318 0.0744 0.2011 0.503
## 18 12 1 0.292 0.0728 0.1788 0.476
## 19 11 1 0.265 0.0708 0.1570 0.448
## 20 10 3 0.186 0.0627 0.0957 0.360
## 21 7 1 0.159 0.0591 0.0768 0.329
## 22 4 1 0.119 0.0561 0.0474 0.300
##
## active_table_10V11$Grade=11th
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 4 45 1 0.9778 0.0220 0.93564 1.000
## 5 44 1 0.9556 0.0307 0.89720 1.000
## 7 43 4 0.8667 0.0507 0.77283 0.972
## 8 39 6 0.7333 0.0659 0.61487 0.875
## 9 33 10 0.5111 0.0745 0.38407 0.680
## 10 20 3 0.4344 0.0753 0.30925 0.610
## 11 17 4 0.3322 0.0729 0.21607 0.511
## 12 12 4 0.2215 0.0664 0.12308 0.399
## 13 8 1 0.1938 0.0636 0.10186 0.369
## 14 7 1 0.1661 0.0602 0.08160 0.338
## 15 5 1 0.1329 0.0566 0.05766 0.306
## 16 3 2 0.0443 0.0408 0.00729 0.269
## 17 1 1 0.0000 NaN NA NA
## Call:
## survdiff(formula = Surv(active_table_10V11$Active_days, active_table_10V11$d) ~
## Grade, data = active_table_10V11)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## Grade=10th 45 35 51 5.03 20.6
## Grade=11th 45 39 23 11.17 20.6
##
## Chisq= 20.6 on 1 degrees of freedom, p= 6e-06
The log-rank test for the KM curves of 10th-grade and 11th-grade students above provides strong evidence against the null hypothesis, since the P-value (6e-06) is far below 0.0001. The median number of active days is 15 (95% CL: 13-17) for 10th-grade students, compared with 10 (95% CL: 9-11) for 11th-grade students.
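As a check on how these figures arise, the survival column and the medians can be traced by hand from the survfit output above. The formula is the standard Kaplan-Meier product-limit estimator, where $n_i$ is the number at risk and $d_i$ the number of events at time $t_i$; the median is the first time at which the estimate falls to 0.5 or below.

```latex
\begin{align*}
\hat{S}(t) &= \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right) \\[4pt]
\text{10th grade:}\quad
\hat{S}(5) &= 1 - \tfrac{2}{45} \approx 0.956, \qquad
\hat{S}(6) = \hat{S}(5)\left(1 - \tfrac{1}{43}\right) \approx 0.933 \\
\hat{S}(14) &\approx 0.583 > 0.5, \qquad
\hat{S}(15) = \hat{S}(14)\left(1 - \tfrac{5}{22}\right) \approx 0.451 \le 0.5
\;\Rightarrow\; \text{median} = 15 \\[4pt]
\text{11th grade:}\quad
\hat{S}(9) &\approx 0.511 > 0.5, \qquad
\hat{S}(10) = \hat{S}(9)\left(1 - \tfrac{3}{20}\right) \approx 0.434 \le 0.5
\;\Rightarrow\; \text{median} = 10
\end{align*}
```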
## Call: survfit(formula = Surv(active_table_11V12$Active_days, active_table_11V12$d) ~
## active_table_11V12$Grade, data = active_table_11V12)
##
## active_table_11V12$Grade=11th
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 4 45 1 0.9778 0.0220 0.93564 1.000
## 5 44 1 0.9556 0.0307 0.89720 1.000
## 7 43 4 0.8667 0.0507 0.77283 0.972
## 8 39 6 0.7333 0.0659 0.61487 0.875
## 9 33 10 0.5111 0.0745 0.38407 0.680
## 10 20 3 0.4344 0.0753 0.30925 0.610
## 11 17 4 0.3322 0.0729 0.21607 0.511
## 12 12 4 0.2215 0.0664 0.12308 0.399
## 13 8 1 0.1938 0.0636 0.10186 0.369
## 14 7 1 0.1661 0.0602 0.08160 0.338
## 15 5 1 0.1329 0.0566 0.05766 0.306
## 16 3 2 0.0443 0.0408 0.00729 0.269
## 17 1 1 0.0000 NaN NA NA
##
## active_table_11V12$Grade=12th
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 3 45 1 0.9778 0.0220 0.93564 1.000
## 5 44 2 0.9333 0.0372 0.86323 1.000
## 6 42 6 0.8000 0.0596 0.69127 0.926
## 7 35 2 0.7543 0.0644 0.63808 0.892
## 8 33 4 0.6629 0.0710 0.53737 0.818
## 9 28 5 0.5445 0.0755 0.41491 0.715
## 10 23 3 0.4735 0.0760 0.34570 0.648
## 11 18 3 0.3946 0.0758 0.27082 0.575
## 12 14 4 0.2818 0.0721 0.17070 0.465
## 13 10 3 0.1973 0.0649 0.10351 0.376
## 15 5 2 0.1184 0.0582 0.04517 0.310
## 16 3 1 0.0789 0.0504 0.02255 0.276
## 19 2 1 0.0395 0.0376 0.00609 0.255
## Call:
## survdiff(formula = Surv(active_table_11V12$Active_days, active_table_11V12$d) ~
## Grade, data = active_table_11V12)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## Grade=11th 45 39 38 0.0286 0.0699
## Grade=12th 45 37 38 0.0286 0.0699
##
## Chisq= 0.1 on 1 degrees of freedom, p= 0.8
By contrast, the curves for 11th-grade and 12th-grade students provide weak or no evidence for the alternative hypothesis (P = 0.79). The median number of active days for 12th-grade pupils is also 10 (95% CL: 8-11). In other words, the difference between the two curves for 11th- and 12th-grade students is tiny.
The results are similar to those I presented on the 15th of December, even though I have since removed ineffective data, namely the rows whose active hours equal 0. Such rows exist because the website stored the date and time at which an individual user started an activity without logging when they ended their online session. However, I did not fix a significance level when reviewing the data analysis for this report. Without a thorough understanding of the significance level or the P-value, novice researchers and students easily fall into the superstition that a lower P-value is always better. For instance, had I set the significance level at 1 per cent, the previous presentation might have reached a different conclusion. According to the ASA statement (Bruce et al., 2020; Wasserstein & Lazar, 2016), the P-value is the probability that, under a specified chance model, results at least as extreme as the observed ones would occur. Chasing the lowest P-value leads to confusion, as a small P-value cannot prove a hypothesis to be true. Removing ineffective data may either strengthen or qualify the previous results; rather than relying on the significance level and P-value, neither of which can determine the correctness of the studied hypotheses, it is more sensible to test the reliability of the inferential analysis with samples of different sizes, better-sorted data sets, and more appropriate mathematical models.
Though the log-rank test is flexible, its inability to account for covariates limits the reliability of the results. By contrast, Cox proportional hazards (PH) models are more suitable, as we should consider the connection between students’ active hours on the Codesters Club website and their engagement with other online platforms, for instance the software they have to use for homework. Extensions of the Cox PH model also offer a framework for dealing with recurrent events, on which this project should also focus: for example, a similar period when students do not need to attend school for a long time occurs again before Christmas. Finally, this project needs a validation replication. Its initial purpose was to compare engagement during and after the school closures last year and this year. However, we have no access to those data because of technical issues.
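A Cox PH model with a covariate could be fitted along the following lines. This is only a sketch on simulated data: the covariate `other_hours` (time spent on other platforms), the simulation parameters, and the 30-day censoring point are all assumptions, since the report has no such variable yet.

```r
library(survival)

set.seed(1)
n <- 60

# Simulated students; other_hours is a hypothetical covariate for the
# weekly hours a student spends on other platforms.
sim <- data.frame(
  Grade = factor(rep(c("10th", "11th"), each = n / 2)),
  other_hours = runif(n, 0, 10)
)

# Simulated dropout times whose hazard rises with grade and other_hours.
rate <- 0.05 * exp(0.7 * (sim$Grade == "11th") + 0.1 * sim$other_hours)
raw_days <- ceiling(rexp(n, rate))

# Censor every student who is still active after 30 days.
sim$d <- as.integer(raw_days <= 30)
sim$Active_days <- pmin(raw_days, 30)

# Cox PH model: grade effect adjusted for the covariate.
cox_fit <- coxph(Surv(Active_days, d) ~ Grade + other_hours, data = sim)
hazard_ratios <- exp(coef(cox_fit))
print(summary(cox_fit))

# Test of the proportional hazards assumption for each term.
ph_check <- cox.zph(cox_fit)
print(ph_check)
```

Unlike the log-rank test, the fitted model reports a hazard ratio for grade that is adjusted for the covariate, and `cox.zph` gives a first check on whether the proportional hazards assumption is tenable.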
This project applies survival analysis to evaluate gymnasium-level pupils’ engagement with an online learning environment. The results reveal that half of the 11th-grade and half of the 12th-grade students left the Codesters Club website within ten active days, possibly because they spent more time on other platforms for programming practice and teamwork; the other half remained active for longer than ten days. Only one 12th-grade student was still active during the mid-term after 20 days. By contrast, six 10th-grade students were still active after 21 days; having more material to read and more new skills to learn may explain why 10th-grade pupils were more engaged than the other two grades. Future studies should focus on covariates and on how to use Cox PH models to re-assess students’ perseverance.
Bacca‐Acosta, J., & Avila‐Garzon, C. (2020). Student engagement with mobile‐based assessment systems: A survival analysis. Journal of Computer Assisted Learning, 37(1), 158–171. https://doi.org/10.1111/jcal.12475
Bruce, A., Bruce, P., & Gedeck, P. (2020). Statistical experiments and significance testing. In Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python (2nd ed., pp. 87–139). O’Reilly Media.
Chen, C., Sonnert, G., Sadler, P. M., Sasselov, D. D., Fredericks, C., & Malan, D. J. (2020). Going over the cliff: MOOC dropout behavior at chapter transition. Distance Education, 41(1), 6–25. https://doi.org/10.1080/01587919.2020.1724772
Esperanza, P., Fabian, K., & Toto, C. (2016). Flipped Classroom Model: Effects on Performance, Attitudes and Perceptions in High School Algebra. Adaptive and Adaptable Learning, 85–97. https://doi.org/10.1007/978-3-319-45153-4_7
Kleinbaum, D. G., & Klein, M. (2011). Survival Analysis: A Self-Learning Text (3rd ed.). Springer.
Schult, J., Mahler, N., Fauth, B., & Lindner, M. A. (2022). Did students learn less during the COVID-19 pandemic? Reading and mathematics competencies before and after the first pandemic wave. School Effectiveness and School Improvement, 33(4), 544–563. https://doi.org/10.1080/09243453.2022.2061014
Spitzer, M. W. H., Gutsfeld, R., Wirzberger, M., & Moeller, K. (2021). Evaluating students’ engagement with an online learning environment during and after COVID-19 related school closures: A survival analysis approach. Trends in Neuroscience and Education, 25, 100168. https://doi.org/10.1016/j.tine.2021.100168
Subramaniam, S. R., & Muniandy, B. (2017). The Effect of Flipped Classroom on Students’ Engagement. Technology, Knowledge and Learning, 24(3), 355–372. https://doi.org/10.1007/s10758-017-9343-y
Wasserstein, R. L., & Lazar, N. A. (2016). The ASA Statement on p-Values: Context, Process, and Purpose. The American Statistician, 70(2), 129–133. https://doi.org/10.1080/00031305.2016.1154108