Estimation of SIMCE Test Performance with Socioeconomic Data using Ordinal Classifiers
Authors
Flores Iturra, Andres
Rojas Mora, Julio
IEEE
Advisor
Date
Publication details:
10.1109/SCCC54552.2021.9650430
2021 40TH INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY (SCCC),Vol.,,2021
Resource type
Meeting
Keywords
Geographic subject
Collections
Abstract
We propose two ordinal classifiers (logistic regression and LightGBM) to estimate the school learning level (obtained by an ordinal transformation of the school's average score in the math section of the SIMCE test) for students in the 4th grade of primary schooling. We trained these classifiers with socioeconomic variables that characterize Chilean schools. The dataset included Alonso-Villar and Del Rio's local segregation index to measure socioeconomic and gender segregation at the school level compared to the district level. For socioeconomic segregation, we used a vulnerability criterion based on a student's status as a recipient of the subsidy established by Law 20.248 (SEP). We used a greedy algorithm based on the Variance Inflation Factor (VIF) and the Predictive Power Score (PPS) to automatically select the most orthogonal features in this dataset. This algorithm selected six variables, the most important being the self-esteem and motivation index and the school's socioeconomic and gender segregation. Because of the inherent imbalance in the data, we applied the Synthetic Minority Over-sampling Technique for Nominal and Continuous features (SMOTE-NC) to the training data and random under-sampling to the test data. The classifiers' hyperparameters were tuned with Bayesian optimization to avoid exploring the whole hyperparameter space. Ordinal logistic regression had slightly better AUC and slightly worse accuracy than LightGBM. Nevertheless, a boosting algorithm applied to the ordinal logistic regression classifier raised its performance slightly above that of LightGBM, in both AUC and accuracy.
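The abstract does not spell out the feature-selection algorithm, so the following is only a minimal sketch of the VIF half of such a greedy procedure, not the authors' method. It repeatedly drops the feature with the largest VIF until every remaining VIF is at or below a threshold. The PPS component is omitted, and the function names, the threshold of 5.0, and the toy columns are all hypothetical. The sketch relies on the standard identity that the VIF of feature j equals the j-th diagonal entry of the inverse correlation matrix.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length, non-constant numeric columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)  # population std; sd == 0 is not handled
        standardized.append([(x - mu) / sd for x in col])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in standardized]
            for ci in standardized]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vifs(columns):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


def greedy_vif_selection(data, threshold=5.0):
    """Drop the highest-VIF feature until all VIFs are <= threshold (assumed cutoff)."""
    kept = dict(data)
    while len(kept) > 1:
        names = list(kept)
        scores = vifs([kept[name] for name in names])
        worst = max(range(len(names)), key=scores.__getitem__)
        if scores[worst] <= threshold:
            break
        del kept[names[worst]]
    return list(kept)


# Hypothetical toy data: "x_noisy" nearly duplicates "x"; "z" is weakly related to both.
toy = {
    "x": [1, 2, 3, 4, 5, 6],
    "x_noisy": [1.1, 1.9, 2.9, 4.1, 5.1, 5.9],
    "z": [2, 5, 1, 6, 3, 4],
}
selected = greedy_vif_selection(toy)
```

On this toy input, one of the two near-duplicate columns is discarded and `z` is retained. A fuller version would also have to rank the surviving features by their predictive power for the target, which this sketch does not attempt.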