MS Thesis Archive

Spring 2022


Colette Joelle Barca, M.S. Data Science

A student’s high school experience often forms the foundation of his or her postsecondary career. As competition in the nation’s job market continues to increase, many businesses require applicants to hold a college degree. However, recent studies show that approximately one-third of college students in the United States never obtain a degree. Although colleges have developed methods for identifying and supporting their struggling students, early intervention could be a more effective approach for combating postsecondary dropout rates. This project seeks to use anomaly detection techniques to create a holistic early detection tool that indicates which high school students are most at risk of dropping out of college. An individual’s high school experience is not confined to its academic components. As such, an effective model should incorporate both environmental and educational factors, including descriptive data on the student’s home area, the school’s area, and the school’s overall structure and performance. This project combined this information with data on students throughout their secondary educational careers (i.e., from ninth through twelfth grade) in an attempt to develop a model that could detect, during high school, which students have a higher probability of dropping out of college. The clustering-based and classification-based anomaly detection algorithms detail the situational and numeric circumstances, respectively, that most frequently result in a student dropping out of college. High school administrators could run these models at the end of each school year to identify which students are most at risk of dropping out of college. Administrators could then provide additional support to those students during the following school year to decrease that risk. College administrators could follow the same process to minimize dropout rates.

Full Text


Miguel A. Esteban Diaz, M.S. Data Science

NBA Top Shot moments are NFTs built on the FLOW blockchain and created by Dapper Labs in collaboration with the NBA. These NFTs, commonly referred to as “moments”, consist of in-game highlights of an NBA or WNBA player. This project uses the different variables of a moment, such as the type of play made by the player appearing in the moment (dunk, assist, block, etc.), the number of listings of that moment in the marketplace, whether the player appearing in the moment is a rookie, and the rarity tier of the moment (Common, Fandom, Rare or Legendary). With these variables, it aims to provide a statistical analysis that could reveal hidden correlations between the characteristics of a moment and its price, and to predict the price of moments using machine learning regression models, including linear regression, random forests and neural networks. Because NFTs, and especially NBA Top Shot, are a relatively recent area of study, there is not yet extensive research on them. This research intends to extend the existing analysis of this topic and to serve as a foundation for future research in this area. It also aims to provide practical information about the valuation of moments, the importance of their diverse characteristics and the impact of those characteristics on pricing, and the possible future application of this information to similar highlight-oriented sports NFTs such as NFL AllDay or UFC Strike, which are designed similarly to NBA Top Shot.

Full Text


Nirusha Srishan, M.S. Data Science

Teacher turnover is a growing problem in the United States. Each year, teachers leave their current teaching positions either to move to a different school or to leave the profession entirely. To understand why teachers are leaving their current positions and to help identify ways to increase teacher retention, I explore possible reasons that influence teacher turnover and create a model to predict whether a teacher will leave the teaching profession. The ongoing turnover of teachers has a vast impact on school district employees, the state, the country, and the student population. Exploring the variables that contribute to teacher turnover can therefore ultimately help decrease the rate of turnover.

This project compares those in the educational field, including general education teachers, special education teachers and other educational staff, who completed the 1999-2000 Schools and Staffing Survey (SASS) and Teacher Follow-up Survey (TFS) from the National Center for Education Statistics (NCES, n.d.). These data will be used to identify trends among teachers who have left the profession. Predictive modeling will use several machine learning techniques: Logistic Regression, Support Vector Machines (SVM), Decision Tree, Random Forest, and K-Nearest Neighbors. By finding the reasons for teacher turnover, a school district can identify ways to maximize its teacher retention rate, foster a supportive learning environment for students, and create a positive work environment for educators.

Full Text


Emmanuel Torres, M.S. Data Science

Organizations operate multifaceted modern call centers yet rely on antiquated forecasting technologies, which leads to erroneous staffing during critical periods of unprecedented volume. When sporadic volume emerges, incorrect staffing causes companies either to experience financial hemorrhaging or to provide an inadequate customer experience. The forecasting models currently in use have known caveats: they cannot handle wait time without abandonment, and they consider only a single call type when making a prediction.

This study aims to create a new forecasting model that predicts the Average Speed of Answer (ASA) in order to obtain a more accurate estimate of the staffing requirements for a call center. The new model will draw on historical volume of varying capacities to create the prediction. Both parametric and nonparametric methodologies will be used to forecast the ASA. A parametric ARIMA (Autoregressive Integrated Moving Average) model was used to create a baseline for the prediction. Recurrent Neural Networks (RNNs) were then applied, since they can process sequential data by feeding previous outputs back into the network as inputs. Specifically, Long Short-Term Memory (LSTM) recurrent neural networks were used to create a forecasting model for the call center ASA.

Both a univariate and a multivariate approach were used with the LSTM neural network to forecast the ASA. The findings confirm that the univariate LSTM network produced the most accurate forecast, achieving the lowest Root Mean Squared Error (RMSE) of the three methods used to predict the call center ASA. The multivariate LSTM model was not far behind in accuracy but received a higher RMSE than the univariate model. ARIMA produced the highest RMSE and the least accurate forecast of the ASA.
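
As a rough illustration of the evaluation described above, the following minimal sketch shows how a univariate series can be split into sliding windows (the usual input shape for a univariate LSTM forecaster) and how RMSE is computed for a set of forecasts. The function names `make_windows` and `rmse`, the window length, and the ASA values are hypothetical and are not taken from the thesis; a naive last-value forecast stands in for the actual models.

```python
import math


def make_windows(series, window):
    """Split a univariate series into (input window, next value) pairs."""
    if window < 1 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets


def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical ASA observations in seconds (illustrative only).
asa = [30.0, 32.0, 35.0, 33.0, 31.0, 34.0]
inputs, targets = make_windows(asa, window=3)

# Naive stand-in forecast: repeat the last value of each window.
naive_forecast = [w[-1] for w in inputs]
print("naive RMSE:", rmse(targets, naive_forecast))
```

A lower RMSE indicates forecasts that sit closer to the observed values, which is the criterion used to rank the three methods.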

Full Text

Fall 2021


Kyle Philip Calabro, M.S. Data Science

This thesis explores in depth the application of data augmentation, transfer learning, and image enhancement techniques to speech emotion recognition (SER) via convolutional neural networks and the classification of spectrogram images. Speech emotion recognition is a challenging subset of machine learning with an incredibly active research community. One of the prominent challenges of SER is a lack of quality training data. The methods developed and presented in this work serve to alleviate this issue and improve upon the current state-of-the-art methodology. A novel unimodal approach was taken in which five transfer learning models pre-trained on the ImageNet data set were used with both the feature extraction and fine-tuning methods of transfer learning. These transfer learning models are VGG-16, VGG-19, InceptionV3, Xception and ResNet-50. A modified version of the AlexNet deep neural network model was used as a baseline for non-pre-trained deep neural networks. Two speech corpora were used to develop these methods: the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and the Crowd-sourced Emotional Multimodal Actors Dataset (CREMA-D). Data augmentation techniques were applied to the raw audio of each speech corpus to increase the amount of training data, yielding custom data sets. The raw audio data augmentation techniques include the addition of Gaussian noise, stretching by two different factors, time shifting and shifting pitch by three separate tones. Image enhancement techniques were implemented with the aim of improving classification accuracy by unveiling more prominent features in the spectrograms. The image enhancement techniques include conversion to grayscale, contrast stretching and the combination of grayscale conversion followed by contrast stretching.
In all, 176 experiments were conducted to provide a comprehensive overview of all proposed techniques as well as a definitive methodology. This methodology yields results that improve on or are comparable to the current state of the art when deployed on the RAVDESS and CREMA-D speech corpora.
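
Three of the techniques named above can be sketched in a few lines of NumPy: Gaussian noise addition and time shifting on raw audio, and contrast stretching on a grayscale spectrogram image. This is only an illustrative sketch under stated assumptions, not the thesis implementation: the function names, the noise level, the circular form of the time shift, and the 2nd/98th percentile limits are all hypothetical choices. Time stretching and pitch shifting are omitted because they normally rely on a dedicated audio library.

```python
import numpy as np


def add_gaussian_noise(signal, noise_std, rng):
    """Return the waveform with zero-mean Gaussian noise added."""
    return signal + rng.normal(0.0, noise_std, size=signal.shape)


def time_shift(signal, shift_samples):
    """Circularly shift the waveform (assumed form of time shifting)."""
    return np.roll(signal, shift_samples)


def contrast_stretch(image, low_pct=2.0, high_pct=98.0):
    """Linearly rescale intensities between two percentiles to [0, 1]."""
    lo, hi = np.percentile(image, [low_pct, high_pct])
    if hi <= lo:
        # Flat image: nothing to stretch.
        return np.zeros_like(image, dtype=float)
    return np.clip((image - lo) / (hi - lo), 0.0, 1.0)


rng = np.random.default_rng(0)

# Hypothetical one-second 440 Hz tone sampled at 16 kHz.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
clean = np.sin(2.0 * np.pi * 440.0 * t)

noisy = add_gaussian_noise(clean, noise_std=0.01, rng=rng)
shifted = time_shift(clean, shift_samples=1600)

# Hypothetical low-contrast grayscale "spectrogram" with values in [0.25, 0.75).
spectrogram = rng.random((64, 64)) * 0.5 + 0.25
stretched = contrast_stretch(spectrogram)
```

Each augmented waveform would then be converted to a spectrogram image, so one original recording yields several training images.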

Full Text