Spring 2025
Analyzing Artificial Intelligence’s Ability to Detect
Misinformation
Max Bilyk, M.S. Data Science
Misinformation and disinformation represent critical societal challenges of the 21st century, significantly amplified by rapid advancements in digital technology. The proliferation of generative artificial intelligence (AI) exacerbates these problems, enabling false narratives to spread at unprecedented speeds, undermining public trust, polarizing societies, and endangering democratic processes. Traditional methods, such as manual fact-checking, governmental initiatives, and educational programs, while effective, are increasingly insufficient in addressing the scale and immediacy of digital misinformation.
This thesis aims to critically evaluate artificial intelligence’s potential in addressing the misinformation crisis. Specifically, it investigates how AI-driven techniques, particularly natural language processing (NLP), can improve misinformation detection and fact-checking processes. Further, it examines ethical considerations surrounding AI use, evaluates practical and technical implementation challenges, and proposes solutions to improve these technologies.
A mixed-methods approach was employed, encompassing historical analysis of misinformation, review of existing solutions, examination of contemporary AI technologies, and detailed case studies evaluating AI’s application in real-world misinformation scenarios. Additionally, a quantitative performance analysis of an AI-driven misinformation classifier was conducted using a structured prompt engineering method. This involved scoring news articles on factuality, logic, sentiment, and bias, using a composite measure tested against a labeled dataset of verified true or false articles.
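A minimal sketch of the composite scoring step is shown below, assuming Python and scikit-learn; in practice the four sub-scores would come from the structured LLM prompts, while here they are hard-coded stand-ins, and the weights, threshold, and toy labels are illustrative assumptions rather than the thesis's calibrated values.

```python
# Minimal sketch of the composite measure (not the thesis's calibrated pipeline).
# The sub-scores would come from structured LLM prompts; here they are hard-coded
# stand-ins, and the weights and threshold are assumptions.
from typing import Dict
from sklearn.metrics import accuracy_score

WEIGHTS = {"factuality": 0.4, "logic": 0.3, "sentiment": 0.15, "bias": 0.15}

def composite_score(sub_scores: Dict[str, float]) -> float:
    """Weighted average of the factuality, logic, sentiment, and bias sub-scores (each 0-1)."""
    return sum(WEIGHTS[k] * sub_scores[k] for k in WEIGHTS)

def classify(sub_scores: Dict[str, float], threshold: float = 0.5) -> int:
    """Label an article as true (1) or false (0) by thresholding the composite score."""
    return int(composite_score(sub_scores) >= threshold)

# Evaluation against a labeled set of verified articles (toy examples).
labeled = [
    ({"factuality": 0.9, "logic": 0.8, "sentiment": 0.7, "bias": 0.8}, 1),
    ({"factuality": 0.2, "logic": 0.4, "sentiment": 0.3, "bias": 0.2}, 0),
]
preds = [classify(scores) for scores, _ in labeled]
print("accuracy:", accuracy_score([label for _, label in labeled], preds))
```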
The thesis demonstrated that AI systems, particularly large language models (LLMs), show substantial promise in misinformation detection, achieving over 90% accuracy when optimally calibrated. Real-world case studies, including the UK-based organization Full Fact, revealed AI’s capacity to significantly enhance fact-checking efficiency and responsiveness. Nevertheless, the study identified critical limitations, including AI’s difficulties in nuanced contextual understanding, bias propagation, ethical dilemmas, and environmental sustainability concerns. The research highlights the necessity of continued human oversight—particularly through human-in-the-loop (HITL) models—to address AI’s current limitations.
The findings underscore that AI, while not flawless, holds promise as a scalable, effective tool against misinformation when complemented by rigorous ethical frameworks, transparency (via Explainable AI), multimodal approaches, human-in-the-loop systems and widespread AI literacy initiatives. The broader implications suggest that successful deployment of AI in misinformation detection necessitates interdisciplinary collaboration, proactive bias mitigation, robust public education, and sustained human involvement. Addressing misinformation through AI is not only a technological pursuit but fundamentally an ethical and societal responsibility crucial for maintaining the integrity of public discourse and democratic institutions in the digital age.
Full Text
Exploring Jane Addams Papers Project Documents Through Topic
Modeling and Multilabel Classification
Olivia Church, M.S. Applied Mathematics
The Jane Addams Papers Project at Ramapo College of New Jersey compiles documents relating to Jane Addams. An American activist and social worker, Addams was an influential member of many political and social movements throughout the nineteenth and twentieth centuries, advocating for women’s suffrage, child labor reform, and peace, among other matters. The Digital Edition of the Jane Addams Papers Project contains digital versions of the documents, as well as a variety of other features, such as tags that categorize the documents based on their content. To explore new ways of analyzing and organizing documents from the Digital Edition, two machine learning techniques were implemented: topic modeling and multilabel classification. In addition to extracting insights from the documents and developing an automated method of assigning tags, a central aim of this research was to investigate how topic modeling and multilabel classification can be bridged to enrich analyses of texts.
Using a subset of documents from the Digital Edition, speeches and articles written by Jane Addams, latent Dirichlet allocation (LDA) topic modeling identified central topics, or themes, including international affairs and conflicts, child labor, and women’s suffrage. A variety of multilabel classification models were utilized to predict tags. The problem transformation algorithm Binary Relevance used in conjunction with a Multinomial Naive Bayes classifier had the best performance, though a higher accuracy would have been more desirable. To link the topic modeling and multilabel classification results, each document and tag was assigned to a specific topic. A connection between the topics and predicted tags of documents was evident, with the multilabel classifier often predicting tags related to the topic of their corresponding document. Therefore, when used together, topic modeling and multilabel classification may complement each other, potentially contributing to a greater understanding of the subject matter of texts.
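The two techniques can be sketched with scikit-learn as follows; the documents, tags, and topic count are toy stand-ins for the Digital Edition data, and OneVsRestClassifier is used as scikit-learn's equivalent of the Binary Relevance transformation.

```python
# A minimal sketch of LDA topic modeling plus Binary Relevance multilabel classification,
# assuming scikit-learn; the corpus and tags below are toy stand-ins.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.multiclass import OneVsRestClassifier   # one binary classifier per tag = Binary Relevance
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MultiLabelBinarizer

docs = ["speech on women's suffrage ...", "article on child labor reform ...", "address on peace ..."]
tags = [["suffrage"], ["child labor"], ["peace", "international affairs"]]

X = CountVectorizer(stop_words="english").fit_transform(docs)

# Topic modeling: each row of doc_topics is a document's topic distribution.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(X)

# Multilabel classification: Binary Relevance with a Multinomial Naive Bayes base classifier.
Y = MultiLabelBinarizer().fit_transform(tags)
clf = OneVsRestClassifier(MultinomialNB()).fit(X, Y)
print(doc_topics.argmax(axis=1), clf.predict(X))
```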
Full Text
Building a Collaborative Recommender System for Magic the
Gathering
Brian DeNichillo, M.S. Data Science
The goal of this research is to address the challenge of deck construction in Magic the Gathering’s Commander format, a task requiring players to create a deck of 100 cards from a card pool of over 28,000 different cards while also adhering to the color identity constraints of the card chosen to be their commander. The objective is to develop a recommendation system, a tool that uses collaborative filtering to suggest relevant cards to the player based on the deck construction patterns of the Commander community.
The recommender system utilizes Alternating Least Squares (ALS) matrix factorization to identify latent features which capture the relationship between cards in a Commander deck. A model was trained using 220,000 player-created decks scraped from a popular deck building website. The model was tuned by systematically testing various configurations of hyperparameters which include latent factors, regularization values, confidence scaling, and iteration counts to determine the optimal configuration.
A final model was produced using 600 latent factors, regularization of 2.25, alpha of 10, and iteration count of 25. This parameter configuration resulted in an F1 score of 0.33 and MRR of 0.063. Additionally, it had a precision@5 of 0.64 and precision@10 of 0.58 when tested with a seed of 40%, meaning that 64% of the top 5 and 58% of the top 10 recommendations appeared in the test decks.
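A sketch of this setup is shown below, assuming the `implicit` library (whose fit/recommend API varies slightly across versions) and a toy deck-by-card matrix standing in for the 220,000 scraped decks.

```python
# A sketch of the ALS recommender, assuming the `implicit` library and toy data.
import numpy as np
import scipy.sparse as sp
from implicit.als import AlternatingLeastSquares

# Rows = decks, columns = cards; 1 means the card appears in that deck.
rng = np.random.default_rng(0)
deck_card = sp.csr_matrix(rng.integers(0, 2, size=(200, 500)).astype(np.float32))

# Thesis configuration: AlternatingLeastSquares(factors=600, regularization=2.25, iterations=25)
# with alpha = 10; a much smaller model keeps this toy sketch quick to run.
model = AlternatingLeastSquares(factors=32, regularization=2.25, iterations=25)
model.fit(deck_card * 10.0)                            # alpha = 10 applied as confidence scaling
ids, scores = model.recommend(0, deck_card[0], N=10)   # top-10 card suggestions for deck 0
print(ids)
```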
Full Text
Using Free-Text Clinical Notes to Improve Model Performance
in Healthcare
Daniel Figueiras, M.S. Data Science
Predictive models in healthcare often rely solely on structured data, missing crucial context contained in free-text clinical notes and thereby limiting accurate outcome prediction. This study quantified the impact of incorporating free-text discharge summaries alongside structured data to improve one-year mortality prediction by evaluating both resampling techniques and Natural Language Processing (NLP) methods.
Using the MIMIC-IV and MIMIC-IV-Note datasets, five machine learning model types were trained with structured data alone versus structured data combined with insights extracted from clinical notes using four NLP techniques (Bag of Words (BoW), Binary BoW, Term Frequency-Inverse Document Frequency (TF-IDF), and Sentiment Analysis). A hybrid resampling method addressed severe class imbalance. Performance was primarily evaluated using recall due to the nature of outcomes being predicted.
Baseline models, trained using only structured data, obtained poor recall scores (~0.17). Resampling was essential, boosting average recall by ~61.5%. Integrating clinical notes further improved performance. The gradient boosting model trained using TF-IDF features achieved the highest recall (0.779), a 4.6% gain over its baseline after resampling. TF-IDF and BoW were the most effective NLP methods overall. Key features from the best performing model included age and discharge location (from the structured data) and note terms (e.g., CT, disease).
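A compact sketch of the combined feature pipeline, assuming scikit-learn and imbalanced-learn; the single structured feature, the synthetic notes, and the choice of SMOTEENN as the hybrid resampler are illustrative assumptions, not the thesis's exact configuration.

```python
# Structured features + TF-IDF note features + hybrid resampling + gradient boosting.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from imblearn.combine import SMOTEENN   # one plausible hybrid over-/under-sampling choice

# Toy stand-in for MIMIC-IV: one structured feature (age) plus a discharge-note snippet.
rng = np.random.default_rng(0)
n = 400
age = rng.integers(40, 95, n)
died = (rng.random(n) < 0.15).astype(int)                       # imbalanced outcome
notes = np.where(died == 1, "CT shows metastatic disease", "routine follow up discharged home")

X = hstack([csr_matrix(age.reshape(-1, 1).astype(float)),
            TfidfVectorizer().fit_transform(notes)]).toarray()

X_tr, X_te, y_tr, y_te = train_test_split(X, died, test_size=0.3, stratify=died, random_state=0)
X_res, y_res = SMOTEENN(random_state=0).fit_resample(X_tr, y_tr)  # hybrid resampling
clf = GradientBoostingClassifier(random_state=0).fit(X_res, y_res)
print("recall:", recall_score(y_te, clf.predict(X_te)))
```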
Overall, the inclusion of free-text clinical notes, combined with effective resampling, significantly enhances the performance of healthcare models, resulting in improved identification of high-risk patients and ultimately contributing to better patient care.
Full Text
Quant Model Tools
Kasmi Yussof, M.S. Data Science
Investing in financial instruments, such as stocks, has been a significant pillar of stability, savings, and wealth generation for decades due to the potential for positive returns. In response to such high investment activity, the stock market has risen exponentially over the last four decades, despite facing challenges such as severe market corrections, financial market crashes, and recessions. Over the years, the drive to maximize positive investment returns while minimizing negative ones has led to the emergence of algorithmic trading models. As trading models become more sophisticated, they require a substantial amount of analysis, including testing the model under simulated conditions, comparing trades, cleaning and sourcing price data, and optimizing profitable parameters.
This thesis presents a framework developed in the C++ programming language that enables the execution of multiple trading simulations using user-provided trading models, thereby generating meaningful performance insights for the user-provided model. This framework will allow users to configure their trading model through configuration files, backtest and optimize it across various simulated market conditions, and run WFA (Walk-Forward Analysis), which simulates a trading model against unseen market data as if it were trading live in the market. Furthermore, the trading model will be able to utilize a wide variety of stocks in its portfolio per the user’s request.
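The framework itself is written in C++; the short Python sketch below only illustrates the walk-forward analysis idea of optimizing parameters on an in-sample window and then evaluating them on the next unseen window, with a placeholder moving-average strategy standing in for a user-provided model.

```python
# Illustration of walk-forward analysis (WFA), not the thesis's C++ framework:
# optimize on an in-sample window, then evaluate the chosen parameters on the
# next unseen out-of-sample window, and roll forward.
def walk_forward(prices, in_sample=252, out_sample=63, param_grid=(5, 10, 20)):
    results = []
    start = 0
    while start + in_sample + out_sample <= len(prices):
        train = prices[start:start + in_sample]
        test = prices[start + in_sample:start + in_sample + out_sample]
        # "Optimization": pick the moving-average length that did best in-sample.
        best = max(param_grid, key=lambda w: backtest(train, w))
        results.append(backtest(test, best))      # evaluate on unseen data
        start += out_sample                        # roll the window forward
    return results

def backtest(prices, window):
    """Placeholder strategy return: buy when price is above its trailing mean."""
    pnl = 0.0
    for i in range(window, len(prices)):
        mean = sum(prices[i - window:i]) / window
        if prices[i - 1] > mean:
            pnl += prices[i] - prices[i - 1]
    return pnl

print(walk_forward([100 + 0.1 * i for i in range(800)]))
```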
The goal of this thesis is to develop tools to combat overfitting in trading models. As the number of investors, the amount of money, and the complexity of trading activity in the market increase, there is a significant need for a tool such as this framework for developing robust trading models.
Full Text
A Computational Analysis of the Thyroid Imaging and
Reporting Data System
Olivia Luisi, M.S. Computer Science
The detection of thyroid cancer is uniquely based upon a standardized system of numerical analysis. After a nodule is detected on a patient’s thyroid, the ultrasound images are analyzed to determine whether a biopsy is needed. Most developed countries follow the Thyroid Imaging Reporting and Data System created by the American College of Radiology, but each has its own rating scheme for the level of danger or suspicion surrounding the nodule, which makes some countries’ systems more or less likely to recommend a biopsy. While the United States uses a numerical value system, the European Union and South Korea use an algorithmic flow chart to determine the nodule’s rating, and the newer Chinese system focuses on dominant features of likely malignancy. Each has its own strengths and weaknesses, and comparing their rates of positive identification helps explain those differences. Thyroid cancer patients are rarely given such insight into the mechanisms used to assess their own health; this project seeks to let patients see the data behind what they are being told in their reports and compare their own cases against each system’s handling of cases like theirs.
Full Text
An Empirical Comparative Analysis of Skyline Query
Algorithms for Incomplete Data
Anthony Messana, M.S. Computer Science
Skyline queries are a popular and useful technique for multi-criteria analysis, but the presence of incomplete data complicates the retrieval of the skyline. Namely, incompleteness introduces the problems of intransitivity and cyclic dominance. Over the years, many algorithms have been developed to find skylines over incomplete data by addressing the two aforementioned problems. For software engineers working on Big Data applications or for researchers interested in the incomplete skyline problem, it can be useful to know in which contexts a particular class of algorithm may perform best. We sought to investigate the differing approaches to dealing with the unique challenges of computing the skyline over incomplete data. We picked three recently developed algorithms to represent certain classes of incomplete skyline algorithms and benchmarked them in different contexts. We controlled for the correlation of the dataset, the size of the dataset, and the dimensionality of the dataset. The three algorithms, PFSIDS (Liu et al.), TSI (He et al.), and BTIS (Yuan et al.), represent sorting-based, table-scan-based, and bucket-based approaches, respectively. We found that the sorting-based algorithm performed the best in general, except in the case of high-dimensional anti-correlated data. The table-scan-based algorithm was observed to work best in small, higher-dimensional datasets, and its performance did not change significantly with respect to the correlation of the data. The bucket-based approach generally performed the worst, which we believe to be due to the overhead of initializing the possibly large number of buckets for each data class.
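The dominance rule that all of these algorithms build on can be illustrated in a few lines; the sketch below (Python, smaller-is-better convention) is a naive quadratic baseline for intuition, not an implementation of PFSIDS, TSI, or BTIS.

```python
# Incomplete-data dominance: points are compared only on the dimensions where both
# values are present, which is exactly what breaks transitivity and allows cycles.
def dominates(p, q):
    """p dominates q if p is no worse on every shared dimension and better on at least one."""
    better = False
    for a, b in zip(p, q):
        if a is None or b is None:      # missing value: skip this dimension
            continue
        if a > b:                        # smaller is assumed better
            return False
        if a < b:
            better = True
    return better

def naive_incomplete_skyline(points):
    """Quadratic baseline: keep points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]

pts = [(1, None, 3), (2, 2, None), (None, 1, 4), (3, 3, 3)]
print(naive_incomplete_skyline(pts))
```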
Full Text
The Impact Of Natural Disasters On Border Crossings In The
US
Harshitha Dalli Sai, M.S. Data Science
The U.S.-Mexico and U.S.-Canada borders are vital to the country’s trade and tourism sectors, which influence the economy, but the borders are susceptible to natural disasters. This research explores natural disasters as a factor in border crossing volumes by utilizing two datasets: U.S. border crossing entry data and FEMA disaster declarations data. The goal is to analyze patterns, use modeling to forecast border volumes, and predict volumes based on disasters, helping to assess the impact of disasters on crossings.
Exploratory Data Analysis (EDA), K-Means and K-Prototypes clustering, and SARIMA and ARIMA forecast models were applied to the border crossing data set to identify the important factors influencing the border volumes. The two data sets were joined by state, year, and month, for which statistical tests such as the Welch’s test were used to test the difference between volumes one month before and one month after a specific disaster type had occurred. A Generalized Linear model (GLM) with Poisson distribution, and a Negative Binomial model were fit for both the U.S.-Mexico and U.S.-Canada borders after checking for dispersion statistics for predicting border volumes based on disaster count and other border predictors.
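The three statistical tools named above can be sketched with statsmodels and SciPy as follows; the series, design matrix, and model orders are random placeholders rather than the BTS and FEMA data.

```python
# SARIMA forecasting, Welch's t-test, and a negative binomial GLM on toy data.
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.statespace.sarimax import SARIMAX
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
volumes = rng.poisson(1000, 120).astype(float)          # monthly crossing volumes (toy)

# Seasonal ARIMA forecast with a 12-month seasonal period.
sarima = SARIMAX(volumes, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
print(sarima.forecast(steps=6))

# Welch's t-test: volumes one month before vs. one month after a disaster type.
before, after = rng.poisson(1000, 30), rng.poisson(950, 30)
print(ttest_ind(before, after, equal_var=False))

# Negative binomial GLM: crossings as a function of disaster count and another predictor.
X = sm.add_constant(np.column_stack([rng.poisson(2, 120), rng.normal(size=120)]))
nb = sm.GLM(volumes, X, family=sm.families.NegativeBinomial()).fit()
print(nb.summary())
```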
The clustering model showed traffic patterns in which truck traffic crossings were higher at the southern ports and personal vehicle traffic was higher at the northern ports. The forecasting models captured seasonal trends, showing future volumes. Disaster periods show differences in traffic volumes one month before and after a disaster occurrence, though these differences are not statistically significant by Welch’s test. The negative binomial models suggested that disaster declarations were not strong predictors of crossing values, though their effect was slightly positive. Variables such as personal vehicle passengers and personal vehicles are strong predictors for both models. Pedestrian crossings are more impactful in predicting the number of crossings at the U.S.-Mexico border, while rail containers were more influential in predicting the number of crossings at the U.S.-Canada border.
These findings call for additional border-specific, state-specific, or disaster-specific analysis to dig deeper and help build better disaster response policies. Because the data used are counts aggregated by month or year, regression models such as Generalized Linear Models were used in this research. In the future, additional count models could be explored and compared to the Generalized Linear Models to find the best-fitting predictive model.
Full Text
Fall 2024
Game Genie: The Ultimate Video Game Recommendation System
Nicholas D’Amato, M.S. Computer Science
The video game market is varied and highly competitive, making it hard for gamers to decide which game to invest their time and money into. Many of the leading video game companies, such as Game Freak, Blizzard, EA, Bethesda, and Activision, built highly valued reputations with good games in the past; however, in the present they are releasing unpolished or unfinished games to earn more money, which people still buy because of that reputation. With an expert system that suggests games based on user ratings and personalized recommendations rather than popularity, gamers will have a better experience finding a game they prefer. Currently, there are a few such video game recommendation systems, either on the video game console or online, but all of them have flaws. These expert systems attempt to recommend games to people and include many different algorithms to predict and suggest games that match a specific user’s preferences. However, these current recommendation systems either lack personalized recommendations or lack game data, thus either giving the same recommendations to all users or giving the user a console-specific game. To solve this problem, an expert system with personalized recommendations is needed. The system incorporates a database of a variety of relevant games containing real-world data, integrated into a web application, which was then checked to make sure the program worked properly and that errors and bugs were eliminated. After completing and testing the proposed recommendation system, the final step was to observe the results and compare the system with other video game recommendation systems.
Full Text
Demonstration of application
GenEthic Analysis: Building a Secure and Accessible Genetic Analysis Framework
Sapir Sharoni, M.S. Data Science
The latest advances in genetic research have paved the way for innovative new applications of genetic data in areas such as ancestry research, forensic science, and medicine. However, the current Direct-To-Consumer (DTC) genetic platforms often have limited accessibility and utility, posing significant challenges for researchers and other professionals. Furthermore, concerns about the privacy and security of data within popular DTC companies persist among users. To address these limitations, a framework was developed for a predictive genetic analysis tool that prioritizes privacy, security, and user-friendliness. This study focused on predicting observable traits, including ancestry, biological sex, blood type, and eye and hair color, using single nucleotide polymorphisms (SNPs). A machine-learning-driven methodology was employed, integrating data preprocessing, standardized genotype encoding, and model evaluation. Models such as Gradient Boosting and Neural Networks were used to predict traits, demonstrating high accuracy across categories, including blood type and population groups (96%). The results demonstrate that, using the proposed framework, it is feasible to create a genetic analysis tool capable of bridging the gap between privacy and security and practical usability. It is important to note that the framework presented is adaptable, enabling its application across various industries. While this study focused on observable traits, future research can extend to various domains.
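A minimal sketch of the encode-then-classify step, assuming scikit-learn; the genotype table, blood-type labels, and encoding helper are synthetic illustrations rather than the framework's actual pipeline.

```python
# Encode genotypes as alternate-allele counts (0, 1, or 2) and classify a trait.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def encode(genotype, ref):                      # e.g. ("AG", "A") -> 1
    """Standardized genotype encoding: count of alleles differing from the reference."""
    return sum(allele != ref for allele in genotype)

rng = np.random.default_rng(0)
n_people, n_snps = 300, 50
X = rng.integers(0, 3, size=(n_people, n_snps))          # already-encoded toy genotypes
y = rng.choice(["A", "B", "AB", "O"], size=n_people)      # toy blood-type labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, model.predict(X_te)))
print(encode("AG", "A"))   # the encoding helper applied to a single genotype call
```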
Full Text
Summer 2024
Time Series Analysis on Produce Truck Load Shipments
Ernest Barzaga, M.S. Data Science
The dataset for this project is sourced from a major freight broker based in New Jersey, with an annual revenue of approximately $200 million. The primary objectives are to implement techniques for handling missing data across the client’s highest-volume lanes, to prepare the dataset for modeling and analysis, and to predict truck costs. Additionally, the project explores the impact of external macroeconomic factors on the trucking industry and their relationship to truck costs. In the modeling phase, analysis is focused on a single lane—Salinas, California, to the Bronx, New York—due to its high shipment volume for produce. Various machine learning models were evaluated on this lane, with ARIMA performing best when the year 2022 was excluded from the training set, resulting in a root mean squared error (RMSE) of $436. SARIMA performed best when 2021 was excluded, yielding an RMSE of $834. Based on this initial iteration of modeling, recommendations for future modeling techniques were made, including the use of a vector autoregression model. This suggestion arose from hypothesis tests (Engle-Granger) that indicated the collected macroeconomic factors may have predictive power regarding truck costs.
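The two modeling pieces mentioned above can be sketched with statsmodels; the cost series and macroeconomic factor below are random placeholders for the broker's lane data.

```python
# ARIMA forecasting of truck cost plus an Engle-Granger cointegration test.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(0)
truck_cost = np.cumsum(rng.normal(0, 50, 200)) + 3000       # weekly lane cost (toy)
diesel_index = truck_cost * 0.4 + rng.normal(0, 100, 200)   # a macroeconomic factor (toy)

# ARIMA forecast; in the thesis, RMSE was computed on a holdout period.
fit = ARIMA(truck_cost[:-8], order=(1, 1, 1)).fit()
print(fit.forecast(steps=8))

# Engle-Granger cointegration test between cost and the macroeconomic factor.
t_stat, p_value, _ = coint(truck_cost, diesel_index)
print(p_value)
```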
Full Text
Spring 2024
Time Series Analysis on Differing Climate Regions
Jacob Insley, M.S. Data Science
Droughts, hurricanes, tornadoes, and other climate disasters wreak havoc in all corners of the world. Constantly, scientists and mathematicians are working on ways to predict such events and learn more about them. Unfortunately, the weather remains incredibly difficult to predict. If we can learn more about how data science and time series methods work on a variety of climate regions, we can understand how to put them to better use.
Two locations with very different seasonal patterns were looked at: Bergen County, NJ and Napa County, CA. Droughts were classified in both regions and different drought indices were compared in their ability to identify droughts. Time series techniques were used to predict the amount of precipitation in each location. A least squares regression model with a seasonal component and a SARIMA model were created to predict precipitation. We were able to discover some of the strengths and weaknesses of these tools when used on data from different climate regions.
The Standardized Precipitation Index was able to identify short-term drought well but failed to identify droughts during Napa County summers. The Palmer Drought Severity Index identified droughts all year long in both locations, but only when they occurred over several months. The SARIMA model decisively portrayed the seasonal pattern of Napa’s precipitation, but made more accurate predictions for the more stable climate of Bergen County.
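A simplified version of the Standardized Precipitation Index calculation, assuming SciPy; real SPI implementations also handle zero-precipitation months and fixed calibration periods, which this sketch omits.

```python
# Simplified SPI: fit a gamma distribution to precipitation totals and map the CDF
# onto a standard normal; negative values indicate drier-than-normal conditions.
import numpy as np
from scipy import stats

def spi(precip):
    precip = np.asarray(precip, dtype=float)
    shape, loc, scale = stats.gamma.fit(precip, floc=0)       # gamma fit with location fixed at 0
    cdf = stats.gamma.cdf(precip, shape, loc=loc, scale=scale)
    return stats.norm.ppf(cdf)                                # standardized index values

monthly_precip = np.random.default_rng(0).gamma(2.0, 2.0, 120)   # toy 10-year record
print(spi(monthly_precip)[:12])
```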
Scientists can use this data to better equip these tools to handle situations which they do not excel at. We can use what we learned about how drought is measured in both locations to find ways to improve the indices we measure with.
Full Text
PREDICTING FIRST YEAR RETENTION FOR UNDERGRADUATE EDUCATIONAL OPPORTUNITY FUND STUDENTS
Kelly O’Neill, M.S. Applied Mathematics
Predicting undergraduate retention using various machine learning algorithms has the potential to reduce the likelihood of attrition for students who are identified as being at an elevated risk of dropping out, thus providing a mechanism to help increase the likelihood of a student graduating from college. Following the approach of previous studies, retention is predicted using primarily freshman data, where retention is defined as a student being enrolled a year later from their first semester. For this thesis, the population was focused on predicting retention for Educational Opportunity Fund (EOF) students. Based on the EOF department’s most recent report, which comes from Ramapo’s Office of Institutional Research (2023), in 2016 the EOF 4-year graduation rate was 46.40% and the 6-year graduation rate was 63.10%, whereas for the college, using the Fall 2018 cohort, the four-year graduation rate was 56.9% and the six-year graduation rate was 69.5%. Identifying the specific individuals who will not be retained allows the EOF department to devise an appropriate plan and provide resources to help the students achieve academic success, and thus increase graduation rates.
This thesis considers many factors, provided by the EOF department, from Fall 2013 to Spring 2023, and I consider the impact of COVID within my analysis. I predict retention using logistic regression, decision tree, random forest, support vector machine, ensemble, and gradient boosting classifiers, where feature selection and the Synthetic Minority Over-sampling Technique (SMOTE), applied because the dataset was not balanced, were used for each algorithm. While all of the models performed well, even after 10-fold cross-validation, the random forest model using feature selection and a balanced dataset is recommended. In the future, the EOF department can use this model to determine which incoming students are at elevated risk of dropping out and provide them with the necessary resources to help them succeed.
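One way to combine feature selection, SMOTE, and a random forest under 10-fold cross-validation is sketched below with scikit-learn and imbalanced-learn; the features are synthetic, and the imblearn Pipeline is used so that oversampling happens only inside each training fold.

```python
# Feature selection + SMOTE + random forest evaluated with 10-fold cross-validation.
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))                         # e.g. GPA, credits, demographics (toy)
y = (rng.random(500) < 0.25).astype(int)               # imbalanced retention flag (toy)

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=10)),          # feature selection
    ("smote", SMOTE(random_state=0)),                  # balance only the training folds
    ("rf", RandomForestClassifier(random_state=0)),
])
print(cross_val_score(pipe, X, y, cv=10, scoring="f1").mean())
```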
The second part of this thesis is a comprehensive exploratory data analysis to learn more about the EOF student population. EOF students tend to struggle within the subject areas of math, biology, interdisciplinary studies, psychology, and chemistry. More specifically, in the courses math 108, interdisciplinary study 101, biology 221, critical reading and writing 102, amer/intl interdisciplinary 201, math 101, and math 110. Regarding retention, the average cumulative GPA for students who retained was 2.84, and 2.15 for students who did not retain. Furthermore, the average term GPA for those who retained was 2.67 but was 1.65 for students who did not retain.
Analyzing the relationship between retention and other variables, such as GPA, subject areas, and courses, provides the EOF department with a better idea of possible support mechanisms for students. Coupling this information with the recommended prediction algorithm can help the EOF department increase its four-year and six-year graduation rates by providing students with resources, guidance, and plans informed by the department’s expertise.
Full Text
Fall 2023
PROTOTYPING A LITERARY ANALYSIS TOOL FOR CREATIVES THAT DOESN’T USE CLOUD COMPUTING AND DEVELOPS NEW ANALYSIS METRICS
John Chmielowiec, M.S. Computer Science
The creative process for composing large literary works takes a lot of time. The writing itself is the largest time sink, but the subsequent editing and revision passes are also lengthy, primarily because humans are limited in how quickly they can read a work and then communicate their feedback to the author.
Natural language processing (NLP) is a computer science and data science topic that has garnered a lot of public interest over 2023 due to the release of tools such as ChatGPT. One of the features of NLP is the ability to perform analysis on literature. Some of the current capabilities include sentiment analysis (is a work positive, neutral, or negative), named entity recognition (what people, places, organizations, etc. are mentioned), and detecting whether something is sarcastic or not. These tools can be used to provide more immediate feedback to a writer and may even be able to notice details that experienced editors might miss. However, these tools are often only available online and require the submission of data that a user may want to keep confidential.
The aim of this thesis is to show that there can be a viable application that runs locally on a user’s computer and performs some natural language processing tasks. This program needs to be simple to use, perform effective analysis, and enable the end user to explore the data in meaningful ways. The target audience for this product would include writers, editors, publishers, and data scientists.
To this end, existing software libraries were examined for functionality, accuracy, and speed. Suitable components were selected and incorporated into a single tool. Existing concepts such as version control for software development and known NLP functionality were combined in new ways. That proof-of-concept tool was then used to produce sample results to show the viability of the product.
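A sketch of the kind of local, offline analysis involved, using off-the-shelf libraries (spaCy for named entity recognition and NLTK's VADER for sentiment); these are illustrative component choices, not necessarily the ones selected in the thesis.

```python
# Local NLP analysis that runs entirely on the user's machine once the small models
# are downloaded; no text is sent to a cloud service.
import spacy                                              # pip install spacy; python -m spacy download en_core_web_sm
from nltk.sentiment import SentimentIntensityAnalyzer     # requires nltk.download("vader_lexicon")

nlp = spacy.load("en_core_web_sm")
sia = SentimentIntensityAnalyzer()

text = "Elizabeth walked through the rainy streets of London, furious at Mr. Darcy."
doc = nlp(text)
print([(ent.text, ent.label_) for ent in doc.ents])       # named entity recognition
print(sia.polarity_scores(text))                          # positive / neutral / negative scores
```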
Full Text
COMBINING STATISTICAL ANALYSIS AND MACHINE LEARNING TO EXPLORE THE INTERPLAY BETWEEN AGING, LIFESTYLE CHOICES, CARDIOVASCULAR DISEASES, AND BRAIN STROKES
Anit Mathew, M.S. Data Science
This research project aims to investigate the intricate relationship between aging, lifestyle choices, cardiovascular diseases (CVDs), and brain strokes in older adults. The pressing problem is the growing burden of CVDs and strokes among the elderly, and the need to understand the impact of lifestyle factors on these health outcomes.
The primary objectives of this study are to assess the incidence of heart disease, high blood pressure, and brain strokes in the senior population, analyze how lifestyle factors like smoking status and body mass index (BMI) influence stroke frequency, explore the connections between heart disease, hypertension, and strokes, and investigate the potential influence of additional variables such as gender, average blood glucose levels, and type of residence. Furthermore, this research seeks to propose interventions and preventive strategies to reduce the incidence of brain strokes among older adults.
This research employs a comprehensive analysis of a publicly accessible dataset from Kaggle, which contains a wide range of health-related variables. The dataset provides valuable insights into lifestyle choices, health conditions, and the occurrence of brain strokes in older individuals. Various statistical and data analysis techniques will be applied to uncover associations and trends, contributing to a deeper understanding of the complex interactions between lifestyle choices, CVDs, and brain strokes.
Through a meticulous examination of the data, this study intends to shed light on the multifaceted relationships among lifestyle choices, cardiovascular diseases, and strokes in the elderly. The results will contribute to public health, geriatrics, and medical fields by providing evidence-based knowledge that can inform strategies for risk assessment, disease management, and health promotion among older adults.
This research project holds the potential to benefit multiple stakeholders. For healthcare professionals, the findings can lead to the development of effective strategies for the management of CVDs and strokes in older individuals. It may also inform public health campaigns and policy initiatives aimed at reducing the risk of these conditions within an aging population. Additionally, the study contributes to the existing body of knowledge in this field, providing a foundation for further research and the potential discovery of new interventions and risk mitigation strategies. Overall, this research addresses a critical health concern affecting older adults and has the potential to improve the well-being of this vulnerable population.
Full Text
EXAMINING DISEASE THROUGH MICROBIOME DATA ANALYSIS
Brett Van Tassel, M.S. Data Science
The objective of this project is to examine the relationship between the gut microbiomes of human subjects having different disease statuses by examining microbial diversity shifts. Read analysis and data cleaning are recorded from beginning to end so that the unfiltered and unfettered data can be reanalyzed and processed. Here we strive to create a tool that works for well-curated data. Data is gathered from the database QIITA, and the read data and metadata are queried via the tool redbiom. The initial exploratory analysis involved an examination of metadata attributes. A heat map of correlations among metadata attributes, computed using Cramér’s V, allows visual examination of associations. Next, we train random forests based on metadata of interest. Due to the large quantity of attributes, many random forests are trained, and their respective significance values and Receiver Operating Characteristic (ROC) curves are generated. ROC curves are used to isolate optimal correlations. This process is built into a pipeline, ultimately allowing the efficient, automated analysis and assignment of disease susceptibility. Alpha and beta diversity metrics are generated and plotted for visual interpretation using QIIME2, a microbial analysis software platform. CLOUD, a tool for finding microbiome outliers, is used to identify markers of dysbiosis and contamination, and to measure rates of successful identification. CLOUD was found to identify positive diagnoses where random forests did not when examining positive samples and their predicted diagnosis status. SMOTE was found to perform similarly to or slightly worse than random sampling as a data balancing technique.
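The Cramér's V computation behind the correlation heat map can be sketched as follows, assuming pandas and SciPy; the metadata frame is a toy stand-in for the QIITA metadata.

```python
# Cramér's V for a pair of categorical metadata attributes; computing it for every
# attribute pair yields the values plotted in the correlation heat map.
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x, y):
    table = pd.crosstab(x, y)
    chi2 = chi2_contingency(table)[0]
    n = table.to_numpy().sum()
    r, c = table.shape
    return np.sqrt(chi2 / (n * (min(r, c) - 1)))

meta = pd.DataFrame({"disease_status": ["IBD", "healthy", "IBD", "healthy", "IBD"],
                     "diet": ["omnivore", "vegetarian", "omnivore", "omnivore", "vegetarian"]})
print(cramers_v(meta["disease_status"], meta["diet"]))
```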
Full Text
Summer 2023
EVALUATING HOW NHL PLAYER SHOT SELECTION IMPACTS EVEN-STRENGTH GOAL OUTPUT OVER THE COURSE OF A FULL SEASON
Elliott Barinberg, M.S. Data Science
Within this thesis work, the applications of data collection, machine learning, and data visualization were used on National Hockey League (NHL) shot data collected between the 2014-2015 season and the 2022-2023 season. Modeling sports data to better understand player evaluation has always been a goal of sports analytics. In the modern era of sports analytics the techniques used to quantify impacts on games have multiplied. However, when it comes to ice hockey all the most difficult challenges of sports data analysis present themselves in trying to understand the player impacts of such a continuously changing game-state. The methods developed and presented in this work serve to highlight those challenges and better explain a player’s impact on goal scoring for their team.
Throughout this work there are multiple kinds of modeling techniques used to try to best demonstrate a player’s impact on goal scoring as a factor of all the elements the player is capable of controlling. We try to understand which players have the best offensive process and impact on goal-scoring by caring about the merit of the offensive opportunities they create. It is important to note that these models are not intended to re-create the results seen in reality, although reality and true results are used to evaluate the outputs.
This process used data scraping to collect the data from the NHL public application programming interface (API). Data cleansing techniques were applied to the collected data, yielding custom data sets which were used for the corresponding models. Data transformation techniques were used to calculate additional factors based upon the data provided, thus creating additional data within the training and testing datasets. Techniques including but not limited to linear regression, logistic regression, random forests and extreme gradient boosted regression were all used to attempt to model the true possibility of any particular even-strength event being a goal in the NHL. Then, using formulaic approaches the individual event model was extrapolated upon to draw larger conclusions. Lastly, some unique data visualization techniques were used to best present the outputs of these models. In all, many experimental models were created which have yielded a reproducible methodology upon which to evaluate the results of any NHL player impact upon goal scoring over the course of a season.
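A toy model in the spirit of the individual-event approach, assuming scikit-learn; shot distance and angle are stand-in features, and the simulated labels are not NHL data.

```python
# A logistic regression over per-shot features, predicting the probability of a goal.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
distance = rng.uniform(5, 60, n)                       # feet from the net (toy)
angle = rng.uniform(0, 90, n)                          # shooting angle in degrees (toy)
goal = (rng.random(n) < 1 / (1 + np.exp(0.08 * distance - 1.5))).astype(int)

X = np.column_stack([distance, angle])
model = LogisticRegression().fit(X, goal)

# Predicted goal probability of a single even-strength shot; summing these per player
# over a season gives an expected-goals-style figure to compare against actual goals.
print(model.predict_proba([[20.0, 45.0]])[0, 1])
```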
Full text
Spring 2023
BUILDING A STATISTICAL LEARNING MODEL FOR EVALUATION OF NBA PLAYERS USING PLAYER TRACKING DATA
Matthew Byman, M.S. Data Science
This thesis aims to develop faster and more accurate methods for evaluating NBA player performances by leveraging publicly available player tracking data. The primary research question addresses whether player tracking data can improve existing performance evaluation metrics. The ultimate goal is to enable teams to make better-informed decisions in player acquisitions and evaluations.
To achieve this objective, the study first acquired player tracking data for all available NBA seasons from 2013 to 2021. Regularized Adjusted Plus-Minus (RAPM) was selected as the target variable, as it effectively ranks player value over the long term. Five statistical learning models were employed to estimate RAPM using player tracking data as features. Furthermore, the coefficients of each feature were ranked, and the models were rerun with only the 30 most important features.
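One of the five models (Lasso regression) and the top-30 coefficient refit can be sketched as follows with scikit-learn; the tracking features and RAPM targets below are synthetic placeholders.

```python
# Lasso regression of RAPM on tracking features, then a refit on the 30 largest coefficients.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 120))                 # player-season tracking features (toy)
y = rng.normal(size=400)                        # RAPM target (toy)

lasso = Lasso(alpha=0.05).fit(X, y)
top30 = np.argsort(np.abs(lasso.coef_))[-30:]   # rank features by |coefficient|
refit = Lasso(alpha=0.05).fit(X[:, top30], y)   # rerun with only the 30 most important features
print(refit.score(X[:, top30], y))
```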
Once the models were developed, they were tested on newly acquired player tracking data from the 2022 season to evaluate their effectiveness in estimating RAPM. The key findings revealed that Lasso Regression and Random Forest models performed the best in predicting RAPM values. These models enable the use of player tracking statistics that settle earlier, providing an accurate estimate of future RAPM. This early insight into player performance offers teams a competitive advantage in player evaluations and acquisitions.
In conclusion, this study demonstrates that combining statistical learning models with player tracking data can effectively estimate performance metrics, such as RAPM, earlier in the season. By obtaining accurate RAPM estimates before other teams, organizations can identify and acquire top-performing players, ultimately enhancing their competitive edge in the NBA.
Full text
BUILDING AN ML DRIVEN SYSTEM FOR REAL-TIME CODE-PERFORMANCE MONITORING
Mikhail Delyusto, M.S. Data Science
This project is part of a multidirectional attempt to increase the quality of the software and data product produced by the Science and Engineering departments of Aetion Inc., a company that is transforming the healthcare industry by providing its partners (major healthcare industry players) with a real-world evidence generation platform that helps drive greater safety, effectiveness, and value of health treatments. Large datasets (up to 100 TB each) of healthcare market data (for example, insurance claims) are ingested into the platform and transformed into Aetion’s proprietary longitudinal format.
This attempt is being led by the Quality Engineering Team and is envisioned to move away from conventional testing techniques by decoupling different moving parts and isolating them in separate, maintainable and reliable tools.
The subject of this thesis is a particular branch of a larger quality initiative that will help continuously monitor a number of metrics involved in executing the two most common types of jobs that run on Aetion’s platform: cohorts and analyses. These jobs may take up to a few hours to generate, depending on the size of a dataset and the complexity of an analysis.
Once implemented, this monitoring system would be supplied with a feed of logs containing certain data points, such as timestamps. Enhanced with a built-in algorithm that sets a threshold on the metrics and notifies its users (stakeholders from Engineering and Science) when that threshold is exceeded, it would be a game-changing capability in Aetion’s quality space. Currently, there is no way to tell whether any given job is taking significantly more, or conversely significantly less, time than usual, and most defects get identified in upper environments (including production).
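One plausible form of such a thresholding rule, sketched in Python as a rolling mean plus k standard deviations over job runtimes; the window, multiplier, and alerting hook are assumptions, not Aetion's implementation.

```python
# Flag job runtimes that fall outside a rolling mean ± k standard deviations.
from collections import deque

def monitor(runtimes_seconds, window=30, k=3.0):
    history = deque(maxlen=window)
    for job_id, runtime in enumerate(runtimes_seconds):
        if len(history) == window:
            mean = sum(history) / window
            std = (sum((x - mean) ** 2 for x in history) / window) ** 0.5
            if abs(runtime - mean) > k * std:           # unusually slow *or* fast
                print(f"ALERT: job {job_id} ran {runtime:.0f}s (window mean {mean:.0f}s)")
        history.append(runtime)

monitor([3600, 3650, 3580] * 10 + [9000])    # toy cohort-generation runtimes
```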
The issues identified in upper environments are the costliest of all types and, by various industry estimates, can cost $5,000 – $10,000 each.
As a result of implementing said system, we would expect a steep decrease in the number of issues in upper environments, as well as an increase in release frequency, from which the organization would greatly benefit.
Full text
OPTIMIZING PRODUCT RECOMMENDATION DECISIONS USING SPATIAL ANALYSIS
Raul A. Hincapie, M.S. Data Science
At a certain Consumer Packaged Goods (CPG) company, there was a need to coordinate between sales, geographic location, and demographic datasets to make better-informed business decisions. One area that required this type of coordination was the replacement process of a specific product being sold to a store. The need for this type of replacement arises when a product is not authorized to be sold at the store, out of stock, permanently discontinued, or not selling at the intended rate. Previously, the process at this company relied on instinctual decision-making when it came to product replacements, which showed a need for this protocol to be more data-driven.
The premise of this project is to create a data-driven product replacement process: a system in which the CPG company inputs a store and a product, and the system outputs a list of suitable replacement items. The replacement items would be based on stores similar to the input store using its sales, geographic location, and demographic portfolio. By identifying these similar stores, it is possible that the CPG company could also discover product opportunities or niches for a specific store or region. With a system like this, the company will increase its regional product knowledge based on geographical location as well as improve current and future sales. The system could also provide highly valuable information on consumer preferences and behaviors, which could eventually help to understand future customers.
Full Text
PREDICTING AND ANALYZING STOCK MARKET BEHAVIOR USING MAGAZINE COVERS
Egor Isakson, M.S. Data Science
Financial magazines have been part of the financial industry from the start, and there has long been a debate about whether a stock being featured in a magazine is a contrarian signal. The reasoning behind this is simple: any informational edge reaches the wide masses last, which means that by the time it does, the bulk of the directional move of the financial instrument has long been completed. This paper puts this idea to the test by examining the behavior of the stock market and the stocks featured on the covers of various financial magazines and newspapers. Through several stages of data extraction and processing using up-to-date data science techniques, ticker symbols are derived from raw color images of covers. The derivation results in a many-to-many relationship, where a single ticker shows up at different points in time and a single cover may carry many tickers at once. From there, several historic price and media-related features are created in preparation for the machine learning models. Several models are used to examine the behavior of the stock and the index at different points in the future. Results are better than random but insufficient as the sole determinant of the asset’s direction.
Full Text
IDENTIFYING OUTLIER DATA POINTS IN NON-CLINICAL INVESTIGATIONAL NEW DRUG SUBMISSIONS
Cassandra O’Malley, M.S. Data Science
The Food and Drug Administration (FDA) uses a format known as SEND (Standard for Exchange of Nonclinical Data) to evaluate non-clinical (animal) studies for investigational new drug applications. Investigative drug sponsors currently use information from historical and control data to determine if drugs cause toxicity.
The goal of this study is to identify outlying data points that may indicate an investigative new drug could be toxic. Examples include a negative body weight gain over time, enlarged organ weights, or laboratory test abnormalities, especially in relation to a control group within the same study. Flagged records can be analyzed by a veterinarian or pathologist for potential signs of toxicity without looking at each individual data point.
Common domains within the non-clinical pharmaceutical studies were evaluated using changes from baseline measurements, changes from the control group, a percent change from the previous measurement with reference to the ethical guidelines, values outside of the mean ± two standard deviations, and a measure of abnormal findings to unremarkable findings in pathology. A program was designed to analyze five of these domains and return a collection of possible outlying data, making review simpler and faster than a study monitor’s inspection of individual data points and performing the analysis in a fraction of the time. The resulting file is also more easily read by someone unfamiliar with the SEND format.
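The mean ± two standard deviation screen on a single domain can be sketched with pandas; the body-weight column names below follow common SEND conventions but are meant as illustration only.

```python
# Flag body-weight records outside mean ± 2 standard deviations for expert review.
import pandas as pd

bw = pd.DataFrame({
    "USUBJID": [f"RAT-{i:02d}" for i in range(1, 11)],
    "BWSTRESN": [251.0, 248.5, 253.2, 250.9, 249.8, 252.4, 247.9, 251.6, 250.2, 199.7],  # grams
})

mean, sd = bw["BWSTRESN"].mean(), bw["BWSTRESN"].std()
flagged = bw[(bw["BWSTRESN"] < mean - 2 * sd) | (bw["BWSTRESN"] > mean + 2 * sd)]
print(flagged)      # records for a veterinarian or pathologist to review
```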
With this program, analyzing a study for possible toxic effects while it is still in progress can save time, effort, and even animal lives by identifying signs of toxicity early. Sponsors or CROs can determine whether the product is safe enough to proceed with testing or should be stopped in the interest of safety and additional research.
Full Text
Fall 2022
CLIMATE CHANGE IMPACTS ON FOOD PRODUCTION: A BIBLIOMETRIC NETWORK ANALYSIS
Skylar Clawson, M.S. Data Science
Climate change is an environmental issue that is affecting many different sectors of society, such as terrestrial, freshwater, and marine ecosystems, human health, and agriculture. With a growing population, food security is a serious issue exacerbated by climate change. Climate change is not only impacting food production, but food production is also impacting climate change by emitting greenhouse gases during the different stages of the food supply chain. This project seeks to use a bibliometric network analysis to identify the influence that the food supply chain has on climate change. We created four networks, one for each stage in the food supply chain (food processing, food transportation, food retail, food waste), to distinguish how influential the food supply chain is on climate change. The data needed for a bibliometric network comes from a scientific database, and the networks are created based on a co-word analysis. Co-word analysis reveals words that frequently appear together, showing that they have some form of relationship in research publications. The second part of our analysis is focused on how climate change impacts the early growth and development stages of grains. We collected data on several grains as well as temperature and precipitation to see if these climate stressors had any influence on production rates. This project’s main focus is to identify how climate change and food production could be influencing each other. The main findings of this project indicate that all four stages of the food supply chain influence climate change. This project also indicates that climate change affects grain production through climate variables such as temperature and precipitation variability.
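A minimal co-word counting sketch in Python: keyword pairs that appear together in a publication record become weighted edges of the bibliometric network. The records shown are toy stand-ins for database exports.

```python
# Count keyword co-occurrences across publication records to build co-word network edges.
from itertools import combinations
from collections import Counter

records = [["climate change", "food waste", "emissions"],
           ["food transportation", "climate change"],
           ["food waste", "emissions", "supply chain"]]

edges = Counter()
for keywords in records:
    for a, b in combinations(sorted(set(keywords)), 2):
        edges[(a, b)] += 1                     # co-occurrence weight

for (a, b), weight in edges.most_common(3):
    print(f"{a} -- {b}: {weight}")
```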
Full Text
EXPLORING VEHICLE SERVICE CONTRACT CANCELLATIONS
Josip Skunca, M.S. Data Science
The goal of this thesis is to propose the cancellation reserve requirement for ServiceContract.com, a start-up vehicle service contract administrator being formed by its parent company DOWC. DOWC is a vehicle service contract administrator who prides itself on offering customized financial products to large car dealerships. The creation of ServiceContract.com (referred to as ServiceContract) serves to offer no-chargeback products as a means of marketing to another portion of the automotive industry. No-chargeback means that if the contract cancels (after 90 days) the Dealership, Finance Manager, and Agent (account manager) of the account are not required to refund their profit from the insurance contract – the administrator refunds the prorated price of the contract. In other words, the administrator must refund the entirety of the contract’s price, prorated at the time of cancellation.
Therefore, the cancellation reserve is the price that must be collected per contract in order to cover all cancellation costs. This research was a requirement to determine the feasibility of the new company and determine the pricing requirements of its products. The pricing of the new company’s products would determine ServiceContract’s competitiveness in the market, and therefore provide an evaluation of the business model.
To find this reserve requirement, research first started by finding the total amount of money that DOWC has refunded, along with the total number of contracts sold. Adding specific information allowed the calculation of these requirements in the necessary form. Service contract administrators are required to file rate cards with each state that must clearly specify the dimensions of the contract and their corresponding price.
The key result in the research was the realization that the Cancellation Reserve would be tied to the Maximum allowed retail price. If the maximum price dealerships can sell for is lowered, the required Cancellation Reserve will follow suit, and as a result lower the Coverage cost of the contract. This allowed for the dealership to have an opportunity to make their desired profit, while enabling ServiceContract to offer competitive pricing.
The most significant impact of these results is that ServiceContract was able to determine that the company had more competitive rates than both competitors and DOWC. This research opened the company’s eyes to the benefit of this kind of research, and will prompt further research in the future.
Full Text
Spring 2022
A TOOL FOR WHO WILL DROP OUT OF SCHOOL
Colette Joelle Barca, M.S. Data Science
A student’s high school experience often forms the foundation of his or her postsecondary career. As the competition in our nation’s job market continues to increase, many businesses stipulate applicants need a college degree. However, recent studies show approximately one-third of the United States’ college students never obtain a degree. Although colleges have developed methods for identifying and supporting their struggling students, early intervention could be a more effective approach for combating postsecondary dropout rates. This project seeks to use anomaly detection techniques to create a holistic early detection tool that indicates which high school students are most at risk to drop out of college. An individual’s high school experience is not confined to the academic components. As such, an effective model should incorporate both environmental and educational factors, including various descriptive data on the student’s home area, the school’s area, and the school’s overall structure and performance. This project combined this information with data on students throughout their secondary educational careers (i.e., from ninth through twelfth grade) in an attempt to develop a model that could detect during high school which students have a higher probability of dropping out of college. The clustering-based and classification-based anomaly detection algorithms detail the situational and numeric circumstances, respectively, that most frequently result in a student dropping out of college. High school administrators could implement these models at the culmination of each school year to identify which students are most at risk for dropping out in college. Then, administrators could provide additional support to those students during the following school year to decrease that risk. College administrators could also follow this same process to minimize dropout rates.
Full Text
COMPREHENSIVE ANALYSIS OF THE FUTURE PRICE OF NBA TOP SHOT MOMENTS
Miguel A. Esteban Diaz, M.S. Data Science
NBA Top Shot moments are NFTs built on the FLOW blockchain and created by Dapper Labs in collaboration with the NBA. These NFTs, commonly referred to as “moments”, consist of in-game highlights of an NBA or WNBA player. Using the different variables of a moment, for example the type of play made by the featured player (dunk, assist, block, etc.), the number of listings of that moment in the marketplace, whether the player is a rookie, and the rarity tier of the moment (Common, Fandom, Rare, or Legendary), this project aims to provide a statistical analysis that could reveal hidden correlations between the characteristics of a moment and its price, along with a prediction of moment prices using machine learning regression models including linear regression, random forests, and neural networks. As NFTs, and especially NBA Top Shot, are a relatively recent area of research, there is not yet extensive research in this area. This research intends to expand the analysis performed on this topic to date and serve as a foundation for future research, as well as provide helpful and practical information about the valuation of moments, the importance of their diverse characteristics and the impact on pricing, and the possible future application of this information to other similar highlight-oriented sports NFTs like NFL AllDay or UFC Strike, which are designed similarly to NBA Top Shot.
Full Text
PREVENTING THE LOSS OF SKILLFUL TEACHERS: TEACHER TURNOVER PREDICTION USING MACHINE LEARNING TECHNIQUES
Nirusha Srishan, M.S. Data Science
Teacher turnover rate is an increasing problem in the United States. Each year, teachers leave their current teaching position to either move to a different school or to leave the profession entirely. In an effort to understand why teachers are leaving their current teaching positions and to help identify ways to increase teacher retention rate, I am exploring possible reasons that influence teacher turnover and creating a model to predict if a teacher will leave the teaching profession. The ongoing turnover of teachers has a vast impact on school district employees, the state, the country, and the student population. Therefore, exploring the variables that contribute to teacher turnover can ultimately lead to decreasing the rate of turnover.
This project compares those in the educational field, including general education teachers, special education teachers and other educational staff, who have completed the 1999-2000 School and Staffing Survey (SASS) and Teacher Follow-up Survey (TFS) from the National Center for Educational Statistics (NCES, n.d.). This data will be used to identify trends in teachers that have left the profession. Predictive modeling will include various machine learning techniques, including Logistic Regression, Support Vector Machines (SVM), Decision Tree and Random Forest, and K-Nearest Neighbors. By finding the reasons for teacher turnover, a school district can identify a way to maximize their teacher retention rate, fostering a supportive learning environment for students, and creating a positive work environment for educators.
Full Text
FORECASTING AVERAGE SPEED OF CALL CENTER RESPONSES
Emmanuel Torres, M.S. Data Science
Organizations run multifaceted modern call centers yet still rely on antiquated forecasting technologies, leading to erroneous staffing during critical periods of unprecedented volume. Companies experience financial hemorrhaging or provide an inadequate customer experience due to incorrect staffing when sporadic volume emerges. The forecasting models currently employed carry known caveats, such as the inability to handle wait time without abandonment and the consideration of only a single call type when making the prediction.
This study aims to create a new forecasting model to predict the Average Speed of Answer (ASA) to obtain a more accurate prediction of the staffing requirements for a call center. The new model will anticipate historical volume of varying capacities to create the prediction. Both parametric and nonparametric methodologies will be used to forecast the ASA. An ARIMA (Autoregressive Integrated Moving Average) parametric model was used to create a baseline for the prediction. The application of machine learning techniques such as Recurrent Neural Networks (RNN) was used since it can process sequential data by utilizing previous outputs as inputs to create the neural network. Specifically, Long Short-Term Memory (LSTM) recurrent neural networks were used to create a forecasting model for the call center ASA.
With the LSTM neural network, both univariate and multivariate approaches were utilized to forecast the ASA. The findings confirm that the univariate LSTM neural network resulted in the most accurate forecast, netting the lowest Root Mean Squared Error (RMSE) of the three methods used to predict the call center ASA. Although the univariate LSTM model produced the best results, the multivariate LSTM model was not far from providing an accurate prediction, though it received a higher RMSE than the univariate model. Furthermore, ARIMA produced the highest RMSE and forecasted the ASA inaccurately.
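A sketch of the univariate LSTM setup, assuming TensorFlow/Keras; the ASA series, window length, and network size are toy placeholders rather than the tuned model.

```python
# Univariate LSTM forecasting of an average-speed-of-answer (ASA) series.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

series = np.sin(np.linspace(0, 50, 500)) * 30 + 60      # toy ASA series (seconds)
window = 24

X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X.reshape((X.shape[0], window, 1))                  # (samples, timesteps, features)

model = Sequential([LSTM(32, input_shape=(window, 1)), Dense(1)])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, verbose=0)
print(model.predict(X[-1:], verbose=0))                 # next-step ASA forecast
```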
Full Text
Fall 2021
A COMPREHENSIVE EVALUATION ON THE APPLICATIONS OF DATA AUGMENTATION, TRANSFER LEARNING AND IMAGE ENHANCEMENT IN DEVELOPING A ROBUST SPEECH EMOTION RECOGNITION SYSTEM
Kyle Philip Calabro, M.S. Data Science
Within this thesis work, the applications of data augmentation, transfer learning, and image enhancement techniques were explored in great depth with respect to speech emotion recognition (SER) via convolutional neural networks and the classification of spectrogram images. Speech emotion recognition is a challenging subset of machine learning with an incredibly active research community. One of the prominent challenges of SER is a lack of quality training data. The methods developed and presented in this work serve to alleviate this issue and improve upon the current state-of-the-art methodology. A novel unimodal approach was taken in which five transfer learning models pre-trained on the ImageNet data set were used with both the feature extraction and fine-tuning methods of transfer learning. These transfer learning models include VGG-16, VGG-19, InceptionV3, Xception, and ResNet-50. A modified version of the AlexNet deep neural network model was utilized as a baseline for deep neural networks that were not pre-trained. Two speech corpora were utilized to develop these methods: the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and the Crowd-sourced Emotional Multimodal Actors Dataset (CREMA-D). Data augmentation techniques were applied to the raw audio of each speech corpus to increase the amount of training data, yielding custom data sets. Raw audio data augmentation techniques include the addition of Gaussian noise, stretching by two different factors, time shifting, and shifting pitch by three separate tones. Image enhancement techniques were implemented with the aim of improving classification accuracy by unveiling more prominent features in the spectrograms. Image enhancement techniques include conversion to grayscale, contrast stretching, and the combination of grayscale conversion followed by contrast stretching. In all, 176 experiments were conducted to provide a comprehensive overview of all techniques that were proposed as well as a definitive methodology. This methodology yields improved or comparable results to what is currently considered state-of-the-art when deployed on the RAVDESS and CREMA-D speech corpora.
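The feature-extraction flavor of transfer learning on spectrogram images can be sketched with TensorFlow/Keras as follows; the frozen VGG-16 base and eight-class head are illustrative, and the random arrays stand in for actual spectrogram batches.

```python
# Feature extraction with an ImageNet-pre-trained VGG-16 base and a small emotion classifier head.
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                                  # feature extraction: freeze ImageNet weights

x = GlobalAveragePooling2D()(base.output)
out = Dense(8, activation="softmax")(x)                 # e.g. eight emotion classes
model = Model(base.input, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

spectrograms = np.random.rand(4, 224, 224, 3).astype("float32")   # toy spectrogram batch
labels = np.array([0, 3, 5, 7])
model.fit(spectrograms, labels, epochs=1, verbose=0)
```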
Full Text