Project Portfolio for Kara C. Hoover, PhD

About Me

I’m a biological anthropologist and data scientist who bridges academic research, policy analysis, and applied analytics. My work spans human biological variation and adaptation (from biomechanics and genetic epidemiology to olfactory evolution and sensory inequities) alongside data-driven policy research in international science strategy, global R&D trends, and social equity. I specialize in translating complex data into actionable intelligence through advanced analytics (NLP, machine learning, network analysis), interactive visualizations, and reproducible workflows in R, Python, and SQL. I have experience leading technical teams across academic, government, and industry settings.

See also GitHub (projects), ORCID (publications), and Google Scholar (publications with impact metrics).

Applications

NSF Office of International Science Award Browser
Project Highlight: The Shiny application NSF-OISE-AwardBrowser was written in R and relies on a subset of NSF award data that is loaded into the app at launch. The subset includes awards from the Office of International Science and Engineering (OISE) from 2010 to 2020. The goal is to visualize the distribution of awards by country and region. The user can: 1) select a specific year or tick a box to see a time series animation across the years covered by the app (2010-2020); 2) choose between a linear model, which draws a straight regression line through the data, and a loess model, which fits a curved line; 3) toggle the standard error around the line on and off; 4) choose among four color-blind friendly color schemes or four color schemes inspired by the director Wes Anderson.
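The difference between the two fit options can be illustrated outside the app. The sketch below is in Python rather than R and is not the app's code. It runs on toy data (y = x squared), the function names fit_line and loess_point are hypothetical, and the local fit is a simplified, tricube-weighted stand-in for a full loess implementation. It does not compute the standard error.

```python
def fit_line(xs, ys, weights=None):
    """Weighted least-squares line; returns (intercept, slope)."""
    if weights is None or sum(weights) == 0:
        weights = [1.0] * len(xs)  # fall back to an ordinary (unweighted) fit
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return my - slope * mx, slope


def loess_point(x0, xs, ys, span=0.5):
    """Local linear fit evaluated at x0 (simplified loess-style smoother).

    Points near x0 get tricube weights; points beyond the bandwidth get 0.
    """
    k = max(2, int(round(span * len(xs))))        # number of neighbours used
    bandwidth = sorted(abs(x - x0) for x in xs)[k - 1] or 1.0
    weights = [(1 - min(abs(x - x0) / bandwidth, 1.0) ** 3) ** 3 for x in xs]
    intercept, slope = fit_line(xs, ys, weights)
    return intercept + slope * x0


# Toy data (an assumption for illustration): a curved relationship.
xs = list(range(11))
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)          # one straight line for all data
line_at_5 = intercept + slope * 5
curve_at_5 = loess_point(5, xs, ys)          # local fit follows the curve

print(f"linear model at x=5: {line_at_5:.2f}")
print(f"local fit at x=5:    {curve_at_5:.2f}")
```

On curved data like this, the single straight line misses the middle of the curve, while the local fit stays closer to the observed values, which is why a loess line appears curved.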

Indicators of Democracy Interactive Explorer
Project Highlight: The Shiny application Women Demographic Outcomes was written in R and relies on a subset of V-Dem data that is loaded into the app at launch. The subset includes indicators for women’s empowerment and demographic outcomes, such as life expectancy and mortality rates. The goal is to visualize the relationship between women’s empowerment and demographic outcomes. The data demonstrate that empowering women results in better outcomes (e.g., longer life, lower mortality). The user can: 1) select a specific year or tick a box to see a time series animation across the years covered by the app (1996-2019); 2) choose between a linear model, which draws a straight regression line through the data, and a loess model, which fits a curved line; 3) toggle the standard error around the line on and off; 4) choose among four color-blind friendly color schemes or four color schemes inspired by the director Wes Anderson (muted palettes).

Next Word Prediction Using Natural Language Processing
Project Highlight: The Shiny application Next Word was written in R and relies on a language model created from a small media corpus to predict the next word in a sentence. The UX presents a text box that is dynamically reactive: you can add words to the original text and get new predictions. Users see a graph of 10 possible next words with the probability of each. If no word is found, an empty graph is displayed. Each set of results includes metadata on where the prediction was made. The model relies on n-grams of length 2 to 4 (word combinations of 2, 3, or 4) as found in the corpus. The output includes a summary of which n-gram was used to predict, the total possibilities, and the matches found.
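The general idea of n-gram prediction can be sketched in a few lines. The example below is in Python rather than R and is not the app's model. It assumes a simple back-off strategy (try the longest n-gram first, then shorter ones), uses a one-sentence toy corpus in place of the media corpus, and the function names build_ngram_tables and predict_next are hypothetical.

```python
from collections import Counter, defaultdict


def build_ngram_tables(tokens, orders=(2, 3, 4)):
    """For each n, map every (n-1)-word context to counts of the next word."""
    tables = {n: defaultdict(Counter) for n in orders}
    for n in orders:
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            tables[n][context][tokens[i + n - 1]] += 1
    return tables


def predict_next(tables, text, top_k=10):
    """Back off from the longest n-gram to the shortest until one matches."""
    words = text.lower().split()
    for n in sorted(tables, reverse=True):
        if len(words) < n - 1:
            continue  # not enough words typed for this n-gram order
        followers = tables[n].get(tuple(words[-(n - 1):]))
        if followers:
            total = sum(followers.values())
            return {
                "ngram_order": n,              # which n-gram made the prediction
                "matches": total,              # occurrences of the context
                "candidates": len(followers),  # distinct possible next words
                "predictions": [(w, c / total)
                                for w, c in followers.most_common(top_k)],
            }
    # No context matched: the caller can draw an empty graph.
    return {"ngram_order": None, "matches": 0, "candidates": 0,
            "predictions": []}


# Toy corpus (an assumption for illustration only).
corpus = "the cat sat on the mat and the cat ate the fish".split()
tables = build_ngram_tables(corpus)

print(predict_next(tables, "on the"))
print(predict_next(tables, "the"))
print(predict_next(tables, "unseen"))
```

Each result carries the n-gram order that produced it along with the match counts, mirroring the kind of summary the app reports next to its predictions.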

Calculator for Speed Time and Stopped Distance
Project Highlight: The Shiny application SpeedStopTimes was written in R and predicts car speed based on a stopping time entered with a slider (20-120 mph). The output is a plot of the data used to make the prediction; the regression line can be shown or hidden using a checkbox.

Data Analytics and Visualization

Global R&D Leadership Forecasting and Policy Modelling
Project Highlight: an example of international research landscape analysis and benchmarking using the Dimensions COVID-19 sandbox in Google BigQuery, with a SQL and Python ETL pipeline and project rendering in R and Quarto.

Time Series Models Forecast Shift in Global R&D Leadership from West to Emerging Regions as Populations Age
Project Highlight: predictive modelling for the impact of aging populations on scientific productivity by global region.

Terrorism Endangers Ape Conservation
Project Highlight: primate endangerment from terrorist activities in Central Africa using geospatial analysis.

Terrorism In China is Concentrated in Uyghur Areas
Project Highlight: terrorist activities in China are shown to be concentrated in Uyghur ethnic areas through geospatial analysis.

Access to Health Care Workers Increases Life Expectancy
Project Highlight: health disparities from lack of access to care visualized via interactive plots.

Gendered Economic Specialization May be Harmful to Women’s Health in the Early Modern Ryukyu Islands
Project Highlight: variation in women’s and men’s health and resilience analyzed using regression by group and data visualization.

Two-Century Trend Analysis Finds that Wealth Falls Short in Safeguarding Democracy
Project Highlight: animated visualizations showing the role of education in safeguarding democracy.

The Presence of Women in Governance is Positively Correlated with Women’s Empowerment across Regions
Project Highlight: time series animated plots and multiple-correlation analysis to examine the role of equity and quality in suffrage and access to education in fostering inclusive governance.

Variation in the Occipital Bone Can Be Used to Predict Activity Patterns to a Limited Degree
Project Highlight: classifying biomechanical variation due to occupation through data mining, data wrangling, data cleaning, and machine learning.

Human Olfaction Has an Ecological Element that Alters Perception of Food Smells
Project Highlight: place-based variation in odor identification using correspondence analysis and odds ratio modelling.

Fitness Tracking: Performance versus Regularity
Project Highlight: machine learning using random forest models to determine what factors best describe athletic performance.

Water-Related Extreme Weather Events Have the Greatest Human Population Health and Economic Impacts in the USA (1996 to 2011)
Project Highlight: data visualization of the weather events causing the most deaths and injuries in the USA.