Python • Spark • Streamlit • ML

Health Record Insights — Data Science Pipeline

Four-phase UB CSE 587 project: data cleaning, ML modeling, PySpark scale-out, and a Streamlit data product on public health survey data.

Project Overview

An end-to-end data science pipeline that takes raw BRFSS health records through data cleaning, scikit-learn modeling, and Spark notebooks, and ends in an interactive Streamlit app for predictions and exploration.
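As a rough illustration of the first two phases (cleaning, then scikit-learn modeling), here is a minimal sketch. It does not reproduce the project's actual code: the synthetic data, the column names (`bmi`, `age`, `has_condition`), the helper functions, and the choice of logistic regression are all assumptions made for the example, and no real BRFSS data is used. The Spark and Streamlit phases are not shown.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_toy_survey(n_rows=400, seed=0):
    """Build a synthetic stand-in for survey rows (hypothetical, not BRFSS data)."""
    rng = np.random.default_rng(seed)
    bmi = rng.normal(28.0, 5.0, n_rows)
    age = rng.integers(18, 80, n_rows).astype(float)
    # Hypothetical label: higher BMI and age raise the probability of the outcome.
    logit = 0.15 * (bmi - 28.0) + 0.04 * (age - 50.0)
    has_condition = (rng.random(n_rows) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
    df = pd.DataFrame({"bmi": bmi, "age": age, "has_condition": has_condition})
    # Inject missing values so the cleaning step has something to do.
    df.loc[df.sample(frac=0.05, random_state=seed).index, "bmi"] = np.nan
    return df


def clean(df):
    """Cleaning phase: drop rows with missing values, then exact duplicates."""
    return df.dropna().drop_duplicates().reset_index(drop=True)


def train(df):
    """Modeling phase: fit a scaled logistic regression, return it with held-out accuracy."""
    X = df[["bmi", "age"]]
    y = df["has_condition"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y
    )
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


raw = make_toy_survey()
cleaned = clean(raw)
model, accuracy = train(cleaned)
print(f"rows: {len(raw)} -> {len(cleaned)}, held-out accuracy: {accuracy:.2f}")
```

Wrapping the scaler and classifier in one scikit-learn pipeline means a fitted `model` can be saved and reloaded as a single object, which is one plausible way a Streamlit app could serve predictions without repeating the preprocessing by hand.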