Four-phase UB CSE 587 project on public health survey data: data cleaning, ML modeling, PySpark scale-out, and a Streamlit data product.
End-to-end data science pipeline that takes raw BRFSS (Behavioral Risk Factor Surveillance System) survey records through data cleaning, scikit-learn modeling, and PySpark notebooks, ending in an interactive Streamlit app for prediction and exploration.