London Fashion Week 2018 Visualization
Holition, London, UK
Summer 2018


Holition is a digital creative studio specializing in emerging technologies, such as augmented reality (AR), interactive data visualization, CGI, projection, and video. It partners with cutting-edge brands to craft interactive experiences that better engage the consumer and more authentically reflect core brand values. This past summer, I completed a 10-week data science internship at Holition’s London office as part of my Brown University Master’s in Data Science Capstone. I was involved in two core projects at Holition: the London Fashion Week S/S 2018 visualization and an internal Holition digital curation tool, a mechanism for pooling and analyzing information from online sources to reveal emerging trends. Note that the final report shown here covers the London Fashion Week component only.

The British Fashion Council commissioned Holition to design the art installation. Collaborating partners included Google, Condé Nast, and Pulsar. I worked with a select team of interns at Holition to create the algorithmic tree visualization, which showed the daily growth in mentions and impressions over the past three London Fashion Weeks for 10 important social topics in the fashion industry (e.g., sustainability and diversity). As the primary data scientist, I performed initial brand identity research, helped craft a compelling message that aligned with the core principles of the British Fashion Council, identified and evaluated relevant data streams, analyzed data to reveal insights, and helped program the final Fashion Week visualization. The art installation was shown for the full week of London Fashion Week S/S 2018 at 180 The Strand.

Link to Capstone report
Installation video


Painting Classification Using CNNs
Brown University, Providence, RI
Fall 2017


In recent years, museums and online art collectives around the world have started to digitize their art collections. A proportion of these high-resolution images have been made available in the public domain, while others remain private. The influx of images online has given scientists and researchers an opportunity to analyze artwork for possible correlations. In the past, neural networks have been used for pattern recognition and to produce pattern transfer (e.g., Google DeepDream). In addition, several studies have been conducted on painting image classification (Northwestern and Stanford).

In this project I worked with two of my peers, Sean Miller and William Jordan. We used Convolutional Neural Networks (CNNs) to classify paintings based on two attributes (painter and style). We built our database from over 15,000 web-scraped images, representing 12 styles and 28 artists. We obtained a validation accuracy of 40% for the final style classification CNN. For the artist classification CNN we obtained an accuracy of 53%. If running time were not an issue, we would likely be able to greatly improve upon these accuracies. In the future, we would like to explore the data through an interactive visualization such as the Embedding Projector by TensorFlow. This could be helpful for museums, art historians, and art enthusiasts seeking to better understand artistic correlations between paintings.

For this project we conducted research under the guidance and mentorship of Dr. Serre from Cognitive, Linguistic, and Psychological Sciences (CLPS) at Brown University.

The project is documented on my Medium blog:

Data acquisition and storage
Status update
Final ML
Interactions and summary


Understanding Racial Bias in Police Killings
Brown University, Providence, RI
Spring 2018


This was my final project in Data 2020: Probability, Statistics, and Machine Learning: Advanced Methods. The goal was to develop a better understanding of the factors that affect racial bias in police shootings in the United States. I was provided two datasets. The first dataset, from FiveThirtyEight, contains police killings in the U.S. during the first half of 2015 (up until June). Some of the variables included here are raceethnicity, month, city, and age. The second dataset offers U.S. census information from the American Community Survey in 2015. This dataset contains CensusTract information, with variables such as Poverty, Income, TotalPop, and Unemployment.

In this report, I describe my approach to developing an accurate model that can be used to better understand the underlying factors that affect racial bias in police killings. The report includes initial data preprocessing and aggregation, exploratory analysis, model building, model selection, and model interpretation. Finally, I summarize my results and provide an analysis of the larger ethical implications surrounding the research.

Link to report


Predicting Income Levels in New York City
Brown University, Providence, RI
Spring 2018


The goal of this project was to use New York City Census Data (2015) to create two sensible models for predicting median household income and income per capita in New York City and the surrounding area. This was a collaborative team project involving two of my peers, Yiwen Shen and Zhiwei Zhang.

We incorporated two analytical approaches, Stepwise Selection and Shrinkage Methods, to select a subset of predictors. Using the variables selected, we performed multiple linear regression to model median household income and income per capita as a linear combination of the potential predictors. Multiple linear regression is a useful statistical model for predicting a response on the basis of multiple predictor variables. Starting from that baseline model, we added interaction terms and performed transformations on variables based on exploratory analysis and model diagnostics. Finally, we provided an interpretation for each model.
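
To give a feel for the stepwise part of this workflow, here is a minimal sketch of forward stepwise selection followed by a multiple linear regression fit. It is not the code from the report. The data is synthetic, the predictor names (poverty, unemployment, noise), the helper functions, and the stopping rule (a fixed number of terms chosen by residual sum of squares) are all assumptions made for illustration, and the actual project may have used a different selection criterion.

```python
import random

def ols_fit(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         + [sum(X[i][a] * y[i] for i in range(n))]
         for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

def design(columns, names):
    """Design matrix: an intercept column followed by the named predictors."""
    n = len(next(iter(columns.values())))
    return [[1.0] + [columns[name][i] for name in names] for i in range(n)]

def rss(columns, names, y):
    """Residual sum of squares of the OLS fit that uses the named predictors."""
    X = design(columns, names)
    beta = ols_fit(X, y)
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def forward_stepwise(columns, y, max_terms):
    """Greedily add the predictor that lowers RSS the most, up to max_terms."""
    chosen = []
    while len(chosen) < max_terms:
        remaining = [name for name in columns if name not in chosen]
        if not remaining:
            break
        best = min(remaining, key=lambda name: rss(columns, chosen + [name], y))
        chosen.append(best)
    return chosen

def make_synthetic_data(n=200, seed=0):
    """Hypothetical data: income depends on poverty and unemployment only."""
    rng = random.Random(seed)
    columns = {
        "poverty": [rng.uniform(5, 40) for _ in range(n)],
        "unemployment": [rng.uniform(2, 15) for _ in range(n)],
        "noise": [rng.gauss(0, 1) for _ in range(n)],  # irrelevant predictor
    }
    y = [60 - 0.8 * p - 0.5 * u + rng.gauss(0, 1)
         for p, u in zip(columns["poverty"], columns["unemployment"])]
    return columns, y

columns, income = make_synthetic_data()
chosen = forward_stepwise(columns, income, max_terms=2)
beta = ols_fit(design(columns, chosen), income)  # intercept, then one slope per chosen predictor
print(chosen, [round(b, 2) for b in beta])
```

In this toy setup the irrelevant predictor should be left out, and the fitted slopes should land close to the values used to generate the data. An interaction term could be tried in the same way by adding a product column (for example, poverty times unemployment) to the candidate set.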

Link to report
Link to project website