Over the past several posts, I gave a brief overview of the project and wrote about each step, from data wrangling to machine learning. There is a lot of room for improvement in this project, from automating the web scraping and tweaking the machine learning algorithm to taking other interesting variables into consideration. However, I am very glad that I could show that we can get a reasonably decent clustering result using the color coordinates as features. In this post, I want to discuss future directions for this project and introduce some interesting ideas.
Now that we know each color theme can be represented as a 5-point spatial pattern in 3D space, we can use an unsupervised learning algorithm, specifically a clustering algorithm, to group themes that have similar patterns into a chosen number of clusters.
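To make the idea concrete, here is a minimal sketch of one way to cluster themes. It assumes each theme is flattened into a 15-dimensional vector (5 colors x 3 coordinates) and uses a plain k-means loop written with the standard library only. The function names (`flatten`, `kmeans`), the number of clusters, and the Lab values are all made up for illustration, and this is not necessarily the algorithm used in the project. Note that flattening makes the result depend on the order of the colors within a theme.

```python
import random

def flatten(theme):
    """Turn a theme (5 colors x 3 coordinates) into one 15-dimensional vector."""
    return [coord for color in theme for coord in color]

def sq_dist(p, q):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means: returns (labels, centers). Illustrative sketch only."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct points as initial centers
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:  # nothing moved, so we have converged
            break
        centers = new_centers
    return labels, centers

# Made-up Lab themes: one base pattern, shifted in lightness (L).
base = [(20, 10, -30), (25, 40, 20), (30, -20, 15), (15, 5, 5), (22, -35, -10)]

def shifted(theme, dl):
    """Return a copy of the theme with every L value increased by dl."""
    return [(L + dl, a, b) for L, a, b in theme]

themes = [shifted(base, d) for d in (0, 5, 60, 65)]  # two dark, two light
points = [flatten(t) for t in themes]
labels, centers = kmeans(points, k=2)
print(labels)
```

With this toy data, the two dark themes end up in one cluster and the two light themes in the other. On real data, the choice of `k` and the initialization matter much more.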
Before analyzing the color data, we should first know that the RGB color space does not reflect the nonlinearity of human color perception. A color space that is considered much more perceptually uniform is the CIELAB (aka Lab) color space. In this space, each color is represented by three coordinates: L (lightness), a (red-greenness), and b (yellow-blueness). L ranges from 0 to 100, and a higher L value means a lighter color. The range of a and b depends on the implementation, but it is normally about [-128, 128]. Positive values correspond to "warm" colors: the more positive a (or b) is, the redder (or the more yellow) the color.
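For reference, here is a minimal sketch of the standard sRGB to Lab conversion in plain Python. It assumes 8-bit sRGB input and the D65 reference white. The function name `srgb_to_lab` is made up for this example, and the project itself may well have used a library for this step instead.

```python
def srgb_to_lab(r, g, b):
    """Convert 8-bit sRGB values (0-255) to CIELAB, assuming a D65 white point."""

    def linearize(c):
        # Undo the sRGB gamma encoding to get linear-light values in [0, 1].
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # Linear RGB -> CIE XYZ, using the sRGB primaries (D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    # D65 reference white used to normalize XYZ.
    xn, yn, zn = 0.95047, 1.0, 1.08883

    def f(t):
        # Cube root with a linear segment near zero, as in the CIELAB definition.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    lightness = 116 * fy - 16
    a_star = 500 * (fx - fy)
    b_star = 200 * (fy - fz)
    return lightness, a_star, b_star

# Pure red should come out "warm" (positive a and b); pure blue has negative b.
print(srgb_to_lab(255, 0, 0))
print(srgb_to_lab(0, 0, 255))
```

White maps to approximately (100, 0, 0) and black to (0, 0, 0), which matches the description of the L axis above.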
Exploratory data analysis (EDA) is a process where you figure out the general features of your data. It's a close and casual conversation between you and the data, and a very fun process! You make histograms, scatter plots, or bar plots to see what your data looks like. This can give you a general idea about your data, help you discover interesting facts about it, and finally guide you in the right direction towards your goal. I always go back and forth between applying models/algorithms and EDA for these reasons.
Before we build a clean dataset, let's take a look at an example of the JSON responses we scraped from the website. This one is the JSON response for the first theme.
This is an introduction to a toy project that I worked on in 2015 when I was applying for the Insight Data Science Fellowship. This was my first data science project (still unfinished), using unsupervised learning to cluster popular color themes in Adobe Kuler. I will talk about the important steps in the project in the following posts.