Reflection on Tech Policy Workshop at the Center for Applied Data Ethics at USF

As a data scientist working in industry, I frequently witness the impact a machine learning application can make. This impact often has a cascade of downstream effects that are inconceivable to a data scientist without enough domain knowledge. Nevertheless, under the widespread motto in the tech industry, "move fast and break things," ML practitioners tend to …

NIPS 2017 symposium and workshop: interpretable and Bayesian machine learning

Last week, I attended the NIPS (Neural Information Processing Systems) 2017 conference in Long Beach, CA. This was my first time attending NIPS. What is NIPS? NIPS is one of the largest machine learning (ML) / artificial intelligence (AI) conferences in the world. The NIPS conference consists of three programs: tutorials, main event (which includes symposium) and …

Future directions for the Kuler project

Over the past several posts, I have given a brief overview of the project and written about each step, from data wrangling to machine learning. There is a lot of room for improvement in this project, from automating the webscraping and tweaking the machine learning algorithm to taking other interesting variables into consideration. However, I am very glad that I could show that we can obtain a reasonably decent clustering result using the color coordinates as features. In this post, I want to discuss future directions of this project and introduce some interesting ideas.

Color perception and color-space conversion

Before analyzing the color data, we should first know that the RGB color space does not represent the nonlinearity of human color perception well. A color space that is considered the most perceptually uniform is the CIELab (aka Lab) color space. In this space, each color is represented by three coordinates: L (lightness), a (red-greenness), and b (yellow-blueness). L ranges from 0 to 100, and a higher L value means a brighter color. The range of a and b depends on the device, but it is normally [-128, 128]. Positive values correspond to "warm" colors: the more positive a (or b) is, the redder (or the more yellow) the color.
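To make the conversion concrete, here is a minimal sketch of going from RGB to Lab using only the Python standard library. It is not necessarily the code used for this project. It assumes the input is 8-bit sRGB and uses the D65 reference white, and the function name rgb_to_lab is just an illustrative choice.

```python
# Sketch: 8-bit sRGB -> CIE XYZ -> CIELab, assuming a D65 reference white.
# rgb_to_lab is a hypothetical helper name, not part of any library.

D65_WHITE = (0.95047, 1.00000, 1.08883)  # reference white (Xn, Yn, Zn)


def _srgb_to_linear(c):
    """Undo the sRGB gamma curve for one channel given in [0, 255]."""
    c = c / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def _f(t):
    """Nonlinear compression used in the CIELab definition."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3.0 * delta ** 2) + 4.0 / 29.0


def rgb_to_lab(r, g, b):
    """Convert an 8-bit sRGB triple to (L, a, b)."""
    rl, gl, bl = (_srgb_to_linear(v) for v in (r, g, b))

    # Linear sRGB -> XYZ (standard sRGB/D65 matrix).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    fx = _f(x / D65_WHITE[0])
    fy = _f(y / D65_WHITE[1])
    fz = _f(z / D65_WHITE[2])

    L = 116.0 * fy - 16.0      # lightness, 0 (black) to 100 (white)
    a = 500.0 * (fx - fy)      # positive = red, negative = green
    b_ = 200.0 * (fy - fz)     # positive = yellow, negative = blue
    return L, a, b_


white = rgb_to_lab(255, 255, 255)
black = rgb_to_lab(0, 0, 0)
red = rgb_to_lab(255, 0, 0)
blue = rgb_to_lab(0, 0, 255)
```

With this sketch, pure red comes out with positive a and b (a "warm" color), while pure blue has a strongly negative b, which matches the sign convention described above.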

Analyzing user activity of Adobe Kuler

Exploratory data analysis (EDA) is a process where you figure out the general features of your data. It's a close and casual conversation between you and the data, and a very fun process! You make histograms, scatter plots, or bar plots to see what your data looks like. This can give you a general idea about your data, help you discover interesting facts about it, and finally guide you in the right direction towards your goal. I always go back and forth between applying models/algorithms and EDA for these reasons.

Webscraping: XML and JSON

When you click a theme on the Kuler website, it shows the theme's page, where you can see an enlarged image of the theme and other information. On the right side, you can see the "Action" and "Info" frames. The latter has the following information:

Author of the theme ("Created By"): nominal
Date created ("Created"): ordinal
Number of views ("Viewed"): quantitative
Rating: quantitative (shown in number of stars)
Number of likes ("Appreciated By"): quantitative
Tags: nominal
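
As a rough illustration, the fields listed above could be held in a small typed record once they have been scraped. This is only a sketch: the JSON keys, the sample values, and the names ThemeInfo and parse_theme_info are all made up for this example and do not reflect the actual format returned by the Kuler website.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass
class ThemeInfo:
    """One theme's "Info" frame; field names are hypothetical."""
    author: str     # "Created By": nominal
    created: date   # "Created": ordinal
    views: int      # "Viewed": quantitative
    rating: float   # Rating in stars: quantitative
    likes: int      # "Appreciated By": quantitative
    tags: list      # Tags: nominal


def parse_theme_info(raw_json):
    """Build a ThemeInfo from a JSON string that uses the assumed keys."""
    d = json.loads(raw_json)
    return ThemeInfo(
        author=d["author"],
        created=date.fromisoformat(d["created"]),
        views=int(d["views"]),
        rating=float(d["rating"]),
        likes=int(d["likes"]),
        tags=list(d["tags"]),
    )


# Made-up sample record, only to show the shape of the data.
sample = (
    '{"author": "example_user", "created": "2014-05-01", "views": 120, '
    '"rating": 4, "likes": 7, "tags": ["warm", "sunset"]}'
)
info = parse_theme_info(sample)
```

Keeping the variable type next to each field makes it easier to decide later which summaries or plots are appropriate for it.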