Viewer's Choice

Selfiecity is a project that aims to investigate selfies in five cities across the world (Bangkok, Berlin, Moscow, New York, and Sao Paulo) using a mix of theoretic, artistic and quantitative methods. It contains some rich media visualizations called imageplots to showcase interesting patterns with hundreds of photos. The selfexploratory section is another interactive visualization that lets the users play around and navigate the whole dataset. Finally, the creators present their findings about the demographics of the people taking the selfies, their poses and expressions.

Data

The project is based on a unique dataset compiled by analysing tens of thousands of images from each city, through both automatic image analysis and human judgement. The image below shows an overview of the data collection process. To begin with, the Selfiecity creators randomly selected 120,000 photos (20,000-30,000 photos per city) from a total of 656,000 images they collected on Instagram. Two to four Amazon Mechanical Turk workers tagged each photo as showing a single selfie or not. Out of these, 1,000 ‘selfies’ were selected for each city and given to three Mechanical Turk master workers (more skilled workers), who again verified that each photo shows a single selfie and also guessed the age and gender of the person. On the resulting set of selfie images, the team ran automatic face analysis, which supplied algorithmic estimates of eye, nose and mouth positions, the degrees of different emotional expressions, etc. As the final step, one or two members of the project team examined all these photos manually. While most photos were tagged correctly, the team found some mistakes. To keep the data size the same across cities (and so make the visualizations comparable), the final set contains 640 selfie photos for every city.

Fig. 1 - Data Collection Process
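
To make the collection steps concrete for myself, I wrote a minimal Python sketch of two of them: combining the worker tags and trimming every city to the same size. The majority-vote rule, the random trimming, the function names and the record fields are all my assumptions; the project does not say how the two to four tags per photo were combined or which photos were dropped to reach 640.

```python
import random
from collections import defaultdict

def is_selfie(votes):
    """Assumed rule: keep a photo only if a strict majority of workers tagged it a selfie."""
    return sum(votes) * 2 > len(votes)

def equalize_by_city(photos, seed=0):
    """Trim every city's verified set to the size of the smallest one (assumed strategy)."""
    by_city = defaultdict(list)
    for photo in photos:
        by_city[photo["city"]].append(photo)
    target = min(len(group) for group in by_city.values())
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return {city: rng.sample(group, target) for city, group in by_city.items()}

# Toy records invented purely for illustration; they are not project data.
photos = [
    {"id": 1, "city": "Berlin", "votes": [True, True, False]},
    {"id": 2, "city": "Berlin", "votes": [True, True]},
    {"id": 3, "city": "Berlin", "votes": [False, False, True, False]},
    {"id": 4, "city": "Moscow", "votes": [True, True, True]},
    {"id": 5, "city": "Moscow", "votes": [True, False]},
]

verified = [p for p in photos if is_selfie(p["votes"])]
balanced = equalize_by_city(verified)
```

With these toy records, Berlin ends up with two verified photos and Moscow with one, so both cities are trimmed to one photo each. The same idea, at scale, would explain why every city ends up with an identical count.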

Analysis

The first thing we see on opening the webpage is the introduction to the project next to a video montage of selfies from the dataset, grouped by city. The selfies are identically aligned with respect to the eye position and sorted by the head tilt angle. While this is a great idea, I thought the pace of the video was too fast. I would have slowed down the video so that each selfie gets about 2-3 seconds before it changes. I believe this would make it less distracting than it is now.

Fig. 2 - Project overview and montage video

As we scroll through the webpage, the first visualization is about the poses. The selfies are arranged by city in a grid. Within the grid, the photos are arranged horizontally by head tilt and vertically according to whether the person is looking up or down. This arrangement is not immediately evident but becomes clearer once the images are cropped to show just the faces. The visualization also lets the user see cropped and rotated photos, which again helps highlight the head tilt as you switch between the cropped-and-rotated and cropped-only modes. The images in the two edited modes are black and white, which also helps remove distracting elements and lets the user focus on the tilt. Where the visualization fails is that its purpose is not evident. It also does not show the whole dataset, only a subset that fits the grid.

Fig. 3 - Imageplot: Poses by city
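
The layout rule described above (tilt across, up/down gaze along the vertical) can be sketched as a simple binning function. This is only my guess at how such a grid could be built: the grid size, the angle limits, the sign convention (positive pitch means looking up) and the name `grid_cell` are hypothetical, not taken from the project.

```python
def grid_cell(tilt_deg, pitch_deg, cols=5, rows=3, max_tilt=45.0, max_pitch=30.0):
    """Map a head pose to a (row, col) cell; all limits here are assumed values."""

    def to_bin(value, limit, n):
        # Clamp to [-limit, limit], rescale to [0, 1], then pick one of n bins.
        clamped = max(-limit, min(limit, value))
        fraction = (clamped + limit) / (2 * limit)
        return min(int(fraction * n), n - 1)

    col = to_bin(tilt_deg, max_tilt, cols)      # left-tilted heads go to the left
    row = to_bin(-pitch_deg, max_pitch, rows)   # people looking up go to the top rows
    return row, col

center = grid_cell(0, 0)         # level, untilted head
top_left = grid_cell(-45, 30)    # strong left tilt, looking up
clamped = grid_cell(90, -90)     # out-of-range values are clamped to the edge cell
```

A grid like this can hold only one photo per cell (or a fixed number), which would be consistent with the plot showing a subset rather than the whole dataset.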

Moving on, we next see the gender and age profiles per city. Each city is shown as a histogram built from the selfies themselves, with age along the x-axis and the genders, male and female, shown above and below the axis respectively. At first glance, viewers can estimate the difference in the number of selfies between the genders. The median age is shown for both genders along with each gender's percentage of the total selfies, and it is easy to get an idea of the overall distribution by age. Hovering on the plot for any city reveals the age labels for the x-axis as well as the gender symbols next to the percentages, giving a clearer picture of the age distribution. Hovering on any image in the histogram shows an enlarged view of the selfie itself, adding an interesting element to the visualization - the viewer can look at the actual selfie and judge the age and gender of the person themselves.

Fig. 4 - Imageplot: Gender and age profiles by city(a)
Fig. 5 - Imageplot: Gender and age profiles by city(b)
Fig. 6 - Imageplot: Details shown on hover
Fig. 7 - Imageplot: Selfie shown on hover
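
The numbers behind such a plot (per-gender share, median age and age bins) are straightforward to compute. Below is a minimal sketch under my own assumptions: the record fields, the 5-year bin width and the function name `gender_age_profile` are hypothetical, and the ages are toy values rather than project data.

```python
from collections import Counter
from statistics import median

def gender_age_profile(selfies, bin_width=5):
    """Per gender: share of all selfies, median age, and counts per age bin."""
    total = len(selfies)
    profile = {}
    for gender in sorted({s["gender"] for s in selfies}):
        ages = [s["age"] for s in selfies if s["gender"] == gender]
        # Each age falls into the bin starting at the nearest lower multiple of bin_width.
        bins = Counter((age // bin_width) * bin_width for age in ages)
        profile[gender] = {
            "share_pct": round(100 * len(ages) / total, 1),
            "median_age": median(ages),
            "bins": dict(sorted(bins.items())),
        }
    return profile

# Toy records invented for illustration only.
selfies = [
    {"gender": "female", "age": 21},
    {"gender": "female", "age": 23},
    {"gender": "female", "age": 27},
    {"gender": "male", "age": 24},
    {"gender": "male", "age": 31},
]
profile = gender_age_profile(selfies)
```

In the imageplot, each count in `bins` would become a stack of that many selfie thumbnails, drawn above the axis for one gender and below it for the other.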

The next visualization is similar to the previous one but shows the smile distribution instead of age for both genders. As with the previous visualization, the first glance tells us the overall distribution for each city and hovering reveals more details. The x-axis here goes from a sad smiley to a happy smiley, which is straightforward and conveys a lot with just a tiny image.

Fig. 8 - Imageplot: Gender and smile distribution by city

For both of the above visualizations, the plots for the cities sit right next to each other, making it easy to compare across cities. Both visualizations also include small circles that, on hover, describe the trends observed in the plots. I thought creating the plots from the selfies themselves was a great choice, as it gives the user access to the actual data (the selfie itself) without obscuring the distribution of the data. It also adds a fun element to the visualization.

Fig. 9 - Popup showing observed trend(a)
Fig. 9 - Popup showing observed trend(b)

Scrolling on, we get to the selfexploratory section. This lets us navigate the whole dataset and experiment with the various filters provided: demographics (city, age, gender), pose, features such as eyes closed or open, and mood. The image below shows the view on launching this section, with all the filters on top and the dataset of selfies below. As we can see, the filters themselves show the distribution of the data for the filter field. Hovering over a filter gives more details, such as the numerical value of the tilt angle for the tilt filter. For binary filters such as the features, selecting or deselecting is done by clicking, whereas for filters that show a distribution, you can select a range by clicking at any point on the plot and dragging the pointer to the point you want. The whole selection of filters is intuitive and easy to use. Applying any filter is immediately reflected in the set of images shown, and we can also see the exact number of selfies that match our criteria on the left, just above the images. Something to note is that, on selecting any filter, not only is the selected filter highlighted in blue, but the rest of the filters are shown with a green/teal overlay highlighting the change in the distribution due to the selected filters. I thought this feature was really cool, as it allows us to compare the distribution of the filtered dataset with that of the total. The reset button is also a handy feature that lets us reset the filters to their defaults.

Fig. 10 - Selfexploratory: with default filters
Fig. 11 - Selfexploratory: with one city selected
Fig. 12 - Selfexploratory: with multiple filters
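
The overlay behaviour described above boils down to computing the same distribution twice, once for the full dataset and once for the filtered subset. Here is a minimal sketch of that idea; the helper names `in_range` and `distribution`, the field names and the toy records are my own assumptions, not the project's code or data.

```python
from collections import Counter

def in_range(selfies, field, low, high):
    """Range filter, like dragging across one of the distribution plots."""
    return [s for s in selfies if low <= s[field] <= high]

def distribution(selfies, field):
    """Count selfies per value of a field, for drawing that field's filter plot."""
    return Counter(s[field] for s in selfies)

# Toy records invented for illustration only.
selfies = [
    {"city": "Bangkok", "tilt": -12, "age": 22},
    {"city": "Bangkok", "tilt": 5, "age": 25},
    {"city": "Berlin", "tilt": 18, "age": 30},
    {"city": "Berlin", "tilt": -3, "age": 27},
]

selected = in_range(selfies, "tilt", -5, 10)         # the dragged tilt range
total_by_city = distribution(selfies, "city")        # base bars for the city filter
selected_by_city = distribution(selected, "city")    # overlay bars after filtering
match_count = len(selected)                          # the count shown above the images
```

Drawing `selected_by_city` on top of `total_by_city` gives the kind of comparison the green/teal overlay provides: how the filtered subset is distributed relative to the whole.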

Conclusion

The project is an interesting take on society's seeming obsession with taking selfies. However, I think the dataset is too small to support any conclusive findings: 640 images per city is too few. Also, the ages and genders are estimates rather than data provided by the people in the photographs, so the data is not 100% accurate. These points aside, I think it is a great initiative that can be expanded to derive various conclusions about the population of the cities in general.