New York City Restaurants Data Visualization
This final project examined the restaurants, the cuisines, and the food safety inspection grades across New York City. New York City has five boroughs: Manhattan, Queens, Brooklyn, the Bronx, and Staten Island. As one of the most diverse and most visited cities in the world, New York City has over 24,000 restaurants to serve its 8.5 million residents and the 60.5 million travelers who visit from around the world each year. Among the five boroughs, Manhattan has the most tourist attractions and draws the largest number of tourists. Queens is the largest borough by area and is often described as the most ethnically diverse urban area in the world. Brooklyn has the largest population. Considering the ethnic diversity and dynamic population of the city, this report aims to analyze some interesting facts about the restaurants in New York City, such as the neighborhood with the most restaurants, the cuisine with the most restaurants, and the area with the best food safety inspection grades.
The dataset used in this study was acquired from OpenData.gov. The dataset page indicates that it is supported by the Department of Health and Mental Hygiene and is updated daily. The copy used in this study was retrieved on November 16, 2017, and later updates were not included in the analysis.
The dataset was cleaned up before it was imported into Tableau. Some unnecessary dimensions were removed, including CAMIS (a unique identifier for the restaurant), phone number, inspection date, violation code, grade record date, and inspection type. The dimensions that were kept include business name, street address, borough, zip code, cuisine, and inspection grade. Since New York City is made up of many neighborhoods, it seemed ideal to include neighborhood geoinformation in this project. A table that matches borough, neighborhood, and zip code was found online and imported into Excel. The zip code to neighborhood mapping was then copied into the data file, and Excel's VLOOKUP function was used to append the neighborhood to each record in the original dataset based on its zip code. This gave me a dataset with three types of geoinformation (borough, neighborhood, and zip code) for creating my visualizations.
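The VLOOKUP step described above can be sketched outside Excel as a simple dictionary lookup. This is only an illustration of the idea, not the workflow actually used in the project: the field names (`zip`, `neighborhood`), the sample rows, the function `add_neighborhood`, and the "Unknown" fallback are all assumptions made for the example.

```python
# Hypothetical sample rows; the real dataset has many more records.
restaurants = [
    {"name": "Cafe A", "borough": "Manhattan", "zip": "10001", "cuisine": "American", "grade": "A"},
    {"name": "Diner B", "borough": "Queens", "zip": "11101", "cuisine": "American", "grade": "B"},
    {"name": "Grill C", "borough": "Brooklyn", "zip": "99999", "cuisine": "Thai", "grade": "A"},
]

# Illustrative zip code -> neighborhood table (two entries only).
zip_to_neighborhood = {
    "10001": "Chelsea and Clinton",
    "11101": "Northwest Queens",
}


def add_neighborhood(rows, lookup, missing="Unknown"):
    """Return copies of rows with a 'neighborhood' field looked up by zip code.

    Zip codes absent from the lookup table get the `missing` label,
    much like an unmatched VLOOKUP would need to be handled in Excel.
    """
    return [{**row, "neighborhood": lookup.get(row["zip"], missing)} for row in rows]


joined = add_neighborhood(restaurants, zip_to_neighborhood)
for row in joined:
    print(row["name"], row["zip"], row["neighborhood"])
```

Keeping zip codes as strings avoids losing leading zeros, and flagging unmatched zip codes explicitly makes gaps in the mapping table easy to spot before the data goes into Tableau.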
Another step was to categorize the cuisines. The original dataset contained over 80 different cuisines, and many of them had only a few restaurants. To present the restaurant information in terms of the more common cuisines, I sorted the restaurants by cuisine in Excel, kept the first 25 cuisines, and grouped the rest into a category called “Other” (see Fig. 1 for all grouped cuisines). This let me fit an appropriate number of cuisines into my visualizations.
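As a rough sketch of this grouping step, the snippet below keeps the most frequent cuisines and relabels everything else as "Other". It assumes the cuisines are ranked by number of restaurants, which is one reading of the Excel sorting described above. The function name `group_rare_cuisines` and the sample data are hypothetical.

```python
from collections import Counter


def group_rare_cuisines(cuisines, keep=25, other_label="Other"):
    """Relabel every cuisine outside the `keep` most frequent ones as `other_label`."""
    counts = Counter(cuisines)
    top = {name for name, _ in counts.most_common(keep)}
    return [c if c in top else other_label for c in cuisines]


# Small made-up example with keep=3 instead of 25.
sample = ["American"] * 5 + ["Chinese"] * 4 + ["Pizza"] * 3 + ["Afghan", "Basque"]
grouped = group_rare_cuisines(sample, keep=3)
print(Counter(grouped))
```

With `keep=25` the same function would reproduce the 25-plus-"Other" scheme, under the ranking assumption stated above.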
In this project, I conducted user research through interviews before creating the visualizations, to help me figure out what content or stories would interest people. Since restaurant selection is a topic that people often talk about, it was easy to set up informal interviews with three participants: two were colleagues at my practicum site and one was a schoolmate at Pratt. All participants considered themselves a ‘foodie’ in some way, and they expressed strong interest in learning more about NYC restaurants from the visualizations I would create. Each interview lasted around 15 minutes. An interview protocol was prepared in advance and followed during each session. Important notes were taken with pen and paper and were later analyzed into insights and findings to guide the visualization design.
I started each interview by asking about general preferences for dining out. For example, I asked about the typical occasions or reasons for dining out, how the participants usually select restaurants, and the major aspects they evaluate before settling on one or several choices. Then I gave them a list of the fields in my dataset and discussed what they would expect to see given these data. The first participant mentioned that she did not care much about the inspection grade or a specific cuisine when selecting a restaurant, but rather looked for restaurants she had never tried. The second participant paid a lot of attention to cleanliness when selecting a restaurant. She wanted to know which areas have the cleanest restaurants and would prefer to choose a restaurant from those areas. The third participant was mostly interested in cuisines. For instance, he was curious about which area has the most Japanese restaurants, because he would consider the Japanese restaurants in that area to be the most authentic.
It was helpful to learn that, when asked whether they would look for restaurants by borough, zip code, or neighborhood, all three agreed that neighborhood would be the most useful and straightforward geoinformation. After the interviews, I organized the notes into insights for the design of the visualizations. Based on these insights, I hand-sketched some rough ideas of how to visualize this information according to the user interests I learned about in the UX research.
Once the dataset and the interview insights were ready, I imported the dataset into Tableau Public. With the sketches at hand, it was quite easy to create several visualizations with different views. After thoroughly analyzing the implications of each view, I further designed four visualizations, each with a different story behind it, and combined them into a dashboard for an overall view of the project.
An Overall View through the Map
The visualization above (Fig. 2) shows restaurants distributed across the five boroughs on the map. A range of red shades was selected to indicate the number of restaurants in each zip code area. The darker an area, the more restaurants it contains, and vice versa. From the map view, we can clearly see that Manhattan has the most dark red areas, which shows that more restaurants are clustered in this borough. Hovering anywhere on the map shows the details of that zip code area, including borough, neighborhood, zip code, and number of restaurants (see Fig. 3 below). Filters are provided so that users can view details for a chosen cuisine. For example, a user who selects American food from the cuisine list will see the distribution of American restaurants in the city. Users can also view the restaurants by food safety inspection grade. In designing the map visualization, I aimed to present a straightforward overall view of the restaurant situation in NYC.
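The quantity behind the map colors is a count of restaurants per zip code, optionally narrowed by the cuisine and grade filters. The sketch below shows that aggregation in plain Python. It is not how Tableau computes it internally, and the field names, the function `restaurants_per_zip`, and the sample rows are assumptions for illustration.

```python
from collections import Counter


def restaurants_per_zip(rows, cuisine=None, grade=None):
    """Count restaurants per zip code, optionally filtered by cuisine and/or grade."""
    return Counter(
        row["zip"]
        for row in rows
        if (cuisine is None or row["cuisine"] == cuisine)
        and (grade is None or row["grade"] == grade)
    )


# Hypothetical sample rows.
rows = [
    {"zip": "10001", "cuisine": "American", "grade": "A"},
    {"zip": "10001", "cuisine": "Chinese", "grade": "B"},
    {"zip": "10036", "cuisine": "American", "grade": "A"},
    {"zip": "11101", "cuisine": "American", "grade": "B"},
]

all_counts = restaurants_per_zip(rows)
american_a = restaurants_per_zip(rows, cuisine="American", grade="A")
print(all_counts)
print(american_a)
```

Leaving a filter as `None` corresponds to selecting "all" in the dashboard filter, so the unfiltered call gives the default map view.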
Restaurant Details in Different Neighborhoods
The second visualization (see Fig. 4) is a bar chart. Each bar represents a neighborhood, and all neighborhoods are sorted by borough. Values on the y-axis represent the number of restaurants. Looking at the chart, users can immediately see which neighborhoods and boroughs have the most or the fewest restaurants. Hovering over a bar opens a pop-up window with detailed information about the neighborhood, including its most popular cuisine, determined by the number of restaurants. In the user research, one participant mentioned that she would like to know the most popular cuisine in a given neighborhood, so this metric was added to the bar chart to meet that need.
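The "most popular cuisine" shown in the pop-up can be thought of as the cuisine with the highest restaurant count within each neighborhood. A minimal sketch of that calculation follows. The function name `top_cuisine_by_neighborhood`, the field names, and the sample rows are hypothetical, and the tie-breaking rule is an assumption since the report does not state one.

```python
from collections import Counter, defaultdict


def top_cuisine_by_neighborhood(rows):
    """Return the restaurant count and most common cuisine for each neighborhood.

    Ties are broken by whichever cuisine appears first in the data
    (the behavior of Counter.most_common), which is an assumption here.
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["neighborhood"]][row["cuisine"]] += 1
    return {
        hood: {
            "restaurants": sum(c.values()),
            "top_cuisine": c.most_common(1)[0][0],
        }
        for hood, c in counts.items()
    }


# Hypothetical sample rows.
rows = [
    {"neighborhood": "Chelsea and Clinton", "cuisine": "American"},
    {"neighborhood": "Chelsea and Clinton", "cuisine": "American"},
    {"neighborhood": "Chelsea and Clinton", "cuisine": "Italian"},
    {"neighborhood": "Northwest Queens", "cuisine": "Greek"},
]

summary = top_cuisine_by_neighborhood(rows)
print(summary["Chelsea and Clinton"])
```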
Percentage of Grade A Restaurants in Different Neighborhoods
The food safety inspection grade is an important metric for some people. In the user research, one participant said that she would want to know which areas have more clean restaurants. The percentage of grade A restaurants in each neighborhood was calculated and imported into Tableau to create the circle chart below.
In this visualization, the color of each circle indicates the percentage of grade A restaurants in a neighborhood, and the size of the circle indicates the number of restaurants in that neighborhood. We can see that Chelsea and Clinton has the most restaurants, which is consistent with the other visualizations. The Lower Manhattan circle has the darkest color, indicating that its percentage of grade A restaurants is the highest in Manhattan. Inwood and Washington Heights has the lowest percentage of grade A restaurants, so it may be less appealing to people who take restaurant cleanliness into account. Using the filters, users can also see details for other boroughs or for the whole city. Since there are over 40 neighborhoods spread across the city, I think it is easier to present the visualization by borough in the initial view, while still giving users the option to change the geographical scope.
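The grade A percentage described above can be sketched as the share of a group's restaurants whose grade is "A". The report does not say how ungraded restaurants were treated, so this sketch assumes every row counts toward the denominator. The function `grade_a_share`, the field names, and the sample rows are made up for illustration and do not reflect the actual results.

```python
from collections import defaultdict


def grade_a_share(rows, key="neighborhood"):
    """Return {group: (restaurant_count, percent_grade_a)} for the given grouping key."""
    totals = defaultdict(int)
    grade_a = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        if row["grade"] == "A":
            grade_a[row[key]] += 1
    return {k: (totals[k], round(100 * grade_a[k] / totals[k], 1)) for k in totals}


# Hypothetical sample rows.
rows = [
    {"neighborhood": "Lower Manhattan", "cuisine": "Donuts", "grade": "A"},
    {"neighborhood": "Lower Manhattan", "cuisine": "American", "grade": "A"},
    {"neighborhood": "Lower Manhattan", "cuisine": "American", "grade": "A"},
    {"neighborhood": "Lower Manhattan", "cuisine": "Thai", "grade": "B"},
    {"neighborhood": "Inwood and Washington Heights", "cuisine": "American", "grade": "B"},
    {"neighborhood": "Inwood and Washington Heights", "cuisine": "Donuts", "grade": "A"},
]

by_neighborhood = grade_a_share(rows)
by_cuisine = grade_a_share(rows, key="cuisine")
print(by_neighborhood)
print(by_cuisine)
```

Each pair maps directly onto the chart encodings: the count drives circle size and the percentage drives color. Calling the same function with `key="cuisine"` yields the per-cuisine count and percentage that a popularity-versus-cleanliness chart would need.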
The Most Popular and the Cleanest Cuisine
The bubble chart (see Fig. 6) is an exploration that represents the popularity and the cleanliness level of each cuisine simultaneously. In this visualization, the percentage of grade A restaurants for each cuisine is placed on the x-axis, and the number of restaurants is placed on the y-axis. A cuisine that sits closer to the top right corner of the chart therefore tends to be both popular and clean. Looking at the x-axis and the y-axis individually, we can also compare the cleanliness and the popularity of different cuisines separately. It is interesting to see that although there are not many donut shops, their grade A percentage is very high: over 90% of donut shops are clean, followed by sandwich and cafe/coffee/tea. Among all cuisines, Indian restaurants have the lowest grade A percentage, followed by Asian fusion and Thai.
Moving forward with the project, there are several things I would like to revise given more time. These are problems uncovered either by myself or through user feedback after the visualizations were created.
First, on the current map view, each interactive area represents one zip code area. However, in the user research, all three participants ranked neighborhood as their preferred geoinformation over zip code or borough. I considered grouping the zip code areas by neighborhood, but realized this might not be feasible, because a single zip code area may be split between two different neighborhoods. Another solution might be to obtain the longitude and latitude of each restaurant, so that the data could be plotted on a map with neighborhood as the unit being reviewed.
Second, there is an inconsistency between the map view and the circle chart in the current visualizations. In the map view, the color range indicates the number of restaurants, while in the circle chart it indicates the percentage of grade A restaurants in a neighborhood (see Fig. 7).
Although each visualization labels what its color stands for, I think this might still confuse users. I tried swapping percentage and number of restaurants in the circle chart, so that color encodes the number of restaurants and size encodes the percentage; the result is shown in Fig. 8. This version did not work well, because the percentage range is so narrow that the size differences between the circles are very hard to distinguish. In the future, I would like to explore whether there is a better solution to this problem.
Lastly, it would be interesting to have datasets from past years in order to compare changes in the number of restaurants in a neighborhood or for a cuisine. For example, I am particularly interested in seeing whether any cuisine has become trendier or less popular in recent years, or whether any neighborhood has experienced a big increase or decrease in the number of restaurants, and the possible reasons behind that. With these ideas implemented, I believe this project would tell a more comprehensive and compelling visual story about restaurants in New York City.