Suppose we want to segment the neighborhoods of two cities, Toronto and New York, by the venues they contain. We also want to explore the average income and population density of each city’s boroughs.
The necessary data sets were location data for Toronto and New York neighborhoods, venue data, and population density and average income data for both cities. They were imported from Wikipedia and IBM open data, with venue data from the Foursquare API. Links to the data sets are listed at the end of this post.
We’ll discuss the process in two separate parts:
- Segmentation of the neighborhoods
- Exploring the Average income and population density
I’ll use Python and its various packages. So, let’s get started!
Segmentation of the neighborhoods
example code:
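import pandas as pd
import wikipedia as wp  # assumption: wp is the `wikipedia` package

# Toronto latitude/longitude per postal code (IBM-hosted CSV)
a = pd.read_csv('https://cocl.us/Geospatial_data')
gdf = pd.DataFrame(a)

# Toronto postal codes, boroughs and neighborhoods from Wikipedia
html = wp.page("List of postal codes of Canada: M").html().encode("UTF-8")
dfr = pd.read_html(html)[0]
df = pd.DataFrame(dfr)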
example code:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
import json

with open('newyork_data.json') as json_data:
    newyork_nigh = json.load(json_data)
ny_neigh = newyork_nigh['features']

# define the dataframe columns
ny_nighcolumn = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# collect one row per neighborhood
rows = []
for data in ny_neigh:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']
    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
ny_neighborhoods = pd.DataFrame(rows, columns=ny_nighcolumn)
example code:
CLIENT_ID = "Your Foursquare client ID"
CLIENT_SECRET = "Your Foursquare Client Secret"
VERSION = '20180605'
print('My credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

def getNearbyVenues(names, latitudes, longitudes, radius, LIMIT):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        ....
        ......
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)
        ...#some code here
    return(nearby_venues)
example code:
from sklearn.cluster import KMeans

# set number of clusters
t_kclusters = 5
# drop the label column so only the numeric features remain
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')
# run k-means clustering
kmeans_tor = KMeans(n_clusters=t_kclusters, random_state=0).fit(toronto_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans_tor.labels_[0:10]
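The clustering step takes `toronto_grouped`, a table with one row per neighborhood, but the post does not show how that table was built. The following is a minimal sketch, assuming it comes from one-hot encoding venue categories and averaging them per neighborhood. The `venues` table, its column names, and its values are hypothetical placeholders, not the post's data.

```python
import pandas as pd

# Hypothetical venue table: one row per venue found near a neighborhood.
# Names and values are illustrative placeholders only.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "B", "B"],
    "Venue Category": ["Cafe", "Park", "Cafe", "Cafe", "Gym"],
})

# One-hot encode the venue category, keeping the neighborhood alongside.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Average the indicators per neighborhood: each cell becomes the share of
# that neighborhood's venues that fall in the given category.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Each row of `grouped` is then a numeric profile of a neighborhood, which is the kind of input k-means needs once the `Neighborhood` label column is dropped.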
The labelled neighborhood data was then plotted on a map using the folium package.
example code:
import folium
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
ny_map_clusters = folium.Map(location=[ny_latitude, ny_longitude], zoom_start=10.25)

# set color scheme for the clusters
x = np.arange(ny_kclusters)
ys = [i + x + (i*x)**2 for i in range(ny_kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lats, lons, poi, cluster in zip(newyork_merged['Latitude'],
                                    newyork_merged['Longitude'],
                                    newyork_merged['Neighborhood'],
                                    newyork_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lats, lons],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(ny_map_clusters)

ny_map_clusters.save('ny_map_cluster.html')  # save the map as an HTML file
ny_map_clusters


We observe from the cluster maps that Toronto has one large cluster (83% of the neighborhoods) and a smaller one; the other three clusters are insignificant by comparison. New York has two large clusters (45% and 41% of the neighborhoods) and one mid-size cluster; the other two are insignificant. So Toronto’s neighborhoods look more uniform in type, while New York’s are more varied, and the two cities segment differently.
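Cluster shares like the percentages quoted here can be computed directly from the label array. This is a minimal sketch with a hypothetical helper, `cluster_shares`, and placeholder labels; in the post the labels would come from a fitted model such as `kmeans_tor.labels_`.

```python
from collections import Counter

def cluster_shares(labels):
    """Return {cluster label: percentage of items}, largest cluster first."""
    counts = Counter(int(label) for label in labels)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.most_common()}

# Placeholder labels for illustration only, not the post's results.
example_labels = [0, 0, 0, 0, 1, 0, 2, 0, 1, 0]
shares = cluster_shares(example_labels)
for label, pct in shares.items():
    print(f"cluster {label}: {pct:.0f}% of neighborhoods")
```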
Average Income and Population Density
This part is easier than the first. Population density and average income data for all boroughs of New York and Toronto were imported from Wikipedia and converted to data frames using pandas. After some cleaning and modification, the data sets were ready for exploratory visualization.
example code:
# Toronto demographics table from Wikipedia
html2 = wp.page("Demographics of Toronto neighbourhoods").html().encode("UTF-8")
dft = pd.read_html(html2)[1]
dft = pd.DataFrame(dft)
dft.head()

# New York boroughs table from Wikipedia
html1 = wp.page("Boroughs of New York City").html().encode("UTF-8")
dfny = pd.read_html(html1)[0]
dfn = pd.DataFrame(dfny)
dfn.drop(dfn.index[5:], inplace=True)
dfn.head()
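The cleaning step is not shown in the post. Tables scraped from Wikipedia often hold numbers as text with thousands separators, currency signs, or footnote markers, so one plausible cleaning pass is sketched below. The helper `to_number` and the sample frame are hypothetical and are not the author's code or data.

```python
import pandas as pd

def to_number(column):
    """Convert strings such as '1,234' or '$45,000' to floats (NaN if unparseable)."""
    cleaned = (
        column.astype(str)
        .str.replace(r"\[.*?\]", "", regex=True)    # drop footnote markers like [1]
        .str.replace(r"[^0-9.\-]", "", regex=True)  # drop commas, currency signs, spaces
    )
    return pd.to_numeric(cleaned, errors="coerce")

# Placeholder rows for illustration only.
raw = pd.DataFrame({
    "Borough": ["X", "Y", "Z"],
    "Population Density": ["1,234", "5,678[1]", "n/a"],
})
raw["Population Density"] = to_number(raw["Population Density"])
print(raw)
```

Values that cannot be parsed become `NaN`, so they can be inspected or dropped before plotting rather than silently skewing a chart.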
I created two separate column charts showing the population density of the New York and Toronto boroughs, then applied the same method to the average income data.
example code:
import matplotlib.pyplot as plt

plt.figure(figsize=(20, 15))
plt.bar(dft_2['Borough'], dft_2['Population Density'], color='tomato')
plt.title("Population Density by Borough in Toronto", fontsize=28)
plt.xlabel('Borough', fontsize=18)
plt.ylabel('Population Density', fontsize=18)
# numeric font sizes and rotation; recent matplotlib rejects rotation='30'
plt.xticks(fontsize=24, rotation=30)
plt.yticks(fontsize=22)
plt.savefig('pdtor.jpg')
plt.show()




From these charts, it is clear that in New York, Manhattan has by far the highest average income and population density of the five boroughs. The other four have lower average incomes.
For Toronto, population density is slightly higher in Old City of Toronto and York than in the other three boroughs. Old City of Toronto also has the highest average income, but it is not far ahead of the other four.
So, in terms of average income and population density, Toronto is quite different from New York. New York has a wide gap between boroughs, with income and density concentrated in Manhattan. Toronto, on the other hand, has a much more uniform distribution of population and income.
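The comparison above is read off the charts. One simple way to put a number on "gap between boroughs" is the ratio of the largest value to the median, sketched here. The helper `top_to_median_ratio` is hypothetical, and the two lists are made-up placeholders chosen only to show the contrast; they are not the post's income or density figures.

```python
from statistics import median

def top_to_median_ratio(values):
    """Ratio of the largest value to the median; close to 1.0 means no gap at the top."""
    values = list(values)
    return max(values) / median(values)

# Placeholder numbers for illustration only (not real borough data).
concentrated = [100, 30, 25, 20, 15]  # one value far above the rest
uniform = [22, 20, 19, 18, 18]        # values close together

print(round(top_to_median_ratio(concentrated), 2))
print(round(top_to_median_ratio(uniform), 2))
```

A ratio well above 1 indicates one borough standing far apart from the others, while a ratio near 1 indicates a more even distribution.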
Data Sources:
- “List of postal codes of Canada: M” from Wikipedia.
- “Boroughs of New York City” from Wikipedia.
- “Demographics of Toronto Neighbourhoods” from Wikipedia
- Toronto location data by IBM
- New York Geo Location data by IBM
- Foursquare API