Finding the Optimal Business to Start in the UK
Divyansh Sharma
July 23, 2021
1. Introduction
1.1 Background
This report was prepared as part of the final capstone project for the IBM Applied Data Science course. The success criteria stated in the course requirements consist of using data to explore a geographic location or city with the intention of solving problems such as finding trending hotspots, happening venues, etc.
1.2 Business Problem
Accordingly, in this project we will analyze London in detail. Based on the findings, we will then come up with a series of recommendations on which business to start.
1.3 Interest
People who are thinking of starting a business in London or elsewhere in the UK would be interested to know which businesses are most common and in which areas, and also whether it is a good time to invest in commercial property.
2. Data acquisition and cleaning
2.1 Data sources
The data we will be using is similar to what we used in the Toronto cluster analysis. It contains longitudes, latitudes, postcodes, neighborhoods and boroughs. We will use the Foursquare API to get neighborhood details, then explore the neighborhoods and get nearby venues. Finally, we will divide them into clusters (finding the optimal k for k-means using the elbow method) and check which businesses are popular in the respective clusters.
Postal codes: https://en.wikipedia.org/wiki/List_of_postcode_areas_in_the_United_Kingdom (these were then used to get latitudes and longitudes for visualization on a map)
UK commercial property data: https://www.investing.com/equities/uk-comm-prop-trust-historical-data
2.2 Data Cleaning
Data downloaded or scraped from these sources didn’t connect well with the latitudes and longitudes generated by the Python libraries, so we had to drop some rows in order to keep a large and correct remaining dataset.
Some locations were showing up in the US and other places, and we had to remove them.
The map after removing the unwanted coordinates (we used Folium for map visualization).
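The row-dropping step described above can be sketched roughly as follows. This is a plain-Python illustration, not the notebook's code: the bounding box values, the field names (`latitude`, `longitude`, `postcode`) and the sample rows are all assumptions made for the example.

```python
# Approximate bounding box for the UK. These limits are an assumption
# for illustration; the notebook may have filtered differently.
UK_LAT = (49.8, 60.9)
UK_LON = (-8.7, 1.8)


def in_uk(lat, lon):
    """Return True if the coordinate falls inside the rough UK box."""
    return UK_LAT[0] <= lat <= UK_LAT[1] and UK_LON[0] <= lon <= UK_LON[1]


def drop_outside_uk(rows):
    """Keep only rows that have coordinates and lie inside the box."""
    kept = []
    for row in rows:
        lat, lon = row.get("latitude"), row.get("longitude")
        if lat is None or lon is None:
            continue  # geocoding produced nothing usable for this postcode
        if in_uk(lat, lon):
            kept.append(row)
    return kept


# Hypothetical sample rows, not taken from the project data.
sample = [
    {"postcode": "EC", "latitude": 51.52, "longitude": -0.09},
    {"postcode": "XX", "latitude": 40.71, "longitude": -74.01},  # lands in the US
    {"postcode": "YY", "latitude": None, "longitude": None},
]
cleaned = drop_outside_uk(sample)
```

A bounding box is crude (it also covers parts of Ireland and northern France), but it is enough to catch points that were geocoded to another continent.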
3. Analysis and Visualization
Some analysis and visualization was done on the UK commercial property historical data to find the mean price from 2015 to 2021. The mean came out to be 82 and the current price is in the 70s, so it might be a good time to invest. The price trend was also visualized, from which we inferred that the pandemic affected prices to a great extent but that they are recovering quite fast.
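The comparison between the historical mean and the current price amounts to something like the sketch below. The function name and the numbers are made up for illustration; they are not the real price series.

```python
from statistics import mean


def compare_to_mean(closing_prices, current_price):
    """Return the historical mean and whether the current price sits below it."""
    avg = mean(closing_prices)
    return avg, current_price < avg


# Illustrative values only, not the downloaded price history.
closes = [90.0, 85.0, 80.0, 60.0, 75.0]
avg, below = compare_to_mean(closes, current_price=75.0)
```

A price below its long-run mean is only a rough signal; it says nothing about why the price fell or whether it will recover.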
4. Methodology
In the first step we collected the required data: historical price data, and locations and postal codes of the UK. We identified whether it is the right time to invest in UK commercial property. We also identified nearby venues.
The second step in our analysis is the calculation and exploration of 'business categories' across different areas.
In the third and final step we focus on examining the clusters we have created and check which businesses are popular and which have less competition.
Some basic requirements were established in discussion with stakeholders: we will give them the top 10 popular venues, and they can judge on that basis whether they want to face the competition or try a different route.
We will present a map of all such locations and also create clusters (using k-means clustering) of those locations to examine the clusters / neighborhoods / addresses that should be a starting point for the final decision, to be taken by the stakeholder, on the optimal business to start.
Foursquare
Now that we have our locations, let's use the Foursquare API to get info on venues in each postal code/neighborhood.
We used the same get-categories function that we (Coursera students) used for getting categories for Toronto.
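For readers who have not seen the course lab, here is a rough sketch of what such a helper does. It assumes the Foursquare v2 "explore" endpoint and the response layout used in the lab at the time. Both are assumptions and the API may have changed, so check the current Foursquare documentation. The function names, default values and placeholder credentials are hypothetical.

```python
from urllib.parse import urlencode

# Endpoint as used in the course lab (an assumption; verify before use).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build the request URL for venues around one coordinate."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date; arbitrary default here
        "ll": f"{lat},{lng}",
        "radius": radius,      # metres
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params, safe=',')}"


def extract_venues(response_json):
    """Pull (name, lat, lng, category) tuples out of an explore response.

    Assumes the layout response -> groups[0] -> items -> venue.
    """
    items = response_json["response"]["groups"][0]["items"]
    venues = []
    for item in items:
        v = item["venue"]
        categories = v.get("categories") or [{}]
        venues.append((v["name"], v["location"]["lat"],
                       v["location"]["lng"], categories[0].get("name")))
    return venues


# Placeholder credentials; no request is sent in this sketch.
url = build_explore_url(51.5, -0.12, "YOUR_ID", "YOUR_SECRET")
```

The URL would then be fetched with an HTTP client and the decoded JSON passed to `extract_venues`.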
Now we have to analyze each location on the basis of the postal code, but I analyzed on the basis of latitude (which does not change the result as long as each postal code maps to a distinct latitude).
Then I used one-hot encoding, grouped by latitude and took the mean of each category.
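The effect of one-hot encoding followed by a grouped mean is that each area ends up with the share of its venues in each category. A minimal plain-Python sketch of that idea follows (the notebook itself presumably used pandas, as in the Toronto exercise; the function name and sample data here are hypothetical).

```python
from collections import defaultdict


def category_frequencies(venues):
    """venues: list of (area_key, category) pairs, one per venue.

    Returns {area_key: {category: share of that area's venues}}.
    """
    categories = sorted({cat for _, cat in venues})
    sums = defaultdict(lambda: dict.fromkeys(categories, 0))
    counts = defaultdict(int)
    for area, cat in venues:
        # One-hot row for this venue: 1 in its own category, 0 elsewhere.
        sums[area][cat] += 1
        counts[area] += 1
    # The mean of the one-hot rows per area is count / number of venues.
    return {
        area: {cat: sums[area][cat] / counts[area] for cat in categories}
        for area in sums
    }


# Hypothetical venues keyed by latitude, as in the text.
sample = [(51.5, "Pub"), (51.5, "Pub"), (51.5, "Coffee Shop"), (51.6, "Bakery")]
freq = category_frequencies(sample)
```

Each area's row sums to 1, which puts areas with many venues and areas with few on the same scale before clustering.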
For a complete understanding of what I did, check out my Jupyter notebook (available on GitHub).
Getting the top 10 venues for each neighborhood.
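Picking the top venues is a sort over each area's category shares. The sketch below assumes the shares are already available as a nested dictionary; the function name and sample values are hypothetical, and dropping categories with a zero share is a choice made here, not something the notebook necessarily did.

```python
def top_venues(freq_by_area, n=10):
    """Return up to n most frequent categories per area, most common first."""
    top = {}
    for area, freqs in freq_by_area.items():
        # Sort by share (descending), then by name so ties are deterministic.
        ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
        top[area] = [cat for cat, share in ranked[:n] if share > 0]
    return top


# Hypothetical shares for one area.
shares = {"A": {"Pub": 0.5, "Bakery": 0.2, "Park": 0.3, "Gym": 0.0}}
top2 = top_venues(shares, n=2)
top10 = top_venues(shares)
```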
Now we are going to start clustering, and we need to find the optimal k for k-means, so we use the elbow method. (No clear k was determined by this, so I used k=5 as in the Toronto exercise.)
Divided the data into 5 clusters and assigned a cluster to each data point.
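The elbow method runs k-means for a range of k values and looks at how the within-cluster sum of squares (inertia) falls as k grows. Below is a small self-contained sketch using a plain implementation of Lloyd's algorithm with random initial centers. It is an illustration only; the notebook presumably used a library implementation, and the points are made up.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, inertia)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct points as starting centers
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    labels = assign(points, centers)
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, inertia


def elbow_inertias(points, k_values, seed=0):
    """Inertia for each candidate k; plot these to look for the elbow."""
    return {k: kmeans(points, k, seed=seed)[1] for k in k_values}


# Made-up 2-D points forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
inertias = elbow_inertias(points, range(1, 7))
```

When the curve has no sharp bend, as happened here, the method gives no clear answer and k has to be chosen on other grounds.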
Visualizing the clusters using Folium.
Then we analysed each cluster to know which businesses are popular and which businesses have less competition.
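One rough way to do this by code rather than by eye is to count how often each category appears among the top venues of the areas in a cluster. In the sketch below, "less competition" is taken to mean the categories with the lowest count in the cluster. That definition, the function name and the sample data are assumptions for illustration.

```python
from collections import Counter


def summarize_clusters(labels, top_venue_lists, n_popular=2):
    """labels[i] is the cluster of area i; top_venue_lists[i] is that
    area's list of top venue categories.

    Returns {cluster: {"popular": [...], "rare": [...]}}.
    """
    per_cluster = {}
    for label, venues in zip(labels, top_venue_lists):
        per_cluster.setdefault(label, Counter()).update(venues)

    summary = {}
    for label, counts in per_cluster.items():
        # Sort by count (descending), then name, so ties are deterministic.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        lowest = ranked[-1][1]
        summary[label] = {
            "popular": [cat for cat, _ in ranked[:n_popular]],
            "rare": [cat for cat, cnt in ranked if cnt == lowest],
        }
    return summary


# Hypothetical cluster labels and top-venue lists for three areas.
labels = [0, 0, 1]
tops = [
    ["Pub", "Coffee Shop", "Bakery"],
    ["Pub", "Coffee Shop", "Pharmacy"],
    ["Park", "Pub"],
]
summary = summarize_clusters(labels, tops)
```

A category can be rare because nobody has tried it yet or because it does not work in that area, so the "rare" list is only a starting point for a closer look.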
5. Results and Discussions
Our analysis shows that this might be a good time to invest in London property, as prices are low and are catching up fast. Although there is a great number of pubs, coffee shops and clothing stores in London, they are the most common venues and can succeed. If you want to avoid competition, you can open a bakery, home service, park (if possible), pharmacy, furniture store, or pizza or sandwich place.
6. Conclusion
The purpose of this project was to identify businesses close to London with a high chance of success, in order to aid stakeholders in narrowing down the search and improving their chances of success by investing at the right time. By analysing and visualizing UK commercial property data and Foursquare data, we first identified neighborhoods by postal code and then generated an extensive collection of locations which satisfy some basic requirements regarding existing nearby venues.
Clustering of those locations was then performed in order to find the clusters of interest, and the clusters were examined in order to find the most common venues and learn about the competition.
The final decision on the optimal business will be made by stakeholders based on specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors like the attractiveness of each location (proximity to a park or water), levels of noise / proximity to major roads, real estate availability, prices, the social and economic dynamics of every neighborhood, etc.