by Near Data Team

Shubha Shedthikere, Data Scientist
Haricharan Aragonda, Data Scientist
Tony Scott, Principal, Data Science
Santosh Hegde, Senior Manager, Data Engineering
Madhusudan Therani, Chief Technology Officer

March 12, 2018

The Physical World and Human Activity - Some Interesting Insights

Most of us have very little intuition about how the physical world around us is organized as we go about our daily lives, and as we spend more of our lives online, we pay little attention to how quickly that world changes. At Near, we combine multiple data streams from different sources to understand the everyday life of a consumer in the physical world, so we were curious: what kinds of places are around us? Is it just residences? Is it just shops? What kinds of shops? How are these brands and shops distributed? And with rampant urbanization globally, how does the world around us look in different countries?

Our quest to answer some of these questions begins with the Near Places repository, our in-house, manually curated geodatabase of all things in the physical world: shops, restaurants, stadiums and more. Near Places is organized into place categories, the types of physical assets we see in the real world, such as restaurants, bars, apparel stores, stadiums and railway stations. Within each category there are multiple points of interest (POIs) that people visit. Given these place categories and POIs, we were interested in the following questions:

  • How are POIs distributed across Place Categories? For example, are there more QSRs (quick service restaurants) than, say, railway stations? Does this vary by country?
  • Given a spatial region (in our analysis, a geohash), how many Place Categories and how many POIs does it contain? How does this vary with the size of the region, i.e. across geohash levels? (A geohash is a hierarchical scheme that partitions the physical world into grid cells; longer geohashes denote smaller cells.) Which Place Categories are found in the largest number of geohashes? Do all of these distributions vary by country?
  • Finally, which place categories have the highest human activity? Which geohashes are the most active overall and at different times of the day?
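The geohash questions above rest on encoding each POI's coordinates as a geohash and then aggregating per cell. The sketch below shows one way to do that. It uses the standard geohash scheme (alternating longitude/latitude bisection, base-32 alphabet), but `geohash_encode`, `summarize_geohashes` and the `(lat, lon, category)` input schema are hypothetical illustrations, not Near's actual pipeline.

```python
from collections import defaultdict

# Standard geohash base-32 alphabet (omits a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters.

    Bits alternate between longitude (first) and latitude; each bit halves
    the current interval, and every 5 bits become one base-32 character.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bit_count, value = 0, 0
    use_lon = True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # 5 bits -> one base-32 character
            chars.append(_BASE32[value])
            bit_count, value = 0, 0
    return "".join(chars)


def summarize_geohashes(pois, precision=7):
    """Map geohash -> (number of POIs, number of distinct place categories).

    `pois` is an iterable of (lat, lon, category) tuples (hypothetical schema).
    """
    poi_counts = defaultdict(int)
    categories = defaultdict(set)
    for lat, lon, category in pois:
        gh = geohash_encode(lat, lon, precision)
        poi_counts[gh] += 1
        categories[gh].add(category)
    return {gh: (poi_counts[gh], len(categories[gh])) for gh in poi_counts}
```

For example, `geohash_encode(42.6, -5.6, 5)` gives `"ezs42"`. Because each extra character only refines the previous cell, the same POI list can be summarized at geohash 6, 7 or 8 simply by changing `precision`.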

Our preliminary, data-driven efforts to understand some of these issues, and the associated observations, are discussed below. All of the analyses below use human mobility data curated for December 2017.

The three figures below show the spatial spread of points of interest in California (Fig 1), human mobility activity in California (Fig 2), and the two superposed (Fig 3).

Fig 1. Distribution of Places in California
Fig 2. Human Activity in California (at geohash 7: grid cells of 150m x 150m)
Fig 3. People Activity in Different Places

The heatmap in Fig 3, which superposes Figs 1 and 2, shows the level of mobile activity at these locations.
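An activity heatmap like Fig 2 boils down to counting distinct devices per grid cell, optionally split by time of day. The sketch below assumes a hypothetical feed of `(device_id, geohash7, unix_ts)` tuples with the geohash already computed; `activity_by_geohash` and `top_geohashes` are illustrative names. It buckets by UTC hour for simplicity, whereas a real analysis would convert to local time.

```python
from collections import defaultdict
from datetime import datetime, timezone


def activity_by_geohash(pings):
    """Distinct devices per geohash, overall and per (geohash, UTC hour).

    `pings` is an iterable of (device_id, geohash7, unix_ts) tuples; this
    schema is a hypothetical stand-in for an anonymized mobility feed.
    """
    overall = defaultdict(set)
    hourly = defaultdict(set)
    for device_id, gh, ts in pings:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        overall[gh].add(device_id)
        hourly[(gh, hour)].add(device_id)
    overall_counts = {gh: len(devs) for gh, devs in overall.items()}
    hourly_counts = {key: len(devs) for key, devs in hourly.items()}
    return overall_counts, hourly_counts


def top_geohashes(counts, k=10):
    """Return the k most active cells as (key, count) pairs, most active first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Counting distinct devices rather than raw pings keeps one chatty device from dominating a cell. Passing `hourly_counts` to `top_geohashes` ranks the busiest cell-hour combinations.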

Overall, these figures illustrate that currently available mobility data, from sources such as anonymous GPS feeds and app data feeds, is a good indicator of what people do at large, and correlated behaviors can be studied effectively. Consumer activity analysts, marketers and geographers can use this data fruitfully for various kinds of macro and micro analyses of human mobility at different spatio-temporal scales. As usual, though, the devil is in the details.

To understand things at a deeper level, we analyzed the data from multiple perspectives. Fig 4 below shows the distribution of points of interest at different spatial resolutions: geohash6 (1.2km x 0.6km), geohash7 (150m x 150m) and geohash8 (38m x 19m).
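The cell sizes for each geohash level follow from how the scheme splits its bits: a geohash of `p` characters carries `5p` bits, with longitude taking the extra bit when the total is odd. The sketch below derives approximate cell dimensions from that. The metres-per-degree constants are equatorial approximations, and `geohash_cell_size` is a hypothetical helper for illustration.

```python
import math

# Approximate metres per degree at the equator (WGS84 approximations).
M_PER_DEG_LON_EQUATOR = 111_320.0
M_PER_DEG_LAT = 110_574.0


def geohash_cell_size(precision: int, lat: float = 0.0):
    """Approximate (width_m, height_m) of a geohash cell at a given latitude."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude takes the extra bit when odd
    lat_bits = total_bits // 2
    width_deg = 360.0 / 2 ** lon_bits
    height_deg = 180.0 / 2 ** lat_bits
    # East-west extent shrinks with the cosine of latitude.
    width_m = width_deg * M_PER_DEG_LON_EQUATOR * math.cos(math.radians(lat))
    height_m = height_deg * M_PER_DEG_LAT
    return width_m, height_m


for p in (6, 7, 8):
    w, h = geohash_cell_size(p)
    print(f"geohash{p}: {w:,.0f} m x {h:,.0f} m")
```

At the equator this prints roughly 1,223 m x 607 m, 153 m x 152 m and 38 m x 19 m. At California's latitudes (around 37°N) the widths are about 20% smaller, so cells are not quite square there.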

Fig 4 - Distribution of POIs at different spatial resolutions

At each spatial resolution (geohash 6, 7 and 8), the number of POIs per geohash follows a power law. Approximately 60% of the level-7 geohashes that contain POIs have only one POI; similarly, 75% of the level-8 geohashes that contain POIs have only one POI. Power-law behavior (see our earlier blog on this) occurs in many phenomena around us, and it appears again here.
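Statistics like "60% of occupied level-7 geohashes have one POI" can be computed at every level from a single list of fine-grained POI geohashes, because truncating a geohash yields its enclosing coarser cell. The sketch below assumes each POI already has a level-8 geohash string; the function names are hypothetical. It computes the distribution that a plot like Fig 4 would show, and it does not by itself fit or test a power law.

```python
from collections import Counter


def pois_per_geohash(poi_geohashes, precision):
    """Number of POIs in each occupied geohash at `precision`.

    Relies on the geohash prefix property: truncating a longer geohash
    gives the enclosing coarser cell.
    """
    return Counter(gh[:precision] for gh in poi_geohashes)


def poi_count_distribution(poi_geohashes, precision):
    """Map k -> number of occupied geohashes holding exactly k POIs."""
    return Counter(pois_per_geohash(poi_geohashes, precision).values())


def share_with_single_poi(poi_geohashes, precision):
    """Fraction of occupied geohashes that contain exactly one POI."""
    counts = pois_per_geohash(poi_geohashes, precision)
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)
```

Plotting `poi_count_distribution` on log-log axes is a quick visual check: a roughly straight, steeply falling line is consistent with the heavy-tailed behavior described above.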

How does this spatial spread of POIs look globally in different countries and in different place categories?

Fig 5 below shows the variation in the number of POIs, and in their spread across geohashes, across multiple countries. The size of each bubble represents the number of geohashes in which POIs of that category are present, as a percentage of the total number of geohashes that contain POIs. This gives us a sense of which place categories are spread out in space and which are densely packed together within a given spatial unit. The power-law behavior also appears beyond space: across all countries, the number of POIs per place category follows it as well. Some place categories dominate our landscape, which makes sense: services that support more common human activities will themselves be more common.
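The bubble size described above is a simple ratio, which the sketch below spells out. It assumes POIs are given as hypothetical `(geohash, category)` pairs at a single geohash level; `category_spread` is an illustrative name, not part of any Near tool.

```python
from collections import defaultdict


def category_spread(pois):
    """Per category: (POI count, % of occupied geohashes the category appears in).

    `pois` is an iterable of (geohash, category) pairs (hypothetical schema).
    The percentage corresponds to the bubble size described for Fig 5.
    """
    poi_count = defaultdict(int)
    cells = defaultdict(set)
    occupied = set()
    for gh, category in pois:
        poi_count[category] += 1
        cells[category].add(gh)
        occupied.add(gh)
    total = len(occupied)
    return {
        cat: (poi_count[cat], 100.0 * len(cells[cat]) / total)
        for cat in poi_count
    }
```

A category with many POIs but a small spread percentage is densely clustered in a few cells; one whose spread percentage is close to its POI share is scattered, with roughly one POI per cell.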

Fig 5 Distribution of POIs by Place Categories

Figs 6, 7 and 8 below summarize human activity at different place categories. For the top 10 place categories, the charts show the percentage of unique users seen at each category, plotted alongside the category's number of POIs as a percentage of the total number of POIs. These plots confirm some commonly expected behavior. For example, although the Grocery category has fewer POIs than Apparel stores, its footfall is higher. This reflects our intuition that grocery stores are visited more frequently, so we would expect to see more users there on a daily basis than at Apparel stores. The charts at different radii (250m, 100m and 50m) show that the rank ordering of place categories changes with the spatial resolution used, so it is important to interpret the results carefully before taking any specific next steps.
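One way to produce the two series in charts like Figs 6 to 8 is to count a device towards a category when one of its pings falls within the chosen radius of a POI in that category. The sketch below does this by brute force with the haversine distance. The input schemas and function names are hypothetical. The choice of denominator for the user share (all devices in the feed) is an assumption, since the charts could equally be normalized another way.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def uniques_vs_pois(pings, pois, radius_m):
    """Per category: (% of unique devices seen, % of all POIs).

    pings: iterable of (device_id, lat, lon); pois: list of (category, lat, lon).
    Both schemas are hypothetical. A device can count towards several
    categories, so the user shares need not sum to 100. This is O(pings x POIs);
    a production pipeline would use a spatial index instead.
    """
    devices_by_cat = defaultdict(set)
    all_devices = set()
    for device_id, lat, lon in pings:
        all_devices.add(device_id)
        for category, plat, plon in pois:
            if haversine_m(lat, lon, plat, plon) <= radius_m:
                devices_by_cat[category].add(device_id)
    poi_count = defaultdict(int)
    for category, _, _ in pois:
        poi_count[category] += 1
    n_devices = len(all_devices) or 1  # avoid division by zero on empty input
    n_pois = len(pois) or 1
    return {
        cat: (100.0 * len(devices_by_cat[cat]) / n_devices,
              100.0 * poi_count[cat] / n_pois)
        for cat in poi_count
    }
```

Running it with `radius_m` set to 250, 100 and 50 and sorting categories by user share shows how sensitive the rank ordering is to the radius. At larger radii, pings near dense clusters get attributed to many categories at once.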

Fig 6. Uniques and POIs (at 250m)
Fig 7. Uniques and POIs (at 100m)
Fig 8. Uniques and POIs (at 50m)

Concluding Remarks

The analyses above illustrate the different kinds of insights one may gather from understanding place/POI distributions and analyzing the corresponding human mobility data. However, data scientists and analysts need to handle this data with care: interpretations and conclusions can vary widely with the chosen resolution and scaling of the data. Recent offerings in offline marketing attribution and other spatial analytics need to be validated in a deeper context. It took two decades for online attribution to become established as a well-understood measure. Given the complexity of handling spatial human mobility data, one should pay close attention to the methodology behind offline attribution.

There are a variety of insights one can imagine, both from this data as-is and by combining it with other public and private data sets such as real-estate data and traffic data. Our ongoing work analyses such data across and within countries, e.g. what is common to New Yorkers and folks on the West Coast? What do folks in Perth (Australia) do differently from folks in Sydney? Keep checking back for more.

Understanding human mobility has many implications across multiple business scenarios. It guides our decisions about what to build next at a particular location. It has implications for understanding the whys and wherefores behind consumer behavior. It also guides the development of ways to measure our marketing efforts, such as offline attribution.

Our current work at Near is pushing the envelope in developing techniques and tools for performing these analyses at scale across different kinds of data sets. Incumbent approaches need to make use of new data sets and technology. If you have a specific scenario requiring deeper exploration, please reach out to us at allspark@near.co.
