This post was written as part of Udacity’s Data Scientist Nanodegree Program

Airbnb leaves the tricky business of deciding at what price to list a property mostly up to the host. While hosts are encouraged to set their own price based on the local market, there are a number of features in the onboarding process that use data to make this process smoother. In particular, there are algorithmically-derived suggestions for a base price, and for minimum and maximum prices if the host chooses to use Airbnb’s ‘Smart Pricing’ feature.

These suggestions are based on the forms the host has just completed, which describe the property: its type, the number of bedrooms and so on. These are mostly factors that cannot be easily changed - adding, say, an additional bathroom is no easy task! - and for all brick-and-mortar properties, this naturally includes the location of the property.

I wanted to investigate how well the location of the property correlated with the price of the listing, and whether this exact location could be incorporated into the mechanics of pricing suggestions. For this I used the Airbnb listings data available on Kaggle for Boston and Seattle. Location data in a vacuum would not necessarily be helpful, so I attempted to create a pricing suggestion model that incorporated the other data available at signup, to give context to how strongly the location affects pricing. I then investigated how this model could be further improved.

The main issue here is how to quantify the location in a meaningful way. Both latitude and longitude are present in the data, but I thought it would be far more useful to transform these into a distance from a particular point of interest. I decided to use the distance from the coordinates that Google gives for each city itself. This gives a good approximation of the distance from each property to downtown Boston/Seattle, which I hoped would be a good predictor of price.
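As a minimal sketch of this transformation, the snippet below computes a straight-line distance in miles using the haversine formula. The post does not say which distance formula was actually used, so haversine is an assumption here, as are the function names and the approximate city-centre coordinates.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

# Assumed, approximate city-centre coordinates; the exact values used
# in the analysis are not given in the post.
DOWNTOWN = {
    "Boston": (42.3601, -71.0589),
    "Seattle": (47.6062, -122.3321),
}

def miles_from_downtown(city, lat, lon):
    """Distance in miles from a listing to the assumed centre of its city."""
    center_lat, center_lon = DOWNTOWN[city]
    return miles_between(lat, lon, center_lat, center_lon)
```

Applied to each listing's latitude and longitude, this yields a single numeric column that can sit alongside the other factors in the model.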

Can we create a pricing suggestion model based on factors known about the property at the point of onboarding?

The linear model I created used ten major factors:

  • Available amenities
  • Distance from downtown in miles
  • Neighbourhood
  • Number of bathrooms
  • Number of bedrooms
  • Number of beds
  • Number of people accommodated by the property
  • Type of bed available
  • Type of property
  • Type of room

I evaluated the performance of this model to find out if these factors were good at predicting the price of a listing. For Boston, I found there was a weak-to-moderate correlation between the predicted prices and the actual prices. In the worst case, the actual price was $3,800 higher than the predicted price - though I have my suspicions that the $4,000 per night price tag for that property is not accurate, given that it is more than the average monthly rent in Boston and that the property itself was not particularly notable. The biggest overestimation, on the other hand, was under $300.

In Seattle the model performed a lot better, showing moderate-to-strong correlation. The worst overestimation was, as in Boston, just under $300, but the worst underestimation was only $840 or so. It’s clear the model has a problem with underestimating the price, perhaps because the data cannot capture how well a particular property presents - two properties could look identical in this data, but one might be a large, immaculately-furnished house with a sea view, and the other might be a small house in a less well-regarded part of the neighbourhood.
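The kind of evaluation described above can be sketched as follows. This assumes a Pearson correlation between actual and predicted prices; the post does not state which correlation measure was used. The helper names and the sample prices are made up for illustration and are not taken from the dataset.

```python
import math

def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

def worst_errors(actual, predicted):
    """Return (largest underestimation, largest overestimation) in price units."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    worst_under = max(residuals)   # actual price furthest above the prediction
    worst_over = -min(residuals)   # prediction furthest above the actual price
    return worst_under, worst_over

# Made-up nightly prices, purely illustrative.
actual = [100.0, 150.0, 80.0, 400.0]
predicted = [120.0, 140.0, 90.0, 250.0]

r = pearson_r(actual, predicted)
under, over = worst_errors(actual, predicted)
```

Looking at the two worst-case errors separately, rather than a single average error, is what makes a bias towards underestimation visible.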

Despite the difficulties in prediction, I feel this is a good start to the model.

For both Boston and Seattle, the number of bedrooms and the entirety of the property being rented out to the guest (as opposed to sharing with the host or other guests) were by far the biggest factors in predicting the price. In Boston, for example, renting the entire property was approximately 6.5 times more important for the price than the property having air conditioning.
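One way to arrive at a statement like "6.5 times more important" is to compare the absolute sizes of the fitted coefficients. That the comparison was done this way is an assumption on my part, and it is only meaningful when the features are on comparable scales. The coefficient values and feature names below are hypothetical, chosen only to show the arithmetic; they are not the fitted values from the model.

```python
def relative_influence(coefficients, baseline):
    """Express each coefficient's magnitude as a multiple of the baseline's magnitude."""
    base = abs(coefficients[baseline])
    return {name: abs(value) / base for name, value in coefficients.items()}

# Hypothetical coefficients, NOT the fitted values from the post.
coefs = {
    "entire_home": 65.0,
    "air_conditioning": 10.0,
    "miles_from_downtown": -30.0,
}

influence = relative_influence(coefs, baseline="air_conditioning")
```

Taking absolute values means a strongly negative factor, such as distance from downtown, is ranked by how much it moves the price rather than by its sign.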

Figure: plot of actual vs predicted price (panels: Boston, Seattle)

Does location play a large role in setting the price?

The distance from downtown that I calculated for each property also turns out to be a very strong factor in price - in general, the further from downtown, the lower the price. In Boston, it has just under half the effect that renting the entire property has; in Seattle it’s about one-third.

In both cities, the distance from downtown was the largest single negative factor on price. In Boston it was nearly three times more influential than the next largest negative factor (being located in the Dorchester neighbourhood). It was relatively less powerful in the Seattle model, where it was only slightly more influential in decreasing price than the property being an apartment.

While this model is definitely nowhere near production-ready - it predicts negative prices for some properties in Boston, after all - I believe it shows that the bones of a good pricing suggestion model are there in this data. The actual model in use for Smart Pricing etc. is almost certainly more complicated than the simple linear model I used, and uses more factors.

Figure: influence of Miles From Downtown relative to other factors, higher = more influence (panels: Boston, Seattle)

How can the model be extended?

More factors that could go into the prediction might include current or predicted supply/demand of properties, the presence of particular keywords in the host-provided descriptions of the property, and the inclusion of more points of interest from which to calculate distance.

For example, there is a concentration of properties in a spot approximately 1.5 to 2 miles southwest of downtown Boston. These are all in the vicinity of Fenway Park, probably the most famous ballpark in the United States. The data was scraped in early September 2016, when the Boston Red Sox were about to play three series of games in the MLB regular season and had an excellent chance of a home playoff series in October.

Other points of interest might be local landmarks - the Space Needle is close to the chosen spot for downtown Seattle, but still over a mile away - or beaches/the sea - Alki Beach Park in Seattle is 4 miles from downtown.
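Extending the distance feature to several points of interest could look something like the sketch below, which produces one distance column per point. The coordinates are approximate values I am assuming for illustration, and the haversine formula and all names are likewise assumptions rather than the code used for the post.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

# Assumed, approximate coordinates for some Seattle points of interest.
SEATTLE_POINTS = {
    "downtown": (47.6062, -122.3321),
    "space_needle": (47.6205, -122.3493),
    "alki_beach_park": (47.5812, -122.4078),
}

def poi_distances(lat, lon, points=SEATTLE_POINTS):
    """Return one 'miles_from_<name>' feature per point of interest."""
    return {
        f"miles_from_{name}": miles_between(lat, lon, p_lat, p_lon)
        for name, (p_lat, p_lon) in points.items()
    }

# Features for a hypothetical listing located at the downtown coordinates.
features = poi_distances(47.6062, -122.3321)
```

Each distance could then enter the linear model as its own factor, letting the fit decide which landmarks actually matter for price.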

Alki Beach Park raises the possibility of an interesting change to the distance model. While it is 4 miles from downtown as the crow flies, it is around a 6 mile drive, and even further if using public transport. At the time of writing, the West Seattle Bridge is also closed long term for repairs, adding another 6 miles for the detour. It may therefore be more prudent to use travel time between a property and points of interest. This could be implemented with a mapping API such as Google’s.