Hipster Modeling Venue
Using Python to replace a travel startup's manually curated venue verification process by modeling “The Hipster”. Fun Experiences with Machine Learning
Once upon a time I worked in Seattle at a travel startup that helped its users plan itineraries full of cool, edgy venues like gastropubs and modern-day speakeasies. Curation was a simple manual process: two people scanned the Yelp and Google business profiles of venues and decided whether they offered good quality, reasonable prices, unique and interesting menus, etc.
Targeted Data Science by Process Mapping
The first step was to speak with the builders (the people picking venues) and chart out what they did, what their goals were and, most of all, how they did it. The individual builders and their distinct opinions had come to define the style of venue the company favored. There was plenty of data to work with, as the pair had already labeled a full city of samples. With the process charted and a solid understanding of their workflow, we set out to build a model that would not replace the builders but make them managers of the data stream, reviewing the classifier's accuracy and building more robust training data.
Natural Language Processing
The data science task was straightforward: build a binary classifier that could decide whether a venue was good or bad. The venue descriptions, titles, comments, etc. were cleansed and modeled with features that did a fair job of separating good venues from bad ones. Going back into the data, we immediately found that good venues often had a certain style and energy about them. They were hipster places. By inspection, it was clear which changes in feature engineering would improve the system dramatically. I needed a way to figure out how rare a word was in human language.
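As a rough illustration of the kind of baseline described here, below is a minimal sketch of a bag-of-words binary classifier in plain Python. The naive Bayes model, the `tokenize` helper, and the two "bad" training examples are assumptions made up for illustration; they are not the original pipeline or data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesVenueClassifier:
    """Tiny multinomial naive Bayes over word counts (1 = good, 0 = bad)."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in (0, 1):
            total_words = sum(self.word_counts[label].values())
            # Start from the log prior of the class.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                # Laplace smoothing so unseen class/word pairs are not zero.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training set: the "good" texts come from the venue examples in this
# post; the "bad" texts are invented purely for illustration.
train_texts = [
    "A taxidermy turned gastropub",
    "Tiki-inspired cocktails, board games & inventive pub grub "
    "are the draw at this rustic-chic tavern.",
    "Fast food chain serving burgers and fries",
    "Standard sports bar with cheap beer and wings",
]
train_labels = [1, 1, 0, 0]

model = NaiveBayesVenueClassifier().fit(train_texts, train_labels)
print(model.predict("gastropub with inventive cocktails"))
print(model.predict("cheap burgers and fries"))
```

With only four training texts this is a toy, but it shows the shape of the problem: cleaned text goes in, word-level features are counted, and a good/bad label comes out.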
Venues:
- Smith: A taxidermy turned gastropub
- Oddfellows Cafe & Bar: A cafe serving Caffe Vita espresso, fresh pressed juices, sandwiches, grain salads, and baked goods.
- Dromedary Urban Tiki Bar: Tiki-inspired cocktails, board games & inventive pub grub are the draw at this rustic-chic tavern.
Comments:
- “Pub meets chalet at this Phinney Ridge landmark. Score the six-person table directly in front of the lodge-appropriate hearth and get acquainted with the beer list”
- “Perfect place for hipster goods from Poler sleeping bags and fancy iPhone cases, you’ll find gifts for yourself or your friends.”
Between the venue names, descriptions, comments and more, you can begin to see some trends common to hipster venues. Modeling was done in two ways. One was to build a grammar of hipster language and assign a score of 1–10 to represent its strength. The other was to use Microsoft’s NGram API, which exposes features that indicate how rare a word is. By including rare words and boosting their feature representation, the model was very successful, reaching over 95% accuracy.
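To make the first approach more concrete, here is a minimal sketch of a 1–10 "hipster strength" score. It assumes the grammar can be approximated by a small hand-weighted lexicon; the `HIPSTER_LEXICON` terms and their weights are hypothetical, picked from the example venues and comments above, and the real grammar was likely richer than a word list.

```python
import re

# Hypothetical lexicon: term -> points. The terms and weights are
# illustrative assumptions, not the grammar used in the original project.
HIPSTER_LEXICON = {
    "gastropub": 3,
    "speakeasy": 3,
    "taxidermy": 3,
    "hipster": 3,
    "tiki": 2,
    "rustic": 2,
    "chalet": 2,
    "chic": 1,
    "hearth": 1,
    "espresso": 1,
}


def hipster_score(text, lexicon=HIPSTER_LEXICON):
    """Return a score from 1 (no hipster terms) to 10 (saturated)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    points = sum(lexicon.get(token, 0) for token in tokens)
    # Shift so the minimum is 1, then cap at 10.
    return min(10, 1 + points)


print(hipster_score("A taxidermy turned gastropub"))  # 7
print(hipster_score("Fast food chain"))  # 1
```

A score like this can be fed to the classifier as a single numeric feature alongside the word-level features.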
It comes down to the fact that hipsters expressing their individuality will ultimately choose more obscure language. Identifying and modeling rare words is enough to model “The Hipster”.
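The rare-word idea can be sketched without any external service. The example below assumes rarity is measured as the negative log10 of a word's unigram probability, and that rare words are boosted by a fixed multiplier. The `WORD_PROBABILITY` table, the `UNSEEN_PROBABILITY` fallback, and the `min_rarity` and `boost` parameters are all hypothetical stand-ins with made-up values; in the original project the rarity signal came from Microsoft’s NGram API, whose calls are not reproduced here.

```python
import math
import re
from collections import Counter

# Hypothetical unigram probabilities (made-up values for illustration).
# A real system would obtain these from an n-gram service or a large corpus.
WORD_PROBABILITY = {
    "the": 5e-2,
    "a": 2e-2,
    "and": 2e-2,
    "turned": 1e-4,
    "bar": 1e-4,
    "pub": 5e-5,
    "taxidermy": 1e-7,
    "gastropub": 1e-8,
}
# Assumption: words missing from the table are treated as very rare.
UNSEEN_PROBABILITY = 1e-9


def rarity(word):
    """Higher means rarer: the negative log10 of the word's probability."""
    return -math.log10(WORD_PROBABILITY.get(word, UNSEEN_PROBABILITY))


def rarity_weighted_features(text, min_rarity=6.0, boost=3.0):
    """Count words, multiplying the count of rare words by `boost`."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    features = {}
    for word, count in counts.items():
        weight = boost if rarity(word) >= min_rarity else 1.0
        features[word] = count * weight
    return features


print(rarity_weighted_features("A taxidermy turned gastropub"))
```

One caveat of treating unseen words as rare is that typos and brand names get boosted too, so a real pipeline would probably want spelling cleanup or a minimum document frequency before applying the boost.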