Women Communicate Better – Yelp

May 31, 2017, by dcramer

The Company

Yelp! is a platform for user-published reviews of local businesses. Yelp!’s human users submit two types of feedback on a local business: a 5-star rating and a text review.

The Profile

Yelp! actively uses prize competitions to better understand and use its human-inputted data and to crowdsource novel approaches to improving its service for users. The company is currently promoting the 9th iteration of its dataset challenge, with a total of ten prizes amounting to a modest $5,000.

The subset of data includes 11 cities spanning 4 countries (Germany, UK, US, Canada), which gives participants access to over 4M reviews of ~150K businesses. For the ninth iteration, Yelp! is also including 200K user-uploaded photos in the dataset.

Unlike the Netflix competition, Yelp! leaves the challenge question and the success metric open-ended so that applicants can explore what interests them in the dataset. Submissions are judged on technical rigor, the relevance of the results, and novelty.

Previous Winners:

One interesting proposal, from a University of California-Berkeley team that won in the first dataset challenge, used Natural Language Processing (NLP) to extract the subtopics of an individual’s text review. The team used unsupervised machine learning to identify the categories of subtopics that diners mentioned in text reviews, including restaurant service, decor, and food quality. With the subcategories developed by the algorithm, the team could then predict, for a given review, what each subtopic’s rating would be.

This combination of machine learning and natural language processing helps (1) evaluate the accuracy of the star rating given by a user and (2) show small business owners where to improve their service. The subtopics approach also tries to keep any one aspect of a user’s experience from being overweighted in the overall score.

Potential for Improvement

Yelp! could improve the value of this subtopic analysis with human input. By enabling users to assign tags (“ambiance”, “decor”) to a review or, even better, a 5-star score for each subtopic (as TripAdvisor currently does), Yelp! could enrich its dataset for users and small businesses. NLP could suggest tags to users (as Stack Overflow does) based on textual analysis of the review, or users could input their own.

One could also see this area of subtopics moving in a different direction. Currently, Yelp! reviews are aggregated for a business that provides multiple services (e.g. a hotel that has a spa and a restaurant). Similarly, Yelp! reviews for a restaurant that serves brunch and dinner are combined into a single ratings score. This prevents users from understanding the value the business provides for one particular service. By using NLP and human input on subtopics (e.g. tagging “spa”, “facial”, “brunch”), Yelp! could give the user a more granular view of the quality of each offering. The user could then assess the business based on the service most relevant to their needs, rather than the business as an undifferentiated whole.
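To make the idea concrete, here is a minimal sketch of tag suggestion and per-service ratings. It is an illustration under our own assumptions, not Yelp!’s method: the keyword lists, the function names, and the sample reviews are all hypothetical, and a plain keyword lookup stands in for real NLP.

```python
from collections import defaultdict

# Hypothetical keyword lists per tag. A real system would learn these
# from review text instead of hard-coding them.
TAG_KEYWORDS = {
    "spa": {"spa", "massage", "facial"},
    "brunch": {"brunch", "mimosa", "pancakes"},
    "dinner": {"dinner", "steak", "wine"},
}


def suggest_tags(review_text):
    """Return the tags whose keywords appear in the review, sorted by name."""
    words = {w.strip(".,!?;:").lower() for w in review_text.split()}
    return sorted(tag for tag, kws in TAG_KEYWORDS.items() if words & kws)


def ratings_by_tag(reviews):
    """Average star rating per tag, given (stars, text) pairs."""
    totals = defaultdict(lambda: [0, 0])  # tag -> [sum of stars, review count]
    for stars, text in reviews:
        for tag in suggest_tags(text):
            totals[tag][0] += stars
            totals[tag][1] += 1
    return {tag: total / count for tag, (total, count) in totals.items()}


# Made-up reviews for a single business that offers several services.
reviews = [
    (5, "The brunch pancakes were amazing!"),
    (2, "Dinner was slow and the steak was cold."),
    (4, "Lovely facial at the spa."),
    (3, "Brunch was fine, nothing special."),
]
print(ratings_by_tag(reviews))
```

In this toy data the single blended score would hide the fact that brunch rates well while dinner does not, which is the granularity argued for above.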

https://engineeringblog.yelp.com/2017/01/dataset-round-7-winners-and-announcing-round-9.html

https://www.yelp.com/dataset_challenge

https://www.yelp.com/html/pdf/YelpDatasetChallengeWinner_ImprovingRestaurants.pdf

https://www.ischool.berkeley.edu/news/2013/students-data-analysis-uncovers-hidden-trends-yelp-reviews

Women Communicate Better – Pitch for Classy

May 17, 2017, by dcramer

The Problem

Everyone’s familiar with class-action lawsuits where a bunch of families sue a pharmaceutical company. However, a similar problem often arises for investors as well, leading to a securities class-action lawsuit. Essentially, if a company neglects its fiduciary responsibility to keep investors informed about negative changes in the company, and those changes eventually hurt the company’s stock price when the news becomes public, investors are entitled to sue the company.

Right now, the existing solution is to throw bodies at the problem: plaintiff firms keep tons of lawyers on staff whose job is to read the news, track stocks, and hopefully identify situations where a securities class-action lawsuit could be filed. This process is incredibly manual and time-intensive, and it cannot guarantee success, since a person is always going to miss some opportunities.

The Solution

Instead of relying on plaintiff lawyers and industry blogs, like Lyle Roberts’s The 10b-5 Daily, to manually scan and analyze stock price data, we believe there is an opportunity to merge human understanding and machine learning to identify and even predict potential securities class-action suits. To solve this problem we propose the creation of Classy, a service that predicts potential lawsuits for plaintiff lawyers by combining machine algorithms and human intuition. Currently, many of the class-action suits brought by firms end up being frivolous and yield limited to no profit for plaintiffs. Classy will help plaintiff firms reduce this error and pursue the most fruitful cases more efficiently. Additionally, Classy will help plaintiff firms better prioritize their staffing, so that more lawyers spend their time executing suits rather than searching for potential signs of fraud.

The Design

Our product would combine external sensors with machine and human algorithms to predict the likelihood of securities misconduct of various firms and help analyze the success of a suit. The two sensory inputs would be stock prices and news articles. First, we would utilize machine learning to flag precipitous stock price drops throughout the whole market. We would also use natural language processing and sentiment analysis to analyze relevant news items, identifying patterns of negative disclosures by a firm in the past or public apologies issued by CEOs. These sensory inputs would then be analyzed by a machine algorithm, which would use the data to create a likelihood score of disclosure malfeasance by the firm and the predicted settlement value. This information would then be transmitted through a human algorithm – plaintiff lawyers with years of experience and relationship expertise – who would then verify and expand upon the potential suits flagged by the machine algorithm. They would also provide feedback to the machine algorithm in order to improve its efficacy and accuracy over time.
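As a rough illustration of the first sensor, the sketch below flags precipitous one-day price drops. It is only a sketch under our own assumptions: the function name, the 10% threshold, and the closing prices are hypothetical, and a fixed threshold rule stands in for the machine learning model the design calls for.

```python
def flag_price_drops(closes, threshold=-0.10):
    """Return (day_index, daily_return) for each day whose close-to-close
    return is at or below the threshold (-0.10 means a 10% drop)."""
    flagged = []
    for i in range(1, len(closes)):
        daily_return = closes[i] / closes[i - 1] - 1
        if daily_return <= threshold:
            flagged.append((i, round(daily_return, 4)))
    return flagged


# Made-up daily closing prices for one stock; day 3 falls 20%.
closes = [50.0, 51.0, 50.5, 40.4, 41.0]
print(flag_price_drops(closes))
```

Flagged days would then be paired with the news analysis and passed to the plaintiff lawyers for review, as described above.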

Women Communicate Better: Spotify

May 9, 2017, by dcramer

The Problem

Recommending music without user data is referred to as the ‘cold-start problem’, and it is exactly the problem that Spotify wants to solve. The problem arises because new and unpopular music, by definition, lacks the usage data on which recommendations are based, which makes it nearly impossible to recommend. Spotify wants to be able to introduce people to bands and songs they’ve never heard.

Spotify’s Algorithm

Spotify uses “Collaborative Filtering” to identify users with similar musical taste in order to recommend new music. Basically, User 1 listens to two Justin Bieber songs, also loves the new single from Justin Timberlake, and stores them all in their “Justin^2” playlist. Then, if User 2 also enjoys the same two Bieber songs, the Collaborative Filtering will recommend the new J.T. single to User 2.
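A toy version of this idea can be sketched in a few lines. This is a minimal user-based sketch under our own assumptions, not Spotify’s actual algorithm: the function name, the Jaccard similarity measure, and the song identifiers are all hypothetical.

```python
def recommend(target, listens):
    """Recommend songs that users with overlapping taste have played but
    the target user has not, ranked by summed Jaccard similarity."""
    target_songs = listens[target]
    scores = {}
    for user, songs in listens.items():
        if user == target:
            continue
        overlap = len(target_songs & songs)
        if overlap == 0:
            continue  # no shared taste, so this user contributes nothing
        similarity = overlap / len(target_songs | songs)
        for song in songs - target_songs:
            scores[song] = scores.get(song, 0.0) + similarity
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda s: (-scores[s], s))


# Placeholder song identifiers mirroring the example above.
listens = {
    "user1": {"bieber_song_a", "bieber_song_b", "timberlake_single"},
    "user2": {"bieber_song_a", "bieber_song_b"},
    "user3": {"unrelated_song"},
}
print(recommend("user2", listens))
```

User 3 shares nothing with User 2, so only User 1’s extra song is recommended. The same sketch also shows the cold-start problem: a song nobody has played yet can never appear in the output.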

The next step comes from Echo Nest, a music analytics startup that Spotify acquired. The machine learning from Echo Nest goes above and beyond matching playlists or preferences. The program reads articles about music and attempts to quantify the descriptions of new music in a way that allows Spotify to bucket songs and artists, and then recommend them to users. This process (Natural Language Processing) is also used to read the titles of billions of user-generated playlists and categorize the songs by those titles. Using these buckets from the music press and user playlists, Spotify then creates a “taste profile”: a mix of the categories of music the user most enjoys, and their magnitudes.

Spotify also uses deep learning on actual audio files. Some attributes of music are easier to find from audio files, like which instruments are used or how fast the beat goes, while other attributes are harder to identify through listening, like genre and age of songs.
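The playlist-title step and the taste profile can be sketched as follows. This is a simplified illustration under our own assumptions: the keyword-to-category mapping, the function names, and the playlists are hypothetical, and a keyword lookup stands in for Echo Nest’s actual language processing.

```python
from collections import Counter

# Hypothetical mapping from playlist-title keywords to categories.
TITLE_KEYWORDS = {
    "workout": "high-energy",
    "gym": "high-energy",
    "chill": "mellow",
    "study": "mellow",
    "sleep": "mellow",
}


def categorize_songs(playlists):
    """Count, per song, how many playlists place it in each category,
    based on keywords in the playlist titles."""
    song_categories = {}
    for title, songs in playlists.items():
        cats = {TITLE_KEYWORDS[w] for w in title.lower().split() if w in TITLE_KEYWORDS}
        for song in songs:
            song_categories.setdefault(song, Counter()).update(cats)
    return song_categories


def taste_profile(history, song_categories):
    """Share of a user's plays that falls into each category."""
    counts = Counter()
    for song in history:
        cats = song_categories.get(song)
        if cats:
            counts[cats.most_common(1)[0][0]] += 1  # song's dominant category
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


# Made-up playlists and listening history.
playlists = {
    "Gym workout mix": ["song_a", "song_b"],
    "Chill study beats": ["song_c"],
    "Morning gym": ["song_a"],
}
history = ["song_a", "song_a", "song_c", "song_b"]
print(taste_profile(history, categorize_songs(playlists)))
```

Here three of the four plays land in the “high-energy” bucket, so the profile records both which categories the user enjoys and their relative magnitudes.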

These computer-generated inputs to the recommendation algorithm are also filtered by some human editorial limits. For example, certain genres like white noise albums are filtered out, and Spotify turns off the Christmas music after…well, Christmas. These guardrails keep the algorithm from making understandable but annoying mistakes.
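Guardrails like these amount to simple rules applied after the model has produced its candidates. The sketch below is our own illustration, not Spotify’s code: the genre labels, the December 1 to 25 season window, and the function names are all assumptions.

```python
from datetime import date

# Hypothetical editorial rule: genres that are never recommended.
BLOCKED_GENRES = {"white noise"}


def in_christmas_season(today):
    """Assumed season window: December 1 through December 25."""
    return today.month == 12 and today.day <= 25


def apply_guardrails(candidates, today):
    """Drop candidate tracks that violate the editorial rules."""
    kept = []
    for track in candidates:
        if track["genre"] in BLOCKED_GENRES:
            continue
        if track["genre"] == "christmas" and not in_christmas_season(today):
            continue
        kept.append(track)
    return kept


# Made-up candidate list produced by the recommendation model.
candidates = [
    {"title": "rain_sounds", "genre": "white noise"},
    {"title": "holiday_song", "genre": "christmas"},
    {"title": "indie_track", "genre": "indie"},
]
print([t["title"] for t in apply_guardrails(candidates, date(2017, 1, 10))])
```

Run with a date inside the assumed season, the holiday track would pass through while the white noise track would still be dropped.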

Effectiveness of Spotify

Spotify’s effectiveness is evidenced by the fact that in March 2017 the company hit 50 million paid subscribers as well as 100 million users. In comparison, Apple Music has 20 million subscribers, Tidal has 3M and Pandora has 4.5M. This indicates that Spotify’s features, including selecting specific songs, downloading music, playlist curation and the lack of advertisements, have been wooing users to the service and converting them to paid users.

Affording the content that users want to listen to will continue to challenge Spotify, as it has Netflix. This is where Spotify’s “Discover Weekly” or “Daily Mix” can continue to attract users with new music that has lower acquisition costs.

Improving Spotify

While Spotify has an advanced machine learning based algorithm, there may be opportunities to use human-machine interactions to improve it. Spotify already leverages its human network by identifying early adopters to source its “fresh finds” playlist, and this approach could be expanded to other curated content across the site. Pandora offers a comparison, since it relies more heavily on human sensor input: its musicology team developed a list of attributes like “strong harmonies” and “female singer”, and human sensors graded songs on these attributes. The limitations of these human sensors are obvious, but Pandora failed when it was unable to transition from radio and recommendations to providing on-demand plays of specific songs.

There is also potential for music video integration in the same vein as YouTube’s new video curated playlists. Spotify could pool a library of music videos and pair them with song selections, or curate video recommendation lists based on a newly developed algorithm. There is also the potential to incorporate voice recognition, which has already been piloted by Amazon Music Unlimited through Alexa. Voice recognition could enable integration with voice software and hardware, from Google Home to Microsoft’s Cortana, for more on-demand searches. With many new technology services emerging, from voice to video, Spotify has the opportunity to build unique video and voice experiences into its platform and offer its customers a more extensive music service.

Sources:

http://benanne.github.io/2014/08/05/spotify-cnns.html

https://www.forbes.com/sites/quora/2017/02/20/how-did-spotify-get-so-good-at-machine-learning/#2142872f665c

https://qz.com/571007/the-magic-that-makes-spotifys-discover-weekly-playlists-so-damn-good/

https://www.slideshare.net/MrChrisJohnson/from-idea-to-execution-spotifys-discover-weekly

https://techcrunch.com/2017/03/02/spotify-50-million/

By: Women Communicate Better (Chantelle Pires, Emily Shaw, Kellie Braam, Ngozika Uzoma, David Cramer)

BUSN39100 Augmented Intelligence

University of Chicago

Author: dcramer
