By: Trevor Gringas, Prashob Menon, Dheeraj Ravi, Joanna Si, DJ Thompson
Fake news is “a made up story with an intention to deceive”. It is not a new phenomenon: in the first century BC, Octavian, Julius Caesar’s adopted son, spread false rumors about Marc Antony, culminating in Marc Antony being declared a traitor in the Roman senate. However, in the last year, user engagement with fake news websites and content has increased significantly across all mediums and types (headlines, sources, content). In fact, in the final 3 months of the 2016 election season, the top 20 fake news articles had more interactions (shares, reactions, comments) than the top 20 real articles.[1] Furthermore, 62% of Americans get their news via social media, and 44% use Facebook, the top distributor of fake news.[2] This represents a major shift in the way individuals receive information. People are led to believe misleading and often completely inaccurate claims. Decisions based on this information are more likely to be incorrect, which poses a serious threat to our democracy and integrity.
Media corporations are recovering from playing a part in either disseminating this news or inadvertently standing by. Governments have ordered certain social media sites to remove fake news or else face a hefty punishment (e.g. €50 million in Germany).[3] Companies such as Google and Facebook are scrambling to find a solution and are investing millions in internal ideas and external partnerships. However, it is extremely difficult to reach consensus on what counts as fake news. Often, a person’s ideological leanings determine whether they call something fake or not. This is why our solution focuses on identifying:
- claims that are demonstrably false (e.g., “there are millions of illegal voters in the US”),
- scientific studies that have been discredited (e.g., “power poses reduce cortisol levels and increase confidence”), and
- conspiracy theories (e.g., “the moon landing was staged”).
Satire and opinion pieces such as articles from The Onion or a statement like “Russians undermined the US political process” are currently out of scope given that Artificial Intelligence (“AI”) is still far from being able to semantically understand words like a human. Human beings cannot even agree on such things; thus, it is unreasonable to expect AI to be able to do so in the near future.
Check Yourself (“CY”) provides real-time fact checking solutions to minimize the acceptance of fake news. It combines natural language processing techniques with machine learning techniques to immediately flag fake content.
CY’s first approach will employ semantic analysis. Often, fake news articles are pure clickbait, meant to induce someone to click on an article in order to generate ad revenue. The body of such an article is frequently gibberish or unrelated to its headline. Our solution will first examine whether the headline and the body of an article are related, and then whether the content supports the headline. Furthermore, the CY solution leverages fact-checking websites and services to determine whether the content itself contains anything explicitly false. Verification would draw on established websites, academics, and other website attributes (e.g. domain name, Alexa web rank).
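As a rough illustration of the first step (checking whether a headline and body are related at all), the sketch below compares the two with a bag-of-words cosine similarity. This is only a stand-in for the semantic analysis described above, not CY's actual method: the stopword list, the function names, and the `threshold` value of 0.1 are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a curated resource).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are", "for",
             "on", "that", "this", "with", "it", "by", "as", "at", "be"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric words that are not stopwords."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower())
            if w not in STOPWORDS]


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def headline_is_related(headline, body, threshold=0.1):
    """Flag the pair as related when similarity reaches the (assumed) threshold."""
    return cosine_similarity(headline, body) >= threshold


related = headline_is_related(
    "Senate passes new budget bill",
    "The Senate voted on Tuesday to pass the budget bill after weeks of debate.",
)
unrelated = headline_is_related(
    "Senate passes new budget bill",
    "Click here to win a free phone today",
)
```

A word-overlap measure like this can only catch bodies that share no vocabulary with the headline; deciding whether the content actually supports the headline would need a trained stance or entailment model.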
The second approach involves (i) identifying platform engagement (Facebook, Twitter), (ii) analyzing website tracker usage (ads, cookies, widgets) and patterns over time, and (iii) generating links between those items to predict relationships. In the past year, the proliferation of ad trackers has led to many domains being created for clickbait and then quickly being abandoned to avoid detection. Furthermore, these websites often link to common platforms and other websites where one can find patterns in fake news sources that are distinct from those created by established news sites. This will result in a neural network through which the CY algorithm may predict the probability that the source is fake.
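To make the idea of "generating links between those items" concrete, here is a minimal sketch that links domains whose sets of tracker IDs overlap heavily. It uses a plain Jaccard overlap rather than the neural network envisioned above, and the domain names, tracker IDs, and the `min_overlap` cutoff are hypothetical.

```python
from itertools import combinations


def jaccard(set_a, set_b):
    """Share of trackers the two sets have in common (0.0 when both are empty)."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def link_domains(trackers_by_domain, min_overlap=0.5):
    """Return (domain, domain, weight) edges for pairs that share enough trackers."""
    edges = []
    for d1, d2 in combinations(sorted(trackers_by_domain), 2):
        weight = jaccard(trackers_by_domain[d1], trackers_by_domain[d2])
        if weight >= min_overlap:
            edges.append((d1, d2, weight))
    return edges


# Hypothetical observations: which tracker IDs were seen on which domain.
observed = {
    "a.example": {"ad-1", "ad-2"},
    "b.example": {"ad-1", "ad-2", "ad-3"},
    "c.example": {"ad-9"},
}
edges = link_domains(observed)
```

Edges like these could serve as input features for a downstream model that estimates how likely a source is to be fake; the overlap score alone is not such an estimate.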
Combining the above two approaches leads to a novel solution as it semantically analyzes the text and assesses the veracity of the source to generate a probability score for how fake an article is.
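One simple way to picture the combination is a logistic function over the two signals. The sketch below assumes each approach has already produced a score between 0 (looks legitimate) and 1 (looks fake); the weights and bias are placeholder values that in practice would be learned from labeled articles, and `fake_probability` is a hypothetical name.

```python
import math


def fake_probability(content_score, source_score,
                     w_content=2.0, w_source=2.0, bias=-2.0):
    """Combine a content score and a source score into one probability.

    The weights and bias are illustrative placeholders, not fitted values.
    """
    z = bias + w_content * content_score + w_source * source_score
    return 1.0 / (1.0 + math.exp(-z))


low_risk = fake_probability(0.0, 0.0)    # both signals look legitimate
borderline = fake_probability(0.5, 0.5)  # mixed signals
high_risk = fake_probability(1.0, 1.0)   # both signals look fake
```

With these placeholder weights, an article that scores 0.5 on both signals lands at exactly 0.5, and the probability rises as either signal increases.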
The first phase of this will be designed in-house by a data scientist. After devising a baseline result and target, we will then use crowdsourcing to improve upon the algorithm. Given our limited in-house resources and the novel nature of this problem, we want to maximize our potential for success by generating ideas from individuals from all disciplines and encouraging collaboration either through an in-house crowdsourcing platform, or through existing platforms such as InnoCentive. We also intend to build out a mobile application through which users may curate and select news subscriptions that would automatically be scored using the CY solution. The mobile app will allow users to submit feedback on the accuracy of CY’s probability scores.
The next stage in the company’s roadmap is to continuously improve the product and incorporate features beyond a mere probability score and in-article highlights. These could include a list of corroborating sources or a list of “real”/factual news articles on the same subject. In the long term, the goal is to apply the algorithm not only to written articles but also to speech: convert speech into text, run the algorithm, and have CY call out inaccuracies on an almost real-time basis. This long-term solution would take into account not just textual relationships but also signals such as verbal intonation and facial muscle movements, so that factors such as mood and facial expressions can help determine the likelihood of fake news. CY intends to be a real-time lie detector for all types of news media: print, video, and yes, even live in-person interviews. Impossible, you say? Tell that to the folks at Google Brain who created the computing power to essentially perform real-time language translation. The computing power available today is increasing so rapidly that aspirations of this sort are indeed achievable.
Pilot 1 will be run with articles on the 2016 election. Subject matter experts will be asked to evaluate our algorithm in real-time. We will place the experts in four conditions – liberal, conservative, independent, no affiliation – and run two experiments.
Experiment 1. Assessing speed and accuracy of the CY algorithm on fake news sources. We will present the exact same news stories to each group. Both the algorithm and the experts will evaluate each article, and the speed and assessments of the human experts will be compared against those of the CY algorithm. Both fake news sources and legitimate sources will be tested.
Experiment 2. The second phase will involve short snippets taken from the various articles (statements that are either facts or lies, not opinions).
Pilot 2 will be conducted with an academic journal or newspaper. In line with our propensity for crowdsourcing and desire to collaborate across disciplines, our team will test the algorithms against a team of faculty and students fact-checking sources for publication.
Many companies are trying to solve this problem. As noted above, Facebook and Google are key developers in this space. Existing solutions largely consist of human fact checkers, but they are not as comprehensive in their approach as we are. Furthermore, human fact checkers are rarely able to provide feedback in real time. Universities are also trying to solve this problem, and are doing so with small teams of students and faculty. The advantage CY has over universities as well as the tech giants is two-fold. First, we intend to create neural networks that span various news sites and search engines (Google, for example, currently relies only on its own search algorithms and platforms). Second, our focus on crowdsourcing the solution and crowdsourcing further feedback allows the best ideas to surface in a newly emerging area.
Even though our value proposition affects both companies and consumers, we will start primarily with a B2B product. We anticipate collaborating with a news aggregator as an initial keystone customer. Given the strength and connections of our Advisory Board, CY is confident that securing initial keystone customers will not be an issue. As more media companies and news aggregators adopt a fake news notifier, content producers themselves will be incentivized to use such a service as well. Large media companies have around 10-20 fact checkers on staff for any live debate. The cost of fact checkers alone comes to about $600K-$1.2M per year for a media company (assuming they spend $60K per checker per year). Furthermore, these customers often use Twitter and Reddit and would find our service invaluable for confirming the veracity of statements and claims immediately. Research publications and institutions keep even more staff on hand to verify academic journals and articles prior to publication. We anticipate that CY would reduce the fact-checking resources of a media company by at least 50%. Key to CY’s continued success is gaining quick adoption and serving as the go-to platform for real-time fact-checking, so that additional features (such as the suggested-sources feature described above or a social sharing aspect) have distinct and sustainable value. The product will be offered as a subscription service for lower-usage customers, and on a combined subscription-plus-usage basis for larger customers.
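The cost estimate above is simple arithmetic, spelled out here as a quick check. The head counts, the $60K per checker, and the 50% reduction are the figures stated in the text; the function name is ours, and the savings figure assumes that a 50% cut in fact-checking resources maps directly to a 50% cut in cost.

```python
def annual_fact_checking_cost(num_checkers, cost_per_checker=60_000):
    """Yearly fact-checker cost for a media company, in dollars."""
    return num_checkers * cost_per_checker


cost_low = annual_fact_checking_cost(10)   # 10 checkers on staff
cost_high = annual_fact_checking_cost(20)  # 20 checkers on staff

REDUCTION = 0.5  # the anticipated "at least 50%" reduction
savings_low = cost_low * REDUCTION
savings_high = cost_high * REDUCTION

print(f"Cost range:    ${cost_low:,} to ${cost_high:,}")
print(f"Savings range: ${savings_low:,.0f} to ${savings_high:,.0f}")
```

Under these assumptions, a large media company would save at least $300K to $600K per year.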
At this time, we are asking for $225K to cover development costs and expenses over the next year. The bulk of this funding would go towards hiring a data scientist, with the remainder covering administrative costs, including IT and server costs. To supplement this, we are also working on securing grants from agencies that are keen to address the problem of fake news.
For a selection of articles on the efficacy of crowdsourcing and its potential, please see: King, Andrew and Karim Lakhani. “Using Open Innovation to Identify the Best Ideas.” Sloan Management Review 55(1):SMR 466.
Boudreau, Kevin J. and Karim R. Lakhani. “Using the Crowd as an Innovation Partner.” Harvard Business Review April 2013 R3104C.
Parise, Salvatore, Eoin Whelan, and Steve Todd. “How Twitter Users Can Generate Better Ideas.” MIT Sloan Management Review (2015): 21.
Schlack, Julie Wittes. “HBR Web Article: Ask Your Customers for Predictions, Not Preferences.” Harvard Business Review January (2015).