Quantopian: inspiring talented people everywhere to write investment algorithms




The Opportunity

Quantitative hedge funds, which use computer algorithms and mathematical models rather than human traders to make investment decisions, are becoming increasingly popular because their performance has been much better than that of traditional hedge funds.

As more investment managers seek to implement quantitative strategies, hiring has become a major challenge: many people with the requisite skills to develop trading algorithms have little interest in working for a big, established hedge fund.


Quantopian, a crowd-sourced quantitative investing firm, solves this problem by allowing people to develop algorithms as a side job.

On one hand, the company gives community members (mainly data scientists, mathematicians, and programmers) access to a large US equities dataset, a research environment, and a development platform, enabling them to write their own investment algorithms.

On the other, Quantopian acts as an investment management firm, allocating money from individuals and institutions to the community's top-performing algorithms. Allocations are based on each algorithm's backtesting results and live track record.

If an algorithm receives an allocation, its developer earns 10% of the net profit generated on the allocated capital.
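As a concrete illustration of the payout rule: the 10% share is from the description above, while the allocation size, the return, and the assumption that losing periods simply pay nothing are hypothetical.

```python
def developer_payout(net_profit, share=0.10):
    """Developer's cut of an allocation's net profit; assumes (our guess,
    not stated by Quantopian) that losing periods simply pay nothing."""
    return max(0.0, net_profit * share)

# Hypothetical: a $1.5m allocation earns 8% net over the period.
allocation = 1_500_000
net_profit = allocation * 0.08          # $120,000
print(developer_payout(net_profit))     # the developer earns ~$12,000
```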


Effectiveness and Commercial Promise

On the algorithm-developer side, Quantopian's community currently has over 100,000 members, including finance professionals, scientists, developers, and students from more than 180 countries.

Members collaborate with each other through forums and in person at regional meetups, workshops, and QuantCon, Quantopian’s flagship annual event.
On the fundraising side, last year Steven Cohen, one of the hedge fund world's biggest names, committed up to $250m of his own money to the platform. Moreover, Quantopian started managing investor capital last month, with initial allocations ranging from $100k to $3m and a median of $1.5m per algorithm. By the end of 2017, the firm expects allocations to average $5m-$10m per algorithm.



Trading algorithms sometimes converge on the same buy or sell signals, which can generate systemic events. In August 2007, for example, many quant algorithms executed sell orders at the same time, and for roughly two weeks quant trading strategies created chaos in the financial markets.

If an event similar to August 2007 occurs again, it could hurt Quantopian's returns if the algorithms to which money has been allocated converge on the same positions.

Quantopian can mitigate this risk by regularly analyzing its exposure to particular stocks, the overlap between the different strategies it manages, and the overlap of its portfolio with the portfolios of other quant managers that disclose their positions.
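One simple way to quantify the overlap between two strategies, sketched here with hypothetical tickers and weights, is the cosine similarity of their position weights:

```python
import math

def overlap(weights_a, weights_b):
    """Cosine similarity between two portfolios' position weights
    (0 = no common holdings, 1 = identical weightings)."""
    tickers = set(weights_a) | set(weights_b)
    dot = sum(weights_a.get(t, 0.0) * weights_b.get(t, 0.0) for t in tickers)
    na = math.sqrt(sum(w * w for w in weights_a.values()))
    nb = math.sqrt(sum(w * w for w in weights_b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented example portfolios for two community algorithms.
algo1 = {"AAPL": 0.5, "MSFT": 0.3, "XOM": 0.2}
algo2 = {"AAPL": 0.4, "GOOG": 0.6}
print(round(overlap(algo1, algo2), 3))
```

A manager could compute this pairwise across all funded strategies and cap allocations when the average overlap rises above a chosen threshold.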


We believe that Quantopian's approach can be applied to any “market” that requires accurate predictions. The model could be exported, for example, to weather prediction (with users paid by forecasting agencies), to sports results (with users paid by betting sites), or to public policy solutions as part of open government initiatives.



  1. https://www.ft.com/content/b88e6830-1969-11e7-9c35-0dd2cb31823a
  2. https://www.quantopian.com/about
  3. https://www.quantopian.com/faq
  4. https://www.quantopian.com/home
  5. https://www.ft.com/content/0a706330-5f28-11e6-ae3f-77baadeb1c93
  6. https://www.novus.com/blog/rise-quant-hedge-funds/
  7. http://www.wired.co.uk/article/trading-places-the-rise-of-the-diy-hedge-fund


Team members:

Alex Sukhareva
Lijie Ding
Fernando Gutierrez
J. Adrian Sánchez
Alan Totah
Alfredo Achondo

Brainspace: Increasing Productivity using Machines


Public and private investigative agencies, especially those handling legal matters, fraud investigations, and compliance and governance issues, are overburdened by record volumes of investigative requests as they try to make data-driven decisions. Data is growing faster than ever: by 2020, an estimated 1.7 megabytes of new information will be created every second for every human being on the planet. Document review is both time-consuming and expensive, so these organizations seek more efficient ways to analyze data than paying people to sift through documents. Even the most efficient lawyers can review only about 80 documents an hour, and the process is prone to mistakes due to fatigue.

Digitization has given rise to opportunities for e-discovery solutions. Companies like Brainspace are developing software and machine learning tools that augment the intelligence of public and private investigative agencies, reducing the time it takes to review documents and reach data-driven decisions.


Brainspace analyzes structured and unstructured data to derive concepts and context. Using visual data analytics, it then engages the user to refine a search for maximum relevancy. The software learns at massive scale (1 million documents in 30 minutes), and the process is entirely transparent: the user can see and interact with the machine learning.

Brainspace enhances productivity through interaction between machine and human: the software is better than humans at ingesting, connecting, and recalling information, while humans are better than machines at using information to reason, judge, and strategize. For example, after the platform organizes unstructured text into concepts, a user can filter on a concept and weight “suggested contexts” to curate relevant search results.
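Brainspace's actual scoring is proprietary; as a toy illustration of how user-assigned concept weights could curate results, consider ranking documents by a weighted count of concept terms (the documents and weights below are invented):

```python
def score(doc, concept_weights):
    """Score a document by the user-weighted concept terms it contains."""
    words = doc.lower().split()
    return sum(w * words.count(term) for term, w in concept_weights.items())

docs = {
    "memo1": "transfer funds to the offshore account before the audit",
    "memo2": "quarterly audit schedule attached for review",
}
# The user boosts "offshore" and keeps "audit" at a lower weight.
weights = {"offshore": 3.0, "audit": 1.0}
ranked = sorted(docs, key=lambda d: score(docs[d], weights), reverse=True)
print(ranked)
```

Reweighting the concepts and re-ranking is the human half of the loop; the machine half is deriving the candidate concepts in the first place.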

Brainspace is differentiated from other text-search algorithms in that it recognizes concepts, not just text. The software “reads between the lines” as humans do, except that it can handle thousands of pages instantly. It is dynamic and unsupervised, using no lexicons, synonym lists, or ontologies.

Effectiveness and Commercial Promise

The market opportunity and applications of Brainspace’s leading machine learning software are enormous, though the first application has been in the investigative industry.  One can imagine the immense savings associated with document reviewing costs.  Other research-heavy applications include legal e-discovery, fraud detection investigations within financial-services organizations and compliance or governance issues.

For example, remember Enron? As a demo, Brainspace imported 450 million Enron documents and traced the emails about an offshore account. It took five clicks and less than a minute to identify the executives involved; by hand, it would have taken lawyers six months.

Brainspace's software has many applications, but the company has needed to partner with other firms to deliver a usable product to customers. For example, Brainspace partnered with Guidance to deliver a product called EnCase. The Guidance software augments Brainspace's deciphering algorithm by providing protection against hacking and external threats. The product is used in the investigative industry to provide the auditing capabilities critical to large-scale investigations.

A potential competitor to Brainspace that we identified is Graphistry, which similarly takes data, quickly understands the results, and automatically visualizes the data to the user via graphs and cloud technology.  Graphistry may have better visualization software for analysis by a human user, but we believe that the patented deciphering technology of Brainspace is more powerful.  Building upon the software to improve the user friendliness of Brainspace is a potential opportunity.


Brainspace could also apply its context-deciphering technology to online news agencies or social media sites. For instance, it could analyze Twitter feeds and provide more reliable protection against “fake news” generated by underperforming algorithms. Perhaps CFOs could implement the software to automate financial packets: the software may be able to derive trends in financial statements, and derive context for those trends from access to the company email database.

Brainspace's technology could also be very useful for the sensor revolution now taking place, which is creating enormous amounts of information. The company could use its expertise in analyzing structured and unstructured data to create insights from sensor data, providing immense value for firms that use this data to monitor real-time systems.


  1. https://www.forbes.com/sites/bernardmarr/2015/09/30/big-data-20-mind-boggling-facts-everyone-must-read/#6055410d17b1
  2. http://www.businessinsider.com/temp-attorney-told-to-review-80-documents-per-hour-2009-10
  3. https://www.brainspace.com/

Posted by:  Dhruv Chadha, Keenan Johnston, Ashwin Avasaral, Andrew Kerosky, Akkaravuth Kopsombut, Ewelina Thompson


Gamaya: Crop Scouting with Drones and Hyperspectral Imaging (Profile)

The Opportunity

Farmers' crop losses to weeds, pests, and diseases can range from 20% to 50%. To prevent these losses, farmers apply chemicals such as herbicides and pesticides to their crops. Because farmers do not know exactly where to apply these chemicals, they usually treat entire fields at a time, which increases the total cost of application, decreases its efficacy, and harms the environment.

To address this, farmers manually scout and sample their fields to determine where chemicals are most needed. This is a time-consuming and costly process that cannot easily account for the enormous variety of factors affecting crops.


Gamaya aims to solve these issues by making crop scouting more efficient. To do so, Gamaya captures information through a patented hyperspectral camera deployed on drones that fly over crop fields.

The information captured by this camera is processed by algorithms and translated into actionable information that farmers can use to optimize the usage of chemicals and fertilizers, improving production efficiency and reducing crop-disease and weed-related losses. Gamaya's camera is 100x more efficient at compressing data than other hyperspectral cameras, allowing the company to process image data faster and more cheaply than competitors. As Gamaya collects more data, its machine learning algorithms improve the speed and accuracy of its recommendations to farmers.
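Gamaya's algorithms are not public, but a common building block in crop imaging is a band-ratio vegetation index such as NDVI, computed per pixel from near-infrared and red reflectance. A minimal sketch with made-up reflectance values:

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: high for healthy vegetation,
    low for stressed plants or bare soil."""
    return (nir - red) / (nir + red) if (nir + red) else 0.0

# Hypothetical per-pixel (near-infrared, red) reflectances from one image row.
pixels = [(0.60, 0.08), (0.40, 0.30), (0.15, 0.12)]
stressed = [i for i, (nir, red) in enumerate(pixels) if ndvi(nir, red) < 0.3]
print(stressed)  # indices of pixels flagged for targeted treatment
```

A per-pixel map like this is what lets a farmer spray only the flagged patches instead of the whole field.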

Effectiveness and Commercial Promise

Gamaya's system has been effective in accomplishing its objectives. For example, Gamaya worked with K-Farm in Brazil during 2015, greatly improving the results of a 2,000-hectare corn plantation. The improvements can be seen in the following table:

                                        Before Gamaya   After Gamaya
  Revenue per season                    $1.5m           $1.8m
  Losses to disease and pests           20%             10%
  Spend on fertilisers and chemicals    $400k           $300k

Currently, Gamaya offers two solutions: Canefit, aimed at sugarcane cultivation, and Soyfit, aimed at soybean cultivation. Canefit is expected to generate up to a 15% increase in crop yield, a 20% decrease in weed-related losses, a 30% reduction in chemical usage, and a 30% decrease in disease-related losses. Soyfit promises a 10% increase in crop yields, a 20% decrease in weed-related losses, a 30% reduction in chemical usage, and a 50% reduction in losses due to soil erosion.

It is important to note that the benefits delivered by Gamaya seem large relative to the cost of the solution: Gamaya charges farmers only $20 per hectare.

This proven ability to improve farmers' results at a low implementation cost has made Gamaya one of the most attractive firms in the AgriTech industry; Forbes featured it as one of four European AgriTech startups with the potential to become a $1 billion company.


Although Gamaya has been successful in several markets, entering the U.S. and other regions may be harder than expected, for two reasons. First, the U.S. regulatory environment around drone deployment is much more complex than in other countries, and changes in international drone regulation could affect the company's ability to expand elsewhere. Second, the upfront price may be high for farmers in developing countries: even though Gamaya will save them money over time, they may not have enough cash to invest in the project right away.


An interesting alteration would be to modify Gamaya's algorithms, using the same sensor, to generate value in other industries. For example, the company might help fisheries track wild fish movements and developmental stages to increase yield and reduce by-catch, or help the forestry industry and national parks predict and prevent wildfires given current conditions.


  1. http://worldagritechinvestment.com/wp-content/uploads/2015/11/Gamaya-Dr-Akim.pdf
  2. http://www.businessinsider.com/gamaya-raises-32-million-to-use-drones-and-ai-for-agriculture-201
  3. http://gamaya.com/blog-post/advanced-crop-intelligence-to-address-global-food-production-challenges/
  4. gamaya.com


Team CJEMS: Revolutionizing Middle Skills Hiring (Pitch)


The hiring process for new labor has a well-deserved reputation for being inconsistent, time-consuming, and non-democratic. Although some companies have made headway into solving this problem with new technological platforms (e.g., Aspiring Minds) they tend to focus primarily on high-skilled labor. As a result, hiring for high-skilled labor has become more seamless and streamlined. In stark contrast, however, there are nearly 12.3 million community college students in the United States who struggle to find employment. Community college graduates are more likely to seek what is known as “middle-skilled labor” and tend to fill roles that require specific skills and certifications, but typically not a four-year degree (e.g., HVAC technician). At the same time, paradoxically, there are more than 3 million unfilled middle-skilled jobs that these community college graduates could ostensibly fill. The inefficiency in this labor market is caused by a misalignment between job-seeking community college students and prospective employers. We believe that there is no effective channel or signaling mechanism community college students can use to convey their skills to prospective employers. A solution that can bridge the gap between students and employers would be a critical step toward optimizing middle-skilled employment and improving the economic fortunes of community college graduates.

Proposed Solution

We propose an augmented intelligence solution that would sit as a platform between prospective employers and community college students. Traditionally, applicants submit a non-standardized resume that they believe highlights their abilities. Prospective employers then try to match applicant resumes against a nebulous list of criteria they believe are important for the position they are trying to fill. The resulting inefficiency not only means that jobs go unfilled, but it also means that the wrong person may be hired for the wrong role.

In our augmented intelligence solution, we propose that employers use our platform to submit a standard, text-heavy job description. Our platform will translate these job descriptions into simplified skill lists based on a database of similar roles and demonstrated qualifications. This process will require iteration with employers at the outset in order to create a sufficiently large database that can be truly predictive. As these lists are completed, they will be published via the platform as simplified job descriptions reflected in terms of the core skills they require.

On the supply side, community college students create profiles via our platform and fill out all courses they have taken and all relevant work experience. The platform will use a similar database in order to translate disparate coursework and professional work into a list of skills. Students will then be able to access employer job postings and immediately see how well their skills align with employer requirements. Once they apply, employers will not only have access to our platform’s curated list of applicant skills but also traditional application components (e.g., resumes).
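The skill-alignment step could start as simply as set overlap between the student's derived skill list and the posting's required skills; a sketch with hypothetical skill names:

```python
def alignment(student_skills, job_skills):
    """Return (coverage, jaccard): the fraction of required skills the
    student covers, and the Jaccard similarity of the two skill sets."""
    s, j = set(student_skills), set(job_skills)
    coverage = len(s & j) / len(j) if j else 0.0
    jaccard = len(s & j) / len(s | j) if (s | j) else 0.0
    return coverage, jaccard

student = {"hvac basics", "refrigerant handling", "customer service"}
job = {"hvac basics", "refrigerant handling", "epa 608 certification"}
cov, jac = alignment(student, job)
print(cov, jac)  # the student covers 2 of the 3 required skills
```

A production version would weight skills by importance and handle near-synonyms, but even this simple score lets a student see at a glance how well they match a posting.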

We believe that streamlining both sides of this market will enable more efficient hiring and will help close some of the gaps we observe in the middle-skills job market.

Solution Development and Validation

Data Collection

At the outset, we would collect data from prospective employers on some of their commonly filled roles and consult with them to identify the core skills they believe the roles require. On the supply side, we will work with community colleges to translate typical courses into skills (e.g., accounting classes demand a different skill set than a history course).

Model Development / Validation

As the database grows sufficiently large we will begin by conducting a number of small pilots at individual community colleges. Testing the platform in small environments will help us not only validate the efficacy of the model, but will also enable us to adjust the underlying data set in order to make the platform more predictive.

As the model matures, we will expand the pilots to include regions (e.g., Chicagoland) in order to see how combining data from disparate sources changes our model. We ultimately will need to adjust the underlying data set to account for differences in how employers articulate skill sets across different parts of the country. A more comprehensive set of pilots will enable us to understand how the data interplays and make these adjustments as necessary.

Value Proposition

After sufficient testing of the model, we will provide a beta version for a target list of businesses and students to test.

We have defined a number of benchmarks for both employers and students that can help demonstrate our platform’s value. For employers, tracking average time to hire, average tenure of new hires, and management satisfaction ratings can help validate our platform’s effect relative to the base case.

For students, tracking average number of interviews completed, average time to hire, and post-hire satisfaction can help convey our platform’s value.

Team Dheeraj: I Don’t WannaCry No Mo’ (PROFILE)


Last week, a massive cyber-attack took place across more than 150 countries. The so-called “WannaCry” software caused a screen to pop up on a victim's computer demanding a $300 payment in return for access to their files. As of May 17, 2017, the total number of computers attacked had reached 300,000. What's more, the success of the software is spurring imitators, causing more heartburn for cybersecurity experts around the world. Enter Deep Instinct, a start-up focused on using AI to detect zero-day cybersecurity threats. Although secretive about its methods, the firm recently competed in Nvidia's start-up competition and showed how it uses machine learning techniques to identify malware. This is particularly difficult because parsing code for ill intent (like parsing natural language for the same) is hard: according to an article written on the subject, “…a new family of malware is only about 30 percent different from the code of something that came before.”

Preventative v. Reactive

Given the difficulty of identification, most anti-virus software relies on a combination of human reporting and reactive malware management. Deep Instinct, on the other hand, doesn't rely on pre-existing knowledge or known virus signatures. One might assume this requires processing an incredibly large amount of data, but the firm claims to use an ensemble algorithm that follows a two-step classification process. First, the firm discards about 95% of the available data on a potential malware sample, using a method it keeps secret; it seems plausible that this is done with a variable-selection tool such as LASSO or Elastic Net. Second, the firm runs another algorithm on the remaining variables (i.e., the 5% of data kept) to classify a file as malware or not. The firm does not disclose this method either, but a classification method such as a random forest is likely to play a part. The table below shows some of the firm's self-reported results:

                 Detection rate   False-positive rate
  Deep Instinct  99%              0.1%
  Competitors    ~80%             ~2-3%
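Deep Instinct's actual pipeline is secret; the two-step shape described above (aggressive variable selection, then classification on the surviving 5%) can be sketched with simple stand-ins: variance-based feature selection in place of LASSO, and a nearest-centroid rule in place of a random forest, on synthetic data.

```python
import random

random.seed(0)

def top_variance_features(X, keep_fraction=0.05):
    """Step 1: keep only the highest-variance feature columns (a crude
    stand-in for the firm's secret LASSO/Elastic-Net-style selection)."""
    n, d = len(X), len(X[0])
    scored = []
    for j in range(d):
        col = [row[j] for row in X]
        mean = sum(col) / n
        scored.append((sum((v - mean) ** 2 for v in col) / n, j))
    k = max(1, int(d * keep_fraction))
    return sorted(j for _, j in sorted(scored, reverse=True)[:k])

def nearest_centroid(X, y, x_new, features):
    """Step 2: label a new sample by its nearest class centroid over the
    selected features (a stand-in for e.g. a random forest)."""
    centroids = {}
    for label in set(y):
        rows = [X[i] for i in range(len(X)) if y[i] == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
    def dist(c):
        return sum((x_new[j] - cj) ** 2 for j, cj in zip(features, c))
    return min(centroids, key=lambda lbl: dist(centroids[lbl]))

# Synthetic "file" vectors: 100 features of noise; only feature 0 actually
# separates malware (label 1) from benign (label 0).
X = [[random.random() * 0.1 for _ in range(100)] for _ in range(20)]
y = [i % 2 for i in range(20)]
for i in range(20):
    X[i][0] = 5.0 if y[i] == 1 else 0.0

feats = top_variance_features(X)   # keeps 5 of the 100 features
print(nearest_centroid(X, y, X[0], feats))
```

The point of the two-step design is computational: classifying on 5 features instead of 100 makes the second stage far cheaper without losing the signal the first stage preserved.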

Next Steps

Deep Instinct is still an early-stage firm, but last week's attack makes clear the need for a scalable way to detect and prevent malware. Longer term, though, this is a cat-and-mouse game: hackers will get more clever, forcing cybersecurity firms to get more intelligent, and so on. This raises the question: is there a better solution? In general, a preventative measure that helps identify a file's intent (by parsing the underlying code, for example) seems a good start. With this method, we prevent ransomware attacks from occurring, but we leave ourselves open to being over-protective (anyone who works for a firm with an overly active spam filter will commiserate). As we think about the evolution of this space, we believe more investment should go into preventative security, in addition to general consumer education about how to identify and react to malware.


  1. http://www.npr.org/sections/thetwo-way/2017/05/15/528451534/wannacry-ransomware-what-we-know-monday
  2. https://venturebeat.com/2017/05/10/6-ai-startups-win-1-5-million-in-prizes-at-nvidia-inception-event/
  3. https://www.deepinstinct.com/#/about-us

Bankers: Clinical no-show reduction (Pitch)


According to The Washington Post [1], on average you will need to wait 18.5 days before you can get an appointment to see your physician. A lot can happen in 18 days. According to the Annals of Family Medicine [2], 18 percent of patients end up skipping their appointments during this period because they are:

  • Feeling worse and need to go to the emergency room
  • Overscheduled and forget their appointment
  • Limited in their healthcare literacy and don’t understand or appreciate why the appointment is necessary
  • Not in an established relationship with their doctor and aren’t concerned about missing an appointment
  • Influenced by language barriers or socio-economic factors and misunderstand when their appointment is

No-shows have a substantial financial impact on the US healthcare system; a back-of-the-envelope estimate:

A doctor typically sees 15 patients every day.
A 10% no-show rate (*1) means 1.5 missed appointments daily, or about 8 no-shows per week (*2).
The doctor organizes appointments into 30-minute sessions at a cost of $100/session (*3).
Because of the 10% no-show rate, the doctor loses about $800 per week, which costs the practice around $41,600 per year.
There are approximately 810,000 physicians in the U.S. (*4), so the total loss is roughly $34 billion a year.

(*1) We assumed a conservative 10% no-show rate referring to this study [3]
(*2) This example assumes a 5 day work week and does not exclude holidays
(*3) We assumed a conservative $100 cost per missed session, referring to the same study above [3]
(*4) Statista data [4]
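The worked example above can be checked in a few lines of Python:

```python
# Back-of-the-envelope no-show cost, using the assumptions stated above.
patients_per_day = 15
no_show_rate = 0.10
cost_per_session = 100        # dollars
work_days_per_week = 5
physicians_in_us = 810_000

missed_per_week = round(patients_per_day * no_show_rate * work_days_per_week)
weekly_loss = missed_per_week * cost_per_session
annual_loss = weekly_loss * 52
national_loss = annual_loss * physicians_in_us
print(missed_per_week, weekly_loss, annual_loss, national_loss)
```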


We propose BookMe, a machine-learning-based patient management tool (SaaS). Medical providers can easily integrate the BookMe Scheduler with their own websites; it offers patients an easy sign-up and suggests available timeslots based on predicted no-show rates. BookMe also sends automated reminder texts and calls to patients with relatively high predicted no-show rates, to minimize the costs associated with no-shows. BookMe predicts no-show rates by analyzing country-wide healthcare data, provider-specific data (if provided), transportation data, and so on. In addition to this two-step no-show reduction process, each healthcare provider receives several relevant reports: a capacity-utilization report and a BookMe performance report (prediction power, no-show reduction results, and additional revenue captured by the system).

If a clinic with 10 physicians can halve its no-show rate, from 10% to 5%, it can save about $208K a year. Clinics can also expect to cut labor costs by automating scheduling and reminder operations. BookMe is free for small providers (up to 300 reservations a month) and $20/month for larger providers (300+ reservations a month), with both plans including the No-show Reduction Program (text/call reminder service).

Prototype Development Design

Data Collection

We would collect data on 10,000+ actual appointments (gender, age, symptom, phone number, email address, and zip code) from our partner medical centers (University of Chicago and Rush University) through BookMe (beta), combined with the patient personal data the medical centers already own (nationality, preferred language, family data, past diagnoses and prescriptions, etc.). From third-party service providers, we would collect transportation data, which is likely one of the factors affecting whether patients skip their appointments. Existing studies of no-show patterns [5] could provide insights for this process.

Model Development

We will build a lasso-based logistic regression model, choosing the penalty to minimize cross-validation error, to predict an accurate no-show rate for each appointment (the higher, the worse); the model will be re-fit whenever additional data is imported.
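A minimal sketch of the modeling step: a from-scratch logistic regression with an L1 (lasso) penalty, trained on an invented toy feature set (real inputs would include the demographic, history, and transportation features described above).

```python
import math

def train_l1_logistic(X, y, lam=0.01, lr=0.1, epochs=2000):
    """Logistic regression with an L1 (lasso) penalty, fit by batch
    subgradient descent; the penalty pushes useless weights toward zero."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(d):
                gw[j] += (p - yi) * xi[j]
            gb += p - yi
        for j in range(d):
            sign = 1.0 if w[j] > 0 else -1.0 if w[j] < 0 else 0.0
            w[j] -= lr * (gw[j] / n + lam * sign)
        b -= lr * gb / n
    return w, b

def predict_no_show(w, b, x):
    """Predicted probability that the appointment will be a no-show."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Toy features: [days-until-appointment / 30, got_reminder]; label 1 = no-show.
X = [[0.9, 0], [0.8, 0], [0.7, 0], [0.1, 1], [0.2, 1], [0.1, 0]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_l1_logistic(X, y)
print(predict_no_show(w, b, [0.85, 0]))  # long wait, no reminder: high risk
```

In production one would use a library implementation with cross-validated regularization rather than this hand-rolled trainer; the sketch just shows the shape of the model.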

Model Validation

We would validate the model's predictions on future appointments, separating out the reduction factors, i.e., whether the actual reduction comes from (1) optimized slot allocation or (2) the no-show reduction program. Additionally, we would study the effectiveness of adding provider-specific data to the sample data, because this determines how much data BookMe should collect from providers' existing databases in addition to patients' sign-up data.


Team members:

Nobuhiro Kawai
Nadia Marston
Lisa Clarke-Wilson
Antonio Salomon


  1. https://www.washingtonpost.com/news/wonk/wp/2014/01/29/in-cities-the-average-doctor-wait-time-is-18-5-days/
  2. http://www.annfammed.org/content/2/6/541.full
  3. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4714455/
  4. https://www.statista.com/topics/1244/physicians/
  5. http://www.mdpi.com/2227-9032/4/1/15/pdf

X.ai: Personal Assistant to Schedule Meetings (Profile)

The Opportunity

Scheduling meetings can often be a time-consuming and frustrating experience, especially when several parties are involved. Everyone has a different schedule and different time and location preferences, and it takes a lot of time and effort to coordinate several people to set up a meeting.


X.ai is an AI personal assistant that coordinates all parties involved to schedule a meeting, saving hours of time and a lot of effort. The assistant, called Amy or Andrew, reaches out to all the meeting participants and suggests a few time slots and locations that would work for the user. It then conducts a few rounds of correspondence with the participants to find a time when they are all available, without spamming the user's inbox. If it's an in-person meeting, the bot suggests places and times to meet, and if you've used Amy or Andrew before, it learns whether you are a Starbucks or Blue Bottle type, or whether you prefer knocking one back with your contact at the local pub. Once the meeting is set, the bot adds it to whatever calendar you use.

X.ai is a conversational, smart bot, which makes interactions with the assistant seamless and effortless (you message it just as you would a real personal assistant). The service can be used for both business and social purposes.

Effectiveness and Commercial Promise

X.ai is taking advantage of a new trend in how people approach technology. Users are getting “app-fatigued”: they get lost in the endless number of new apps and instead heavily use a few “trusted” and most useful ones (the average user downloads zero new apps a month and spends 80 percent of their time in just three apps). Smart assistants, on the other hand, are gaining traction with customers: the likes of Amazon Alexa, Apple Siri, Google Assistant, and customer-support bots for Facebook Messenger are becoming more and more popular in the b2c and b2b spaces. So instead of creating a new app, X.ai makes the standard apps everyone already uses, email and calendar, smarter.

Currently X.ai provides three subscription plans: Personal (5 meetings per month for free, but with a long waitlist to register), Professional (unlimited meetings and no waitlist for $39 a month), and Business (all Professional features, plus an assistant on the company's domain and instant internal meetings for all X.ai users inside the same company). It promises high ROI, as it frees up workers' time for their main jobs and reduces the need for real personal assistants. The service is already used by the likes of LinkedIn and Salesforce.

The commercial viability of the service will depend largely on the precision of the algorithm used to set up meetings. If it works seamlessly, avoids mistakes, and doesn't require human intervention, it will most probably succeed; if it makes mistakes and demands constant attention from the user, it is likely to fail.

Alterations and progress to date

Additional features such as proactive meeting suggestions, conference-room reservation, and even restaurant/hotel/flight booking could expand the functionality of the personal assistant even further, practically removing the need for a real-life personal assistant. The bot could even negotiate deals with hotels and travel agencies by collecting offers and then using them as leverage to negotiate a better deal with another vendor (more applicable to personal vacations). At the same time, even by doing one thing well the company can reach a significant valuation and client base. Although the exact number of clients is not disclosed, the founders refer to hundreds of thousands of users with a significant backlog. Since inception, X.ai has raised $34.3M in total funding from such reputable investors as SoftBank Capital and Pritzker.


  1. http://www.x.ai
  2. https://www.theverge.com/2016/4/7/11380470/amy-personal-digital-assistant-bot-ai-conversational
  3. https://techcrunch.com/2016/04/07/rise-of-the-bots-x-ai-raises-23m-more-for-amy-a-bot-that-arranges-appointments/
  4. https://www.crunchbase.com/organization/x-ai#/entity
  5. https://smallbiztrends.com/2016/05/personal-assistant-x-ai.html
  6. http://www.businessinsider.com/review-of-amy-ingram-the-virtual-personal-assistant-from-xai-2015-7

Women Communicate Better – Pitch for Classy

The Problem

Everyone's familiar with class-action lawsuits in which a group of families sues a pharmaceutical company. The same problem often arises for investors, leading to a securities class-action lawsuit: if a company neglects its fiduciary responsibility to keep investors informed about negative changes in the business, and those changes eventually hit the company's stock price when the news becomes public, investors are entitled to sue.

Right now, the existing solution is throwing bodies at the problem: plaintiff firms keep large numbers of lawyers on staff whose job is to read the news and track stocks, in the hope of identifying a situation where a securities class-action lawsuit could be filed. This is incredibly manual and time-intensive, and success cannot be guaranteed: a person is always going to miss some opportunities.

The Solution

Instead of relying on plaintiff lawyers and industry blogs, like Lyle Roberts's The 10b-5 Daily, to manually scan and analyze stock price data, we believe there is an opportunity to merge human understanding and machine learning to identify and even predict potential securities class-action suits. To solve this problem we propose the creation of Classy, a service that predicts potential lawsuits for plaintiff lawyers by combining machine algorithms and human intuition. Currently, many of the class-action suits brought by firms end up being frivolous and yield little to no profit for plaintiffs. Classy will help plaintiff firms mitigate this error and increase their efficiency in pursuing the most fruitful cases. Additionally, Classy will help plaintiff firms better prioritize their staffing, so that more lawyers spend their time executing suits rather than searching for potential signs of fraud.

The Design

Our product would combine external sensors with machine and human algorithms to predict the likelihood of securities misconduct of various firms and help analyze the success of a suit. The two sensory inputs would be stock prices and news articles. First, we would utilize machine learning to flag precipitous stock price drops throughout the whole market. We would also use natural language processing and sentiment analysis to analyze relevant news items, identifying patterns of negative disclosures by a firm in the past or public apologies issued by CEOs. These sensory inputs would then be analyzed by a machine algorithm, which would use the data to create a likelihood score of disclosure malfeasance by the firm and the predicted settlement value. This information would then be transmitted through a human algorithm – plaintiff lawyers with years of experience and relationship expertise – who would then verify and expand upon the potential suits flagged by the machine algorithm. They would also provide feedback to the machine algorithm in order to improve its efficacy and accuracy over time.
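
The machine stage described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the -15% drop threshold, the `negative_news_score` input (a stand-in for the output of a trained sentiment model over disclosures), and the blending weights are all invented, not Classy’s actual design.

```python
# Illustrative sketch of Classy's machine stage (all thresholds hypothetical).
# A real system would consume market-wide price feeds and trained NLP models.

DROP_THRESHOLD = -0.15  # flag single-day drops worse than -15% (assumption)

def daily_returns(prices):
    """Compute simple day-over-day returns from a list of closing prices."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]

def flag_precipitous_drops(prices, threshold=DROP_THRESHOLD):
    """Return the day indices whose return breaches the drop threshold."""
    return [i + 1 for i, r in enumerate(daily_returns(prices)) if r <= threshold]

def likelihood_score(drop_flagged, negative_news_score):
    """Toy blend of the two sensory inputs into a 0-1 malfeasance score.

    negative_news_score stands in for sentiment analysis over recent
    disclosures (0 = neutral, 1 = strongly negative)."""
    base = 0.5 if drop_flagged else 0.1
    return min(1.0, base + 0.5 * negative_news_score)

prices = [100, 101, 99, 80, 79]            # -19% drop on day 3
flags = flag_precipitous_drops(prices)      # -> [3]
score = likelihood_score(bool(flags), negative_news_score=0.6)  # -> 0.8
```

In the full design, a score like this would then go to the human algorithm – experienced plaintiff lawyers – for verification, and their feedback would retrain the model.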

Instituting an Ethics Framework for AI

The Problem

Progress in Artificial Intelligence (AI) in recent years, from data mining to computer vision and from natural language processing to robotics, has forced people to start thinking about the importance and complexity of morality in the design of AI. According to the Economist, almost half of all jobs could be automated by computers within two decades.[i] Many of these jobs are complex and involve judgements that lead to significant consequences.

The dilemma self-driving cars face is a good example. In the event of a brake failure, a self-driving car can either keep going straight, which would result in the death of pedestrians, or swerve to avoid hitting the pedestrians, which would result in the death of dogs crossing the street. How should the AI be programmed to make decisions for the self-driving car in situations like this? Drones used by the military to target and suppress terrorists are another well-debated example of the importance of morality in the design of AI. According to the New York Times, the Pentagon has put AI at the center of its strategy to maintain the United States’ position as the world’s dominant military power.[ii] The new weapons would offer speed and precision unmatched by any human while reducing the number — and cost — of soldiers and pilots exposed to potential death and dismemberment in battle. How do we make sure these drones make the moral decision on the battlefield?

As innovation in AI accelerates, we need to get ahead of the curve and implement morality in the design of AI so that today’s gains do not come at the cost of future harm. We should define morality before an AI does.

Potential Challenges

The major challenge of programming ethics into AI is that human ethical standards are only imperfectly codified in law, and encoding them requires all kinds of assumptions that are difficult to make. Machine ethics can be corrupted, even by programmers with the best of intentions. For example, the algorithm operating a self-driving car could be programmed to adjust the buffer space it assigns to pedestrians in different districts based on the monetary settlements from previous accidents in each district. The assumption is that a bigger buffer in districts with higher settlement costs reduces the potential for expensive settlements. The assumption seems reasonable, but it is possible that the lower settlements in certain districts were due to residents of poorer neighborhoods lacking access to legal resources. The algorithm could therefore disadvantage these residents based on their income.
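
The buffer-adjustment rule described above can be made concrete with a toy sketch. All numbers here are invented; the point is only to show how an apparently neutral, cost-minimizing rule ends up assigning smaller pedestrian buffers to districts with historically low settlements.

```python
# Toy version of the biased buffer rule discussed above (numbers invented).

BASE_BUFFER_M = 1.5  # baseline pedestrian buffer in meters (assumption)

def pedestrian_buffer(avg_settlement_usd, max_settlement_usd):
    """Scale the buffer with a district's historical settlement costs.

    This is exactly the 'reasonable' assumption from the text: spend more
    caution where past accidents were expensive."""
    scale = avg_settlement_usd / max_settlement_usd
    return BASE_BUFFER_M * (1 + scale)

# District A: wealthy, historically high settlements.
# District B: poor, historically low settlements (perhaps only because
# residents lacked access to legal resources).
buffer_a = pedestrian_buffer(900_000, 900_000)  # -> 3.0 m
buffer_b = pedestrian_buffer(150_000, 900_000)  # -> 1.75 m, less protection
```

Nothing in the code mentions income, yet the output systematically protects residents of poorer districts less – which is the corruption the paragraph warns about.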

The Solution

We have established the need to teach AI a learned concept of morality. Given the challenges, however, governments and regulators need to lead the effort to establish a global standard for machine ethics. These standards need to be clearly instituted and codified in legislation. However, governments’ lack of resources and talent in the field of AI means they will require the private sector’s involvement. At the same time, companies that are already developing AI products such as self-driving cars have conflicts of interest in assisting the government. This poses a potential opportunity for us. We can build a product that crowd-sources human opinions on how machines should make decisions when faced with moral dilemmas, to help governments write this legislation. For example, we could crowd-source scenarios and the most appropriate responses to those scenarios for self-driving cars on our platform, and then contract with governments to evaluate our findings and implement them in the algorithms of self-driving cars looking to enter the market. Although the sales process for government entities can be lengthy, our role as a third party between the regulator and the companies developing AI products, plus the potential multi-year revenue stream from a mandated project, place us in a very good position.

[i] http://www.economist.com/news/briefing/21594264-previous-technological-innovation-has-always-delivered-more-long-run-employment-not-less

[ii] https://www.nytimes.com/2016/10/26/us/pentagon-artificial-intelligence-terminator.html


Team Awesome Members:

Rachel Chamberlain

Joseph Gnanapragasam

Cen Qian

Allison Weil

TalentSafe: Secure your firm’s talented workforce!

“The average worker today stays at each of their jobs for 4.4 years, according to data from the Bureau of Labor Statistics, but the expected tenure of the youngest employees is half that.” – Forbes 2016

TalentSafe leverages signals from all over the workplace to predict who is at risk of attrition and how to maintain a healthy team.

The break-up:

Quitting a job or firing someone is gruelling. However, not all turnover is bad. A high employee-retention rate can be evidence of productivity, or it can suggest a culture of entitlement or one that fails to challenge employees. There is a healthy, ideal turnover rate for each organization.[1]

The problem is not small. The potential benefits are massive.  

The real problem is how to increase the retention rate of high-performing employees while maintaining a healthy retention rate for other employees.


Accurate predictions enable organizations to take action on retention or succession planning for at-risk employees. To solve this problem, organizations can use machine learning techniques backed by academic research[2] to predict employee turnover. However, no solution exists in the market yet because of a few gaps that must be addressed.

Fix data gaps:

The modeling data, a big issue, comes from HR Information Systems (HRIS), which are typically underfunded compared to other information systems in the organization. This creates noise in the data that renders predictive models prone to inaccuracies. Studies[4] use Extreme Gradient Boosting (XGBoost) to improve predictions from such noisy data.

TalentSafe can use data and signals from different sources, such as:

  • Baselines from the interview process and past experience, based on studies[5]
    • Biodata
      • Employee reference
      • Prior job length
    • General work-related attitudes
      • Self-confidence
      • Decisiveness
      • Perseverance
    • Job-specific attitudes
      • Desire for the job
      • Overt intent to quit
    • Personality traits
      • Conscientiousness
      • Emotional stability
  • Sources of data during the job
    • Behavioral and attitudinal
      • Emails
      • Messages
      • Office phone conversation
      • Web browsing behavior
      • Applications usage
      • Job search history
      • Meeting attendance, cancellations
      • Reimbursements
    • Human responses
      • Survey feedback from employees, peers, managers, customers
    • HR information systems
      • Performance ratings
      • Performance reviews
      • Salary/raise
      • Leaders’ feedback
      • Employees’ self assessment
    • Machine sensors
      • Audio detection, video detection, and facial detection from camera
  • Measuring
    • Responses – Responsiveness, timeliness, positive/negative emotions
    • Motivation – self and others
    • Happiness at the workplace[6], using computer and other systems usage
    • Engagement/involvement in work related social events
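
The signals listed above can be condensed into a per-employee feature vector and scored. The sketch below is purely illustrative: the feature names and weights are invented, and the hard-coded weighted sum is only a stand-in for a trained model such as the XGBoost approach cited earlier.

```python
# Toy attrition-risk scorer over a handful of the signals listed above.
# Weights are invented for illustration; a production system would learn
# them with a model such as XGBoost rather than hard-code them.

WEIGHTS = {
    "short_prior_job_lengths": 0.25,  # biodata baseline
    "overt_intent_to_quit":    0.40,  # job-specific attitude
    "job_search_activity":     0.20,  # behavioral signal
    "negative_survey_tone":    0.15,  # human responses
}

def attrition_risk(features):
    """Weighted sum of binary signals, clipped to [0, 1]."""
    score = sum(WEIGHTS[name] for name, present in features.items() if present)
    return min(1.0, score)

employee = {
    "short_prior_job_lengths": True,
    "overt_intent_to_quit":    False,
    "job_search_activity":     True,
    "negative_survey_tone":    False,
}
risk = attrition_risk(employee)  # -> 0.45
```

A real model would also weight signals differently per role, which leads directly to the next point: the value of each attrition differs.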

The value of each attrition is different: the departure of a VP of Sales has a very different impact than that of an analyst.

Benefits > Costs = functional turnover

Costs > Benefits = dysfunctional turnover
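
The two rules above translate directly into code. This is a minimal sketch; the benefit and cost figures would come from TalentSafe’s value-assessment step, the argument names are placeholders, and the tie case ("neutral") is an addition of ours not stated in the rules.

```python
def classify_turnover(benefit_of_departure, cost_of_departure):
    """Apply the functional/dysfunctional turnover rule from the text.

    benefit_of_departure: estimated gain from the employee leaving
    cost_of_departure: estimated loss (replacement, lost knowledge, ...)"""
    if benefit_of_departure > cost_of_departure:
        return "functional"
    if cost_of_departure > benefit_of_departure:
        return "dysfunctional"
    return "neutral"  # tie case, not covered by the rules above

classify_turnover(benefit_of_departure=50_000, cost_of_departure=20_000)
# -> "functional"
```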

TalentSafe’s solution would assess information from all sources to understand sentiments, engagement, and emotions to

  • Predict attrition risk
  • Assess value of employees
  • Recommend the right action plan, and address long term trends and dysfunctions

* Value to the organization is subject to a measurement process, such as performance metrics

Demonstration / Pilot

Objective: predict turnover, understand whether the turnover is good or bad, and determine what kind of action to take. The pilot will be tested in three ways.


  • Out of sample – TalentSafe will take all relevant data from a target firm for the previous year, including people who stayed and people who left. The models will be built on 70% of the data for training and evaluated on the remaining 30%.
  • Out of time – For the same firm, TalentSafe will model using 2016 data and predict on 2017 data.
  • Real-time sample – TalentSafe will cluster different branches of the same organization based on their overall performance and attrition rates & quality. Within each cluster, branches will be randomly assigned to test and control. Nothing changes for control branches, but we implement TalentSafe in test branches and measure attrition rates and performance before and after implementation.

In all three cases, the assessment will be across metrics of attrition size, quality, and performance, in a confusion matrix[7] comparing estimates with actuals.
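
The confusion-matrix assessment[7] can be sketched as follows. The labels (1 = left, 0 = stayed) and the example counts are invented for illustration.

```python
def confusion_matrix(actual, predicted):
    """Count TP/FP/FN/TN for binary leave (1) / stay (0) labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall(tp, fp, fn):
    """Standard precision/recall derived from the matrix cells."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

actual    = [1, 0, 1, 1, 0, 0, 1, 0]   # who actually left in the holdout
predicted = [1, 0, 0, 1, 0, 1, 1, 0]   # model's predictions
tp, fp, fn, tn = confusion_matrix(actual, predicted)  # -> (3, 1, 1, 3)
precision, recall = precision_recall(tp, fp, fn)      # -> (0.75, 0.75)
```

Here precision answers "of the people we flagged, how many really left?" and recall answers "of the people who left, how many did we flag?" – both matter when attrition quality, not just size, is being assessed.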


Once the sources of problems are identified, it becomes important to address the root causes. Are there systematic dysfunctions[8] in management/leadership, policies, processes, compensation, people, etc. that need to be addressed?

— Ewelina Thompson, Akkaravuth (March) Kopsombut, Andrew Kerosky, Ashwin Avasarala, Dhruv Chadha, Keenan Johnston


1. http://www.hrvoice.org/the-mystery-of-the-ideal-turnover-rate/
2. https://www.techemergence.com/machine-learning-in-human-resources/
3. https://business.udemy.com/blog/the-next-wave-of-predictive-analytics-in-hr-5-tips-for-success-in-2017/
4. https://thesai.org/Downloads/IJARAI/Volume5No9/Paper_4-Prediction_of_Employee_Turnover_in_Organizations.pdf
5. https://filene.org/assets/pdf-reports/1752-94Predicting_EE_Turnover.pdf
6. Csikszentmihalyi’s flow model – https://www.toolshero.com/effectiveness/flow-model-csikszentmihalyi/
7. https://classeval.wordpress.com/introduction/basic-evaluation-measures/
8. https://en.wikipedia.org/wiki/The_Five_Dysfunctions_of_a_Team