Did DataVisor Create the Solution to Fighting Fraud?

How Costly is Fraudulent Activity?

According to the Association of Certified Fraud Examiners, businesses lose over $3.5 trillion each year to fraud. As fraudsters become increasingly sophisticated and the number of fraudulent cases grows, there is enormous pressure on businesses to manage this activity better. To do so, businesses must prevent and identify more fraudulent transactions than current methods allow.

Currently, many businesses use manually maintained rules for fraud detection and predictive prevention models. As new patterns of fraud emerge, practitioners must manually update these rules to manage new threats. Given the magnitude of the activity and the frequency with which fraudsters adapt to existing rules, manual rules are arguably no longer an effective way to control fraudulent activity.

Where can businesses turn for help?

DataVisor’s patent-pending Automated Rules Engine solves this problem. Ultimately, DataVisor created a rules engine that adapts to changing fraud patterns. This technology automatically creates new rules to identify and prevent fraudulent activity and manages existing rules, which helps reduce the false positives caused by outdated rules.

 

How does the Automated Rules Engine work?

The Automated Rules Engine uses big data and unsupervised machine learning to automatically generate rules that detect and prevent fraudulent activity. Rules engines are part of many companies’ existing online fraud detection and anti-money-laundering infrastructure, and this technology helps banks and digital service providers recognize fraudulent activity more effectively.

While traditional rules engines are reactive to new attacks, unsupervised machine learning catches evolving attacks by correlating user and event attributes, enabling the system to prevent attacks of this nature automatically. Since the rules within the Automated Rules Engine are updated and refined automatically, an added benefit is a significant reduction in manual review time, which frees cyber-risk teams to perform higher-value work that further reduces fraud risk.
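To make the idea concrete, here is a minimal, hypothetical sketch of how unsupervised learning can correlate user and event attributes to surface coordinated fraud; the features, data, and thresholds are our assumptions, not DataVisor’s actual implementation.

```python
# A minimal, illustrative sketch (not DataVisor's actual method): use an
# unsupervised clustering algorithm to correlate user/event attributes and
# flag dense groups of near-identical accounts, which often indicate
# coordinated fraud. Feature names and values below are hypothetical.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

# Hypothetical per-account features: signup hour, email-address entropy,
# events per minute, and a numeric encoding of IP subnet.
accounts = np.array([
    [2, 1.1, 40.0, 101],   # burst of similar accounts (likely a fraud ring)
    [2, 1.0, 42.0, 101],
    [2, 1.2, 41.0, 101],
    [14, 3.9, 2.0, 57],    # organic-looking users
    [20, 4.2, 1.0, 233],
])

X = StandardScaler().fit_transform(accounts)

# Accounts packed into a dense cluster get a non-negative label; sparse,
# organic users are labelled -1 (noise) and left alone.
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)

for account, label in zip(accounts, labels):
    status = "suspicious cluster %d" % label if label >= 0 else "no cluster"
    print(account, "->", status)
```

The point of the sketch is that nothing here depends on previously labeled fraud: the dense cluster itself is the signal, which is why this style of detection can catch attack patterns that no existing rule describes.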

Does DataVisor have a sustainable competitive advantage?

There are many competitors operating in the cyber fraud risk space. Currently, DataVisor holds only a patent-pending right, which is not a guaranteed advantage in the marketplace. Two private companies, Feedzai and Sift Science, offer competing software that uses machine learning to detect and prevent fraud. Given the size of the market (the $3.5 trillion lost to fraud annually) and these products’ effectiveness compared with the traditional approach, there appears to be room for many companies to operate successfully.


“Feedzai’s machine learning software recognizes what’s happening in the real world, how it’s different from what happened earlier, and adjust its responses accordingly. Machine Learning models can detect very subtle anomalies and very subtle signs of fraud without requiring new code or new configurations.”
“Every day, businesses worldwide rely on Sift Science to automate fraud prevention, slash costs, and grow revenue. Our cloud-based machine learning platform is powered by 16,000+ fraud signals updated in real-time from activity across our global network of 6,000+ websites and apps (and growing).”

 

Is DataVisor’s Automated Rules Engine a good solution?

Although DataVisor is a private company and does not release financial information, its website shows that the Automated Rules Engine is used by well-known companies such as Yelp, Pinterest, and Alibaba Group. Since these successful companies use the DataVisor product despite having access to competing products, we can assume the product has real value in the marketplace.

Because the machine learning technology relies on pattern recognition to identify fraudulent activity, it depends on fraudsters repeating historical behavior. With sophisticated fraudsters constantly finding innovative ways to commit fraud, the machine learning tool, although more effective than manual rules, is not the ultimate solution.

Despite this shortfall, since fraud continues to be an expensive threat, companies will likely pay for the best option on the market to counteract it. Given the product’s attributes and an examination of competing products, we believe DataVisor’s Automated Rules Engine has a high likelihood of future commercial success.

To increase this likelihood, DataVisor should use machine learning to review actual fraud cases that its technology failed to predict and automatically create rules from them. An additional feature that blocks users from carrying out “abnormal” activities would also be beneficial.

Works Cited

https://www.datavisor.com/

http://www.statsoft.com/Textbook/Fraud-Detection

https://venturebeat.com/2017/02/18/how-ai-is-helping-detect-fraud-and-fight-criminals/

http://www.fico.com/en/blogs/analytics-optimization/ai-meets-aml-how-smart-analytics-fight-money-laundering/

http://www.cityam.com/241662/fraud-with-problems-ravelin-boss-martin-sweeney-talks-hailo-fake-credit-cards-and-machine-learning

http://www.pymnts.com/news/security-and-risk/2017/hacker-tracker-datavisor-global-insight-into-online-fraud-big-data-account-takeover-ato-cyberattack-aging-accounts-wonga-data-breach/

http://www.thepaypers.com/default/datavisor-rolls-out-automated-rule-engine-to-support-aml-teams/768504-0

 

Smart Store: Track your store like you would track the vitals of a patient in surgery

You think the shopper is smart?

With the shift in consumer preferences toward natural, organic, and non-GMO food, retailers face the challenge of supplying fruits, vegetables, and protein with a shorter shelf life while adjusting to the trends of a dynamic marketplace. Only 86% of shoppers are confident the food they buy is safe from germs and toxins, down from 91% in 2014. To counteract shorter shelf lives, retailers must become more operationally efficient or carry extra stock to compensate for higher rates of spoilage. Planning for fresh produce is also more complicated than for non-perishable goods: according to a BlueYonder study, 68% of shoppers feel disappointed with the freshness of their purchases, and 14% of shoppers seek organic certification.

By using machine learning solutions, retailers will be able to optimize the environmental conditions that drive spoilage. In addition, the penalties for falling out of compliance with food, health, and environmental safety regulations are severe; Walmart, for example, paid $81M to settle environmental compliance violations.

How can you keep up?

Grocery retailers generally have low profit margins, so even slight improvements in efficiency matter. Our machine learning solution is aimed at helping retailers manage short-shelf-life products better, and ultimately improve profitability, by optimizing energy costs and predicting temperature-control equipment failure.

  • Energy Savings:  In some cases, utilities can amount to up to 50% of a store’s profit margin, and energy savings driven by machine learning translate directly into profit. For example, within the perishable seafood or meat sections, overcooling is a significant cost that can be optimized automatically by sensors that measure cooler and refrigerator temperatures.
  • Effectiveness and Efficiency:  Better allocation of resources such as people and machines benefits both the top and bottom line; out-of-stock inventory, for example, can cause $24M in lost sales per $1B of retail sales. Automatic tracking of inventory levels can increase both productivity and revenue.
  • Predictive Maintenance:  Because refrigeration equipment has to run 24/7, breakdown rates are high. Sensing equipment can be applied to HVAC and nitrogen equipment to predict failure ahead of time; even small freeze/thaw cycles can quickly damage product and lead to waste for retailers (a minimal sketch of this idea follows this list).
  • Compliance:  FSMA and EPA rules include multiple guidelines for retailers and grocery stores to follow, with high penalties for non-compliance.
  • Consumer behavior:  Consumer preferences and emerging trends can be identified and acted upon if predicted; the Amazon store can even track which products you were interested in but did not purchase.
  • Risk mitigation:  Financial transactions, customer behavior, and other signals can be monitored to predict risks, fraud, and shoplifting automatically and proactively.
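As a concrete illustration of the predictive-maintenance bullet above, here is a minimal sketch, with made-up readings and thresholds, of how a store could watch a cooler’s temperature feed and raise an alert before product is damaged or a food-safety limit is breached.

```python
# A minimal sketch of the predictive-maintenance idea from the list above
# (not a production system): watch a cooler's temperature feed and alert when
# readings drift outside their normal band, which often precedes equipment
# failure or a compliance lapse. Sensor names and thresholds are hypothetical.
import pandas as pd

readings_c = pd.Series(
    [3.1, 3.0, 3.2, 3.1, 3.3, 3.6, 4.1, 4.8, 5.6, 6.5],  # degrees Celsius
    name="cooler_01_temp",
)

# Baseline taken from the first few hours of normal operation.
baseline_mean = readings_c.iloc[:5].mean()
baseline_std = readings_c.iloc[:5].std()

# Flag any reading more than 3 standard deviations above the baseline:
# a steady upward drift like this suggests a failing compressor or a door
# left open, and food-safety limits (e.g. 5 C for dairy) may soon be breached.
alerts = readings_c[readings_c > baseline_mean + 3 * baseline_std]
print(alerts)
```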

Organizations are already moving to smarter technology for help.

 

What if the store was also smart?

Grocery retailers could use advanced analytics, IoT, and other technology to revamp the way they monitor their stores.

  1. Video feeds
  2. Point Of Sale sensors
  3. Mobile phones / equipment of Associates in store
  4. IR Motion Sensors
  5. HVAC and Energy monitoring using sensing of temperature, pressure, humidity, Carbon Monoxide
  6. Weight Mats
  7. Parking Space sensor
  8. Digital Signage
  9. Gesture Recognition/ accelerometers
  10. Door Hinge Sensor motion/ pressure
  11. Wifi Router and connections
  12. Shelf Weight
  13. Air Filter/humidity
  14. Lighting
  15. Electricity, Water, gas meters
  16. Spark (Temperature) for places this device is taken to

Example use cases:

  1. Predictive Device Maintenance to avoid compliance lapse (e.g. Fridge for Food Safety, Fire Safety equipment, lighting, etc.)
  2. Hazard detection and prevention through monitoring of toxic substance spill and disposal (air filter, shelf weight and video sensor)
  3. FSMA compliance across labels, food expiry, storage conditions, etc.
  4. Health safety with store conditions like canopy use, weather, leaks etc.
  5. Temperature, defrost and humidity monitoring for Ice-cream, meat, dairy, and pharmaceuticals
  6. Video analysis to predict long lines, alerting staff and optimizing resource allocation to avoid poor customer experiences, lost customers, and lost productivity
  7. Video + Point of Sale analysis to detect and prevent fraudulent transactions

Central monitoring, within each store and across stores, could be created to mimic NASA mission control in Houston: always able to support every adventurer in the store. Roger that?

_____________________________________________________________

Team – March and the Machines

Ewelina Thompson, Akkaravuth Kopsombut, Andrew Kerosky, Ashwin Avasarala, Dhruv Chadha, Keenan Johnston

_____________________________________________________________

Sources:
  1. FMI U.S. Shopper Trends, 2016 (Safe: A32; Fit health: A12; Sustain health: A9, A12; Community: A12); The Hartman Group, “Transparency,” 2015.
  2. http://www.cnsnews.com/news/article/wal-mart-pay-81-million-settlement-what-epa-calls-environmental-crimes
  3. https://www.slideshare.net/vinhfc/out-of-stock-cost-presentation
  4. https://www.fda.gov/food/guidanceregulation/fsma/
  5. https://www.epa.gov/hwgenerators/hazardous-waste-management-and-retail-sector
  6. Amazon store – https://www.youtube.com/watch?v=NrmMk1Myrxc
  7. https://foodsafetytech.com/tag/documentation/

 

Team Dheeraj: Company Pitch: Check Yourself

 

Opportunity:

Fake news is not a new phenomenon by any means. However, in the last 12 months, user engagement with fake news websites has increased significantly. In fact, in the final 3 months of the 2016 election season, the top 20 fake news articles had more interactions (shares, reactions, comments) than the top 20 real articles.1 Furthermore, 62% of Americans get their news via social media, while 44% use Facebook, the top distributor of fake news.2 This represents a major shift in the way individuals receive information. With the dissemination of inaccurate content, people are led to believe misleading and often completely false claims, which leads them to make poor decisions and poses a serious threat to our democracy and integrity.

Media corporations are still reckoning with their part in either disseminating this news or inadvertently standing by. Governments have ordered certain social media sites to remove fake news or face hefty punishment (e.g., fines of up to $50 million in Germany).3 Companies like Google and Facebook are scrambling to find a solution.

Solution:

Check Yourself (CY) provides real-time fact checking to minimize the acceptance of fake news. It combines natural language processing with machine learning to immediately flag fake content.

The first approach identifies whether an article is fake based on semantic analysis. Specifically, it compares the headline with the body of the text, determines whether they are related, and then checks whether the content supports the headline. Verification happens against established websites, fact trackers, and other attributes (e.g., domain name, Alexa web rank).
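As a rough illustration of this first approach (a sketch under our own assumptions, not a finished fact-checking model), the snippet below scores how related a headline is to its body text using TF-IDF cosine similarity; a near-zero score is one signal that the article deserves deeper scrutiny.

```python
# A rough sketch of the first approach (our hypothetical illustration, not a
# finished product): score how strongly an article body relates to its
# headline using TF-IDF cosine similarity. Real semantic analysis would go
# further, e.g. a stance classifier that checks support vs. contradiction.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

headline = "Economy grew by 6 percent last year, report says"
body = ("New figures released today show the economy expanded far more "
        "slowly, with growth of roughly 2 percent over the last year.")

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform([headline, body])

# Relatedness near zero suggests the body never discusses the headline's
# claim; intermediate scores would be passed on to a stance classifier.
relatedness = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
print("headline/body relatedness: %.2f" % relatedness)
```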

The second approach identifies website tracker usage (ads, cookies, widgets) and patterns over time and language, connects them with platform engagement (Facebook, Twitter), and links them together. The result is a neural network that predicts the probability that a source is fake.
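The sketch below illustrates the second approach on toy data; the feature names (tracker counts, domain age, social traffic share, rank) are hypothetical stand-ins for the tracker and engagement signals described above, and a real system would need far more data and features.

```python
# A minimal sketch of the second approach, with made-up features and toy
# data: train a small neural network on tracker usage and platform
# engagement patterns to estimate the probability that a source is fake.
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical per-site features: number of ad trackers, domain age in years,
# share of traffic from social platforms, and a scaled Alexa-style rank.
X = np.array([
    [38, 0.4, 0.92, 0.95],   # young domain, tracker-heavy, almost all social
    [35, 0.2, 0.88, 0.97],
    [41, 0.7, 0.95, 0.90],
    [6, 21.0, 0.25, 0.10],   # established outlets
    [9, 15.0, 0.30, 0.05],
    [4, 30.0, 0.20, 0.02],
])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = known fake source, 0 = credible source

model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
model.fit(X, y)

new_site = np.array([[33, 0.5, 0.90, 0.93]])
print("probability source is fake: %.2f" % model.predict_proba(new_site)[0, 1])
```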

Using an ensemble approach that combines these ‘front-end’ and ‘back-end’ methods leads to a novel solution. After designing the baseline algorithm in-house, we will use crowdsourcing to improve it. Given the limited supply of in-house data scientists, generating ideas from all disciplines maximizes our chance of success.

Pilot:

We will publicly pilot our application during a live primary debate after rigorous internal checks. As the candidates speak, false statements (e.g., “The economy grew by 6% in the last year”) will be relayed to the interviewer, and the corrections will also be displayed on screen for viewers. At the end of the show, a bipartisan expert panel, along with fact checkers, will verify whether the algorithm was accurate. Assuming a successful experiment, this gives interviewers the power to fact-check any claim on the spot, ensuring their viewership is well informed.

The Competition:

Currently, many companies are trying to solve this problem. Existing solutions are mainly fact checkers, but they are not as comprehensive as our approach, nor are they real-time. Universities are also working on this problem, but with small teams of students and faculty. Our advantage over universities, as well as over companies like Google and Facebook, is that crowdsourcing the solution surfaces the best ideas in a newly emerging area.

Market Viability:

Even though our value proposition affects both companies and consumers, we will start with B2B to build credibility and then expand to B2C. Large media companies keep around 10-20 fact checkers on staff for any live debate or similar event, which amounts to roughly $600k-$1.2M per year (assuming $60k per checker per year). Furthermore, they often use Twitter and Reddit and would find our service invaluable for confirming the veracity of statements and claims immediately. Once established, we will move toward a B2C freemium model.

Sources:

1https://www.buzzfeed.com/craigsilverman/viral-fake-election-news-outperformed-real-news-on-facebook?utm_term=.nbR6OEK6E#.ghz5aZk5Z

2https://techcrunch.com/2016/05/26/most-people-get-their-news-from-social-media-says-report/

3https://yourstory.com/2017/04/faceboo-google-fake-news/

Stuck with 2Y’s: Latch (Pitch)

Ask: $200,000 ($100,000 for initial cloud data storage, $60,000 for team member salaries, $40,000 for marketing, advertising, and other expenses)

Opportunity

According to a Pew Research poll, 40% of Americans use online dating(1) and 59% “think online dating can be a good way to meet people”(2). Romain Bertrand, UK country manager of the dating app eHarmony, predicts that by 2040, 70% of couples will meet online(3). The online dating scene is thus a huge and ever-growing market. Nevertheless, as of 2015, while 50% of the US population consisted of single adults, only 20% of current committed relationships started online, and only 5% of marriages(1). There is a clear opportunity to improve the success rate of dating apps and improve the dating scene in the US (for a start).

According to Eli Finkel of Northwestern University (2012)(3), the likelihood of a successful long-term relationship depends on three components: individual characteristics (such as hobbies, tastes, and interests), the quality of interaction during first encounters, and all other surrounding circumstances (such as ethnicity and social status). Since dating apps cannot affect the last component, they have historically focused on the first and have only recently started working on the second, for example by suggesting the perfect location for a first date.

For individual characteristics, the majority of dating apps and websites rely on user-generated information (through behavioral surveys) as well as users’ social network information (likes, interests, etc.) to provide matches. Some, such as Tinder, eHarmony, and OkCupid, go as far as analyzing people’s behavior on the site itself and matching users to people with similar or complementary behavior.

Nevertheless, current dating algorithms ignore vital pieces of information that are captured neither by our behavior on social media nor by our survey answers.

Solution

Our solution is an application called “Latch” that would add data collected through wearable technology (activity trackers such as Fitbit), online/offline calendars, Netflix/HBO viewing history (and Goodreads reviews), and users’ shopping patterns via bank accounts to the data currently used by dating apps (user-generated and social media), in order to significantly improve the matches offered.

According to John M. Grohol, Psy.D., of PsychCentral, the following six individual characteristics play a key role in couples’ compatibility for a smooth long-term relationship(4):

  • Timeliness & Punctuality (observable via calendars)
  • Cleanliness & Orderliness (partially observable – e-mails/calendars)
  • Money & Spending (observable via bank accounts)
  • Sex & Intimacy
  • Life Priorities & Tempo (observable via calendars and wearables)
  • Spirituality & Religion (partially observable via calendar, social media, Netflix/HBO patterns, and e-mail)

Of the six factors above, five are fully or partially observable and analyzable through data already available online or offline via the sensors mentioned earlier. Because the information we would request digs deep into a target user’s privacy, we would be careful to request only data that adds value to our matching algorithm and would use third parties to analyze sensitive information such as spending patterns.

Commercial Viability – Data Collection

As a new company entering the market, Latch would have a clear advantage over current incumbents because it would not be locked into the old, familiar interface of the dating process. As Mike Maxim, Chief Technology Officer at OkCupid, puts it, “The users have an expectation of how the site is going to work, so you can’t make big changes all the time.”

Prior to launch, we would have to collect initial data. To analyze only relevant signals, we would study the behavioral patterns of current couples before they started dating, aggregating available data on their historical purchase decisions and time allocation in order to launch a pilot.

Pilot

The pilot version will be launched for early adopters based on human- and machine-analyzed historical data from existing couples. The early adopters would use automated sensors (Fitbit, Gmail, etc.) to aggregate data on their spending and behavioral patterns, which Latch will compare against the patterns learned from existing couples in order to generate matches. The subsequent success and compatibility of the matched early adopters will then be fed back into the data and used to refine the pattern recognition and matching algorithms for future users. Expansion opportunities include integrating DNA ancestry analysis (such as that provided by MyHeritage DNA), digging deeper into geolocation data (suggesting coffee shops both matches have visited), matching game/app usage history on smartphones, and more.
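A simplified sketch of the matching step is shown below; the behavioral features, weights, and users are invented for illustration, and the real algorithm would be learned from the historical data on existing couples described above.

```python
# A simplified, illustrative sketch of how Latch might score compatibility
# (feature names and weights are assumptions): represent each user by
# behavioral features derived from calendars, wearables, and spending data,
# then rank candidate matches by weighted similarity to the target user.
import numpy as np

# Hypothetical normalized features:
# [punctuality, orderliness, saving_rate, activity_level, religiosity]
users = {
    "alex":   np.array([0.9, 0.7, 0.6, 0.8, 0.1]),
    "nargiz": np.array([0.8, 0.8, 0.5, 0.9, 0.2]),
    "roman":  np.array([0.2, 0.3, 0.9, 0.1, 0.8]),
}

# Weights reflect how much each factor drives long-term compatibility.
weights = np.array([0.25, 0.20, 0.25, 0.15, 0.15])

def compatibility(a, b):
    """Weighted similarity in [0, 1]; 1 means identical behavioral profiles."""
    return 1.0 - np.sum(weights * np.abs(a - b))

target = "alex"
scores = {name: compatibility(users[target], vec)
          for name, vec in users.items() if name != target}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print("%s -> %.2f" % (name, score))
```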

Sources:

 

Alexander Aksakov

Roman Cherepakha

Nargiz Sadigzade

Yegor Samusenko

Manuk Shirinyan

Bright Cellars: Simplifying Wine Through Algorithms

The Problem / Opportunity

Entering the world of wine can be daunting: there are seemingly endless varietals and regions, and even more vineyards to explore. For someone with little exposure to wine who wants to learn more, often the only way to find out what they like is guess-and-check. For many people, this is extremely frustrating and can waste a lot of money.

Many subscription services offer to filter the landscape by sending 6 or 12 bottles a month. Unfortunately, this method offers little to no customization to a person’s taste or preferences, as the services generally send the same bottles to all of their members. Currently, few services offer a quick, affordable, and curated way for people to experience wines based on their own taste preferences.

 

The Solution

 

     

The subscription-based service Bright Cellars looks to take away what most people find to be the hard, boring part of drinking wine: picking out a bottle they will like. Users go to the Bright Cellars website and take a quiz that gives the company insight into their personal preferences. From there, “the algorithm scores each wine by comparing 18 attributes to your preferences,” and users receive a box of wine that the algorithm has picked out for them.

 

Once users have received and tasted their wine, they can rate the matches that the Bright Cellars algorithm provided. This iterative process allows the algorithm to learn a user’s preferences better the more ratings the person enters. As the database of ratings grows, the algorithm can also draw associations from what other people have liked; the creators view the service as a subscription service and a Pandora-like matching service rolled into one. Bright Cellars targets millennials who have not committed to a specific type of wine, don’t yet know what they like, or want to experiment with uncommon varietals.
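To illustrate the mechanics (the real algorithm compares 18 attributes and is proprietary; the attributes, numbers, and update rule below are our assumptions), a minimal sketch of attribute-based scoring plus a rating-driven preference update might look like this:

```python
# An illustrative sketch of the kind of scoring the article describes, using
# a handful of hypothetical attributes instead of the real 18: score each
# wine by how closely its profile matches the user's quiz-derived
# preferences, then nudge the preference profile toward wines rated highly.
import numpy as np

# Hypothetical attributes: [sweetness, acidity, tannin, fruitiness, oak]
user_pref = np.array([0.6, 0.4, 0.3, 0.8, 0.2])

wines = {
    "Riesling A":   np.array([0.8, 0.6, 0.1, 0.7, 0.0]),
    "Cabernet B":   np.array([0.1, 0.4, 0.9, 0.5, 0.7]),
    "Pinot Noir C": np.array([0.3, 0.5, 0.4, 0.8, 0.3]),
}

def score(wine_attrs, prefs):
    """Higher is better: 1 minus the mean absolute distance across attributes."""
    return 1.0 - np.mean(np.abs(wine_attrs - prefs))

ranked = sorted(wines.items(), key=lambda kv: -score(kv[1], user_pref))
for name, attrs in ranked:
    print("%s: %.2f" % (name, score(attrs, user_pref)))

# When the user rates a delivered wine, move their profile toward (or away
# from) that wine's attributes: the iterative learning described above.
def update_prefs(prefs, wine_attrs, rating, learning_rate=0.1):
    # rating in [0, 1]; 0.5 is neutral, above 0.5 pulls prefs toward the wine.
    return prefs + learning_rate * (rating - 0.5) * (wine_attrs - prefs)

user_pref = update_prefs(user_pref, wines["Pinot Noir C"], rating=0.9)
```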

 

Market & Comparable Solutions

 

The wine market is obviously very mature, as is the idea of a subscription wine service, with many different options available to consumers. Companies getting similar press and attention to Bright Cellars include Club W and Pour This.

 

Club W started with a model very similar to Bright Cellars’: people would take a quiz, and the company would source and select bottles based on the results. It has since acquired its own winery and rebranded as Winc; members are now paired with wines that are made in house or come from small partner vineyards. It still uses a profiling quiz and a rating algorithm that continually tries to better understand members’ taste preferences. This is the closest competitor to Bright Cellars, but because Winc makes its own wine, its algorithm and cost structure are significantly different.

 

Pour This, on the other hand, acts more like a traditional wine subscription service but claims to be more curated than its predecessors. Pour This sends all of its members the same lot of three wines, but they are hand-picked by an in-house curator and tend to be quite obscure. It targets people who want to explore new and different wines they might never have come across on their own.

 

Proposed Alterations

 

Bright Cellars could make its offering even more customizable by letting people take quizzes tailored to their wine experience or knowledge. Someone who knows exactly what they like but wants exposure to more uncommon brands might want to specify that up front. Currently, the model really only caters to beginners who can identify flavors they prefer rather than wine-producing regions or countries.

 

Additionally, with $150,000 Bright Cellars could hire another full-time employee to improve the algorithm while the founders work on business development. This becomes especially important as the business grows and starts working directly with wineries.

 

Sources

https://www.brightcellars.com/

 

http://www.bizjournals.com/boston/blog/startups/2014/08/birchbox-for-wine-bright-cellars-wants-to-use-big.html

 

http://news.mit.edu/2015/alumni-wine-matching-startup-bright-cellars-0903

 

https://www.geekwire.com/2015/online-wine-club-uses-an-algorithm-to-find-you-the-perfect-wines/

 

http://www.xconomy.com/wisconsin/2015/08/18/bright-cellars-snags-1-8m-to-expand-wine-delivery-service/#

 

https://www.eater.com/drinks/2015/12/10/9881394/best-wine-club-delivery-online

 

Quantopian: inspiring talented people everywhere to write investment algorithms

 

“Quantopian inspires talented people everywhere to write investment algorithms”

 

The Opportunity

Quantitative hedge funds, which use computer algorithms and mathematical models rather than human traders to make investment decisions, are becoming increasingly popular because their performance has been much better than that of traditional hedge funds.

As more investment managers seek to implement quantitative strategies, finding talent has become a great challenge: many people with the requisite skills to develop trading algorithms have little interest in working for a big, established hedge fund.

Solution

Quantopian, a crowd-sourced quantitative investing firm, solves this issue by allowing people to develop algorithms as a side job.

On one hand, the company provides a large US equities dataset, a research environment, and a development platform to community members, who are mainly data scientists, mathematicians, and programmers, enabling them to write their own investment algorithms.

On the other, Quantopian acts as an investment management firm, allocating money from individuals and institutions to the community’s top-performing algorithms. Allocations are based on each algorithm’s backtesting results and live track record.

If an algorithm receives an allocation, the algorithm developer earns 10% of the net profit over the allocated capital.
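For a sense of what community members actually write, below is a minimal sketch of a dual moving-average strategy in the style of zipline, the open-source backtesting library behind Quantopian’s platform; it is a textbook example, not a strategy that would earn an allocation.

```python
# A minimal sketch of the kind of algorithm a community member might submit,
# written against the open-source zipline API (a classic dual moving-average
# crossover on a single US equity).
from zipline.api import order_target_percent, record, symbol


def initialize(context):
    # Trade a single, liquid US equity from the platform's dataset.
    context.asset = symbol("AAPL")


def handle_data(context, data):
    # Short- and long-horizon average prices from the built-in history API.
    short_mavg = data.history(context.asset, "price", 20, "1d").mean()
    long_mavg = data.history(context.asset, "price", 100, "1d").mean()

    # Fully invested when short-term momentum is positive, flat otherwise.
    if short_mavg > long_mavg:
        order_target_percent(context.asset, 1.0)
    else:
        order_target_percent(context.asset, 0.0)

    # Save the signals so they appear in the backtest output.
    record(short_mavg=short_mavg, long_mavg=long_mavg)
```

If an algorithm like this backtested well, showed a strong live track record, and received an allocation, the developer would keep 10% of the net profit it generates on that capital.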

 

Effectiveness and Commercial Promise

On the algorithm developer side, Quantopian’s community currently has over 100,000 members, including finance professionals, scientists, developers, and students from more than 180 countries.

Members collaborate with each other through forums and in person at regional meetups, workshops, and QuantCon, Quantopian’s flagship annual event.
On the fundraising side, last year Steven Cohen, one of the hedge fund world’s biggest names, committed up to $250m of his own money to the platform. Moreover, Quantopian started managing investor capital last month. Initial allocations ranged from $100k to $3m, with a median of $1.5m per algorithm; by the end of 2017, the company expects allocations to average $5m-$10m per algorithm.

 

Concerns

Trading algorithms sometimes converge on the same buy or sell signals, which can generate systemic events. In August 2007, for example, many quant algorithms executed sell orders at the same time, and for roughly two weeks quant trading strategies created chaos in the financial markets.

If an event similar to August 2007 occurs again, it could harm Quantopian’s returns if the algorithms to which money has been allocated converge.

Quantopian can mitigate this risk by regularly analyzing its exposure to particular stocks and the overlap between the different strategies it manages, and by comparing its portfolio with the portfolios of other quant managers that disclose their positions.

Alterations

We believe Quantopian’s approach can be applied to any “market” that requires accurate predictions. The model could therefore be exported to other domains, such as weather prediction, where users are paid by forecasting agencies; sports results, where users are paid by betting sites; or public policy solutions as part of open government initiatives.

 

Sources:

  1. https://www.ft.com/content/b88e6830-1969-11e7-9c35-0dd2cb31823a
  2. https://www.quantopian.com/about
  3. https://www.quantopian.com/faq
  4. https://www.quantopian.com/home
  5. https://www.ft.com/content/0a706330-5f28-11e6-ae3f-77baadeb1c93
  6. https://www.novus.com/blog/rise-quant-hedge-funds/
  7. http://www.wired.co.uk/article/trading-places-the-rise-of-the-diy-hedge-fund

 

Team members:

Alex Sukhareva
Lijie Ding
Fernando Gutierrez
J. Adrian Sánchez
Alan Totah
Alfredo Achondo

Brainspace: Increasing Productivity using Machines

Opportunity

Public and private investigative agencies, especially those handling legal matters, fraud detection investigations, and compliance and governance issues, are overburdened by record volumes of investigative requests and need to make data-driven decisions. Data is growing faster than ever: by the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet. The document review process is both time-consuming and expensive, so these organizations seek more efficient methods of analyzing data than people sifting through documents; even the most efficient lawyers can review only about 80 documents an hour, and the process is prone to mistakes due to fatigue. Digitization has given rise to opportunities for e-discovery solutions, and companies like Brainspace are developing software and machine learning tools that augment the intelligence of these investigative agencies and reduce the time spent reviewing documents.

Solution

Brainspace analyzes structured or unstructured data to derive concepts and context. Using visual data analytics, it then brings humans into the loop to refine a search for maximum relevancy. The software learns at massive scale (1 million documents in 30 minutes), and the process is entirely transparent: a user can see and interact with the machine learning.

Using Brainspace enhances productivity through the interaction between machine and human: the machine is better at ingesting, connecting, and recalling information, while humans are better at using information to reason, judge, and strategize. For example, after the platform organizes unstructured text into concepts, humans can filter on a concept and weight “suggested contexts” to curate relevant search results.

Brainspace differentiates itself from other text-search algorithms in that it recognizes concepts, not just text. The software “reads between the lines” as humans do, except that it can handle thousands of pages instantly. The software is dynamic and unsupervised, using no lexicons, synonym lists, or ontologies.
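Brainspace’s concept-extraction technology is patented and proprietary, but a rough analogue of “grouping documents by concept rather than by exact keywords” can be sketched with standard unsupervised topic modeling on a toy corpus:

```python
# A minimal unsupervised sketch (not Brainspace's method): build TF-IDF
# vectors for a handful of toy "emails", factor them into latent concepts
# with non-negative matrix factorization, and print the top terms per concept.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

documents = [
    "wire the funds to the offshore account before the audit",
    "move cash through the offshore shell company quietly",
    "the quarterly audit report is attached for review",
    "please review the attached quarterly earnings report",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(documents)

# Two latent "concepts"; each document gets a weight for each concept.
nmf = NMF(n_components=2, random_state=0)
doc_topics = nmf.fit_transform(tfidf)

terms = vectorizer.get_feature_names_out()
for i, component in enumerate(nmf.components_):
    top_terms = [terms[j] for j in component.argsort()[::-1][:4]]
    print("concept %d: %s" % (i, ", ".join(top_terms)))
```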

Effectiveness and Commercial Promise

The market opportunity and applications of Brainspace’s machine learning software are enormous, though the first application has been in the investigative industry. One can imagine the immense savings in document review costs. Other research-heavy applications include legal e-discovery, fraud detection investigations within financial services organizations, and compliance or governance issues.

For example, remember Enron? As a demo, Brainspace imported 450 million Enron documents and traced the emails about an offshore account. It took 5 clicks and less than a minute to pin down the executives involved; by hand, it would have taken lawyers 6 months.

Brainspace software has many applications, but the company has needed to partner with others to deliver a usable product to customers. For example, Brainspace partnered with Guidance to deliver a product called EnCase. The Guidance software complements Brainspace’s deciphering algorithm by providing protection against hacking and external threats, and the product is used in the investigative industry to provide the auditing capabilities critical to large-scale investigations.

A potential competitor to Brainspace that we identified is Graphistry, which similarly takes data, quickly understands the results, and automatically visualizes the data to the user via graphs and cloud technology.  Graphistry may have better visualization software for analysis by a human user, but we believe that the patented deciphering technology of Brainspace is more powerful.  Building upon the software to improve the user friendliness of Brainspace is a potential opportunity.

Alterations

Brainspace could apply its context-deciphering technology to online news agencies or social media sites. For instance, it could analyze Twitter feeds and offer more reliable protection against “fake news” generated by underperforming algorithms. CFOs could implement the software to automate financial packets: it may be able to derive trends in financial statements and, with access to the company email database, derive context for those trends.

Brainspace’s technology can also be very useful for the sensor revolution now taking place, which is creating enormous amounts of information. Brainspace could use its expertise in analyzing structured and unstructured data to create insights from sensor data, which can provide immense value for firms that use this data to monitor real-time systems.

Sources:

  1. https://www.forbes.com/sites/bernardmarr/2015/09/30/big-data-20-mind-boggling-facts-everyone-must-read/#6055410d17b1
  2. http://www.businessinsider.com/temp-attorney-told-to-review-80-documents-per-hour-2009-10
  3. https://www.brainspace.com/

Posted by:  Dhruv Chadha, Keenan Johnston, Ashwin Avasaral, Andrew Kerosky, Akkaravuth Kopsombut, Ewelina Thompson

Gamaya

The Opportunity

Farmers’ crop losses to weeds, pests, and diseases can range from 20% to 50%. To prevent these losses, farmers apply chemicals such as herbicides and pesticides to their crops. Because farmers do not know exactly where to apply these chemicals, they usually treat entire fields at a time, which increases the total cost of application, decreases its efficacy, and harms the environment.

To address this, farmers manually scout and sample their fields to determine where it is best to apply chemicals. This is a time-consuming and costly process that cannot easily account for the enormous variety of factors that affect crops.

Solution

Gamaya aims to solve these issues by making crop scouting more efficient. To do so, Gamaya captures information through a patented hyperspectral camera deployed on drones that fly over crop fields.

The information captured by this camera is processed using algorithms and translated into actionable recommendations that farmers can use to optimize their usage of chemicals and fertilizers, with the goal of improving production efficiency and reducing disease- and weed-related losses. Gamaya’s camera is 100x more efficient at compressing data than other hyperspectral cameras, allowing the company to process image data faster and more cheaply than competitors. As Gamaya collects more data, its machine learning algorithms increase the speed and accuracy with which they make recommendations to farmers.
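As a hedged illustration of the kind of analysis involved (not Gamaya’s actual pipeline; the spectral bands, values, and labels below are synthetic), each hyperspectral pixel can be treated as a vector of band reflectances and classified so that chemicals are applied only where they are needed:

```python
# An illustrative sketch (not Gamaya's proprietary pipeline): treat each
# hyperspectral pixel as a vector of band reflectances and train a classifier
# to label pixels as healthy crop, weed, or diseased crop. All values are
# synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy training pixels: reflectance in four spectral bands
# [blue, red, red-edge, near-infrared]
X_train = np.array([
    [0.05, 0.04, 0.30, 0.55],   # healthy crop: strong NIR reflectance
    [0.06, 0.05, 0.28, 0.52],
    [0.07, 0.12, 0.18, 0.30],   # diseased crop: weaker red-edge/NIR
    [0.08, 0.13, 0.17, 0.28],
    [0.04, 0.06, 0.35, 0.48],   # weed: a different spectral signature
    [0.05, 0.07, 0.36, 0.47],
])
y_train = ["healthy", "healthy", "diseased", "diseased", "weed", "weed"]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# New pixels from a drone pass over the field.
new_pixels = np.array([[0.06, 0.11, 0.19, 0.29],
                       [0.05, 0.05, 0.31, 0.54]])
print(model.predict(new_pixels))   # e.g. ['diseased' 'healthy']
```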

Effectiveness and Commercial Promise

Gamaya’s system has been effective in accomplishing these objectives. For example, it worked with K-Farm in Brazil during 2015, greatly improving the results of a 2,000-hectare corn plantation. The improvements can be seen in the following table:

                                       Before Gamaya   After Gamaya
Revenue (per season)                   $1.5m           $1.8m
Losses (disease and pests)             20%             10%
Expenses (fertilisers and chemicals)   $400k           $300k
Profit margin                          15%             30%

Currently, Gamaya offers two solutions, Canefit and Soyfit. The first is aimed at sugarcane cultivation and is expected to generate up to a 15% increase in crop yield, a 20% decrease in weed-related losses, a 30% reduction in chemical usage, and a 30% decrease in disease-related losses. The second is aimed at soybean cultivation, with a 10% increase in crop yield, a 20% decrease in weed-related losses, a 30% reduction in chemical usage, and a 50% reduction in losses due to soil erosion.

It is important to note that the benefits delivered by Gamaya appear large relative to the cost of the solution: Gamaya charges farmers only $20 per hectare.

This proven ability to improve farmers’ results at a low implementation cost has made Gamaya one of the most attractive firms in the AgriTech industry; Forbes featured it as one of four European AgriTech startups with the potential to become a $1 billion company.

Concern

Although Gamaya has been successful in different markets, entering the U.S. and other regions might be harder than expected for two reasons. First, the U.S. regulatory environment for drone deployment is much more complex than in other countries, and changes in international drone regulation might affect the ability to expand into other regions. Second, the up-front price may be high for farmers in developing countries: even though Gamaya will save them money, they may not have enough cash to invest in the project right away.

Alterations

An interesting alteration to Gamaya’s technology would be to modify its algorithms, using the same sensor, to generate value in other industries. For example, it might help fisheries track wild fish movements and developmental stages to increase yield and reduce by-catch, or help the forestry industry and national parks predict and prevent wildfires given current conditions.

Sources:

  1. http://worldagritechinvestment.com/wp-content/uploads/2015/11/Gamaya-Dr-Akim.pdf
  2. http://www.businessinsider.com/gamaya-raises-32-million-to-use-drones-and-ai-for-agriculture-201
  3. http://gamaya.com/blog-post/advanced-crop-intelligence-to-address-global-food-production-challenges/
  4. gamaya.com

 

Team CJEMS: Revolutionizing Middle Skills Hiring (Pitch)

Problem

The hiring process for new labor has a well-deserved reputation for being inconsistent, time-consuming, and non-democratic. Although some companies have made headway in solving this problem with new technological platforms (e.g., Aspiring Minds), they tend to focus primarily on high-skilled labor. As a result, hiring for high-skilled labor has become more seamless and streamlined. In stark contrast, however, there are nearly 12.3 million community college students in the United States who struggle to find employment. Community college graduates are more likely to seek what is known as “middle-skilled labor” and tend to fill roles that require specific skills and certifications, but typically not a four-year degree (e.g., HVAC technician). At the same time, paradoxically, there are more than 3 million unfilled middle-skilled jobs that these community college graduates could ostensibly fill. The inefficiency in this labor market is caused by a misalignment between job-seeking community college students and prospective employers. We believe that there is no effective channel or signaling mechanism community college students can use to convey their skills to prospective employers. A solution that can bridge the gap between students and employers would be a critical step toward optimizing middle-skilled employment and improving the economic fortunes of community college graduates.

Proposed Solution

We propose an augmented intelligence solution that would sit as a platform between prospective employers and community college students. Traditionally, applicants submit a non-standardized resume that they believe highlights their abilities. Prospective employers then try to match applicant resumes against a nebulous list of criteria they believe are important for the position they are trying to fill. The resulting inefficiency not only means that jobs go unfilled, but it also means that the wrong person may be hired for the wrong role.

In our augmented intelligence solution, we propose that employers use our platform to submit a standard, text-heavy job description. Our platform will translate these job descriptions into simplified skill lists based on a database of similar roles and demonstrated qualifications. This process will require iteration with employers at the outset in order to create a sufficiently large database that can be truly predictive. As these lists are completed, they will be published via the platform as simplified job descriptions reflected in terms of the core skills they require.

On the supply side, community college students create profiles via our platform and fill out all courses they have taken and all relevant work experience. The platform will use a similar database in order to translate disparate coursework and professional work into a list of skills. Students will then be able to access employer job postings and immediately see how well their skills align with employer requirements. Once they apply, employers will not only have access to our platform’s curated list of applicant skills but also traditional application components (e.g., resumes).
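A simplified sketch of the matching logic is below; the skill lists and role are hypothetical, and the real platform would derive them from the database of job descriptions and coursework described above.

```python
# A simplified, illustrative sketch of the matching step (skill lists are
# hypothetical): translate a job posting and a student profile into skill
# sets, then score the overlap so students can see how well they align with
# each role before applying.
job_posting = {
    "title": "HVAC Technician",
    "required_skills": {"refrigerant handling", "electrical wiring",
                        "epa 608 certification", "customer service"},
}

student_profile = {
    "name": "Community college graduate",
    # Skills inferred from coursework and prior work experience.
    "skills": {"refrigerant handling", "electrical wiring",
               "customer service", "blueprint reading"},
}

def match_score(job, student):
    """Return the share of required skills covered and the skills missing."""
    required = job["required_skills"]
    covered = required & student["skills"]
    return len(covered) / len(required), required - covered

score, missing = match_score(job_posting, student_profile)
print("match: %.0f%%" % (100 * score))
print("skills to highlight or acquire:", sorted(missing))
```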

We believe that streamlining both sides of this market will enable more efficient hiring and will help close some of the gaps we observe in the middle-skills job market.

Solution Development and Validation

Data Collection

At the outset, we would collect data from prospective employers on some of their commonly filled roles and consult with them to identify the core skills they believe the roles require. On the supply side, we will work with community colleges to translate typical courses into skills (i.e., accounting classes demand a different skill set than a history course).

Model Development / Validation

As the database grows sufficiently large we will begin by conducting a number of small pilots at individual community colleges. Testing the platform in small environments will help us not only validate the efficacy of the model, but will also enable us to adjust the underlying data set in order to make the platform more predictive.

As the model matures, we will expand the pilots to include regions (e.g., Chicagoland) in order to see how combining data from disparate sources changes our model. We ultimately will need to adjust the underlying data set to account for differences in how employers articulate skill sets across different parts of the country. A more comprehensive set of pilots will enable us to understand how the data interplays and to make these adjustments as necessary.

Value Proposition

After sufficient testing of the model, we will provide a beta version for a target list of businesses and students to test.

We have defined a number of benchmarks for both employers and students that can help demonstrate our platform’s value. For employers, tracking average time to hire, average tenure of new hires, and management satisfaction ratings can help validate our platform’s effect relative to the base case.

For students, tracking average number of interviews completed, average time to hire, and post-hire satisfaction can help convey our platform’s value.

Team Dheeraj: I Don’t WannaCry No Mo’ (PROFILE)

Cybersecurity

Last week, a massive cyber-attack took place across more than 150 countries. The so-called “WannaCry” software caused a screen to pop up on a victim’s computer demanding a $300 payment in return for access to their files. As of May 17, 2017, the total number of computers attacked had reached 300,000. What’s more, the success of the software is spurring imitators, causing more heartburn for cybersecurity experts around the world. Enter Deep Instinct, a start-up focused on using AI to detect zero-day cybersecurity threats. Although secretive about its methods, the firm recently competed in Nvidia’s start-up competition and showed how it uses machine learning techniques to identify malware. This is particularly difficult because parsing code for ill intent (like parsing natural language for the same) is hard; according to an article on the subject, “…a new family of malware is only about 30 percent different from the code of something that came before.”

Preventative v. Reactive

Given the difficulty of identification, most anti-virus software relies on a combination of human reporting and reactive malware management. Deep Instinct, on the other hand, does not rely on pre-existing knowledge or known virus signatures. One might assume this means processing an incredibly large amount of data, but the firm claims to use an ensemble algorithm that follows a two-step classification process. First, the firm removes about 95% of the available data on a potential piece of malware using a method it keeps secret; it seems plausible this is done with a variable selection tool such as LASSO or Elastic Net. Second, the firm runs a second algorithm on the remaining variables (i.e., the 5% of data that survives) to classify a file as malware or not. The firm does not disclose this method either, but a classification method such as a random forest likely plays a part. The table below shows some of the firm’s self-reported results:

                Detection Rate   False-Positive Rate
Deep Instinct   99%              0.1%
Competitors     ~80%             ~2-3%
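Since the firm keeps its methods secret, the snippet below is only a sketch of the two-step pipeline we speculate about above, run on synthetic data: an L1-penalized model discards most candidate features, then a random forest classifies files using the survivors.

```python
# A sketch of the speculated two-step pipeline (not Deep Instinct's actual
# method), on synthetic stand-ins for static file features such as byte
# n-gram counts, imported API calls, and section entropy.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# 500 synthetic files with 200 candidate features; only a few features matter.
X = rng.normal(size=(500, 200))
y = (X[:, 3] + 2 * X[:, 17] - X[:, 42] > 0).astype(int)  # 1 = malware

pipeline = Pipeline([
    # Step 1: aggressive L1-based feature selection, analogous to dropping
    # ~95% of the available data before classification.
    ("select", SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=0.1))),
    # Step 2: classify files using only the surviving features.
    ("classify", RandomForestClassifier(n_estimators=200, random_state=0)),
])

pipeline.fit(X[:400], y[:400])
print("held-out accuracy: %.2f" % pipeline.score(X[400:], y[400:]))
```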

Next Steps

Deep Instinct is still an early-stage firm, but the need for a scalable way to detect and prevent malware is clear from last week’s attack. Longer term, this is a cat-and-mouse game: hackers will get more clever, forcing cybersecurity firms to get more intelligent, and so on. This raises the question: is there a better solution? In general, a preventative measure that helps identify a file’s intent (by parsing the underlying code, for example) seems to be a good start. With this method we prevent ransomware attacks from occurring, but we leave ourselves open to being overly protective (anyone who works at a firm with an overly active spam filter will commiserate). As this space evolves, we believe more investment should go into preventative security, alongside general consumer education about how to identify and react to malware.

Sources:

  1. http://www.npr.org/sections/thetwo-way/2017/05/15/528451534/wannacry-ransomware-what-we-know-monday
  2. https://venturebeat.com/2017/05/10/6-ai-startups-win-1-5-million-in-prizes-at-nvidia-inception-event/
  3. https://www.deepinstinct.com/#/about-us