fAIrytale – Choose your own story… $175000 Ask: Louis Ernst, Jessica Goldberg, Andrew Herrera, Pranav Himatsingka, and Anu Mohanachandran

May 29, 2018 by aherrer0

Choose your own story…

Executive Summary

Children’s books provide a portal to creativity and human development. However, even as the importance of positive role models for self-esteem becomes more apparent, many children do not see themselves represented in the stories they read. Using machine learning techniques and natural language processing, we can programmatically generate stories with characters and contexts to make all readers feel represented, empowering them to learn and grow.

We seek to raise $175,000 to create a beta version of the iOS application of our product. This funding will be applied to an application designer, consulting from a data scientist, a children’s linguistics expert, legal support, and access to data in the form of children’s literature. We intend to publish an app in the Apple App Store in 2019.

Our Challenge

Today, over 20% of US households spend more than 25% of their income on their children. Children’s books comprise a significant share of that budget. The proper development of a child’s reading ability starts on day one and strongly affects that child’s later ability to succeed in life. The US children’s book publishing industry is big business, to the tune of $166 million in profit on $2.3 billion of revenue annually, with ebooks comprising 12% of the industry. Outside the home, annual US public school spending on supplies averages about $939 per student, totalling approximately $47 billion.

However, opportunities remain. A 2018 study found that bias is often present in children’s books. Female characters are grossly underrepresented, or appear as the sidekick. The picture is even worse for people of color: they are represented in just 3% of books. And if your child has special needs, they are almost entirely missing from children’s books. It is important for children’s books to represent the diversity of the “real” world [6]. Exposure to diversity through children’s books can help “normalize” for children what may otherwise be perceived as different. It is also an opportunity for children to see role models in what they read.

Introducing fAIrytale

Children, parents and teachers no longer have to search for stories that represent them. fAIrytale is an app created with the mission to represent, empower and grow young readers and their communities through interactive storytelling. From personalized avatars to customizable storylines and dynamic creative, fAIrytale uses the power of augmented intelligence to make reader-centered stories a reality.

We envision programmatically populating stories using parameters filled out by a reader. Stories are co-created with the readers based on the child’s development level and selected descriptors. A reader would be able to choose characters’ gender, ethnicity, family structure, special needs, or more.

Initially, we will use a corpus of pre-written stories customized with character names, pronouns, and family dynamics that match those specified by the reader. Using machine learning techniques, fAIrytale fills out the selected story using the parameters selected by the reader. For instance, if a reader indicates that their parents are a same sex couple, fAIrytale will automatically populate the story with two moms and the appropriate gender pronouns. This will accomplish our minimum goal of aiding representation of groups that don’t usually get to see themselves as the heroes in the story.
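The pilot's fill-in step can be sketched with plain template substitution. This is a minimal illustration, not the planned implementation: the pronoun table, the family-structure labels, the sample story text, and the `personalize` function are all hypothetical names invented here.

```python
from string import Template

# Hypothetical pronoun table; a real product would need a richer, reviewed set.
PRONOUNS = {
    "she": {"subj": "she", "obj": "her", "poss": "her"},
    "he": {"subj": "he", "obj": "him", "poss": "his"},
    "they": {"subj": "they", "obj": "them", "poss": "their"},
}

# Hypothetical parent labels, keyed by the family structure the reader selects.
PARENTS = {
    "two_moms": ("Mom", "Mama"),
    "two_dads": ("Dad", "Papa"),
    "mom_and_dad": ("Mom", "Dad"),
}

# An invented one-line "pre-written story" with slots for the reader's choices.
STORY = Template(
    "$name packed $poss backpack. $parent_a and $parent_b waved as $subj "
    "set off, and the little penguin followed $obj to the beach."
)


def personalize(name, pronoun, family):
    """Fill the story template with the reader's selected parameters."""
    parent_a, parent_b = PARENTS[family]
    return STORY.substitute(
        name=name, parent_a=parent_a, parent_b=parent_b, **PRONOUNS[pronoun]
    )


story = personalize("Asha", "she", "two_moms")
print(story)
```

Simple substitution like this only works when the story text was written with slots in mind. Verb agreement and names that appear inside illustrations would need more than a lookup table.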

Once parameters are selected, imagery that aligns with each sentence will populate the screen. This can be accomplished using dynamic creative, a technique already used in digital advertising. Dynamic creative optimization selects the best combination of elements to include in an ad based on data and real-time feedback. A similar technique could be used to create thousands of iterations of imagery that align with text from the story, for instance, penguins eating ice cream at a beach.

Pilot

Today, much of what fAIrytale can be is aspirational. We are confident that any one piece of this plan has the potential to be a successful business. Currently, we are asking for $175,000. With this money, we will work with a mobile application developer to help build out the system. We will also pay for a data scientist’s time to build out the machine learning system that populates the story, for consulting from a linguist to understand what language is most appropriate for children, and for legal services to protect any intellectual property. Currently, we have access to the text of one hundred contemporary children’s stories and many more public domain stories through Project Gutenberg. Remaining funding will be used to license access to additional stories, thereby increasing the robustness of our data.

After interviewing several parents about our planned offering, we found strong support for our idea. One parent indicated they spend $500/year on books for their child. Another indicated his desire to share his Indian heritage with his child growing up in Chicago.

We envision our target segments to be both families and schools. Parents will have access to a freemium service. The free version will have more limited customization, and based on further customer research, may have a limited number of monthly stories. The full featured version of fAIrytale will allow for more robust customization and unlimited access for a monthly fee. Based on preliminary interviews with parents, we plan on charging $7.99 per month.

fAIrytale will also target school districts, to enable schools to further expand their libraries and engage students. Schools will also have access to the free model, or will be able to license access to the premium version.

Future Considerations

This is only the first challenge in our quest. Ultimately, our goal is AI-generated children’s storytelling.

We will tag our existing database for story, semantic structure and other characteristics. Humans will provide nuance, while algorithmic tagging and word embeddings such as word2vec will provide insight into parts of speech and sentence structure. From there, we can automate portions of the story generation process. Because children’s stories are generally simpler than adult fiction, the task of algorithmically generating a story is less complex.

Even as the natural language processing capability develops, story curation will still be required. The AI will propose 2-3 suggested next sentences, and a reader will choose the next step. These interactions will also provide data on how readers engage with stories, allowing our algorithms to develop more favorable stories.

The primary candidates to enable this today are generative adversarial networks, long short-term memory networks (a recurrent neural network architecture that has proven results for time-delayed tasks, such as talking about a character that doesn’t appear in every sentence) and hidden Markov models (especially known for their application in reinforcement learning and temporal pattern recognition such as speech, handwriting, gesture recognition, and part-of-speech tagging). With a working model to generate simple stories combined with research on childhood language development, we can adjust stories to the reading skill of our readers and help them grow.

Beyond the technical, there are also various features and business opportunities we will explore as fAIrytale grows. These include interactive images, so that children can touch the screen and get a response. Parent interviews revealed a desire to train the AI to mimic a parent’s voice, so that they can still “read” to their child when they are away. The app could enable multiple device logins, so that families can share stories, even at a distance.

Additional opportunities include creating merchandise around recurring characters, or printing favorite stories as physical books. In the classroom, we see additional opportunities to develop lesson plans and learning opportunities through stories.

Challenges & Risks

We strongly believe in this business model, but acknowledge the risks. While the pilot technology is certainly executable, the generation of new stories is a challenge. However, techniques such as word embeddings, generative adversarial networks, and long short-term memory networks are driving big strides in natural language processing. By developing our application starting with our pilot, we can take advantage of this technology as it develops.

Additionally, in using existing stories to train generative systems, we risk codifying biases from past works. Word embedding techniques show that the word “girl” is more closely related to “homemaker” than “scientist”. If we use these techniques naively, we will only perpetuate harmful biases. However, work on the subject has shown that it is possible to differentiate between bias and properly gendered words. For example, “girl” should be considered closer to “waitress” than “waiter”. These debiasing techniques should work equally well across other dimensions of bias, such as ethnicity.
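To make the debiasing idea concrete, here is a minimal sketch in the spirit of the "neutralize" step from the word-embedding debiasing literature: estimate a gender direction, then remove that component from words that should be gender-neutral, while leaving definitionally gendered words alone. The three-dimensional vectors below are invented for illustration and do not come from any trained model.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]


def cosine(u, v):
    return dot(normalize(u), normalize(v))


def neutralize(word_vec, direction):
    """Remove the component of word_vec that lies along direction."""
    d = normalize(direction)
    scale = dot(word_vec, d)
    return [a - scale * b for a, b in zip(word_vec, d)]


# Toy vectors (assumption: made up for this example, not real embeddings).
vectors = {
    "girl": [1.0, 0.2, 0.1],
    "boy": [-1.0, 0.2, 0.1],
    "scientist": [-0.4, 0.9, 0.3],  # leans toward "boy" before debiasing
    "waitress": [0.8, 0.1, 0.9],    # definitionally gendered: left untouched
}

# Estimate the gender direction from one definitional pair.
gender_direction = [g - b for g, b in zip(vectors["girl"], vectors["boy"])]

before = dot(vectors["scientist"], normalize(gender_direction))
debiased_scientist = neutralize(vectors["scientist"], gender_direction)
after = dot(debiased_scientist, normalize(gender_direction))

print("gender component before:", before)
print("gender component after:", after)
```

After neutralizing, “scientist” is equally similar to “girl” and “boy” in this toy space, while “waitress” keeps its gender component. A real system would estimate the direction from many word pairs and would need a reviewed list of which words count as properly gendered.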

Looking Forward

We believe that fAIrytale holds great promise. The technology to create a minimum viable product is proven, and the technology for many of our aspirational goals is fast improving. Given the extremely positive responses from interviews with parents and school teachers, and the importance of reading to child development, this project holds great potential not only in terms of economic value but in social value as well. We are excited to choose our adventure with fAIrytale, and we hope you choose to join us.

CITATIONS:

Rivera, Edward. “OD4394: Children’s Book Publishing Industry Report”. IBISWorld. Dec. 2017. Web. 5/28/2018.
https://www.nytimes.com/2014/03/16/opinion/sunday/where-are-the-people-of-color-in-childrens-books.html?_r=2
Speech, Language, and Swallowing – Development, American Speech-Language-Hearing Association, https://www.asha.org/public/speech/development/34/. Accessed 28 May 2018.
School spending by student https://nces.ed.gov/programs/coe/indicator_cmb.asp
Horning, Kathleen T. Publishing Statistics on Children’s Books about People of Color and First/Native Nations and by People of Color and First/Native Nations Authors and Illustrators, Cooperative Children’s Book Center School of Education, University of Wisconsin-Madison, 22 Feb. 2018, ccbc.education.wisc.edu/books/pcstats.asp. Accessed 29 May 2018.
Epstein, BJ. Kids Need Diversity in Books to Prepare Them for the Real World, Newsweek, 7 Feb. 2017, www.newsweek.com/childrens-books-diversity-ethnicity-world-view-553654. Accessed 29 May 2018
http://www.jmlr.org/papers/volume3/gers02a/gers02a.pdf

fAIrytales – A Pitch for an interactive children’s book App

May 23, 2018 by aherrer0

Opportunity

With over one fifth of US households spending more than 25% of their income on their children, childcare and early childhood education strain many household budgets. Within this picture, one important piece is children’s books. The proper development of a child’s reading ability starts on day one and strongly affects that child’s later ability to succeed in life.

The US children’s book publishing industry is big business, to the tune of $166 million in profit on $2.3 billion of revenue annually. Within this, ebooks make up 12% of the industry, or $244 million. Within this industry there are unaddressed problems. A study conducted earlier this year found that there are all too often biases in children’s books. Female characters are grossly underrepresented in children’s books. And when they do appear, they tend to be the sidekick.

AI Solution

One way to build this system is to combine several machine learning tools that generate and evaluate stories until a threshold of acceptability is reached. We would ingest a large volume of data, likely from open source books to begin with. From there, we use word2vec, sentiment analysis, and topic mapping to generate stories, and use guidelines from the latest research into child development regarding sentence structure and vocabulary to ensure simplicity and age-appropriateness. Trained on massive data sets, a recurrent neural network (RNN) feeds its output at each step back in as input to the next step. One challenge of this model is its short-term memory: a plain RNN cannot retain information over long spans of text, so architectures such as LSTM and GRU add gates that control what is kept and what is forgotten. Stacking additional gated layers could help to find higher-level interactions, but the more layers we choose, the more training data we need to avoid overfitting.
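To illustrate what the gates do, here is a minimal sketch of a single GRU step on scalar values. It is a toy under stated assumptions: the weights are hand-picked rather than trained, the parameter names (`wz`, `uz`, `bz`, and so on) are invented for this example, and a real model would use vectors and a deep learning library.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h, p):
    """One step of a scalar GRU cell; p holds hypothetical weights."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])  # reset gate
    candidate = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    # The update gate blends the old memory with the new candidate.
    return (1.0 - z) * h + z * candidate


def run(update_bias, h0=0.8, inputs=(0.5, -0.3, 0.9, -0.7, 0.1)):
    """Feed a short input sequence through the cell and return the final state."""
    params = {
        "wz": 0.0, "uz": 0.0, "bz": update_bias,
        "wr": 1.0, "ur": 1.0, "br": 0.0,
        "wh": 1.0, "uh": 1.0, "bh": 0.0,
    }
    h = h0
    for x in inputs:
        h = gru_step(x, h, params)
    return h


remembered = run(update_bias=-10.0)  # gate nearly closed: memory is kept
forgetful = run(update_bias=10.0)    # gate nearly open: memory is overwritten

print("gate closed:", remembered)
print("gate open:", forgetful)
```

With the update gate nearly closed, the starting state of 0.8 survives the whole sequence almost unchanged. With it open, the state is rewritten at every step. In a trained network the gate values are learned, which is how the model decides what to carry forward, such as a character who has not been mentioned for several sentences.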

The long-term vision for this product would be an application that allows the user to feed it the reader’s demographic profile. Children become the protagonist in a story of their own creation. In addition, using technology similar to that used in real-time digital advertisement generation, we include pictures to follow along with the story. We can then build partnerships with various children’s entertainment content providers.

Pilot

In order to pilot this concept, we would first need to ingest a lot of data. Luckily, there is a wealth of children’s books available within Project Gutenberg. Less luckily, these are mostly written in the 19th century. That means that they contain outdated language and social ideas that are largely shunned in most modern children’s stories. However, for a proof of concept, this is acceptable.

Once we have data, we can create a genetic algorithm that generates simple sentences using the book data we ingested, word mapping algorithms, and a list of age-appropriate vocabulary. Evaluation includes: 1) using sentiment analysis, how well the sentiment of the story follows the sentiment of the training stories; 2) using topic mapping and word2vec, whether the story transitions between topics gradually and makes logical sense. We use the genetic algorithm to optimize both the story arc and the story coherence, then display the result to a young reader.
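The loop described above can be sketched as a small genetic algorithm. This toy is heavily simplified and rests on assumptions: each "sentence" is reduced to a single word, the vocabulary and its sentiment scores are invented, the target arc stands in for one learned from training stories, and only the sentiment criterion is scored (topic coherence is left out).

```python
import random

# Hypothetical age-appropriate vocabulary with invented sentiment scores in [-1, 1].
VOCAB = {
    "sad": -0.8, "lost": -0.5, "rain": -0.3, "walk": 0.0,
    "find": 0.4, "home": 0.6, "friend": 0.7, "happy": 0.9,
}
WORDS = sorted(VOCAB)

# Stand-in for the sentiment arc of the training stories (assumption).
TARGET_ARC = [0.6, -0.5, -0.8, 0.4, 0.9]


def fitness(story):
    """Negative squared distance between the story's sentiment arc and the target."""
    return -sum((VOCAB[w] - t) ** 2 for w, t in zip(story, TARGET_ARC))


def crossover(a, b, rng):
    """Splice the start of one parent onto the end of another."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(story, rng, rate=0.2):
    """Randomly swap some words for other vocabulary words."""
    return [rng.choice(WORDS) if rng.random() < rate else w for w in story]


def evolve(generations=40, pop_size=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.choice(WORDS) for _ in TARGET_ARC] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]  # keep the fitter half as parents
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - 1)
        ]
        pop = [pop[0]] + children  # elitism: the best story always survives
    pop.sort(key=fitness, reverse=True)
    history.append(fitness(pop[0]))
    return pop[0], history


best, history = evolve()
print(best, history[-1])
```

Because the best candidate is carried into each new generation, the best score never gets worse. A fuller version would add a second fitness term for topic coherence and score whole sentences rather than single words.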

Commercial Viability

Our app is commercially viable given our initial research and interviews with parents. Based on the feedback of data scientists from the hackathon, our research seems promising and is executable, given that children’s stories are simpler than stories or writing for adults. Further, we have created a survey for parents to demonstrate the commercial viability of our app.

One parent indicated they spend $500/year on books for their child. Another indicated his interest specifically because he’s from India and the books he buys in the U.S. are more focused on American characters with American names. He is interested because he would have customized stories with Indian demographics that his child does not get living in Chicago. One concern is that if he has to input too many things, the app might become tedious and he might stop using it after a while. We should aim to provide an easy “default case” that can readily be used for his child.

We also analyzed our competition. Epic! is a digital library of over 25,000 books for kids and educators. It costs $7.99/month and has over 44,000 reviews on the App Store. While Epic! has an early mover’s advantage, it does not offer customization based on factors such as gender, race, or citizenship of the parents, which will be our unique selling point. We also plan on having a freemium model with up to 3 free stories a month, based on a basic initial input into the app. For anything more interactive or customized, we plan on charging users $8/month (similar to our competition) as an upgrade cost. Given that the children’s book market in the U.S. is worth $2.3 billion, we have huge potential even if we target only e-book users.

Appendix:

Link to survey sent out to parents:

https://goo.gl/forms/S5mDSH9846hHhKYA2

Profile on the National Resident Matching Program

April 18, 2018 by aherrer0

Challenge:

In 1952, the National Resident Matching Program (NRMP) was established as a non-profit organization to match medical students and residency programs. However, by the 1990s, the system was under serious strain. With thousands of students attempting to match across dozens of specialties and thousands of residency programs, there was concern that hospitals’ preferences were favored over applicants’. Further, more applicants were trying to match as a couple, something the system at the time could not handle, causing students to opt out of the process. The NRMP hired Al Roth to design an algorithm that would lead to more desirable outcomes and encourage applicants to opt into the process.

Solution:

Today, the NRMP uses Roth’s adaptation of the Gale-Shapley algorithm to produce “stable” matches that ultimately favor applicants. It considers their preferences first, then the hospitals’. It can also handle joint matching for couples. Beyond residency programs, this algorithm has applications ranging from matching students to public high schools in NYC to organ donor matching. In 2012, Alvin Roth and Lloyd Shapley won a Nobel Prize in economics for the algorithm’s application across markets that require choice from both sides of the market, and where price is not a factor.

The algorithm works by having both students and hospitals rank their preferences after an interview process. Both applicants and hospitals simultaneously submit their preferences in rank order. Then, the algorithm takes the first student and her first-choice hospital and checks whether that hospital has also ranked the student. If so, they are “tentatively matched”; if not, the algorithm moves on to her next choice. The algorithm then goes to the second student, looking at that student’s first choice. If it is a mutual match, the algorithm determines 1) whether the hospital has a spot remaining and 2) whether the hospital ranks this student higher or lower than the students already “tentatively matched”. If all spots are filled and the new student ranks higher than a previous match, the lowest-ranked previous match is bumped and moves on to his or her next choice. If the new student ranks lower and all spots are filled, the algorithm considers the student’s second choice. This process occurs across dozens of specialties, thousands of global applicants and thousands of residency positions.
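The steps above can be sketched as applicant-proposing deferred acceptance with hospital capacities. This is a simplified illustration, not the NRMP's production algorithm: it leaves out couples matching, and the function name, applicants, hospitals, and rankings are all invented for the example.

```python
def match(applicant_prefs, hospital_prefs, capacity):
    """Applicant-proposing deferred acceptance with hospital capacities."""
    # rank[h][a] is hospital h's rank of applicant a (lower is better).
    rank = {h: {a: i for i, a in enumerate(prefs)}
            for h, prefs in hospital_prefs.items()}
    next_choice = {a: 0 for a in applicant_prefs}
    tentative = {h: [] for h in hospital_prefs}
    free = list(applicant_prefs)

    while free:
        a = free.pop()
        prefs = applicant_prefs[a]
        if next_choice[a] >= len(prefs):
            continue  # list exhausted: applicant stays unmatched
        h = prefs[next_choice[a]]
        next_choice[a] += 1
        if a not in rank.get(h, {}):
            free.append(a)  # hospital did not rank this applicant; try next choice
            continue
        tentative[h].append(a)
        tentative[h].sort(key=rank[h].get)
        if len(tentative[h]) > capacity[h]:
            bumped = tentative[h].pop()  # lowest-ranked tentative match is bumped
            free.append(bumped)
    return tentative


# Hypothetical applicants, hospitals, and rank-order lists.
applicant_prefs = {
    "ana": ["city", "mercy", "general"],
    "ben": ["city", "mercy"],
    "cal": ["city", "general"],
}
hospital_prefs = {
    "city": ["cal", "ana", "ben"],
    "mercy": ["ben", "ana"],
    "general": ["cal", "ana"],
}
capacity = {"city": 1, "mercy": 1, "general": 1}

result = match(applicant_prefs, hospital_prefs, capacity)
print(result)
# {'city': ['cal'], 'mercy': ['ben'], 'general': ['ana']}
```

All three applicants list “city” first, but it has one spot and prefers “cal”, so “ana” and “ben” are bumped down their lists. The result is stable: no applicant and hospital would both prefer each other over their assigned match.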

Results:

In the 2017 Match, 30,478 students matched, or ~71% of registrants, an all-time high. Unfilled positions were placed in the Match Week Supplemental Offer and Acceptance Program, ultimately leading to a 99.4% fill rate. Couples are increasingly satisfied with the new algorithm: 95.4% matched in 2017. The previous methodology would not have been able to handle this volume and level of complexity at this speed, about 17 seconds. Literature exists to refine the algorithm, but there are no commercially viable alternatives as most agree with the “stable marriage” approach to this process. Considering how adaptable the algorithm is, it is unlikely any competing approach will exist in the short term.

Opportunities:

The process is imperfect. Both students and hospitals attempt to influence the other party’s rankings by over-embellishing their true preferences. Trust in the process and concerns about whether applicant preferences are taken into account remain.

To maintain trust, the NRMP is not currently changing the algorithm. However, there are other methods of engendering trust, such as continuing data transparency. In not adjusting the algorithm, the NRMP loses any opportunity to improve upon it. Significant time and money are also spent in the interview process itself.

This algorithm could be expanded to other industries that have many companies with similar offerings targeting a specific pool of candidates. Example industries are investment banking, consulting, and law. This process could also expand globally; for example, the NRMP has licensed the algorithm to the Canadian Resident Matching Service. It could also be used for other professions, such as matching law students to firms, though Dr. Roth indicated this would only work if there was demand for a market to be created.

This type of algorithm could be used in any two-sided market with uncomplicated preferences (firms or people or a combination of both) and where money is not a deciding factor. Evolving the algorithm to handle complicated preferences is necessary to further commercialize to other markets where price is not a reasonable arbiter to set supply and demand.

Sources:

https://www.carms.ca/en/residency/match-algorithm/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3399603/

http://www.nrmp.org/about-nrmp/

http://freakonomics.com/2015/06/17/110307/

http://www.nrmp.org/wp-content/uploads/2017/06/Main-Match-Results-and-Data-2017.pdf

https://web.stanford.edu/~alroth/nrmp.html

https://jamanetwork.com/journals/jama/fullarticle/195998

http://www.nrmp.org/wp-content/uploads/2017/09/Applicant-Survey-Report-2017.pdf

https://www.forbes.com/sites/theapothecary/2014/04/15/how-a-nobel-economist-ruined-the-residency-matching-system-for-newly-minted-m-d-s/#91dbc3055850

https://www.forbes.com/sites/theapothecary/2014/05/27/the-real-problem-with-medical-education-isnt-the-residency-matching-system/#6fa0e2ab2433

Augmented Bling

April 6, 2018 by aherrer0

Opportunity:

The US luxury goods market is $85B and growing slowly; worldwide, the market was $249B in 2016. The luxury goods industry has seen a recent foray into augmented intelligence with the advent of the Apple Watch. At the Basel Fair in March, de Grisogono, one of the biggest jewelers in the world, launched a product-driven chatbot that guides users in selecting various types of jewelry (rings and pendants in this case). The chatbot first introduces itself and compliments the customer, then asks questions about the customer’s taste and, using augmented intelligence, eventually offers a choice of jewelry to buy. The industry in general is facing a decline, with big players like Tiffany & Co. seeing declining sales and profits in the last two years. As Millennial tastes move away from jewelry and traditional diamonds, companies like De Beers are focusing on ways to improve customer experience. This brings us to our solutions to improve this market.

We believe a potential enhancement to this industry would be an app to virtually try on jewelry. Data-driven marketing utilizing customer CRM data and on-site browsing habits would ensure customers’ tastes are met in the best manner. With the help of the AI demonstration and experience, retailers such as Tiffany & Co. could potentially decrease the size of their brick and mortar stores, reducing rent costs.

Effectiveness and commercial promise:

This strategy seems like it will pay off in a large way. According to statistics by McKinsey, online sales of luxury goods have been increasing relative to overall sales, showing an increasing willingness to make large purchases virtually. By creating augmented displays, these companies decrease the time and effort required for a consumer to “try on” their product and increase the variety of available options to test out with smaller operational costs. These tools could be easily scaled with little additional marginal cost.

One limitation to this effort’s success is the ease of reproducibility. Because other competitors in the space can easily copy any successful initiatives, it may not serve as a strong differentiator for any one firm. However, it should drive further sales industry wide by lowering the cost of creating an endowment effect by showing the potential customer what it would look like if they wore this particular jewelry.

As the chatbot and the photo get information on customer preference, the company can further personalize the offerings. In this case, the phone acts as a sensor, returning information on how long users interact with the app and how many options they are choosing between. Photos that users take with the app would provide information as well. For example, the app could suggest different jewelry for different outfits. Chatbots are an easy win for luxury retail. The more data the chatbot gathers, the more personalized the price can become too, allowing for capture of the maximum consumer surplus.

Chatbots are replacing human customer service through online chat. Greater bot use leads to decreased cost and higher service value for the customer. For a human concierge, the effort of asking many questions, the time it takes to analyze the answers, and the potential for error in recommending a new product are all costly. By contrast, once a chatbot is programmed well, the cost is very low and the service is available 24/7/365.

The chatbot will assist in gathering consumer preferences and then the virtual “fitting room” will be used to determine whether the customer is satisfied with the item. There is significant effort in the online retail space to develop the technology needed for an accurate virtual fitting room. If the experience is not accurate, it could adversely affect the credibility of the retailer and its technology. Amazon has introduced the Echo Look style assistant, which has received positive reviews.

The virtual stylist is already being applied to fashion, and there is an opportunity to bring it specifically to the jewelry space. The company GRANI is an early adopter in the virtual jewelry fitting room space. The design will be refined, allowing users to try jewelry with different outfits and hairstyles and improving the image quality so the exact cut and quality of the jewelry is apparent.

LINKS:

https://www.wsj.com/articles/tiffany-hunts-for-path-to-regain-cool-1499621248

https://thinkmobiles.com/blog/augmented-reality-jewelry/

https://www.fool.com/investing/2018/03/17/why-tiffany-co-stock-dropped-on-friday.aspx

https://www.mckinsey.com/industries/retail/our-insights/luxury-shopping-in-the-digital-age

https://www.ft.com/content/1c2a6b24-a514-11e7-8d56-98a09be71849

Market size: https://www.luxurysociety.com/en/articles/2017/07/us-luxury-goods-market-sees-another-year-slow-growth/

http://www.bain.com/publications/articles/luxury-goods-worldwide-market-study-fall-winter-2016.aspx

Personalized Pricing http://review.chicagobooth.edu/marketing/2018/article/are-you-ready-personalized-pricing

Analogy to Chatbots: https://chatbotsmagazine.com/3-high-value-chatbots-types-and-1-you-need-to-fire-immediately-49832901fe8a

https://www.econsultancy.com/blog/66058-fashion-ecommerce-are-virtual-fitting-rooms-the-silver-bullet

https://www.prnewswire.com/news-releases/facecake-releases-first-online-mobile-and-in-store-augmented-reality-shopping-platform-for-jewelry-at-nrf-2018-300583203.html