With over one fifth of US households spending more than 25% of their income on their children, childcare and early childhood education strain many household budgets. Within this picture, one important piece is children’s books. A child’s reading development starts on day one and has a lasting effect on that child’s later ability to succeed in life.
The US children’s book publishing industry is big business, generating about $166 million in profit on $2.3 billion of revenue annually. Ebooks make up 12% of the industry, or $244 million. Yet the industry has unaddressed problems. A study conducted earlier this year found that children’s books all too often contain biases. Female characters are grossly underrepresented, and when they do appear, they tend to be the sidekick.
AI Solution
To build this system, we would combine several machine learning tools to generate and evaluate stories until a threshold of acceptability is reached. We would ingest a large volume of text, likely from open source books to begin with. From there, we would use word2vec, sentiment analysis, and topic mapping to generate stories, and apply guidelines from the latest child development research on sentence structure and vocabulary to ensure simplicity and age-appropriateness. Trained on large datasets, a recurrent neural network (RNN) feeds the output of each step back in as an input to the next step. One challenge with this model is its short-term memory: a plain RNN cannot retain information over long spans of text. The architectures being explored to address this are LSTM and GRU, which use gates to control what the network keeps and what it forgets. Stacking additional layers could help capture higher-level interactions, but the more layers we choose, the more training data we need to avoid overfitting.
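To make the idea of gates concrete, here is a minimal sketch of a single GRU step reduced to scalars, in plain Python. It is an illustration only, not the model we would train: the function names (`gru_step`, `run_sequence`) and the parameter keys are hypothetical, and a real system would use vectors, learned weights, and a deep learning library. The sketch uses the convention in which the update gate z weights the new candidate state.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x: float, h_prev: float, p: dict) -> float:
    """One step of a scalar GRU cell (illustrative, hypothetical parameter names).

    x      -- current input
    h_prev -- hidden state carried over from the previous step
    p      -- weights: wz, uz, bz (update gate), wr, ur, br (reset gate),
              wh, uh, bh (candidate state)
    """
    # Update gate: how much of the new candidate replaces the old state.
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])
    # Reset gate: how much of the old state is used to build the candidate.
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])
    # Candidate state computed from the input and the (reset) old state.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # Blend old state and candidate; z near 0 keeps the old memory intact.
    return (1.0 - z) * h_prev + z * h_cand


def run_sequence(xs: list, h0: float, p: dict) -> float:
    """Feed a sequence through the cell, reusing each output as the next state."""
    h = h0
    for x in xs:
        h = gru_step(x, h, p)
    return h
```

When the update gate stays close to zero, the hidden state passes through the sequence almost unchanged, which is the mechanism that lets gated networks hold on to information longer than a plain RNN.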
The long-term vision for this product is an application that lets the user enter the reader’s demographic profile, so that the child becomes the protagonist in a story of their own creation. In addition, using technology similar to that used in real-time digital advertisement generation, we would include pictures that follow along with the story. We could then build partnerships with various children’s entertainment content providers.
To pilot this concept, we would first need to ingest a lot of data. Luckily, there is a wealth of children’s books available on Project Gutenberg. Less luckily, most of them were written in the 19th century, which means they contain outdated language and social ideas that most modern children’s stories avoid. For a proof of concept, however, this is acceptable.
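As a rough sketch of the ingestion step, the snippet below strips the license header and footer from a Project Gutenberg plain-text file and splits the remaining story into sentences. It assumes the text contains the usual "*** START OF THE PROJECT GUTENBERG EBOOK ... ***" and "*** END OF ..." marker lines; older files vary in wording, so the patterns would need adjusting. The function names are hypothetical and the sentence splitter is deliberately naive.

```python
import re

# Assumed marker format; some files say "THIS" instead of "THE".
START_RE = re.compile(
    r"\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*", re.IGNORECASE
)
END_RE = re.compile(
    r"\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*", re.IGNORECASE
)


def strip_gutenberg_boilerplate(raw: str) -> str:
    """Return only the text between the START and END marker lines.

    If a marker is missing, fall back to the start or end of the file.
    """
    start = START_RE.search(raw)
    end = END_RE.search(raw)
    begin = start.end() if start else 0
    finish = end.start() if end and end.start() >= begin else len(raw)
    return raw[begin:finish].strip()


def split_sentences(text: str) -> list:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    return [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]


if __name__ == "__main__":
    sample = (
        "Header and license text\n"
        "*** START OF THE PROJECT GUTENBERG EBOOK A SAMPLE TALE ***\n"
        "Once upon a time there was a little rabbit.\n"
        "He lived under a big tree!\n"
        "*** END OF THE PROJECT GUTENBERG EBOOK A SAMPLE TALE ***\n"
        "More license text\n"
    )
    for sentence in split_sentences(strip_gutenberg_boilerplate(sample)):
        print(sentence)
```

The naive splitter will mis-handle abbreviations and quoted dialogue, which is tolerable for a proof of concept but would need a proper tokenizer later.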
Once we have data, we can create a genetic algorithm that generates simple sentences from the ingested book data, word mapping algorithms, and a list of age-appropriate vocabulary. Evaluation covers two criteria: 1) using sentiment analysis, how well the sentiment of the story follows the sentiment of the training stories; 2) using topic mapping and word2vec, whether the story transitions between topics gradually and makes logical sense. The genetic algorithm optimizes both the story arc and the story coherence before the story is displayed to a young reader.
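The following toy sketch shows the shape of such a genetic algorithm. It covers only the first criterion (matching a sentiment arc); the coherence term based on topic mapping and word2vec is left out. Everything in it is an assumption made for illustration: the tiny sentiment lexicon, the sentence pool, the target arc, and the function names. A real version would score sentences with a trained sentiment model and draw candidates from the ingested books.

```python
import random

# Hypothetical toy lexicon; stands in for a real sentiment model.
LEXICON = {"happy": 1.0, "fun": 0.5, "lost": -1.0, "sad": -0.5,
           "scared": -1.0, "found": 1.0}

# Hypothetical pool of candidate sentences built from simple vocabulary.
SENTENCES = [
    "Mia was happy",
    "Mia had fun at the park",
    "Mia lost her kite",
    "Mia was sad",
    "Mia was scared",
    "Mia found her kite",
    "Mia went home",
]

# Assumed target arc: start positive, dip, then recover.
TARGET_ARC = [1.0, -1.0, -0.5, 1.0]


def sentiment(sentence: str) -> float:
    """Sum the lexicon scores of the words in a sentence."""
    return sum(LEXICON.get(word, 0.0) for word in sentence.lower().split())


def fitness(story: list) -> float:
    """Negative squared distance to the target arc; 0.0 is a perfect match."""
    return -sum((sentiment(s) - t) ** 2 for s, t in zip(story, TARGET_ARC))


def evolve(seed: int = 0, pop_size: int = 20, generations: int = 30,
           mutation_rate: float = 0.2):
    """Evolve stories toward the target arc; returns (best story, fitness history)."""
    rng = random.Random(seed)
    n = len(TARGET_ARC)
    population = [[rng.choice(SENTENCES) for _ in range(n)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]   # truncation selection
        children = [population[0]]              # elitism: keep the best as-is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally swap a sentence for a random one.
            child = [rng.choice(SENTENCES) if rng.random() < mutation_rate else s
                     for s in child]
            children.append(child)
        population = children
    return max(population, key=fitness), history


if __name__ == "__main__":
    best, history = evolve()
    print(" / ".join(best), fitness(best))
```

Because the best story of each generation is carried over unchanged, the best fitness never gets worse from one generation to the next. Adding the coherence criterion would mean extending `fitness` with a second term, for example a penalty for large topic jumps between neighboring sentences.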
Our initial research and interviews with parents suggest that the app is commercially viable. Based on feedback from data scientists at the hackathon, our approach seems promising and executable, given that children’s stories are simpler than writing for adults. Further, we have created a survey for parents to help demonstrate the commercial viability of our app.
One parent indicated they spend $500/year on books for their child. Another was interested specifically because he is from India, and the books he buys in the U.S. mostly feature American characters with American names. Customized stories would give his child, who is growing up in Chicago, characters with Indian backgrounds that are otherwise missing. His one concern is that if he has to enter too many inputs, the app might become tedious and he might stop using it after a while. We should aim to provide an easy “default case” that can readily be used for his child.
We also analyzed our competition. Epic! is a digital library with over 25,000 books for kids and educators. It costs $7.99/month and has over 44,000 reviews on the app store. While Epic! has a first-mover advantage, it does not offer customization based on factors such as gender, race, or the parents’ citizenship, which will be our unique selling point. We also plan to offer a freemium model with up to 3 free stories a month, generated from a basic initial input into the app. For anything more interactive or customized, we plan to charge users $8/month (similar to our competition) as an upgrade. Given that the U.S. children’s book market is worth $2.3 billion, we have huge potential even if we target only users who read e-books.
Link to survey sent out to parents: