Hello Everyone,
As we enter week 7, we’ll conclude our discussions of individual experiments with ChatGPT on Monday – if you didn’t speak last time, please come prepared to share on Monday. After the discussion, we’ll delve into the intricacies of DALL·E 3, which will serve as our prime example of creativity in large multimodal models.
Wednesday will be our second debate. The format is similar to our first debate, and we should have some time for Q&A at the end.
Regarding the readings, they may seem extensive at first glance, but they predominantly revolve around image-text models and DALL·E 3. In some previous sessions we did a number of ad-hoc evaluations of DALL·E 3, and on Monday we’re going to take a more detailed and systematic look at its performance. When doing the readings, please focus on the central question of evaluating multimodal creativity, and try out examples from the readings or run your own experiments – feel free to include your own examples of experimenting with DALL·E 3 in your weekly response. For those without access to GPT-4 through ChatGPT Plus, Microsoft Bing is a free alternative for experimenting with DALL·E 3.
Also, please ensure that you read Daston’s (2022) piece on Babbage and Wittgenstein as well, as it will inform our examination of creativity in today’s large generative models. A note for the entrepreneurial minds in class: Babbage was not only a pioneer in computing but also one of the first to venture into the start-up arena of computing machines (even though it didn’t quite pan out – he ultimately failed to attract enough funding).
Lastly, please remember to email me your group project topic and a one-paragraph description over the weekend.
I look forward to our engaging discussions and the second debate!
Week 7 (November 6 and 8): Evaluating LLMs: Creativity and Multimodality
– Assignments:
Daston, Lorraine (2022). “Algorithmic Intelligence in the Age of Calculating Machines,” in Rules, 122–150. Princeton University Press.
Alikhani, Malihe et al. (2023). “Text Coherence and its Implications for Multimodal AI.” Frontiers in Artificial Intelligence, Section Language and Computation, Volume 6: http://www.frontiersin.org/articles/10.3389/frai.2023.1048874/full
Yang et al. (2023). “The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision).” arxiv.org: https://arxiv.org/pdf/2309.17421.pdf
Hsu, Tiffany and Steven Lee Myers (April 8, 2023). “Can We No Longer Believe Anything We See?” The New York Times: https://www.nytimes.com/2023/04/08/business/media/ai-generated-images.html
OpenAI’s DALL·E 3: https://openai.com/dall-e-3 (DALL·E 3 is a text-to-image model that works together with ChatGPT Plus; it was released by OpenAI on September 21, 2023. We will show and discuss some examples generated by DALL·E 3 in class).
Optional (vision and touch): Guzey et al. (2023). “See to Touch: Learning Tactile Dexterity through Visual Incentives.” arxiv.org: https://arxiv.org/pdf/2309.12300.pdf
Optional (text and speech): Meta AI’s introduction to its new multimodal AI model for speech translation (released on August 22, 2023): https://ai.meta.com/blog/seamless-m4t/
– Second class debate (November 8):
“Will AI Bring More Creativity and Innovation than Risks and Dangers?”
Besides materials from the class, you can also visit the two webpages that I mentioned in class for inspiration; they showcase top AI practitioners’ opposing attitudes towards AI regulation: https://managing-ai-risks.com/ and https://open.mozilla.org/letter/. The debate is still ongoing on Twitter, so feel free to find inspiration there as well.