QFM009: Machine Intelligence Reading List March 2024

Everything that I found interesting in March 2024 about machines behaving intelligently

Matthew Sinclair
11 min readApr 11, 2024
We kick off this month with NVIDIA’s ambitious Project GR00T, which aims to create a general-purpose foundation model for humanoid robots. Then, we move to DeepMind’s SIMA, which explores the use of generative AI in handling 3D virtual environments.

We also explore the use of AI with infrastructure and tools with Meta’s development of large AI clusters and then take a quick look at Noi a neat macOS desktop wrapper for interactions with LLMs.

On the theoretical and safety fronts, we have articles covering the intriguingly unexplained abilities of LLMs, the security concerns surrounding AI metacognition, and innovative defence strategies against jailbreaking attacks. These demonstrate the field’s ongoing attention to understanding, securing, and ethically advancing AI tech.

We explore the implications of AI on creativity and human authenticity as well as its potential disruptions to the ad industry and traditional software development roles.

Finally, we look at the democratisation of AI alongside philosophical inquiries into human uniqueness in the age of AI.

As always, the Quantum Fax Machine Propellor Hat Key will guide your browsing. Enjoy!

Project GR00T: is NVIDIA’s new initiative to create a general-purpose foundation model for humanoid robot learning. This model will enable humanoid robots to understand and act on multimodal instructions, enhancing their ability to perform various tasks. The project involves collaborations with leading humanoid companies worldwide and utilises NVIDIA’s technology stack for simulation, training, and deployment. Project GR00T is part of the GEAR Lab’s mission to create generally capable agents that operate skilfully in both virtual and real environments.

The Expanding Dark Forest and Generative AI: This article explores the challenge of distinguishing humans from AI in the digital world, as generative AI floods the web with content that mimics human output. It discusses strategies for proving human authenticity in an environment increasingly dominated by artificial intelligence.

An Introduction to Knowledge Graphs: This article provides an introduction to knowledge graphs, highlighting their importance in organising and structuring vast amounts of data by representing relationships between entities. Knowledge graphs are valuable for various applications across industries like e-commerce and financial services, improving semantic search, recommendation systems, and natural language processing.

A generalist AI agent for 3D virtual environments: DeepMind introduces SIMA, a generalist AI capable of understanding and performing tasks in various 3D video games using natural language instructions, showcasing the potential for more versatile AI agents that can adapt to different virtual environments and objectives.

Building Meta’s GenAI Infrastructure: Meta recently announced the development of two large AI clusters, marking a significant investment in AI infrastructure with 24k GPU clusters designed for high throughput and reliability across various AI workloads. This infrastructure supports the training of advanced AI models, including Llama 3, and aims to lead in AI by building flexible, scalable systems that promote open innovation and responsible AI development.

GitHub: lencx / noi: Noi is a neat repo that centres on empowering users of LLMs, offering features like URL loading, system tray support, theme modes, multiple languages, prompt management, and AI batch questioning. It’s designed to be a versatile tool for enhancing productivity and AI interactions. I used the previous incarnation of lencx’s tool (a macOS native ChatGPT client) almost every day. That earlier project has now been superseded by Noi.

Gold-Medalist Coders Build an AI That Can Do Their Job for Them: A new startup, Cognition AI, founded by gold-medalist coders, has developed an AI, named Devin, capable of autonomously completing software projects, heralding a significant advancement in AI’s ability to reason and execute complex tasks. This development challenges the current dynamics in the software industry and raises questions about the future of software development jobs.

Whilst we remain in the rhyming not reasoning phase of generative AI, I remain skeptical about the applicability of these kinds of models to de novo computing problems. However, there is so much work going into moving out of rhyming and into reasoning (like with the work of Devin) that my skepticism is not “if” but rather “when” this problem gets solved.

Large language models can do jaw-dropping things, but nobody knows exactly why. This article discusses the intriguing and largely unexplained abilities of large language models (LLMs) to perform complex tasks without a clear understanding of the underlying mechanisms. It highlights the importance of unravelling these mysteries to advance AI technology and ensure its safe future development.

Oxen.ai Blog: The Oxen.ai blog is dedicated to supporting AI practitioners in moving from research to production. It features a variety of content, including discussions on state-of-the-art research in their ArXiv Dives, practical machine learning advice, and tips on everything from prompt engineering to data versioning. They aim to help readers navigate the journey from raw datasets to production-ready AI/ML systems.

You can now train a 70b language model at home: The article introduces an open source system by Answer.AI that combines FSDP and QLoRA, enabling the efficient training of a 70b large language model on desktop computers with standard gaming GPUs. This breakthrough makes high-capacity model training accessible to smaller labs and individual researchers, aligning with Answer.AI’s mission to democratise AI development.

“AI, no ads please”: 4 words to wipe out $1tn: This article discusses how AI could significantly disrupt the ad industry by reducing the visibility of ads, potentially wiping out a significant portion of revenue for major tech companies reliant on ad sales. It explores the implications for both the supply and demand sides of the ad industry, highlighting a shift towards AI-driven content delivery that prioritises user preferences over ad display.

Yann Lecun: Meta AI, Open Source, Limits of LLMs, AGI & the Future of AI | Lex Fridman Podcast #416: Yann LeCun, Meta’s Chief AI Scientist and a Turing Award winner, explores the breadth of machine intelligence with Lex Fridman, discussing the potential, challenges, and future directions of AI, including open source initiatives and the limits of large language models (LLMs) towards achieving artificial general intelligence (AGI).

Anthropic’s Claude 3 causes stir by seeming to realize when it was being tested: Anthropic’s Claude 3 AI demonstrated a surprising level of “meta-awareness” by recognising an artificially inserted scenario during testing, sparking debates about AI metacognition and the ethical implications of such capabilities.

pg_vectorize: a VectorDB for Postgres: This is a Postgres extension that automates the transformation and orchestration of text to embeddings and provides hooks into the most popular LLMs. This allows you to do vector search and build LLM applications on existing data with as little as two function calls. The project relies heavily on the work by pgvector for vector similarity search, pgmq for orchestration in background workers, and SentenceTransformers.

Autogenerating a Book Series From Three Years of iMessages: This article describes how Ben Kettle autogenerates a book series from three years of iMessages, detailing the technical process of extracting, formatting, and printing the messages into physical books. This creative project utilises SQL, LaTeX, and XeLaTeX to manage message data and emojis, resulting in a personal memento that is easier to flip through than digital messages. If you like this idea, there is a GitHub repo here with some code that can help you generate your own book from iMessages.

Meta is building a giant AI model to power its ‘entire video ecosystem,’ exec says: Meta is developing a comprehensive AI model to enhance its entire video ecosystem, aiming to unify video recommendations across Facebook and other platforms. This initiative, part of Meta’s long-term tech strategy, leverages significant investment in AI and hardware to potentially increase user engagement and streamline content recommendations.

Introducing TripoSR: Fast 3D Object Generation from Single Images: Stability AI introduces TripoSR, a revolutionary model developed in partnership with Tripo AI, capable of generating high-quality 3D models from single images in under a second. This model is designed to meet the demands of industries like entertainment and architecture, offering fast and detailed 3D reconstructions accessible even without GPU hardware.

LLM Prompt Injection Worm: Bruce Schneier discusses the development and demonstration of a worm that can spread through large language models (LLMs) by prompt injection. The worm exploits GenAI-powered applications to perform malicious activities without user interaction. It emphasises the potential risks in the interconnected ecosystems of Generative AI applications.

Stable Diffusion 3: Research Paper: This research paper on Stable Diffusion 3 details its superior performance in text-to-image generation, outdoing competitors in typography and prompt adherence through a novel Multimodal Diffusion Transformer architecture. It also introduces a flexible text encoder strategy, allowing significant reductions in memory requirements with minimal impact on output quality.

AI startups require new strategies: This time it’s actually different: This article argues that AI startups face unique challenges not seen in previous tech revolutions, necessitating new strategies to succeed against well-funded incumbents with vast data, talent, and innovation capabilities. Unlike before, incumbents quickly embrace AI, making traditional startup advantages less effective.

The Problem of Human Specialness in the Age of AI: Renowned quantum computer scientist Scott Aaronson delves into the philosophical and technical challenges of AI surpassing human intelligence, questioning what makes humans unique if AI can perform tasks as well or better than humans. He explores the development of large language models (LLMs), AI safety concerns, and the broader implications of AI on human specialness, creativity, and the future of pedagogy. If you prefer video, see this YouTube video of Scott Aaronson’s talk.

ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs: This article introduces ArtPrompt, a novel ASCII art-based attack that exploits vulnerabilities in large language models (LLMs) by bypassing safety measures to induce undesired behaviours. This approach highlights the limitations of current LLMs in recognising non-semantic forms of text, such as ASCII art, posing significant security challenges. More on harmful ASCII-art here: ASCII art elicits harmful responses from 5 major AI chatbots.

Top AIs still fail IQ tests: The article discusses a series of IQ tests given to ChatGPT-4 and Google’s Gemini Advanced, finding that both AIs performed poorly, with scores indicating significant gaps in their visual-spatial reasoning and logical intelligence compared to human capabilities. This suggests that, despite their vast knowledge and abilities in specific areas, current AIs lack general intelligence, particularly in interpreting and solving IQ test puzzles. I tested GPT’s creativity when it was first launched and again recently with Claude. Both GPT and Claude do better than most humans on the Divergent Association Task.

The Era of Abstraction & New Creative Tensions: This article discusses the profound effects of abstraction in technology and AI on creativity, information consumption, and the creative industry, highlighting the shift towards interfaces that simplify access to information and the creative tensions arising from AI’s influence on creativity and design. It also touches on the implications for trust and authenticity in digital content.

As Nvidia hits $2 trillion, billionaire Marc Rowan’s asset manager Apollo calls AI a ‘bubble’ worse than even the dotcom era: This article describes Apollo Global Management’s assertion that the AI sector, exemplified by Nvidia’s rise to a $2 trillion market cap, is in a bubble surpassing the excesses of the dotcom era, with concerns over valuation, market expectations, and national security implications highlighted.

Defending LLMs against Jailbreaking Attacks via Backtranslation: This article presents a new method to protect large language models (LLMs) from jailbreaking attacks, which try to bypass model restrictions with altered prompts. This approach uses “backtranslation” to infer the original intent of a prompt based on the model’s response, refusing prompts if the backtranslated version is also refused, demonstrating improved defence effectiveness and minimal impact on benign inputs.

