Can We Build Deterministic LLMs That Produce Reliable Outputs?
15th Sept 2025 | Superintelligence Newsletter
Hey Superintelligence Fam 👋
Large Language Models (LLMs) continue to impress - but also mislead. Thinking Machines Lab just showed that even in supposedly deterministic settings, subtle choices like how requests are batched can cause surprising output shifts. Why does this matter? Because hallucinations - confident but false answers - persist even with perfect training data. Fixing them isn’t just about better training; it’s about rethinking how models are designed and evaluated.
Meanwhile, AI governance is heating up: OpenAI’s major restructuring, a landmark U.S.–UK partnership, and rising public demand for regulation. The bigger question remains: if LLMs are built to guess, not to doubt, can we ever trust them? Hallucinations aren’t a bug - they’re a design problem.
Thinking Machines Lab Uncovers Hidden Variable Behind LLM Inconsistency : Mira Murati’s lab finds that batch structure - not just sampling randomness - causes LLMs to return different results even under “deterministic” settings, because inference kernels aren’t batch-invariant, and proposes batch-invariant kernels that fix it (see the sketch after these stories).
Microsoft & OpenAI Restructure Partnership With a $100B Nonprofit Stake : OpenAI’s nonprofit parent is set to hold over $100 billion in equity in a new Public Benefit Corporation, redefining its relationship with Microsoft while boosting commitments to safety and governance.
U.S. and UK Seal Historic AI-Tech Deal During Trump’s UK Visit : The U.S. and UK forge a sweeping tech alliance during Donald Trump’s UK visit: cooperation spans AI, quantum computing, semiconductors and infrastructure projects, framed as a strategic move in tech geopolitics.
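Why would batch structure change a “deterministic” model’s answer? A big part of the story is that floating-point addition isn’t associative, and inference kernels split their reductions differently depending on batch size. The sketch below is a minimal, assumed illustration of that underlying effect in plain NumPy - not the lab’s actual batch-invariant kernels: the same numbers summed in two different orders usually aren’t bit-identical.

```python
# Minimal illustration (assumed example, not Thinking Machines' kernels):
# float32 addition is not associative, so the order in which a reduction is
# performed changes the result in its last bits. Inference kernels choose
# different reduction orders depending on batch size, which is how batch
# composition can leak into outputs even at temperature zero.
import numpy as np

rng = np.random.default_rng(0)
values = rng.standard_normal(100_000).astype(np.float32)

# "Small batch": one long, strictly sequential reduction.
sequential = np.float32(0.0)
for v in values:
    sequential += v

# "Large batch": tile the same reduction into 1,000-element chunks and
# combine the partial sums, i.e. a different association of the same additions.
chunked = values.reshape(-1, 1000).sum(axis=1, dtype=np.float32).sum(dtype=np.float32)

print(sequential, chunked, sequential == chunked)  # usually not bit-identical
```

Batch-invariant kernels, as the lab proposes, pin down the reduction order so a given request is computed the same way no matter what else happens to share its batch.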
Genkit : A powerful open-source framework from Google for building AI apps. Unified APIs let you connect multiple model providers, build agents with tool calling and RAG, and deploy fast.
Seedream 4.0 : Next-gen image generation & editing in one model: handles complex tasks, reference consistency, 4K output, batch input/output, and styles from watercolor to cyberpunk.
AI Quests by Google : A game-based, code-free journey teaching AI literacy to 11-14 year olds: real science/health challenges, data basics, model building & ethical decisions.
Why Language Models Hallucinate : Training & evaluation push LLMs to guess rather than express uncertainty. Even perfect training data won’t erase errors: because abstaining scores zero while guessing sometimes pays off, hallucination rates persist (a toy calculation follows this section).
Disentangling the Factors of Convergence between Brains and Computer Vision Models : Model size, data quantity, and image type each drive brain-like alignment in DINOv3 vision models, matching human fMRI and MEG patterns spatially and temporally.
Universal Deep Research: Bring Your Own Model and Strategy : UDR lets users define research workflows strategy-first, swapping models or tools freely. It executes custom agents without fine-tuning and yields structured progress updates & reproducible reports.
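To make the hallucination incentive concrete, here is a toy expected-score calculation - the probabilities are made up for illustration and are not from the paper: under plain 0/1 accuracy scoring, a model that guesses whenever it is unsure never scores worse than one that abstains, so benchmark-optimizing models learn to bluff rather than say “I don’t know.”

```python
# Toy model of the guessing-vs-abstaining incentive (illustrative numbers,
# not from the paper). Under 0/1 accuracy scoring, abstaining earns 0 points
# on unsure questions while guessing earns points whenever the guess lands,
# so "always guess" weakly dominates "abstain when unsure".

def expected_accuracy(p_sure, p_right_when_sure, p_right_when_guessing, abstain):
    """Expected benchmark score: confident answers are graded normally;
    unsure questions are either skipped (0 points) or answered with a guess."""
    unsure_score = 0.0 if abstain else p_right_when_guessing
    return p_sure * p_right_when_sure + (1.0 - p_sure) * unsure_score

# Hypothetical model: confident (and 95% correct) on 70% of questions,
# and a blind guess on the remaining 30% lands 20% of the time.
print("always guess:       ", expected_accuracy(0.70, 0.95, 0.20, abstain=False))  # 0.725
print("abstain when unsure:", expected_accuracy(0.70, 0.95, 0.20, abstain=True))   # 0.665
```

As long as a guess has any chance of being right, guessing raises the expected score, which is the evaluation pressure the paper argues keeps hallucinations around.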
Get a Foundational Understanding of LLMs
Join our free 7-day email series to explore LLM essentials - from core concepts and model design to training workflows, real-world use cases, and ethical best practices.
Sign up now and take your AI expertise to the next level!
The FTC opened a sweeping inquiry into how major chatbot makers handle user data and risks; California advanced SB 53, requiring safety frameworks and incident reporting for frontier models; activists staged a hunger strike at Anthropic and DeepMind offices over existential AI risks; and UK experts called for mandatory labelling of AI-generated content to protect public trust and curb deepfake harms.
Thank you for tuning in to this week's edition of Superintelligence Newsletter! Stay connected for more groundbreaking insights and updates on the latest in AI and superintelligence.
For more in-depth articles and expert perspectives, visit our website | Have feedback? Share it with us.
To explore sponsorship opportunities : Explore Here
Stay curious, stay informed, and keep pushing the boundaries of what's possible!
Until Next Time!
Superintelligence Team.