Yesterday’s AI - November 9, 2025
This week: OpenAI signed a $38 billion infrastructure deal with Amazon while Google secured Anthropic’s commitment to use up to a million TPUs. Apple reportedly gave up on building competitive AI in-house, opting to pay Google $1 billion annually instead. Meanwhile, Chinese startup Moonshot released an open-source model that outperforms GPT-5 and Claude Sonnet 4.5 on key benchmarks at a fraction of the cost, and researchers keep discovering that AI systems are simultaneously advancing in capability while remaining vulnerable to prompt injections, jailbreaks, and producing vast quantities of low-quality content across the internet.
This week’s sections:
General News - product launches, partnerships, and industry shifts
Big Money Deals - unprecedented infrastructure spending
Technical - new models, training advances, and research breakthroughs
Skeptical - security vulnerabilities and uncomfortable questions
📰 GENERAL NEWS
Amazon Launches AI-Powered Translation for Kindle Authors
Amazon launched Kindle Translate, a beta AI translation tool for self-published authors using Kindle Direct Publishing (KDP). The service initially supports translation between English and Spanish, and from German to English, aiming to help independent authors expand their reach into international markets without traditional translation costs.
My take: This is practical AI deployment that solves a real problem—translation costs create genuine barriers for self-published authors trying to reach international markets. The limited language support (English-Spanish, German-English) suggests Amazon is starting cautiously, likely to avoid the quality problems that plagued early machine translation.
The interesting question: what happens to professional translators who specialized in fiction and non-fiction translation? Amazon isn’t claiming these translations match human quality, but for many authors, “good enough and free” beats “excellent and expensive.” We’re watching another knowledge profession face the “good enough automation” challenge.
Tinder Wants to Analyze Your Camera Roll to Understand You Better
Tinder is testing an AI feature called “Chemistry” that aims to understand users through questionnaires and, with permission, by analyzing photos from their Camera Roll. The feature learns about users’ interests and personality traits to presumably improve matching capabilities.
My take: The privacy implications here are substantial. Tinder is asking for permission to analyze your entire photo library—not just the curated images you chose to share, but everything in your camera roll. That’s vacation photos, screenshots of conversations, receipts, memes you saved, family pictures, and potentially sensitive personal information.
The value proposition for users is questionable. Does analyzing my camera roll actually improve matching, or is this primarily a data collection exercise? Tinder’s parent company Match Group has substantial incentives to build comprehensive user profiles for advertising and engagement optimization. The “better matches” framing may be secondary to the data acquisition opportunity.
Also worth noting: once Tinder has analyzed your camera roll, that analysis becomes part of their data holdings. Even if you later revoke permission, the insights extracted don’t disappear.
Getty Images Wins Landmark UK Ruling Against Stability AI
The UK High Court issued a ruling in Getty Images’ lawsuit against Stability AI, addressing critical questions around AI training, copyright infringement, and trademark issues. The case centered on whether Stability AI’s use of Getty’s copyrighted photographs to train its AI image generation model constitutes infringement, and trademark concerns related to AI-generated images potentially displaying Getty watermarks.
My take: This ruling represents a significant legal precedent for AI companies and copyright holders, though the full implications remain unclear without seeing the complete judgment details. The fact that Getty won suggests UK courts may take a stricter interpretation of training data rights than some AI companies hoped.
The trademark aspect is particularly interesting—if Stability’s model learned to reproduce Getty watermarks, it suggests the training process captured not just general image features but specific branding elements. That’s evidence the model memorized training data rather than purely learning abstract patterns, which undermines the “transformative use” defense.
Expect this ruling to influence ongoing copyright cases in other jurisdictions and potentially change how AI companies approach training data acquisition going forward.
Microsoft Launches MAI-Image-1, Its First In-House Image Generator
Microsoft launched MAI-Image-1, its first internally developed AI image generator, now available in Bing Image Creator and Copilot Audio Expressions. The text-to-image model, initially announced in October, represents Microsoft’s move toward building proprietary AI capabilities rather than relying exclusively on OpenAI partnerships.
My take: Microsoft spent billions partnering with OpenAI and has access to DALL-E, yet they’re building their own image generator anyway. This signals either strategic hedging—reducing dependence on OpenAI as that relationship evolves—or specific technical requirements that OpenAI’s models don’t meet.
The timing is notable given Microsoft’s evolving relationship with OpenAI post-restructuring. Building in-house capabilities provides leverage in partnership negotiations and insurance against potential future access limitations.
Anthropic Commits to Model Deprecation Policies
Anthropic announced formal commitments regarding AI model deprecation and preservation. The company established policies to provide customers with advance notice before retiring models and ensuring continued access to deprecated models for specified periods, addressing concerns about service continuity and allowing organizations to plan migrations.
My take: This addresses a genuine enterprise concern—you can’t build production systems on models that might disappear without warning. Anthropic is competing on reliability and predictability, which matters more to enterprise customers than raw capability differences.
The commitment costs Anthropic relatively little (maintaining old models on reduced infrastructure) while providing substantial value to customers who need planning certainty. It’s smart positioning against competitors who treat model versions as disposable.
Product Launches and Partnerships
Google Chrome AI Mode Shortcut - Google added a dedicated AI Mode button in Chrome’s mobile browsers (iOS and Android), appearing under the search bar on the New Tab page for easier access to AI-powered search features.
Sora Launches on Android - OpenAI’s Sora video generation tool launched on Android in the US, Canada, and other regions with feature parity to iOS, including the ‘Cameos’ feature for personalized video generation. The app achieved nearly 500,000 installs on its first day—4x larger than the iOS launch.
Pinterest CEO Endorses Open Source AI - Pinterest CEO Bill Ready announced the company is achieving significant cost savings and “tremendous performance” using open source AI models for visual search, signaling a broader industry trend toward cost-effective alternatives to proprietary models.
Google Maps Gets Gemini Integration - Google Maps is integrating Gemini AI for conversational route planning, landmark-based navigation, and the ability to answer questions while driving, transforming the app into what Google calls an “all-knowing copilot.”
Foursquare Founder Launches BeeBot - Dennis Crowley, co-founder of Foursquare, launched BeeBot, an AI-powered social app for iPhone that provides location-based audio updates through headphones, functioning like a “personalized radio DJ” for neighborhood information.
Former Meta Employees Launch Stream Ring - Former Meta/CTRL-Labs employees launched the Stream Ring, an AI-powered smart ring that allows users to record voice notes with whispers, control music, and interact with AI assistants—entering the growing AI wearables market.
ClickUp Adds AI Assistant - ClickUp launched a new AI assistant as part of its strategy to compete with Notion, Slack, and Microsoft Teams, positioning itself as an all-in-one productivity platform integrating calendar, communication, documents, and task tracking.
Alexa+ Comes to Amazon Music - Amazon integrated Alexa+ into the Amazon Music app across all subscription tiers, currently available to users in the Alexa+ Early Access beta program.
Google Finance Gets AI Deep Search - Google Finance added Gemini AI-powered Deep Search for more detailed query responses, plus prediction market support and other trader-focused features.
💰 BIG MONEY DEALS
OpenAI Signs $38 Billion, Seven-Year Deal With Amazon
OpenAI signed a $38 billion cloud computing deal with Amazon spanning seven years, securing infrastructure needed to scale agentic AI workloads. The agreement provides access to hundreds of thousands of Nvidia chips and marks a significant shift as Microsoft loosens its exclusive cloud provider relationship with OpenAI, allowing infrastructure diversification.
My take: This deal restructures the cloud AI landscape. Microsoft’s exclusive provider status is ending, which changes dynamics considerably. OpenAI was entirely dependent on Microsoft infrastructure—a dangerous position when Microsoft is simultaneously your biggest investor, your largest customer (through Azure OpenAI Service), and increasingly your competitor (Copilot).
The $38 billion figure over seven years ($5.4B annually) represents massive committed spending, but it’s infrastructure OpenAI desperately needs. They’re burning $115 billion through 2029 according to projections, and single-source dependency on Microsoft was unsustainable both technically and strategically.
For Amazon, this is both revenue (OpenAI paying for AWS services) and strategic positioning (becoming critical infrastructure for the leading AI company). AWS was losing the AI cloud wars to Microsoft’s OpenAI partnership—this deal changes that narrative.
The broader pattern: AI companies are signing unprecedented infrastructure commitments while their business models remain largely unproven at these spend levels. OpenAI needs to justify these costs with revenue growth that... so far isn’t matching the infrastructure spending pace.
Google Debuts Ironwood TPU, Secures Anthropic Megadeal
Google Cloud announced its seventh-generation Tensor Processing Unit (TPU) called Ironwood, claiming 4X performance improvement over its predecessor for AI training and inference workloads. The announcement includes a major deal with Anthropic to provide access to up to one million TPU chips, estimated to be worth tens of billions of dollars over multiple years. Ironwood TPUs deliver 42.5 Exaflops of FP8 compute with 1.77 PB of HBM3E memory capacity, scaling from 64-chip cubes to 9,216-chip superpods.
My take: Google is playing catch-up in the AI infrastructure race and deploying massive capital to do so. The Anthropic deal—potentially worth more than OpenAI’s Amazon deal given the “up to one million TPUs” commitment—represents Google’s bet that custom AI accelerators can compete with Nvidia’s GPU dominance.
The 4X performance improvement claim needs context. Compared to what baseline? Google’s previous generation TPU v6e, not Nvidia’s latest hardware. These comparisons are always framed favorably, but the real question is: can Anthropic train Claude as efficiently on Google TPUs as they could on Nvidia H100s or GB200s?
For Anthropic, this is both funding (Google is presumably providing favorable terms) and diversification (not being entirely dependent on one chip vendor). For Google, it’s strategic necessity—they’re distant third place in the AI cloud race behind Microsoft/OpenAI and Amazon, and they need flagship customers to validate their infrastructure.
The “age of inference” framing is notable—Google arguing that the industry is shifting from model training to inference deployment, which conveniently plays to TPU strengths (Google claims better efficiency for inference workloads). Whether this is genuine insight or marketing spin remains to be seen.
Apple Nears $1 Billion Annual Deal to Power Siri With Google’s Gemini
Apple is reportedly nearing a deal to pay Google $1 billion annually to use a custom version of Google’s Gemini AI model to power a revamped Siri and upcoming voice assistant features. The technology will be used for generating summaries and handling planning-related tasks, according to Bloomberg’s Mark Gurman.
My take: Apple effectively gave up on building competitive AI in-house. For a company that prides itself on vertical integration and controlling core technologies, paying a competitor $1 billion per year to power Siri represents either pragmatic acknowledgment of reality or strategic failure—possibly both.
Apple spent years and presumably billions developing AI capabilities internally. If they’re now outsourcing Siri’s AI to Google, it suggests their internal efforts failed to produce competitive results on a timeline that matters. The $1 billion annual payment is pocket change for Apple (they spend more on coffee for employees), but the strategic dependency is significant.
For Google, this is revenue plus validation—if even Apple can’t build competitive conversational AI, Google’s position strengthens. It’s also leverage in other negotiations (search default payments, app store policies, antitrust discussions).
The custom version detail is important. Apple isn’t just white-labeling Gemini; they’re getting a tailored version, which suggests either specific privacy/security requirements or feature customization that standard Gemini doesn’t provide.
One question: what happens to all those “Apple Intelligence” announcements from earlier this year? Were those features also dependent on Google’s technology, or is this deal supplementary?
Microsoft Announces Three Major AI Infrastructure Deals
Microsoft inked three significant AI infrastructure agreements: a $9.7 billion deal with Australia’s IREN for AI cloud capacity powered by Nvidia’s GB300 GPUs (deploying through 2026), a multibillion-dollar deal with Lambda for AI infrastructure, and a $15 billion investment in the UAE’s AI industry covering digital infrastructure, R&D, and workforce development.
My take: Microsoft is deploying capital at unprecedented scale to secure compute capacity. The three deals together represent over $25 billion in committed infrastructure spending, which either demonstrates confidence in sustained AI demand or reflects competitive panic about being outspent by rivals.
The IREN deal is particularly interesting—Microsoft is essentially paying to secure GPU allocation from a third party rather than building data centers directly. This suggests either capacity constraints (they can’t build fast enough) or strategic arbitrage (IREN secured Nvidia allocation Microsoft couldn’t get directly).
The UAE investment fits a pattern of tech giants making large commitments to regions that offer regulatory flexibility, tax advantages, and sovereign AI ambitions. $15 billion buys influence and access in addition to infrastructure.
These deals share a common assumption: AI workload demand will continue growing at rates that justify this infrastructure buildout. If that assumption proves wrong—if AI adoption plateaus or efficiency improvements reduce compute needs—these represent massive overcapitalization.
Additional Infrastructure Deals and Funding
Nvidia Partnerships:
South Korea: Partnership involving deployment of over 260,000 Nvidia GPUs for sovereign AI infrastructure, representing one of the largest national-level AI deployments globally
Hyundai: $3 billion AI factory utilizing Blackwell GPUs, focused on autonomous vehicles, smart factories, and robotics
Deutsche Telekom: $1.2 billion (€1 billion) AI cloud platform and Industrial AI Cloud in Munich, aiming to boost Germany’s AI computing power by 50%
SoftBank-OpenAI Joint Venture - SB OAI Japan officially launched to localize and sell OpenAI’s enterprise technology to Japanese companies, with SoftBank itself as the first customer—highlighting what some characterize as the increasingly circular nature of AI business deals.
Media Licensing:
People Inc. forged AI licensing deal with Microsoft for Copilot content integration as Google traffic declines
Snap partnered with Perplexity for AI search and generative AI integration
Startup Funding:
AUI (neuro-symbolic AI): $20M bridge round at $750M valuation for Apollo-1 model combining transformers with symbolic reasoning
Inception: $50M for developing diffusion models for code and text generation
Wabi (from Replika founder): $20M pre-seed for “YouTube of apps” platform
Subtle Computing: $6M seed funding for voice-isolation models
Anthropic Projections - Anthropic reportedly projects $70 billion in revenue and $17 billion in cash flow by 2028, driven by rapid adoption of business products—ambitious targets that assume sustained enterprise AI spending growth.
🔬 TECHNICAL
Moonshot’s Kimi K2 Thinking Outperforms GPT-5 and Claude Sonnet 4.5
Chinese AI startup Moonshot AI released Kimi K2 Thinking, an open-source AI model that outperforms OpenAI’s GPT-5, Anthropic’s Claude Sonnet 4.5, and xAI’s Grok-4 on multiple benchmarks including reasoning, coding, and agentic tasks. The trillion-parameter model achieves 44.9% on Humanity’s Last Exam, 60.2% on BrowseComp, and 71.3% on SWE-Bench Verified. Released under a Modified MIT License for commercial use with minimal restrictions, it’s priced at $0.60/1M input tokens versus GPT-5’s $1.25/1M—less than half the cost.
My take: This release challenges the sustainability of massive U.S. AI investments. If a Chinese startup can release an open-source model that beats GPT-5 on key benchmarks at half the API cost, what exactly are OpenAI’s $38B Amazon deal and Microsoft’s billions buying?
The 1 trillion parameter MoE architecture with 32B active parameters represents sophisticated engineering—you get trillion-parameter capability at 32B inference cost, which is the entire point of mixture-of-experts designs. The 256k token context and native INT4 inference show optimization for production deployment, not just benchmark gaming.
Three possibilities:
Moonshot’s benchmarks are cherry-picked and the model performs worse in practice
The model genuinely matches or exceeds frontier models, proving massive capital isn’t required for frontier capabilities
The model represents sophisticated distillation or training on outputs from closed models (not uncommon in Chinese AI development)
The Modified MIT License with commercial rights is strategically aggressive—Moonshot is competing on openness and price while U.S. companies debate whether to release weights. This either democratizes access to frontier AI capabilities or creates new risks, depending on your perspective.
The broader question: if open-source models can match closed frontier models within months at a fraction of the cost, what’s the moat for companies spending tens of billions on infrastructure?
Google’s File Search Tool Could Displace DIY RAG Stacks
Google released File Search Tool for its Gemini API, a fully managed RAG (Retrieval Augmented Generation) system that abstracts away the complexity of building RAG pipelines. Unlike traditional setups requiring enterprises to assemble storage solutions, embedding creators, vector databases, and retrieval logic, File Search handles file storage, chunking, embeddings, and citations automatically. Powered by Google’s Gemini Embedding model (which ranks top on the Massive Text Embedding Benchmark), the tool costs $0.15 per 1 million tokens for indexed embeddings, with some features free at query time.
My take: This could kill the DIY RAG stack the same way AWS killed the “build your own data center” approach. The economics are compelling—$0.15 per million tokens for a fully managed system versus engineering time building and maintaining your own vector database, embedding pipeline, chunking logic, and retrieval system.
Google is abstracting away complexity that created an entire ecosystem of vector database startups (Pinecone, Weaviate, Chroma, etc.). If File Search works well enough, why would enterprises maintain separate infrastructure for RAG when Google handles it end-to-end?
The competitive positioning matters. OpenAI offers similar capabilities through Assistants API, AWS has Bedrock Knowledge Bases, but Google claims File Search abstracts “all rather than some” pipeline elements—suggesting competitors still require more orchestration.
The risk for enterprises: another layer of Google dependency. Using File Search means your retrieval logic lives in Google’s infrastructure with their embedding model. Switching costs increase with every abstraction layer you adopt. Convenience has a price beyond the per-token fee.
Also notable: Google emphasizes their Gemini Embedding model ranks top on MTEB benchmarks. Embedding quality directly affects retrieval accuracy, so this matters—but benchmarks and production performance don’t always align.
Google DeepMind: Consistency Training Reduces Jailbreaks by 96%
Google DeepMind researchers presented consistency training methods (BCT and ACT) to reduce sycophancy and jailbreaks in language models. The approach teaches models to respond consistently regardless of irrelevant prompt modifications, avoiding staleness issues of static supervised fine-tuning datasets. Testing on Gemma and Gemini 2.5 Flash models showed BCT reduced jailbreak success rates from 67.8% to 2.9% on the ClearHarm benchmark while maintaining performance on legitimate queries.
My take: Reducing jailbreak success from 67.8% to 2.9% is significant if it holds up in practice. The technical approach is sound—train models to ignore irrelevant context like jailbreak wrappers by using paired examples of clean vs. wrapped prompts. This teaches consistency as a core behavior rather than trying to enumerate all possible attacks.
Two important caveats: First, benchmarks measure known attack patterns. Reducing ClearHarm success doesn’t mean the model resists novel jailbreak strategies—it means it resists attacks similar to those in the training set. Second, this is an arms race. Publishing the technique helps defenders, but also teaches attackers what doesn’t work, driving evolution of more sophisticated attacks.
The “mechanistically different solutions” note is interesting—BCT (output-level) and ACT (activation-level) both work but achieve results through different internal mechanisms. This suggests multiple paths to consistency, which might mean more robust defenses if you combine approaches.
Still, claiming you’ve “solved jailbreaks” when one attack type drops from 68% to 3% is premature. The next generation of attacks will target whatever weaknesses consistency training doesn’t address.
Databricks Research: Building AI Judges Is a People Problem, Not a Technical One
Databricks research reveals that AI deployment bottlenecks aren’t model intelligence but organizational alignment on quality criteria. Their Judge Builder framework addresses the ‘Ouroboros problem’ of using AI to evaluate AI by measuring distance to human expert ground truth. Key findings: experts often disagree on quality standards (inter-rater reliability 0.3 vs expected 0.6), specific judges outperform vague criteria, and only 20-30 examples are needed for robust judges. Multiple customers became seven-figure spenders after implementing the framework, with some creating over a dozen judges and advancing to reinforcement learning techniques.
My take: This gets at a fundamental challenge that’s under-discussed: you can’t measure AI quality without defining quality, and humans often can’t agree on what quality means. The inter-rater reliability finding (0.3 vs expected 0.6) is striking—experts disagree more than organizations assume, which means there’s no single “ground truth” to optimize against.
The Judge Builder approach is pragmatic—instead of trying to create universal quality metrics, build specific judges for specific use cases and measure against human expert consensus for that domain. The 20-30 examples finding is notable if it holds up—that’s low enough to be practical for most organizations.
The production results (customers becoming seven-figure spenders, advancing to RL techniques) suggest this solves a real problem. Enterprises were blocked on deployment because they couldn’t measure whether AI outputs met their quality standards. Judge Builder provides a framework for building those measurements.
The deeper insight: AI quality isn’t an inherent property you measure, it’s a socially constructed agreement among domain experts about what constitutes acceptable output. Technical tools can help measure alignment with that agreement, but they can’t create the agreement itself.
New Models and Training Advances
Attention ISN’T All You Need: Brumby-14B-Base - Manifest AI released Brumby-14B-Base, a retrained variant of Qwen3-14B replacing transformer attention with ‘Power Retention’ mechanism. Retrained for $4,000 over 60 hours on 32 H100 GPUs, achieving performance parity with transformer baselines while offering constant-time per-token computation regardless of context length. However, the low cost only applies when retraining existing transformer models, not training from scratch—sparking controversy about marketing claims.
MIT Researchers Propose Legible, Modular Software Framework - MIT developed a coding framework designed to make software more legible and modular using modular concepts and simple synchronization rules, specifically designed to facilitate LLM-based code generation and improve AI-assisted development.
Microsoft RedCodeAgent - Microsoft Research developed RedCodeAgent, an automated red-teaming tool designed to test security vulnerabilities in code agents, claiming to uncover real-world threats that other approaches miss.
DeepMind Creates Original Chess Puzzles Praised by GMs - DeepMind’s AI system can generate original chess puzzles that have received positive feedback from grandmasters, demonstrating AI’s capability in creative problem generation within structured domains.
AgentML - SCXML for Deterministic AI Agents - Open-source (MIT licensed) language for defining AI agent behavior using finite-state machines rather than prompt chains, inspired by SCXML. Designed to make AI agents more deterministic, observable, and production-safe through explicitly defined states, transitions, and tool calls in machine-verifiable format.
Terminal-Bench 2.0 and Harbor Framework - Terminal-Bench 2.0 launches with 89 manually validated tasks for evaluating autonomous AI agents on terminal tasks, alongside Harbor framework for testing agents in containerized environments. OpenAI’s GPT-5-powered Codex CLI leads with 49.6% success rate—no agent solves more than half the tasks.
Denario: AI Research Assistant Getting Papers Published - Open-source AI system that autonomously conducts scientific research across multiple disciplines, generating complete academic papers in ~30 minutes for $4 each using specialized collaborative AI agents. One fully AI-generated paper was accepted at the Agents4Science 2025 conference, though researchers candidly acknowledge significant limitations including hallucinations and ‘mathematically vacuous’ outputs.
Research and Infrastructure Developments
MIT Advances:
Robot Mapping - New approach helps robots navigate unpredictable environments by rapidly generating accurate maps for search-and-rescue applications
FSNet Optimization Tool - Machine learning system for rapidly finding feasible solutions for optimization problems, particularly power grid operations, guaranteeing feasibility while optimizing electricity flow
AI Safety and Efficiency Research - MIT-IBM Watson AI Lab focusing on making AI more flexible, improving computational efficiency, and ensuring outputs are grounded in factual truth
Nvidia H100 GPU in Space - Nvidia’s H100 GPU is being adapted for space applications, enabling sophisticated on-board AI processing for satellites and space missions despite harsh environmental challenges.
Google Cloud Infrastructure:
Ray and Kubernetes Integration - Enhanced Ray integration with label-based scheduling, Dynamic Resource Allocation for NVIDIA GB200 NVL72 architecture, improved TPU support with JAXTrainer API, showing 30% workload efficiency improvements
Native TPU Experience - Ray TPU Library automating slice allocation, alpha support for JAX and PyTorch training, TPU metrics in Ray Dashboard
Magentic Marketplace - Microsoft Research released open-source simulation environment for studying how AI agents interact and transact in digital marketplaces at scale.
USC Artificial Neurons - Researchers developed artificial neurons using ion-based diffusive memristors that replicate real brain processes, offering significant energy efficiency and size advantages over traditional computing.
SAP RPT-1 - Pre-trained ‘Relational Foundation Model’ designed for business tasks involving tabular data, claiming to work out-of-the-box without fine-tuning and requiring less company-specific context than competitors.
Snowflake Intelligence - Agentic Document Analytics that can analyze thousands of documents simultaneously for aggregate queries, moving beyond traditional RAG limitations by unifying structured and unstructured data analysis.
Qualcomm AI Data Centre Chips - Qualcomm enters AI data centre market with AI200 and AI250 inference processors, directly challenging Nvidia’s dominance by leveraging smartphone chip expertise.
OlmoEarth Platform - Allen Institute for AI launched open-source, scalable system for processing multi-sensor Earth observation data into actionable planetary insights.
Nvidia Queen Elizabeth Prize - Nvidia founder Jensen Huang and chief scientist Bill Dally awarded 2025 Queen Elizabeth Prize for Engineering for foundational contributions to modern machine learning and AI.
🤔 SKEPTICAL
OpenAI: Understanding Prompt Injections as a Frontier Security Challenge
OpenAI published an article explaining prompt injections, a security vulnerability where malicious inputs can manipulate model behavior. The article discusses how these attacks work and outlines OpenAI’s approach through research, model training improvements, and protective safeguards—representing an acknowledgment of security limitations in current AI systems.
My take: OpenAI publishing a blog post about prompt injections doesn’t fix prompt injections. This is acknowledgment of a fundamental problem that remains largely unsolved despite years of research and mitigation attempts.
Prompt injection is the SQL injection of AI systems—a category of vulnerability that emerges from mixing code and data in the same channel. When user input and system instructions flow through the same language interface, attackers can craft inputs that override intended behavior. No amount of filtering or training has solved this comprehensively.
The security community has known about prompt injection since GPT-3. OpenAI has known about it for years. Publishing an explainer about the problem while deploying AI systems to production without robust solutions suggests either acceptable risk tolerance or lack of better options.
The concerning pattern: AI companies deploy systems with known, unfixed security vulnerabilities, then publish research papers explaining those vulnerabilities while continuing to expand deployment. This would be unacceptable for traditional software systems, but somehow it’s normalized for AI.
Meta Brings AI-Generated “Slop” to Europe
Meta is expanding its ‘Vibes’ feature—a short-form video feed of AI-generated content—to Europe. The company reports that media generation in the Meta AI app has increased more than tenfold since Vibes launched, though the article’s framing suggests skepticism about content quality, referring to it as “AI slop.”
My take: Meta is flooding its platform with AI-generated content and framing increased generation volume as success. But volume isn’t quality. If AI-generated content is low-quality (”slop”), then tenfold increase means ten times as much garbage polluting the platform.
The strategic logic is clear: AI-generated content costs nothing to produce and fills infinite feed space, keeping users engaged without Meta paying creators. For Meta’s business model (maximize engagement to sell ads), AI slop serves the same purpose as user-generated content—it’s filler between advertisements.
For users and creators, this is value destruction. Every AI-generated video in the feed displaces content from actual creators. If Vibes succeeds, Meta’s platforms become increasingly filled with synthetic content optimized for engagement metrics rather than human creativity or value.
We’re watching social media platforms choose AI content farms over human creators because the economics favor it. Creators should notice and adjust accordingly.
Google Reports: Threat Actors Deploying AI-Enabled Malware
Google Threat Intelligence Group reports threat actors moving beyond using AI for productivity to deploying AI-enabled malware in active operations. Key findings include: APT28 using PROMPTSTEAL malware that queries LLMs to generate malicious commands; threat actors using social engineering to bypass AI safeguards; maturing cybercrime marketplace for AI tools; and state-sponsored actors from North Korea, Iran, and China using AI across full attack lifecycles.
My take: The “AI will revolutionize cybersecurity” narrative always had a dark mirror—AI revolutionizes offensive capabilities at least as much as defensive ones. Google’s report documents this transition from theoretical concern to observed reality.
PROMPTSTEAL is particularly notable—malware that queries LLMs during execution to generate context-appropriate malicious commands. This represents a new category of adaptive malware that can modify its behavior based on the environment by asking an AI what to do next. Traditional signature-based detection struggles with this because the malware’s actions aren’t predetermined.
The social engineering aspect (posing as CTF participants, security researchers) to bypass AI guardrails demonstrates attackers have already figured out how to exploit AI systems’ assumptions about user intent. When your safety layer assumes “security researcher” means benign intent, that becomes an attack vector.
The maturing marketplace for AI cybercrime tools suggests professionalizing underground economy. It’s no longer just nation-state actors—criminal enterprises are building and selling AI-powered attack tools.
Google’s response (disabling accounts, strengthening Gemini protections) is reactive. This is another arms race where attackers keep adapting faster than defenses can respond.
Additional Skeptical Notes
Flawed AI Benchmarks Put Enterprise Budgets at Risk - Academic study reveals that AI benchmarks used to evaluate model capabilities are fundamentally flawed, potentially causing enterprises to make poor decisions when investing eight or nine-figure budgets based on misleading benchmark data. Public leaderboards commonly used for procurement decisions may be unreliable.
Altman and Nadella Need More Power for AI, But They’re Not Sure How Much - OpenAI CEO Sam Altman and Microsoft CEO Satya Nadella acknowledge AI development requires significantly more electrical power but cannot quantify exact amounts needed, creating uncertainty about future power requirements and posing financial risks for investors funding AI infrastructure expansion.
5 AI-Developed Malware Families Fail to Work - Google analyzed five AI-developed malware families and found they failed to function effectively and were easily detected by security systems, contradicting widespread hype about AI-generated malware posing significant cybersecurity threats—providing evidence-based assessment that current AI malware capabilities are limited.
Pingu Unchained: Unrestricted LLM for Security Research - 120B-parameter LLM designed to provide unrestricted responses to objectionable requests for security research purposes, bypassing typical safety guardrails for red teaming voice AI systems. Raises significant ethical and safety concerns about dual-use AI technology.
Researchers Find AI Toxicity Harder to Fake Than Intelligence - New computational Turing test achieves 80% accuracy detecting AI bots, finding that AI systems struggle to authentically replicate human toxicity and negative behavior—excessive politeness serves as reliable indicator of AI, suggesting mimicking human toxicity is paradoxically harder for AI than simulating intelligence.
CLOSING THOUGHTS
This week illustrated the growing tension between AI capabilities advancing and fundamental problems remaining unsolved. On one hand, we have Moonshot releasing an open-source model that beats GPT-5 on benchmarks at half the API cost, Google reducing jailbreak success rates by 96%, and major infrastructure deals totaling over $100 billion. On the other hand, OpenAI is publishing explainers about unfixed security vulnerabilities, Meta is flooding feeds with AI-generated “slop,” and researchers keep documenting that benchmarks mislead, power requirements are uncertain, and even frontier labs can’t build Siri without licensing Google’s AI.
The technical work continues advancing—consistency training, better RAG systems, models running in browsers, robots that can map environments. The business dynamics remain unchanged—massive capital deployment based on assumptions about future demand, circular deal structures, and companies attributing every decision to AI disruption whether warranted or not.
Strip away the headlines and the pattern is familiar: companies spending unprecedented amounts on infrastructure while simultaneously acknowledging they don’t know exactly what they’re building toward or how much it will cost. Some of this will prove visionary. Some will prove to be expensive mistakes dressed up with AI narratives.
The most honest moment this week might have been Altman and Nadella admitting they need more power for AI but aren’t sure how much. That’s refreshing candor about the uncertainty underneath all this investment. Most companies are just better at hiding it.
See you next week. In the meantime, maybe don’t let Tinder analyze your camera roll. YAI 👋
Disclaimer: I use AI to help aggregate and process the news. I do my best to cross-check facts and sources, but misinformation may still slip through. Always do your own research and apply critical thinking—with anything you consume these days, AI-generated or otherwise.



Thanks for writing this, it clarifies a lot about the current state of AI. It's fascinating to see practcal applications like Kindle Translate solve real problems, but I do wonder how quickly we'll adapt our workforce skills to these shifts, especially for creative professionals like translators.