📰

Safety & Rules

AI safety, privacy, and regulation

📰safety-rules

From guardrails to governance: A CEO’s guide for securing agentic systems

The previous article in this series, “Rules fail at the prompt, succeed at the boundary,” focused on the first AI-orchestrated espionage campaign and the failure of prompt-level control. This article is the prescription. The question every CEO is now getting from their board is some version of: What do we do about agent risk? Across…

Jessica HammondFeb 4, 2026

📰safety-rules

Inside the marketplace powering bespoke AI deepfakes of real women

Civitai—an online marketplace for buying and selling AI-generated content, backed by the venture capital firm Andreessen Horowitz—is letting users buy custom instruction files for generating celebrity deepfakes. Some of these files were specifically designed to make pornographic images banned by the site, a new analysis has found. The study, from researchers at Stanford and Indiana…

James O'DonnellJan 30, 2026

📰safety-rules

Mercedes-Benz Unveils New S-Class Built on NVIDIA DRIVE AV, Which Enables an L4-Ready Architecture

Mercedes-Benz is marking 140 years of automotive innovation with a new S-Class built for the AI era, bringing together automotive safety and NVIDIA’s advanced autonomous driving platform to enable a level 4-ready architecture designed for trust. The new S-Class with MB.OS, which will be equipped with the NVIDIA DRIVE Hyperion architecture and full-stack NVIDIA DRIVE Read Article

Jessica SoaresJan 29, 2026

📰safety-rules

Powering tax donations with AI powered personalized recommendations

TRUSTBANK partnered with Recursive to build Choice AI using OpenAI models, delivering personalized, conversational recommendations that simplify Furusato Nozei gift discovery. A multi-agent system helps donors navigate thousands of options and find gifts that match their preferences.

OpenAIJan 27, 2026

📰safety-rules

The Adolescence of Technology: Anthropic's CEO Issues His Starkest Warning Yet

In a sweeping 20,000-word essay, Dario Amodei argues that humanity is entering the most dangerous window in AI history — and proposes a radical blueprint for surviving it. We unpack the predictions, the proposals, and the blind spots.

AI Tech Review EditorialJan 26, 2026

📰safety-rules

NVIDIA DRIVE AV Raises the Bar for Vehicle Safety as Mercedes-Benz CLA Earns Top Euro NCAP Award

AI-powered driver assistance technologies are becoming standard equipment, fundamentally changing how vehicle safety is assessed and validated. The recent recognition of the Mercedes-Benz CLA as Euro NCAP’s Best Performer of 2025 underscores this shift, as the vehicle combines traditional passive safety features with NVIDIA DRIVE AV software to achieve the highest overall safety score of Read Article

Ali KaniJan 22, 2026

📰safety-rules

Continuously hardening ChatGPT Atlas against prompt injection

OpenAI is strengthening ChatGPT Atlas against prompt injection attacks using automated red teaming trained with reinforcement learning. This proactive discover-and-patch loop helps identify novel exploits early and harden the browser agent’s defenses as AI becomes more agentic.

OpenAIDec 22, 2025

📰safety-rules

Chemical hygiene

Following up on digital hygiene, I wanted to write up my (evolving, opinionated) guide to chemical hygiene. I keep ranting about this topic to all of my friends recently (you can tell I'm really fun at parties), so I thought it would be worth writing it up to have it all in one place/url: Water Starting out with controlling your water system, which is the easiest in terms of concrete, high confidence recommendations that in my experience still only Reverse Osmosis - the gold-standard Point of Use water filtration system, with a remineralization post filter. Ideally install an under the sink system, but fallback to countertop systems is ok. Brita and other basic filters are not good enough to adequately filter your drinking water. In addition, install a whole-home water filter (usually sediment+carbon, not Reverse Osmosis, that would be impractical), to enjoy cleaner water in your entire home, including shower, dishwasher, laundry, etc. If that's too expensive or impossible (e.g. you're renting), at least install a shower filter. Avoid drinking water from water bottles, certainly from plastic bottles but also in general. You cannot control that supply chain, both during collection but also during delivery (especially light, heat). Avoid drinking tap water, it's a lot less clean than you'd think (it is relatively poorly treated centrally and then it has to be delivered to your home through undefined pipes) and, with proper dental care, includes unnecessary and possibly mildly harmful "public health" additives especially and controversially fluoride. Example fun study: people living near golf courses (which are heavily treated with pesticides) show an increased risk for Parkinson Disease. Water is the easiest section in this entire article because it has well-understood ways to spend $/risk reduction compared to a lot more complex categories we'll see later (food especially). I would recommend contacting a company in your local area to install both a whole home filter

karpathyDec 18, 2025

📰safety-rules

Updating our Model Spec with teen protections

OpenAI is updating its Model Spec with new Under-18 Principles that define how ChatGPT should support teens with safe, age-appropriate guidance grounded in developmental science. The update strengthens guardrails, clarifies expected model behavior in higher-risk situations, and builds on our broader work to improve teen safety across ChatGPT.

OpenAIDec 18, 2025

📰safety-rules

AI literacy resources for teens and parents

OpenAI shares new AI literacy resources to help teens and parents use ChatGPT thoughtfully, safely, and with confidence. The guides include expert-vetted tips for responsible use, critical thinking, healthy boundaries, and supporting teens through emotional or sensitive topics.

OpenAIDec 18, 2025

📰safety-rules

Strengthening cyber resilience as AI capabilities advance

OpenAI is investing in stronger safeguards and defensive capabilities as AI models become more powerful in cybersecurity. We explain how we assess risk, limit misuse, and work with the security community to strengthen cyber resilience.

OpenAIDec 10, 2025

📰safety-rules

Funding grants for new research into AI and mental health

OpenAI is awarding up to $2 million in grants for research at the intersection of AI and mental health. The program supports projects that study real-world risks, benefits, and applications to improve safety and well-being.

OpenAIDec 1, 2025

📰safety-rules

Strengthening our safety ecosystem with external testing

OpenAI works with independent experts to evaluate frontier AI systems. Third-party testing strengthens safety, validates safeguards, and increases transparency in how we assess model capabilities and risks.

OpenAINov 19, 2025

📰safety-rules

gpt-oss-safeguard technical report

gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are two open-weight reasoning models post-trained from the gpt-oss models and trained to reason from a provided policy in order to label content under that policy. In this report, we describe gpt-oss-safeguard’s capabilities and provide our baseline safety evaluations on the gpt-oss-safeguard models, using the underlying gpt-oss models as a baseline. For more information about the development and architecture of the underlying gpt-oss models, see the original gpt-oss model model card⁠.

OpenAIOct 29, 2025

📰safety-rules

Expert Council on Well-Being and AI

OpenAI’s new Expert Council on Well-Being and AI brings together leading psychologists, clinicians, and researchers to guide how ChatGPT supports emotional health, especially for teens. Learn how their insights are shaping safer, more caring AI experiences.

OpenAIOct 14, 2025

📰safety-rules

Detecting and reducing scheming in AI models

Apollo Research and OpenAI developed evaluations for hidden misalignment (“scheming”) and found behaviors consistent with scheming in controlled tests across frontier models. The team shared concrete examples and stress tests of an early method to reduce scheming.

OpenAISep 17, 2025

📰safety-rules

Collective alignment: public input on our Model Spec

OpenAI surveyed over 1,000 people worldwide on how AI should behave and compared their views to our Model Spec. Learn how collective alignment is shaping AI defaults to better reflect diverse human values and perspectives.

OpenAIAug 27, 2025

📰safety-rules

OpenAI and Anthropic share findings from a joint safety evaluation

OpenAI and Anthropic share findings from a first-of-its-kind joint safety evaluation, testing each other’s models for misalignment, instruction following, hallucinations, jailbreaking, and more—highlighting progress, challenges, and the value of cross-lab collaboration.

OpenAIAug 27, 2025

📰safety-rules

Toward understanding and preventing misalignment generalization

We study how training on incorrect responses can cause broader misalignment in language models and identify an internal feature driving this behavior—one that can be reversed with minimal fine-tuning.

OpenAIJun 18, 2025

📰safety-rules

Scaling security with responsible disclosure

OpenAI introduces its Outbound Coordinated Disclosure Policy to guide how it responsibly reports vulnerabilities in third-party software—emphasizing integrity, collaboration, and proactive security at scale.

OpenAIJun 9, 2025