AI Dependence and Mental Models in Berkeley CS Classes

Failing grades soar as professors see greater AI usage, dwindling math skills in UC Berkeley computer science classes

When the world's top computer science students start failing basic math, the problem isn't the AI. It's the erosion of the mental models required to actually use it.

Look at the numbers from UC Berkeley. In spring 2026, 6% of students in CS 61A received Fs. For a course that serves as the gateway to a degree at one of the best engineering schools on earth, that's a loud signal. These aren't students who can't handle the material. They're students who have outsourced the "thinking" part of coding to an LLM and realized too late that they've forgotten how to debug their own logic.

I've seen this cycle before with calculators and IDEs, but this feels different. We're not just automating syntax. We're automating the struggle that actually creates a programmer.

If we stop valuing the friction of learning, we're just training a generation of operators who can't fix the machine when it breaks. I want to look at where exactly these mental models are collapsing and whether we can actually build a way back.

The Signal in the Data

Failure rates in Berkeley CS courses are climbing because students are using LLMs as a crutch rather than a tool. Professors report the two trends moving together: reliance on AI for homework is rising while math skills are dwindling. This is a classic case of "hollow learning." Students can produce a working Python script without understanding the underlying logic, but they fail the moment they're asked to prove a theorem or solve a problem on a whiteboard.

The problem is that LLMs are great at syntax but mediocre at rigorous logic. When a student lets an AI handle the "boring" parts of a data structures assignment, they aren't just skipping the typing; they're skipping the mental struggle where actual learning happens. It's frustrating to watch because the tools are capable, but the pedagogy hasn't caught up.

If you're trying to debug a student's logic, you'll often see code that looks professional but is fundamentally broken. For example, they might lean on a built-in method like list.count inside a one-line comprehension where a single explicit pass would do, masking a total lack of understanding of time complexity.

def find_duplicates(data):
    # data.count(item) rescans the whole list for every element, so this is O(n^2).
    # It also returns each duplicated value once per occurrence, not once overall.
    # A student might not notice either problem until the dataset gets large.
    return [item for item in data if data.count(item) > 1]
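For contrast, here's a minimal sketch of a linear-time version. The name find_duplicates_linear is mine, purely for illustration, and the sketch assumes the items are hashable (numbers, strings, tuples). It's one reasonable approach, not the only one.

```python
from collections import Counter


def find_duplicates_linear(data):
    """Return each value that appears more than once, in first-seen order."""
    # One pass over the data tallies every element (items must be hashable).
    counts = Counter(data)
    # A second pass over the distinct values keeps those seen at least twice.
    return [item for item, n in counts.items() if n > 1]


print(find_duplicates_linear([3, 1, 3, 2, 1, 3]))  # [3, 1]
```

On the same input, the list.count version returns [3, 1, 3, 1, 3], because it reports every occurrence of a duplicated value. That's the kind of detail you only catch if you trace the logic yourself.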

What's genuinely puzzling is the gap itself: students can build complex systems but can't explain how a pointer works. It's a weird, fragmented kind of intelligence. We're trading deep conceptual foundations for the ability to prompt a model into a plausible answer.

The Math-Code Paradox

Prompting isn't algorithmic thinking. When you ask an LLM to "write a function that sorts a list," you're not solving a problem; you're requesting a pattern match. The danger is that this creates a feedback loop where developers stop learning the foundational math, like Big O notation or linear algebra, because the AI provides a working snippet in two seconds. It's a bit like using a calculator before you understand how multiplication works. You get the right answer, but you have no idea why it's right or when it'll fail.

This trips up a lot of junior devs precisely because the code works. If the output passes the test cases, it feels like the math doesn't matter. But that's a lie. If you don't understand the time complexity of the suggested approach, you're just importing technical debt. A prompt can't tell you why a nested loop on a 100,000-item array will crash your production server; it just gives you the syntax.

def find_duplicates(items):
    dupes = []
    for i in items:
        for j in items:
            # Flawed: every element is also compared with itself, so every
            # item gets appended at least once. Slow: n * n comparisons.
            if i == j:
                dupes.append(i)
    return dupes

The actual skill of engineering is knowing how to optimize that logic. Relying on a prompt to "make it faster" is a gamble. You're essentially asking a statistical model to guess a more efficient pattern without actually understanding the underlying constraints of the hardware or the data structure. It's a fragile way to build software.
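To make the cost concrete, here's a hedged sketch that fixes the self-comparison bug while keeping the quadratic shape, and counts the comparisons it performs. The names find_duplicates_pairwise and pairwise_comparisons are hypothetical helpers of mine. This is meant to show how the work grows, not to be the version you'd ship.

```python
def find_duplicates_pairwise(items):
    """Quadratic but correct: compare each pair of distinct positions once."""
    dupes = []
    comparisons = 0
    for i in range(len(items)):
        # Start at i + 1 so an element is never compared with itself.
        for j in range(i + 1, len(items)):
            comparisons += 1
            if items[i] == items[j] and items[i] not in dupes:
                dupes.append(items[i])
    return dupes, comparisons


def pairwise_comparisons(n):
    """Number of distinct pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


print(find_duplicates_pairwise([1, 2, 2, 3, 3, 3]))  # ([2, 3], 15)
print(pairwise_comparisons(100_000))  # 4999950000
```

Six items cost 15 comparisons. A 100,000-item list costs roughly five billion, even after the bug fix. Knowing that number before you deploy is the part a prompt won't do for you.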

The Illusion of Competence

The spike in failing grades at UC Berkeley isn't a fluke; it's a lagging indicator of what happens when students mistake fluency for mastery. LLMs are built to sound right, not to be right. When a student uses one to bypass the struggle of synthesis, they aren't just cheating; they're outsourcing the actual cognitive work that leads to learning. I think the community reaction calling for a total ban on AI in education is a bit reactive, but the core premise is sound: mimicry is the opposite of understanding.

This matters most for introductory courses, where the goal is to build a mental model from scratch. If you use an LLM to summarize a complex theory before you've actually wrestled with the primary text, you're essentially building a house on sand. It might look fine from the street, but it collapses the moment a professor asks a question that requires a logical leap the model's summary never made for you.

I'm not convinced we've found a way to integrate these tools without eroding the quality of the output. We keep talking about "AI-assisted learning," but in practice, that usually means "AI-generated shortcuts."

The real question is whether we can actually design a curriculum that makes the "shortcut" more difficult than the learning itself. Or are we just waiting for the grades to bottom out further before we change how we test intelligence?

Redefining Assessment

The spike in failing grades at UC Berkeley also suggests we've hit a wall with "AI-proof" assignments. For a while, the instinct was to just ban the tools or design prompts that were too specific for a bot to fake. That didn't work. Now we're seeing that when students use LLMs to bypass the struggle of learning, they aren't just cheating; they're losing the ability to perform the actual task when the bot isn't there.

I agree with the critics who say LLMs are mimicry engines, not knowledge engines. The problem isn't that the AI is "wrong" in a way that's easy to catch; it's that it produces work that looks exactly like a B+ student's effort while the student's actual understanding remains at a D level. We're moving toward a reality where the output of a student's work is almost entirely decoupled from their actual competence.

This matters for high-stakes certifications or engineering degrees, but for a general humanities elective, I'm not sure it's a crisis yet. We might just be seeing a painful correction where educators are forced to return to pen-and-paper exams in a room with no devices.

The real question is whether we can actually distinguish between a student who is using AI as a sophisticated calculator and one who is using it as a prosthetic brain. I suspect we can't, at least not at scale.

Conclusion

The Berkeley CS numbers show that we're currently in a weird limbo. We have tools that can solve a LeetCode hard in seconds but can't explain why a specific logic gate is failing. It's a strange kind of competence, one that looks like mastery until you ask it to deviate from the pattern.

I'm still not convinced that "AI-integrated" curricula are anything more than a temporary patch while we figure out how to stop students from cheating. We can change the rubrics and move the exams back to pen and paper, but that doesn't actually solve the problem.

The real question is whether we're okay with a generation of developers who know how to prompt the answer but have no idea how to debug the hallucinations when the LLM hits a wall.
