July 14, 2024
How Elon Musk set Tesla on a new course for self-driving
Tesla's latest version of FSD had taught itself how to drive by processing billions of frames of video of how humans do it, Isaacson writes.

The following is adapted from Walter Isaacson’s biography “Elon Musk,” publishing Sept. 12.

On a Friday in late August of this year, Elon Musk got into his Model S at Tesla headquarters in Palo Alto, selected a random spot on his navigation screen, and let the car drive itself using its Full Self-Driving technology. For 45 minutes, while listening to Mozart, he livestreamed his trip, including a pass by the home of Mark Zuckerberg, whom he had been jokingly challenging to a cage-match fight. “Perhaps I should knock on the door and make a polite enquiry of whether he would like to engage in hand-to-hand combat,” he said with a laugh before letting the car drive on.

Musk uses FSD 12 on Aug. 25, 2023.

Musk had used FSD hundreds of times before, but this drive was profoundly different, and not just because it was much smoother and more reliable. The new version he was using, FSD 12, was based on a radical new concept that he believes will not only totally transform autonomous vehicles but also be a quantum leap toward artificial general intelligence that can operate in physical real-world situations. Instead of being based on hundreds of thousands of lines of code, like all previous versions of self-driving software, this new system had taught itself how to drive by processing billions of frames of video of how humans do it, just like the new large language model chatbots train themselves to generate answers by processing billions of words of human text.

Amazingly, Musk had set Tesla on this fundamentally new approach just eight months earlier.

“It’s like ChatGPT, but for cars,” Dhaval Shroff, a young member of Tesla’s Autopilot team, explained to Musk in a meeting in December. He was comparing the idea they were working on to the chatbot that had just been released by OpenAI, the lab that Musk cofounded in 2015. “We process an enormous amount of data on how real human drivers acted in a complex driving situation,” said Shroff, “and then we train a computer’s neural network to mimic that.”

Dhaval Shroff works at his desk at Tesla.

Until then, Tesla’s Autopilot system had been relying on a rules-based approach. The car’s cameras identified such things as lane markings, pedestrians, vehicles, signs and traffic signals. Then the software applied a set of rules, such as: Stop when the light is red, go when it’s green, stay in the middle of the lane markers, proceed through an intersection only when there are no cars coming fast enough to hit you, and so on. Tesla’s engineers manually wrote and updated hundreds of thousands of lines of C++ code to apply these rules to complex situations.

The “neural network planner” that Shroff and others were working on took a different approach. “Instead of determining the proper path of the car based on rules,” Shroff says, “we determine the car’s proper path by relying on a neural network that learns from millions of examples of what humans have done.” In other words, it’s human imitation. Faced with a situation, the neural network chooses a path based on what humans have done in thousands of similar situations. It’s like the way humans learn to speak and drive and play chess and eat spaghetti and do almost everything else; we might be given a set of rules to follow, but mainly we pick up the skills by observing how other people do them. It was the approach to machine learning that Alan Turing envisioned in his 1950 paper, “Computing Machinery and Intelligence,” and that exploded into public view a year ago with the release of ChatGPT.
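
The contrast between the two approaches can be illustrated with a toy sketch. It is not Tesla's code: the situation features, the action names, and the recorded examples are all invented for illustration, and a simple nearest-neighbor lookup stands in for the neural network described above.

```python
import math

# Hypothetical recorded examples: (situation, action a human driver took).
# A situation is (distance to obstacle in meters, obstacle's lateral offset in meters).
DEMONSTRATIONS = [
    ((50.0, 0.0), "keep_lane"),
    ((10.0, 0.0), "swerve_left"),
    ((10.0, -1.5), "swerve_right"),
    ((5.0, 0.0), "brake"),
]


def rules_planner(distance_m, offset_m):
    """Rules-based: an engineer hand-writes the threshold and the response."""
    if distance_m < 8.0:
        return "brake"
    return "keep_lane"


def imitation_planner(distance_m, offset_m, demos=DEMONSTRATIONS):
    """Imitation: copy the action a human took in the most similar situation.

    Nearest-neighbor lookup is a stand-in here; the article describes a
    trained neural network, which generalizes rather than looking up examples.
    """
    situation = (distance_m, offset_m)
    nearest = min(demos, key=lambda demo: math.dist(demo[0], situation))
    return nearest[1]


print(rules_planner(11.0, 0.1))      # keep_lane
print(imitation_planner(11.0, 0.1))  # swerve_left
```

In this toy case the hand-written rule has no provision for steering around an obstacle 11 meters ahead, while the imitation planner reproduces what a human did in the closest recorded situation.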

By early 2023, the neural network planner project had analyzed 10 million clips of video collected from the cars of Tesla customers. Did that mean it would merely be as good as the average of human drivers? “No, because we only use data from humans when they handled a situation well,” Shroff explained. Human labelers, many of them based in Buffalo, New York, assessed the videos and gave them grades. Musk told them to look for things “a five-star Uber driver would do,” and those were the videos used to train the computer.
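
The selection step Shroff describes, training only on clips where the human drove well, amounts to filtering by grade. A minimal sketch follows, assuming a 1-to-5 grading scale and a `Clip` record that are both hypothetical; the article does not say how the grades were actually represented.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    clip_id: str
    grade: int  # hypothetical 1-5 score assigned by a human labeler


def select_training_clips(clips, min_grade=5):
    """Keep only clips graded at or above min_grade (the 'five-star' ones)."""
    return [clip for clip in clips if clip.grade >= min_grade]


clips = [Clip("a", 5), Clip("b", 3), Clip("c", 5), Clip("d", 4)]
print([clip.clip_id for clip in select_training_clips(clips)])  # ['a', 'c']
```

Because the weaker examples never reach the training set, the system imitates the better drivers rather than the average one.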

Musk regularly walked through the Autopilot workspace in Palo Alto and knelt next to the engineers for impromptu discussions. As he studied the new human-imitation approach, he had a question: Was it truly needed? Might it be a bit of overkill? One of his maxims was that you should never use a cruise missile to kill a fly; just use a flyswatter. Was using a neural network unnecessarily complicated?

Shroff showed Musk instances where a neural network planner would work better than a rules-based approach. The demo had a road littered with trash cans, fallen traffic cones, and random debris. A car guided by the neural network planner was able to skitter around the obstacles, crossing the lane lines and breaking some rules as necessary. “Here’s what happens when we move from rules-based to network-path-based,” Shroff told him. “The car will never get into a collision if you turn this thing on, even in unstructured environments.”

It was the type of leap into the future that excited Musk. “We should do a James Bond-style demonstration,” he said, “where there are bombs exploding on all sides and a UFO is falling from the sky while the car speeds through without hitting anything.”

Machine-learning systems generally need a metric that guides them as they train themselves. Musk, who liked to manage by decreeing what metrics should be paramount, gave them their lodestar: the number of miles that cars with Full Self-Driving were able to travel without a human intervening. “I want the latest data on miles per intervention to be the starting slide at each of our meetings,” he decreed. He told them to make it like a video game where they could see their score every day. “Video games without a score are boring, so it will be motivating to watch each day as the miles per intervention increases.”
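
The metric itself is simple arithmetic: miles driven divided by the number of times a human took over. The sketch below uses made-up figures and a hypothetical function name; how Tesla actually counted miles or interventions is not described in the article.

```python
def miles_per_intervention(total_miles, interventions):
    """Average miles driven between human interventions.

    Returns infinity when there were no interventions at all.
    """
    if interventions == 0:
        return float("inf")
    return total_miles / interventions


# Made-up daily totals: (miles driven, interventions).
day_one = miles_per_intervention(1200.0, 8)
day_two = miles_per_intervention(1500.0, 6)
print(day_one, day_two)  # 150.0 250.0
```

A rising number from one day to the next is the "score" Musk wanted the team to watch.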

Members of the team installed massive 85-inch television monitors in their workspace that displayed in real time how many miles the FSD cars were driving on average without interventions. They put a gong near their desks, and whenever they successfully solved a problem causing an intervention, they got to bang the gong.

By mid-April 2023, it was time for Musk to try the new neural network planner. He sat in the driver’s seat next to Ashok Elluswamy, Tesla’s director of Autopilot software. Three members of the Autopilot team got in the back. As they prepared to leave the parking lot at Tesla’s Palo Alto office complex, Musk selected a location on the map for the car to go and took his hands off the wheel.

When the car turned onto the main road, the first scary challenge arose: a bicyclist was heading their way. On its own, the car yielded, just as a human would have done. 

For 25 minutes, the car drove on fast roads and neighborhood streets, handling complex turns and avoiding cyclists, pedestrians and pets. Musk never touched the wheel. Only a couple of times did he intervene by tapping the accelerator when he thought the car was being overly cautious, such as when it was too deferential at a four-way stop sign. At one point the car conducted a maneuver that he thought was better than he would have done. “Oh, wow,” he said, “even my human neural network failed here, but the car did the right thing.” He was so pleased that he started whistling Mozart’s “A Little Night Music” serenade in G major.

A frame of the livestream of Musk’s drive using FSD 12 on Aug. 25, 2023.

“Amazing work, guys,” Musk said at the end. “This is really impressive.” They all then went to the weekly meeting of the Autopilot team, where 20 guys, almost all in black T-shirts, sat around a conference table to hear the verdict. Many had not believed that the neural network project would work. Musk declared that he was now a believer and they should move their resources to push it forward.

During the discussion, Musk latched on to a key fact the team had discovered: The neural network did not work well until it had been trained on at least a million video clips. This gave Tesla a big advantage over other car and AI companies. It had a fleet of almost 2 million Teslas around the world collecting video clips every day. “We are uniquely positioned to do this,” Elluswamy said at the meeting.

Four months later, the new system was ready to replace the old approach and become the basis of FSD 12, which Tesla plans to release as soon as regulators approve. There is one problem still to overcome: human drivers, even the best, usually fudge traffic rules, and the new FSD, by design, imitates what humans do. For example, more than 95% of humans creep slowly through stop signs, rather than coming to a complete stop. The chief of the National Highway Traffic Safety Administration says that the agency is currently studying whether that should be permissible for self-driving cars as well.

Walter Isaacson is a CNBC contributor and the author of biographies of Elon Musk, Jennifer Doudna, Leonardo da Vinci, Steve Jobs, Albert Einstein, Benjamin Franklin, and Henry Kissinger. He teaches history at Tulane University and was the editor of Time and the CEO of CNN.