February 21, 2025
Microsoft’s Latest Muse AI Model Can Generate Game Environments
Microsoft researchers introduced a new artificial intelligence (AI) model on Wednesday that can generate 3D gameplay environments. Dubbed the World and Human Action Model (WHAM) or Muse, the new AI model was developed by the tech giant’s Research Game Intelligence and Teachable AI Experiences (Tai X) teams in collaboration with Xbox Games Studios’ Ninja Theory.

The company said that the model can help game designers in the ideation process and can generate game visuals and controller actions to assist creatives during game development.

Microsoft Unveils Muse AI Model

In a blog post, the Redmond-based tech giant detailed the Muse AI model. Muse is currently a research project, although the company said it is open-sourcing the model's weights and sample data for the WHAM Demonstrator, a concept prototype of a visual interface for interacting with the AI model. Developers can try out the model on Azure AI Foundry. A paper detailing the technical aspects of the model was published in the journal Nature.

Training a model in such a complex domain is a difficult proposition. Microsoft researchers collected a large amount of human gameplay data from Bleeding Edge, a 2020 game published by Ninja Theory. The model was trained on a billion image-action pairs, equivalent to seven years of continuous human gameplay. The data is said to have been collected ethically and used only for research purposes.
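The training data described above pairs each recorded frame with the controller action taken at that moment. A minimal sketch of how such image-action pairs might be interleaved into a single training sequence follows; the names, token structure, and interleaving scheme here are illustrative assumptions, not Microsoft's actual pipeline:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class GameplayStep:
    """One recorded step: an encoded frame plus the controller action taken."""
    frame_tokens: List[int]    # discrete tokens from an image encoder (assumed)
    action_tokens: List[int]   # discrete tokens for the controller state (assumed)

def interleave(steps: List[GameplayStep]) -> List[int]:
    """Flatten gameplay into one sequence: frame, action, frame, action, ...
    World models of this kind are described as training on paired frame/action
    data; the exact tokenization here is a guess for illustration."""
    sequence: List[int] = []
    for step in steps:
        sequence.extend(step.frame_tokens)
        sequence.extend(step.action_tokens)
    return sequence

# Toy example: two steps with tiny token lists.
steps = [
    GameplayStep(frame_tokens=[10, 11], action_tokens=[1]),
    GameplayStep(frame_tokens=[12, 13], action_tokens=[2]),
]
print(interleave(steps))  # [10, 11, 1, 12, 13, 2]
```

A sequence model trained on such data can then predict the next frame tokens conditioned on all preceding frames and actions.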

The researchers said that scaling up model training was a major challenge. Muse was initially trained on a cluster of Nvidia V100 GPUs and later scaled to multiple Nvidia H100 GPUs.

In terms of functionality, the Muse AI model accepts text prompts as well as visual inputs. Once a game environment is generated, it can be further developed using controller actions: the AI responds to the user's movements by rendering new frames that remain aligned with the initial prompt and consistent with the rest of the gameplay.
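The interaction loop described above can be sketched as follows. Here `predict_next_frame` is a hypothetical placeholder standing in for the model, not Muse's real API; the point is only that each generated frame is fed back into the context so later predictions stay consistent with earlier gameplay:

```python
def predict_next_frame(context, action):
    """Hypothetical stand-in for the world model: given the frames and actions
    so far plus a new controller action, return the next generated frame.
    Faked deterministically here for illustration."""
    return f"frame_after_{action}"

def interactive_session(initial_frames, actions):
    """Feed the user's controller actions one at a time; each predicted frame
    is appended to the context, which is how an autoregressive model keeps the
    rollout consistent with everything generated before it."""
    context = list(initial_frames)
    for action in actions:
        frame = predict_next_frame(context, action)
        context.append(frame)
    return context

session = interactive_session(["prompt_frame"], ["jump", "dash"])
print(session)  # ['prompt_frame', 'frame_after_jump', 'frame_after_dash']
```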

Because Muse is a unique kind of AI model, typical benchmark tests cannot properly evaluate its capabilities. The researchers said they have internally tested the model on metrics such as consistency, diversity, and persistence. Since it is a research-focused model, its outputs are limited to a resolution of 300x180 pixels.
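As a rough illustration of what a "consistency" check could look like, one might measure how much successive frames change during a rollout. This is a toy metric under the assumption that frames can be reduced to numbers; the actual evaluation metrics are defined in the Nature paper:

```python
def consistency_score(frames):
    """Toy metric: average absolute change between consecutive frame values.
    A lower score means a smoother, more self-consistent rollout. This is an
    illustrative assumption only, not the WHAM paper's actual metric."""
    diffs = [abs(b - a) for a, b in zip(frames, frames[1:])]
    return sum(diffs) / len(diffs)

smooth = consistency_score([0.0, 0.1, 0.2, 0.25])   # small frame-to-frame changes
jumpy = consistency_score([0.0, 0.9, 0.1, 0.95])    # large frame-to-frame changes
print(smooth < jumpy)  # True: the smoother rollout scores lower
```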
