Develop Web Game: Iterative Game Development with Automated Testing
The develop web game skill from OpenAI enables AI assistants to build HTML/JavaScript games through small, validated iterations. Rather than implementing complete games blindly, this skill establishes a tight feedback loop: implement a small change, run Playwright-based tests with short input bursts and intentional pauses, inspect screenshots and text state, review console errors, and adjust. This disciplined development process prevents broken builds.
What This Skill Does
This skill provides a complete game development workflow treating each iteration as implement → act → pause → observe → adjust. It defines integration points games must provide (single canvas, window.render_game_to_text for state inspection, window.advanceTime(ms) for deterministic stepping), runs automated Playwright tests executing action sequences, captures screenshots after each burst, extracts game state as JSON, and reviews console errors.
The Playwright test script (web_game_playwright_client.js) drives the testing loop. It loads games, optionally clicks start buttons, executes action payloads defining keyboard/mouse inputs per frame, advances time deterministically using the game's step hook, captures screenshots showing visual state, reads JSON state from render_game_to_text, and buffers console errors for review.
Action payloads define test scenarios as sequences of steps—each step specifies buttons pressed (keyboard keys or mouse buttons), frame count to hold those inputs, and optional mouse coordinates. Example: press right arrow for 8 frames, release for 6 frames, press space for 4 frames. This creates reproducible test scenarios validating specific game behaviors.
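That example sequence might be expressed as a payload like the following. The field names (buttons, frames, mouse) are illustrative assumptions, not the script's actual schema:

```javascript
// Hypothetical action payload; field names are assumptions for illustration.
const actionPayload = [
  { buttons: ["ArrowRight"], frames: 8 },                           // hold right arrow for 8 frames
  { buttons: [], frames: 6 },                                       // release all inputs for 6 frames
  { buttons: ["Space"], frames: 4 },                                // press space for 4 frames
  { buttons: ["MouseLeft"], frames: 2, mouse: { x: 320, y: 240 } }, // click at a screen position
];
console.log(JSON.stringify(actionPayload, null, 2));
```

Because every step pins down exact inputs and durations, replaying the same payload reproduces the same game state, which is what makes regressions detectable.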
Getting Started
The skill requires Playwright availability. Projects with local Playwright dependencies use those; otherwise, the skill checks for npx and installs Playwright globally if needed. It prefers @playwright/mcp (MCP integration) over @playwright/test unless explicitly requested.
Before implementation, initialize or read progress.md. This file tracks the original user prompt (preserved at top as "Original prompt:"), TODOs and suggestions from previous agents (if continuing work), and notes about decisions made, bugs found, and features added. This enables seamless handoffs between agents working on the same game.
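A progress.md might look like the sketch below; everything past the "Original prompt:" line is an illustrative layout, not a required format.

```markdown
Original prompt: Build a 2D platformer with coins, enemies, and a score counter.

## Done
- Player movement and single-platform collision (validated via Playwright run)

## TODOs / suggestions
- Add jumping; variable jump height discussed but not implemented

## Notes
- Collision uses AABB; circle-based collision was tried and reverted
```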
Games must provide three integration points for testing to work. First, a single canvas element for rendering. Second, window.render_game_to_text function returning concise JSON representing current state (player position/velocity, entities, score, mode). Third, window.advanceTime(ms) function stepping the game forward deterministically so tests aren't flaky due to timing variations.
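A minimal sketch of the game-side hooks, runnable outside a browser for illustration (a plain object stands in for window, and the state fields are assumptions):

```javascript
// Sketch of the two scriptable integration points; a plain object stands in
// for the browser's window, and the game state shown is illustrative.
const window = {};
const state = { mode: "playing", score: 0, player: { x: 0, y: 0, vx: 2, vy: 0 } };

// 1. The single <canvas> element would be rendered to here (omitted in this sketch).

// 2. Concise JSON snapshot of current state for text-based inspection.
window.render_game_to_text = () =>
  JSON.stringify({ mode: state.mode, score: state.score, player: state.player });

// 3. Deterministic stepping: advance the simulation by ms of game time,
//    independent of wall-clock timing, so tests aren't flaky.
window.advanceTime = (ms) => {
  const frames = Math.round(ms / (1000 / 60)); // assume a 60 fps fixed timestep
  for (let i = 0; i < frames; i++) {
    state.player.x += state.player.vx;         // integrate velocity per frame
    state.player.y += state.player.vy;
  }
};

window.advanceTime(100);                       // exactly 6 frames at 60 fps
console.log(window.render_game_to_text());     // player.x is now 12
```

Keeping advanceTime as the only way time moves forward is the key design choice: the test harness, not requestAnimationFrame, decides when frames elapse.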
Key Features
Iterative Development Loop: The skill enforces small iterations with validation between changes. Pick a single feature to implement, make the smallest change moving toward that goal, run Playwright tests, inspect screenshots and state, fix issues, repeat until stable. This prevents accumulating bugs from large unvalidated changes.
Automated Action Testing: Define test scenarios as JSON action payloads specifying exact button presses, frame counts, and mouse positions. The Playwright script executes these deterministically, enabling reliable testing of movement, jumping, shooting, menu navigation, and any other interactions.
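The execution model can be sketched as follows. A stub object stands in for Playwright's page, and all names here are assumptions for illustration rather than the real script's API:

```javascript
// Simplified sketch of deterministic payload execution; the `page` stub
// stands in for Playwright's page object.
const FRAME_MS = 1000 / 60; // assume a 60 fps fixed timestep

const page = {
  inputLog: [],
  gameTimeMs: 0,
  keyDown(key) { this.inputLog.push(`down:${key}`); },
  keyUp(key) { this.inputLog.push(`up:${key}`); },
  // In the real loop this would invoke window.advanceTime(ms) inside the page.
  advanceTime(ms) { this.gameTimeMs += ms; },
};

// Execute one payload step: press buttons, hold for N frames, release.
function runStep(page, step) {
  for (const key of step.buttons) page.keyDown(key);
  page.advanceTime(step.frames * FRAME_MS);
  for (const key of step.buttons) page.keyUp(key);
}

const payload = [
  { buttons: ["ArrowRight"], frames: 8 },
  { buttons: ["Space"], frames: 4 },
];
for (const step of payload) runStep(page, step);

// After the burst, a screenshot and render_game_to_text snapshot would be captured.
console.log(page.gameTimeMs); // 200 ms of simulated game time
```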
Visual Validation: After each test run, screenshots capture actual visual output. The skill requires actually opening and inspecting these images, not just generating them. This catches issues where logic works but visuals are missing, wrong, or broken—ensuring what should be visible actually appears.
State Inspection: The render_game_to_text JSON output provides machine-readable state for validation. Tests confirm this text state matches what screenshots show, catching mismatches between internal state and rendered visuals.
Console Error Review: The Playwright script buffers console errors and presents them after test runs. The skill prioritizes fixing the first new error before continuing, preventing error accumulation and ensuring clean console output.
Progress Tracking: The progress.md file maintains continuity across work sessions. When another agent (or the same agent in a later session) picks up the game, they read this file to understand the original prompt, completed features, known issues, and suggestions for next steps.
Usage Examples
When building a platformer game, the skill implements collision detection for a single platform, runs Playwright tests with action sequences that move the player into the platform, inspects screenshots to confirm the player stops at the platform surface rather than passing through, verifies that render_game_to_text reports the correct player position, checks the console for errors, and only then moves on to implementing jumping.
For testing shooting mechanics, action payloads define sequences: aim at enemy (mouse movement), shoot (space bar press), wait for projectile travel, check enemy state. Screenshots confirm projectiles are visible and enemies react appropriately. JSON state shows health values decreasing. The skill validates the full causal chain: shoot → projectile appears → hits enemy → health decreases → enemy disappears when health reaches zero.
When implementing menu systems, tests navigate through start/pause/resume flows with action sequences: click the start button, play briefly, press escape to pause, verify the pause menu is visible in a screenshot, press resume, and confirm gameplay continues. Each step is validated before proceeding to the next.
Best Practices
Make changes one at a time. Adjust a single variable—frame count, input timing, entity position, collision threshold—then retest. When multiple changes happen simultaneously, identifying which caused improvements or regressions becomes difficult.
Actually open and inspect screenshots after every test run. Don't just generate them and assume they're correct. Visually verify that all newly added features appear on screen, not just the start screen. Screenshots are ground truth: if something's missing there, it's missing in the build.
Reset state between test scenarios that validate distinct features. Cross-test contamination makes debugging harder. When testing jumping, don't carry over state from shooting tests. A fresh start for each feature validation keeps tests reliable.
Exhaustively test important interactions by thinking through full multi-step sequences. Shooting an enemy should reduce health AND update score AND remove the enemy when health hits zero. Test the entire chain, not just the first step. Broken intermediate states create broken games.
Update progress.md after meaningful chunks of work. Record features completed, bugs discovered, decisions made, approaches that failed. This creates institutional memory preventing wasted effort when work resumes later.
When to Use This Skill
Use this skill when building or iterating on web-based games where reliable development and testing loops matter. It's ideal for projects requiring validation that controls work correctly, visual output matches intentions, and game state stays consistent.
The skill is particularly valuable for games with complex interactions—platformers with multiple mechanics, shooters with enemies and projectiles, puzzle games with state machines. The automated testing catches interaction bugs that manual playtesting might miss.
It's appropriate for collaborative or resumable game development. The progress tracking and structured workflow enable multiple agents to work on the same game sequentially without losing context or duplicating effort.
When NOT to Use This Skill
Don't use this skill for simple static web pages or non-interactive visual content. The testing infrastructure assumes game loops, input handling, and state evolution. Static content doesn't benefit from Playwright action sequences.
Avoid using it when game logic is extremely simple and manual testing suffices. Not every project justifies automated test infrastructure. Quick prototypes or teaching examples might not need the full workflow.
It's not appropriate for non-web games. This skill targets HTML/JavaScript games running in browsers. Native games, console games, or server-based games require different tooling and workflows.
Don't expect the skill to make game design decisions. It provides development and testing infrastructure but doesn't determine whether gameplay is fun, difficulty is balanced, or mechanics are interesting. Those remain creative decisions.
Related Skills
This skill complements playwright for browser automation, figma-implement-design for implementing designed game UI, and webapp-testing for testing web applications.
Source
This skill is maintained by OpenAI. View on GitHub