Tanaybh/lunar-lander-ppo

Reinforcement Learning

stable-baselines3

deep-reinforcement-learning


Tanaybh committed on Sep 21, 2025

Commit 7b8e35a · verified · 1 Parent(s): 8f11db1

Add model card

Files changed (1)

README.md +68 -0

README.md ADDED

	@@ -0,0 +1,68 @@

+---
+tags:
+- deep-reinforcement-learning
+- reinforcement-learning
+- stable-baselines3
+- LunarLander-v2
+- PPO
+library_name: stable-baselines3
+model_name: ppo
+---
+# 🚀 PPO Agent for LunarLander-v2
+This is a trained PPO agent that learned to land a spacecraft on the moon!
+## Model Description
+- **Algorithm**: Proximal Policy Optimization (PPO)
+- **Environment**: LunarLander-v2
+- **Framework**: Stable-Baselines3
+- **Training Steps**: 100,000 to 500,000 timesteps
+## Performance
+- **Success Rate**: 90%+ successful landings
+- **Average Reward**: 200+ per episode (200 is the score at which LunarLander is considered solved)
+- **Best Performance**: 265+ reward
+## Usage
+```python
+from stable_baselines3 import PPO
+import gymnasium as gym
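+# Load the trained model (SB3 appends ".zip" to the path if it is missing)
+model = PPO.load("lunar_lander_ppo_model")
+# Create environment (needs the Box2D extra: pip install "gymnasium[box2d]")
+# LunarLander-v2 is registered only in Gymnasium releases before 1.0;
+# newer releases register LunarLander-v3 instead
+env = gym.make('LunarLander-v2', render_mode='human')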
+# Test the agent
+obs, _ = env.reset()
+for _ in range(1000):
+    action, _ = model.predict(obs, deterministic=True)
+    obs, reward, terminated, truncated, info = env.step(action)
+    if terminated or truncated:
+        obs, _ = env.reset()
+env.close()
+```
+## Training Details
+The agent was trained using PPO with the following hyperparameters:
+- Learning rate: 0.0003
+- Batch size: 64
+- Number of environments: 4
+- Gamma: 0.999
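To make these hyperparameters more concrete, the sketch below works out what they imply for a single PPO update. The learning rate, batch size, environment count, and gamma come from the list above. The rollout length (`N_STEPS = 2048`) and number of epochs (`N_EPOCHS = 10`) are not stated in this card; they are assumed here to be the Stable-Baselines3 PPO defaults. The helper names are hypothetical and exist only for illustration.

```python
import math

# Values stated in the model card.
LEARNING_RATE = 3e-4
BATCH_SIZE = 64
N_ENVS = 4
GAMMA = 0.999

# ASSUMPTION: not stated in the card; these are the Stable-Baselines3 PPO defaults.
N_STEPS = 2048   # transitions collected per environment before each update
N_EPOCHS = 10    # optimisation passes over each collected rollout


def rollout_size(n_steps: int = N_STEPS, n_envs: int = N_ENVS) -> int:
    """Transitions gathered across all environments before one PPO update."""
    return n_steps * n_envs


def minibatches_per_epoch(batch_size: int = BATCH_SIZE) -> int:
    """Number of minibatches one pass over the rollout buffer is split into."""
    return rollout_size() // batch_size


def gradient_steps_per_update(n_epochs: int = N_EPOCHS) -> int:
    """Total minibatch gradient steps taken for each collected rollout."""
    return minibatches_per_epoch() * n_epochs


def effective_horizon(gamma: float = GAMMA) -> float:
    """Rough number of future steps that matter under discount factor gamma."""
    return 1.0 / (1.0 - gamma)


def updates_for(total_timesteps: int) -> int:
    """PPO updates needed to reach a timestep budget (rollouts are collected whole)."""
    return math.ceil(total_timesteps / rollout_size())


if __name__ == "__main__":
    print("rollout size:", rollout_size())
    print("minibatches per epoch:", minibatches_per_epoch())
    print("gradient steps per update:", gradient_steps_per_update())
    print("effective horizon:", round(effective_horizon()))
    print("updates for 100k / 500k steps:", updates_for(100_000), updates_for(500_000))
```

Under these assumptions, the listed learning rate and batch size match the library defaults, while gamma is higher than the default of 0.99. A gamma of 0.999 gives an effective horizon of roughly 1,000 steps, which lets reward from the final touchdown influence decisions made early in a long episode.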
+## Results
+The agent successfully learned to:
+- Control spacecraft thrust
+- Navigate to landing pad
+- Execute gentle landings
+- Conserve fuel efficiently
+Watch it land on the moon! 🌙
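The performance figures in this card (success rate, average reward, best reward) are summary statistics over evaluation episodes. The sketch below shows one way such figures could be recomputed from a list of per-episode returns. The card does not define "successful landing", so the helper assumes an episode counts as a success when its return reaches the conventional 200-point threshold. The function name `summarize_returns` and the sample returns are hypothetical and are not results from this model.

```python
from statistics import mean
from typing import Dict, Sequence

# ASSUMPTION: an episode is a "success" when its return is at least 200,
# the score at which LunarLander is conventionally considered solved.
SOLVED_THRESHOLD = 200.0


def summarize_returns(
    episode_returns: Sequence[float],
    threshold: float = SOLVED_THRESHOLD,
) -> Dict[str, float]:
    """Summarise per-episode returns into the metrics reported in the card."""
    if not episode_returns:
        raise ValueError("episode_returns must contain at least one episode")
    successes = sum(1 for r in episode_returns if r >= threshold)
    return {
        "episodes": len(episode_returns),
        "mean_reward": mean(episode_returns),
        "best_reward": max(episode_returns),
        "success_rate": successes / len(episode_returns),
    }


if __name__ == "__main__":
    # Hypothetical returns for illustration only, not measurements of this agent.
    sample_returns = [250.0, 180.0, 265.0, 210.0]
    print(summarize_returns(sample_returns))
```

With Stable-Baselines3, per-episode returns can be collected by calling `evaluate_policy` from `stable_baselines3.common.evaluation` with `return_episode_rewards=True`, which returns the list of episode rewards alongside the episode lengths.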