This project explored how an autonomous agent can learn to drive a race car using only visual input. The goal was to train an AI to complete procedurally generated racetracks in the CarRacing-v3 environment, where the agent must make real-time driving decisions based on raw images. This problem matters because it reflects real challenges in autonomous driving, such as perception, decision-making, and long-term planning. I worked on this project as part of a team for a reinforcement learning course, with my role focused on implementing and analyzing one of the learning agents.
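To make the setting concrete, here is a minimal sketch of the CarRacing-v3 interaction loop using the Gymnasium API: the agent receives 96x96x3 RGB frames and issues a [steering, gas, brake] control each step. The random action is only a placeholder for the learned policy, not the agent we trained.

```python
import gymnasium as gym

# Minimal CarRacing-v3 loop (requires gymnasium[box2d]); observations are 96x96x3 RGB frames.
env = gym.make("CarRacing-v3", render_mode=None)
obs, info = env.reset(seed=0)

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # placeholder policy: random [steering, gas, brake]
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

env.close()
print(f"Episode return: {total_reward:.1f}")
```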
I was responsible for developing and training the Deep Q-Network (DQN) agent, analyzing its performance, and contributing to the final report and presentation. The project lasted about three months, from initial setup and experimentation to evaluation and write-up.
This project was challenging due to the complexity of learning from raw visual data and the instability of reinforcement learning training. Small changes in hyperparameters could cause agents to fail entirely or behave unpredictably. Balancing training time, performance, and fair comparison across different algorithms also required careful experimentation and iteration.
I implemented the DQN agent, helped standardize the training setup across all agents, and analyzed learning curves to compare performance. Rather than focusing only on final results, I spent time understanding why certain agents struggled or succeeded, especially how design choices like action space and reward structure affected learning. This project helped me build a stronger intuition for experimentation, debugging learning systems, and clearly communicating technical results to others.
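One of those design choices was how to give DQN, which requires a discrete action space, access to CarRacing's continuous controls. Below is a hedged sketch of one common approach, a small action-discretization wrapper; the specific five actions listed are illustrative assumptions, not necessarily the exact set used in the project.

```python
import gymnasium as gym
import numpy as np

class DiscreteActions(gym.ActionWrapper):
    """Map a small set of discrete indices to [steering, gas, brake] vectors."""

    def __init__(self, env, actions):
        super().__init__(env)
        self._actions = [np.asarray(a, dtype=np.float32) for a in actions]
        self.action_space = gym.spaces.Discrete(len(self._actions))

    def action(self, index):
        # Translate the DQN's discrete choice into the env's continuous control.
        return self._actions[index]

# Illustrative action set: coast, steer left, steer right, accelerate, brake.
ACTIONS = [
    [0.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.8],
]

env = DiscreteActions(gym.make("CarRacing-v3"), ACTIONS)
```

Choices like how coarse this action set is, and how the sparse track-tile reward is shaped, had a visible effect on how quickly and reliably the DQN agent learned.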
Our results showed clear tradeoffs between learning stability, training cost, and final performance across the different reinforcement learning approaches. The DQN agent was the fastest to train and the easiest to stabilize, making it a strong baseline, but it struggled with precise control and consistent lap completion. PPO achieved more stable learning behavior and smoother driving, though it required longer training to reach strong performance. SAC ultimately showed the highest potential for continuous control, but its success depended heavily on careful hyperparameter tuning and came with long training times and unstable early learning. Overall, the findings highlight that stronger performance often comes at the cost of increased complexity, making algorithm choice highly dependent on available resources and reliability requirements.

