JUNE 10, 2024

Akhil Nishad, Adithya S. Kumar

Introduction

Welcome to our AI Safety Fundamentals Project! As beginners in the field of artificial intelligence and reinforcement learning, this project represents our journey of learning and applying technical skills. Our goal is to understand the intricacies of reinforcement learning, reward functions, and their implications for AI safety.

About us

We are both new to the world of AI and machine learning. This project is a hands-on approach to solidifying our understanding of key concepts in reinforcement learning and ensuring that we can apply them practically. By diving into coding algorithms, experimenting with different environments, and addressing the challenges that arise, we aim to build a solid foundation in this exciting field.

Purpose

We created this project not only to enhance our skills but also to help others who might be in a similar situation. If you are a beginner like us, navigating through the complexities of AI, we hope this project serves as a valuable resource. You'll find detailed explanations of concepts, code implementations, and insights from our experiments. Our experiences, challenges, and learnings are documented here with the intention of guiding and encouraging fellow learners.

Importance of RL and Reward Functions in AI Safety

Reinforcement learning (RL) is a powerful technique that enables agents to learn optimal behaviours through interactions with their environment. The design of reward functions is critical in guiding these agents towards desired outcomes. However, misaligned rewards can lead to unintended and potentially harmful behaviours. For instance, an agent might exploit loopholes in the reward structure, leading to reward hacking or undesirable actions that maximise short-term rewards but compromise long-term goals.

Careful reward design is essential to ensure that agents not only achieve high performance but also align with ethical standards and safety protocols. By thoroughly understanding and experimenting with different reward structures, we aim to contribute to the development of safe and efficient AI systems. This project explores various reward functions, their implications, and the challenges of aligning rewards with desired objectives to mitigate potential risks.
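To make the idea of a misaligned reward concrete, here is a minimal, hypothetical sketch of a toy scenario: the intended goal is to reach a target as quickly as possible, but the reward grants a small bonus for every step taken, so an agent maximising total reward earns more by stalling than by finishing. The reward values and episode limit are illustrative assumptions only, not taken from any specific environment or library.

```python
# Toy illustration of a misaligned reward (illustrative assumptions only).
# Intended goal: reach the target as quickly as possible.
# Misaligned reward: +0.1 per step taken, +1.0 for reaching the target.

EPISODE_LIMIT = 100
STEP_BONUS = 0.1     # unintended loophole: rewards every step, regardless of progress
GOAL_REWARD = 1.0

def episode_return(steps_taken, reached_goal):
    """Total reward collected in one episode under the misaligned scheme."""
    return steps_taken * STEP_BONUS + (GOAL_REWARD if reached_goal else 0.0)

# Finishing quickly (the intended behaviour) earns less reward...
print(episode_return(steps_taken=5, reached_goal=True))                # 1.5
# ...than wandering until the episode limit without ever reaching the goal.
print(episode_return(steps_taken=EPISODE_LIMIT, reached_goal=False))   # 10.0
```

A safer design in this toy case would charge a small cost per step (a negative step reward), so that the highest-return policy coincides with the intended behaviour of reaching the goal quickly.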

Weekly Progress

Week - 1: Learning the Fundamentals

The first week was spent building the basic knowledge and skills needed to work on the project. This involved getting familiar with the basics of Python programming, just enough for this project, as well as essential libraries like Pandas and NumPy. The main focus, however, was the theoretical side of reinforcement learning: Markov Decision Processes, Bellman Equations, Q-Learning, and the exploration-exploitation trade-off. The work done this week provided a solid foundation, equipping us with the programming abilities and conceptual understanding needed to tackle the project's subsequent stages.
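For fellow beginners, here is a minimal sketch of the two ideas at the heart of what we studied this week: the epsilon-greedy rule for balancing exploration and exploitation, and the tabular Q-Learning update derived from the Bellman optimality equation. The state/action counts and hyperparameter values are assumptions chosen purely for illustration, not tuned settings from our experiments.

```python
import numpy as np

# Illustrative hyperparameters (assumed values, not tuned)
alpha = 0.1      # learning rate
gamma = 0.99     # discount factor
epsilon = 0.1    # exploration rate for epsilon-greedy

n_states, n_actions = 16, 4           # assumed sizes for a small toy environment
Q = np.zeros((n_states, n_actions))   # Q-table initialised to zero

def choose_action(state):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)   # explore: pick a random action
    return int(np.argmax(Q[state]))           # exploit: pick the best-known action

def q_update(state, action, reward, next_state):
    """One tabular Q-Learning step based on the Bellman optimality equation."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
```

Looping these two functions over episodes of environment interaction is, in essence, the whole tabular Q-Learning algorithm; the later weeks of the project build on exactly this structure.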

Concepts Learned

  1. Markov Decision Processes (MDPs):

MDPs are the framework used to formalise most reinforcement learning problems. They provide a mathematical structure for modelling decision-making scenarios where outcomes are influenced both by random factors and by the actions of the decision maker. An MDP is defined by: