An introduction to Q-Learning: reinforcement learning

#Machine Learning

by ADL


This article is the second part of my “Deep Reinforcement Learning” series. The complete series will be available both on Medium and in videos on my YouTube channel.

In the first part of the series we learnt the basics of reinforcement learning.

Q-learning is a value-based learning algorithm in reinforcement learning. In this article, we learn about Q-Learning and its details:

  • What is Q-Learning?
  • Mathematics behind Q-Learning
  • Implementation using Python

Q-Learning — a simplistic overview

Let’s say that a robot has to cross a maze and reach the end point. There are mines, and the robot can only move one tile at a time. If the robot steps onto a mine, the robot is dead. The robot has to reach the end point in the shortest time possible.

The scoring/reward system is as below:

  1. The robot loses 1 point at each step. This is done so that the robot takes the shortest path and reaches the goal as fast as possible.
  2. If the robot steps on a mine, the point loss is 100 and the game ends.
  3. If the robot gets power ⚡️, it gains 1 point.
  4. If the robot reaches the end goal, the robot gets 100 points.
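
Before moving on, here is a minimal way this reward scheme could be written down in Python (the tile names are my own labels, chosen only to mirror the list above):

```python
# Reward for landing on each kind of tile; the -1 step cost pushes the robot
# towards short paths, while the mine and the goal end the episode.
REWARDS = {
    "empty": -1,    # every ordinary step costs 1 point
    "mine": -100,   # stepping on a mine: lose 100 points, game over
    "power": 1,     # picking up power: gain 1 point
    "end": 100,     # reaching the goal: gain 100 points, game over
}
```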

Now, the obvious question is: How do we train a robot to reach the end goal with the shortest path without stepping on a mine?


So, how do we solve this?

Introducing the Q-Table

A Q-Table is just a fancy name for a simple lookup table where we calculate the maximum expected future reward for each action at each state. Basically, this table will guide us to the best action at each state.


There are four possible actions at each non-edge tile. When the robot is at a state, it can move up, down, right, or left.

So, let’s model this environment in our Q-Table.

In the Q-Table, the columns are the actions and the rows are the states.


Each Q-table score will be the maximum expected future reward that the robot will get if it takes that action at that state. This is an iterative process, as we need to improve the Q-Table at each iteration.

But the questions are:

  • How do we calculate the values of the Q-table?
  • Are the values available or predefined?

To learn each value of the Q-table, we use the Q-Learning algorithm.

Mathematics: the Q-Learning algorithm

Q-function

The Q-function uses the Bellman equation and takes two inputs: state (s) and action (a).
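
Written out, the update that this function performs is the standard Bellman-based Q-learning rule, where $\alpha$ is the learning rate and $\gamma$ is the discount factor (the usual notation, not symbols introduced elsewhere in this article):

$$
Q^{\text{new}}(s_t, a_t) = Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
$$

In words: the new estimate for Q(s, a) is the old estimate nudged, by a step of size $\alpha$, towards the immediate reward plus the discounted value of the best action available from the next state.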


Using the above function, we get the values of Q for the cells in the table.

When we start, all the values in the Q-table are zeros.

There is an iterative process of updating the values. As we start to explore the environment, the Q-function gives us better and better approximations by continuously updating the Q-values in the table.

Now, let’s understand how the updating takes place.

Introducing the Q-learning algorithm process

(Figure: the Q-learning algorithm process, shown as a sequence of colored boxes.)

Each of the colored boxes is one step. Let’s understand each of these steps in detail.

Step 1: initialize the Q-Table

We will first build a Q-table. There are n columns, where n = number of actions. There are m rows, where m = number of states. We will initialise the values at 0.


In our robot example, we have four actions (a=4) and five states (s=5). So we will build a table with four columns and five rows.
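
A minimal sketch of this step in Python, assuming we index states and actions with integers and use NumPy for the table (both my assumptions, not something fixed by the article):

```python
import numpy as np

n_actions = 4   # columns: up, down, left, right
n_states = 5    # rows: one per state in the maze

# Step 1: the Q-table starts out as an all-zero (states x actions) array.
Q = np.zeros((n_states, n_actions))
print(Q)   # 5 rows x 4 columns of zeros
```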

Steps 2 and 3: choose and perform an action

This combination of steps is repeated for an undefined amount of time: it runs until we stop the training, or until the training loop terminates as defined in the code.

We will choose an action (a) in the state (s) based on the Q-Table. But, as mentioned earlier, when the episode initially starts, every Q-value is 0.

So now the concept of exploration and exploitation trade-off comes into play. This article has more details.

We’ll use something called the epsilon greedy strategy.

In the beginning, the epsilon rate will be higher. The robot will explore the environment and randomly choose actions. The logic behind this is that the robot does not know anything about the environment yet.

As the robot explores the environment, the epsilon rate decreases and the robot starts to exploit the environment.

During the process of exploration, the robot progressively becomes more confident in estimating the Q-values.

For the robot example, there are four actions to choose from: up, down, left, and right. We are starting the training now — our robot knows nothing about the environment. So the robot chooses a random action, say right.
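
A hedged sketch of epsilon-greedy action selection (the function name and the decay rule are illustrative, not the article's exact code):

```python
import numpy as np

def choose_action(Q, state, epsilon, n_actions=4):
    """With probability epsilon pick a random action, otherwise the best known one."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)   # explore: random move
    return int(np.argmax(Q[state]))           # exploit: action with the highest Q-value so far

# Epsilon typically starts near 1.0 and is decayed after each episode, for example:
# epsilon = max(0.01, epsilon * 0.995)
```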


We can now update the Q-values for being at the start and moving right using the Bellman equation.

Steps 4 and 5: evaluate

Now we have taken an action and observed an outcome and a reward. We need to update the function Q(s,a).
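
In code, this update is a single line. The rough sketch below reuses the NumPy Q-table from Step 1; state, action, reward, and next_state are the quantities just observed, and the values of alpha and gamma are illustrative:

```python
alpha = 0.1   # learning rate: how strongly new information overrides the old estimate
gamma = 0.9   # discount factor: how much future rewards count compared to immediate ones

# Temporal-difference update for the (state, action) pair we just tried.
Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
```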


In the case of the robot game, to reiterate, the scoring/reward structure is:

  • power = +1
  • mine = -100
  • end = +100

We will repeat this again and again until the learning is stopped. In this way the Q-Table will be updated.

Python implementation of Q-Learning

The concept and code implementation are explained in my video.

Subscribe to my YouTube channel for more AI videos: ADL.
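
Since the full walkthrough is in the video, what follows is only a minimal, self-contained sketch of tabular Q-learning on a toy grid; the grid layout, hyperparameters, and helper names are my own illustrative choices, not the code from the video:

```python
import numpy as np

# Toy 2 x 3 grid standing in for the maze (illustrative layout, not the article's exact maze).
# State indices:   0 1 2
#                  3 4 5
# Start = 0, mine = 4 (terminal, -100), goal = 5 (terminal, +100); every other step costs 1 point.
N_ROWS, N_COLS = 2, 3
START, MINE, GOAL = 0, 4, 5
ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}   # up, down, left, right

def step(state, action):
    """Apply an action; bumping into a wall leaves the robot where it is."""
    row, col = divmod(state, N_COLS)
    d_row, d_col = ACTIONS[action]
    row = min(max(row + d_row, 0), N_ROWS - 1)
    col = min(max(col + d_col, 0), N_COLS - 1)
    next_state = row * N_COLS + col
    if next_state == MINE:
        return next_state, -100, True
    if next_state == GOAL:
        return next_state, 100, True
    return next_state, -1, False

n_states, n_actions = N_ROWS * N_COLS, len(ACTIONS)
Q = np.zeros((n_states, n_actions))          # Step 1: initialize the Q-table with zeros
alpha, gamma = 0.1, 0.9                      # learning rate and discount factor (illustrative)
epsilon, eps_min, eps_decay = 1.0, 0.01, 0.995

for episode in range(2000):
    state, done = START, False
    while not done:
        # Steps 2-3: epsilon-greedy choice of an action, then perform it.
        if np.random.rand() < epsilon:
            action = np.random.randint(n_actions)
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)

        # Steps 4-5: evaluate the outcome and update the Q-table with the Bellman update.
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

    epsilon = max(eps_min, epsilon * eps_decay)   # explore less and less over time

print(np.argmax(Q, axis=1))   # greedy action in each state after training
```

With enough episodes, the greedy policy (the argmax of each row of the Q-table) should route the robot along the top row and down into the goal while avoiding the mine.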

At last… let us recap

  • Q-Learning is a value-based reinforcement learning algorithm which is used to find the optimal action-selection policy using a Q function.
  • Our goal is to maximize the value function Q.
  • The Q table helps us to find the best action for each state.
  • It helps to maximize the expected reward by selecting the best of all possible actions.
  • Q(state, action) returns the expected future reward of that action at that state.
  • This function can be estimated using Q-Learning, which iteratively updates Q(s,a) using the Bellman equation.
  • Initially we explore the environment and update the Q-Table. When the Q-Table is ready, the agent will start to exploit the environment and start taking better actions.

Next time we’ll work on a deep Q-learning example.

Until then, enjoy AI!

Important: As stated earlier, this article is the second part of my “Deep Reinforcement Learning” series. The complete series will be available both in articles on Medium and in videos on my YouTube channel.

If you liked my article, please clap to help me stay motivated to write articles. Please follow me on Medium and other social media:


If you have any questions, please let me know in a comment below or on Twitter.

Subscribe to my YouTube channel for more tech videos.



Learn to code for free. freeCodeCamp's open source curriculum has helped more than 40,000 people get jobs as developers. Get started


FAQs

What is Q-learning reinforcement learning?

Q-learning is a machine learning approach that enables a model to iteratively learn and improve over time by taking the correct action. Q-learning is a type of reinforcement learning. With reinforcement learning, a machine learning model is trained to mimic the way animals or children learn.

How do I start Q-learning?

Here's how the Q-learning algorithm would work in this example:
  1. Initialize the Q-table: Q = [ [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]], ...
  2. Observe the state: ...
  3. Choose an action: ...
  4. Execute the action: ...
  5. Update the Q-table: ...
  6. Repeat steps 2-5 until the agent reaches the goal state: ...
  7. Repeat steps 1-6 for multiple episodes:

What is the math behind Q-learning?

The mathematical equation behind q-learning is the Bellman Equation. Q-learning, in its continuous efforts to find the optimal policy, leverages a systematic approach to quantify the quality, or 'Q-value,' of taking a specific action in a particular state.

What is the difference between deep Q-learning and reinforcement learning?

While regular Q-learning maps each state-action pair to its corresponding value, deep Q-learning uses a neural network to map input states to (action, Q-value) pairs via a three-step process: initializing the Target and Main neural networks, choosing an action, and updating network weights using the Bellman Equation.

Why is Q-learning unstable?

This instability comes from the correlations present in the sequence of observations, the fact that small updates to Q may significantly change the policy of the agent and the data distribution, and the correlations between Q and the target values.

What are the limitations of Q-learning?

Disadvantages of Q-Learning

As the number of states or actions increases, the size of the Q-table grows exponentially. This can make Q-learning impractical for environments with very large or continuous state or action spaces due to the enormous amount of memory and computation required.

Is Q-learning model-free?

Q-learning is a model-free algorithm in the sense that it has no transition model — the model of the environment to learn from — therefore the agent finds the best way to navigate the environment by its predictions.

What is the alternative to Q-learning?

VA-learning learns off-policy and enjoys similar theoretical guarantees as Q-learning. Thanks to the direct learning of advantage function and value function, VA-learning improves the sample efficiency over Q-learning both in tabular implementations and deep RL agents on Atari-57 games.

What is an example of a game with reinforcement learning?

In reinforcement learning, the agent, like the dog, is guided by rewards to maximize its performance. One of the earliest examples of RL in gaming is in Backgammon, where a neural network-based agent was trained to play the game using the TD-gammon algorithm.

What is the difference between R-learning and Q-learning?

Q-learning (Watkins, 1989) is a method for optimizing (cumulated) discounted reward, making far-future rewards less prioritized than near-term rewards. R-learning (Schwartz, 1993) is a method for optimizing average reward, weighing both far-future and near-term reward the same.

Why is Q-learning biased?

The overestimation bias occurs since the target $\max_{a' \in A} Q(s_{t+1}, a')$ is used in the Q-learning update. Because Q is an approximation, it is probable that the approximation is higher than the true value for one or more of the actions. The maximum over these estimators, then, is likely to be skewed towards an overestimate.

Is Q-learning a reinforcement algorithm?

Q-learning is therefore a reinforcement learning algorithm that seeks to find the best action to take given the current state. It is considered off-policy because the Q-learning function learns from actions that are outside the current policy, such as taking random actions, and therefore a policy is not required.

What is better than reinforcement learning?

Both deep learning and reinforcement learning have their advantages and disadvantages. For example, deep learning is good at recognizing patterns in data, whereas reinforcement learning is good at figuring out the best way to achieve a goal.

What is the classic Q-learning algorithm?

The Q-Learning algorithm works like this: Initialize all Q-values, e.g., with zeros. Choose an action a in the current state s based on the current best Q-value. Perform this action a and observe the outcome (new state s').

What is the difference between V and Q in reinforcement learning?

Key Difference between Q-Function and Value Function

The Q function takes both the state and the action as input, while the value function only takes the state as input. This means that the Q function can be used to learn an optimal policy, while the value function can only be used to evaluate different policies.

What is the Q value in deep reinforcement learning?

Deep Q Learning uses the Q-learning idea and takes it one step further. Instead of using a Q-table, we use a Neural Network that takes a state and approximates the Q-values for each action based on that state.

What is the Q* algorithm?

The Q algorithm is part of a system, termed the Maryland Refutation Proof Procedure System (MRPPS), which incorporates both the Q algorithm, which performs the search required to answer a query, and an inferential component, which performs the logical manipulations necessary to deduce a clause from one or two other ...
