Deep Q-learning in Python

In this tutorial I use deep Q-learning (DQN) to train an agent on the CartPole problem. Deep Q-learning is a reinforcement learning method in which a neural network takes the current state as input and learns to predict a Q-value (the quality of an action) for each possible action.

Reinforcement learning (RL) is learning by trial and error: an agent receives rewards when it performs good actions, and its goal is to maximize the total reward. Reinforcement learning is applied to a Markov decision process (MDP).

Markov Decision Process (MDP)

I create a simple neural network model with the Keras functional API. It has 3 layers, and the hidden layer has 32 nodes. The model is updated after each episode on a batch of timesteps selected at random from the experience stored in memory. Rewards are updated by applying a discount factor (gamma). The discount factor determines how important future rewards are: a discount factor of 0 means that only the immediate reward counts, while a discount factor close to 1 means that long-term rewards count almost as much as immediate ones.
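To make the effect of gamma concrete, here is a minimal sketch that is separate from the agent code. It sums a short, made-up reward sequence with different discount factors; the function name `discounted_return` is my own and only used for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum rewards, weighting the reward at step k by gamma**k."""
    total = 0.0
    for k, reward in enumerate(rewards):
        total += (gamma ** k) * reward
    return total

# Four timesteps with a reward of 1 each (made-up example data)
rewards = [1.0, 1.0, 1.0, 1.0]

print(discounted_return(rewards, 0.0))  # 1.0, only the first reward counts
print(discounted_return(rewards, 0.5))  # 1.875, later rewards count less
print(discounted_return(rewards, 1.0))  # 4.0, all rewards count equally
```

The agent in this tutorial uses gamma = 0.95 by default, so future rewards are weighted almost as much as the immediate one.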

A neural network is loosely modeled on an animal brain: it has connected nodes arranged in three or more layers. A neural network includes weights, a score function and a loss function. It learns in a feedback loop, adjusting its weights based on the results from the score function and the loss function. A simple neural network has three layers: an input layer, a hidden layer and an output layer. Using a network with more than 3 layers is often referred to as deep learning.
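As a rough illustration of the score function and the loss function, the sketch below runs one made-up CartPole observation through a 4-32-2 network written directly in NumPy. The weights are random and untrained, and the names `forward` and `mse_loss` are hypothetical; Keras does the equivalent work internally for the real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, untrained weights for a 4 -> 32 -> 2 network
w_hidden = rng.normal(scale=0.1, size=(4, 32))
b_hidden = np.zeros(32)
w_output = rng.normal(scale=0.1, size=(32, 2))
b_output = np.zeros(2)

def forward(state):
    """Score function: map a batch of states to one value per action."""
    hidden = np.maximum(0.0, state @ w_hidden + b_hidden)  # ReLU hidden layer
    return hidden @ w_output + b_output                     # linear output layer

def mse_loss(predicted, target):
    """Loss function: mean squared error between prediction and target."""
    return float(np.mean((predicted - target) ** 2))

# One made-up observation: position, velocity, angle, angular velocity
state = np.array([[0.0, 0.1, -0.02, 0.3]])
scores = forward(state)

print(scores.shape)  # (1, 2), one score per action
print(mse_loss(scores, np.zeros_like(scores)))
```

Training consists of nudging the weights so that the loss gets smaller, which is what the optimizer does on every call to `fit`.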

Problem and Libraries

A pole is attached to a cart that moves along a frictionless track. The pole starts upright, and the goal is to prevent it from falling over. The agent can push the cart to the left or to the right, and a reward of 1 is provided for every timestep that the pole remains upright. An episode ends when the pole is more than 12 degrees from vertical, when the cart moves more than 2.4 units from the center, or after 200 timesteps. The CartPole problem is part of the gym library. I am also using the following libraries: os, math, random, numpy and keras.

Code

The CartPole-v0 (version 0) problem is considered solved if the average reward is 195 or more over 100 consecutive trials. The agent can be both trained and evaluated, and the model is saved to disk after each training session. A neural network needs a lot of training: I have trained the model over 3 sessions with 1000 episodes in each session. The model is loaded from disk before a training session, so it does not have to start from zero in each session. The full code for the DQN agent is shown below.

# Import libraries
import os
import math
import random
import gym
import numpy as np
import keras

# This class is used to store experience in memory
class ExperienceReplay():

    # Create a new instance
    def __init__(self, capacity:int=10000):
        self.capacity = capacity
        self.memory = []
        self.position = 0

    # Push a transition to memory
    def push(self, transition:tuple):
        
        # Add a new item if we have capacity
        if (len(self.memory) < self.capacity):
            self.memory.append(None)

        # Update memory
        self.memory[self.position] = transition
        self.position = (self.position + 1) % self.capacity

    # Get a random batch
    def sample(self, batch_size:int=20):
        return random.sample(self.memory, batch_size)

    # Get the length of the memory array
    def __len__(self):
        return len(self.memory)

# Get a DQN model
def get_model(env, alpha:float=0.001) -> keras.models.Model:

    # Load a model if we have saved one
    if os.path.isfile('models\\dqn_cartpole.h5'):
        return keras.models.load_model('models\\dqn_cartpole.h5')

    # Create layers (Functional API)
    inputs = keras.layers.Input(shape=(env.observation_space.shape[0],), dtype='float32', name='input_layer') # Input layer (None, 4)
    outputs = keras.layers.Dense(32, activation='relu', name='hidden_layer')(inputs) # Hidden layer
    outputs = keras.layers.Dense(env.action_space.n, activation='linear', name='output_layer')(outputs) # Output layer (2 actions)

    # Create a model from input layer and output layers
    model = keras.models.Model(inputs=inputs, outputs=outputs, name='dqn_model')

    # Print model (summary() prints by itself and returns None)
    print()
    model.summary()
    print()

    # Compile the model
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=alpha), loss='mse')

    # Return a model
    return model

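# Get an action (0:Left, 1:Right)
def get_action(model, state:np.ndarray, epsilon:float=1.0) -> int:
    # Explore with probability epsilon, otherwise take the action with the highest Q-value
    if random.random() <= epsilon:
        return random.randint(0, 1)
    return int(np.argmax(model.predict(state)))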

# Update the model
def update(memory, model, batch_size:int=32, gamma:float=0.95):

    # Return if we don't have enough transitions to fill a batch
    if (len(memory) < batch_size):
        return

    # Get batches at random
    batches = memory.sample(batch_size)

    # Loop batches
    for current_state, next_state, action, reward, done in batches:
        
        # Update the reward
        if not done:
            reward = (reward + gamma * np.amax(model.predict(next_state)[0]))

        # Get Q values for the current state
        q_values = model.predict(current_state)

        # Update Q values with reward
        q_values[0][action] = reward

        # Train the model
        model.fit(current_state, q_values, batch_size=1, epochs=1, verbose=0)

# Exploration rate
def get_epsilon(t:int, min_epsilon:float, divisor:int=25) -> float:
    return max(min_epsilon, min(1, 1.0 - math.log10((t + 1) / divisor)))

# Train a model
def train():

    # Variables
    episodes = 1000
    timesteps = 200
    total_score = 0

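    # Create an environment (this code uses the gym API before version 0.26:
    # reset() returns only the observation and step() returns 4 values)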
    env = gym.make('CartPole-v0')

    # Create memory
    memory = ExperienceReplay(20000)

    # Get a model
    model = get_model(env, 0.001)

    # Loop episodes
    for episode in range(episodes):

        # Start episode and get initial observation (reshaped)
        current_state = np.reshape(env.reset(), [1, env.observation_space.shape[0]])

        # Get an updated exploration rate
        epsilon = get_epsilon(episode, 0.01, 100)
        
        # Reset score
        score = 0

        # Loop timesteps
        for t in range(timesteps):

            # Get an action
            action = get_action(model, current_state, epsilon)

            # Perform a step
            # next_state: (position, velocity, angle and angular velocity)
            next_state, reward, done, info = env.step(action)

            # Reshape the state
            next_state = np.reshape(next_state, [1, env.observation_space.shape[0]])

            # Save a transition in memory
            memory.push((current_state, next_state, action, reward, done))
        
            # Update current state
            current_state = next_state

            # Update score
            score += reward
            total_score += reward
          
            # Check if we are done (game over)
            if done:
                print('Episode {0}, Score: {1}, Timesteps: {2}, Epsilon: {3}'.format(episode+1, score, t+1, epsilon))
                break

        # Update the model
        update(memory, model)
       
    # Close the environment
    env.close()

    # Save the model (Make sure that the folder exists)
    model.save('models\\dqn_cartpole.h5')

    # Print final score
    print()
    print('--- Evaluation ---')
    print('Average score: {0}'.format(total_score / episodes))
    print('Episodes: {0}'.format(episodes))
    print()

# Evaluate the performance of a model
def evaluate():

    # Variables
    episodes = 100 # Number of episodes
    timesteps = 200 # Number of timesteps in each episode
    total_score = 0

    # Create an environment
    env = gym.make('CartPole-v0')

    # Get a model
    model = get_model(env)

    # Loop episodes
    for episode in range(episodes):

        # Start episode and get initial observation (reshaped)
        state = np.reshape(env.reset(), [1, env.observation_space.shape[0]])

        # Reset score
        score = 0

        # Loop timesteps
        for t in range(timesteps):

            # Render the environment
            env.render(mode='human')

            # Get an action from the model
            action = np.argmax(model.predict(state))

            # Perform a step
            # next_state: (position, velocity, angle and angular velocity)
            state, reward, done, info = env.step(action)

            # Reshape the state
            state = np.reshape(state, [1, env.observation_space.shape[0]])
        
            # Update score
            score += reward
            total_score += reward

            # Check if we are done (game over)
            if done:
                print('Episode {0}, Score: {1}, Timesteps: {2}'.format(episode+1, score, t+1))
                break

    # Close the environment
    env.close()

    # Print final score
    print()
    print('--- Evaluation ---')
    print('Average score: {0}'.format(total_score / episodes))
    print('Episodes: {0}'.format(episodes))
    print()

# The main entry point for this module
def main():

    # Train the model
    #train()

    # Evaluate the model
    evaluate()

# Tell python to run main method
if __name__ == "__main__": main()
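
The update function above calls predict and fit once for every sampled transition, which is slow. As a possible alternative, the same targets (reward + gamma * max Q of the next state, or just the reward when the episode has ended) can be built for the whole batch at once with NumPy. This is only a sketch: the function name `build_targets` is hypothetical, and it assumes the Q-values for the current and next states have already been predicted as arrays of shape (batch, actions).

```python
import numpy as np

def build_targets(q_current, q_next, actions, rewards, dones, gamma=0.95):
    """Copy q_current and overwrite the value of each taken action with
    reward + gamma * max(q_next), or with the reward alone if done."""
    q_current = np.asarray(q_current, dtype=float)
    q_next = np.asarray(q_next, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    not_done = 1.0 - np.asarray(dones, dtype=float)

    targets = q_current.copy()
    rows = np.arange(len(actions))
    targets[rows, actions] = rewards + not_done * gamma * q_next.max(axis=1)
    return targets

# Two made-up transitions: the second one ended the episode
q_current = [[0.5, 0.2], [0.1, 0.4]]
q_next = [[1.0, 2.0], [3.0, 0.0]]
targets = build_targets(q_current, q_next, actions=[0, 1],
                        rewards=[1.0, 1.0], dones=[False, True])
print(targets)  # first row becomes [2.9, 0.2], second row becomes [0.1, 1.0]
```

In the agent, q_current and q_next would come from two predict calls on the stacked states (for example with np.vstack), and the targets would be passed to a single fit call. I have not benchmarked this against the loop above.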

Training

Episode 707, Score: 200.0, Timesteps: 200, Epsilon: 0.15058058620310055
Episode 708, Score: 200.0, Timesteps: 200, Epsilon: 0.149966742310231
Episode 709, Score: 200.0, Timesteps: 200, Epsilon: 0.14935376481693352
Episode 710, Score: 200.0, Timesteps: 200, Epsilon: 0.14874165128092476
Episode 711, Score: 200.0, Timesteps: 200, Epsilon: 0.14813039927023364
Episode 712, Score: 184.0, Timesteps: 184, Epsilon: 0.14752000636314366
Episode 713, Score: 200.0, Timesteps: 200, Epsilon: 0.1469104701481344
Episode 714, Score: 200.0, Timesteps: 200, Epsilon: 0.14630178822382567
Episode 715, Score: 200.0, Timesteps: 200, Epsilon: 0.1456939581989194
Episode 716, Score: 200.0, Timesteps: 200, Epsilon: 0.1450869776921444
Episode 717, Score: 200.0, Timesteps: 200, Epsilon: 0.14448084433219988
Episode 718, Score: 200.0, Timesteps: 200, Epsilon: 0.14387555575769972
Episode 719, Score: 200.0, Timesteps: 200, Epsilon: 0.14327110961711742
Episode 720, Score: 157.0, Timesteps: 157, Epsilon: 0.14266750356873148
Episode 721, Score: 200.0, Timesteps: 200, Epsilon: 0.14206473528057095
Episode 722, Score: 200.0, Timesteps: 200, Epsilon: 0.1414628024303609
Episode 723, Score: 200.0, Timesteps: 200, Epsilon: 0.14086170270546916
Episode 724, Score: 168.0, Timesteps: 168, Epsilon: 0.14026143380285305
Episode 725, Score: 183.0, Timesteps: 183, Epsilon: 0.13966199342900631
Episode 726, Score: 197.0, Timesteps: 197, Epsilon: 0.1390633792999063
Episode 727, Score: 44.0, Timesteps: 44, Epsilon: 0.1384655891409622
Episode 728, Score: 192.0, Timesteps: 192, Epsilon: 0.13786862068696282
Episode 729, Score: 189.0, Timesteps: 189, Epsilon: 0.1372724716820254
Episode 730, Score: 200.0, Timesteps: 200, Epsilon: 0.1366771398795441
Episode 731, Score: 200.0, Timesteps: 200, Epsilon: 0.13608262304213958
Episode 732, Score: 198.0, Timesteps: 198, Epsilon: 0.13548891894160808
Episode 733, Score: 200.0, Timesteps: 200, Epsilon: 0.13489602535887202
Episode 734, Score: 176.0, Timesteps: 176, Epsilon: 0.13430394008392943
Episode 735, Score: 194.0, Timesteps: 194, Epsilon: 0.13371266091580514
Episode 736, Score: 200.0, Timesteps: 200, Epsilon: 0.1331221856625011
Episode 737, Score: 200.0, Timesteps: 200, Epsilon: 0.13253251214094852
Episode 738, Score: 200.0, Timesteps: 200, Epsilon: 0.13194363817695842
Episode 739, Score: 200.0, Timesteps: 200, Epsilon: 0.13135556160517425
Episode 740, Score: 200.0, Timesteps: 200, Epsilon: 0.13076828026902376
Episode 741, Score: 197.0, Timesteps: 197, Epsilon: 0.13018179202067182
Episode 742, Score: 187.0, Timesteps: 187, Epsilon: 0.1295960947209729
Episode 743, Score: 200.0, Timesteps: 200, Epsilon: 0.12901118623942476
Episode 744, Score: 200.0, Timesteps: 200, Epsilon: 0.12842706445412122
Episode 745, Score: 200.0, Timesteps: 200, Epsilon: 0.12784372725170712
Episode 746, Score: 200.0, Timesteps: 200, Epsilon: 0.12726117252733116
Episode 747, Score: 187.0, Timesteps: 187, Epsilon: 0.12667939818460128
Episode 748, Score: 200.0, Timesteps: 200, Epsilon: 0.12609840213553858
Episode 749, Score: 200.0, Timesteps: 200, Epsilon: 0.12551818230053347
Episode 750, Score: 200.0, Timesteps: 200, Epsilon: 0.1249387366082999
Episode 751, Score: 200.0, Timesteps: 200, Epsilon: 0.12436006299583158
Episode 752, Score: 200.0, Timesteps: 200, Epsilon: 0.1237821594083578
Episode 753, Score: 200.0, Timesteps: 200, Epsilon: 0.12320502379929943
Episode 754, Score: 200.0, Timesteps: 200, Epsilon: 0.12262865413022594
Episode 755, Score: 200.0, Timesteps: 200, Epsilon: 0.12205304837081177
Episode 756, Score: 193.0, Timesteps: 193, Epsilon: 0.12147820449879354
Episode 757, Score: 200.0, Timesteps: 200, Epsilon: 0.12090412049992727
Episode 758, Score: 185.0, Timesteps: 185, Epsilon: 0.12033079436794647
Episode 759, Score: 200.0, Timesteps: 200, Epsilon: 0.11975822410451964
Episode 760, Score: 200.0, Timesteps: 200, Epsilon: 0.11918640771920863
Episode 761, Score: 200.0, Timesteps: 200, Epsilon: 0.11861534322942713
Episode 762, Score: 200.0, Timesteps: 200, Epsilon: 0.11804502866039945
Episode 763, Score: 200.0, Timesteps: 200, Epsilon: 0.1174754620451195
Episode 764, Score: 200.0, Timesteps: 200, Epsilon: 0.11690664142431006
Episode 765, Score: 200.0, Timesteps: 200, Epsilon: 0.11633856484638239
Episode 766, Score: 189.0, Timesteps: 189, Epsilon: 0.115771230367396
Episode 767, Score: 200.0, Timesteps: 200, Epsilon: 0.11520463605101905
Episode 768, Score: 183.0, Timesteps: 183, Epsilon: 0.11463877996848804
Episode 769, Score: 200.0, Timesteps: 200, Epsilon: 0.1140736601985689
Episode 770, Score: 200.0, Timesteps: 200, Epsilon: 0.11350927482751816
Episode 771, Score: 200.0, Timesteps: 200, Epsilon: 0.112945621949043
Episode 772, Score: 200.0, Timesteps: 200, Epsilon: 0.11238269966426384
Episode 773, Score: 200.0, Timesteps: 200, Epsilon: 0.11182050608167504
Episode 774, Score: 200.0, Timesteps: 200, Epsilon: 0.11125903931710734
Episode 775, Score: 200.0, Timesteps: 200, Epsilon: 0.11069829749368976
Episode 776, Score: 200.0, Timesteps: 200, Epsilon: 0.11013827874181159
Episode 777, Score: 200.0, Timesteps: 200, Epsilon: 0.10957898119908571
Episode 778, Score: 200.0, Timesteps: 200, Epsilon: 0.10902040301031102
Episode 779, Score: 200.0, Timesteps: 200, Epsilon: 0.10846254232743557
Episode 780, Score: 200.0, Timesteps: 200, Epsilon: 0.10790539730951965
Episode 781, Score: 200.0, Timesteps: 200, Epsilon: 0.10734896612269973
Episode 782, Score: 200.0, Timesteps: 200, Epsilon: 0.10679324694015202
Episode 783, Score: 200.0, Timesteps: 200, Epsilon: 0.10623823794205656
Episode 784, Score: 200.0, Timesteps: 200, Epsilon: 0.10568393731556158
Episode 785, Score: 200.0, Timesteps: 200, Epsilon: 0.10513034325474746
Episode 786, Score: 200.0, Timesteps: 200, Epsilon: 0.10457745396059204
Episode 787, Score: 200.0, Timesteps: 200, Epsilon: 0.10402526764093545
Episode 788, Score: 200.0, Timesteps: 200, Epsilon: 0.1034737825104447
Episode 789, Score: 200.0, Timesteps: 200, Epsilon: 0.1029229967905797
Episode 790, Score: 200.0, Timesteps: 200, Epsilon: 0.10237290870955851
Episode 791, Score: 200.0, Timesteps: 200, Epsilon: 0.10182351650232346
Episode 792, Score: 200.0, Timesteps: 200, Epsilon: 0.10127481841050645
Episode 793, Score: 139.0, Timesteps: 139, Epsilon: 0.10072681268239625
Episode 794, Score: 180.0, Timesteps: 180, Epsilon: 0.10017949757290368
Episode 795, Score: 200.0, Timesteps: 200, Epsilon: 0.09963287134352972
Episode 796, Score: 200.0, Timesteps: 200, Epsilon: 0.099086932262331
Episode 797, Score: 184.0, Timesteps: 184, Epsilon: 0.09854167860388763
Episode 798, Score: 184.0, Timesteps: 184, Epsilon: 0.09799710864927058
Episode 799, Score: 200.0, Timesteps: 200, Epsilon: 0.09745322068600859
Episode 800, Score: 200.0, Timesteps: 200, Epsilon: 0.09691001300805646
Episode 801, Score: 200.0, Timesteps: 200, Epsilon: 0.09636748391576233
Episode 802, Score: 200.0, Timesteps: 200, Epsilon: 0.09582563171583647
Episode 803, Score: 200.0, Timesteps: 200, Epsilon: 0.09528445472131908
Episode 804, Score: 200.0, Timesteps: 200, Epsilon: 0.09474395125154877
Episode 805, Score: 200.0, Timesteps: 200, Epsilon: 0.09420411963213149
Episode 806, Score: 189.0, Timesteps: 189, Epsilon: 0.09366495819490928
Episode 807, Score: 200.0, Timesteps: 200, Epsilon: 0.09312646527792956
Episode 808, Score: 200.0, Timesteps: 200, Epsilon: 0.09258863922541383
Episode 809, Score: 200.0, Timesteps: 200, Epsilon: 0.09205147838772776
Episode 810, Score: 200.0, Timesteps: 200, Epsilon: 0.09151498112135026
Episode 811, Score: 200.0, Timesteps: 200, Epsilon: 0.09097914578884403
Episode 812, Score: 200.0, Timesteps: 200, Epsilon: 0.09044397075882471
Episode 813, Score: 200.0, Timesteps: 200, Epsilon: 0.08990945440593179
Episode 814, Score: 200.0, Timesteps: 200, Epsilon: 0.08937559511079873
Episode 815, Score: 200.0, Timesteps: 200, Epsilon: 0.08884239126002336
Episode 816, Score: 200.0, Timesteps: 200, Epsilon: 0.08830984124613883
Episode 817, Score: 200.0, Timesteps: 200, Epsilon: 0.0877779434675845
Episode 818, Score: 200.0, Timesteps: 200, Epsilon: 0.08724669632867699
Episode 819, Score: 200.0, Timesteps: 200, Epsilon: 0.0867160982395816
Episode 820, Score: 200.0, Timesteps: 200, Epsilon: 0.08618614761628329
Episode 821, Score: 200.0, Timesteps: 200, Epsilon: 0.08565684288055919
Episode 822, Score: 200.0, Timesteps: 200, Epsilon: 0.08512818245994958
Episode 823, Score: 200.0, Timesteps: 200, Epsilon: 0.08460016478773014
Episode 824, Score: 200.0, Timesteps: 200, Epsilon: 0.08407278830288423
Episode 825, Score: 200.0, Timesteps: 200, Epsilon: 0.08354605145007488
Episode 826, Score: 200.0, Timesteps: 200, Epsilon: 0.08301995267961781
Episode 827, Score: 200.0, Timesteps: 200, Epsilon: 0.08249449044745338
Episode 828, Score: 200.0, Timesteps: 200, Epsilon: 0.08196966321511989
Episode 829, Score: 200.0, Timesteps: 200, Epsilon: 0.08144546944972653
Episode 830, Score: 200.0, Timesteps: 200, Epsilon: 0.08092190762392604
Episode 831, Score: 200.0, Timesteps: 200, Epsilon: 0.08039897621588898
Episode 832, Score: 200.0, Timesteps: 200, Epsilon: 0.07987667370927609
Episode 833, Score: 200.0, Timesteps: 200, Epsilon: 0.07935499859321238
Episode 834, Score: 200.0, Timesteps: 200, Epsilon: 0.07883394936226129
Episode 835, Score: 154.0, Timesteps: 154, Epsilon: 0.07831352451639795
Episode 836, Score: 194.0, Timesteps: 194, Epsilon: 0.0777937225609836
Episode 837, Score: 149.0, Timesteps: 149, Epsilon: 0.07727454200674
Episode 838, Score: 174.0, Timesteps: 174, Epsilon: 0.0767559813697235
Episode 839, Score: 200.0, Timesteps: 200, Epsilon: 0.07623803917129968
Episode 840, Score: 134.0, Timesteps: 134, Epsilon: 0.0757207139381183
Episode 841, Score: 186.0, Timesteps: 186, Epsilon: 0.07520400420208784
Episode 842, Score: 200.0, Timesteps: 200, Epsilon: 0.07468790850035045
Episode 843, Score: 200.0, Timesteps: 200, Epsilon: 0.0741724253752577
Episode 844, Score: 200.0, Timesteps: 200, Epsilon: 0.07365755337434499
Episode 845, Score: 178.0, Timesteps: 178, Epsilon: 0.0731432910503077
Episode 846, Score: 161.0, Timesteps: 161, Epsilon: 0.07262963696097646
Episode 847, Score: 200.0, Timesteps: 200, Epsilon: 0.07211658966929302
Episode 848, Score: 200.0, Timesteps: 200, Epsilon: 0.07160414774328616
Episode 849, Score: 198.0, Timesteps: 198, Epsilon: 0.0710923097560473
Episode 850, Score: 200.0, Timesteps: 200, Epsilon: 0.07058107428570726
Episode 851, Score: 200.0, Timesteps: 200, Epsilon: 0.07007043991541217
Episode 852, Score: 200.0, Timesteps: 200, Epsilon: 0.06956040523329987
Episode 853, Score: 170.0, Timesteps: 170, Epsilon: 0.06905096883247697
Episode 854, Score: 200.0, Timesteps: 200, Epsilon: 0.06854212931099501
Episode 855, Score: 168.0, Timesteps: 168, Epsilon: 0.0680338852718273
Episode 856, Score: 155.0, Timesteps: 155, Epsilon: 0.06752623532284674
Episode 857, Score: 161.0, Timesteps: 161, Epsilon: 0.0670191780768018
Episode 858, Score: 200.0, Timesteps: 200, Epsilon: 0.0665127121512945
Episode 859, Score: 195.0, Timesteps: 195, Epsilon: 0.06600683616875769
Episode 860, Score: 161.0, Timesteps: 161, Epsilon: 0.06550154875643233
Episode 861, Score: 152.0, Timesteps: 152, Epsilon: 0.06499684854634524
Episode 862, Score: 163.0, Timesteps: 163, Epsilon: 0.06449273417528723
Episode 863, Score: 200.0, Timesteps: 200, Epsilon: 0.06398920428479038
Episode 864, Score: 200.0, Timesteps: 200, Epsilon: 0.0634862575211067
Episode 865, Score: 200.0, Timesteps: 200, Epsilon: 0.06298389253518577
Episode 866, Score: 200.0, Timesteps: 200, Epsilon: 0.0624821079826533
Episode 867, Score: 200.0, Timesteps: 200, Epsilon: 0.06198090252378974
Episode 868, Score: 170.0, Timesteps: 170, Epsilon: 0.06148027482350815
Episode 869, Score: 200.0, Timesteps: 200, Epsilon: 0.06098022355133359
Episode 870, Score: 191.0, Timesteps: 191, Epsilon: 0.06048074738138154
Episode 871, Score: 180.0, Timesteps: 180, Epsilon: 0.05998184499233672
Episode 872, Score: 200.0, Timesteps: 200, Epsilon: 0.05948351506743277
Episode 873, Score: 176.0, Timesteps: 176, Epsilon: 0.05898575629443026
Episode 874, Score: 139.0, Timesteps: 139, Epsilon: 0.05848856736559693
Episode 875, Score: 186.0, Timesteps: 186, Epsilon: 0.057991946977686726
Episode 876, Score: 200.0, Timesteps: 200, Epsilon: 0.05749589383191933
Episode 877, Score: 200.0, Timesteps: 200, Epsilon: 0.057000406633959555
Episode 878, Score: 200.0, Timesteps: 200, Epsilon: 0.05650548409389744
Episode 879, Score: 177.0, Timesteps: 177, Epsilon: 0.05601112492622817
Episode 880, Score: 200.0, Timesteps: 200, Epsilon: 0.05551732784983132
Episode 881, Score: 200.0, Timesteps: 200, Epsilon: 0.05502409158795207
Episode 882, Score: 200.0, Timesteps: 200, Epsilon: 0.05453141486818025
Episode 883, Score: 200.0, Timesteps: 200, Epsilon: 0.05403929642243144
Episode 884, Score: 200.0, Timesteps: 200, Epsilon: 0.05354773498692689
Episode 885, Score: 200.0, Timesteps: 200, Epsilon: 0.05305672930217453
Episode 886, Score: 200.0, Timesteps: 200, Epsilon: 0.05256627811294923
Episode 887, Score: 200.0, Timesteps: 200, Epsilon: 0.05207638016827365
Episode 888, Score: 142.0, Timesteps: 142, Epsilon: 0.051587034221398986
Episode 889, Score: 200.0, Timesteps: 200, Epsilon: 0.051098239029786274
Episode 890, Score: 200.0, Timesteps: 200, Epsilon: 0.0506099933550872
Episode 891, Score: 146.0, Timesteps: 146, Epsilon: 0.05012229596312523
Episode 892, Score: 196.0, Timesteps: 196, Epsilon: 0.04963514562387694
Episode 893, Score: 184.0, Timesteps: 184, Epsilon: 0.049148541111453614
Episode 894, Score: 200.0, Timesteps: 200, Epsilon: 0.04866248120408234
Episode 895, Score: 200.0, Timesteps: 200, Epsilon: 0.04817696468408805
Episode 896, Score: 141.0, Timesteps: 141, Epsilon: 0.047691990337874746
Episode 897, Score: 200.0, Timesteps: 200, Epsilon: 0.04720755695590784
Episode 898, Score: 200.0, Timesteps: 200, Epsilon: 0.04672366333269562
Episode 899, Score: 200.0, Timesteps: 200, Epsilon: 0.04624030826677117
Episode 900, Score: 200.0, Timesteps: 200, Epsilon: 0.04575749056067513
Episode 901, Score: 200.0, Timesteps: 200, Epsilon: 0.04527520902093707
Episode 902, Score: 200.0, Timesteps: 200, Epsilon: 0.044793462458058264
Episode 903, Score: 200.0, Timesteps: 200, Epsilon: 0.044312249686494276
Episode 904, Score: 193.0, Timesteps: 193, Epsilon: 0.04383156952463674
Episode 905, Score: 190.0, Timesteps: 190, Epsilon: 0.0433514207947967
Episode 906, Score: 189.0, Timesteps: 189, Epsilon: 0.04287180232318688
Episode 907, Score: 200.0, Timesteps: 200, Epsilon: 0.042392712939904764
Episode 908, Score: 200.0, Timesteps: 200, Epsilon: 0.04191415147891486
Episode 909, Score: 200.0, Timesteps: 200, Epsilon: 0.041436116778032606
Episode 910, Score: 200.0, Timesteps: 200, Epsilon: 0.04095860767890647
Episode 911, Score: 200.0, Timesteps: 200, Epsilon: 0.040481623027001756
Episode 912, Score: 200.0, Timesteps: 200, Epsilon: 0.040005161671583855
Episode 913, Score: 200.0, Timesteps: 200, Epsilon: 0.03952922246570101
Episode 914, Score: 174.0, Timesteps: 174, Epsilon: 0.03905380426616856
Episode 915, Score: 161.0, Timesteps: 161, Epsilon: 0.03857890593355173
Episode 916, Score: 142.0, Timesteps: 142, Epsilon: 0.03810452633214956
Episode 917, Score: 136.0, Timesteps: 136, Epsilon: 0.03763066432997886
Episode 918, Score: 184.0, Timesteps: 184, Epsilon: 0.03715731879875761
Episode 919, Score: 200.0, Timesteps: 200, Epsilon: 0.03668448861388873
Episode 920, Score: 176.0, Timesteps: 176, Epsilon: 0.03621217265444476
Episode 921, Score: 160.0, Timesteps: 160, Epsilon: 0.035740369803150984
Episode 922, Score: 200.0, Timesteps: 200, Epsilon: 0.03526907894637066
Episode 923, Score: 193.0, Timesteps: 193, Epsilon: 0.03479829897408793
Episode 924, Score: 200.0, Timesteps: 200, Epsilon: 0.03432802877989327
Episode 925, Score: 163.0, Timesteps: 163, Epsilon: 0.03385826726096741
Episode 926, Score: 200.0, Timesteps: 200, Epsilon: 0.03338901331806565
Episode 927, Score: 200.0, Timesteps: 200, Epsilon: 0.032920265855502895
Episode 928, Score: 200.0, Timesteps: 200, Epsilon: 0.032452023781137984
Episode 929, Score: 200.0, Timesteps: 200, Epsilon: 0.031984286006358276
Episode 930, Score: 187.0, Timesteps: 187, Epsilon: 0.03151705144606487
Episode 931, Score: 200.0, Timesteps: 200, Epsilon: 0.031050319018657402
Episode 932, Score: 176.0, Timesteps: 176, Epsilon: 0.03058408764601861
Episode 933, Score: 200.0, Timesteps: 200, Epsilon: 0.03011835625350001
Episode 934, Score: 200.0, Timesteps: 200, Epsilon: 0.029653123769906697
Episode 935, Score: 200.0, Timesteps: 200, Epsilon: 0.029188389127482228
Episode 936, Score: 200.0, Timesteps: 200, Epsilon: 0.02872415126189476
Episode 937, Score: 200.0, Timesteps: 200, Epsilon: 0.028260409112221718
Episode 938, Score: 200.0, Timesteps: 200, Epsilon: 0.027797161620935484
Episode 939, Score: 196.0, Timesteps: 196, Epsilon: 0.027334407733889066
Episode 940, Score: 200.0, Timesteps: 200, Epsilon: 0.026872146400301333
Episode 941, Score: 200.0, Timesteps: 200, Epsilon: 0.026410376572743033
Episode 942, Score: 152.0, Timesteps: 152, Epsilon: 0.025949097207122684
Episode 943, Score: 180.0, Timesteps: 180, Epsilon: 0.025488307262671595
Episode 944, Score: 200.0, Timesteps: 200, Epsilon: 0.02502800570193109
Episode 945, Score: 200.0, Timesteps: 200, Epsilon: 0.02456819149073708
Episode 946, Score: 198.0, Timesteps: 198, Epsilon: 0.024108863598207186
Episode 947, Score: 200.0, Timesteps: 200, Epsilon: 0.02365002099672653
Episode 948, Score: 131.0, Timesteps: 131, Epsilon: 0.023191662661933732
Episode 949, Score: 140.0, Timesteps: 140, Epsilon: 0.022733787572707276
Episode 950, Score: 200.0, Timesteps: 200, Epsilon: 0.02227639471115228
Episode 951, Score: 153.0, Timesteps: 153, Epsilon: 0.021819483062586076
Episode 952, Score: 196.0, Timesteps: 196, Epsilon: 0.021363051615525652
Episode 953, Score: 200.0, Timesteps: 200, Epsilon: 0.020907099361673676
Episode 954, Score: 193.0, Timesteps: 193, Epsilon: 0.02045162529590494
Episode 955, Score: 200.0, Timesteps: 200, Epsilon: 0.019996628416253603
Episode 956, Score: 200.0, Timesteps: 200, Epsilon: 0.01954210772389986
Episode 957, Score: 200.0, Timesteps: 200, Epsilon: 0.0190880622231564
Episode 958, Score: 132.0, Timesteps: 132, Epsilon: 0.018634490921455527
Episode 959, Score: 164.0, Timesteps: 164, Epsilon: 0.018181392829336396
Episode 960, Score: 115.0, Timesteps: 115, Epsilon: 0.017728766960431575
Episode 961, Score: 165.0, Timesteps: 165, Epsilon: 0.01727661233145472
Episode 962, Score: 200.0, Timesteps: 200, Epsilon: 0.016824927962187042
Episode 963, Score: 192.0, Timesteps: 192, Epsilon: 0.016373712875465407
Episode 964, Score: 200.0, Timesteps: 200, Epsilon: 0.015922966097169144
Episode 965, Score: 157.0, Timesteps: 157, Epsilon: 0.01547268665620738
Episode 966, Score: 187.0, Timesteps: 187, Epsilon: 0.015022873584506602
Episode 967, Score: 200.0, Timesteps: 200, Epsilon: 0.014573525916998342
Episode 968, Score: 147.0, Timesteps: 147, Epsilon: 0.014124642691606293
Episode 969, Score: 200.0, Timesteps: 200, Epsilon: 0.013676222949234651
Episode 970, Score: 200.0, Timesteps: 200, Epsilon: 0.013228265733755129
Episode 971, Score: 143.0, Timesteps: 143, Epsilon: 0.012780770091995075
Episode 972, Score: 161.0, Timesteps: 161, Epsilon: 0.012333735073725371
Episode 973, Score: 134.0, Timesteps: 134, Epsilon: 0.011887159731648111
Episode 974, Score: 142.0, Timesteps: 142, Epsilon: 0.011441043121384498
Episode 975, Score: 200.0, Timesteps: 200, Epsilon: 0.010995384301463185
Episode 976, Score: 166.0, Timesteps: 166, Epsilon: 0.010550182333308178
Episode 977, Score: 192.0, Timesteps: 192, Epsilon: 0.010105436281226954
Episode 978, Score: 200.0, Timesteps: 200, Epsilon: 0.01
Episode 979, Score: 200.0, Timesteps: 200, Epsilon: 0.01
Episode 980, Score: 200.0, Timesteps: 200, Epsilon: 0.01
Episode 981, Score: 200.0, Timesteps: 200, Epsilon: 0.01
Episode 982, Score: 200.0, Timesteps: 200, Epsilon: 0.01
Episode 983, Score: 171.0, Timesteps: 171, Epsilon: 0.01
Episode 984, Score: 200.0, Timesteps: 200, Epsilon: 0.01
Episode 985, Score: 200.0, Timesteps: 200, Epsilon: 0.01
Episode 986, Score: 200.0, Timesteps: 200, Epsilon: 0.01
Episode 987, Score: 200.0, Timesteps: 200, Epsilon: 0.01
Episode 988, Score: 200.0, Timesteps: 200, Epsilon: 0.01
Episode 989, Score: 133.0, Timesteps: 133, Epsilon: 0.01
Episode 990, Score: 162.0, Timesteps: 162, Epsilon: 0.01
Episode 991, Score: 148.0, Timesteps: 148, Epsilon: 0.01
Episode 992, Score: 200.0, Timesteps: 200, Epsilon: 0.01
Episode 993, Score: 152.0, Timesteps: 152, Epsilon: 0.01
Episode 994, Score: 200.0, Timesteps: 200, Epsilon: 0.01
Episode 995, Score: 121.0, Timesteps: 121, Epsilon: 0.01
Episode 996, Score: 132.0, Timesteps: 132, Epsilon: 0.01
Episode 997, Score: 200.0, Timesteps: 200, Epsilon: 0.01
Episode 998, Score: 200.0, Timesteps: 200, Epsilon: 0.01
Episode 999, Score: 168.0, Timesteps: 168, Epsilon: 0.01
Episode 1000, Score: 126.0, Timesteps: 126, Epsilon: 0.01

--- Evaluation ---
Average score: 142.956
Episodes: 1000
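
The epsilon values in the log above can be reproduced from get_epsilon with min_epsilon=0.01 and divisor=100, which are the arguments used in train(). The function is copied here so that the snippet runs on its own. Note that the log prints episode+1 while get_epsilon receives the zero-based episode index.

```python
import math

# Same exploration schedule as in the agent code
def get_epsilon(t, min_epsilon, divisor=25):
    return max(min_epsilon, min(1, 1.0 - math.log10((t + 1) / divisor)))

for printed_episode in (1, 100, 707, 800, 978):
    t = printed_episode - 1  # train() passes the zero-based episode index
    print(printed_episode, get_epsilon(t, 0.01, 100))
```

With these arguments the agent explores fully at random for the first 100 episodes, then epsilon decays logarithmically until it reaches the floor of 0.01 at episode 978.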

Evaluation

Episode 1, Score: 200.0, Timesteps: 200
Episode 2, Score: 200.0, Timesteps: 200
Episode 3, Score: 200.0, Timesteps: 200
Episode 4, Score: 200.0, Timesteps: 200
Episode 5, Score: 200.0, Timesteps: 200
Episode 6, Score: 200.0, Timesteps: 200
Episode 7, Score: 200.0, Timesteps: 200
Episode 8, Score: 200.0, Timesteps: 200
Episode 9, Score: 200.0, Timesteps: 200
Episode 10, Score: 200.0, Timesteps: 200
Episode 11, Score: 200.0, Timesteps: 200
Episode 12, Score: 200.0, Timesteps: 200
Episode 13, Score: 200.0, Timesteps: 200
Episode 14, Score: 200.0, Timesteps: 200
Episode 15, Score: 200.0, Timesteps: 200
Episode 16, Score: 200.0, Timesteps: 200
Episode 17, Score: 200.0, Timesteps: 200
Episode 18, Score: 200.0, Timesteps: 200
Episode 19, Score: 200.0, Timesteps: 200
Episode 20, Score: 200.0, Timesteps: 200
Episode 21, Score: 200.0, Timesteps: 200
Episode 22, Score: 200.0, Timesteps: 200
Episode 23, Score: 200.0, Timesteps: 200
Episode 24, Score: 200.0, Timesteps: 200
Episode 25, Score: 200.0, Timesteps: 200
Episode 26, Score: 200.0, Timesteps: 200
Episode 27, Score: 200.0, Timesteps: 200
Episode 28, Score: 200.0, Timesteps: 200
Episode 29, Score: 200.0, Timesteps: 200
Episode 30, Score: 200.0, Timesteps: 200
Episode 31, Score: 200.0, Timesteps: 200
Episode 32, Score: 200.0, Timesteps: 200
Episode 33, Score: 200.0, Timesteps: 200
Episode 34, Score: 200.0, Timesteps: 200
Episode 35, Score: 200.0, Timesteps: 200
Episode 36, Score: 200.0, Timesteps: 200
Episode 37, Score: 200.0, Timesteps: 200
Episode 38, Score: 200.0, Timesteps: 200
Episode 39, Score: 200.0, Timesteps: 200
Episode 40, Score: 200.0, Timesteps: 200
Episode 41, Score: 184.0, Timesteps: 184
Episode 42, Score: 200.0, Timesteps: 200
Episode 43, Score: 200.0, Timesteps: 200
Episode 44, Score: 200.0, Timesteps: 200
Episode 45, Score: 200.0, Timesteps: 200
Episode 46, Score: 182.0, Timesteps: 182
Episode 47, Score: 200.0, Timesteps: 200
Episode 48, Score: 200.0, Timesteps: 200
Episode 49, Score: 200.0, Timesteps: 200
Episode 50, Score: 196.0, Timesteps: 196
Episode 51, Score: 200.0, Timesteps: 200
Episode 52, Score: 200.0, Timesteps: 200
Episode 53, Score: 190.0, Timesteps: 190
Episode 54, Score: 200.0, Timesteps: 200
Episode 55, Score: 200.0, Timesteps: 200
Episode 56, Score: 200.0, Timesteps: 200
Episode 57, Score: 200.0, Timesteps: 200
Episode 58, Score: 200.0, Timesteps: 200
Episode 59, Score: 200.0, Timesteps: 200
Episode 60, Score: 200.0, Timesteps: 200
Episode 61, Score: 200.0, Timesteps: 200
Episode 62, Score: 200.0, Timesteps: 200
Episode 63, Score: 200.0, Timesteps: 200
Episode 64, Score: 200.0, Timesteps: 200
Episode 65, Score: 200.0, Timesteps: 200
Episode 66, Score: 200.0, Timesteps: 200
Episode 67, Score: 200.0, Timesteps: 200
Episode 68, Score: 200.0, Timesteps: 200
Episode 69, Score: 200.0, Timesteps: 200
Episode 70, Score: 200.0, Timesteps: 200
Episode 71, Score: 200.0, Timesteps: 200
Episode 72, Score: 188.0, Timesteps: 188
Episode 73, Score: 200.0, Timesteps: 200
Episode 74, Score: 200.0, Timesteps: 200
Episode 75, Score: 200.0, Timesteps: 200
Episode 76, Score: 200.0, Timesteps: 200
Episode 77, Score: 194.0, Timesteps: 194
Episode 78, Score: 200.0, Timesteps: 200
Episode 79, Score: 198.0, Timesteps: 198
Episode 80, Score: 200.0, Timesteps: 200
Episode 81, Score: 200.0, Timesteps: 200
Episode 82, Score: 200.0, Timesteps: 200
Episode 83, Score: 200.0, Timesteps: 200
Episode 84, Score: 200.0, Timesteps: 200
Episode 85, Score: 200.0, Timesteps: 200
Episode 86, Score: 200.0, Timesteps: 200
Episode 87, Score: 200.0, Timesteps: 200
Episode 88, Score: 190.0, Timesteps: 190
Episode 89, Score: 200.0, Timesteps: 200
Episode 90, Score: 200.0, Timesteps: 200
Episode 91, Score: 200.0, Timesteps: 200
Episode 92, Score: 200.0, Timesteps: 200
Episode 93, Score: 200.0, Timesteps: 200
Episode 94, Score: 200.0, Timesteps: 200
Episode 95, Score: 197.0, Timesteps: 197
Episode 96, Score: 200.0, Timesteps: 200
Episode 97, Score: 200.0, Timesteps: 200
Episode 98, Score: 200.0, Timesteps: 200
Episode 99, Score: 198.0, Timesteps: 198
Episode 100, Score: 200.0, Timesteps: 200

--- Evaluation ---
Average score: 199.17
Episodes: 100
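
The solved criterion for CartPole-v0 (an average reward of 195 or more over 100 consecutive episodes) can be checked with a sliding window over the episode scores. The sketch below is not part of the agent, and the helper name `is_solved` is my own.

```python
from collections import deque

def is_solved(scores, window=100, threshold=195.0):
    """Return True if any `window` consecutive scores average >= threshold."""
    recent = deque(maxlen=window)
    running_sum = 0.0
    for score in scores:
        if len(recent) == window:
            running_sum -= recent[0]  # this value is about to be pushed out
        recent.append(score)
        running_sum += score
        if len(recent) == window and running_sum / window >= threshold:
            return True
    return False

print(is_solved([200.0] * 100))                 # True
print(is_solved([200.0] * 99))                  # False, fewer than 100 episodes
print(is_solved([150.0] * 50 + [200.0] * 100))  # True
```

The evaluation run above averages 199.17 over its 100 episodes, so it meets this threshold.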
