gym_tictactoe_np.envs.tictactoe_np_env module

class gym_tictactoe_np.envs.tictactoe_np_env.TicTacToeEnv

Bases: gym.core.Env

3D TicTacToe environment without safety checks

The board is stored as a 3x3x3 numpy int array with player tokens. A value of 0 denotes an empty cell, 1 denotes player 1 (‘x’), and -1 denotes player 2 (‘o’).

Actions are 3-element numpy int arrays with values in {0, 1, 2}. The first element selects the block, the second the row, and the third the column.
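The encoding above can be illustrated with plain numpy, independent of the environment class. This is a minimal sketch: the variable names are illustrative, and the only assumption is that an action indexes the board in (block, row, column) order as described.

```python
import numpy as np

# Empty 3x3x3 board: 0 = empty, 1 = player 1 ('x'), -1 = player 2 ('o').
board = np.zeros((3, 3, 3), dtype=int)

# An action is (block, row, column), each value in {0, 1, 2}.
action = np.array([1, 0, 2])

# Converting the action to a tuple addresses exactly one cell.
board[tuple(action)] = 1  # player 1 plays block 1, row 0, column 2

cell_value = board[1, 0, 2]
```

Indexing with the array itself (board[action]) would select whole blocks rather than one cell, which is why the tuple conversion is used here.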

action_space = MultiDiscrete([3 3 3])
check_win(action)

Checks if current action wins the game

Parameters

action (numpy.ndarray) – 3-element numpy int array which represents the cell to play on. First element specifies block, second row, and third column.

Returns

done – True if the current action wins the game else False

Return type

bool
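One way such a check could work is sketched below. It is not the package's actual implementation: the names wins and DIRECTIONS are hypothetical, and the sketch assumes the token for the action has already been placed on the board. Only lines passing through the played cell are examined.

```python
import itertools
import numpy as np

# The 13 distinct line directions in a 3D grid (one per +/- pair).
DIRECTIONS = [d for d in itertools.product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]

def wins(board, action):
    """Return True if the token at `action` completes a line of three."""
    pos = np.asarray(action)
    token = board[tuple(pos)]
    if token == 0:
        return False  # nothing has been played on this cell
    for d in DIRECTIONS:
        step = np.array(d)
        count = 1  # the played cell itself
        for sign in (1, -1):
            cell = pos + sign * step
            # Walk along the line while staying on the board and on the same token.
            while np.all((cell >= 0) & (cell <= 2)) and board[tuple(cell)] == token:
                count += 1
                cell = cell + sign * step
        if count == 3:
            return True
    return False
```

Checking only the lines through the last move avoids scanning all 49 winning lines of the 3x3x3 board after every step.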

static get_available_actions(board)

Utility function that returns currently available moves

Parameters

board (numpy.ndarray) – 3x3x3 numpy int array representing the board state

Returns

available_actions – Nx3 numpy array with the N currently available actions

Return type

numpy.ndarray
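The Nx3 result described above matches what numpy.argwhere produces for the empty cells. The sketch below assumes that approach; the function name available_actions is hypothetical and the package may compute the result differently.

```python
import numpy as np

def available_actions(board):
    """Return an Nx3 array of (block, row, column) indices of empty cells."""
    return np.argwhere(board == 0)

board = np.zeros((3, 3, 3), dtype=int)
board[0, 0, 0] = 1    # player 1 has played here
board[1, 1, 1] = -1   # player 2 has played here
actions = available_actions(board)
```

Each row of the result can be passed as an action, since it already has the (block, row, column) layout.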

static get_empty_board()

Utility function that returns an empty board

Returns

board – 3x3x3 numpy int array of zeros

Return type

numpy.ndarray

metadata = {'render.modes': ['human']}
observation_space = Box(low=-1, high=1, shape=(3, 3, 3), dtype=int64)
render(mode='human', close=False)

Render the environment to the screen

reset()

Reset environment to initial state and return initial observation

Returns

observation – 3x3x3 numpy int array representing the new board state

Return type

numpy.ndarray

reward_range = (0, 1)
step(action)

Execute one time step within the environment

Parameters

action (numpy.ndarray) – 3-element numpy int array which represents the cell to play on. First element specifies block, second row, and third column.

Returns

  • observation (numpy.ndarray) – 3x3x3 numpy int array representing the new board state

  • reward (int) – Reward for the current move: 1 if the move wins the game, otherwise 0

  • done (bool) – True if the game is over else False

  • info (dict) – Additional information for debugging
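Because the environment performs no safety checks, a caller would normally restrict itself to legal moves. The loop below is a sketch of one random episode driven through the documented reset, step, and get_available_actions interface. The function name play_random_episode and the max_moves guard are hypothetical, and env stands for any object that follows the interface described here.

```python
import numpy as np

def play_random_episode(env, rng, max_moves=27):
    """Play uniformly random legal moves until the episode ends.

    Returns (total_reward, number_of_moves).
    """
    board = env.reset()
    total_reward, moves = 0, 0
    done = False
    while not done and moves < max_moves:
        actions = env.get_available_actions(board)
        if len(actions) == 0:  # board is full; nothing left to play
            break
        # Pick one row of the Nx3 array as the next action.
        action = actions[rng.integers(len(actions))]
        board, reward, done, info = env.step(action)
        total_reward += reward
        moves += 1
    return total_reward, moves
```

A call might look like play_random_episode(TicTacToeEnv(), np.random.default_rng(0)), assuming the class can be constructed without arguments, which this reference does not state.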

symbols = '-xo'
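The ordering of the symbols string lines up with the board values when used as a Python index: 0 gives '-', 1 gives 'x', and -1 gives 'o' (the last character). The sketch below uses that observation to format a board as text. Whether render works this way is an assumption, and board_to_text is a hypothetical helper.

```python
import numpy as np

SYMBOLS = '-xo'  # same string as the documented `symbols` attribute

def board_to_text(board):
    """Format each 3x3 block as three lines of symbols, blocks separated by a blank line."""
    blocks = []
    for block in board:
        # SYMBOLS[0] == '-', SYMBOLS[1] == 'x', SYMBOLS[-1] == 'o'
        rows = [''.join(SYMBOLS[int(v)] for v in row) for row in block]
        blocks.append('\n'.join(rows))
    return '\n\n'.join(blocks)

board = np.zeros((3, 3, 3), dtype=int)
board[0, 0, 0] = 1
board[0, 0, 1] = -1
text = board_to_text(board)
```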