gym_tictactoe_np.envs.tictactoe_np_env module

class gym_tictactoe_np.envs.tictactoe_np_env.TicTacToeEnv

Bases: gym.core.Env

3D TicTacToe environment without safety checks

The board is stored as a 3x3x3 numpy int array with player tokens. A value of 0 denotes an empty cell, 1 denotes player 1 (‘x’), and -1 denotes player 2 (‘o’).
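To illustrate this encoding, the sketch below builds a board by hand and prints each block as a text grid. It assumes that the class attribute `symbols = '-xo'` is indexed by cell value. Under Python's negative indexing, `'-xo'[-1]` is `'o'`, so the values 0, 1 and -1 map to '-', 'x' and 'o' without a lookup table. Whether `render()` works exactly this way is an assumption, and `board_to_text` is a hypothetical helper.

```python
import numpy as np

SYMBOLS = '-xo'  # same string as the class attribute `symbols`

# 3x3x3 board: axis 0 = block, axis 1 = row, axis 2 = column
board = np.zeros((3, 3, 3), dtype=np.int64)
board[0, 0, 0] = 1    # player 1 ('x') in block 0, top-left cell
board[1, 1, 1] = -1   # player 2 ('o') in the centre of the middle block


def board_to_text(board):
    """Hypothetical helper: one text grid per block, using SYMBOLS[value] per cell."""
    blocks = []
    for block in board:
        # SYMBOLS[-1] is 'o' thanks to negative indexing (assumed mapping)
        rows = [''.join(SYMBOLS[v] for v in row) for row in block]
        blocks.append('\n'.join(rows))
    return '\n\n'.join(blocks)


print(board_to_text(board))
```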

Actions are given as a 3-element numpy int array with values in {0, 1, 2}: the first element selects the block to move in, the second the row, and the third the column.
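One detail worth noting when applying such an action to the board array is that it must be converted to a tuple. `board[tuple(action)]` addresses a single cell, whereas `board[action]` with an array would fancy-index whole blocks along axis 0. The sketch below assumes the action is applied this way. Because the environment is described as having no safety checks, it also includes a hypothetical `is_free` helper that a caller could use to avoid overwriting an occupied cell.

```python
import numpy as np

board = np.zeros((3, 3, 3), dtype=np.int64)
action = np.array([2, 0, 1])  # block 2, row 0, column 1

# tuple(action) -> (2, 0, 1): a single (block, row, col) index
board[tuple(action)] = 1  # player 1 ('x') plays on that cell


def is_free(board, action):
    """Hypothetical helper: True if the target cell is still empty (value 0)."""
    return board[tuple(action)] == 0


print(board[2])
```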

action_space = MultiDiscrete([3 3 3])

check_win(action)

Checks whether the current action wins the game

Parameters: action (numpy.ndarray) – 3-element numpy int array which represents the cell to play on. First element specifies block, second row, and third column.
Returns: done – True if the current action wins the game else False
Return type: bool
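The documentation does not say which lines count as a win. The sketch below assumes that any straight line of three cells counts, including lines that run through all three blocks and the space diagonals, which gives 49 lines in a 3x3x3 cube. The functions `winning_lines` and `check_win` are hypothetical standalone versions. Unlike the method, this `check_win` takes the board explicitly instead of reading it from the environment.

```python
import itertools

import numpy as np


def winning_lines(size=3):
    """All straight lines of `size` cells in a size^3 cube (assumed win rule)."""
    directions = [d for d in itertools.product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    lines = set()
    for start in itertools.product(range(size), repeat=3):
        for d in directions:
            cells = tuple(
                tuple(s + k * di for s, di in zip(start, d)) for k in range(size)
            )
            if all(0 <= c < size for cell in cells for c in cell):
                # each line is found once from each end; the frozenset dedupes it
                lines.add(frozenset(cells))
    return [tuple(sorted(line)) for line in lines]


LINES = winning_lines()


def check_win(board, action):
    """Hypothetical: does the token on `action` complete one of LINES?"""
    cell = tuple(int(i) for i in action)
    player = board[cell]
    if player == 0:
        return False
    return any(
        all(board[c] == player for c in line)
        for line in LINES
        if cell in line
    )
```

For example, tokens of player 1 on (0, 0, 0), (1, 1, 1) and (2, 2, 2) form a space diagonal, so `check_win` returns True for any of those three cells.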

static get_available_actions(board)

Utility function that returns the currently available moves (the empty cells)

Parameters: board (numpy.ndarray) – 3x3x3 numpy int array representing the board state
Returns: available_actions – Nx3 numpy array with the N currently available actions
Return type: numpy.ndarray
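Given the board encoding (0 = empty), an Nx3 array of available actions maps directly onto `np.argwhere`. That function returns one row per matching cell and one column per array dimension. The sketch below is one plausible implementation, not necessarily the package's own. It also shows how to pick a random legal move from the result.

```python
import numpy as np


def get_available_actions(board):
    """Sketch: (block, row, col) of every empty cell, as an Nx3 int array."""
    # np.argwhere yields indices in row-major order, one row per True entry
    return np.argwhere(board == 0)


board = np.zeros((3, 3, 3), dtype=np.int64)
board[1, 1, 1] = 1  # centre cell is taken

actions = get_available_actions(board)
print(actions.shape)  # (26, 3)

# Choose a random legal move
rng = np.random.default_rng(0)
random_action = actions[rng.integers(len(actions))]
```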

static get_empty_board()

Utility function that returns an empty board

Returns: board – 3x3x3 numpy int array of zeros
Return type: numpy.ndarray

metadata = {'render.modes': ['human']}

observation_space = Box(low=-1, high=1, shape=(3, 3, 3), dtype=int64)

render(mode='human', close=False)

Render the environment to the screen

reset()

Resets the environment to its initial state and returns the initial observation

Returns: observation – 3x3x3 numpy int array representing the new board state
Return type: numpy.ndarray

reward_range = (0, 1)

step(action)

Execute one time step within the environment

Parameters

action (numpy.ndarray) – 3-element numpy int array which represents the cell to play on. First element specifies block, second row, and third column.

Returns

observation (numpy.ndarray) – 3x3x3 numpy int array representing the new board state
reward (int) – Reward obtained after the current move: 1 if the game is won, else 0
done (bool) – True if the game is over else False
info (dict) – Additional information for debugging
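The sketch below illustrates the documented reset()/step() contract without needing the package installed. It uses a hypothetical stand-in class, `ToyTicTacToe3D`, that returns the same (observation, reward, done, info) tuple. Several behaviours are assumptions not stated above: player 1 moves first and players alternate, the game ends on a win or a full board, a win is any straight line of three, and `info` is empty. The random-play loop samples only empty cells because `step()` does not validate moves.

```python
import itertools

import numpy as np


class ToyTicTacToe3D:
    """Hypothetical stand-in mirroring the documented reset()/step() contract."""

    def __init__(self):
        self.board = np.zeros((3, 3, 3), dtype=np.int64)
        self.player = 1  # assumption: player 1 ('x') moves first

    def reset(self):
        self.board = np.zeros((3, 3, 3), dtype=np.int64)
        self.player = 1
        return self.board.copy()

    def _wins(self, cell):
        """True if the token on `cell` lies on a complete line of three (assumed rule)."""
        p = self.board[cell]
        # All 26 directions; d and -d are both visited, which is redundant but harmless
        for d in itertools.product((-1, 0, 1), repeat=3):
            if d == (0, 0, 0):
                continue
            count = 1
            # Count consecutive same-player cells walking from `cell` along +d and -d
            for sign in (1, -1):
                nxt = tuple(c + sign * di for c, di in zip(cell, d))
                while all(0 <= i < 3 for i in nxt) and self.board[nxt] == p:
                    count += 1
                    nxt = tuple(c + sign * di for c, di in zip(nxt, d))
            if count >= 3:
                return True
        return False

    def step(self, action):
        cell = tuple(int(i) for i in action)
        self.board[cell] = self.player  # no validation, matching "without safety checks"
        won = self._wins(cell)
        done = won or not (self.board == 0).any()
        reward = 1 if won else 0
        self.player = -self.player  # assumption: players alternate
        return self.board.copy(), reward, done, {}


# Usage: play one random game, sampling only empty cells
env = ToyTicTacToe3D()
obs = env.reset()
rng = np.random.default_rng(0)
done = False
moves = 0
while not done:
    free = np.argwhere(obs == 0)
    obs, reward, done, info = env.step(free[rng.integers(len(free))])
    moves += 1
```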

symbols = '-xo'