gym_tictactoe_np.envs.tictactoe_np_env module
- class gym_tictactoe_np.envs.tictactoe_np_env.TicTacToeEnv
Bases: gym.core.Env
3D TicTacToe environment without safety checks
The board is stored as a 3x3x3 numpy int array with player tokens. A value of 0 denotes an empty cell, 1 denotes player 1 (‘x’), and -1 denotes player 2 (‘o’).
Actions are given by a 3-element numpy int array with values in {0, 1, 2}. The first element selects the block to move in, the second the row, and the third the column.
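The encoding described above can be illustrated with plain numpy. This is a minimal sketch that does not use the environment class itself; the variable names are illustrative only.

```python
import numpy as np

# Empty 3x3x3 board: 0 = empty, 1 = player 1 ('x'), -1 = player 2 ('o').
board = np.zeros((3, 3, 3), dtype=int)

# An action is (block, row, column), each value in {0, 1, 2}.
action = np.array([1, 0, 2])

# Converting the action to a tuple addresses exactly one cell.
# Indexing with the array itself would instead select whole blocks.
board[tuple(action)] = 1  # player 1 plays block 1, row 0, column 2
```

Since the environment is documented as running without safety checks, callers may want to validate the action range and cell occupancy themselves before playing a move.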
- action_space = MultiDiscrete([3 3 3])
- check_win(action)
Checks if current action wins the game
- Parameters
action (numpy.ndarray) – 3-element numpy int array which represents the cell to play on. First element specifies block, second row, and third column.
- Returns
done – True if the current action wins the game, otherwise False
- Return type
bool
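One way such a check could work is sketched below. This is not the package's implementation: the function name `wins` is hypothetical, it takes the board explicitly rather than reading it from the environment, and it assumes the token has already been placed at `action`. It also assumes that any straight line of three, including diagonals across blocks, counts as a win.

```python
import itertools
import numpy as np

def wins(board, action):
    """Return True if the token at `action` completes a line of three."""
    pos = np.asarray(action)
    player = board[tuple(pos)]
    if player == 0:
        return False  # empty cell, nothing to check

    # Try every direction through the cell. Opposite directions describe
    # the same line, so some work is repeated; this keeps the code short.
    for step in itertools.product((-1, 0, 1), repeat=3):
        if step == (0, 0, 0):
            continue
        step = np.array(step)
        count = 1  # the cell itself
        for sign in (1, -1):
            p = pos + sign * step
            # Walk outward while staying on the board and on our own tokens.
            while np.all((p >= 0) & (p <= 2)) and board[tuple(p)] == player:
                count += 1
                p = p + sign * step
        if count == 3:
            return True
    return False
```

Checking only the lines that pass through the last move avoids scanning all 49 possible lines on every step.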
- static get_available_actions(board)
Utility function that returns currently available moves
- Parameters
board (numpy.ndarray) – 3x3x3 numpy int array representing the board state
- Returns
available_actions – Nx3 numpy array with the N currently available actions
- Return type
numpy.ndarray
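The documented return shape (Nx3, one row per free cell) matches what `numpy.argwhere` produces for the empty cells. The sketch below is an assumption about how the result could be computed, not the package's code; `available_actions` is a hypothetical name.

```python
import numpy as np

def available_actions(board):
    """Return an Nx3 array holding the (block, row, column) of each empty cell."""
    return np.argwhere(board == 0)

board = np.zeros((3, 3, 3), dtype=int)
board[1, 0, 2] = 1  # one occupied cell

actions = available_actions(board)  # 26 rows remain, each a valid action
```

Each row of the result can be passed to `step` as an action.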
- static get_empty_board()
Utility function that returns an empty board
- Returns
board – 3x3x3 numpy int array of zeros
- Return type
numpy.ndarray
- metadata = {'render.modes': ['human']}
- observation_space = Box([[[-1 -1 -1] [-1 -1 -1] [-1 -1 -1]] [[-1 -1 -1] [-1 -1 -1] [-1 -1 -1]] [[-1 -1 -1] [-1 -1 -1] [-1 -1 -1]]], [[[1 1 1] [1 1 1] [1 1 1]] [[1 1 1] [1 1 1] [1 1 1]] [[1 1 1] [1 1 1] [1 1 1]]], (3, 3, 3), int64)
- render(mode='human', close=False)
Render the environment to the screen
- reset()
Reset environment to initial state and return initial observation
- Returns
observation – 3x3x3 numpy int array representing the new board state
- Return type
numpy.ndarray
- reward_range = (0, 1)
- step(action)
Execute one time step within the environment
- Parameters
action (numpy.ndarray) – 3-element numpy int array which represents the cell to play on. First element specifies block, second row, and third column.
- Returns
observation (numpy.ndarray) – 3x3x3 numpy int array representing the new board state
reward (int) – Reward obtained after the current move: 1 if the move wins the game, otherwise 0
done (bool) – True if the game is over else False
info (dict) – Additional information for debugging
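A random playout built only on the documented `reset`, `step`, and `get_available_actions` signatures might look like the sketch below. The helper name `random_playout` is hypothetical. How the environment alternates between players, and whether `done` is set on a draw, is not stated above, so the loop also stops when no moves remain.

```python
import numpy as np

def random_playout(env, seed=0):
    """Play uniformly random legal moves until the episode ends.

    `env` is any object that follows the reset/step API documented above.
    Returns the accumulated reward and the number of moves played.
    """
    rng = np.random.default_rng(seed)
    observation = env.reset()
    done = False
    total_reward = 0
    moves = 0
    while not done:
        actions = env.get_available_actions(observation)
        if len(actions) == 0:
            break  # board is full; guard in case `done` was not set
        action = actions[rng.integers(len(actions))]
        observation, reward, done, info = env.step(action)
        total_reward += reward
        moves += 1
    return total_reward, moves
```

Restricting moves to the available actions matters here because the environment is documented as performing no safety checks of its own.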
- symbols = '-xo'
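The ordering of `symbols` is consistent with indexing the string directly by cell value: 0 gives '-', 1 gives 'x', and -1 wraps around to the last character, 'o'. Whether `render` actually works this way is an assumption; the sketch below, with the hypothetical helper `board_to_text`, only shows that the mapping is possible.

```python
import numpy as np

SYMBOLS = '-xo'  # same string as the documented `symbols` attribute

def board_to_text(board):
    """Format the board as three 3x3 blocks separated by blank lines."""
    blocks = []
    for block in board:
        # SYMBOLS[-1] is 'o', so cell value -1 maps to player 2 directly.
        rows = [' '.join(SYMBOLS[int(v)] for v in row) for row in block]
        blocks.append('\n'.join(rows))
    return '\n\n'.join(blocks)

board = np.zeros((3, 3, 3), dtype=int)
board[0, 0, 0] = 1
board[0, 0, 1] = -1
text = board_to_text(board)
```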