Sensing
The Connect 4 robot platform senses the world through a webcam. This is the only sense it has available to work out what is going on in the game. The camera captures an image of the scene, which the robot's "brain" (a computer) attempts to understand. Knowing that it is looking for a blue Connect 4 board against a blue background, the program extracts key features from the scene and interprets them.
As humans we are strongly visual creatures, so we think it is easy to look at a picture of a Connect 4 board and work out the state of the game. A computer finds it very difficult to interpret a picture reliably. We make the scene simple (by using a plain blue background behind the board) and give the computer knowledge of what it is looking for, in order for it to work. Even then it is not perfect. The need to provide the computer with knowledge about what it is looking for raises an interesting question. Is the same true for humans? Can we only see things that we know about?
Technical details

The vision element is written in C++ using the Open Computer Vision (OpenCV) libraries. (The OpenCV libraries are an open source project and are available from here if anyone wishes to play around with computer vision.) The Connect 4 robot's vision is a multistage process with two main steps: locating the board and determining the presence of pieces.
An iterative process now starts, which attempts to fit horizontal and vertical lines through the points. (A horizontal line is defined as one lying within +/- 10 degrees of horizontal in the image, and a vertical line as one lying within +/- 5 degrees of vertical.) The process goes as follows:
The result is shown below:
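The angular tolerances used to accept or reject candidate lines can be sketched as a small classifier. This is an illustrative helper, not the actual implementation; it assumes angles are measured in degrees from the image's x-axis.

```cpp
#include <algorithm>
#include <cmath>
#include <string>

// Classify a candidate line by its angle, using the tolerances given
// above: within +/- 10 degrees of horizontal, or +/- 5 degrees of
// vertical; anything else is discarded by the fitting process.
std::string classifyLine(double angleDeg) {
    // Normalise to [0, 180) so that, e.g., 175 degrees ~ -5 degrees.
    double a = std::fmod(std::fmod(angleDeg, 180.0) + 180.0, 180.0);
    double fromHorizontal = std::min(a, 180.0 - a);
    double fromVertical   = std::fabs(a - 90.0);
    if (fromHorizontal <= 10.0) return "horizontal";
    if (fromVertical <= 5.0)    return "vertical";
    return "neither";
}
```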
The horizontal and vertical lines found in the step above are projected until they meet each other. The two places where the majority of the lines meet are taken as the horizontal and vertical vanishing points for the image (see artistic literature on perspective in drawings). Using these two vanishing points as a guide, the system then finds the best fit of a 6 by 7 grid (i.e. something matching the shape of a Connect 4 board) to the likely circle centres from step 4 above.
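Projecting lines until they meet amounts to intersecting them pairwise; a cluster of such meeting points gives a vanishing point. A minimal two-line sketch, using a hypothetical point-plus-direction `Line` type:

```cpp
#include <cmath>
#include <optional>
#include <utility>

// A line in point-plus-direction form (an illustrative representation).
struct Line { double x0, y0, dx, dy; };

// Extend two image lines until they meet; where many such meeting
// points coincide, the system takes that as a vanishing point.
std::optional<std::pair<double, double>> intersect(const Line& a, const Line& b) {
    double det = a.dx * b.dy - a.dy * b.dx;
    if (std::fabs(det) < 1e-12)
        return std::nullopt;  // (near-)parallel: no finite meeting point
    double t = ((b.x0 - a.x0) * b.dy - (b.y0 - a.y0) * b.dx) / det;
    return std::make_pair(a.x0 + t * a.dx, a.y0 + t * a.dy);
}
```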
These last two steps may seem complex, but by knowing that it is looking for a regular board consisting of an array of circles, and by estimating the board's likely vanishing points, the system can robustly adapt (within reason) to different positions and poses of the Connect 4 board within the image seen by the camera.

Checking for pieces

Having estimated the location of the board, the system checks for the presence of pieces at each of the estimated circle centres. It samples the colour around each circle centre and then determines whether a piece is there using a set of rules learnt by a decision tree trained on previous images of Connect 4 boards. The rules roughly correspond to checking how blue the region is, but also take varying lighting conditions into account. The result, shown below, indicates the location of pieces with green 'x's and empty locations with blue 'e's.
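The trained decision-tree rules themselves aren't reproduced here, so the sketch below stands in for them with hand-picked thresholds: a cell reads as empty when the sampled colour looks like the blue board, with a guard for deep shadow. The threshold values are illustrative, not the learnt ones.

```cpp
#include <algorithm>

struct RGB { int r, g, b; };  // colour sampled around a circle centre

// Stand-in for the learnt rules: roughly "how blue" the sample is.
bool pieceAt(const RGB& c) {
    int brightness = std::max({c.r, c.g, c.b});
    if (brightness < 40)
        return false;  // very dark: treat as shadowed board, not a piece
    bool looksBlue = c.b > c.r + 30 && c.b > c.g + 30;  // board colour
    return !looksBlue;  // not board-blue => a piece is present
}
```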
The system is colourblind to red and yellow pieces: it only checks whether a piece is there or not. Varying lighting conditions can make distinguishing red from yellow very hard; for example, a bright spot of reflection off a red piece can make it look yellow, whereas deep shadow on a yellow piece can make it look red. To get round this we (as humans) typically move our heads. The system cannot control the placement of the camera, so it has to live with reflections and shadows. It is therefore much more reliable to ignore red and yellow and use the order of play to determine colour. NB this also means the system cannot detect if someone plays the wrong colour. A few checks are performed on the machine's interpretation of the image: