Daniel D. Johnson


Augmented Shanahan: Game Logic


This is the third part of a three-part writeup on my augmented Shanahan project. If you would like to read about the algorithms behind the augmented reality, you should probably start with part one.

In this final part of the writeup, I will be discussing the augmented Shanahan snake game itself. The game is made of two parts: the client app, which runs on the mobile device, and the server app, which runs on my website server. The client is responsible for all of the detection, tracking, and extrapolation that I discussed in the first two parts; it also accepts user input and renders the current game state. The server receives the user input from each client device, processes the snake game, and then broadcasts the updated game state to each client.

The first step of the client app is obtaining the image data from the device. I use the getUserMedia API to do this. This allows me to capture the camera data to a hidden video element, which I then draw into a canvas element so that I can extract an imageData array for each frame I want to process. Unfortunately, this API is not supported on iOS, so the augmented reality does not work on iOS devices. On devices that do not support this API, I instead simply use a static image I took of the wall as the input.
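A minimal sketch of this capture pipeline (not the project's exact code) looks something like the following, using the prefixed, callback-style getUserMedia that browsers supported at the time:

```javascript
// Sketch: capture the camera into a hidden <video>, then pull pixel data
// out through a canvas. Prefixed getUserMedia, as was current at the time.
var getUserMedia = navigator.getUserMedia ||
                   navigator.webkitGetUserMedia ||
                   navigator.mozGetUserMedia;

var video = document.createElement('video');   // hidden video element
var canvas = document.createElement('canvas');
var ctx = canvas.getContext('2d');

function startVideo(stream) {
  // Newer browsers would use video.srcObject = stream instead.
  video.src = window.URL.createObjectURL(stream);
  video.play();
}

function handleError(err) {
  // No camera access: fall back to a static image of the wall.
  console.log('getUserMedia failed', err);
}

if (getUserMedia) {
  getUserMedia.call(navigator, { video: true, audio: false },
                    startVideo, handleError);
} else {
  handleError(new Error('getUserMedia not supported'));  // e.g. iOS
}

function grabFrame() {
  // Draw the current video frame into the canvas and pull out its pixels.
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  ctx.drawImage(video, 0, 0);
  return ctx.getImageData(0, 0, canvas.width, canvas.height);
}
```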

I tested the app on my HTC One Android phone, using both Firefox and Chrome. Since my phone has both a front-facing and a back-facing camera, there was the issue of making sure I got the correct input. In Firefox, the user can choose which camera to use. In Chrome, the user has no control over which camera is chosen, but Chrome does support the MediaStreamTrack.getSources API, which lets me find the camera that Chrome reports as facing the "environment" and request that camera's output.
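The camera selection works roughly like the sketch below (the getSources API has since been deprecated; startVideo and handleError are the same placeholder handlers as in the previous sketch):

```javascript
// Rough sketch of picking the back-facing camera in Chrome via the
// (now-deprecated) MediaStreamTrack.getSources API.
MediaStreamTrack.getSources(function (sources) {
  var sourceId = null;
  for (var i = 0; i < sources.length; i++) {
    // Chrome reports the back camera as facing the "environment".
    if (sources[i].kind === 'video' && sources[i].facing === 'environment') {
      sourceId = sources[i].id;
    }
  }
  // Request that specific camera if we found one; otherwise take whatever
  // camera Chrome picks by default.
  var constraints = sourceId
    ? { video: { optional: [{ sourceId: sourceId }] } }
    : { video: true };
  navigator.webkitGetUserMedia(constraints, startVideo, handleError);
});
```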

For some reason, Chrome's implementation of getUserMedia does not autofocus the camera, so the input is slightly blurry. I thus did the majority of my testing in Firefox, which has a better autofocus implementation.

An unfortunate aspect of this API is that it is very slow compared to the rest of my code. Despite all of the math and image processing I perform to detect and track the windows, simply getting the image data from the camera takes about 100 ms, longer than all of the rest of my code combined. In addition, loading the image data into the graphics card for WebGL takes approximately 50 ms. This is responsible for the very low frame rate of the augmented reality on mobile devices: it takes almost 200 ms to render a single frame to the screen. I did my best to optimize my code, but unfortunately there is not much I can do to speed this up.
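For reference, the per-frame GPU upload boils down to a texImage2D call like the one in this sketch, assuming a WebGL context gl already exists and imageData comes from something like the grabFrame above; the texImage2D call itself is the expensive part on a mobile GPU:

```javascript
// One-time texture setup.
var texture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, texture);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);

function uploadFrame(imageData) {
  // Re-upload the latest camera frame to the GPU every time we render.
  gl.bindTexture(gl.TEXTURE_2D, texture);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, imageData);
}
```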

The client app uses a slightly modified version of virtualjoystick.js to get user input. I modified it so that, if you move your finger farther than the radius of the joystick, the joystick base is dragged along with your finger instead of staying put. This made it easier to control the snake for my purposes. Each time the client detects a change in the direction of the joystick, it sends the new direction to the server.
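The two input tweaks amount to something like the sketch below. The names here are illustrative, not virtualjoystick.js's actual internals; socket is the socket.io client connection, and 'turn' is a made-up event name:

```javascript
var JOYSTICK_RADIUS = 50;
var joystick = { baseX: 0, baseY: 0 };  // base position set on touchstart
var lastDirection = null;

function onTouchMove(touchX, touchY) {
  var dx = touchX - joystick.baseX;
  var dy = touchY - joystick.baseY;
  var dist = Math.sqrt(dx * dx + dy * dy);
  if (dist > JOYSTICK_RADIUS) {
    // Instead of clamping the stick, drag the base toward the finger so the
    // joystick follows along once the finger moves past its radius.
    var excess = dist - JOYSTICK_RADIUS;
    joystick.baseX += (dx / dist) * excess;
    joystick.baseY += (dy / dist) * excess;
    dx = touchX - joystick.baseX;
    dy = touchY - joystick.baseY;
  }

  // Quantize the stick offset to one of the four snake directions, and only
  // tell the server when the direction actually changes.
  var direction = Math.abs(dx) > Math.abs(dy)
    ? (dx > 0 ? 'right' : 'left')
    : (dy > 0 ? 'down' : 'up');
  if (direction !== lastDirection) {
    lastDirection = direction;
    socket.emit('turn', direction);
  }
}
```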

The server app runs on node.js, and communicates with the client using socket.io. The app keeps track of every player's snake, and every 500 ms, it updates the position of each snake, handles collisions, and then broadcasts the state to each player that is currently connected. When a new device starts the client app, it connects to the server and is assigned a new id. This id persists until the user disconnects from the server. Whenever a new user connects, the server initializes a new snake at a random open position, and whenever a user disconnects, the server removes the player and marks that player's snake as dead. (It does not instantly remove the snake, however. Continuing players have to avoid the dead body until it disappears on its own.) If all players leave, the server resets the game and pauses the timer until a new player arrives.
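In outline, the server loop looks roughly like this sketch (assuming a recent socket.io; the 'turn' and 'state' event names and the spawnSnake/stepGame helpers are placeholders, not the actual project code):

```javascript
var io = require('socket.io')(8080);
var snakes = {};       // playerId -> snake; dead bodies linger until they decay
var playerCount = 0;
var tickTimer = null;

io.on('connection', function (socket) {
  playerCount++;
  snakes[socket.id] = spawnSnake();          // new snake at a random open spot
  if (tickTimer === null) {
    tickTimer = setInterval(tick, 500);      // (re)start the game clock
  }

  socket.on('turn', function (direction) {   // joystick direction changes
    if (snakes[socket.id]) snakes[socket.id].direction = direction;
  });

  socket.on('disconnect', function () {
    playerCount--;
    if (snakes[socket.id]) snakes[socket.id].dead = true;  // keep the body around
    if (playerCount === 0) {                 // nobody left: reset and pause
      clearInterval(tickTimer);
      tickTimer = null;
      snakes = {};
    }
  });
});

function tick() {
  stepGame(snakes);                          // advance snakes, handle collisions
  io.emit('state', snakes);                  // broadcast the updated state
}
```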

The application is very responsive on a computer, but it is less responsive on a mobile device, mainly due to the lack of processing power. My first version of the client app had a large delay: whenever a user tried to turn the snake, that turn would not occur until much later. As it turns out, this delay was not due to network delays, as I had originally thought, but was more closely related to the way I was rendering and updating the images. To see how this occurred, here is an example scenario. Assume that the server waits 500 ms before advancing the game state, that the client takes 200 ms to render a single frame, and that it takes 60 ms for data to pass from client to server or vice versa.

  • Time 0000: Both client and server are at state 1. Client starts to render state 1.
  • Time 0150: Server advances the state, to state 2, and sends the updated state to the client.
  • Time 0200: Client finishes rendering state 1. User first sees state 1. Client begins rendering its current state (state 1) again, using new camera data.
  • Time 0210: Client receives data packet containing state 2, but does not process it because it is busy rendering. (JavaScript does not have threading.)
  • Time 0400: Client finishes rendering state 1, and processes the data packet containing state 2. Client begins rendering state 2.
  • Time 0450: User, who still sees state 1, tries to turn the snake. User input is not processed because client is busy rendering.
  • Time 0600: Client finishes rendering state 2, and processes the user input, sending the turn to the server. Client begins rendering state 2 again with new camera data.
  • Time 0650: Server advances the state, to state 3, and sends the updated state to the client.
  • Time 0660: Server receives the user input, and changes the snake direction.
  • Time 0710: Client receives data packet with state 3.
  • Time 0800: Client finishes rendering state 2, begins rendering state 3.
  • Time 1000: Client finishes rendering state 3, begins rendering it again.
  • Time 1150: Server advances the state, to state 4, and sends the updated state to the client. (State 4 contains the turned snake.)
  • Time 1200: Client finishes rendering state 3, begins rendering it again.
  • Time 1210: Client receives data packet containing state 4.
  • Time 1400: Client finishes rendering state 3, begins rendering state 4.
  • Time 1600: Client finishes rendering state 4. User finally sees the turn, three states and more than a second after attempting to turn.

The lack of synchronization, combined with the delay in rendering each state update, thus made the app seem unresponsive and the game completely unplayable. When the user saw state 1, the server was already almost at state 3, so the user's input took effect not in state 2, but in state 4.

In order to make the game more playable, I made a few changes and sacrificed the frame rate. The changes I made were:

  • I swapped the order of the rendering and tracking. Thus, instead of first getting updated camera data, tracking the wall, and then rendering to the screen, I first render to the screen, then update the camera data and track the wall. This gets each state update onto the screen sooner, at the cost of drawing camera data that is one frame old.
  • I inserted breaks in between the various time-intensive phases of the tracking and rendering. Basically, I told the browser to do anything in its queue (like updating the screen with the rendered frame, reading user input, and processing state updates) before continuing with the next phase of tracking. This ensures that the user input and drawing occur as early as possible, instead of blocking until tracking and rendering complete.
  • I time the tracking and rendering. If the processing is fast enough that it could render another frame before getting a state update, it goes ahead and renders another frame. But if the processing would delay the state update, the app simply waits for the state update. This reduces the frame rate, but ensures that any state updates are drawn as soon as they are received. (A sketch of this scheduling follows this list.)
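Put together, the reworked client loop looks roughly like the following sketch. renderScene, grabCameraFrame, trackWall, and applyState stand in for the actual tracking and rendering phases; the setTimeout(..., 0) calls yield control back to the browser so queued work (screen updates, touch input, socket messages) runs in between phases:

```javascript
var TICK_MS = 500;         // server update interval
var lastStateTime = 0;     // when the last state update arrived
var phaseCost = 200;       // estimated cost of one full pass, in ms
var running = false;

function frameLoop() {
  running = true;
  var start = Date.now();
  renderScene();                         // draw the latest game state first
  setTimeout(function () {
    grabCameraFrame();                   // then pull in fresh camera data...
    setTimeout(function () {
      trackWall();                       // ...and re-track the wall in it
      phaseCost = Date.now() - start;    // running estimate of a full pass

      // Only go around again if another pass would finish before the next
      // state update is due; otherwise idle until the server sends one.
      if (Date.now() + phaseCost < lastStateTime + TICK_MS) {
        setTimeout(frameLoop, 0);
      } else {
        running = false;
      }
    }, 0);
  }, 0);
}

socket.on('state', function (state) {
  lastStateTime = Date.now();
  applyState(state);
  if (!running) setTimeout(frameLoop, 0);  // draw new states as soon as they arrive
});
```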

The updated timeline looks a bit like this:

  • Time 0000: Both client and server are at state 1. Client begins rendering state 1.
  • Time 0050: Client finishes rendering state 1, and starts preprocessing the next render.
  • Time 0150: Server advances the state, to state 2, and sends the updated state to the client.
  • Time 0200: Client finishes preprocessing the next render. Client does not render state 1 again, as it expects to receive a new state.
  • Time 0210: Client receives data packet containing state 2, and processes it immediately. Client begins rendering state 2.
  • Time 0260: Client finishes rendering state 2, and starts preprocessing the next render.
  • Time 0410: Client finishes preprocessing the next render. As it has enough time to render again, it renders state 2 again with the updated camera view.
  • Time 0460: Client finishes rendering state 2, and starts preprocessing the next render.
  • Time 0500: User, who sees state 2, tries to turn the snake. User input is not processed immediately because the client is busy preprocessing.
  • Time 0560: Client pauses during preprocessing to process user input. User input is sent to server.
  • Time 0610: Client finishes preprocessing the next render. Client decides it does not have enough time to render again before the next state update, and begins to idle. Meanwhile (at time 0620), the server receives the user input and changes the snake direction.
  • Time 0650: Server advances the state, to state 3, and sends the updated state to the client. (State 3 contains the turned snake.)
  • Time 0710: Client receives data packet containing state 3, and processes it immediately. Client begins rendering state 3.
  • Time 0760: Client finishes rendering state 3, and starts preprocessing the next render. User sees the turn, one state later, as expected.

With the changes, the device is only about 100 milliseconds behind the server, as opposed to a second behind as before. That means that, when the user sees a state and tries to turn the snake, that turn is registered before the server advances, and so the turn takes effect at the next state update. Unfortunately, it also means that the frame rate of the application is throttled by the frequency of updates.

All of the code for both the client and the server applications is available on GitHub. As I discussed earlier, you can run the game yourself by visiting this address on a non-iOS device with a back-facing camera. It works best when the wall is evenly lit, and is more fun when there are multiple people accessing it simultaneously. Enjoy!