Recently, I submitted a paper titled “Learning Graphical State Transitions” to the ICLR conference! In it, I describe a new type of neural network architecture, called a Gated Graph Transformer Neural Network, that is designed to use graphs as an internal state. I demonstrate its performance on the bAbI tasks as well as on some other tasks with complex rules. While the main technical details are provided in the paper, I figured it would be worthwhile to describe the motivation and basic ideas here.
Note: Before I get too far into this post, if you have read my paper and are interested in replicating or extending my experiments, the code for my model is available on GitHub.
Another thing that I’ve noticed is that almost all of the papers on machine learning are about successes. This is an example of an overall trend in science to focus on the positive results, since they are the most interesting. But it can also be very useful to discuss the negative results as well. Learning what doesn’t work is in some ways just as important as learning what does, and can save others from repeating the same mistakes. During my development of the GGT-NN, I had multiple iterations of the model, which all failed to learn anything interesting. The version of the model that worked was thus a product of an extended cycle of trial and error. In this post I will try to describe the failed models as well, and give my speculative theories for why they may not have been as successful.
This summer, I had the chance to do research at Mudd as part of the Intelligent Music Software team. Every year, under the advisement of Prof. Robert Keller, members of the Intelligent Music Software team work on computational jazz improvisation, usually in connection with the Impro-Visor software tool. Currently, Impro-Visor uses a grammar-based generation approach for most of its generation ability. The goal this summer was to try to integrate neural networks into the generation process.
Last semester, I was part of the Clay Wolkin Fellowship at Harvey Mudd. The fellowship consists of a group of students (mostly Engineering majors, but some CS also) who work on interesting electrical-engineering-focused projects. The project I worked on was the “LEG Processor”, an open-source pipelined processor that implements the ARMv5 instruction set and can boot the Linux kernel (3.19) in simulation. We recently published a paper describing our work in the European Workshop for Microelectronics! You can read the paper here. Or read on for a high-level overview of the work I did on the project.
It’s hard not to be blown away by the surprising power of neural networks these days. With enough training, so called “deep neural networks”, with many nodes and hidden layers, can do impressively well on modeling and predicting all kinds of data. (If you don’t know what I’m talking about, I recommend reading about recurrent character-level language models, Google Deep Dream, and neural Turing machines. Very cool stuff!) Now seems like as good a time as ever to experiment with what a neural network can do.
For a while now, I’ve been floating around vague ideas about writing a program to compose music. My original idea was based on a fractal decomposition of time and some sort of repetition mechanism, but after reading more about neural networks, I decided that they would be a better fit. So a few weeks ago, I got to work designing my network. And after training for a while, I am happy to report remarkable success!
This is the second part of a three-part writeup on my augmented Shanahan project. You may want to start with part one. For some background math, you may also want to read the post about my 5C Hackathon entry
From the detection and tracking algorithms, I was able to generate a 2D perspective transformation matrix that mapped coordinates on the wall to pixels in the image. In order to do 3D rendering, however, I needed a 3D transformation matrix. The algorithm I ended up using is based on this post on Stack Exchange, this gist on GitHub, and this Wikipedia page. I will attempt to describe how the algorithm works in the rest of this post.
When I was thinking of ideas for the 5C Hackathon back in November, I thought it would be really cool to do augmented reality on buildings. They are large, somewhat unnoticed features of the environment, and they are relatively static, not moving or changing over time. They seemed like the perfect thing to overlay with augmented reality. And since they are so large, it opens up the possibility for multiple people to interact with the same wall simultaneously.
Note: This is mostly an explanation of how I implemented detection. If you prefer, you can skip ahead to the 3D extrapolation and rendering in part 2, or to the game logic in part 3.
If you instead want to try it out on your own, click here. The augmented reality works on non-iOS devices with a back-facing camera and a modern browser, but you have to be in the Shanahan building to get the full effect. Denying camera access or accessing it on an incompatible device will use a static image of the building instead of using the device camera. Works best in evenly-lit conditions. Image quality and detection may be slightly better in Firefox than Chrome, which doesn’t autofocus very well.
I’ve been at Harvey Mudd College for almost three months now, and I’m enjoying it a lot. So far, I’ve participated in two hackathons: first, MuddHacks, a hardware hackathon, and then last weekend, the 5C Hackathon, a software hackathon that spans the 5 Claremont Colleges (Harvey Mudd, Pomona, Scripps, Pitzer, Claremont McKenna). In this post, I’m going to be talking about the project I made for the second of the two hackathons.
But wait! You haven’t talked about the first one yet! You can’t do a part 2 without doing a part 1! I’d like to tell you all about how I got caught in a time machine that someone built in the hardware hackathon and so my past self posted about the first hackathon in this timeline’s future, but then I’d have to get the government to come brainwash you to preserve national security I’d be lying. So instead I’ll explain that I haven’t actually finished the project from the first hackathon yet. Both of these were 12-hour hackathons, and about 10.5 hours into the first one, I realized that our chosen camera angle introduced interference patterns that made our image analysis code completely ineffective. I’m hoping to find some time this weekend or next to finish that up, and I’ll write that project up then.
After completing the basic design and software for the whiteboard drawing bot, I decided to make an interactive simulator and shape editor to allow people to generate their own commands. I thought it would be cool to share it as well, in case other people wanted to play with the stroking/filling algorithms or use it to run their own drawing robots or do something else with it entirely.
For the simulator, I wrote a parser to parse the command sequence, and then animated it drawing the command out onto the screen. The parser is written with PEG.js, which I'll be discussing a bit later. The parameters for the generation and rendering are controlled using DAT.gui, and the drawing itself is done using two layered canvases: the bottom one to hold the actual drawing that persists from frame to frame, and the top one to render the arms, which are cleared and redrawn each frame. I separated them because I did not want to have to re-render the entire drawing each time the simulator drew anything new.
In order to actually use my whiteboard drawing bot, I needed a way to generate the commands to be run by the Raspberry Pi. Essentially, this came down to figuring out how to translate a shape into a series of increases and decreases in the positions of the two stepper motors.
The mechanical design of the contraption can be abstracted as simply two lines: one bound to the origin point and one attached to the end of the first. The position of the pen is then the endpoint of the second arm:
The arms cannot move freely, however. The first arm can only be at discreet intervals, dividing the circle into the same number of steps as the stepper motor has. The second arm can actually only be in half that many positions: it has the same interval between steps, but can only sweep out half a circle. Furthermore, the second arm's angle is relative to the first's: if each step is 5 degrees and the first arm is on the third step, then the second arm's step zero follows the first arm's angle and continues at 15 degrees.
For the last couple of weeks, I have been working on creating a "whiteboard drawing bot", a Raspberry-Pi-powered contraption that can draw shapes and text on a whiteboard. After four redesigns and about a thousand lines of code, I'm finally finished. Tada!
Anyways, now that I have finished building it, I am going to write a bit about how I did so in a few posts. For this first post, I'm going to be talking about the hardware behind the bot and a little bit of the software on the Raspberry Pi. Then, later, I'll talk about the custom software I wrote to translate shapes and lines into commands for the Pi.