Crowd Computing

Last year, I was a part of the inaugural HackGT. This is an annual hackathon sponsored by Georgia Tech, which seeks to gather programmers from all around the country for one weekend to develop the best app they can. The grand prize is $60,000. The prize drew a lot of interest, but what compelled me to participate was the presence of a variety of big companies with new technologies. One such pre-announced presence was Intel, with an early look at the Edison board I wrote about last week. The board fascinated me, and the ability to hack on one for a weekend before it was even available for purchase ensured my name would be on the HackGT signup list.

Hackathons

If this word is unfamiliar to you, it’s time to learn. Hackathons are spreading, becoming more frequent at software companies, schools, clubs, even cities (see HackATL) because of their tendency to produce minimum viable product prototypes in a short amount of time. Essentially, a hackathon is just a gathering of programmers with the resources needed for extended programming sessions. Often these hackathons feature diversions and entertainment to allow for breaks, food and drink so you never need to leave, and caffeine and/or alcohol for those late-night coding sessions. At the end of the 24-72 hour span, the apps created by the participating teams and individuals are presented to judges, who pick the winners. Winners may be awarded prizes, have their idea produced, or even be offered a job.

Crowd Computing

Crowd computing was my HackGT project, done over a 48-hour period with two teammates. (See how much more sense that makes after the intro?) The idea was to create a big data platform on a tiny board. These Edison boards were great, but they lacked space and computational power compared to traditional computers. In theory, however, their price meant that there would be many of them. The number of boards, combined with a tendency to be used for passive computation, made them ripe for use in cloud computing. Essentially, jobs that couldn’t be run on one board could be run on LOTS of boards working together. A simple website would let you enroll your board in the program by installing a tiny script. This script reports to the webserver every couple of minutes to verify the availability of resources on the board. When a job submitted to the website needed boards, yours could be chosen, and its unused resources would be used to compute a portion of the job. When a few dozen boards contribute this way, the resulting power is pretty astounding.
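To make the check-in idea concrete, here is a minimal sketch of the bookkeeping a webserver could do with those periodic reports. This is not the code from our project: the class name, the five-minute cutoff, and the choice of free memory as the reported resource are all assumptions made for illustration.

```python
import time

# Hypothetical server-side bookkeeping; names and thresholds are assumptions.
MAX_AGE_SECONDS = 300  # treat a board as offline after 5 minutes of silence


class BoardRegistry:
    def __init__(self):
        # board_id -> (free_memory_bytes, timestamp of last check-in)
        self._boards = {}

    def check_in(self, board_id, free_memory_bytes, now=None):
        """Record a report from an enrolled board."""
        if now is None:
            now = time.time()
        self._boards[board_id] = (free_memory_bytes, now)

    def available(self, min_free_bytes, now=None):
        """Return ids of boards that reported recently with enough free memory."""
        if now is None:
            now = time.time()
        return sorted(
            board_id
            for board_id, (free, seen) in self._boards.items()
            if now - seen <= MAX_AGE_SECONDS and free >= min_free_bytes
        )
```

When a job comes in, the server would call something like available() to pick which boards get a chunk of the work. Boards that have gone quiet or are busy with their own tasks simply drop out of the list.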

Our app leverages the MapReduce framework common in big data applications, with a tiny twist. Since the boards are hardly big enough to function as nodes, we had to use something with a little more power as the master node. The webserver played that role, running the mapper scripts and distributing the data and a reducer script to the Edisons. From there, the boards would each execute the reducer script on their small portion of data, then return the output to the webserver along with an ID that denotes which chunk of data the output belongs to. In our proof-of-concept demo we used a very simple example. A single Edison would first attempt to sort the entire text of War and Peace alphabetically in a very short Python script. Simply allocating the space for the novel was a struggle, and once the sort began, the RAM overflowed and the board rebooted. This was expected. The task is simply too large for the memory and computational capabilities of the device. For contrast, we uploaded the same task to our webservice, to which we had registered 6 boards. A mapper script was created along the following lines:

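from collections import defaultdict

def map(text):
    # split() with no argument splits on any whitespace and drops empty strings
    words = text.split()
    # defaultdict creates an empty list the first time a letter is seen
    letters = defaultdict(list)
    for word in words:
        # map each word to a list by its first letter
        letters[word.lower()[0]].append(word)
    return letters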

This split the book into 26 arrays (plus a few for symbols), grouping every word in the book by its starting letter. Now we had smaller chunks we could work with. The webserver sent a single array of data to each device, along with the index of the array. Since “A” comes first, a machine would receive all the words beginning with “A”, plus an ID of 0. The device also received a short Python script, which told it to sort the list, then communicate the results and original ID back to the webserver. This process repeated until all the arrays of words had been sorted and returned. At that point, the webserver would run its handler, which sorts the lists by ID. Since “A” had an ID of 0, “B” was ID 1, and so on, the result was a completely sorted novel in a short period of time. In our example it took around 15 seconds to sort the entire book. When some of the devices are in use, it may take longer to get access to CPU time and memory, but the idea remains the same.
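The whole round trip can be sketched in a few lines. This is a rough reconstruction rather than our hackathon code: the function names are made up for the example, the "boards" are ordinary function calls instead of networked Edisons, and the case-insensitive sort is an assumption about what "alphabetically" should mean.

```python
from collections import defaultdict


def map_words(text):
    """Group words by lowercased first character (the mapper step)."""
    groups = defaultdict(list)
    for word in text.split():
        groups[word.lower()[0]].append(word)
    return groups


def sort_chunk(chunk_id, words):
    """What each board would run: sort its chunk and echo the id back."""
    return chunk_id, sorted(words, key=str.lower)


def handler(results):
    """Order the returned chunks by id and join them into one list."""
    ordered = []
    for _, words in sorted(results, key=lambda pair: pair[0]):
        ordered.extend(words)
    return ordered


def crowd_sort(text):
    groups = map_words(text)
    # Assign ids in key order so that id order matches alphabetical order.
    chunks = [(i, groups[key]) for i, key in enumerate(sorted(groups))]
    # Results may come back in any order; reversing here imitates that.
    results = [sort_chunk(cid, words) for cid, words in reversed(chunks)]
    return handler(results)
```

The point of the ID is visible in handler(): the chunks can come back in any order, and the server never has to compare words across chunks. It only has to line the chunks up by number.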

Where are we now?

The code is on my GitHub. It was just recently open-sourced, and there’s a reason it took this long. The code is VERY sloppy. One of the downsides to hackathons is that programming competence tends to decrease with tiredness. After 36 straight hours of working on the code, we began to make VERY bad mistakes. Compound that with a teammate leaving in the middle of the night, frustration with new technologies, and a poor internet connection, and you get a mess. I’m not entirely sure that what is on GitHub will actually work anymore, and I know that what was on the webserver no longer works. However, over the course of the next few weeks, I intend to revisit it and clean up large sections of the code, hopefully producing a live website soon enough. Please feel free to fork and contribute, or just stay tuned for a beta invite if you own an Edison board (and if you don’t, you totally should).

Visit the code HERE

That’s all for this week. Next week I will wrap up my discussion on the Edison for now with my latest and current project: “Rest Easy”. Until then, raise a glass and code on.
