Interview

Interview Question – Tail

In my hiatus, I finished my last semester at Georgia Tech and began my job search. I had an offer starting the semester, so I wanted to explore my options before the deadline near the end of the year. As such, in the 3 months of school, I participated in over 20 interviews with several companies across the US. During those interviews, I was exposed to a new style of programming: Interview programming (sometimes called “whiteboard coding”). This style of development is so fundamentally different from my experiences in classes and work that I decided to dedicate this post (and possibly a few others to come) to a few of the more interesting question I encountered.

The Problem

Through my interviews, I was asked an assortment of odd questions. Some were complicated: “write an algorithm to determine the shortest sequence of words in order out of a paragraph” and others were downright absurd: “If you had to spend a day as a giraffe, what do you think would be your biggest problem?”. The problem at the center of this post was a seemingly tame one, and possibly my favorite of the many questions I was asked.  It is worth noting that one of the key secrets to most interview questions is to find a clever use for a data structure. While many questions seem to have a simple brute force answer, it is typically a good idea to examine the question for possible uses of data structures (often a hash map). This question was no different.

The problem posed was to implement the linux command “tail” in O(n) time. For those unfamiliar with the command, running tail on a file or stream displays the last n lines (which can be specified with an argument). This is useful for monitoring logs, as new records are often appended to the end. It seems a simple problem to print lines [length-n, length] but that first requires indexing every line of the file. It causes problems with files smaller than n, very very large files, or streams. So we need a way general enough to scale up or down without performance losses.

Hash Map?

Sadly, this time the answer is not a hash map. It is however, another common data structure which can be creatively implemented to solve our problem. A linked list is a very simple data  structure. It is made up of nodes, which are small wrapper objects containing data and a pointer to another node. The list is referenced by storing the node at the top of the chain. This is a convenient structure for our needs because you can keep a list in memory without concerning  yourself with  the contents of the whole list.

So we have start with a node class, containing a line of text and a pointer to another node. This still leaves us with the problem of finding the end of the file, plus we do not have an index. So we need a class to manage the list. This class will be responsible for iterating through the file and creating the nodes for each line. Since we have a controller, we no longer need to  make the entire file into a list. Instead, we can append to our list until the length (tracked by the control class) is equal to the provided n lines of text. From that point on, in addition to appending to the end of the list, we can move the head. Moving the head pointer effectively subtracts a node from the front of the list. Repeating this algorithm through a file or stream should return the expected results regardless of filesize. At the end of the process the list should start a maximum of n lines from the end of the file, ending on the last line. This list can be iterated to print the correct lines. If a file is smaller than the specified length, the whole file will be printed, and should the file be very very huge, only the last lines will be printed without filling memory with the entire file.

The Train Rolls On

This problem had an elegant solution. Think of it as a sliding window or rolling train. The train adds cars until it hits its maximum length, then rolls down the tracks. The contents of the train at the end of the tracks are the desired lines. Visualizations like this can help in an interview, where the questions tend to be more about the way you think, rather than the best solution. Even if you were unable to implement the solution, explaining your approach with imagery shows you understand the problem.

So take some time this week to brush up on your data structures and the everyday struggles of being a giraffe. As always, raise a glass and code on.

Advertisements

Data Structures

Hey guys, sorry I’m a little late with an update and grasping at straws here for a topic, but I have a good excuse. For those of you who don’t know, Atlanta froze over a couple weeks ago.  Instead of being a responsible student, I let my inner child out and spent my week in the snow instead of doing work. As such, I fell behind and had to do 3 projects this week. Also, I have been doing interviews in all my free time, so I’ve got interview questions on the brain.

Strings. Always Strings.

In every single interview, the coding problem involved string manipulation. After the first interview, I intensively studied the string functions. However, after figuring out more intuitive solutions to most of the problems I learned the real important concept is Data Structures. The majority of programmers will solve a problem using a list or an array first, operating explicitly on strings or lists of strings. This allows the code to be understood easily, while still solving the problem. Problem is that they are usually not optimized at all. Using a data structure can exploit the behaviors of the structure, thus increasing efficiency while decreasing readability. For instance. My example data structure will be a hash table. I will demonstrate the verbose problem solution, and the optimized solution.

The Problem

Given a list of words, organize the words into groups by anagram.

Example :

in: [act, cat, tar, banana, rat]

out: [[act cat],[tar,rat],[banana]]

The verbose method of solving this problem involves iterating over each word and each group (nested) to discover where a word fits, or if you need a new group. This takes a very long time, but is easily readable with clear loops and order.

Readable code for anagram grouping

Readable code for anagram grouping

Its not fast, its not elegant, but it is functional.  On the contrary, a solution using a hash map is functional, fast, but very confusing at a glance. A hashmap uses a method called “hash()” to create an indexed key for data. In this case, creating a hash function which is based off of the component letters of a word allows us to map all words of the same hash value (same letters) into a single group. By exploiting the behaviour of the hash table, we can “chain” all these mapped words into lists, then return all non empty lists. With a mapping and lookup time of O(1) and a hashing time of O(n), we have a very fast function.

Moral

The moral of this post is to never let yourself be contained. A problem often has many sides, some of which could save time and resources. Programming is all about finding the unity between a problem and the desired solution in the language of logic and math. While humans are especially gifted at pattern recognition, it takes a special kind to see the reasoning behind the patterns, and teach a computer to recognize them. Sometimes a String problem is not String problem at all.