Through my interviews, I was asked an assortment of odd questions. Some were complicated: “write an algorithm to determine the shortest sequence of n words in order out of a paragraph” and others were downright absurd: “If you had to spend a day as a giraffe, what do you think would be your biggest problem?”. The problem at the center of this post was a seemingly tame one, and possibly my favorite of the many questions I was asked. It is worth noting that one of the key secrets to most interview questions is to find a clever use for a data structure. While many questions seem to have a simple brute force answer, it is typically a good idea to examine the question for possible uses of data structures (often a hash map). This question was no different.
The problem posed was to implement the linux command “tail” in O(n) time. For those unfamiliar with the command, running tail on a file or stream displays the last n lines (which can be specified with an argument). This is useful for monitoring logs, as new records are often appended to the end. It seems a simple problem to print lines
[length-n, length] but that first requires indexing every line of the file. It causes problems with files smaller than n, very very large files, or streams. So we need a way general enough to scale up or down without performance losses.
Sadly, this time the answer is not a hash map. It is however, another common data structure which can be creatively implemented to solve our problem. A linked list is a very simple data structure. It is made up of nodes, which are small wrapper objects containing data and a pointer to another node. The list is referenced by storing the node at the top of the chain. This is a convenient structure for our needs because you can keep a list in memory without concerning yourself with the contents of the whole list.
So we have start with a node class, containing a line of text and a pointer to another node. This still leaves us with the problem of finding the end of the file, plus we do not have an index. So we need a class to manage the list. This class will be responsible for iterating through the file and creating the nodes for each line. Since we have a controller, we no longer need to make the entire file into a list. Instead, we can append to our list until the length (tracked by the control class) is equal to the provided n lines of text. From that point on, in addition to appending to the end of the list, we can move the head. Moving the head pointer effectively subtracts a node from the front of the list. Repeating this algorithm through a file or stream should return the expected results regardless of filesize. At the end of the process the list should start a maximum of n lines from the end of the file, ending on the last line. This list can be iterated to print the correct lines. If a file is smaller than the specified length, the whole file will be printed, and should the file be very very huge, only the last lines will be printed without filling memory with the entire file.
The Train Rolls On
This problem had an elegant solution. Think of it as a sliding window or rolling train. The train adds cars until it hits its maximum length, then rolls down the tracks. The contents of the train at the end of the tracks are the desired lines. Visualizations like this can help in an interview, where the questions tend to be more about the way you think, rather than the best solution. Even if you were unable to implement the solution, explaining your approach with imagery shows you understand the problem.
So take some time this week to brush up on your data structures and the everyday struggles of being a giraffe. As always, raise a glass and code on.