We will deal only with finite graphs. A finite graph is a set of vertices V and a set of edges E. The vertex set V is just a bunch of elements. In the simplest sort of graphs, the edge set E is a subset of the set of all possible two element subsets of V. As a notation:

V^{(2)} = { v ⊆ V | |v|=2 }

and then E ⊆ V^{(2)}. An edge e ∈ E is therefore a set {a,b}, with a ≠ b, and is the edge between vertices a and b. Note that such a definition:

- does not allow a “self loop”, because the set {a,a} is really the set {a}, and is therefore not two elements;
- does not allow multiple edges between a and b, because the element {a,b} is either in E or not in E; it can’t be in E multiple times. That’s just how set theory works;
- does not have a direction to the edge, as the sets {a,b} and {b,a} are the same set.

A second sort of graph theory selects edges from the cartesian product of the vertices,

E ⊆ V × V.

An element e ∈ E is an ordered pair (a,b), so that that edge is *directed* from a to b. Also, since (a,a) is a permissible pair, self loops are also allowed. These are called *directed graphs*, for short: *digraphs*.

However, multiplicity is still not allowed. To represent multiplicity one can simply reject the fussiness of mathematicians and consider a “multi-set”, in which elements can appear more than once, or attach a function to the edges, μ:E→ℕ, which given an edge evaluates to the number of times that edge is in the graph.

In general, a function w:E→X from the edges to some set X allows for a *weighted graph*. Sometimes the weight is a real number expressing the desirability of an edge, or, for instance, if the graph represents a road network, the number of miles between junction points (the vertices).

**Trees**

A *tree* is a graph that contains no cycles. Typically it is also *connected*, meaning that for any two vertices v, w ∈ V there is a sequence of edges, a *path*, that begins at v and ends at w. A *cycle* is a path that begins and ends at the same vertex but does not repeat an edge. A graph without cycles is called an *acyclic graph*. A tree is a connected, acyclic graph.

Typically, a tree has a distinguished vertex called the root. Since our vertex sets are finite and the graph is acyclic, every path that starts at the root and never backtracks must eventually stop at a vertex from which it cannot continue, and that ending vertex is called a *leaf* of the tree. This establishes a direction among edges in the tree, from root to leaf, and in that direction a parent or ancestor vertex leads towards a child or descendant vertex.

Trees are very popular as computer science data structures. One possible reason is the following property: in a tree, for every pair of vertices v, w ∈ V there is a unique path from v to w. Either one is an ancestor of the other, or the path climbs from v to the *least common ancestor* of v and w, and then descends towards w.
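The climb-then-descend description of the unique path can be sketched in C. This is a minimal sketch under an assumed representation that is not from the text: the tree is stored as a hypothetical `parent` array, with `parent[root] == -1`, and the function names are illustrative.

```c
#include <assert.h>

/* Assumed representation: parent[v] is the parent of vertex v,
   and parent[root] == -1. */
static int depth_of(const int parent[], int v) {
    int d = 0;
    while (parent[v] != -1) { v = parent[v]; d++; }
    return d;
}

/* Returns the least common ancestor of v and w. */
int least_common_ancestor(const int parent[], int v, int w) {
    assert(v >= 0 && w >= 0);
    int dv = depth_of(parent, v);
    int dw = depth_of(parent, w);
    /* Lift the deeper vertex until both are at the same depth. */
    while (dv > dw) { v = parent[v]; dv--; }
    while (dw > dv) { w = parent[w]; dw--; }
    /* Climb in lockstep until the two walks meet. */
    while (v != w) { v = parent[v]; w = parent[w]; }
    return v;
}

/* Number of edges on the unique path from v to w: up to the
   least common ancestor, then down. */
int path_length(const int parent[], int v, int w) {
    int a = least_common_ancestor(parent, v, w);
    int da = depth_of(parent, a);
    return (depth_of(parent, v) - da) + (depth_of(parent, w) - da);
}
```

When one vertex is an ancestor of the other, the least common ancestor is that vertex itself and the path only climbs or only descends.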

In a general connected graph, with vertices V and edges E, one can pick out a subgraph with the same vertex set V but a subset E^{*} ⊆ E of edges so that (V, E^{*}) is a tree. This is called a *spanning tree* for the graph. There are important practical applications of spanning trees. For instance, a communications network is often a graph with multiple interconnections between vertices. Routing is the selection of a spanning tree, and it instructs the network how to forward data from one node to another.

In this course we will learn depth first and breadth first searches of graphs to identify spanning trees, as well as deal with the more complicated problem of finding a least weight spanning tree of a graph with weighted edges. Also, we will look at the problem of shortest paths from a starting point, when the length of a path is given by edge weights.

**The origins of graph theory**

The mathematical definition of graphs and their consideration was started by Euler who in 1735 considered the Seven Bridges of Königsberg problem. In the town of Königsberg there were seven bridges and the question was whether one can cross every bridge in a sequence exactly once. Euler abstracted away the details of the land masses and turned the problem into one of nodes and edges, thus founding graph theory.

Once the problem was set into graph theory, the argument ran that in order for there to be a path, whether circular beginning and ending at the same node, or a path beginning and ending at different nodes, any intermediate node which is entered must be exited, hence the number of edges attached to a node must be arranged in pairs. Since the particular graph given by the seven bridges problem had too many vertices with an odd number of edges attached to the node, it is impossible to traverse all bridges.

The *degree* of a vertex is the number of edges that connect to the vertex.

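**Eulerian graphs:** A connected graph G = (V,E) has an eulerian path if and only if every vertex is of even degree, except possibly for exactly two vertices of odd degree. When every vertex is of even degree the path can be taken to begin and end at the same vertex; when two vertices are of odd degree, the path begins at one of them and ends at the other.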
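The degree condition is easy to check mechanically. The following is a sketch only: it assumes the graph is already known to be connected, takes edges as a hypothetical array of endpoint pairs (repeats allowed, as with parallel bridges), and caps the vertex count at 64 for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumes a connected graph. edges[e][0] and edges[e][1] are the two
   endpoints of edge e; the same pair may appear more than once. */
bool has_eulerian_path(int num_vertices, int num_edges, int edges[][2]) {
    assert(num_vertices > 0 && num_vertices <= 64);
    int degree[64] = {0};
    for (int e = 0; e < num_edges; e++) {
        degree[edges[e][0]]++;
        degree[edges[e][1]]++;
    }
    int odd = 0;                         /* vertices of odd degree */
    for (int v = 0; v < num_vertices; v++)
        if (degree[v] % 2 != 0) odd++;
    return odd == 0 || odd == 2;
}
```

For the seven bridges, with the four land masses numbered 0 to 3, the degrees are 5, 3, 3 and 3, so all four vertices are odd and the function reports that no eulerian path exists.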

**Planar graphs, and graphs of higher genus**

I recall the following puzzle, with which many of you are familiar. The two graphs drawn below are drawn with the unfortunate defect that edges cross. Can these graphs be drawn so that no edges cross?

Graphs that can be drawn in the plane without any edges crossing are called *planar graphs*. The two examples here are not planar: it is not possible to draw them without one edge crossing another. The first graph has five vertices and 10 edges, such that every pair of vertices is joined by an edge. This is called the *complete graph* on five vertices and is well known among graph theorists as K_{5}. The second graph has two sets of three vertices, and 9 edges that connect each pair of vertices taken one from each of the two sets. This is called the *complete bi-partite graph* and is denoted K_{3,3}.

**Kuratowski’s theorem:** Graphs K_{5} and K_{3,3} are not planar. Furthermore, K_{5} and K_{3,3} completely describe the obstruction to planarity for any graph: if a graph G has no subgraph that is a subdivision of K_{5} or K_{3,3} (a copy in which edges may be replaced by paths through additional vertices), then G is planar.

Although K_{5} and K_{3,3} cannot be drawn in the plane, they can be drawn on the surface of a doughnut. In the plane, they can be drawn so that there is only one edge crossing. Instead of crossing, pass one edge over the other as follows: cut two holes out of the plane, one on each side of the edge that will stay drawn on the plane. Glue on a tube, with one circular end glued to one hole and the other circular end to the other hole. Now draw the other edge on the tube so it crosses up and over the fixed edge.

Now consider limiting the infinite plane to a disk, in which all of the graph and the tube overpass are inside the disk. Close the disk to a sphere by contracting the edge of the disk to a point. Now you have a sphere with a little handle. Equalize the handle size to the size of the sphere and you now have a doughnut, on which is drawn K_{5} or K_{3,3}.

K_{6} and K_{7} can also be drawn on a doughnut. The one “hole” or handle is enough to pass all the edges that might need crossing for these graphs. For the bi-partite graphs, K_{4,4} can also be drawn on a doughnut. However, K_{8} needs a “double doughnut”, two overpasses or holes, and K_{9} needs a triple doughnut. The number of holes is called the genus of the surface, and the formula for the required genus g for a complete graph K_{n}, n ≥ 3, is:

g = ⌈ (n-3)(n-4)/12 ⌉
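As a quick check of the formula against the examples in the text, here is a small sketch in C. The function name is illustrative; the ceiling is taken with integer arithmetic, which is valid because the numerator is never negative for n ≥ 3.

```c
#include <assert.h>

/* Genus of the complete graph K_n from g = ceil((n-3)(n-4)/12), n >= 3. */
int complete_graph_genus(int n) {
    assert(n >= 3);
    int numerator = (n - 3) * (n - 4);
    return (numerator + 11) / 12;   /* integer ceiling of numerator / 12 */
}
```

This gives genus 0 for K_{4}, genus 1 for K_{5} through K_{7}, genus 2 for K_{8} and genus 3 for K_{9}, matching the doughnut counts above.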

**Data structures to represent graphs**

From an Abstract Data Type point of view, a graph representation should be able to answer the following two queries:

- Is there an edge (i,j)?
- Which edges leave (enter) vertex i?

One possible representation is a linked list of all edges. However, if the number of edges is e, then the time to answer these questions is Θ(e) in the worst case, as the entire linked list may have to be traversed.

A second possible representation is the *adjacency-matrix*. For v vertices, store in a v by v matrix a 1 in location (i,j) if there is an edge from i to j, else store a 0. This gives Θ(1) query time for edge existence, and Θ(v) query time to enumerate all outgoing (incoming) edges, as one must iterate over the entire row (column) of the matrix.

A drawback of the matrix approach is that planar graphs are frequent, and planar graphs have very few edges as a function of vertices. In fact, for a planar graph with at least three vertices, |E| ≤ 3 |V| – 6. For this reason, an adjacency-matrix representation of a planar graph will be overwhelmingly filled with zeros and is considered a wasteful representation.

A third possibility is the *adjacency-list*. For each vertex i keep a linked list of the j’s such that (i,j) is an edge. It is then very efficient to enumerate the edges leaving a vertex. For undirected graphs, i will also have to be noted on j’s linked list. To answer the existence of an edge (i,j), the time will depend on the length of i’s or j’s list, and could be as large as O(v).
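To make the trade-off concrete, here is a minimal C sketch that keeps an adjacency-matrix and adjacency-lists side by side for a directed graph. Holding both in one struct is only for comparison; the struct, the function names and the fixed bound `MAX_V` are all assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_V 8   /* hypothetical fixed bound to keep the sketch short */

struct node { int to; struct node *next; };

struct graph {
    int v;                         /* number of vertices          */
    bool matrix[MAX_V][MAX_V];     /* matrix[i][j]: edge i -> j   */
    struct node *list[MAX_V];      /* list[i]: edges leaving i    */
};

void graph_init(struct graph *g, int v) {
    assert(v > 0 && v <= MAX_V);
    g->v = v;
    for (int i = 0; i < v; i++) {
        g->list[i] = NULL;
        for (int j = 0; j < v; j++) g->matrix[i][j] = false;
    }
}

/* Adds the directed edge (i,j) to both representations. */
void graph_add_edge(struct graph *g, int i, int j) {
    if (g->matrix[i][j]) return;   /* no multiple edges */
    g->matrix[i][j] = true;
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->to = j;
    n->next = g->list[i];
    g->list[i] = n;
}

/* Theta(1): a single matrix lookup. */
bool matrix_has_edge(const struct graph *g, int i, int j) {
    return g->matrix[i][j];
}

/* Time proportional to the length of i's list. */
bool list_has_edge(const struct graph *g, int i, int j) {
    for (const struct node *n = g->list[i]; n != NULL; n = n->next)
        if (n->to == j) return true;
    return false;
}

/* Walks only the edges that exist, unlike a scan of a matrix row. */
int list_out_degree(const struct graph *g, int i) {
    int count = 0;
    for (const struct node *n = g->list[i]; n != NULL; n = n->next) count++;
    return count;
}
```

The sketch never frees its list nodes; a real implementation would add a matching cleanup function.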

**Understanding graphs**

Given a graph, what can we say about it, in summary?

We can understand a graph in terms of its properties. For instance, we have already seen that certain graphs are Eulerian, and that certain graphs are planar. There is a simple algorithm that finds an Eulerian cycle if one exists. Non-planar graphs have a minimum genus for their embedding. A formula provides at least some insight into the genus in the case of complete graphs and complete bipartite graphs.

We will explore graphs using two algorithms: breadth first search and depth first search. These algorithms will organize the vertices and edges into a spanning tree, and label the edges as back, forward, tree or cross. One property a graph may have is that it is acyclic: it has no paths that lead from a vertex back to that vertex (for directed graphs). Depth first search provides a simple way to determine if a graph is acyclic: a directed graph is acyclic if and only if a depth first search gives no back edges.

If a graph is acyclic, DFS will also provide a way of labeling the vertices with distinct positive integers such that arrows always go towards higher numbers. This labeling is not always unique: more than one labeling might satisfy the constraint that arrows always go towards higher numbers. However if there is a cycle then it is obviously not possible to have such a labeling.
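Both ideas, back-edge detection and the labeling, fit in one depth first search. The sketch below is one possible arrangement, not necessarily the one used later in the course: it assumes an adjacency-matrix with a hypothetical bound `MAX_V`, and hands out labels from v down to 1 as vertices finish, so that a vertex always receives a smaller label than everything reachable from it.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_V 16   /* hypothetical bound */

enum color { WHITE, GRAY, BLACK };   /* unvisited, on current path, finished */

struct digraph {
    int v;                       /* number of vertices          */
    bool adj[MAX_V][MAX_V];      /* adj[i][j]: edge from i to j */
};

struct dfs_state {
    enum color color[MAX_V];
    int label[MAX_V];
    int next_label;              /* next label to hand out, counting down */
    bool back_edge;
};

static void visit(const struct digraph *g, struct dfs_state *s, int i) {
    s->color[i] = GRAY;
    for (int j = 0; j < g->v; j++) {
        if (!g->adj[i][j]) continue;
        if (s->color[j] == GRAY) s->back_edge = true;   /* edge into the current path */
        else if (s->color[j] == WHITE) visit(g, s, j);
    }
    s->color[i] = BLACK;
    s->label[i] = s->next_label--;   /* first to finish gets the largest label */
}

/* Returns true and fills label[0..v-1] with distinct integers 1..v such that
   every edge goes from a lower to a higher label. Returns false, leaving
   label untouched, if a back edge (hence a cycle) is found. */
bool topological_labels(const struct digraph *g, int label[]) {
    assert(g->v > 0 && g->v <= MAX_V);
    struct dfs_state s = { .next_label = g->v, .back_edge = false };
    for (int i = 0; i < g->v; i++) s.color[i] = WHITE;
    for (int i = 0; i < g->v; i++)
        if (s.color[i] == WHITE) visit(g, &s, i);
    if (s.back_edge) return false;
    for (int i = 0; i < g->v; i++) label[i] = s.label[i];
    return true;
}
```

Starting the search from a different vertex, or scanning neighbors in a different order, can produce a different valid labeling, which illustrates the non-uniqueness noted above.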

If there are cycles, the graph can be understood as a decomposition into *strongly connected components (SCC)* and acyclic interactions between SCC’s that can be topologically sorted. A strongly connected component is a maximal set of vertices among the vertices in a directed graph such that for any two vertices in the set there is a directed path from one to the other, and back again. SCC’s can be discovered using a method of two DFS’s, the first on the original graph, and the second on a simply modified version of the original graph (the graph with every edge reversed), done in a specific ordering.

While all of these properties are efficiently discovered, graph theory can also present problems at the other extreme of complexity. A Hamiltonian cycle is a path of edges in a graph starting from a vertex, returning to that vertex, and visiting every other vertex in the graph exactly once. While very similar in flavor to the Eulerian path problem, deciding whether a graph has a Hamiltonian cycle is NP-Complete.

**Conclusion**

This was a quick overview of some of the intriguing facts about pure graph theory. In this course we will stay with graphs that more directly model simple physical systems, generally of interaction or interconnection. However, I wanted to also indicate that graph theory is its own mathematical field of study, and some of the algorithms for practical problems in graph theory have been discovered by practitioners of the pure theory.

An assertion is a logical statement that should evaluate to true. Assertions can be just a mind-game, at most a notation or comment in the code. They have become so popular that now some languages have assert statements, which are actual code that evaluates the assertion. If the assertion evaluates false the code flow might be terminated or an exception thrown.

There are three types of assertions of particular interest: pre-conditions, post-conditions, and loop invariants.

A code block, say W, must assume various things are true in order that it function properly. One can assert before the code block those truths as a logical statement, and call it a pre-condition for the code. An assertion does not make it so – it is an attempt to make clear what needs to be made true before block W runs.

The block W accomplishes something – it sets the world into a new state, and further code can rely on that state. One can make an assertion modeling that state and place it at the end of block W, and this is called a post-condition. Just as with a pre-condition, asserting a post-condition does not make it true, the block W makes it true. In fact, the promise of W is that if the world is such that the pre-condition is true before the block runs, then it will take action such that the post-condition will be true once the block has run.

For instance, the code:

```
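struct pair { int a; int b; };

// PRE: pair points to a struct holding two integers a and b
void f( struct pair * pair ) {
    if ( pair->a < pair->b ) {
        int t = pair->a ;      // swap so the larger value lands in a
        pair->a = pair->b ;
        pair->b = t ;
    }
    // ASSERT: pair->a >= pair->b
}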
```

has transformed the very weak pre-condition that pair refers to a struct containing two integers a and b into the post-condition that a is the larger of the two, and b the smaller (or they are the same value).

Consider placing the fraction a/b into lowest terms. The code looks as follows

```
// ASSERT: a, b are integers, b!=0
{ code block U }
// ASSERT: 0 < b <= a, and s is the sign of (the original) a/b
{ code block V }
// ASSERT: d is the gcd(a,b) (function gcd's post-condition)
{ code block W }
// ASSERT: a/b is in lowest terms
```
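One way the three blocks might be filled in is sketched below, with the assertions written as C `assert` statements. This is an illustration, not the author's reduce.c: the struct, the helper `gcd`, and the intermediate assertion after block U (here a ≥ 0 and b > 0, with the sign held in s) are choices made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

struct fraction { int a; int b; };   /* represents a/b */

/* PRE: x >= 0, y >= 0, not both zero.  POST: returns gcd(x, y). */
static int gcd(int x, int y) {
    assert(x >= 0 && y >= 0 && (x != 0 || y != 0));
    while (y != 0) {
        int r = x % y;
        x = y;
        y = r;
    }
    return x;
}

void reduce(struct fraction *f) {
    assert(f != NULL && f->b != 0);              /* PRE: b != 0 */

    /* block U: separate the sign from the magnitudes */
    int s = ((f->a < 0) != (f->b < 0)) ? -1 : 1;
    int a = abs(f->a);
    int b = abs(f->b);
    assert(a >= 0 && b > 0);

    /* block V: compute the gcd */
    int d = gcd(a, b);
    assert(d > 0 && a % d == 0 && b % d == 0);   /* gcd's post-condition, in part */

    /* block W: divide out the gcd and restore the sign */
    f->a = s * (a / d);
    f->b = b / d;
    assert(f->b > 0 && gcd(abs(f->a), f->b) == 1);   /* POST: lowest terms */
}
```

Each assertion is the post-condition of the block above it and the pre-condition of the block below it, which is the chaining the outline describes.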

**"I'VE BEEN SPEAKING PROSE ALL MY LIFE WITHOUT KNOWING IT"**

The method of assertions is implicit each time a programmer dereferences a pointer (or, for Java programmers, a reference). A null pointer cannot be dereferenced. It will cause the program to crash, or the Java system to throw an exception. The programmer mentally must have, or should have, an assertion of non-null before the use of the pointer.

Null pointer exceptions (segmentation faults in C) are very common because of the failure to satisfy this pre-condition to pointer use. That means there was a failure of the post-condition leading up to the pointer use.

```
W
// assert pointer p non-null (even, p is a non-null object of
// type T)
dereference of p
```

The assertion is the post-condition of W and the pre-condition to pointer use.

In this situation the method of assertions reminds the programmer that there should be a good and hopefully simple reason why the pointer is non-null. It encourages the programmer to recognize the logic that achieves the assertion, and even encourages that the program structure lend itself to simplify the logic that leads towards the assertion.
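A small C sketch of this pattern follows. The list type and function names are hypothetical; the point is only where the assertion sits, between the code that produces the pointer and the code that dereferences it.

```c
#include <assert.h>
#include <stddef.h>

struct item { int value; struct item *next; };

/* W: returns the first item holding value, or NULL if there is none. */
struct item *find(struct item *head, int value) {
    while (head != NULL && head->value != value) head = head->next;
    return head;
}

/* PRE: value occurs in the list, so find() cannot return NULL here. */
int successor_count(struct item *head, int value) {
    struct item *p = find(head, value);
    assert(p != NULL);   /* post-condition of W, pre-condition of p->next */
    int count = 0;
    for (p = p->next; p != NULL; p = p->next) count++;
    return count;
}
```

The simple reason the pointer is non-null is the caller's promise, recorded as the pre-condition of `successor_count`.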

**LOOP INVARIANTS**

The method of assertions can be applied to the correctness of loops through a type of assertion called the loop invariant.

Given a loop:

```
while CONDITION:
code block W
```

one aspect is that the code block W is repeatedly called. If the code block W is conceived as a transformer from pre-conditions to post-conditions, since the post-condition of one run of W must match the pre-condition for the following run, the post and pre-conditions must be the same assertion.

Since this is an assertion that does not change, it is called invariant, and hence the name loop invariant. Importantly, the assertion need not be true inside of W; in fact, the way the method works is that, in order to make progress towards completion of the loop, the code in W takes a tentative step forward, perhaps making the invariant false. The code that follows in W fixes up the situation to restore the invariant.

If well done, then the correctness of the looping follows effortlessly. One element that has not been considered is the condition for termination. The loop invariant plays no role in that. Termination is taken as a separate issue, and a separate proof is provided as to why the loop will terminate.

This is a very smart thing to do. Termination might seem a simple matter, but that view contradicts not only the numerous experiences programmers have with infinitely spinning loops, but also the theoretical observation that the solution to a problem can be encoded in the termination or not of a program. There is no reason why termination should be a simpler problem than body correctness, and in fact it can be every bit as difficult.

For instance, one might pose a question about diophantine equations: given a formula in several variables with integer coefficients, are there integer assignments to the variables that make the formula evaluate to zero?

One solution is to walk through all integer assignments checking if that assignment yields zero. If it does, the loop breaks. Whether or not there is a solution is then exactly the problem of whether or not the loop terminates.
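A sketch of such a search for two variables is below. The names and the example formulas are illustrative. The parameter `max_bound` is an addition of this sketch so that the function always returns; delete the cap and the outer loop terminates exactly when a solution exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef long (*formula2)(long x, long y);   /* a formula in two integer variables */

/* Tries all (x, y) with |x|, |y| <= bound, for bound = 0, 1, 2, ...,
   and stops at the first assignment that evaluates to zero. */
bool find_zero(formula2 p, long max_bound, long *x_out, long *y_out) {
    assert(p != NULL && max_bound >= 0);
    for (long bound = 0; bound <= max_bound; bound++) {
        for (long x = -bound; x <= bound; x++) {
            for (long y = -bound; y <= bound; y++) {
                if (p(x, y) == 0) {      /* the loop breaks on a solution */
                    *x_out = x;
                    *y_out = y;
                    return true;
                }
            }
        }
    }
    return false;   /* nothing found within the cap; says nothing beyond it */
}

/* Example formulas, chosen for illustration. */
long circle(long x, long y) { return x * x + y * y - 25; }
long parity(long x, long y) { return 2 * x + 4 * y - 1; }   /* even minus odd: never zero */
```

The body of the loop is trivially correct; all of the difficulty of the question has moved into whether the uncapped loop ever stops.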

The loop invariant, as an assertion, is placed before the first W, between consecutive runs of W, and after the last W: A W A W ... W A. In the code flow, however, it is placed before the while line and after the last line in W:

```
Assert L.I.
while CONDITION:
code block W
Assert L.I.
```

Getting the first assertion right is generally done by setting up a situation that is simple to the point of trivial, often by asserting the truth for one or zero elements. It is a bit different from the later loop invariant assertions because for this assertion, and this assertion alone, you don't get to assume a previous assertion of the loop invariant. The programmer has to make it true out of other assertions.

Since the condition does not (should not!) mess with the loop invariant, it is good enough to check the invariant at the bottom of the loop in order to assure it is true at loop exit. Care might also be needed if the code block W uses a break statement to exit from within the block. One should put a loop invariant assertion just before each break.

**RELEVANCE CONDITIONS**

It can be difficult to come up with a proper loop invariant. It is not uncommon, and certainly not unwise, to "write to the invariant", that is, favor an approach to the problem that lends itself to invariants. Among the requirements of the invariant is that it captures simply the problem requirements, that it can be disturbed but easily reestablished by some forward step in the loop, and that it is relevant to the problem.

Of these, I have found that the relevance requirement is the hardest to capture mathematically. Consider, for instance, the statement: the current data is a permutation of the original data. It is not reasonable to check at run time that this is true. It would be computationally expensive, and generally unnecessary as the condition can be checked by eye.

I call "relevance conditions" parts of the loop invariant that refer back to the original problem, and I am not surprised when those conditions are complex, and in practice will not be checked explicitly by any assertion logic.

**EXAMPLES OF LOOP INVARIANTS**

A simple example is finding the maximum of n elements in an array A. The problem is recast as finding the maximum of the first i elements in an array A of n elements, max(A,i,n). The loop invariant is that variable m == max(A,i,n).

The loop invariant is established before entering the while loop by setting m := A[0] and i := 1; then m == max(A,1,n).

Assuming m == max(A,i,n) with i < n, a pass through the loop body examines A[i], the first element not yet considered (the array is indexed from 0). There are two possibilities. If A[i] <= m, then m == max(A,i+1,n) already. Else A[i] > m, and A[i] is the new maximum; setting m := A[i] we establish that m == max(A,i+1,n). The body then sets i := i+1, and the loop invariant m == max(A,i,n) is true again.

One pass through the loop is now completed.

Termination is that i increments each time through the loop and the loop terminates when i equals n. Then m == max(A,n,n), which is the desired result. Note that when setting up for the loop, we used that there is no maximum to the empty set of numbers, hence n must be at least 1.
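The maximum-finding loop can be written with its invariant as executable assertions. This is a sketch: `max_prefix` is a hypothetical helper that plays the role of max(A,i,n) and exists only so the invariant can be checked, and the array is indexed from 0, so A[i] is the first element not yet examined.

```c
#include <assert.h>

/* The maximum of the first i elements of A, straight from the
   definition. Used only to check the invariant. PRE: i >= 1. */
static int max_prefix(const int A[], int i) {
    if (i == 1) return A[0];
    int rest = max_prefix(A, i - 1);
    return A[i - 1] > rest ? A[i - 1] : rest;
}

/* PRE: n >= 1.  POST: returns the maximum of A[0..n-1]. */
int array_max(const int A[], int n) {
    assert(n >= 1);
    int m = A[0];
    int i = 1;
    assert(m == max_prefix(A, i));        /* L.I. established */
    while (i < n) {
        if (A[i] > m) m = A[i];           /* m is the max of the first i+1 elements */
        i = i + 1;
        assert(m == max_prefix(A, i));    /* L.I. restored */
    }
    return m;                             /* i == n, so m is the max of all n */
}
```

Checking the invariant this way costs far more than the loop itself, which is why such assertions usually stay as comments once the argument is understood.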

Please see the algorithm animation of this for a visualization of the algorithm.

Code for additional examples is at GitHub. If you wish, you can fork these and let the make file guide you to compile and run the examples. The program reduce.c shows a straight-line program using assertions. Splitter.c demonstrates a loop invariant for arranging data in an array, splitting around a value into large and small values.

**Exercise**

Try writing the loop invariant for the partition algorithm proposed in the Introduction to Algorithms book.

This is a simple introduction to certain complexity classes, hence a “petting zoo”. The actual complexity zoo can be found here.

Algorithms are mathematically precise, step-wise procedures for solving problems. The subject of algorithms is concerned with not only how to solve a problem, but how to solve it efficiently. Efficiency often means the smallest number of steps, hence the least amount of time as measured by the algorithm’s actual run time.

The step counting approach to algorithm analysis requires a bit of reduction, common sense, and experience. Until the algorithm is completely written down in code, we can’t be precise about the number of steps taken, which might be the actual number of lines of source code run; and to some degree it doesn’t matter. What one coder might do in 5 lines of code, another might do in 3. Therefore the analysis tends to focus on just a few crucial lines of code, and counting the number of times those lines are run. It is sort of like measuring distance run by counting laps, the number of times the runner passes a distinguished location on the track, rather than counting the crossing of every foot marker (if you can imagine this track also being marked foot by foot). Even in a vague pseudo-code language, these lines can be identified and counted.

An important realization is that we cannot think reasonably about a single problem; we have to think about an infinite set of problems in a problem family. We cannot reasonably ask about the efficiency of the problem “does 2+2=4?” The solution to this problem is for the machine to just output “Yes”. By extension, any problem family with a finite set of problems such as: “does 2+2=4?, does 5+3=7?, does 4+6=10?” can be reduced to a finite table, “Yes, No, Yes” in this case, and the universal solution is to number the problems in the family and look up the answer by problem number in the table. And then there really is no theory of complexity at all. There is nothing to talk about … it’s all identify the problem by number, and look up the answer in the table.

So our concern is families with infinite problem instances, so that we have to calculate the solution. It can’t be pre-calculated, because the table of pre-calculated answers would never be finished, so we could never start our solution. You can ponder this a bit. It’s very fun, simple, but pretty enlightening about what is a proper algorithm.

Since the family is infinite, the size of the description of the instances must grow without bound. It is impossible to specify one out of an infinite collection using a fixed number of symbols from a fixed set of symbols. We do not allow the symbol set to expand; we fix it at, say, the letters A through Z and some punctuation, including some specialized mathematical symbols, or even fix the symbol set to be {0,1}, and express everything coded in binary. This is a natural point of view for those interested in computers. In a computer it is all zeros and ones.

A measure of the description of the instance is fixed, typically denoted n, and the number of steps to the solution of the problem is given as a function of n. It is taken as a given that the determinant of problem complexity, the run time, will be a function of description size. Often we write T(n), the time for a problem of size n.

There is a bit of subtlety, because perhaps T(n) depends on which problem of size n. The simplest approach to complexity takes T(n) to be the maximum over all problems of size n. This is called *worst case analysis.* It has the advantage of being simplest, and also being conservative in stating the qualities of the algorithm. But in practice it might be too pessimistic.

There are two other subtleties in discussing the solution to a problem. The algorithm might be permitted to give wrong answers a certain percentage of the time. In the case of a yes or no answer, the algorithm will be strictly wrong: it will answer yes when the fact is the solution is no, etc. This error can be one sided in the sense that it might always answer no when the solution is no, but might answer yes or no when the solution is yes. In this case, only the yes answer gives confidence. If the answer is yes, the solution is yes. If the answer is no, although the solution is most likely no, certain yes instances answer no as well.

The error might be by approximation. The solution might be a number, and the algorithm might give an answer within a percentage of that number. The run time for these algorithms might be parameterized by both n, the problem size, and some error parameter: either the probability of a wrong answer, or an error tolerance.

The second subtlety is that modern algorithms can be probabilistic. The computer might “guess” a solution. This is modeled by giving the machine access to a coin-flipping box that it might query on some steps, and the random result of the box will direct the code flow. In this situation, the output is something similar to a random variable. Rather than a single result, it is a function based on the problem instance given and the stream of coin flips. The random variable is conceptualized by thinking how the result varies over varying coin flip streams, and this can be made into a probability space.

I would like to clarify this further. Suppose the algorithm A on problem x gives result y with random input r, A(x,r)=y. The random input r is a sequence of coin flips, e.g. r = <0,0,1,0,1,1,0,…,1>. Consider now the set R = { r’ | A(x,r’)=A(x,r) }. This is an event in the probability space of all finite coin flip sequences and hence has a collective probability, the probability that on input x the algorithm A gives output y.

That is how A becomes a random variable. Technically speaking, random variables give values in the reals, whereas our algorithm A gives outputs that can be very arbitrary. This prevents, for instance, taking “expectations” of A, which are weighted sums over the outputs, weighted by event probabilities. However, the random variable viewpoint is still important, in my opinion.

**The complexity petting zoo**

The function T(n) (let’s fix worst case, deterministic algorithms) needs to be squinted at for its most prominent features. The actual number is not going to be that significant. It is a step count, and the exact value will depend on how the algorithm was put into code, and how the steps in the code got counted. This varies in ways which do not inform us much about the operation of the algorithm. What does inform us is the order of growth of T(n). Is it, for instance, never more than a fixed value, denoted T(1)?

*T(1) algorithms*

The problem “what is the first number in a given sequence of numbers?” can be answered in a fixed amount of time. The exact amount of time will depend on the hardware, the memory speed, how the data was represented. But in essence, there is a fixed amount of time needed to answer any among the instances in this problem family. These are the T(1) algorithms.

*T(n) algorithms*

A sample problem might be, given a sequence of n numbers, what is the smallest among these numbers? The algorithm that walks along the sequence, looking at each number in turn, and remembering the smallest so far, runs in time T(n). The steps we are counting would be the compound step of getting the i-th number, deciding if it is smaller than some current smallest, and replacing the current smallest if so. If we call this 3 steps or 20, that only changes the function T(n) from 3n to 20n. In any case, this problem is linear.

A linear problem runs in time T(n) and represents a problem for which all the data must be looked at, but for which there isn’t much more to the solution beyond looking at each data item. There is a computation that occurs after the data item is considered, but per data item, this consideration takes a constant amount of time.

*T(n ^{2}) algorithms*

Consider now a sorting algorithm, such as selection sort. It works by selecting from the n original numbers the smallest number. It does this by making (n-1) comparisons. In effect, let’s magically grab the smallest number and then confirm this by comparing it to the other n-1 numbers. Selection sort has no magic so it discovers the smallest rather than non-deterministically choosing it (lucky guess each time!) but no matter. The sort continues by repeating on the remaining n-1 numbers, and so on. In the end each pair of numbers has been compared, for a count of (n choose 2) = n(n-1)/2 comparisons. We step count only the comparisons and the rest will be either the constant work of comparison or the linear work of putting the result away into an array. So this is a T(n^{2}) algorithm, and it comes from a complexity in which all pairs of items are being compared.
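The comparison count can be observed directly by instrumenting the sort. In this sketch the counter and the function name are illustrative; only the line doing the comparison is counted, as in the step-counting convention above.

```c
#include <assert.h>

/* Sorts A[0..n-1] ascending and returns the number of comparisons made. */
long selection_sort(int A[], int n) {
    assert(n >= 0);
    long comparisons = 0;
    for (int i = 0; i < n - 1; i++) {
        int smallest = i;                 /* index of the smallest seen so far */
        for (int j = i + 1; j < n; j++) {
            comparisons++;                /* the one step being counted */
            if (A[j] < A[smallest]) smallest = j;
        }
        int t = A[i];                     /* put the smallest away at position i */
        A[i] = A[smallest];
        A[smallest] = t;
    }
    return comparisons;
}
```

For n = 6 the passes make 5 + 4 + 3 + 2 + 1 = 15 comparisons, which is 6·5/2, regardless of the initial order of the data.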

As simple as this is, the algorithm does actually do some optimization. Once the smallest item is identified, for instance, no further comparisons are done against that item. The nature of the ordering implies that if a<b then b>a, so when b is considered, the comparison against a is taken for granted. Else there would be n(n-1) comparisons. Note, however, that this would still be quadratic in run time. Even so idiotic an algorithm as to not notice that a<b implies b>a might run as fast, from the point of view of this vague T(n^{2}) notation, as a more clever algorithm that does take advantage of this implication.

*T(n log n) algorithms*

In the case of sorting, further implications on order can be made, and those implications can reduce the number of comparisons needed to complete the sort. Our step counting is counting only the steps of comparison, and this gives an accurate count of the number of overall steps, so the step count of comparisons gives the resulting algorithm run time.

Let’s say that by magic among the n numbers we choose a number of middle size. We then sweep through the n-1 other numbers identifying which are smaller and which are larger than the chosen number. We now can imply, without doing additional comparisons, that any number smaller than the middle number is smaller than any number larger than the middle number. The n-1 comparisons to split the set imply the result for something like n^{2}/4 other comparisons: any number from the n/2 small numbers is known to be smaller than any number from the n/2 large numbers. All that is needed now is to compare between the small numbers and between the large numbers, but never across those two collections.

I’ll skip the math, but the result is that in total, to completely categorize what is smaller than what, T(n log n) comparisons will result by exploiting this economy.
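For readers who want the skipped step, here is a sketch of the count under idealized assumptions: n is a power of two, n = 2^k, the magic choice always splits the numbers into two halves of size n/2, and C(n) denotes the number of comparisons. At depth j of the splitting there are 2^j groups, each costing one sweep.

```latex
\begin{align*}
C(1) &= 0, \qquad C(n) = 2\,C(n/2) + (n-1) \\
C(n) &= \sum_{j=0}^{k-1} 2^{j}\left(\frac{n}{2^{j}} - 1\right)
      = \sum_{j=0}^{k-1} \left(n - 2^{j}\right) \\
     &= kn - \left(2^{k} - 1\right) = n\log_2 n - n + 1
\end{align*}
```

For example, C(8) = 8·3 − 8 + 1 = 17, compared with the 28 comparisons that comparing all pairs of 8 numbers would take.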

**From algorithm efficiency to problem complexity**

So far what has been measured is the run time of a solution to a problem. It is a sort of engineering result rather than a scientific one. Rather than saying something about the world, it says something about what we build to manipulate the world. In trying to create better and better algorithms, however, we do touch something about the world.

In the case of sorting, we built an algorithm that uses T(n log n) comparisons, but we can also show that at least T(n log n) comparisons are absolutely necessary. While the solution makes comparisons, the problem requires them. To untangle a permutation of n numbers, we can show that unless T(n log n) comparisons are made, there will be too much left unknown about the tangle; the tangle can be only partially untangled for lack of the needed comparisons.

Seeking the most efficient algorithm in this way discovers a quality in the problem itself, and that quality is the complexity of the problem. That complexity is a scientific fact, and belongs to the nature of the problem, to the natural mathematical intricacy of the problem, and cannot be argued around.

In the case of sorting, the argument is that the set of numbers can be tangled in one of n! possible ways. The knot cannot be untangled unless the precise tangle is identified (in fact, the act of untangling will describe which was the starting tangle), and the best that can be done (in a certain model of computation) is to extract a description of the starting tangle one bit at a time, one bit per comparison. Since there are n! possible tangles, if one were to assign each an integer, the integer would need to be log n! bits long, which is approximately n log n. That is the argument. It is very general in order that it work against any number of particular untangling strategies. It sort of ignores the strategies and works with a more basic understanding.
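The step from log n! to n log n can be made precise with two crude bounds. This is a standard estimate, written out here as a sketch; logarithms are base 2 since we are counting bits.

```latex
\log_2 n! \;=\; \sum_{i=1}^{n} \log_2 i \;\le\; n \log_2 n ,
\qquad
\log_2 n! \;\ge\; \sum_{i=\lceil n/2 \rceil}^{n} \log_2 i \;\ge\; \frac{n}{2} \log_2 \frac{n}{2} .
```

The lower bound keeps only the largest terms of the sum: there are at least n/2 of them and each is at least log(n/2). So log n! is squeezed between two quantities that both grow like n log n.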

**The complexity petting zoo continued**

*T(n^{3}) algorithms*

The algorithms with efficiencies of the form T(n^{k}) for an integer k are an important class. We have seen so far linear, k=1, and quadratic, k=2. The n log n algorithms lie between k=1 and k=2; in fact, they lie between k=1 and k=1+epsilon, for any positive (non-zero) epsilon, no matter how small. There are algorithms with run times in this fractional region, for instance multi-dimensional search algorithms running in time T(n^{3/2}). The time for sorting, n log n, is less than any of those, but greater than linear.

A popular example of a cubic, k=3, algorithm is the naive implementation of the multiplication of two n by n matrices. We will ignore (although really we shouldn’t) the actual cost of multiplying two integers. Integers are operated on by algorithms, and these algorithms have run times based on the number of digits, or bits, in the integer. Let’s give our machines a gift and consider such operations unit time.

The naive matrix multiplication algorithm fills in each of the n^{2} locations in the resulting matrix using a sequence of n multiply-and-add steps. The result is a T(n^{3}) algorithm.
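A minimal sketch of the naive algorithm in C, under the unit-cost assumption just made. The function name `matmul_naive` and the flat row-by-row storage are choices made for this illustration. The function also counts its multiply-and-add steps, so the n^{3} can be seen directly.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply two n-by-n matrices stored row by row in flat arrays:
   entry (i, j) of a matrix m lives at m[i * n + j].
   Writes the product into c and returns the number of
   multiply-and-add steps performed, which is n * n * n. */
unsigned long matmul_naive(size_t n, const long *a, const long *b, long *c)
{
    unsigned long steps = 0;
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            long sum = 0;
            for (size_t k = 0; k < n; k++) {    /* n steps per location */
                sum += a[i * n + k] * b[k * n + j];
                steps++;
            }
            c[i * n + j] = sum;
        }
    }
    return steps;
}
```

The three nested loops each run n times, which is where the cubic count comes from.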

This is the speed of the algorithm, but it is not the complexity of the problem. In fact, no one knows the complexity of matrix multiplication exactly. That is, it is unknown what is the smallest s for which a T(n^{s}) algorithm exists that can multiply two n by n matrices, assuming unit cost arithmetic.

Strassen multiplication gives an algorithm of time T(n^{k}) with k = log_{2} 7 = 2.8-ish. So it is known that the exponent is surely less than 3, and it is certainly at least 2, since all n^{2} entries of the result must be written, but its exact value is unknown.

*T(2^{n}) algorithms and NP*

Algorithms that run in time T(n^{k}) may or may not be practical, depending on k. For large k, the time required grows very quickly with problem size, so that if one intends to solve increasingly large problems, the time requirement soon outstrips the resources available. For k=4, for instance, doubling the problem size will require 16 times more time. A processor running at 5 Gigahertz would need to go to 80 Gigahertz to solve in the same time a problem twice as large. More or less, it takes 10 years for processors to get 10 times as fast. The collection of all problems with complexity T(n^{k}) for some k is called P, the class of polynomial time problems.

That said, beyond all of these complexities lie even more combinatorially challenging problems: those with exponential run times. Some of these problems have exponential time solution algorithms but unknown complexity. That is, much faster algorithms might exist. Others of these problems are of exponential complexity, meaning it is not possible to solve the problem in less than exponential time by any algorithm.

Because I am interested in introducing the class NP, I will take as an example a problem with an exponential time solution, but of unknown complexity. Let A be a set of n integers. The question is: is there a non-empty subset of A that sums to zero? For instance, if A = { 1, 4, -3, 2, 9, -7 } then the subset { 1, 2, -3 } sums to zero. This problem is called the *subset sum problem*, and its known general solutions take exponential time.

An algorithm that solves this problem is: form, one by one, each of the 2^{n} subsets of A, and see if the subset sums to zero.
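The enumeration can be sketched in C by letting the bits of an integer select a subset. This is an illustration only: the names `sums_to_zero` and `subset_sum_zero` are invented here, the empty subset is skipped because it sums to zero trivially, and n is assumed small enough that a subset fits in an `unsigned long`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* The quick check: does the subset selected by the bits of mask sum to zero?
   Bit i of mask set means a[i] is in the subset. Takes time linear in n. */
int sums_to_zero(const long *a, size_t n, unsigned long mask)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        if (mask & (1UL << i))
            sum += a[i];
    return sum == 0;
}

/* The slow walk: try every non-empty subset in turn.
   Returns the mask of the first subset found, or 0 if there is none. */
unsigned long subset_sum_zero(const long *a, size_t n)
{
    assert(n < CHAR_BIT * sizeof(unsigned long));
    unsigned long limit = 1UL << n;             /* 2^n candidate subsets */
    for (unsigned long mask = 1; mask < limit; mask++)
        if (sums_to_zero(a, n, mask))
            return mask;
    return 0;
}
```

On A = { 1, 4, -3, 2, 9, -7 }, stored in that order, the first subset found is mask 13 (binary 1101), which selects 1, -3 and 2.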

It is exponential time just to walk through the trials. However, the problem is of the guess-and-check form: each candidate can be quickly tested to see whether it solves the problem. By quickly tested, we mean that the testing problem is in P, that is, a candidate can be tested in time T(n^{k}) for some integer k. Not all problems of exponential complexity have this simple verification property. However, a large number of practical problems which are not in P do have this property. Hence the class is given a name, NP, for non-deterministic polynomial time.

By the way, P and NP might be the same class. In the previous paragraph I should have said “a large number of practical problems which are not known to be in P do have this property”, because for all we know, guess-and-check problems are always in P. That is, there might be some master method for guessing that takes the guess work out of guessing, so to speak. The problem, does P=NP?, is one of the big unsolved puzzles of this age.

To rephrase: is the existence of a polynomial time algorithm for recognizing a solution to a problem enough to ensure that a solution can be found deterministically in polynomial time, given that the solution space is exponential in the input size? It does seem unlikely: to guess a solution through luck or intuition is unlike calculating it methodically. However, in many cases so far, the power of randomization has eventually been shown to be expedient but not necessary. De-randomization techniques exist that, in those cases, turn lucky guesses into methodical and efficient step-wise processes.

The reason for the use of the word “non-deterministic” in NP is that, rather than a methodical enumeration of all 2^{n} possibilities, equivalently albeit more magically, the algorithm can go right to the winning possibility by flipping n coins in sequence. Hence the viewpoint of NP as a randomized algorithm in which each coin flip is “lucky”. Assuming that exactly one of the 2^{n} possibilities is the answer, in the deterministic case the solution is found after about 2^{n}/2 trials on average; in the randomized case, a single trial finds the solution with probability 1/2^{n}, and if each trial flips fresh coins, the expected number of trials until success is 2^{n}.

**Randomized, practically “efficient” algorithms, the class BPP**

Algorithms in P are often associated with *tractable* computation, whereas algorithms of longer run times (especially exponential run times) are considered *intractable*. However, what is truly tractable and intractable seems to fit better within the larger framework of generally correct computation. A fast, generally correct algorithm is considered more practical than a slow completely correct algorithm.

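Because of the laws of probability, an algorithm that is somewhat correct can be boosted to an algorithm that is almost always correct with very slight overhead. If a single run of an algorithm gives the correct answer with probability 2/3, and a correct answer can be recognized when it appears, then running that algorithm i times independently gives a correct answer with probability 1-(1/3)^{i}. When a correct answer cannot be recognized, as with a yes-or-no question, one takes the majority answer over the i runs instead, and the chance that the majority is wrong still shrinks exponentially in i. Either way, the possibility of error diminishes exponentially in the added run time devoted to the repetitions, i.e., the value of i.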
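The arithmetic of repetition can be sketched in C. The function names are made up for this illustration. `all_fail` computes (1/3)^{i}, the chance that every one of i independent runs is wrong. `majority_wrong` covers a yes-or-no question where right answers cannot be told from wrong ones and a majority vote is taken; it assumes each run is independently correct with probability exactly 2/3 and counts a tied vote as wrong.

```c
#include <assert.h>

/* Probability that all i independent runs are wrong,
   each run being wrong with probability 1/3. */
double all_fail(unsigned i)
{
    double p = 1.0;
    while (i-- > 0)
        p /= 3.0;
    return p;
}

/* Probability that at most half of i runs are correct, so that a
   majority vote fails. Sums the binomial terms
   C(i,k) (2/3)^k (1/3)^(i-k) for k = 0 .. i/2. */
double majority_wrong(unsigned i)
{
    assert(i >= 1 && i <= 500);      /* keeps (1/3)^i above underflow */
    double wrong = 0.0;
    double term = all_fail(i);       /* the k = 0 term */
    for (unsigned k = 0; 2 * k <= i; k++) {
        wrong += term;
        /* next term: C(i,k+1)/C(i,k) = (i-k)/(k+1), and (2/3)/(1/3) = 2 */
        term *= 2.0 * (double)(i - k) / (double)(k + 1);
    }
    return wrong;
}
```

One run is wrong with probability 1/3, and a vote over three runs is wrong with probability 7/27; more runs push the figure down further.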

The class BPP is the set of problems that can be solved with a polynomial time randomized algorithm where the probability of a correct answer is 2/3 or greater. How the probability is counted is as follows. For an input x, the algorithm A gives output A(x,r), where r is the series of coin tosses. Take A(x,.) as a random variable from the space of coin tosses to outputs. Consider the set of coin tosses that give the correct answer y for A(x,.): R = { r | A(x,r)=y }. The probability of a random r being in the set R is the probability that the algorithm gives the correct answer.

In short, a BPP algorithm is randomized, and picking a random sequence of coin tosses is likely to give the right answer. (Although 2/3 does not seem to be a strong definition of “likely”, recall how we can repeat the algorithm i times, which would change only the constant in the run time, and get an exponentially decreasing chance of making a wrong conclusion.)

The relationship between BPP and NP is unknown. Both BPP and NP contain P: if a problem is solvable exactly in polynomial time, then it is solvable with probability greater than 2/3, and it can be solved by guess-and-check by ignoring the guess and simply calculating the answer. Also, both BPP and NP are contained in exponential time: running through all possible coin flips, one can calculate which answer is given over 2/3 of the time, or can wait for the sequence of coin flips that gives the correct answer. However, some problems might be in BPP but not in NP, and others in NP but not in BPP.

The interval tree is a red-black tree augmented with a field max that, for every node n in the tree, contains the maximum of i.high taken over all nodes that are descendants of n, including n itself, where i is the interval stored in a node. The red-black tree has the nodes ordered by the i.low values.

It is efficient to maintain this information. On insert, first propagate any new maximums from the insertion leaf back to the root along the path of the insertion. For rotates, the value of exactly one node needs to be recalculated, and that can be done by taking the maximum of n.left.max, n.right.max and n.i.high. On deletes, rotation is handled as above, and the deletion itself requires climbing the tree back from the deleted node, recalculating along the way.

The search algorithm is as follows: to search for an intersection with query interval q against all intervals in the tree rooted at node n,

- first check if q intersects the interval contained in n;
- if not, check if n has a left child, and if not recurse on n.right;
- if it has, check if n.left.max < q.low and if so recurse on n.right;
- else recurse on n.left.
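The augmented node and the search steps above can be sketched in C as follows. This is an illustration under some assumptions: intervals are closed, endpoints are ints, the red-black colour and balancing code are left out, and the names `struct node`, `recompute_max` and `interval_search` are invented here. The recursion is written as a loop, and an empty subtree ends the search with no intersection.

```c
#include <assert.h>
#include <stddef.h>

struct interval { int low, high; };     /* closed interval [low, high] */

struct node {
    struct interval i;                  /* the interval stored at this node */
    int max;                            /* largest high endpoint in this subtree */
    struct node *left, *right;          /* children, ordered by i.low */
};

/* Recompute n->max from n's own interval and its children's max fields. */
void recompute_max(struct node *n)
{
    n->max = n->i.high;
    if (n->left != NULL && n->left->max > n->max)
        n->max = n->left->max;
    if (n->right != NULL && n->right->max > n->max)
        n->max = n->right->max;
}

/* Closed intervals intersect unless one lies entirely to one side. */
static int intersects(struct interval a, struct interval b)
{
    return a.low <= b.high && b.low <= a.high;
}

/* Returns some node whose interval intersects q, or NULL if none does. */
struct node *interval_search(struct node *n, struct interval q)
{
    assert(q.low <= q.high);
    while (n != NULL && !intersects(n->i, q)) {
        if (n->left == NULL || n->left->max < q.low)
            n = n->right;               /* nothing on the left can reach q */
        else
            n = n->left;                /* any intersection shows up on the left */
    }
    return n;
}
```

Each step moves one level down, so the search takes time proportional to the height of the tree.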

The surprising thing about this algorithm, to my mind, is the implication that if q is not fully to the right of all the nodes in the left subtree, and q intersects anything at all, then it must intersect something in the left subtree.

If any of the conditions for going to the right are satisfied, then it is obviously impossible for there to be an intersection of q with an interval in the left subtree. However, oddly, barring obvious impossibility, if there is an intersection, it will be in the left subtree.

The proof is as follows. We maintain the assertion that if q intersects some interval in the tree, it intersects some interval descending from the current node n. Begin by setting n to the root so the statement is trivially true.

If q intersects n.i, an intersection is found and we are done. Else we continue on the assumption that if there is an intersection there must be at least one in either the left or right subtrees.

If n.left is empty, the presumed intersection must be in the right subtree. So continuing with n.right maintains the assertion.

The case that q.low > n.left.max is very similar: q is fully to the right of the rightmost extending interval among all intervals in the left subtree of n, so it cannot intersect with any of those. Therefore if there is an intersection, it is among the intervals in the right subtree of n.

Finally, if q.low ≤ n.left.max then either q intersects with the interval that achieves this n.left.max, i.e. an interval i’ such that i’.high = n.left.max, or it does not. If it does, then continuing searching in the left subtree of n maintains the assertion since i’ is in the left subtree of n.

If it does not, then this implies q.high < i’.low, so q is fully to the left of some interval in the left subtree of n. As the tree is ordered by low endpoints of intervals, q is fully to the left of everything in the right subtree of n. Therefore the presumed intersection cannot be among the intervals in the right subtree of n, and so must be among the intervals in the left subtree of n.

So even when written out, the proof is sort of long. It depends on the ordering of the tree in the last step.

As an example of *complexity theoretic* ways of thinking, since hashing gives O(1) access, we can get access to all n items in the hash table in time O(n). This must mean that hashing cannot contain sorting as a subproblem, since sorting by comparisons needs on the order of n log n time. However, trees have O(log n) access times, hence access to all n items takes time O(n log n), enough time to have sorted the items. And indeed, the additional operations supported by trees, such as successor, are equivalent to sorting. A tree of data implicitly sorts the data; a hash table does not.

Randomized analysis isn’t a change to our notion of a program, but to how we analyse programs. Deterministic analysis (it isn’t really called that, it is called *worst-case analysis*) is the usual model. The run time of a program might depend on the specific input offered. Each time there is a choice, due to the conditioning of the input, we choose the worst alternative. The result might not be exactly correct for any input, but is guaranteed to overestimate the resource consumed by any input. Randomized analysis puts a probability distribution on the inputs and gives the *mathematical expectation* of the run time, given this distribution.

Randomization is a very important and deep concept. We do not know if randomization is anything but a psychological crutch: we do not know if there is any quantifiable advantage to a randomized algorithm over a non-randomized algorithm. On the other hand, we know lots of problems for which we have efficient randomized algorithms and no efficient deterministic algorithm (but we have not ruled out that someday a deterministic algorithm for the problem will be found). In a practical sense, randomized algorithms are simple, plentiful, and generally preferred by working computer scientists. For this reason, they have an important part in this course.

The model requires that all we can determine about the elements we wish to sort comes from comparing them for size, using some black box. In many cases, however, we can sort without ever comparing the elements. If we knew in advance something about the data, we might sort very quickly by taking advantage of special qualities of the elements. Linear time sorts are of this form.
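One sort of this kind is counting sort, sketched here in C as an illustration. The special quality assumed is that every element is a small integer, here a value that fits in an `unsigned char`. No element is ever compared with another element; each one is only used as an index. The function name is a choice made for this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Sort n keys in place, assuming each key lies in 0 .. UCHAR_MAX.
   Time is linear in n plus the (fixed) size of the key range. */
void counting_sort(unsigned char *keys, size_t n)
{
    size_t count[UCHAR_MAX + 1] = {0};
    assert(keys != NULL || n == 0);

    /* Tally how many times each value occurs. */
    for (size_t i = 0; i < n; i++)
        count[keys[i]]++;

    /* Write the values back out in increasing order. */
    size_t out = 0;
    for (size_t v = 0; v <= UCHAR_MAX; v++)
        for (size_t c = 0; c < count[v]; c++)
            keys[out++] = (unsigned char)v;
}
```

The loops compare counters against bounds, but never one key against another, which is why the comparison lower bound does not apply.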

The randomized version of Quicksort effectively shuffles the data before sorting it. Each run of Quicksort will shuffle the data differently. This makes Quicksort a new sort of algorithm, at least new from the perspective of the algorithms presented in this course. Randomized Quicksort does not run the same way given the same input on multiple runs. Random coins make choices for the algorithm. Its run time on an input depends both on the input and on the outcome of the coin flips. While (unrandomized) Quicksort is average case O(n log n), with the average taken over different inputs, Randomized Quicksort is average case O(n log n) with the average taken over different runs on the same input.
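A minimal sketch of the idea in C. Rather than shuffling first, this version picks each pivot at random, which is one common way of spending the coin flips; the function names and the use of the C library's `rand()` as the source of coins are assumptions of this sketch, not the course's code.

```c
#include <assert.h>
#include <stdlib.h>

static void swap_ints(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

/* Sort a[lo..hi], both ends inclusive, choosing each pivot at random. */
void randomized_quicksort(int *a, long lo, long hi)
{
    assert(a != NULL);
    if (lo >= hi)
        return;

    /* The coin flips: pick a pivot position at random and park it at the end.
       (rand() is a weak source of randomness, but enough for a sketch.) */
    long p = lo + rand() % (hi - lo + 1);
    swap_ints(&a[p], &a[hi]);
    int pivot = a[hi];

    /* Partition: everything smaller than the pivot moves to the front. */
    long store = lo;
    for (long j = lo; j < hi; j++)
        if (a[j] < pivot)
            swap_ints(&a[j], &a[store++]);
    swap_ints(&a[store], &a[hi]);       /* pivot lands in its final place */

    randomized_quicksort(a, lo, store - 1);
    randomized_quicksort(a, store + 1, hi);
}
```

Two runs on the same array can make different sequences of comparisons, yet both end with the same sorted result.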

Every programmer has encountered and been surprised by problems of this sort. For instance, this is *always false*:

sizeof(X)>-1

Since sizeof yields an unsigned value (of type size_t), the -1 is converted to unsigned, where it becomes the largest unsigned value, and the comparison is done unsigned.

However, just being careful to distinguish signed and unsigned integers is not enough. The following program shows that comparing x to y versus (x-y) to 0 gives different results. This can be attributed to integer overflow, but the overflow is occurring during subtraction!

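    [burt@lee ~/temp]$ more tryoverflow.c
    #include <stdio.h>

    int
    main(int argc, char * argv[]) {
        int i, j ;
        i = 0x7fffffff ;   /* largest 32-bit signed int */
        j = 0x80000000 ;   /* out of range for int; wraps to the smallest int on typical machines */
        printf("i= %d, j= %d\n", i, j ) ;
        if ( (i+1)==j ) printf("i+1==j\n") ;   /* i+1 overflows */
        if ( i>j ) printf("i>j\n") ;
        else printf("i<=j\n") ;
        if ( (i-j)>0 ) printf ("(i-j)>0\n") ;   /* i-j overflows */
        else printf("(i-j)<=0\n") ;
    }
    [...]
    i>j
    (i-j)<=0
    [burt@lee ~/temp]$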