rtybase: December 2011

Wednesday, December 14, 2011

Jail Break Quiz

Everything was quiet and peaceful today, until Google Reader brought to my attention this puzzle from plus.maths.org.

So, I said, why not writing a little Java program to resolve this? Here is the code:


public class Puzzle1 {
 static public final int MAX = 100;

 static public void printDoors(boolean[] doorIsOpen) {
  int count = 0;
  for (int i = 0; i < doorIsOpen.length; i++) {
   if (doorIsOpen[i]) {
    System.out.print("o");
    count++;
   }
   else System.out.print("c");
  }
  System.out.println(" =" + count + " open doors");
 }

 static public void main(String[] args) {
  boolean[] doorIsOpen = new boolean[MAX];
  
  for (int i = 1; i <= MAX; i++) {
   for (int j = i; j <= MAX; j+=i) {
    doorIsOpen[j-1] = !doorIsOpen[j-1];
   }
   printDoors(doorIsOpen);
  }
 }
}

The result was "10 prisoners have escaped", as per the last line printed by the program:


occoccccoccccccoccccccccocccccccccco
ccccccccccccoccccccccccccccocccccccc
ccccccccocccccccccccccccccco 
=10 open doors

The most interesting part of the puzzle, though, is the "why" one. Have a look at the sequence; 1st position (cell in this case) contains "o" (from "open"), 4 - "o", 9 - "o", 16 - "o", 25 - "o", 36 - "o", 49 - "o", 64 - "o", 81 - "o", 100 - "o". Basically, every number of type q²≤100 corresponds to an "o". Now, this is weird ... but not for a long time.

Let's stick with an easier example (10 rather than 100):

o	o	o	o	o	o	o	o	o	o
o	c	o	c	o	c	o	c	o	c
o	c	c	c	o	o	o	c	c	c
o	c	c	o	o	o	o	o	c	c
o	c	c	o	c	o	o	o	c	o
o	c	c	o	c	c	o	o	c	o
o	c	c	o	c	c	c	o	c	o
o	c	c	o	c	c	c	c	c	o
o	c	c	o	c	c	c	c	o	o
o	c	c	o	c	c	c	c	o	c

1st cell (column from the left) is visited one time and remains opened and unvisited after that. Also number 1 has one divisor {1}.
2nd cell is visited two times (one "o" and one "c") and remains closed and unvisited after that. Also number 2 has two divisors {1, 2}.
3rd cell is visited two times (one "o" and one "c") and remains closed and unvisited after that. Also number 3 has two divisors {1, 3}.
4th cell is visited three times (one "o", one "c" and one "o") and remains opened and unvisited after that. Also number 4 has three divisors {1, 2, 4}.
...

It turns out that the number of visits is equal to the number of divisors the number, corresponding to the cell (column from the left), has. Here is the formula and this statement is easy to prove (e.g. using induction).

Now, if the number of divisors is even (we have a whole number of pairs {"o", "c"}), because the sequence always starts with "o" then it must end with "c".
However, if the number of divisors is odd (we have a whole number of pairs {"o", "c"} plus one extra {"o"}), the sequence ends with "o".

So far so good, but when the number of divisors is an odd number? According to the formula mentioned above the number
τ(n)=(e₁+1)⋅(e₂+1)⋅(e₃+1)⋅...⋅(e_k+1)
is odd when each of these numbers e₁, e₂, e₃,...,e_k is even, but this means that n is of type q². Q.E.D.

Monday, December 12, 2011

K-Means Clustering Algorithm

This post is dedicated to K-Means Clustering Algorithm, used for unsupervised learning. Here is a short introduction into the unsupervised learning subject. This video shows the algorithm at work.

So, how does it work?

1. First of all we have the number K (the number of clusters) and the data set X₁,...,X_m as inputs, K < m and X_j ∈ ℜⁿ for ∀j=1..m. Considering that clustering is used as a task in data mining (e.g. including text), it is a nice exercise indeed to formalise a problem in a way where data is represented by vectors :)

2. At this step we randomly initialise K cluster centroids μ₁,...,μ_K, μ_j ∈ ℜⁿ for ∀j=1..K. The recommended way is to randomly initialise the centroids from the data set X₁,...,X_m and make sure that μ_i≠μ_j for ∀ i≠j.

3. At this step we execute the loop:


Repeat {
  //cluster assignment step
  For i=1..m {
    Find the closest centroid for X_i, i.e.
    min{||X_i-μ_j||}, ∀j=1..K, e.g. it is μ_t;
    Assign c_i=t;
    //in this case c_i is the index of 
    //the closest centroid for X_i
  }

  //move centroids step, if a particular cluster 
  //centroid has no points assigned to it, 
  //then eliminate it
  For j=1..K {
    old_μ_j = μ_j;
    μ_j = average (mean) of all X_i where c_i=j, ∀i=1..m;
  }
}

This loop ends when all the centroids stop "moving", i.e. ||old_μ_j - μ_j|| < ε, ∀j=1..K, where ε is an error we are happy with (e.g. 0.01 or 0.001).

This is pretty much the algorithm. However, in this form, there is a risk to get stuck in a local minima. By local minima I mean the local minima of the cost function:
J(X₁,...,X_m,c₁,...c_m,μ₁,...,μ_K) = (1 ⁄ m)⋅∑||X_i – μ_{c_i}||², sum is for i=1..m

In order to address this, the algorithm (steps 2, 3) are executed several times (e.g. 100). Each time the cost function (J(...)) is computed and the clustering with the lowest J(...) is picked up. E.g.


For p=1..100 {
  Perform step 2;
  Perform step 3;
  Calculate cost function J(...);
}
Pick up clustering with the lowest cost J(...);

In order to make it more interactive, I and my son spent couple of hours implementing a JavaScript version of this algorithm using <canvas> tag from HTML5 (my son studies HTML at school, so it was an interesting exercise). Here is the link to the page (excuse me and my son for our non OOP approach to JavaScript :)). If you want to have a look at the sources, feel free to download that page (we didn't provide comments, but hopefully this article explains everything). We tested it with Google Chrome and Internet Explorer 9 (if you have problems with IE9, please consult the following link).

o	o	o	o	o	o	o	o	o	o
o	c	o	c	o	c	o	c	o	c
o	c	c	c	o	o	o	c	c	c
o	c	c	o	o	o	o	o	c	c
o	c	c	o	c	o	o	o	c	o
o	c	c	o	c	c	o	o	c	o
o	c	c	o	c	c	c	o	c	o
o	c	c	o	c	c	c	c	c	o
o	c	c	o	c	c	c	c	o	o
o	c	c	o	c	c	c	c	o	c

o	o	o	o	o	o	o	o	o	o
o	c	o	c	o	c	o	c	o	c
o	c	c	c	o	o	o	c	c	c
o	c	c	o	o	o	o	o	c	c
o	c	c	o	c	o	o	o	c	o
o	c	c	o	c	c	o	o	c	o
o	c	c	o	c	c	c	o	c	o
o	c	c	o	c	c	c	c	c	o
o	c	c	o	c	c	c	c	o	o
o	c	c	o	c	c	c	c	o	c

Wednesday, December 14, 2011

Jail Break Quiz

Monday, December 12, 2011

K-Means Clustering Algorithm

Favourite Web Sites and References

o	o	o	o	o	o	o	o	o	o
o	c	o	c	o	c	o	c	o	c
o	c	c	c	o	o	o	c	c	c
o	c	c	o	o	o	o	o	c	c
o	c	c	o	c	o	o	o	c	o
o	c	c	o	c	c	o	o	c	o
o	c	c	o	c	c	c	o	c	o
o	c	c	o	c	c	c	c	c	o
o	c	c	o	c	c	c	c	o	o
o	c	c	o	c	c	c	c	o	c