Search and genetic algorithms

Categories: Featured

mutation

Two nights before he disappeared, he came to my house. “I’ve cracked it!”, he kept saying. He was talking about genetic algorithms, quantum teleportation… he said he was about to change everything: science, medicine, religion. He wouldn’t have left that, Sam. He wouldn’t have left you – Tron Legacy

Most of the time that watch movies that talks about science and fiction the amount of no relation terms that they use are enormous, but this weekend watching Tron Legacy (again) I heard a term that is real and some time ago took my attention: Genetic Algorithm.

Yes they do exists and they are far from being something new, in fact they where proposed by Holland at 1975 [XXX]. To understand what a GA is we need to go a little back a little bit and look at what Evolutionary Computation (EC). Evolutionary Computation abstracts the Darwin evolutionary principles and apply them into algorithms, which is used to search for the optimal solution to a problem[2].

Scandinavia and Deepa[2] uses a search algorithm to explain the EC concept: “In a search algorithm, a number of possible solutions to a problem are available and the task to find the best solution in a fixed amount of time.[…]The key aspect distinguishing an evolutionary search algorithm from such traditional algorithms is that it is population-based. Through the adaptation of successive generations of a large number of individuals, an evolutionary algorithm performs an efficient directed search. Evolutionary search is generally better than random search[…]”.

It is important to notice that a Genetic Algorithm is not the only procedure of a Evolutionary Computation, we also have genetic programming, evolutionary strategies and evolutionary programming, where “the basic difference between the paradigms lie in the nature of the representation schemes, the reproduction operators and selection methods”.

So for the simplicity I will keep with the genetic algorithm, so before going to my idea of “where we can apply that”, it is important to define how a genetic algorithm works. For those that know the subject forgive me, because I will simplify a book subject in few lines; but I will try to capture the essence.

Imagine that you have a function F(y), and this function can be decomposed in independent blocks. Those blocks can or not can be used to produce the function result; for example imagine F(y) <- a || b || c || d .

The result of the function F(y) will return true when a OR b OR c OR d is true, let’s suppose that a, b, c and d are not simple operator but they are also functions. It is also implicit that as soon as one of the them is true the function F(y) can return true and the other ones doesn’t to be evaluated.

Now let’s assume that a, b, c and d are operations that consumes time and it would be important to know which order of evaluation is the optimal one: here is where a genetic algorithm comes handy, but how?

A genetic algorithm contains a bit representation of its inner functions, so F(y) would have 4 bits saying if a, b, c or d are executed; the easiest way is to image the bits as a DNA sequence for the function.

So for each execution of F(y) I would have a different DNA sequence or bit sequence; so a genetic algorithm would be executed in this way:

 

ga execution

  1. The data for the F(y) is populated, so it could be execute;
  2. The function DNA – the bit array – is varied randomly. In this case the sequence that a, b, c, d will be evaluated;
  3. The function is executed;
  4. Based on the result of F(y) with a DNA-N a selection is applied, does it exists and children are created for this DNA or not?

As you can see the difference is in the step 2 and 4 are the important one. The step 2 creates a slightly different version of F(yen) and the step 4 is the natural selection of that function with a random DNA.

This loop keeps going till we don’t have more DNAs or it doesn’t worth to progress. But at the end we will have the most optimal F(yen) for a given population. It is important to notice that the DNA will be optimal for a population X, but it can be not optimal for another population.

So depending on the problem the selection should keep going along the life of the function.

The next question is: all this talk for what? Which problem do you want to solve with such crazy idea?

Well this kind of computation has a lot of application fields, but I believe that it also could be applied to a general problem that we have in the “normal” business world: search. Your search engines gets more and more complicate along the time, the natural steps are getting out of a normal sql query to a proper indexing system, but what to do after that?

How do I know which index is better? Do I create a big index? Do I split my indexes? If so which one to use?

I will give an example: imagine that your application has a auto-complete text field, and that text field executes a search in 10 different lists, all the lists are related, so list 2 depends on list 1 and so on.

Well there is a lot of ways to execute this query, but ultimately you will need to search in all the lists; a common approach would be to search in list 1 then list 2 and so on. Well but how do you know if it is the best approach? Maybe the big amount of the results are at the list 4 and list 7, so you are spending you time looking at the list 1, list 2, list 3, list 5 and list 6 for almost nothing.

My idea was to use a genetic algorithm to realize which one is the best approach to such algorithms. The question is: how to do that? Well I will not answer it here because I don’t have one answer, but I will add some extra thoughts about it:

  1. Do we do it online or offline?
  2. How do we compare the results?
  3. How can we provide such a function that really unusable as doesn’t cause bugs?

Well I will finish here, but will keep this idea in mind, maybe I can get some concept out of it. And I will do that listening to Kansas – Hold on.

[1] Holland, John H (1975), Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor

[2] S. N. Sivanandam, S. N. Deepa. Introduction to genetic algorithms, page 1-2.

«
»

    Leave a Reply

    Your email address will not be published. Required fields are marked *