Updated: September 2018 

PONG – A Path Entropy Inspired Solution

Most machine learning tasks rely on a concept of “error” (also called loss in deep learning, fitness in genetic programming, or reward in reinforcement learning) to quantify the difference between the expected and actual performance of an AI system. For instance, if an artificial neural network \(F\) is expected to output 1 for the input \({ i }_{ 1 }\) and 0 for the input \({ i }_{ 2 }\), the error \(e\) over the 2 samples can be defined using the L1 norm: \( e=\left| F\left( { i }_{ 1 } \right) -1 \right| +\left| F\left( { i }_{ 2 } \right) -0 \right| \). This so-called “supervised learning” approach is used in the vast majority of deep learning systems and has proven very effective. Nevertheless, it has one major drawback: it requires explicitly providing the system with a number of examples where both the input and the output are known. As a result, in the field of deep learning, this requirement has led to the creation of an entire new industry and a new job profile (“AI trainer”) dedicated solely to the generation of this training data.
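For concreteness, this is how such an L1 error could be computed (a minimal sketch in C++; the network \(F\) is abstracted as an arbitrary scalar function, and the sample format is our own choice):

    #include <cmath>
    #include <functional>
    #include <utility>
    #include <vector>

    // L1 error over (input, expected-output) pairs: e = sum_k |F(i_k) - t_k|.
    // F stands in for the trained network; here it is just any scalar function.
    double l1_error(const std::function<double(double)>& F,
                    const std::vector<std::pair<double, double>>& samples) {
        double e = 0.0;
        for (const auto& [input, target] : samples)
            e += std::fabs(F(input) - target);
        return e;
    }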

In practice, there are many cases where such training data is unavailable: for instance, how do you define the training data for a “thinking machine”? One approach consists in using a more general behavioral metric. For example, consider the Pong game, where the intelligent agent must control a bouncing ball with a pad. The traditional approach would be to train a neural network on a number of example input images of the game together with the corresponding pad moves. A less supervised approach was proposed by Wissner-Gross, who posits that intelligent behavior consists in maximizing one's future options. For this, he introduced the concept of “path entropy”, which in this case would translate into “At each time step, how should I move the pad so that I have the most options for my next pad move?”. In practice this is not easy to compute, so we used an approximation which consists in saying that the pad should move in such a way that the ball is equally present over the entire board. We are actually replacing the path entropy by a plain entropy \(E\), which can easily be computed from the probability \(P\) of the ball's presence at location \(\left( { x }_{ n }^{ t },{ y }_{ n }^{ t } \right)\) for bin \(n\) over \(t\) time steps: \(E=-\sum _{ n,t }^{ }{ P\left( { x }_{ n }^{ t },{ y }_{ n }^{ t } \right) } \ln { \left( P\left( { x }_{ n }^{ t },{ y }_{ n }^{ t } \right) \right) } \).
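A minimal sketch of this entropy computation over a binned board (the grid resolution, the trajectory format, and the assumption that ball positions lie inside the board are ours):

    #include <algorithm>
    #include <cmath>
    #include <utility>
    #include <vector>

    // Entropy of the ball's presence over a bins x bins grid:
    // E = -sum_n P(n) * ln(P(n)), where P(n) is the fraction of time steps
    // the ball spent in bin n. A uniform presence maximizes E.
    double presence_entropy(const std::vector<std::pair<double, double>>& trajectory,
                            int bins, double width, double height) {
        std::vector<double> counts(bins * bins, 0.0);
        for (const auto& [x, y] : trajectory) {
            int bx = std::min(bins - 1, static_cast<int>(x / width * bins));
            int by = std::min(bins - 1, static_cast<int>(y / height * bins));
            counts[by * bins + bx] += 1.0;
        }
        double entropy = 0.0;
        for (double c : counts) {
            if (c > 0.0) {
                double p = c / trajectory.size();
                entropy -= p * std::log(p);
            }
        }
        return entropy;
    }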

Now that we have a non-supervised (or at least “less supervised”) performance metric, we still need to define an “engine” for our smart Pong player. Based on our experience in this field, we chose a Linear Genetic Programming approach, in which the source code of the software that moves the pad is written automatically using operators inspired by genetic selection. The software has access to the position of the ball and the position of the pad, and it returns the displacement of the pad. The source code (the so-called “individual”) is generated in an intermediate language and converted into x64 machine code by our custom-made real-time compiler. We used individuals of up to 300 lines, with 256 populations (one per core) of 50 individuals each, running on an HP BladeCenter system populated with 16 ProLiant blades. The source code was evolved using mutation, crossover, and migration operators. After 10 million assessed individuals, we ended up with a fascinating solution, shown in the animation. After simplification of the source code, it corresponds to moving the pad at each time step according to the equation:

\(\Delta p=\frac { 1 }{ x } +2-\frac { p+y }{ x } +x-p-y\),

where \(\Delta p\) is the displacement to apply to the pad, \(\left( x,y \right) \) is the position of the ball and \(p\) is the position of the pad. Even knowing this equation, it is not obvious why this solution works. Incidentally, this complexity demonstrates the power of machine learning in general and genetic programming in particular: it enables the development of systems whose complexity exceeds our capacity for understanding.
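In code, the evolved controller reduces to a one-line update (a direct transcription of the equation above; the variable names and the remark about \(x=0\) are ours):

    // Displacement of the pad for ball position (x, y) and pad position p,
    // as found by the genetic engine: dp = 1/x + 2 - (p + y)/x + x - p - y.
    // The caller must guarantee x != 0.
    double pad_displacement(double x, double y, double p) {
        return 1.0 / x + 2.0 - (p + y) / x + x - p - y;
    }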

To conclude, we successfully solved Pong without explicitly telling the system that the pad should not miss the ball. Instead, we used the variability of the ball position across the board as the sole indication of performance. This is an interesting result, since the approach can be applied whenever it is impossible to express performance numerically, which is typically the case for any high-level cognitive activity.

Updated: August 2018 

FinalSpark Webcast on “FREEDOM OF ACTION”, June 27th, 2018

SinoSwiss Artificial Intelligence Event 2017 EPFL Switzerland

Updated: November 29th, 2016 

A new programming language

In the Vision section we talked about our goal to create a “Thinking Machine”, that is, an artificial intelligence capable of passing the Turing test. If you have not yet read the Vision section, we propose that you do so now.

Initially we started by designing a new class of programming language that we thought was better suited to implementing Linear Genetic Programming. This new language has the property that each instruction can be transformed, up to a certain extent, continuously into a different instruction. A scalar factor alpha controls the transformation. We named this new programming language CPL (Continuous Programming Language). As an example, consider the instruction x=1. With alpha=0, x is left unchanged (equivalent to the NOP instruction). For alpha=0.5, x is the mean between the current value of x and 1. Finally, for alpha=1, x is equal to 1.
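A minimal sketch of how such a continuous instruction could be executed (our own illustration of the idea, not the actual CPL interpreter):

    // Continuous version of the assignment "x = value": alpha = 0 leaves x
    // unchanged (NOP), alpha = 1 performs the full assignment, and values
    // in between blend the old and new contents of x.
    void continuous_assign(double& x, double value, double alpha) {
        x = (1.0 - alpha) * x + alpha * value;
    }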

After extensive testing, we demonstrated that CPL has the property we expected: the genetic engine can change an instruction into another one (mutation or crossover) with a limited impact on the program output. As an illustration, the error hypersurface below shows the error (inverse of fitness) of a program as the alphas of 2 instructions are changed. The drawback of CPL is that it also hugely increases the search space, as the genetic engine must find both the right instructions and the right alpha values. We therefore decided to put this approach aside for the moment and reverted to a more standard language where all alphas equal 1.

Error surface of a CPL-based program. The surface clearly shows the continuous behavior of the error as a function of changing instructions.

Linear Genetic Programming

We then refined our genetic engine (mutation, crossover, migration). To make sure that we had reached state-of-the-art performance, we benchmarked our engine against a standard classification problem (see Brameier and Banzhaf, “A Comparison of Linear Genetic Programming and Neural Networks in Medical Data Mining”, 2001) and ran our genetic engine on the medical data set. Testing against this benchmark improved the genetic engine in many ways. We were able to use metagenetic methods to tune the many genetic parameters (mutation rate, population sizes, etc.). We were also able to visualize the populations to get a sense of what was going on as they evolved.
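For concreteness, a point mutation on a linear program might look like the following sketch (the instruction encoding is a placeholder of ours; the engine's actual representation is not described here):

    #include <cstdint>
    #include <random>
    #include <vector>

    // A linear program is a sequence of register-based instructions.
    struct Instruction {
        uint8_t opcode;  // which operation (add, mul, load, branch, ...)
        uint8_t dst;     // destination register
        uint8_t src1;    // first operand register
        uint8_t src2;    // second operand register
    };

    // Point mutation: replace one random field of one random instruction.
    void mutate(std::vector<Instruction>& program, std::mt19937& rng,
                int num_opcodes, int num_registers) {
        std::uniform_int_distribution<size_t> pick(0, program.size() - 1);
        Instruction& ins = program[pick(rng)];
        std::uniform_int_distribution<int> op(0, num_opcodes - 1);
        std::uniform_int_distribution<int> reg(0, num_registers - 1);
        switch (std::uniform_int_distribution<int>(0, 3)(rng)) {
            case 0: ins.opcode = op(rng); break;
            case 1: ins.dst = reg(rng); break;
            case 2: ins.src1 = reg(rng); break;
            case 3: ins.src2 = reg(rng); break;
        }
    }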

The picture below is a graphic representation of an entire population: each column represents a program, each color represents a different class of instruction (arithmetic, memory, flow control, etc.), and the individuals are sorted by fitness, with the best one at the far left. We can see, for instance, that in this population the best individuals are similar (horizontal lines of a single color on the left indicate that the individuals, i.e. the vertical lines, are similar) and that individuals with low fitness tend to have fewer instructions, as indicated by the black spikes on the right side of the plot.

Graphical representation of program instructions against fitness

At this stage, one of the open questions was: how likely is it that LGP (Linear Genetic Programming) will evolve code implementing the concept of neurons and get us to an artificial intelligence? From our experience, we realized that this was extremely unlikely. We therefore decided to help the system by embedding the concept of neurons into its foundations. For this purpose, we introduced the concept of the “metaneuron”, a structural element whose behavior (meaning its relaxation and learning properties) is defined by an LGP-evolved program. The concept of the metaneuron is commonly used in neural networks for artificial intelligence. However, in our case the concept is much more elaborate, as it is combined with LGP.
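To illustrate the idea, a metaneuron could be structured as follows (a sketch under our own assumptions about the interfaces; the actual system is more elaborate):

    #include <functional>
    #include <vector>

    // A metaneuron: the wiring is fixed by the network topology, but the
    // behavior is delegated to programs produced by the genetic engine.
    struct Metaneuron {
        double state = 0.0;           // internal activation
        std::vector<double> weights;  // incoming connection weights

        // Relaxation program: computes the next state from inputs and weights.
        std::function<double(double, const std::vector<double>&,
                             const std::vector<double>&)> relax;
        // Learning program: adjusts the weights given the current error.
        std::function<void(std::vector<double>&, double, double)> learn;

        void step(const std::vector<double>& inputs) {
            state = relax(state, inputs, weights);
        }
        void adapt(double error) {
            learn(weights, state, error);
        }
    };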

As a first check, we tested the genericity of the approach by training a multilayer network on the XOR problem. Although we were able to find solutions for the XOR problem, we also observed that the metaneurons were not collaborating, as in a neural network, to solve problems. Several improvements were necessary, such as imposing a grid topology (see figure below) with 8-neighbor connections (this ensures that information has to travel into and through the network to reach the output).
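Such a grid with 8-neighbor connections is straightforward to build (a sketch of ours; boundary cells simply have fewer links):

    #include <vector>

    // Build an n x n grid where each node is connected to its (up to) 8
    // neighbors. Returns, for each node index, the list of neighbor indices.
    std::vector<std::vector<int>> grid_topology(int n) {
        std::vector<std::vector<int>> neighbors(n * n);
        for (int r = 0; r < n; ++r)
            for (int c = 0; c < n; ++c)
                for (int dr = -1; dr <= 1; ++dr)
                    for (int dc = -1; dc <= 1; ++dc) {
                        if (dr == 0 && dc == 0) continue;
                        int rr = r + dr, cc = c + dc;
                        if (rr >= 0 && rr < n && cc >= 0 && cc < n)
                            neighbors[r * n + c].push_back(rr * n + cc);
                    }
        return neighbors;
    }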

Next, we checked for cases of chaotic behavior: situations where changing boundary conditions, for instance a single connection weight, would dramatically increase the system error. Given that healthy biological neural networks are very robust to the changes that naturally occur in all physical systems, we want our system to have the same characteristic. The next requirement was defined as the capacity for “true” learning. This, although obvious, is extremely important, as in many instances we found that the LGP programs just secretly stored several internal states useful when executing the program. Instead of learning, the programs simply tested the various sets of pre-stored states until the error was low.

Example of a grid topology needed to implement metaneuron behavior

To avoid this pitfall effectively, we used a training set and a distinct test set. We designed the training set with 8 Boolean 2-input functions f1(i,j), f2(i,j), …, f8(i,j). For instance, f1 could be the XOR problem. The genetic engine then has to find code such that the metaneurons can learn any of those functions. Once the learning program is found, its learning performance is assessed using a test set made of 8 other Boolean functions f9, …, f16. The number of training sets and the training set size have, of course, a direct impact on computational complexity, and required us to increase the computation speed by several orders of magnitude.
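Since a 2-input Boolean function is fully determined by the 4 rows of its truth table, there are exactly 16 such functions, which makes it easy to enumerate them and split them into disjoint training and test sets (a sketch; the actual split we used is not detailed here):

    #include <cstdint>

    // A 2-input Boolean function is encoded by the 4 bits of its truth
    // table, so the 16 functions are just the codes 0..15 (XOR is code 6).
    // f1..f8 can be drawn from one half and f9..f16 from the other.
    bool eval_boolean(uint8_t truth_table, bool i, bool j) {
        int row = (i ? 2 : 0) | (j ? 1 : 0);  // which truth-table row
        return (truth_table >> row) & 1;
    }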

A basic computational speedup was achieved by hardcoding the relaxation function in C++ (using integrate and fire) and evolving only the learning program with LGP. Using several hours of computing on over 100 cores, the system found learning algorithms for different network sizes (3×3, 4×4, and 5×5). The evolved learning programs vastly outperformed state-of-the-art optimization techniques: they were 2,300 times faster than a genetic algorithm search and 100,000 times faster than a random search.
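The hardcoded relaxation could look like the classic leaky integrate-and-fire update (our illustrative sketch; the constants are arbitrary):

    // One integrate-and-fire step: the membrane potential leaks toward
    // zero, accumulates the weighted input, and emits a spike when it
    // crosses the threshold, after which it is reset.
    bool integrate_and_fire(double& potential, double weighted_input,
                            double leak = 0.9, double threshold = 1.0) {
        potential = leak * potential + weighted_input;
        if (potential >= threshold) {
            potential = 0.0;  // reset after spiking
            return true;      // spike emitted
        }
        return false;
    }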

Performance of the LGP approach against Random Search and Genetic Algorithms

Is our approach to reaching artificial intelligence disruptive? Yes, it is. Are our results disruptive? Yes, they are. We are therefore motivated to take this to the next level. Future work will focus on extending this approach to more complex problems by using more neurons, more complex topologies, and more powerful instructions for our LGP language. At this stage we are still far from having a thinking machine in our hands; however, we feel that our tools are becoming more and more powerful and that we are on the right track to tackle the problem of artificial intelligence.
