utorok 23. augusta 2016

Developing Widelands's AI with genetic / evolutionary algorithm

Developing Widelands's (strategic game) AI with genetic / evolutionary algorithm

Disclaimer: This is experimental work and might never make it into release.

Content of this blogspot:

* Introduction of widelands
* Current status of AI in (pre-)B19
* Idea
* Neuron
* AI training

Introduction of Widelands

Widelands is Settlers II clone, open source strategy game, available for all platforms. Please visit widelands homepage for more info.


Current status of AI in (pre-)B19

I significantly reworked AI in 2015 & 2016, this is between releases 18 and 19. Size of code (as located in src/ai/) grew significantly, 2-fold or more. In addition to new functionality and logic, the code contains a lot of numbers, see at this chunk of code (no need to study it deeply):

// score here is a compound of various input values
// usually resources in vicinity, but when enemy is nearby
// additional bonus is added
if (bf->enemy_nearby) {
prio += bf->military_loneliness / 3;
prio += (20 - bf->area_military_capacity) * 10;
prio -= bo.build_material_shortage * 50;
prio -= (bf->military_in_constr_nearby + bf->military_unstationed) * 50;
} else {
if (bf->near_border) {
prio += 50;
prio -= bo.build_material_shortage * 150;
} else {
prio -= bo.build_material_shortage * 500;  // prohibitive
}
prio -= (bf->military_in_constr_nearby + bf->military_unstationed) * 150;
prio += (5 - bf->own_military_sites_nearby_()) * 15;
}
prio += bf->unconnected_nearby * 50;
prio += bf->unowned_land_nearby * resource_necessity_territory_ / 100;
prio += bf->unowned_mines_spots_nearby * resource_necessity_mines_ / 100;
prio +=
  ((bf->unowned_mines_spots_nearby > 0) ? 35 : 0) * resource_necessity_mines_ / 100;
prio += bf->rocks_nearby / 2;
prio += bf->water_nearby;
prio += bf->distant_water * resource_necessity_water_needed_ / 100;
prio += bf->military_loneliness / 10;
prio += bf->trees_nearby / 3;
if (bf->portspace_nearby == ExtendedBool::kTrue) {
if (num_ports == 0) {
prio += 100;
} else {
prio += 25;
}
}


First a broad view: this is part of construct_building() function, which in addition to many other things, goes over empty spaces ("buildable fields") and buildable buildings and scores all combinations. The "prio" is a score for building x field. This chunk of code is for military sites.

As you can notice a lot of variables and a lot of numbers (perhaps we can use word WEIGHT) are employed here.

Idea

Such development and tweaking is very time consuming. You need to compile - run - watch (game and logs) - re-think - edit - compile and again and again. As now there is a lot of discussion about AI (mainly neural networks) everywhere, so I decided that it would be very convenient if I could use some AI algorithm to simplify the development.

Neural networks - no go for now

First where I looked at was neural networks. I have no personal experience with this, but based on what I read on internet, it would be very complicated. At least memory requirements would be huge. Consider map of 512x512, each field having multiple attributes, about 50 types of buildings, roads between them, dependencies (material flows) between them, warfare aspects. Looks like a huge task. Moreover it seems to me that no implementation of neural network for a game like this exists yet.

Lightweight approach

In fact I don't think it is necessary to dump and scrap current AI, but we can use lightweight approach to use some AI technique to help with only small portion of AI. This is what I decided for.

If you look at above code snippet, you there is a lot of numbers there. And idea is to use randomness and natural selection to come up with best AI. So in other words, the evolutionary algorithm is not to be used to play the game, but to help pick best parameters for the game's AI.

Neurons

This is inspiration from neural networks.

Two things can be mentioned in relation to NN:

Activation curves

Well, if you look at example above, most variables have linear impact on the score. But when I was reading about neural networks I noticed strong emphasis on non-linearity of activation curve. And this is something that looks very interesting and useful here. But I also realized, that computation of such curves is CPU demanding and CPU use is already a problem here. The solution is mentioned below.

Neurons

Well, my AI uses word NEURON but it misuses it a bit. Neuron in my implementation is just an object the receives inputs in range (0 - 20) and returns outputs in range (-100 - 100). As number of possible outputs is limited to 21, the neuron contains array with all outputs and no calculation is done in realtime.

Following predefined curves are available for neurons:

{   0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100}
{   0,  0, 1,  2,  4, 6, 9, 12, 16, 20, 25, 30, 36, 42, 49, 56, 64, 72, 81, 90, 100}
{   0, 17, 25, 32, 38, 44, 49, 53, 58, 62, 66, 70, 74, 78, 81, 84, 88, 91, 94,7, 100}
{ 100, 97,  94, 91, 88, 84, 81, 78, 74, 70, 66, 62, 58, 53, 49, 44, 38, 32, 15, 17,   0}
{ 100, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, 15, 10,  5, 0}
{-100,-99, -98,-96,-94,-92,-88,-83,-73,-55,  0, 55, 73, 83, 88, 92,94, 96, 98, 99, 100}

The curve is modified by weight parameter (-100 - 100), that is multiplicator of above curve (and divided by 100), so if we will use weight -20, values from first curve will range from 0 to -20.

What is a bit of work for developer is to trim a variable to get (mostly) in range 0 - 20, if the value is out of range, outputs corresponding to 0 or 20 is returned by neuron. So I have to modify inputs so that range is used as much as possible, but got rarely out of it.

So entire AI can be "dumped" (and used for initialization) as:

const std::vector<int8_t> input_weights =
      { 86,  70, -52,  30, -25,  30,  30,  -5,  15,  80,
        30, -98,  30, -60,  93, -68, -34, -29,  30,  73,
        30, -12,  30, -93,  30,  30,  30,  80,   6,  30,
       -30,   9, -18,   3,  30, -58,  30,  30,  30}

const std::vector<int8_t> input_func =
      {  4,   1,   5,   1,   0,   1,   1,   0,   1,   3,
         1,   1,   1,   4,   1,   2,   3,   5,   1,   2,
         1,   2,   1,   3,   1,   1,   1,   3,   1,   1,
         4,   2,   2,   0,   1,   1,   1,   1,   1}


So it is a bunch of numbers. Every neuron is initialized with a pair taken from both vectors. And the code where these values are used can look like:

field.military_score_ += management_data.neuron_pool[0].get_result_safe(field.military_loneliness / 50);
field.military_score_ += management_data.neuron_pool[4].get_result_safe((field.area_military_capacity + 4) / 5);
field.military_score_ +=
management_data.neuron_pool[8].get_result_safe((field.military_in_constr_nearby + field.military_unstationed));
field.military_score_ +=
management_data.neuron_pool[12].get_result_safe(field.own_military_sites_nearby_());
field.military_score_ +=
management_data.neuron_pool[13].get_result_safe(field.military_stationed);
field.military_score_ +=
management_data.neuron_pool[1].get_result_safe(static_cast<uint8_t>(soldier_status_) * 3);

For sake of simplicity, I will use word DNA when talking about function type and weights used to initialize neurons.

AI training

Well very basic setup for this kind of stuff is to start a game (AI players only), with very high number of AI players with slightly changed "DNA". Afterwards to pick one or two best performers and use their DNA to breed new bunch of AI players for next game.

But here it is not so simple. Number of players is limited to 8, game even in high speed can take about 1 hour. Than copy/pasting of DNA of winner (being dumped into terminal) into source, then compilation is needed and new game must be started and run.

So when AI is initialized, it takes given arrays with DNA information and randomly changes about 10% of neurons. This is done for every computer player separately, of course.

Measuring of performance of AI

Well, the game provides few types of graphs (territory curves, military strength curves), so I can have some rough indication which AI is working best, or at least a better as in previous iteration. But for purposes described lower, I generates simple formula how to measure performance of AI:

score = 3 * military_sites + buildable_fields + 10 * production_sites + 10 * mines + 3 * military_strength;

score = military_sites + buildable_fields + 6 * production_sites + 10 * mines + 2 * military_strength + 4 * lost_soldiers + 8 * warehouses + 5 * my_strength_minus_average_enemies_strength;

In-game DNA mutation

Well, due to low number of players I was looking for ways how to enable modification of DNA of AI players during game. Above I presented a score, this is what I use to watch how AI is coping. Currently the logic is:

if score / score_40_minutes_ago < 1.1:
modify_DNA()

This allow the game itself (without my intervention) to change DNA of weak performers. But until one critical point - when AI starts fight together. Here it is much more difficult to come up with score that would indicate "good enough performance".

Final words

Well this is open-end endeavor - I have no idea if this ever be good enough for release, but it is fun to work on :)

Part II




































Žiadne komentáre:

Zverejnenie komentára