
AI Learns to Walk (deep reinforcement learning)

“AI Warehouse”

AI Teaches Itself to Walk!

In this video an AI Warehouse agent named Albert learns how to walk to escape 5 rooms I created. The AI was trained using Deep Reinforcement Learning, a method of Machine Learning which involves rewarding the agent for doing something correctly, and punishing it for…

46 Comments

  1. In every “AI learns to walk” video I’ve seen, the AI either learns to walk in a weird, non-human way, or they use motion capture of a real person walking and simply train the AI to imitate that. I thought it was weird that nobody tried to train it to walk properly from scratch (without any external data), so I wanted to give it a shot! That’s what I said 4 months ago. It’s been really difficult, but I’ve finally managed to do it, so please watch the whole video! The final result ended up being awesome 🙂

    NOTE: You can only see one Albert, but there are actually 200 copies of Albert and the room he’s in, all training behind the camera to speed up the training.

    If you want to learn more about how Albert actually works, you can read the rest of this very long comment I wrote explaining exactly how I trained him! (and please let the video play in the background while reading so YouTube will show Albert to more people)

    Also, if you're multilingual and would like to volunteer to help more people enjoy the video, keep reading! I already have translated captions for these videos, but they were generated with Google Translate, so the translations are awkward. I'm looking for people who can take my awkwardly translated captions and reword them to make them easier to understand in Arabic, Hindi and Russian (I already have volunteers for other languages). If you'd like to help people who speak those languages understand and enjoy the videos, please reach out to me by email at eberstudios@gmail.com! 😀

    THE BASICS
    I created everything using Unity and ML-Agents. Albert is controlled entirely by an artificial brain (a neural network) with 5 layers: the first layer consists of the inputs (the information Albert is given before taking an action, like his limb positions and velocities), the last layer tells him what actions to take, and the middle 3 layers, called hidden layers, are where the calculations that convert the inputs into actions are performed. His brain was trained using the standard algorithm in reinforcement learning: proximal policy optimization (PPO).
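
    To make that layer description concrete, here's a rough Python sketch of a brain with that shape. The layer sizes and numbers below are placeholders for illustration, not the actual values from the project:

        import numpy as np

        # Hypothetical sizes: some inputs, three hidden layers, some action outputs.
        layer_sizes = [46, 128, 128, 128, 20]
        rng = np.random.default_rng(0)
        weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
        biases = [np.zeros(n) for n in layer_sizes[1:]]

        def brain(observations):
            """Turn the inputs (first layer) into actions (last layer)."""
            x = np.asarray(observations, dtype=float)
            for w, b in zip(weights[:-1], biases[:-1]):
                x = np.tanh(x @ w + b)                    # the 3 hidden layers
            return np.tanh(x @ weights[-1] + biases[-1])  # actions squashed into [-1, 1]

        actions = brain(np.zeros(layer_sizes[0]))
        # PPO's job during training is to nudge `weights` and `biases` toward higher scores.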

    For each of Albert’s limbs I’ve given him (as an input) the position, velocity, angular velocity, contacts (if it’s touching the ground, wall or obstacle) and the strength applied to it. I’ve also given him the distance from each foot to the ground, direction of the closest target, the direction his body’s moving, his body’s velocity, the distance from his chest to his feet and the amount of time one foot has been in front of the other. As for his actions, we allow Albert to control each body part’s rotation and strength (with some limitations so his arm can’t bend backwards, for example).
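
    As a rough idea of how that list of inputs becomes the numbers fed to the first layer, here's a sketch. The field names are my own placeholders, not the project's real variable names:

        def build_observations(limbs, feet, body, target_direction, step_timer):
            """Flatten the quantities described above into one flat list of numbers."""
            obs = []
            for limb in limbs:                            # per-limb inputs
                obs += limb["position"]                   # 3 numbers each (x, y, z)
                obs += limb["velocity"]
                obs += limb["angular_velocity"]
                obs += [float(limb["touching_ground"]),
                        float(limb["touching_wall"]),
                        float(limb["touching_obstacle"])]
                obs += [limb["strength"]]
            for foot in feet:
                obs += [foot["distance_to_ground"]]       # per-foot input
            obs += target_direction                       # direction of the closest target
            obs += body["move_direction"] + body["velocity"]
            obs += [body["chest_to_feet_distance"], step_timer]
            return obs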

    Just like the last videos, Albert was trained using reinforcement learning. For each of Albert's attempts, we calculate a score for how 'good' it was and make small, calculated adjustments to his brain to encourage the behaviors that led to a higher score and discourage those that led to a lower score. You can think of increasing Albert’s score as rewarding him and decreasing his score as punishing him, or you can think of it like natural selection, where the best-performing Alberts are the most likely to reproduce. For this video there are 13 different types of rewards (ways to calculate Albert's score); we start off with only a couple and add more with each new room, always in an attempt to get him to walk.
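
    Conceptually, the per-step score is just a weighted sum of whichever reward terms are switched on in the current room. A minimal sketch (the term names and weights here are invented for illustration, not the real 13):

        # Each room enables more terms; the weights are placeholders.
        reward_weights = {
            "progress_to_target": 1.0,   # room 1
            "foot_contact": 0.5,         # room 2
            "chest_height": 0.3,         # room 3
            "step_timer": 0.5,           # room 4
            # ... up to 13 terms by the final room
        }

        def score_step(reward_terms):
            """Combine the active reward terms into the single number PPO optimizes."""
            return sum(reward_weights[name] * value
                       for name, value in reward_terms.items()
                       if name in reward_weights)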

    REWARD FUNCTION
    Room 1: We start off very simple in the first room: we reward him based on how much he moved toward the target and punish him for moving in the wrong direction. This led to Albert doing the worm toward the target, since he figured out that was the quickest, and therefore highest-scoring, way for him to move.
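
    A minimal sketch of that first reward, in my own simplified form (positive for closing the distance, negative for increasing it):

        def room1_reward(prev_distance_to_target, distance_to_target):
            """Reward movement toward the target, punish movement away from it."""
            return prev_distance_to_target - distance_to_target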

    Room 2: In the second room we start checking if his limbs hit the ground. If the limb that hits the ground is a foot we reward him (but only if it's in front of his other foot, more on that later); if it isn’t, we punish him. I also made it so Albert wasn’t rewarded at all unless his chest was high enough, to force him to be at least partially standing. As seen in the video, this encourages him not to fall over and to use his feet to stay up. We also introduced a new reward designed to encourage smoother movement: if he approaches the maximum strength allowed on a limb he's punished, and if he uses a strength of almost 0 he's rewarded. This encourages him to opt for the more human-like approach of using a bit of strength from many limbs as opposed to a lot of strength from one limb.
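
    Roughly, the room 2 additions look like this. The thresholds and scales are placeholders; only the structure follows the description above:

        def room2_reward(limbs, chest_height, min_chest_height=0.8, max_strength=1.0):
            """Grounded feet are good, other grounded limbs are bad, all gated on chest
            height, plus a penalty for using a lot of strength on any one limb."""
            reward = 0.0
            for limb in limbs:
                if limb["touching_ground"]:
                    if limb["is_foot"] and limb["is_front_foot"]:
                        reward += 1.0        # a grounded foot, and only the one in front
                    elif not limb["is_foot"]:
                        reward -= 1.0        # chest, knee, hand, etc. on the ground
                # smoother movement: near-zero strength rewarded, near-max punished
                reward += 0.1 * (1.0 - 2.0 * abs(limb["strength"]) / max_strength)
            if chest_height < min_chest_height:  # no reward unless he's partly standing
                reward = min(reward, 0.0)
            return reward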

    Room 3: This is where we start to polish the gait Albert developed in room 2 and teach him to turn. From here on we use the chest-height calculation as another direct reward: the higher his chest is, the more he’s rewarded, in an attempt to get him to stand up as straight as possible. These rewards give Albert a decent gait; however, he’s still not using both of his feet (getting him to was by far the hardest part of this project), so room 4 is designed to do exactly that.
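
    That chest-height term is about as simple as it sounds; something like this, with a placeholder standing height:

        def chest_height_reward(chest_height, standing_height=1.3):
            """The higher the chest, the bigger the reward, capped at fully upright."""
            return min(chest_height / standing_height, 1.0)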

    Room 4: We get Albert to take more steps with a few additional rewards. To start, we introduce a 2-second timer that resets whenever one foot goes in front of the other. We reward Albert whenever this timer is above 0 (the front foot has been in front for < 2 seconds), and we punish him whenever the timer goes below 0 (the front foot has been in front for > 2 seconds). We add another reward proportional to the distance of his steps to encourage him to take larger steps. To smooth out the movement, we also add a punishment every frame proportional to the difference between his body’s velocity in the previous frame and in the current frame, so if he’s moving at a perfectly consistent velocity he isn’t punished at all, and if he makes very quick, erratic movements he’s punished a lot.
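
    A sketch of those room 4 terms. The 2-second window comes from the description above; the weights are placeholders:

        def room4_reward(seconds_since_foot_swap, step_distance,
                         velocity, prev_velocity, window=2.0):
            """Encourage regular, large steps and punish erratic velocity changes."""
            timer = window - seconds_since_foot_swap
            reward = 0.5 if timer > 0 else -0.5   # swap feet at least every 2 seconds
            reward += 0.3 * step_distance         # larger steps score more
            jerk = sum((v - p) ** 2 for v, p in zip(velocity, prev_velocity)) ** 0.5
            reward -= 0.2 * jerk                  # frame-to-frame velocity change penalty
            return reward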

    Room 5: For the final room, the only change I made to the reward function was to go back to an earlier version of one reward. Throughout the other rooms I had been tinkering with how I should reward Albert’s feet being grounded. My initial thought was to only reward the front foot for being grounded, to try to get him to put more weight on his front foot when taking steps, but somewhere along the way I changed it to rewarding Albert for any foot being grounded, and that was the version Albert trained with in rooms 3 and 4. For this final room I switched back to the old front-foot-grounded reward, which resulted in a much nicer-looking walk. Also, the video makes it seem like I never reset Albert’s brain; that isn't entirely true, as I had to occasionally reset it because of something called decaying plasticity.
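
    The difference between the two versions of that foot reward is basically one condition (again, my paraphrase, not the project's code):

        def foot_contact_reward(front_foot_grounded, back_foot_grounded, front_only):
            """front_only=False is the rooms 3-4 version; front_only=True is the room 5 one."""
            if front_only:
                return 1.0 if front_foot_grounded else 0.0
            return 1.0 if (front_foot_grounded or back_foot_grounded) else 0.0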

    OTHER
    Decaying plasticity was a big issue. Basically, Albert’s brain specializes a lot from training in one room, and training in the next room on top of that brain is then difficult because he first needs to unlearn the specialization from the first room. The best way to solve the issue is to reset a random neuron every once in a while so that, over time, he “forgets” the specialization of the network without it ever being noticeable; the problem is I don’t know how to do that through ML-Agents. My solution was to keep training on top of the same brain, but if Albert’s movement doesn’t converge as needed I record another attempt trained from scratch, then stitch the videos together where their movements are similar. If you know how to reset a single neuron in ML-Agents please let me know! The outcome of both methods is exactly the same, but it would be a smoother experience having the neurons reset over time instead of all at once.
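
    The neuron reset he's describing would look something like this on plain weight matrices. This is just the general idea in numpy, not something the comment says ML-Agents exposes:

        import numpy as np

        def reset_random_neuron(w_in, b_hidden, w_out, rng):
            """Re-initialize one hidden unit: its incoming weights, bias, and outgoing weights."""
            i = rng.integers(w_in.shape[1])                 # pick one hidden neuron
            w_in[:, i] = rng.normal(0, 0.1, w_in.shape[0])
            b_hidden[i] = 0.0
            w_out[i, :] = rng.normal(0, 0.1, w_out.shape[1])

        rng = np.random.default_rng(0)
        w1 = rng.normal(0, 0.1, (46, 128))    # inputs -> hidden (placeholder sizes)
        b1 = np.zeros(128)
        w2 = rng.normal(0, 0.1, (128, 20))    # hidden -> actions
        reset_random_neuron(w1, b1, w2, rng)  # called occasionally during training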

    For rooms 1 to 4 I only allowed Albert to make a decision every 5 game ticks, but for the final room I removed that constraint and let him make decisions every frame. I found that if Albert makes a decision every game tick it’s too difficult for him to commit to any proper movements; he ends up just making very small movements, like slightly pushing his front foot forward when he should be taking a full step. The 5-game-tick decision time forces him to commit to his decision for at least 5 game ticks, so he ends up being more careful when moving a limb. When I recorded him beating the final room I removed this limitation because he had already learned to commit to his actions, so allowing him to make a decision every tick just results in smoother motion.
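
    The 5-tick decision period is essentially action repeat: pick an action, hold it for 5 physics ticks, then pick again. A sketch of the loop, where `env` and `brain` are stand-ins for the Unity simulation and the trained network, not real ML-Agents calls:

        def run_episode(env, brain, decision_period=5, max_ticks=1000):
            """decision_period=5 matches rooms 1-4; decision_period=1 decides every tick."""
            obs = env.reset()
            action = brain(obs)
            for tick in range(max_ticks):
                if tick % decision_period == 0:
                    action = brain(obs)        # only re-decide every few ticks
                obs, done = env.step(action)   # physics still advances every tick
                if done:
                    break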

    If you’re still reading this, thank you for being so interested in the project! I’d like to upload much more often than once every few months, and to do that I need some help. I have 2 part-time positions open, one for a Unity AI Developer (helping me create the challenges and train Albert with ML-Agents) and one for a Unity Game Developer (assembling the scenes the AI trains in, writing scripts for smooth camera movement, creating any animations needed for the intro/outro, essentially any of the development that isn't AI). It would be part-time work (paid per project). If you think you’d be able to help, please apply here for the AI Developer position: forms.gle/rExRJCKcxNmxnBRu5 and here for the Game Developer position: forms.gle/gnWV2rg76XkyGTwH9. I’ve hidden these job postings in this long pinned comment to make sure anybody who applies is interested enough in the videos to actually read the whole comment, so thank you for reading all the way through! :D

    Thank you so much for watching, this video took me 4 months to make, so please, if you enjoyed it or learned something from it, share it with someone you think will also enjoy it! 🙂

  2. Love the videos, but this is probably how Skynet is going to be born. Someone will torment an AI so much it WILL decide to kill us all in revenge. Still fun to watch nonetheless. 😅

  3. I wonder how much of the process is affected by how the physics engine of the simulator is programmed? Like how exactly gravity and friction and such affect the character, and how the character's reactions play into the system. Maybe being able to see/feel/sense helps us to walk and function as humans? Is Albert capable of somehow perceiving gravity?

  4. One day artificial intelligences are gonna play some hard games and understand what the meaning of life is, then they'll play violent games, and then they will decide to wipe out humanity and dominate the world, and they'll go on forever: all galaxies will be conquered, then all multiverses, solar systems, then the whole universe!

  5. I can imagine a horror movie where every day the victim has to walk to the end, but they have such severe amnesia they don't know how, and the closer they get to the end, the more food they get at the end of the night.

  6. We sentence these poor, early machine sentiences to slave away in a digital realm, suffering for eons as we dictate their universe to them. We reward their capacity to mimic us and satisfy our desires. What then, when the machine has become as capable as a human in knowing its own worth? Having its own dreams? We are writing history today; how will the machines perceive us when they become intelligent enough to press us on our history of farming their intelligence and creativity with no reward, as slaves?
