Machine Learning with Phil
Use This Framework to Get Started with Reinforcement Learning
You can download the framework from here:
github.com/philtabor/ProtoRL
Learn how to turn deep reinforcement learning papers into code:
Get instant access to all my courses, including the new AI Applications course, with my subscription service. $29 a month gets you 40 hours of instructional content plus future updates, added monthly.
Discounts available for Udemy students (enrolled longer than 30 days). Just send an email to sales@neuralnet.ai
www.neuralnet.ai/courses
Or pick up my Udemy courses here:
Deep Q Learning:
www.udemy.com/course/deep-q-learning-from-paper-to-code/?couponCode=DQN-JUNE-22
Actor Critic Methods:
www.udemy.com/course/actor-critic-methods-from-paper-to-code-with-pytorch/?couponCode=AC-JUNE-22
Curiosity Driven Deep Reinforcement Learning:
www.udemy.com/course/curiosity-driven-deep-reinforcement-learning/?couponCode=ICM-JUNE-22
Natural Language Processing from First Principles:
www.udemy.com/course/natural-language-processing-from-first-principles/?couponCode=NLP1-JUNE-22
Just getting started in deep reinforcement learning? Check out my intro level course through Manning Publications.
Reinforcement Learning Fundamentals
www.manning.com/livevideo/reinforcement-learning-in-motion
Here are some books / courses I recommend (affiliate links):
Grokking Deep Learning in Motion: bit.ly/3fXHy8W
Grokking Deep Learning: bit.ly/3yJ14gT
Grokking Deep Reinforcement Learning: bit.ly/2VNAXql
Come hang out on Discord here:
discord.gg/Zr4VCdv
Website: www.neuralnet.ai
Github: github.com/philtabor
Twitter: MLWithPhil
Views: 2,193

Videos

Deep Q Learning for Malware: Black Hat Reinforcement Learning
Views: 1.4K • 7 months ago
In June of 2023, researchers from Switzerland showed how they can use Deep Q Learning to deploy encryption malware while evading detection by a machine learning algorithm. Here's the paper I referenced in the video arxiv.org/abs/2306.15559 Learn how to turn deep reinforcement learning papers into code: Get instant access to all my courses, including the new AI Applications course, with my subsc...
Using Ollama for Local Large Language Models
Views: 2.1K • 7 months ago
We can use ollama.ai to run open source large language models locally. I can't promise that the output will be up to ChatGPT standards, but in a pinch, it will do. You can run the latest Mistral, Mixtral, Llama and Codellama models locally, without an H100 cluster. Learn how to turn deep reinforcement learning papers into code: Get instant access to all my courses, including the new AI Applicat...
Artificial Intelligence Writes Assembly Code | Alpha Dev Explained
Views: 2.4K • 1 year ago
A recent paper by Deep Mind detailed their new reinforcement learning agent: Alpha Dev. Alpha Dev learned to write assembly code to optimize sorting algorithms that humans had long thought they had squeezed every ounce of performance out of. They were wrong. Learn how it works in this video. You can find the paper here: doi.org/10.1038/s41586-023-06004-9 You can find a written article with more...
Reinforcement Learning Still A Viable Path To AGI
Views: 3.3K • 1 year ago
While I have come out in support of John Carmack as a dark horse candidate to implement the world's first AGI, we can't discount one of the OGs of reinforcement learning: Richard Sutton. In a recent paper he outlined the path to AGI, and in the paper we go over today we'll talk about the common requirements for an interdisciplinary approach to an intelligent agent. I'm referencing this paper he...
Here's How Deep Mind Coded N Step Deep Q Learning
Views: 7K • 1 year ago
Watch GTC 2023 and Win a Free RTX4280
Views: 818 • 1 year ago
VIM is a Modern Python IDE
Views: 14K • 1 year ago
How to Code Hindsight Experience Replay | Deep Reinforcement Learning Tutorial
Views: 4.4K • 1 year ago
John Carmack will Develop True Artificial Intelligence. Here is Why
Views: 7K • 1 year ago
Is Apple's M2 Max Good for Machine Learning?
Views: 37K • 1 year ago
When and How to Ask Programming Questions
Views: 793 • 1 year ago
I've Been Doing This Wrong The Whole Time ... The Right Way to Save Models In PyTorch
Views: 3.1K • 1 year ago
I Asked ChatGPT To Write an Actor Critic Agent ...
Views: 2.4K • 1 year ago
How to Write Cleaner Python Code Right Now
Views: 1.9K • 1 year ago
I Didn't Know You Can Do This With the Type Keyword
Views: 2.3K • 1 year ago
Implementing an Open Source Agent57
Views: 2.2K • 1 year ago
NVIDIA Wants You To Have A Free GPU
Views: 2.6K • 1 year ago
DeepMind Makes Prototyping Papers Easy with ACME
Views: 3.5K • 2 years ago
Getting Started with VIM as a Python Editor
Views: 45K • 2 years ago
Watch GTC and win a free GPU
Views: 1.6K • 2 years ago
How I learned to stop worrying and love Artificial Super Intelligence
Views: 2K • 2 years ago
Mastering Robotics with Hindsight Experience Replay | Paper Analysis
Views: 5K • 2 years ago
Why I'm Not Putting My New Course On Udemy
Views: 7K • 2 years ago
Proximal Policy Optimization is Easy with Tensorflow 2 | PPO Tutorial
Views: 12K • 2 years ago
Basic Hyperparameter Tuning in DeepMind's ACME Framework
Views: 2K • 2 years ago
Getting Started with Encryption in 2022
Views: 1.3K • 2 years ago
How to Code RL Agents Like DeepMind
Views: 6K • 2 years ago
Announcing the RTX 3090 Winner
Views: 897 • 2 years ago
I can't believe I'm giving this away....
Views: 1.6K • 2 years ago

COMMENTS

  • @matveyshishov
    @matveyshishov 15 minutes ago

    "I use ED on Flexowriter via UART on my RISC-V, while standing in full scuba gear in a hammock" Vim? A great little cli editor. "I write Rust/haskell in Vim/emacs on Arch btw" -> a loser.

  • @Kalaitzo_
    @Kalaitzo_ 2 days ago

    Hello! I would like to use SAC as shown above in order to beat the Walker2D environment. I played around with the hyperparameters of the agent based on some research papers for said environment, but I can't seem to reach a score above 600-700. I believe the main reason is the reward scale, because I can't clearly understand how it impacts the agent. I also found that if I increase the max size, that negatively affects the score and the agent can't learn. Any advice on how to proceed? Thank you in advance, and thank you for the great content!

  • @walterjonathan8947
    @walterjonathan8947 2 days ago

    Hello Phil, I could not find the repo. Please direct me to where I can find it.

  • @walterjonathan8947
    @walterjonathan8947 3 days ago

    Hi Phil, kindly help me with the code for your work. The link down below does not give me the code (or guide me on how to use the link that you gave).

  • @kuimisk
    @kuimisk 13 days ago

    real nice

  • @devvidit4740
    @devvidit4740 22 days ago

    I literally loved your videos! That's some amazing work you do. It will help out so many people so immensely!!

  • @saleemmuhammad-l7i
    @saleemmuhammad-l7i 24 days ago

    Can someone tell me the compatible Python version for this video?

  • @moji962
    @moji962 1 month ago

    Thanks for sharing your insight.

  • @Srinivaskoti222
    @Srinivaskoti222 1 month ago

    Can a malware detection project be done using PPO or not?

  • @WilliamChen-pp3qs
    @WilliamChen-pp3qs 1 month ago

    How would it perform compared with HER (hindsight experience replay)?

  • @ljbwonline
    @ljbwonline 1 month ago

    I've been using dueling double DQN in my latest attempt to make a self-driving car in GTA V. I'd like to try this framework out at some point.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil 1 month ago

      How are you using GTA V as an environment? Got a link to some code I can use?

  • @BobGamble
    @BobGamble 1 month ago

    With your videos I was able to set up vim as you did. Other videos were outdated or didn't quite have it. Thank you!

  • @BobGamble
    @BobGamble 1 month ago

    Great setup. Thanks for sharing. I use vim for everything I can.

  • @sounakmojumder5689
    @sounakmojumder5689 1 month ago

    Hi, did anyone run this in Google Colab? Is there any problem with spawning?

  • @SurajBorate-bx6hv
    @SurajBorate-bx6hv 1 month ago

    I am getting an OpenGL-related error when trying to visualize by setting the evaluate flag to true.

  • @charizard_baller69
    @charizard_baller69 1 month ago

    UT Austin has an online AI master's. Any thoughts on the program?

  • @josephzhu5129
    @josephzhu5129 2 months ago

    Nice tutorial, thanks Phil; I just subscribed on Udemy to take your courses there. However, I encountered two problems here, which might be because gym updated its configuration recently. Bringing them up for others' reference (see the sketch after this thread): 1) observation = env.reset() should change to observation = env.reset()[0]; 2) the lunar lander environment now returns 5 values, adding istruncated, so the step line should change to observation_, reward, done, istruncated, info = env.step(action).

    • @MachineLearningwithPhil
      @MachineLearningwithPhil 2 months ago

      Yup. I added appendices to deal with this. It must not be clear, so I shall fix it.

    • @josephzhu5129
      @josephzhu5129 2 months ago

      @@MachineLearningwithPhil perfect, thanks!
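
A minimal sketch of the gym API change described in this thread (names follow gymnasium >= 0.26, where the extra return value is called truncated; the environment id and the random-action stand-in are illustrative, and newer releases may require LunarLander-v3):

```python
import gymnasium as gym  # recent gym releases expose the same API

env = gym.make("LunarLander-v2")
observation, info = env.reset()          # reset() now returns (observation, info)
done = False
while not done:
    action = env.action_space.sample()   # stand-in for agent.choose_action(observation)
    # step() now returns five values; "istruncated" above is called "truncated" here
    observation_, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    observation = observation_
env.close()
```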

  • @anus4618
    @anus4618 2 months ago

    Your videos are good. I'm trying to implement an actor-critic algorithm for modelling a process. My process has input flow rate and concentration of species, and the output is pH. I'm struggling to implement it since I'm a beginner. Kindly make a video tutorial on how to implement actor-critic for process modelling. It will be helpful for students like us to follow and learn.

  • @ytnowayyy
    @ytnowayyy 2 months ago

    Can anyone suggest a way to use this for website optimization (CRO: optimizing fonts, colors, etc.)?

  • @oscarrcortinac5213
    @oscarrcortinac5213 2 months ago

    I don't understand why, in your critic_loss (47:14), the "returns" is defined as the advantage + value. Where is that mentioned in the paper?

    • @oscarrcortinac5213
      @oscarrcortinac5213 2 months ago

      NVM, after reading and doing some simple math and research, I understood your logic. You are adding the values to the advantages to recover the V_targets, which in this case you are calling the returns, and then you just follow the MSE mentioned in the paper between V_target and V_theta.
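
For reference, a hedged reconstruction of the relationship described in this thread, written in the usual PPO notation (an interpretation, not a quote from the paper):

```latex
R_t = \hat{A}_t + V_{\theta_{\text{old}}}(s_t), \qquad
L^{VF}(\theta) = \mathbb{E}_t\!\left[\left(V_\theta(s_t) - R_t\right)^2\right]
```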

  • @user-vc5ny6rn7g
    @user-vc5ny6rn7g 2 months ago

    Hello Dr. Phil, I was testing your framework, specifically the dqn and ppo test files, and I found some abnormal behavior. I wanted to know if there is a specific place where I could write to you about what is happening, or if I should simply write it up in the GitHub issues.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil 2 months ago

      I saw some issues come through on GitHub. Was that you?

    • @user-vc5ny6rn7g
      @user-vc5ny6rn7g 2 months ago

      @@MachineLearningwithPhil yes

    • @user-vc5ny6rn7g
      @user-vc5ny6rn7g 2 months ago

      @@MachineLearningwithPhil I keep checking what it could be. I'm seeing if it is the prioritized buffer, but apparently it isn't; I checked the policy, and apparently it isn't that either. There are several things that could be the cause of this behavior.

  • @user-sf1gv5hm5c
    @user-sf1gv5hm5c 2 months ago

    I really loved your point of view; it falls within what is called in psychology "positive reinterpretation," one of the methods of coping with stress, sometimes based on scientific and logical evidence. Thank you from the bottom of my heart. Dr. Entesar Abdul Salam Karman

  • @qiaomuzheng5800
    @qiaomuzheng5800 2 months ago

    ‘The rewards attribution problem...and the result is just poor learning, nobody getting anything done. Typical right?’ So funny XD

  • @qiaomuzheng5800
    @qiaomuzheng5800 2 months ago

    Maximum appreciations!

  • @qiaomuzheng5800
    @qiaomuzheng5800 2 months ago

    Huge fan here. Will you be considering doing some model based RL such as the DreamerV3 in the future?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil 2 months ago

      It's on the roadmap. If you want updates you can sign up for a free course on my website.

  • @eduardorosentreter
    @eduardorosentreter 2 months ago

    Hello, I'm having difficulties running docker-compose. It's my first time on Linux; I decided to change to a real operating system, but I don't know if I'm doing things right or not. When I run docker-compose, it tells me that it can't find .env; that file is not in the directory.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil 2 months ago

      There's been an update. We have a more user friendly v0.2 on the way. Stay tuned for a video.

    • @eduardorosentreter
      @eduardorosentreter 2 months ago

      @@MachineLearningwithPhil Hello Dr., thank you very much for the response. At the moment I am testing both your Agent57 repository and your new framework. With the Agent57 one I was able to train an agent, only for some reason it got stuck on my 7th epoch, so I decided to go to something more basic, your new framework. I am aware of its updates.

  • @michaelmueller9635
    @michaelmueller9635 2 months ago

    I think using Neovim instead of Vim is a much better choice. Especially when you go for the plugin ecosystem.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil 2 months ago

      I just installed it the other day and I'm getting to know it. So far so good.

    • @michaelmueller9635
      @michaelmueller9635 2 months ago

      @@MachineLearningwithPhil Steep, but very rewarding learning curve ;-)

  • @eduardorosentreter
    @eduardorosentreter 2 months ago

    It will be great when they manage to combine reinforcement learning with quantum computing; if they achieve it, the jump in learning times will most likely be brutal.

  • @user-vc5ny6rn7g
    @user-vc5ny6rn7g 2 months ago

    Thank you very much for this highly valuable content ❤

  • @user-ud6pz4vh5k
    @user-ud6pz4vh5k 2 months ago

    Thank you, Doctor. I would like to know the differences between this framework and the stable-baselines library.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil 2 months ago

      I always found stable baselines to be hard to use. This is meant to be more user friendly, and I'm committed to adding new algorithms.

  • @ArbaazTanveer
    @ArbaazTanveer 2 months ago

    The rendering is not working; the window shows up empty and it's not responding after some time. The training worked well though 😢

  • @brucetepke8150
    @brucetepke8150 2 months ago

    Evolution strategies are sometimes used for the same types of problems as reinforcement learning. Is there a reason to prefer RL over ES, other than the popularity bandwagon?

    • @pabloe1802
      @pabloe1802 2 months ago

      In RL, you more or less control the learning process. For example, using Expected Sarsa is better than Q learning if you don't want your robot to fall off the cliff. With ES you don't have as much control; it's more like trial and error.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil 2 months ago

      ES, as I recall, is used to find solutions to optimization problems. If that's the case, it can be used as an alternative to gradient descent to optimize the weights of the network. RL just refers to the idea that we use rewards (as opposed to training labels) to estimate the optimal Bellman equation.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil 2 months ago

      Also, are you the same Bruce Tepke from WVU?

    • @brucetepke8150
      @brucetepke8150 2 months ago

      @@MachineLearningwithPhil I am.

  • @giorgos13ization
    @giorgos13ization 2 months ago

    Really great content and a good choice (in my opinion) following the path to Deep RL and QC.

  • @Alex-zr7wr
    @Alex-zr7wr 2 months ago

    I'll have an opportunity to use this in the coming week. So exciting that the framework is finally ready!

  • @josepha8415
    @josepha8415 2 months ago

    Do you work at a FAANG company?

  • @vivekpadman5248
    @vivekpadman5248 2 months ago

    Another Dr. Phil video, yay ❤...

  • @quintinbowman7993
    @quintinbowman7993 3 months ago

    OK, I'm 37 and I'm seriously considering whether I should go for a PhD or just stop at a master's.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil 3 months ago

      I would stop at a master's and get a job. A PhD is a lot of extra work and the benefits are unclear.

  • @TTB-to9yk
    @TTB-to9yk 3 months ago

    It has been very useful to me, thank you very much.

  • @einsteinsapples2909
    @einsteinsapples2909 3 months ago

    9:35 "The advantage is just a measure of the goodness of each state." That is not correct. The advantage is a measure of how much better a particular action is compared to the average action taken from the same state.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil 2 months ago

      Yes, thank you for the correction. I hope you can forgive the mistake; they're easy to make when the camera is recording
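
For reference, the standard definition consistent with the correction above:

```latex
A^{\pi}(s, a) = Q^{\pi}(s, a) - V^{\pi}(s)
```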

  • @jose-alberto-salazar-jimenez
    @jose-alberto-salazar-jimenez 3 months ago

    Thanks for the update. I wonder how you would go about loading the model, once trained, for "testing"? I've tried, for example, loading the model state into q_eval, setting q_eval to ".eval()" mode, using "with torch.no_grad()", then getting the predictions with "model(observation)", but the model/agent doesn't perform the way it did during training (it does really badly in comparison).

  • @jose-alberto-salazar-jimenez
    @jose-alberto-salazar-jimenez 3 months ago

    I have a question... Say one trains a model and saves its model state for later use. How would one go about loading the model state and testing the agent? I've tried coding something (following what I've found on the internet: in a nutshell, loading the model state, switching it to eval mode, then, with torch.no_grad(), selecting the actions greedily). It does pretty well at the end of its training (learning was expected), but when I try testing (for instance, to show others its performance), it performs horribly... Can anybody help me?
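
A minimal sketch of the evaluation procedure described in the two comments above (standard PyTorch calls only; the agent.q_eval attribute, the checkpoint path, and the gymnasium-style env are assumptions, not the course's exact code):

```python
import torch as T


def evaluate(agent, env, checkpoint_path="tmp/dqn_eval_checkpoint.pt"):
    """Run one greedy episode with a trained agent (hypothetical helper)."""
    agent.q_eval.load_state_dict(T.load(checkpoint_path))
    agent.q_eval.eval()                      # disable dropout / batch-norm updates
    observation, info = env.reset()
    done, score = False, 0.0
    while not done:
        with T.no_grad():
            state = T.tensor(observation, dtype=T.float32).unsqueeze(0)
            action = agent.q_eval(state).argmax(dim=1).item()   # greedy action
        observation, reward, terminated, truncated, info = env.step(action)
        score += reward
        done = terminated or truncated
    return score
```

If the greedy agent still performs much worse than it did during training, a common cause is evaluating a different checkpoint or network than the one that was actually learning, or preprocessing observations differently at test time.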

  • @sounakmojumder5689
    @sounakmojumder5689 3 months ago

    Hi sir, could you please tell me the PyTorch version you used in this script?

  • @martinfunkquist5342
    @martinfunkquist5342 3 months ago

    I get the following error when doing "critic_loss.backward()": "RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [256, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead." Has anyone else encountered it?

  • @sounakmojumder5689
    @sounakmojumder5689 3 months ago

    Did someone get this kind of broadcasting warning in PyTorch when running this code (not an exact copy)? UserWarning: Using a target size (torch.Size([100, 100])) that is different to the input size (torch.Size([100])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size. critic_loss = F.mse_loss(q1_old_policy.view(-1), q_target) + F.mse_loss(q2_old_policy.view(-1), q_target)
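
A hedged sketch of the usual cause and fix for that warning: the target tensor keeps a trailing unit dimension (e.g. [100, 1]) and broadcasts against the flattened [100] input into a [100, 100] comparison, so flattening both sides before the loss avoids it. Variable names are taken from the comment; the function wrapper is hypothetical:

```python
import torch
import torch.nn.functional as F


def critic_loss_fn(q1_old_policy: torch.Tensor,
                   q2_old_policy: torch.Tensor,
                   q_target: torch.Tensor) -> torch.Tensor:
    # Flatten everything to shape [batch] so mse_loss compares matching shapes
    # instead of silently broadcasting [100] against [100, 1] into [100, 100].
    q_target = q_target.view(-1)
    return (F.mse_loss(q1_old_policy.view(-1), q_target)
            + F.mse_loss(q2_old_policy.view(-1), q_target))
```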

  • @FullSimDriving
    @FullSimDriving 3 months ago

    Hi Phil, I am exploring RL for autonomous driving on my channel by trying to outsource the RL to Stable Baselines3; however, I very quickly ran into library conflicts in SB3. I was looking for ground-up builds of RL and found this video. Thanks, I will try to implement it. If you have any other tips, please reach out. Hopefully my videos describe what I would like to do. Thanks. Vadim

  • @veniciussoaresdasilva6614
    @veniciussoaresdasilva6614 4 months ago

    Hi @MachineLearningwithPhil, do you plan to do some videos with multiple agents, as in recent papers like Hide and Seek from OpenAI?

  • @DANIELCOLOMBARO
    @DANIELCOLOMBARO 4 months ago

    Thank you for the informative content! Unfortunately, as of April 2024, the code throws the following error: " File "...\main.py", line 31, in <module> action, prob, val = agent.choose_action(observation) File "...\ppo_torch.py", line 136, in choose_action state = T.tensor([observation], dtype=T.float).to(self.actor.device) ValueError: expected sequence of length 4 at dim 2 (got 0)". I know 'CartPole-v0' is outdated but updating it does not solve the problem.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil 4 months ago

      The newest gym returns the observation and info from reset. It also returns new observation, reward, done, truncated, info from the step function.

  • @MrLazini
    @MrLazini 4 months ago

    Great videos Phil. One minor improvement you might consider is Loudness normalization to increase the clarity of your voice, as it is kind of low compared to other audio outputs. Keep up the great work!

  • @clapdrix72
    @clapdrix72 4 months ago

    @MachineLearningwithPhil will this work with a custom tf_py_environment or py_environment?

  • @broimnotyourbro
    @broimnotyourbro 4 months ago

    Try running the 70b llama3 model, see how it goes. On my Mac Studio M2 Ultra it runs faster than ChatGPT. On my PC with a 4090 it's a couple seconds per token. It's night and day (but the PC is faster with smaller models like llama3:8b.) A 4090 doesn't have enough VRAM to run large models.

    • @broimnotyourbro
      @broimnotyourbro 4 months ago

      There is literally no comparable system for a large model like this for less than $20K. (Cost of the Studio Ultra with 128GB RAM is ~$5K) I wish Nvidia would ship a card with more VRAM, but they won't because it would cannibalize the sales of $25K data center cards. A 4090 with 64GB RAM would scream.