
Instructors

  • Prof. Peter Szolovits
  • Prof. David Sontag

Departments

  • Electrical Engineering and Computer Science
  • Health Sciences and Technology

Topics

  • Artificial Intelligence
  • Human-Computer Interfaces
  • Medical Imaging
  • Public Health


Machine Learning for Healthcare, Lecture 16: Reinforcement Learning, Part 1.

Dr. Johansson covers an overview of treatment policies and potential outcomes, an introduction to reinforcement learning, decision processes, reinforcement learning paradigms, and learning from off-policy data.

Speaker: Fredrik D. Johansson

Lecture 16: Reinforcement Learning slides (PDF)

  • Download video
  • Download transcript


CS234: Reinforcement Learning Spring 2024

Announcements

  • The poster session will be from 11:30am-2:30pm in the Huang Foyer (area outside of NVIDIA auditorium).

Course Description & Logistics

  • Lectures will be live every Monday and Wednesday. Videos of the lecture content will also be made available to enrolled students through Canvas.
  • Lecture materials (videos and slides): All standard lecture materials will be delivered through modules with pre-recorded course videos that you can watch at your own pace. Each week's modules are listed in the schedule and can be accessed here; they will be posted by the end of the Sunday before that week's class. Guest lectures will be presented live and recorded for later viewing. Recordings will be available to enrolled students through Canvas.
  • 1:1 office hours: Students can sign up for 1:1 office hours with faculty and CAs. These are all appointment-based, so students do not need to wait in a queue. See our calendar for times and sign-up links. Office hour schedules will be posted by the end of Tuesday of week 1. Office hours may be offered in person but will definitely be offered via Zoom.
  • Problem session practice: We will also make optional problem-session questions and videos available to provide additional opportunities to learn the material.
  • Quizzes: Instead of a large high-stakes midterm, there will be four quizzes over the quarter. We will drop the lowest score among Quizzes 1-3.
  • Project: There will be no final project.
  • Platforms: All assignments and quizzes will be handled through Gradescope, where you will also find your grades. We will send out links and access codes to enrolled students through Canvas. You can find Winter 2023 materials here.
    Schedule

    • Early April: Tabular MDP Planning.
    • Week of Apr 8: Policy Evaluation; Q-Learning and Function Approximation. [Assignment 1 released Apr 8]
    • Week of Apr 15: Policy Search 1; Policy Search 2. [Assignment 1 due; Assignment 2 released]
    • Week of Apr 22: Policy Search 3; Offline RL 1. [Quiz 1]
    • Week of Apr 29: Offline RL 2; Midterm. [Assignment 2 due]
    • Week of May 6: Offline RL 3; Exploration 1. [Assignment 3 released; Project Proposal due]
    • Week of May 13: Exploration 2; Exploration 3.
    • Week of May 20: Multi-Agent Game Playing; Guest Lecture. [Assignment 3 due]
    • Week of May 27: Memorial Day (no class); In-class Quiz. [Project Milestone due]
    • Week of Jun 3: Value Alignment; Poster Session.
    • Week of Jun 10: Final Project Report due.
    Textbooks and Other Resources

    • Reinforcement Learning: An Introduction, Sutton and Barto, 2nd Edition. This is available for free here; references will refer to the final PDF version available here.
    • Reinforcement Learning: State-of-the-Art, Marco Wiering and Martijn van Otterlo, Eds. [link]
    • Artificial Intelligence: A Modern Approach, Stuart J. Russell and Peter Norvig. [link]
    • Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville. [link]
    • David Silver's course on Reinforcement Learning [link]

    Grade Breakdown

    • Assignment 1: 10%
    • Assignment 2: 18%
    • Assignment 3: 18%
    • Midterm: 25%
    • Course Project: 24%
      • Proposal: 1%
      • Milestone: 2%
      • Poster Presentation: 5%
      • Paper: 16%

    Late Day Policy

    • You can use 5 late days total.
    • A late day extends the deadline by 24 hours.
    • You are allowed up to 2 late days each for Assignments 1, 2, and 3, the project proposal, and the project milestone, not to exceed 5 late days total. You may not use any late days for the project poster presentation or the final project paper. For group submissions such as the project proposal and milestone, every group member must use the corresponding number of late days; if one or more members do not have enough late days left, the whole group incurs a grade penalty of 50% within 24 hours and 100% after 24 hours, as explained below.
    • If you use two late days and hand an assignment in after 48 hours, it will be worth at most 50%. If you do not have enough late days left, handing the assignment in within 24 hours after it was due (adjusting for any late days used) makes it worth at most 50%. No credit will be given to assignments handed in more than 24 hours after they were due (again adjusting for any late days used; e.g., if you use 2 late days, this cutoff applies 24 hours after your extended deadline, i.e., 72 hours after the original deadline). Please contact us if you think you have an extremely rare circumstance for which we should make an exception. This policy is to ensure that feedback can be given in a timely manner.
    Exams

    • There will be one midterm and one quiz. See the schedule for the dates.
    • Exams will be held in class for on-campus students.
    • Conflicts: If you are not able to attend the in-class midterm or quizzes for an official reason, please email us at [email protected] as soon as you can so that an accommodation can be scheduled. (Historically this is either to ask you to take the exam remotely at the same time, or to schedule an alternate exam time.)
    • Notes for the exams: You are welcome to bring one single-sided (letter-sized) page of handwritten notes to the midterm. For the quiz you are welcome to bring one double-sided (letter-sized) page of handwritten notes. No calculators, laptops, cell phones, tablets, or other resources will be allowed.

    Assignments and Submission Process

    • Assignments: See the Assignments page, where all the assignments will be posted.
    • Computing Resources: We will have some cloud resources available for later assignments.
    • Submission Process: The submission instructions for the assignments can also be found on the Assignments page.

    Office Hours

    • Individual 1:1 office hours: 15-minute slots that you can sign up for here. See our calendar for detailed schedules. Video-conference links will be provided during sign-up.
    • Group office hours: 5PM-8PM on Wed, Thu, and Fri, held on Nooks. We will have small tables where students can work on particular problems together.
    • Zoom on Linux: Go to the Zoom Client for Linux page, download the correct package for your Linux distribution type, OS architecture, and version, and follow the Linux installation instructions here.
    • Zoom on other platforms: Download the Zoom installer here; installation instructions can be found here.
    • Zoom via Stanford: Go to Stanford Zoom and select 'Launch Zoom'. Click 'Host a Meeting'; nothing will launch, but this will give a link to 'download & run Zoom'. Click it to download 'Zoom_launcher.exe' and run it to install.

    Communication

    Regrading requests.

    • If you think that the course staff made a quantifiable error in grading your assignment or exam, then you are welcome to submit a regrade request. Regrade requests should be made on Gradescope and will be accepted for three days after assignments or exams are returned.
    • Note that while doing a regrade we may review your entire assignment, not just the part you bring to our attention (i.e., we may find errors in your work that we missed before).

    Academic Collaboration, AI Tools Usage and Misconduct

    Academic accommodation, credit/no credit enrollment.

    Reinforcement Learning: An Introduction

    Richard S. Sutton and Andrew G. Barto, Second Edition (see here for the first edition). MIT Press, Cambridge, MA, 2018.

    • Buy from Amazon
    • Errata and Notes
    • Full PDF
    • Trimmed for viewing on computers (latest release April 26, 2022)
    • Code
    • Solutions (send in your solutions for a chapter, get the official ones back)
    • Slides and Other Teaching Aids
    • Links to PDFs of much of the literature sources cited in the book (many thanks to Daniel Plop!)
    • LaTeX Notation: want to use the book's notation in your own work? Download this .sty file and this example of its use.

    Reinforcement Learning PowerPoint Templates - Get Free Slides


    Deepali Khatri


    Imagine you are trying to teach a robot how to play a video game.

    You can't just give it a winning strategy at the initial stage; that would be boring.

    This is when reinforcement learning enters the picture.

    It's similar to educating your robot friend by allowing it to repeatedly play the game. You give it a high five, or a virtual pat on the back in robot terminology, and say something like, "Hey, nice move!" if it accomplishes a goal, like levelling up or scoring points.

    Now, you don't scold or get mad when the robot makes mistakes. Rather, you respond with calmness, "Oh no, it didn't work. Let's give it another go."

    Through these experiences, the robot gains knowledge about which acts result in rewards and which ones, well, don't work out so well. It's similar to teaching a friend the ropes through trial and error.

    The main idea behind reinforcement learning is that the robot learns more from each game it plays. It's similar to mentoring a friend in a game; you support them, acknowledge their accomplishments, and encourage them when things go wrong.

    Thus, it aids in the improvement of machines by positive reinforcement and learning from errors, just like our friend the robot gains knowledge through play.

    Reinforcement Learning PowerPoint Templates

    Spice up your presentations on reinforcement learning without breaking a sweat by getting your hands on these editable templates. These slides will give you a killer overview of reinforcement learning. Our templates showcase use cases of reinforcement learning across various industries. From optimizing supply chains to revolutionizing healthcare, these examples will leave your audience in awe.

    You are free to customize these slides: change the colors, tweak the fonts, and play with the other elements of the deck.

    Without giving a second thought, put your hands on these slides now!

    Cover Slide

    This bold, black-themed cover slide will help you get prepared for an impactful presentation. Our template is designed for businesses seeking a seamless way to illustrate the power of reinforcement learning.

    Add an image that resonates with the essence of the topic, portraying the technological evolution shaping various industries. This cover slide is created keeping in mind modernity, while capturing the attention of the audience.

    By grabbing this template, you can effortlessly communicate the potential of reinforcement learning to transform industries. Download this slide now!

    Cover Slide

    Download this PowerPoint Template Now

    Brief Overview of Reinforcement Based Learning

    The given template provides a concise overview, beneficial for businesses working in machine learning. It outlines three key learning approaches: supervised, unsupervised, and reinforcement learning.

    The slide serves as a foundational tool for understanding these learning methods. It simplifies the complex data of AI by explaining how these approaches differ and their significance in training machine learning models. With editable features, it empowers users to tailor content to their specific projects and presentations.

    Download the slide to give a clear understanding of fundamental learning strategies to your audience, allowing them to fully utilize reinforcement-based learning's potential in the creation of AI. Gain insights into these approaches using this user-friendly, adaptable slide template.

    Brief Overview of Reinforcement Based Learning

    Advantages of Using Reinforcement Learning Models

    This slide elucidates how reinforcement-based learning simplifies tasks by minimizing reliance on labeled datasets. You can grab this template to showcase how it avoids the need for large labeled datasets, saving significant costs. Moreover, it tackles bias concerns while navigating complex behavioral patterns and also focuses on generating original outcomes.

    Created for companies looking for useful information, this slide helps experts understand the benefits of reinforcement learning. Developers can use these benefits to improve decision-making and accelerate procedures. It's a priceless tool for anyone trying to understand the intricacies of this advanced learning methodology, opening the door to more efficient use in a variety of applications. Make the most of your understanding and simplify difficult ideas with this extensive and easy-to-use PowerPoint slide.

    Advantages of Using Reinforcement Learning Models

    Major Types of Reinforcement Learning Models

    Get your hands on this slide to present the major types of reinforcement learning models seamlessly. It sheds light on the four major types of learning models and algorithms that programmers can use for multiple purposes. It serves as a significant tool for programmers across various industries, providing insights into essential elements such as state action, reward, q-learning, Markov decision processes, and deep reinforcement learning. The models in the slide will help you make informed decisions. It equips you with information essential for understanding the concept or applying it directly to your projects.

    Major Types of Reinforcement Learning Models

    Types of Deep Reinforcement Learning in NLP

    Get into the details of NLP with this slide that presents four primary types of deep reinforcement learning in Natural Language Processing (NLP). This essential resource highlights key concepts: Value based method, Policy based method, Model based method, and Hybrid approaches. Know about the intricacies of Q-learning, SARSA (State Action Reward State Action), and more, offering insights into their applications within NLP.

    Tailored for business professionals, this slide gives users the liberty of editing templates, simplifying complex NLP concepts for seamless integration into presentations. Understand the subtle differences between each type, which will enable communicators to explain NLP techniques with clarity. With these pre-made slides, you can easily improve your presentations while providing depth and clarity in your explanations of the various ways deep reinforcement learning is applied in the field of natural language processing.

    Types of Deep Reinforcement Learning in NLP

    Reinforcement Learning Use Cases in Digital Marketing

    This template spotlights practical applications where reinforcement learning is used in digital marketing. The slide provides details about the train data, test data, and feature selection specifics, showcasing how this technology optimizes marketing strategies.

    Examine strategies such as greedy training and policy evaluation to understand how these approaches improve decision-making. See how this technology is changing digital marketing by optimizing return on investment and improving consumer experiences. With our user-friendly PowerPoint template, marketers can quickly implement these techniques, customize their approaches, and achieve desired outcomes. By using these insights to engage audiences and boost brand success, you can up your marketing game. Use these slides to enhance your presentations and your marketing strategy right now.

    Reinforcement Learning Use Cases in Digital Marketing

    Enhance your business strategies without any hassle with our customizable PowerPoint templates, tailored for easy modification. This blog highlights the real-world uses as well as the challenges of reinforcement learning.

    With just a few edits you can easily make your audience understand the concept of NLP, its types and its use cases by downloading this deck. Enhance your presentations and exploit the power of reinforced information by grabbing these slides. Download now!

    Click Here to Get the Free PPT


    DS551/CS525 Tentative Schedule:

    -1. Week 1 (8/27 T): Topic: Overview of Reinforcement Learning and Class Logistics Readings: N/A

    -2. Week 2 (9/3 T): Topic: RL Components, Markov Decision Process, Model-based Planning. Optional readings: Reinforcement Learning: An Introduction, Sutton and Barto, 2nd Edition, Sec 3.1-3.6, 4.1-4.4. Note: Project 1 starts.

    -3. Week 3 (9/10 T): Topic: Model-free Policy Evaluation. . Optional readings: Reinforcement Learning: An Introduction, Sutton and Barto, 2nd Edition Sec 5.1-5.3, 6.1-6.6.

    -4. Week 4 (9/17 T): Topic: Model-free Control. . Note: Quiz 1 on Markov Decision Process and Model-based Control (30mins). Optional readings: Reinforcement Learning: An Introduction, Sutton and Barto, 2nd Edition Sec 5.1-5.3, 6.1-6.6. Note: Project 1 due.

    -5. Week 5 (9/24 T): Topic: Value Function Approximation. . Optional readings: Reinforcement Learning: An Introduction, Sutton and Barto, 2nd Edition Sec 9.1-9.4. Note: Project 2 starts.

    -6. Week 6 (10/1 T): Topic: Review of Deep Learning. Topic: Deep Reinforcement Learning. Optional readings: Mnih, Volodymyr, et al., Playing Atari with Deep Reinforcement Learning , arXiv preprint arXiv:1312.5602 (2013). Note: Quiz 2 on Model-free Policy Evaluation.


    -7. Week 7 (10/8 T): Topic: Advanced Deep Reinforcement Learning by Prof. Li, and Deep Learning Implementation in PyTorch (by TA). Optional Reading #1: [AAAI 2016, Double DQN] Deep Reinforcement Learning with Double Q-learning, Hado van Hasselt, Arthur Guez, and David Silver, Google DeepMind. https://arxiv.org/pdf/1509.06461.pdf. Optional Reading #2: [ICLR 2016] Prioritized Experience Replay, Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver, Google DeepMind. https://arxiv.org/pdf/1511.05952.pdf. Optional Reading #3: [ICML 2016, Dueling DQN] Dueling Network Architectures for Deep Reinforcement Learning, Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas. https://arxiv.org/pdf/1511.06581.pdf. Optional Reading #4: [AAAI 2018, Rainbow] Rainbow: Combining Improvements in Deep Reinforcement Learning, Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver, AAAI 2018. https://arxiv.org/pdf/1710.02298.pdf. Note: Quiz 3 on Model-free Control. Note: Project 2 due. Note: Project 3 starts.

    -8. Week 8 (10/15 T): No class; Fall Break.

    -9. Week 9 (10/22 T): Topic: Advanced DQNs (continued), Inverse Reinforcement Learning, and Imitation Learning. Note: Quiz 4 on linear function approximation for policy evaluation and control. Note: We will have an in-class self-introduction session, so you can start forming a team for Project 4.

    -10. Week 10 (10/29 T): Topic: Imitation Learning (continued) and Policy as a Deep Neural Network: Policy Gradient Reinforcement Learning. Optional readings: Reinforcement Learning: An Introduction, Sutton and Barto, 2nd Edition, Chapter 13. Optional reading: Policy Gradient RL algorithms (a good and comprehensive blog; for this class, reading the sections we covered is sufficient). Note: Project 4 starts. Note: Project 3 due.

    -11. Week 11 (11/5 T): No class; wellness day. See this link. Note: Project 4 Proposal due.

    -13. Week 13 (11/19 T): Topic: Actor-Critic Approaches (A2C, A3C, Pathwise Derivative PG) (continued), Sparse Reward, Hierarchical RL. Topic: RL and IRL Applications: research work presentations from PhD students in Prof. Li's group, by Menghai Pan, Xin Zhang, and Yingxue Zhang. Work #3 is under double-blind review by Xin Zhang; please find the slides attached to the announcement message in Canvas (not on the class website, due to the anonymous review process). Topic: More on RL (advanced techniques), such as distributional DQN, Noisy Net DQN, hierarchical RL, etc. (Slides). Note: Quiz 5 on policy gradient (including Basic PG, REINFORCE PG, and Vanilla PG).

    -14. Week 14 (11/26 T): Topic: Multi-Agent RL (MARL) and DeepMind AlphaTensor. Optional readings: DDPG, MA-DDPG, AlphaTensor. Note: Project 4 progress report due. Please submit it to the Canvas discussion board in teams.

    -15. Week 15 (12/3 T): Topic: Deep Inverse Reinforcement Learning, Multi-agent IRL, Meta-RL, and Class Review. . Optional Readings: Meta-RL , GAIL , MA-GAIL .

    -16. Week 16 (12/10 T): Topic: Project #4 Presentations. Note: Project 4 due.



    Reinforcement Learning

    Oct 03, 2014


    Presentation Transcript

    Reinforcement Learning • Introduction • Passive Reinforcement Learning • Temporal Difference Learning • Active Reinforcement Learning • Applications • Summary

    Introduction. Supervised learning maps an example to a class. Reinforcement learning instead sees a stream of situation and reward, situation and reward, and so on.

    Examples:
    • Playing chess: reward comes at the end of the game.
    • Ping-pong: reward on each point scored.
    • Animals: hunger and pain are negative rewards; food intake is a positive reward.

    Framework: Agent in State Space. Remark: no terminal states. Example: the XYZ-World, a 10-state world with directional moves (n, s, e, w, ne, nw, sw) and a stochastic action x that succeeds with probability 0.7 and goes to an alternative state with probability 0.3; states 3, 5, and 8 carry rewards +5, +3, and +4, while states 6 and 9 carry -9 and -6. Problem: what actions should an agent choose to maximize its rewards?

    XYZ-World: Discussion of Problem 12, TD under policy P vs. Bellman (P: 1-2-3-6-5-8-6-9-10-8-6-5-7-4-1-2-5-7-4-1). I tried hard, but: any better explanations? Explanation of the discrepancies between TD for P and Bellman:
    • The most significant discrepancies are in states 3 and 8; a minor one in state 10.
    • P chooses the worst successor of 8; it should apply operator x instead.
    • P should apply w in state 6, but only does so in 2/3 of the cases, which affects the utility of state 3.
    • The low utility value of state 8 in TD seems to lower the utility value of state 10, which is only a minor discrepancy.

    XYZ-World: Discussion of Problem 12, Bellman update with γ=0.2. The resulting utilities were roughly: state 1: 0.145, state 2: 0.72, state 3: 0.58, state 4: 0.03, state 5: 3.63, state 6: -8.27, state 7: 0.001, state 8: 3.17, state 9: -5.98, state 10: 0.63. Discussion of using the Bellman update for Problem 12:
    • No convergence for γ=1.0; utility values seem to run away!
    • State 3 has utility 0.58 although it gives a reward of +5, due to the immediate penalty that follows; we were able to detect that.
    • Did anybody run the algorithm for other γ, e.g. 0.4 or 0.6? If yes, did it converge to the same values?
    • The speed of convergence seems to depend on the value of γ.

    XYZ-World: Discussion of Problem 12, TD with inverse rewards vs. TD. Other observations:
    • The Bellman update did not converge for γ=1.
    • The Bellman update converged very fast for γ=0.2.
    • Did anybody try other values for γ (e.g. 0.6)?
    • The Bellman update suggests a utility value of 3.6 for state 5; what does this tell us about the optimal policy? E.g., is 1-2-5-7-4-1 optimal?
    • TD reversed utility values quite neatly when rewards were inverted; each value x became roughly -x + u with u in [-0.08, 0.08].

    XYZ-World: Other Considerations
    • R(s) might be known in advance or may have to be learnt.
    • R(s) might be probabilistic or not.
    • R(s) might change over time, so the agent has to adapt.
    • The results of actions might be known in advance or have to be learnt; results of actions can be fixed, or may change over time.

    To be used in Assignment 3. Example: the ABC-World (again, no terminal states), a 10-state world analogous to the XYZ-World but with different rewards (states 4, 5, 6, and 8 carry -1, -4, -9, and -3; states 9 and 10 carry +8 and +9) and a stochastic action x that splits 0.9/0.1. Problem: what actions should an agent choose to maximize its rewards?

    Basic Notations and Preview
    • T(s,a,s') denotes the probability of reaching s' when using action a in state s; it describes the transition model.
    • A policy Π specifies what action to take for every possible state s ∈ S.
    • R(s) denotes the reward an agent receives in state s.
    • Utility-based agents learn a utility function over states and use it to select actions that maximize the expected outcome utility.
    • Q-learning, on the other hand, learns the expected utility of taking a particular action a in a particular state s (the Q-value of the pair (s,a)).
    • Finally, reflex agents learn a policy that maps directly from states to actions.
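
To make the notation concrete, here is a minimal Python sketch of how these objects could be represented; the three-state MDP, its action names, and its rewards are made up for illustration and are not the XYZ-World.

```python
# Hypothetical 3-state MDP, for illustration only (not the XYZ-World above).
# T[s][a] is a list of (next_state, probability) pairs; R[s] is the reward in state s.
T = {
    "s1": {"e": [("s2", 1.0)]},
    "s2": {"e": [("s3", 1.0)], "w": [("s1", 1.0)]},
    "s3": {"x": [("s1", 0.7), ("s2", 0.3)]},
}
R = {"s1": 0.0, "s2": 0.0, "s3": 5.0}

policy = {"s1": "e", "s2": "e", "s3": "x"}   # what a reflex agent learns: state -> action
U = {s: 0.0 for s in T}                      # what a utility-based agent learns: U(s)
Q = {(s, a): 0.0 for s in T for a in T[s]}   # what Q-learning learns: Q(s, a)
```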

    Passive Learning • We assume the policy Π is fixed. • In state s we always execute action Π(s) • Rewards are given.

    Figure 21.1a (the 4x3 grid world): all non-terminal states have reward -0.04; the two terminal states have rewards +1 and -1. The agent follows the intended arrow with probability 0.8 and moves to the right or left of the arrow with probability 0.1 each; agents are reflected off walls and transferred back to the original state if they move towards a wall.

    Typical trials: (1,1) -0.04 → (1,2) -0.04 → (1,3) -0.04 → (1,2) -0.04 → (1,3) -0.04 → … → (4,3) +1. Goal: use the rewards to learn the expected utility UΠ(s).

    Expected Utility: UΠ(s) = E[ Σt=0..∞ γ^t R(st) | Π, S0 = s ], the expected sum of (discounted) rewards when the policy is followed.

    Example: (1,1) -0.04 → (1,2) -0.04 → (1,3) -0.04 → (2,3) -0.04 → (3,3) -0.04 → (4,3) +1. Total reward: (-0.04 x 5) + 1 = 0.80.
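
As a quick check, the trial's return can be recomputed directly; this small snippet just re-evaluates the arithmetic on the slide (with γ = 1, as assumed in this lecture).

```python
# Rewards observed along the trial (1,1) -> (1,2) -> (1,3) -> (2,3) -> (3,3) -> (4,3).
rewards = [-0.04, -0.04, -0.04, -0.04, -0.04, 1.0]
gamma = 1.0

total = sum(gamma**t * r for t, r in enumerate(rewards))
print(round(total, 2))  # 0.8, i.e. (-0.04 * 5) + 1
```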

    Direct Utility Estimation. Convert the problem to a supervised learning problem: (1,1) → U = 0.72, (2,1) → U = 0.68, … Learn to map states to utilities. Problem: utilities are not independent of each other!

    Bellman Equation (incorrect formula replaced on March 10, 2006). Utility values obey the following equations: U(s) = R(s) + γ maxa Σs' T(s,a,s') U(s'). Assume γ = 1 for this lecture! This can be solved using dynamic programming. It assumes knowledge of the transition model T and reward R; the result is policy independent!

    Example: U(1,3) = 0.84, U(2,3) = 0.92. We hope to see that U(1,3) = -0.04 + U(2,3); that value is 0.88. The current value is a bit low and we must increase it.

    Bellman Update (Section 17.2 of the textbook). If we apply the Bellman update indefinitely often, we obtain the utility values that are the solution of the Bellman equation: Ui+1(s) = R(s) + γ maxa Σs' T(s,a,s') Ui(s'). Some equations for the XYZ-World: Ui+1(1) = 0 + γ·Ui(2); Ui+1(5) = 3 + γ·max(Ui(7), Ui(8)); Ui+1(8) = 4 + γ·max(Ui(6), 0.3·Ui(7) + 0.7·Ui(9)).
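
A compact sketch of what "applying the Bellman update indefinitely often" looks like in code; the tiny three-state MDP below is a made-up stand-in (not the XYZ-World), and the small discount γ = 0.2 echoes the value discussed for the XYZ-World above.

```python
# Illustrative 3-state MDP (made up, not the XYZ-World).
T = {  # T[s][a] -> list of (next_state, probability)
    "s1": {"e": [("s2", 1.0)]},
    "s2": {"e": [("s3", 1.0)], "w": [("s1", 1.0)]},
    "s3": {"x": [("s1", 0.7), ("s2", 0.3)]},
}
R = {"s1": 0.0, "s2": 0.0, "s3": 5.0}
gamma = 0.2

U = {s: 0.0 for s in T}
for _ in range(100):  # repeat the Bellman update until the values stop changing much
    U = {
        s: R[s] + gamma * max(sum(p * U[s2] for s2, p in T[s][a]) for a in T[s])
        for s in T
    }
print(U)
```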

    Updating Estimations Based on Observations: New_Estimation = Old_Estimation·(1-α) + Observed_Value·α, or equivalently New_Estimation = Old_Estimation + Observed_Difference·α. Example: measure the utility of a state s with current value 2, observed values 3 and 3, and learning rate α = 0.2. Initial utility value: 2. Utility value after observing 3: 2x0.8 + 3x0.2 = 2.2. Utility value after observing 3, 3: 2.2x0.8 + 3x0.2 = 2.36.
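
The running-average rule above is easy to verify in a few lines; this reproduces the slide's numbers (initial value 2, two observations of 3, α = 0.2).

```python
alpha = 0.2
estimate = 2.0
for observed in [3.0, 3.0]:
    # New_Estimation = Old_Estimation * (1 - alpha) + Observed_Value * alpha
    estimate = estimate * (1 - alpha) + observed * alpha
    print(estimate)  # approximately 2.2, then 2.36 (up to floating-point rounding)
```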

    Temporal Difference Learning. Idea: use observed transitions to adjust the values of observed states so that they comply with the constraint equation, using the following update rule: UΠ(s) ← UΠ(s) + α [ R(s) + γ UΠ(s') - UΠ(s) ]. Here α is the learning rate and γ is the discount rate; this is the temporal difference equation. No model assumption: T and R do not have to be known.
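
A minimal sketch of the TD(0) update rule above; the environment is not modeled at all, we only assume a stream of observed (s, reward, s') transitions, and the example states fed in at the end are placeholders.

```python
alpha, gamma = 0.1, 1.0   # learning rate and discount (gamma = 1, as in this lecture)
U = {}                    # utility estimates under the fixed policy, filled in lazily

def td_update(s, reward, s_next):
    """Apply U(s) <- U(s) + alpha * (R(s) + gamma * U(s') - U(s)) for one observed step."""
    U.setdefault(s, 0.0)
    U.setdefault(s_next, 0.0)
    U[s] += alpha * (reward + gamma * U[s_next] - U[s])

# Example with one hypothetical observed transition:
td_update((1, 1), -0.04, (1, 2))
```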

    TD-Q-Learning. Goal: measure the utility of using action a in state s, denoted by Q(a,s); the following update formula is used every time an agent reaches state s' from s using action a: Q(a,s) ← Q(a,s) + α [ R(s) + γ maxa' Q(a',s') - Q(a,s) ]. Here α is the learning rate and γ is the discount factor. This is a variation of TD-learning; it is not necessary to know the transition model T!
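
The corresponding Q-learning update, again as a rough sketch; Q is kept as a dictionary keyed by (state, action), and the transition fed in at the end is a made-up example.

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.9
Q = defaultdict(float)   # Q[(state, action)] -> estimated utility of taking action in state

def q_update(s, a, reward, s_next, next_actions):
    """Q(a,s) <- Q(a,s) + alpha * (R(s) + gamma * max_a' Q(a',s') - Q(a,s))."""
    best_next = max((Q[(s_next, a2)] for a2 in next_actions), default=0.0)
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])

# Example with one hypothetical observed transition:
q_update("s1", "e", -0.04, "s2", next_actions=["e", "w"])
```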

    Active Reinforcement Learning Now we must decide what actions to take. Optimal policy: Choose action with highest utility value. Is that the right thing to do?

    Active Reinforcement Learning No! Sometimes we may get stuck in suboptimal solutions. Exploration vs Exploitation Tradeoff Why is this important? The learned model is not the same as the true environment.

    Explore vs. Exploit. Exploitation: maximize the current reward. Exploration: maximize long-term well-being.

    Simple Solution to the Exploitation/Exploration Problem
    • Choose a random action once in k times.
    • Otherwise, choose the action with the highest expected utility (k-1 out of k times).
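
In code, the "random action once in k times" rule is a one-branch policy (essentially ε-greedy with ε = 1/k); the Q table passed in is assumed to look like the one in the Q-learning sketch above.

```python
import random

def choose_action(state, actions, Q, k=10):
    """Pick a random action once in k times; otherwise pick the greedy action."""
    if random.random() < 1.0 / k:
        return random.choice(actions)                     # explore
    return max(actions, key=lambda a: Q[(state, a)])      # exploit: highest expected utility
```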

    Another Solution: Combining Exploration and Exploitation. U+(s) ← R(s) + γ maxa f(u, n), where u = Σs' T(s,a,s')·U+(s') and n = N(a,s). U+(s) is an optimistic estimate of the utility; N(a,s) is the number of times action a has been tried in state s; f(u,n) is an exploration function (idea: it returns the value u if n is large, and values larger than u as n decreases). Example: f(u,n) := if n > n_avg then u else max(n/n_avg·u, u_avg), with n_avg being the average number of operator applications. Idea of f: the utility of states/actions that have not been explored much is increased artificially.
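
The example exploration function on the slide translates almost directly to code; n_avg and u_avg are the average number of operator applications and an average utility, as described above.

```python
def f(u, n, n_avg, u_avg):
    """Exploration function: trust u once an action is well explored, otherwise be optimistic."""
    if n > n_avg:
        return u
    return max(n / n_avg * u, u_avg)
```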

    Applications. Game playing: the checkers-playing program by Arthur Samuel (IBM). Update rule: change the weights by the difference between the current state's value and the backed-up value obtained by generating a full look-ahead tree.

    Summary
    • The goal is to learn utility values of states and an optimal mapping from states to actions.
    • Direct utility estimation ignores dependencies among states, so we must follow the Bellman equations.
    • Temporal difference learning updates values to match those of successor states.
    • Active reinforcement learning learns the optimal mapping from states to actions.

