CS 7642: Reinforcement Learning and Decision Making
Project #3: Overcooked
1 Problem
1.1 Description
For the final project of this course, you have to bring together everything you have learned thus far and solve the multi-agent Overcooked environment (modeled after the popular video game). In this environment, you have control over 2 chefs in a restaurant kitchen who have to collaborate to cook onion soups. To cook a soup, the agents need to put 3 onions into a cooking pot, initiate cooking, wait for the soup to cook, put the soup into a dish, and serve the dish at a serving area. This project serves as a capstone to the course and as such we expect much of the project to be open-ended and self-directed. Your primary goal is to maximize the number of soups delivered within an episode on a variety of layouts ranging from fairly easy to extremely difficult. In your quest to solve these layouts you may discover auxiliary goals or metrics that are worth analyzing.
We expect that, from the previous projects and the material covered so far, you have learned what is important to include in this type of report. It is thus up to you to define:
• The direction of your project including which aspect(s) you aim to focus upon.
• How you specify and measure such aspects.
• How to train your agents.
• How to structure your report and what graphs to include (in addition to the mandatory graphs discussed later).
Your focus should be on demonstrating your understanding of the algorithm(s)/solution(s), clarifying the rationale behind your experiments, and analyzing their results. Your main goal is to develop an algorithm to solve the environment, but you can also use anything else studied in the course, such as reward and policy shaping. The environment provides a reward shaping data structure that you are free to use. You may also design your own reward shaping in place of, or in addition to, this default setup. However, all algorithms and solutions used to solve the environment should be your own. We encourage you to start off this project with your Project 2 solution and see how far that model takes you. This will provide context for why multi-agent methods may be necessary for this environment. It will also help ease your transition into this environment by using an algorithm you've already gotten to work.
Figure 1: Visualization of the Overcooked environment. Carroll et al. 2019
1.2 Environment and Task
In this project, you will be training a team of 2 agents to cook onion soups in a kitchen. The objective is always to deliver as many soups as possible within a 400-timestep episode. Each soup takes 20 timesteps to cook, and delivering a soup successfully yields a +20 reward. Cooking a soup with fewer than 3 onions, dropping a soup on the ground, or serving the soup on the counter (instead of the designated serving area) yields no reward but hinders progress, as agents lose precious time (and starve customers). Episodes are truncated to a 400-step horizon with no termination conditions. You are not permitted to increase the 400-step horizon. You are provided with 5 layouts of varying difficulty, [cramped_room, asymmetric_advantages, coordination_ring, forced_coordination, counter_circuit_o_1order], as shown in Figure 2.¹ Your task is to achieve a mean soup delivery count of ≥ 7 per episode across all layouts using a single approach. This means a single algorithm and a single reward-shaping function (if you use reward shaping). This also means a single set of hyperparameters. The idea is to build an agent that can solve any layout thrown at it, not just these 5. Having a constant set of parameters also makes reproducibility much easier (something we gained an appreciation for in Project 1). Note that some layouts can be solved by a single-agent algorithm and don't require any collaboration. Other layouts benefit significantly from collaboration, and some may only be solvable via collaboration. This means that a successful approach to solving all 5 layouts will likely require an explicit multi-agent approach. We also expect you to develop your approaches and analyze the results by explicitly looking at multi-agent metrics (see Section 1.8).
Figure 2: The 5 layouts you are tasked to solve. From left to right they are named [cramped_room, asymmetric_advantages, coordination_ring, forced_coordination, counter_circuit_o_1order]. Carroll et al. 2019
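Because the only sparse reward is +20 per delivered soup, the number of soups in an episode can be recovered from the episode's sparse return (not from any shaped reward). The sketch below assumes the per-step sparse team rewards of each evaluation episode are stored as plain Python lists, which is a hypothetical format, and it checks the ≥ 7 target separately on each layout, which is one reading of "across all layouts".

```python
from statistics import mean

SOUP_REWARD = 20   # sparse reward per delivered soup (from the spec)
TARGET_SOUPS = 7   # required mean soups per episode (from the spec)


def soups_delivered(sparse_rewards):
    """Count soups in one episode from its per-step sparse team rewards.

    Assumes the only nonzero sparse reward is +20 per delivered soup,
    so the episode's sparse return divided by 20 is the soup count.
    """
    episode_return = sum(sparse_rewards)
    return int(episode_return // SOUP_REWARD)


def meets_target(episodes_by_layout):
    """Check the >= 7 mean-soups criterion separately on every layout.

    `episodes_by_layout` (hypothetical format) maps a layout name to a list
    of episodes, each episode being a list of per-step sparse rewards.
    Returns (all layouts pass, per-layout pass/fail dict).
    """
    results = {}
    for layout, episodes in episodes_by_layout.items():
        soup_counts = [soups_delivered(ep) for ep in episodes]
        results[layout] = mean(soup_counts) >= TARGET_SOUPS
    return all(results.values()), results
```

For example, an episode of 392 zero-reward steps plus 8 deliveries counts as 8 soups.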
1.3 State Space
This is a fully-observable MDP and both agents have access to the full observation. Therefore, the state and observation spaces are equivalent. By default, the observations are provided as a 96-element vector, customized for each agent. The encoding for player i ∈ {0, 1} contains a player-centric featurized view for the ith player, and is as follows:
[player i features, other player features, player i dist to other player, player i position]
The first component, player i features, has length 46 and is detailed below. Note that if you add all the feature lengths in the specification below, you will get 36 instead of the expected 46. This is because the five features related to the pot (having a combined length of 10) occur twice, once for each pot, and are concatenated together. Note also that none of our layouts contain tomatoes, so the features corresponding to tomatoes will always be 0. Finally, layouts containing only one cooking pot will have the second pot's features zeroed out as well.
• p i orientation: one-hot-encoding of direction currently facing (length 4)
• p i obj: one-hot-encoding of object currently being held ([onion, soup, dish, tomato]) (all 0s if no object held) (length 4)
• p i closest onion|tomato|dish|soup: (dx, dy) where dx = x dist to item, dy = y dist to item. (0, 0) if item is currently held (length 8)
• p i closest soup n onions|tomatoes: int value for number of this ingredient in closest soup (length 2)
• p i closest serving area|empty counter: (dx, dy) where dx = x dist to item, dy = y dist to item. (length 4)
¹ The Overcooked environment has dozens of layouts, but for this project we will focus only on these 5.
• p i closest pot j exists: {0, 1} depending on whether jth closest pot is found. If 0, then all other pot features are 0. Note: can be 0 even if there are more than j pots on layout, if the pot is not reachable by player i (length 1)
• p i closest pot j is empty|is full|is cooking|is ready: {0, 1} depending on boolean value for jth closest pot (length 4)
• p i closest pot j num onions|num tomatoes: int value for number of this ingredient in jth closest pot (length 2)
• p i closest pot j cook time: int value for seconds remaining on soup. 0 if no soup is cooking (length 1)
• p i closest pot j: (dx, dy) to jth closest pot from player i location (length 2)
• p i wall j: {0, 1} boolean value of whether player i has a wall immediately in direction j (length 4)
The remaining components of the observation vector are as follows:
• other player features (length 46): ordered concatenation of player j features for j ≠ i
• player i dist to other player (length 2): [player j.pos - player i.pos for j ≠ i]
• player i position (length 2)
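The length bookkeeping above can be checked mechanically. The sketch below uses only the lengths given in the specification; the feature names are descriptive labels chosen for illustration, and it deliberately does not assume how features are ordered inside the 46-element player block. It does rely on the stated top-level order [player i features, other player features, dist to other player, player i position] to slice one agent's observation.

```python
from typing import Dict, List, Sequence

# Lengths from the specification; the labels are illustrative, not API names.
NON_POT_LENGTHS = {
    "orientation": 4,
    "obj": 4,
    "closest onion|tomato|dish|soup": 8,
    "closest soup n onions|tomatoes": 2,
    "closest serving area|empty counter": 4,
    "wall j (4 directions)": 4,
}
POT_LENGTHS = {
    "pot j exists": 1,
    "pot j is empty|is full|is cooking|is ready": 4,
    "pot j num onions|num tomatoes": 2,
    "pot j cook time": 1,
    "pot j (dx, dy)": 2,
}
NUM_POTS = 2  # features for the two closest pots are concatenated

PER_POT = sum(POT_LENGTHS.values())                               # 10
SPEC_SUM = sum(NON_POT_LENGTHS.values()) + PER_POT                # 36: each feature listed once
PLAYER_LEN = sum(NON_POT_LENGTHS.values()) + NUM_POTS * PER_POT   # 46

# Top-level layout of one agent's observation, in the order stated above.
TOP_LEVEL = [
    ("player_i_features", PLAYER_LEN),
    ("other_player_features", PLAYER_LEN),
    ("dist_to_other_player", 2),
    ("player_i_position", 2),
]
OBS_LEN = sum(length for _, length in TOP_LEVEL)                  # 96


def split_observation(obs: Sequence[float]) -> Dict[str, List[float]]:
    """Slice one agent's 96-element observation into its four components."""
    if len(obs) != OBS_LEN:
        raise ValueError(f"expected {OBS_LEN} features, got {len(obs)}")
    parts: Dict[str, List[float]] = {}
    start = 0
    for name, length in TOP_LEVEL:
        parts[name] = list(obs[start:start + length])
        start += length
    return parts
```

A centralized critic, for example, might consume both agents' `player_i_features` blocks, while a decentralized actor only needs its own 96-element vector.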
1.4 Action Space
The action space is discrete with six possible actions: up, down, left, right, stay, and "interact", which is a contextual action determined by the tile the player is facing (e.g., placing an onion when facing a counter). Each layout has one or more onion dispensers and dish dispensers, which provide an unlimited supply of onions and dishes, respectively.
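With two agents, a fully centralized controller must choose among 6² = 36 joint actions, and the count grows as 6ⁿ with n agents. The sketch below flattens a pair of per-agent actions into one joint index and back. The action labels follow the order in which they are listed above purely for illustration; this is an assumption, not the environment's actual index order.

```python
from itertools import product
from typing import Tuple

# Illustrative labels in the order listed in the text; the environment's
# real action indices may be ordered differently.
ACTIONS = ["up", "down", "left", "right", "stay", "interact"]
N_ACTIONS = len(ACTIONS)  # 6 per agent
N_AGENTS = 2


def encode_joint(a0: int, a1: int) -> int:
    """Map a pair of per-agent action indices to one joint index in [0, 36)."""
    return a0 * N_ACTIONS + a1


def decode_joint(joint: int) -> Tuple[int, int]:
    """Inverse of encode_joint: recover (a0, a1) from a joint index."""
    a0, a1 = divmod(joint, N_ACTIONS)
    return a0, a1


# Every joint action, enumerated in the same order as encode_joint indexes them.
JOINT_ACTIONS = list(product(range(N_ACTIONS), repeat=N_AGENTS))
```

A single-agent learner such as a Project 2 baseline could treat these 36 joint actions as one action space; the multi-agent methods in Section 1.7 avoid that blow-up.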
1.5 Installation Notes
The environment is officially supported on Python 3.7 and is installed via pip install overcooked-ai. We recommend you run in Anaconda. We require the use of PyTorch if using deep learning methods. You absolutely do not need a GPU: every layout can be solved in less than 10 hours without one (in fact, GPUs typically slow RL algorithms down). To help you get started, we are providing you with a Jupyter notebook. You may create a copy of this notebook in order to run the starting code. This notebook demonstrates installing, building, interacting with, and visualizing the environment. You are not required to use this notebook in your project, but we encourage you to use it as a companion to this document to better understand the environment.
1.6 IMPORTANT: Reward Shaping Addendum
If you plan on using reward shaping, take a look at how the default shaped rewards are swapped by the agent index in the provided notebook. Upon episode reset, agents are assigned randomly to one of the 2 starting positions. This assignment is only reflected in the official observation that is returned to you by the environment’s step method. Any state variable you obtain from the Overcooked environment that is not in this observation variable (including anything in the info dictionary or the base environment) needs to be similarly swapped. Failure to do this means you will be assigning credit to the wrong agent roughly half the time, crippling your algorithm.
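The swap described above amounts to reordering any per-player pair (shaped rewards, per-player statistics) from environment order into agent order. The sketch below is a minimal, library-free version; the `agent_idx` flag and its meaning (1 when the starting positions were swapped on reset) are assumptions made for illustration, so check the provided notebook for how the index is actually exposed.

```python
from typing import Sequence, Tuple


def align_to_agents(values_by_env_slot: Sequence[float], agent_idx: int) -> Tuple[float, float]:
    """Reorder a per-player quantity from environment order to agent order.

    Hypothetical convention: `agent_idx` is 0 when agent 0 controls
    environment player 0, and 1 when the two starting positions were
    swapped on reset. Apply this to every per-player value taken from
    outside the official observation (e.g., shaped rewards in `info`).
    """
    first, second = values_by_env_slot
    if agent_idx == 0:
        return first, second
    return second, first
```

For example, if the environment reports a shaped reward of [3.0, 0.0] (environment player 0 picked something up) and the agents were swapped, agent 0 should receive 0.0 and agent 1 should receive 3.0.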
For more details on installation and operation, refer to the GitHub repository: https://github.com/HumanCompatibleAI/overcooked_ai
1.7 Strategy Recommendations
You are free to pursue any multi-agent RL strategies in your soup-cooking quest. For instance, you may pursue a novel reward-shaping technique; however, make sure that the chosen method is relevant to multi-agent RL problems. We strongly recommend that you (1) start with your Project 2 solution adapted to this problem and (2) start with the cramped_room and asymmetric_advantages layouts. Below are further examples of strategies worth pursuing:

• using reward shaping techniques for improving multi-agent considerations such as collaboration and credit assignment;
• asynchronous methods (Mnih et al. 2016);
• centralizing training and decentralizing execution (Lowe et al. 2017; J. N. Foerster et al. 2017);
• value factorisation (Rashid, Samvelyan, De Witt, et al. 2020);
• employing curriculum learning (some single-agent ideas, e.g., in the dissertation by Narvekar 2017, may be interesting and easy to extend to the multi-agent case);
• adding communication protocols (J. Foerster et al. 2016);
• improving multi-agent credit assignment (J. N. Foerster et al. 2017; Zhou et al. 2020);
• improving multi-agent exploration (Iqbal and Sha 2019; Wang et al. 2019);
• finding better inductive biases (i.e., choosing the function space for policy/value function approximation) to handle the exponential complexity of multi-agent learning, e.g., graph neural networks (Battaglia et al. 2018; Naderializadeh et al. 2020).
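To make value factorisation concrete, the sketch below uses its simplest additive form, Q_tot(s, a₀, a₁) = Q₀(s, a₀) + Q₁(s, a₁), for a single state with made-up per-agent utilities; QMIX (Rashid et al.) generalises this to a monotonic mixing network. It only illustrates the key property: when the mix is additive (or monotonic), the greedy joint action found by brute force over all 36 pairs equals each agent's own greedy action, so a centrally trained value can be executed in a decentralized way. Training Q₀ and Q₁ from the team reward is not shown.

```python
from itertools import product
from typing import List, Sequence, Tuple


def joint_q_additive(q0: Sequence[float], q1: Sequence[float]) -> List[List[float]]:
    """Build Q_tot[a0][a1] = Q_0[a0] + Q_1[a1] for one state (additive mixing)."""
    return [[x + y for y in q1] for x in q0]


def greedy_joint(q_tot: List[List[float]]) -> Tuple[int, int]:
    """Brute-force argmax over all joint actions, as a centralized agent would do."""
    pairs = product(range(len(q_tot)), range(len(q_tot[0])))
    return max(pairs, key=lambda p: q_tot[p[0]][p[1]])


def greedy_decentralized(q0: Sequence[float], q1: Sequence[float]) -> Tuple[int, int]:
    """Each agent maximizes only its own utility; no coordination at execution."""
    a0 = max(range(len(q0)), key=lambda a: q0[a])
    a1 = max(range(len(q1)), key=lambda a: q1[a])
    return a0, a1


# Made-up utilities for one state, one value per action (6 actions per agent).
q0 = [0.1, 0.5, -0.2, 0.0, 0.3, 0.9]
q1 = [1.2, 0.4, 0.0, 0.7, -1.0, 0.2]
```

Here both `greedy_joint(joint_q_additive(q0, q1))` and `greedy_decentralized(q0, q1)` select the pair (5, 0). The cost of this convenience is expressiveness: an additive mix cannot represent every cooperative payoff structure, which is one reason richer factorisations exist.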
1.8 Procedure
This problem is more sophisticated than anything you have seen so far in this course. Make sure you reserve enough time to consider what an appropriate approach might involve and, of course, enough time to build and train it.
• Clearly define the direction of your project and which aspect(s) you aim to improve upon over your Project 2 baseline, assuming that that baseline was unable to solve all of the layouts. For example, do you want to improve collaboration among your agents?
– This includes why you think your algorithm/procedure will accomplish this and whether or not your results demonstrate success.
• Implement a solution that produces such improvements.
– Use any algorithms/strategy as inspiration for your solution.
– The focus of this project is to try new algorithms/solutions, rather than to simply improve hyperparameters of the algorithms already implemented. Further, avoid searching for random seeds that happen to work best, as this is inconsequential analysis. Remember that the algorithm/reward-shaping/hyperparameters must be fixed across all 5 layouts.
– Justify the choice of that solution and explain why you expect it to produce these improvements.
– Even if your solution does not solve all of the layouts, you can still write a solid paper.
– Upload/maintain your code in your private repo at https://github.gatech.edu/gt-omscs-rldm.
• Describe your experiments and create graphs that demonstrate the success/failure of your solution.
– You must provide one graph demonstrating the number of soups made across all five layouts during training. You can combine all five layouts’ plots onto one graph if you wish. Displaying a simple moving average for each layout’s training run is suggested to help with clarity.
– You must provide one graph demonstrating performance of your trained agent on each layout over at least 100 consecutive episodes. Again, you can combine all five layouts’ plots into one graph. If all five of these graphs are flat lines (a possible consequence of using a deterministic algorithm on a deterministic environment), then a bar graph is ok.
– Additionally, you must provide at least two graphs using metrics you decided on that are significant for your hypothesis/goal.
– Analyze your results and explain the reasons for the success/failure of your solution.

– Since graphs are largely decided by you, they should have clear axes, labels, and captions. You will lose points for graphs that do not have any description or label of the information being displayed.
– Example metrics you might consider are number of dish pickups, dropped dishes, incorrect deliveries, or picked up onions. These example metrics and more are built-in to the environment and are accessible via the info variable at the end of an episode. In your report you should clearly motivate why you are interested in a particular metric. See the provided notebook.
• We’ve created a private Georgia Tech GitHub repository for your code. Push your code to the personal repository found here: https://github.gatech.edu/gt-omscs-rldm.
• The quality of the code is not graded. You do not have to spend countless hours adding comments, etc. However, the TAs will examine code during grading.
• Make sure to include a README.md file for your repository that we can use to run your code.
– Include thorough and detailed instructions on how to run your source code in the README.md.
– If you work in a notebook, like Jupyter, include an export of your code in a .py file along with your notebook.
– The README.md file should be placed in the project 3 folder in your repository.
• You will be penalized by 25 points if you:
– Do not have any code or do not submit your full code to the GitHub repository; or
– Do not include the git hash for your last commit in your paper.
• Write a paper describing your agents and the experiments you ran.
– Include the hash for your last commit to the GitHub repository in the header on the first page of your paper.
– Make sure your graphs are legible and you cite sources properly. While it is not required, we recommend you use a conference paper format. For example: https://www.ieee.org/conferences/publishing/templates.html.
– 5 pages maximum. Really: you will lose points for longer papers.
– Explain your algorithm(s).
– Explain your training implementation and experiments.
– An ablation study would be an interesting way to find out which components of the algorithm contribute to your metric. (See J. N. Foerster et al. 2017.)
– Graphs highlighting your implementation's successes and/or failures.
– Explanation of algorithms used: what worked best? What didn't work? What could have worked better?
– Justify your choices.
∗ Unlike Project 1, there are multiple ways of solving this problem and you have a lot of discretion over the general approach you take as well as experimental design decisions. Explain to the reader why, from amongst the multiple alternatives, you chose the ones you did.
∗ Your focus should be on justifying the algorithm/techniques you implemented.
– Explanation of pitfalls and problems you encountered.
– What would you try if you had more time?
– Save this paper in PDF format.
– Submit to Canvas!
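The Procedure above suggests plotting a simple moving average of soups per episode for each layout's training run. A minimal trailing-window sketch follows; the window size of 100 episodes is an arbitrary illustrative default, not a course requirement, and early points average over however many episodes exist so far.

```python
from collections import deque
from typing import Deque, Iterable, List


def simple_moving_average(values: Iterable[float], window: int = 100) -> List[float]:
    """Trailing simple moving average of a per-episode series (e.g., soups).

    Keeps a running sum over the last `window` values; before the window
    fills, each point is the mean of all values seen so far.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    buf: Deque[float] = deque(maxlen=window)
    running = 0.0
    out: List[float] = []
    for v in values:
        if len(buf) == window:
            running -= buf[0]  # this value is evicted by the append below
        buf.append(v)
        running += v
        out.append(running / len(buf))
    return out
```

Applying it to each layout's per-episode soup counts gives five smoothed curves that can share one training graph.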
1.9 Resources
1.9.1 Lectures
• Lesson 11A: Game Theory
• Lesson 11B: Game Theory Reloaded
• Lesson 11C: Game Theory Revolutions

1.9.2 Readings
• J. N. Foerster et al. 2017
• Lowe et al. 2017
• Rashid, Samvelyan, Witt, et al. 2018
1.9.3 Talks
• Factored Value Functions for Cooperative Multi-Agent Reinforcement Learning
• Counterfactual Multi-Agent Policy Gradients
• Learning to Communicate with Deep Multi-Agent Reinforcement Learning
• Automatic Curricula in Deep Multi-Agent Reinforcement Learning
1.10 Submission Details
The due date is indicated on the Canvas page for this assignment. Make sure you have set your timezone in Canvas to ensure the deadline is accurate.
Due Date: Indicated as “Due” on Canvas
Late Due Date [20 point penalty per day]: Indicated as “Until” on Canvas
The submission consists of:
• Your written report in PDF format (make sure to include the git hash of your last commit)
• Your source code
To complete the assignment, submit your written report to Project 3 under your Assignments on Canvas (https://gatech.instructure.com) and submit your source code to your personal repository on Georgia Tech's private GitHub.
You may submit the assignment as many times as you wish up to the due date, but we will only consider your last submission for grading purposes. Late submissions will receive a cumulative 20 point penalty per day. That is, any projects submitted after midnight AOE on the due date will receive a 20 point penalty. Any projects submitted after midnight AOE the following day will receive another 20 point penalty (a 40 point penalty in total), and so on. No project will receive a score less than zero, no matter what the penalty. Any projects more than 4 days late and any missing submissions will receive a 0.
Please be aware, if Canvas marks your assignment as late, you will be penalized. This means one second late is treated the same as three hours late, and will receive the same penalty as described in the breakdown above. Additionally, if you resubmit your project and your last submission is late, you will incur the penalty corresponding to the time of your last submission. Submit early and often.
Finally, if you have received an exception from the Dean of Students for a personal or medical emergency we will consider accepting your project up to 7 days after the initial due date with no penalty. Students requiring more time should consider taking an incomplete for this semester as we will not be able to grade their project.
1.11 Grading and Regrading
When your assignments, projects, and exams are graded, you will receive feedback explaining your successes and errors in some level of detail. This feedback is for your benefit, both on this assignment and for future assignments. It is considered a part of your learning goals to internalize this feedback. This is one of many learning goals for this course, such as understanding game theory, random variables, and noise.
If you are convinced that your grade is in error in light of the feedback, you may request a regrade within a week of the grade and feedback being returned to you. A regrade request is only valid if it includes an explanation of where the grader made an error. Create a private Ed Discussion post titled “[Request] Regrade Project 3”. In the Details add sufficient explanation as to why you think the grader made a mistake. Be concrete and specific. We will not consider requests that do not follow these directions.

1.12 Words of Encouragement
We understand this is a daunting project with many possible design directions to consider. As graduate students in Computer Science, projects that allow you to challenge and expand your skills in a practical and low-stakes manner are crucial. These projects are ideal for testing the knowledge you have garnered throughout the course and applying yourself to a difficult problem commonly faced when applying reinforcement learning in industry. After completing the course, a project like this can be valuable to highlight during interviews, to demonstrate your newfound knowledge to current employers, or to add a (new) section to your resume. Historically, many students have reported positive interactions when discussing their projects, sometimes leading to job offers or promotions. However, please remember not to publicly post your report or code. The project is a good talking point, and you would be within the bounds of the GT Honor Code if you were to share it privately with a potential employer (if you so desire); however, making any part of this project publicly available would be a violation of the GT Honor Code.
We encourage you to start early and dive head-first into the project to try as many options as possible. We strongly believe the more successes and failures you experience, the greater your growth and learning will be.
The teaching staff is dedicated to helping as much as possible. We are excited to see how you will approach the problem and have many resources available to help. Over the next several Office Hours, we will be discussing various approaches in detail, as well as diving deeper into approaches on Ed Discussions. We are here to help you and want to see you succeed! With all that said:
Good luck and happy coding!
