COMP9414 24T2
Artificial Intelligence
Assignment 2 - Reinforcement Learning
Due: Week 9, Wednesday, 24 July 2024, 11:55 PM.
1 Problem context
Taxi Navigation with Reinforcement Learning: In this assignment,
you are asked to implement Q-learning and SARSA methods for a taxi navigation problem. To run your experiments and test your code, you should
make use of the Gym library [1], an open-source Python library for developing
and comparing reinforcement learning algorithms. You can install Gym on
your computer simply by using the following command in your command
prompt:
pip install gym
In the taxi navigation problem, there are four designated locations in the
grid world indicated by R(ed), G(reen), Y(ellow), and B(lue). When the
episode starts, one taxi starts off at a random square and the passenger is
at a random location (one of the four specified locations). The taxi drives
to the passenger’s location, picks up the passenger, drives to the passenger’s
destination (another one of the four specified locations), and then drops off
the passenger. Once the passenger is dropped off, the episode ends. To show
the taxi grid world environment, you can use the following code:
[1] https://www.gymlibrary.dev/environments/toy_text/taxi/
env = gym.make("Taxi-v3", render_mode="ansi").env
state = env.reset()
rendered_env = env.render()
print(rendered_env)
In order to render the environment, there are three modes known as
“human”, “rgb_array”, and “ansi”. The “human” mode visualizes the environment in a way suitable for human viewing, and the output is a graphical
window that displays the current state of the environment (see Fig. 1). The
“rgb_array” mode provides the environment’s state as an RGB image, and
the output is a NumPy array representing the RGB image of the environment.
The “ansi” mode provides a text-based representation of the environment’s
state, and the output is a string that represents the current state of the
environment using ASCII characters (see Fig. 2).
Figure 1: “human” mode presentation for the taxi navigation problem in
the Gym library.
You are free to choose the presentation mode between “human” and
“ansi”, but for simplicity, we recommend “ansi” mode. Based on the given
description, there are six discrete deterministic actions that are presented in
Table 1.
Table 1: Six possible actions in the taxi navigation environment.

Action               Action number
Move South           0
Move North           1
Move East            2
Move West            3
Pickup Passenger     4
Drop off Passenger   5

Figure 2: “ansi” mode presentation for the taxi navigation problem in the
Gym library. Gold represents the taxi location, blue is the pickup location,
and purple is the drop-off location.

For this assignment, you need to implement the Q-learning and SARSA
algorithms for the taxi navigation environment. The main objective for this
assignment is for the agent (taxi) to learn how to navigate the grid world
and deliver the passenger in the minimum possible number of steps. To
accomplish the learning task, you should empirically determine the
hyperparameters, e.g., the learning rate α, the exploration parameters
(such as ε or T), and the discount factor γ for your algorithm. Your agent
is penalized -1 per step it takes, receives a +20 reward for delivering the
passenger, and incurs a -10 penalty for executing the “pickup” and
“drop-off” actions illegally. You should try different exploration
parameters to find the best balance between exploration and exploitation.
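The Q-learning and SARSA update rules at the core of the assignment can be sketched as follows. This is a minimal, self-contained illustration on a toy corridor environment: the corridor, its reward scheme, and the hyperparameter values are stand-ins chosen for demonstration, not the actual Taxi-v3 setup.

```python
import random

# Toy stand-in for Taxi-v3: a 5-state corridor whose goal is the rightmost
# state. Rewards mimic the assignment's scheme (-1 per step, +20 on success);
# the state space and hyperparameter values are illustrative assumptions.
N_STATES, N_ACTIONS = 5, 2             # actions: 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (20 if done else -1), done

def eps_greedy(Q, s):
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)               # explore
    return max(range(N_ACTIONS), key=lambda a: Q[s][a])  # exploit

def train(sarsa=False, episodes=500):
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        a = eps_greedy(Q, s)
        while not done:
            s2, r, done = step(s, a)
            a2 = eps_greedy(Q, s2)
            if sarsa:
                # SARSA: bootstrap from the action actually taken next
                target = r + GAMMA * Q[s2][a2] * (not done)
            else:
                # Q-learning: bootstrap from the greedy next action
                target = r + GAMMA * max(Q[s2]) * (not done)
            Q[s][a] += ALPHA * (target - Q[s][a])
            s, a = s2, a2
    return Q

random.seed(0)
for name, sarsa in (("Q-learning", False), ("SARSA", True)):
    Q = train(sarsa=sarsa)
    policy = [max(range(N_ACTIONS), key=lambda a: Q[s][a])
              for s in range(N_STATES - 1)]
    print(name, "greedy policy:", policy)
```

The only difference between the two methods is the bootstrap target: Q-learning uses the greedy next-state value, while SARSA uses the value of the action actually selected by the behaviour policy.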
As an outcome, you should plot the accumulated reward per episode and
the number of steps taken by the agent in each episode for at least 1000
learning episodes for both the Q-learning and SARSA algorithms. Examples
of these two plots are shown in Figures 3–6. Please note that the provided
plots are just examples and, therefore, your plots will not be exactly like the
provided ones, as the learning parameters will differ for your algorithm.
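To collect the data for these plots, it is enough to accumulate the total reward and step count inside each episode and append them to two lists. The sketch below uses a dummy random-walk "episode" in place of a real training loop, so the numbers are illustrative only; the list names are arbitrary.

```python
import random

# Sketch: recording per-episode statistics during training. The "episode"
# here is a dummy random walk; in the assignment these totals would come
# from your Q-learning / SARSA training loop.
random.seed(0)
episode_rewards, episode_steps = [], []

for episode in range(1000):
    total_reward, steps, done = 0, 0, False
    while not done:
        reward = -1                    # -1 per step, as in Taxi-v3
        done = random.random() < 0.1   # dummy termination condition
        if done:
            reward += 20               # +20 on successful drop-off
        total_reward += reward
        steps += 1
    episode_rewards.append(total_reward)
    episode_steps.append(steps)

# These two lists are exactly what the required plots need, e.g.:
# import matplotlib.pyplot as plt
# plt.plot(episode_rewards); plt.xlabel("Episode"); plt.ylabel("Reward")
print(len(episode_rewards), len(episode_steps))
```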
Figure 3: Q-learning reward. Figure 4: Q-learning steps.
Figure 5: SARSA reward. Figure 6: SARSA steps.

After training your algorithm, you should save your Q-values. Based on
your saved Q-table, your algorithms will be tested on at least 100 random
grid-world scenarios with the same characteristics as the taxi environment,
for both the Q-learning and SARSA algorithms, using the greedy action
selection method. Therefore, your Q-table will not be updated during
testing for the new steps.
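Saving and reloading the Q-table for greedy-only testing might look like the sketch below. The pickle format and file name are one possible choice (the assignment lets you pick the format), and the tiny hand-filled dictionary stands in for a trained Taxi-v3 Q-table.

```python
import os
import pickle
import tempfile

# Sketch: persist a Q-table after training, reload it for testing, and
# select actions greedily (no learning updates during testing). The small
# hand-filled table below stands in for a trained Taxi-v3 Q-table
# (which would have 500 states x 6 actions).
Q = {(s, a): (1.0 if a == s % 6 else 0.0)
     for s in range(10) for a in range(6)}

path = os.path.join(tempfile.mkdtemp(), "qtable.pkl")
with open(path, "wb") as f:        # after training
    pickle.dump(Q, f)

with open(path, "rb") as f:        # before testing
    Q_loaded = pickle.load(f)

def greedy_action(q, state, n_actions=6):
    """Pure exploitation: argmax over actions for the given state."""
    return max(range(n_actions), key=lambda a: q[(state, a)])

print(greedy_action(Q_loaded, 3))  # prints 3 for this hand-filled table
```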
Your code should be able to visualize the trained agent for both the
Q-learning and SARSA algorithms. This means you should render the
“Taxi-v3” environment (you can use the “ansi” mode) and run your trained
agent from a random position. You should present the steps your agent
takes and how the reward changes from one state to another. An example of
the visualized agent is shown in Fig. 7, where only the first six steps of
the taxi are displayed.
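A rollout-and-render loop for a trained agent could be sketched as follows. A toy corridor with a textual render stands in for Taxi-v3’s “ansi” mode; with Gym you would instead create the environment with render_mode="ansi" and print env.render() after each step. The hard-coded Q-table here is a stand-in for a learned one.

```python
# Sketch: visualize a trained agent by rolling out its greedy policy,
# printing the rendered state and the running reward at each step.
N_STATES = 5
Q = [[0.0, 1.0]] * N_STATES   # "trained" table: always prefers moving right

def render(state):
    # Textual rendering in the spirit of Gym's "ansi" mode:
    # 'T' marks the taxi's position in the corridor.
    return "".join("T" if i == state else "." for i in range(N_STATES))

state, total_reward, done = 0, 0, False
while not done:
    action = max(range(2), key=lambda a: Q[state][a])   # greedy selection
    state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = state == N_STATES - 1
    total_reward += 20 if done else -1
    print(render(state), "action:", action, "reward so far:", total_reward)
```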
2 Testing and discussing your code
As part of the assignment evaluation, your code will be tested by tutors
together with you in a discussion held during the tutorial session in week 10.
The assignment has a total of 25 marks. The discussion is mandatory and,
therefore, we will not mark any assignment not discussed with tutors.
Before your discussion session, you should prepare the necessary code for
this purpose by loading your Q-table and the “Taxi-v3” environment. You
should be able to calculate the average number of steps per episode and the
average accumulated reward (for a maximum of 100 steps for each episode)
for the test episodes (using the greedy action selection method).

Figure 7: The first six steps of a trained agent (taxi) based on the
Q-learning algorithm.
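These test-time averages can be computed with a loop like the following sketch, which evaluates a greedy policy over 100 random-start episodes with the 100-step cap; the corridor environment and hand-set Q-table are toy stand-ins for Taxi-v3 and a trained table.

```python
import random

# Sketch: greedy evaluation over test episodes with a 100-step cap,
# reporting the average steps and average accumulated reward. The
# random-start corridor stands in for Taxi-v3's random test scenarios.
N_STATES, MAX_STEPS = 5, 100
Q = [[0.0, 1.0]] * N_STATES            # stand-in "trained" Q-table

random.seed(0)
steps_list, reward_list = [], []
for _ in range(100):                   # at least 100 random test episodes
    state = random.randrange(N_STATES - 1)   # random non-goal start
    total_reward, steps, done = 0, 0, False
    while not done and steps < MAX_STEPS:    # cap each episode at 100 steps
        action = max(range(2), key=lambda a: Q[state][a])  # greedy only
        state = (min(N_STATES - 1, state + 1) if action == 1
                 else max(0, state - 1))
        done = state == N_STATES - 1
        total_reward += 20 if done else -1
        steps += 1
    steps_list.append(steps)
    reward_list.append(total_reward)

avg_steps = sum(steps_list) / len(steps_list)
avg_reward = sum(reward_list) / len(reward_list)
print(f"average steps: {avg_steps:.2f}, average reward: {avg_reward:.2f}")
```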
You are expected to propose and build your algorithms for the taxi
navigation task. You will receive marks for each of these subsections as
shown in Table 2. Beyond what has been mentioned in the previous section,
you are welcome to include any other outcomes that highlight particular
aspects when testing and discussing your code with your tutor.
For both the Q-learning and SARSA algorithms, your tutor will consider the
average accumulated reward and the average number of steps taken over the
test episodes, for a maximum of 100 steps per episode. For your Q-learning
algorithm, the agent should take at most 14 steps per episode on average
and obtain an average accumulated reward of at least 7. For your SARSA
algorithm, the agent should take at most 15 steps per episode on average
and obtain an average accumulated reward of at least 5. In either case,
numbers worse than these will result in 0 marks for that specific section.
Finally, you will receive 1 mark for code readability for each task, and
your tutor will also give you a maximum of 5 marks for each task depending
on the level of code understanding, as follows: 5. Outstanding, 4. Great,
3. Fair, 2. Low, 1. Deficient, 0. No answer.
Table 2: Marks for each task.

Task                                                                Marks
Results obtained from agent learning
  Accumulated rewards and steps per episode plots for the
  Q-learning algorithm                                              2 marks
  Accumulated rewards and steps per episode plots for the
  SARSA algorithm                                                   2 marks
Results obtained from testing the trained agent
  Average accumulated rewards and average steps per episode
  for the Q-learning algorithm                                      2.5 marks
  Average accumulated rewards and average steps per episode
  for the SARSA algorithm                                           2.5 marks
  Visualizing the trained agent for the Q-learning algorithm        2 marks
  Visualizing the trained agent for the SARSA algorithm             2 marks
Code understanding and discussion
  Code readability for the Q-learning algorithm                     1 mark
  Code readability for the SARSA algorithm                          1 mark
  Code understanding and discussion for the Q-learning algorithm    5 marks
  Code understanding and discussion for the SARSA algorithm         5 marks
Total marks                                                         25 marks
3 Submitting your assignment
The assignment must be done individually. You must submit your assignment
solution via Moodle. This will consist of a single .zip file including
three files: your .ipynb Jupyter code and your saved Q-tables for
Q-learning and SARSA (you can choose the format for the Q-tables).
Remember that your Q-table files will be loaded during your discussion
session to run the test episodes; therefore, your submitted Python code
should also include a script to perform these tests. Additionally, your
code should include short text descriptions to help markers better
understand it.
Please be mindful that providing clean and easy-to-read code is a part of
your assignment.
Please indicate your full name and your zID at the top of the file as a
comment. You can submit as many times as you like before the deadline –
later submissions overwrite earlier ones. After submitting your file, it
is good practice to take a screenshot of it for future reference.
Late submission penalty: UNSW has a standard late submission penalty of 5%
per day deducted from your mark, capped at five days from the assessment
deadline; after that, students cannot submit the assignment.
4 Deadline and questions
Deadline: Week 9, Wednesday, 24 July 2024, 11:55 PM. Please use the forum
on Moodle to ask questions related to the project. We will prioritise
questions asked in the forum. However, you should not share your code in
the forum, to avoid making it public and enabling possible plagiarism; in
that case, use the course email cs9414@cse.unsw.edu.au as an alternative.
Although we try to answer questions as quickly as possible, we might take
up to 1 or 2 business days to reply; therefore, last-minute questions
might not be answered in time.
For any questions regarding the discussion sessions, please contact your
tutor directly. You can find your tutor's email address in Table 3.
5 Plagiarism policy
Your program must be entirely your own work. Plagiarism detection software
might be used to compare submissions pairwise (including submissions for
any similar projects from previous years) and serious penalties will be applied,
particularly in the case of repeat offences.
Do not copy from others. Do not allow anyone to see your code.
Please refer to the UNSW Policy on Academic Honesty and Plagiarism if you
require further clarification on this matter.

請加QQ:99515681  郵箱:99515681@qq.com   WX:codinghelp





 

掃一掃在手機打開當前頁
  • 上一篇:COMP9021代做、代寫python設計程序
  • 下一篇:COMP6008代做、代寫C/C++,Java程序語言
  • 無相關信息
    合肥生活資訊

    合肥圖文信息
    有限元分析 CAE仿真分析服務-企業/產品研發/客戶要求/設計優化
    有限元分析 CAE仿真分析服務-企業/產品研發
    急尋熱仿真分析?代做熱仿真服務+熱設計優化
    急尋熱仿真分析?代做熱仿真服務+熱設計優化
    出評 開團工具
    出評 開團工具
    挖掘機濾芯提升發動機性能
    挖掘機濾芯提升發動機性能
    海信羅馬假日洗衣機亮相AWE  復古美學與現代科技完美結合
    海信羅馬假日洗衣機亮相AWE 復古美學與現代
    合肥機場巴士4號線
    合肥機場巴士4號線
    合肥機場巴士3號線
    合肥機場巴士3號線
    合肥機場巴士2號線
    合肥機場巴士2號線
  • 短信驗證碼 豆包 幣安下載 目錄網

    關于我們 | 打賞支持 | 廣告服務 | 聯系我們 | 網站地圖 | 免責聲明 | 幫助中心 | 友情鏈接 |

    Copyright © 2025 hfw.cc Inc. All Rights Reserved. 合肥網 版權所有
    ICP備06013414號-3 公安備 42010502001045

    日韩精品一区二区三区高清_久久国产热这里只有精品8_天天做爽夜夜做爽_一本岛在免费一二三区

      <em id="rw4ev"></em>

        <tr id="rw4ev"></tr>

        <nav id="rw4ev"></nav>
        <strike id="rw4ev"><pre id="rw4ev"></pre></strike>
        欧美国产在线观看| 欧美日韩亚洲不卡| 国产日韩一级二级三级| 好看的亚洲午夜视频在线| 亚洲一区免费网站| 欧美日韩成人在线观看| 99精品久久久| 亚洲国产精品久久91精品| 欧美日韩国产小视频| 一二三四社区欧美黄| 亚洲精品久久久久中文字幕欢迎你| 老司机精品导航| 亚洲精品国精品久久99热一| 夜夜嗨av一区二区三区免费区| 在线日韩欧美视频| 欧美fxxxxxx另类| 久久国产99| 亚洲精品一区二区三区四区高清| 欧美成人午夜视频| 国产区欧美区日韩区| 日韩亚洲在线| 国产精品视频一二三| 亚洲图片欧美日产| 欧美午夜精品| 亚洲国产经典视频| 欧美日韩亚洲视频一区| 亚洲黄色成人网| 亚洲黄色在线视频| 久久婷婷国产综合国色天香| 欧美一区二区三区四区在线观看| 激情久久久久久久久久久久久久久久| 日韩一区二区福利| 欧美国产精品专区| 午夜精品视频| 欧美日韩综合在线免费观看| 亚洲欧美日韩精品久久奇米色影视| 老司机aⅴ在线精品导航| 久久影视精品| 欧美亚州韩日在线看免费版国语版| 亚洲在线一区二区三区| 久久影院午夜片一区| 久久国产精品99久久久久久老狼| 99在线热播精品免费| 午夜欧美大尺度福利影院在线看| 欧美视频在线观看一区二区| 久久嫩草精品久久久精品| 免费在线观看日韩欧美| 久久人人看视频| 国产精品久久久爽爽爽麻豆色哟哟| 亚洲一二三四久久| 嫩模写真一区二区三区三州| 国产精品―色哟哟| 国内精品久久久久久影视8| 欧美精品一区二区视频| 欧美性事在线| 黄色精品在线看| 国产精自产拍久久久久久蜜| 久久激情网站| 亚洲欧美日韩国产综合在线| 欧美在线影院在线视频| 欧美色123| 欧美成人免费va影院高清| 国产喷白浆一区二区三区| 欧美精品一区二区三区在线播放| 欧美视频一二三区| 久久精品国产亚洲一区二区| 日韩视频免费观看高清完整版| 欧美成人精品影院| 久久亚洲综合色一区二区三区| 亚洲伦理在线观看| 久久精品中文| 欧美激情国产高清| 亚洲免费视频观看| 在线免费一区三区| 国产精品美腿一区在线看| 久久精品99国产精品酒店日本| 久久免费视频在线观看| 暖暖成人免费视频| 欧美日韩一区二区三区在线视频| 欧美理论大片| 欧美日韩专区| 久久综合伊人77777| 国内精品免费午夜毛片| 国产精品美女久久久久aⅴ国产馆| 国产一区激情| 欧美日韩免费观看一区=区三区| 欧美a级片网站| 国产九区一区在线| 美女亚洲精品| 国产精品v欧美精品v日本精品动漫| 亚洲二区精品| 久久久久久婷| 亚洲欧洲一区| 韩国精品久久久999| 欧美日韩高清在线播放| 亚洲国产va精品久久久不卡综合| 亚洲综合精品四区| 香蕉久久a毛片| 久久国产精品网站| 久久美女艺术照精彩视频福利播放| 欧美亚洲日本一区| 欧美日韩在线精品一区二区三区| 亚洲美女黄网| 久久激情视频久久| 欧美三级电影大全| 国产欧美日韩精品丝袜高跟鞋| 亚洲视频在线视频| 猛干欧美女孩| 免费成人黄色片| 黄页网站一区| 亚洲一区免费在线观看| 亚洲第一精品影视| 亚洲小说区图片区| 欧美日韩在线高清| 久久精品99久久香蕉国产色戒| 亚洲午夜未删减在线观看| 亚洲精品男同| 亚洲欧洲久久| 亚洲电影专区| 樱桃国产成人精品视频| 亚洲狼人综合| 欧美三日本三级少妇三99| 亚洲一区二区毛片| 欧美日韩岛国| 欧美一区二区三区四区视频| 午夜一区二区三区不卡视频| 在线免费观看欧美| 狠狠色丁香婷婷综合影院| 久久久午夜精品| 国内视频一区| 亚洲图片欧洲图片av| 亚洲人久久久| 欧美国产日韩二区| 韩日欧美一区二区三区| 亚洲欧美第一页| 国产一区二区精品久久91| 亚洲综合日韩中文字幕v在线| 激情久久五月天| 日韩亚洲国产欧美| 亚洲激情在线激情| 亚洲第一天堂无码专区| 欧美视频在线观看免费网址| 亚洲在线一区二区| 亚洲国产精品热久久| 欧美中文字幕| 国产日韩欧美在线播放不卡| 亚洲欧美色婷婷| 久久久99免费视频| 欧美极品欧美精品欧美视频| 欧美一区二区三区在线观看视频| 黄色亚洲大片免费在线观看| 亚洲美女电影在线| 亚洲国产欧美一区二区三区同亚洲| 91久久夜色精品国产网站| 欧美色网一区二区| 久久久综合网| 亚洲午夜精品一区二区| 国产精品福利网| 国产伦精品一区二区三区| 亚洲图片在区色| 免费不卡中文字幕视频| 久久亚洲美女| 国产尤物精品| 久久福利精品|