ECE 498/598 Fall 2024, Homeworks 3 and 4
Remarks:
1. HW3&4: You can reduce the context length to ** if you are having trouble with the
training time.
2. HW3&4: During test evaluation, note that positional encodings for unseen/long
context are not trained. You are supposed to evaluate it as is. It is OK if it doesn’t
work well.
3. HW3&4: Comments are an important component of the HW grade. You are expected
to explain the experimental findings. If you don’t provide technically meaningful
comments, you might receive a lower score even if your code and experiments are
accurate.
4. The deadline for HW3 is November 11th at 11:59 PM, and the deadline for HW4 is
November 18th at 11:59 PM. For each assignment, please submit both your code and a
PDF report that includes your results (figures) for each question. You can generate the
PDF report from a Jupyter Notebook (.ipynb file) by adding comments in markdown
cells.
The objective of this assignment is to compare the transformer architecture and SSM-type
architectures (specifically Mamba [1]) on the associative recall problem. We provide
example code, recall.ipynb, which implements the task with a 2-layer transformer.
You will adapt this code to incorporate different positional encodings, use Mamba
layers, or modify the dataset generation.
Background: As you recall from class, associative recall (AR) assesses two abilities
of a model: locating relevant information and retrieving the context around that
information. The AR task can be understood via the following question: given the input
prompt X = [a 1 b 2 c 3 b], we wish the model to locate where the last token b occurred
earlier and output the associated value Y = 2. This is crucial for memory-related tasks
and bigram retrieval (e.g., ‘Baggins’ should follow ‘Bilbo’).
To proceed, let us formally define the associative recall task we will study in the HW.
Definition 1 (Associative Recall Problem) Let Q be the set of target queries with
cardinality |Q| = k. Consider a discrete input sequence X of the form X = [. . . q v . . . q],
where the query q appears exactly twice in the sequence and the value v follows the first
appearance of q. We say the model f solves AR(k) if f(X) = v for all sequences X with q ∈ Q.
The induction head is a special case of the definition above where the query q is fixed
(i.e., Q is a singleton). The induction head is visualized in Figure 1. At the other
extreme, we can ask the model to solve AR for all queries in the vocabulary.
Problem Setting
Vocabulary: Let [K] = {1, . . . , K} be the token vocabulary. Obtain the vocabulary
embedding by randomly generating a K × d matrix V with IID N(0, 1) entries, then
normalizing its rows to unit length. Here d is the embedding dimension. The embedding of
the i-th token is V[i]. Use numpy.random.seed(0) to ensure reproducibility.
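The embedding recipe above can be sketched directly in NumPy (a minimal version following the stated spec: IID N(0, 1) entries, rows normalized to unit length, seed 0):

```python
import numpy as np

def make_embedding(K, d, seed=0):
    """Generate a K x d embedding matrix with IID N(0, 1) entries,
    then normalize each row to unit length, as the HW specifies."""
    np.random.seed(seed)  # reproducibility, per the assignment
    V = np.random.randn(K, d)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    return V

# Problem 1 setting: K = 16 tokens, embedding dimension d = 8
V = make_embedding(16, 8)
```

The embedding of token i is then the row `V[i]`.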
Experimental variables: Finally, for the AR task, Q will simply be the first M elements
of the vocabulary. During experiments, K, d, M are under our control. Besides this we will
also play with two other variables:
• Context length: We will train these models up to context length L. However, we
will evaluate with up to 3L. This is to test the generalization of the model to unseen
lengths.
• Delay: In the basic AR problem, the value v immediately follows q. Instead, we will
introduce a delay variable where v will appear τ tokens after q. τ = 1 is the standard.
Models: The motivation behind this HW is reproducing the results in the Mamba paper.
However, we will also go beyond their evaluations and identify weaknesses of both the
transformer and Mamba architectures. Specifically, we will consider the following models
in our evaluations:
Figure 1: We will work on the associative recall (AR) problem. The AR problem requires
the model to retrieve the value associated with any query, whereas the induction head
requires the same for a specific query; thus, the latter is an easier problem. The figure
above is taken directly from the Mamba paper [1]. The yellow-shaded regions highlight the
focus of this homework.
• Transformer: We will use the transformer architecture with 2 attention layers (no
MLP). We will try the following positional encodings: (i) learned PE (provided code),
(ii) Rotary PE (RoPE), (iii) NoPE (no positional encoding)
• Mamba: We will use the Mamba architecture with 2 layers.
• Hybrid Model: We will use an initial Mamba layer followed by an attention layer.
No positional encoding is used.
Hybrid architectures are inspired by the Mamba paper as well as [2] which observes the
benefit of starting the model with a Mamba layer. You should use public GitHub repos to
find implementations (e.g. RoPE encoding or Mamba layer). As a suggestion, you can use
this GitHub Repo for the Mamba model.
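For the RoPE variant of the transformer, you will likely adapt a public implementation as suggested above. As a reference point, here is a minimal NumPy sketch of the common interleaved-pair RoPE formulation (in the actual model, this rotation is applied to the query and key vectors inside each attention layer, typically in PyTorch):

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply Rotary Positional Encoding to x of shape (seq_len, d), d even.
    Each consecutive pair of dimensions (2k, 2k+1) is rotated by the
    position-dependent angle pos * base**(-2k/d)."""
    L, d = x.shape
    assert d % 2 == 0, "RoPE needs an even embedding dimension"
    theta = base ** (-np.arange(0, d, 2) / d)   # (d/2,) per-pair frequencies
    ang = np.outer(np.arange(L), theta)         # (L, d/2) angles
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]             # even / odd dims of each pair
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin          # 2D rotation of each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Two sanity checks worth keeping in mind: position 0 is left unchanged (angle 0), and the rotation preserves vector norms, so RoPE only encodes position in the relative angles between tokens.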
Generating the training dataset: Train with minibatch SGD (e.g., with batch size 64)
until satisfactory convergence. Given (K, d, M, L, τ), you can generate the training
sequences for AR as follows:
1. Training sequence length is equal to L.
2. Sample a query q ∈ Q and a value v ∈ [K] uniformly at random, independently. Recall
that size of Q is |Q| = M.
3. Place q at the end of the sequence and place another q at an index i chosen uniformly
at random from 1 to L − τ.
4. Place value token at the index i + τ.
5. Sample the other tokens IID from [K] \ {q}, i.e., the other tokens are drawn uniformly
at random but are not equal to q.
6. Set label token Y = v.
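The six steps above can be sketched as follows (a minimal NumPy version; tokens are 0-indexed here, and we draw the index from a range one smaller than stated so the value token can never collide with the final query at the end of the sequence, a small guard this sketch assumes):

```python
import numpy as np

def make_ar_sequence(K, M, L, tau, rng):
    """Sample one AR training sequence following the HW's six steps.
    Tokens are integers in [0, K); queries come from the first M tokens."""
    q = rng.integers(0, M)            # step 2: query uniform over Q
    v = rng.integers(0, K)            # step 2: value uniform over [K]
    i = rng.integers(0, L - tau - 1)  # step 3: first occurrence of q
    others = np.array([t for t in range(K) if t != q])
    X = rng.choice(others, size=L)    # step 5: filler tokens != q
    X[i] = q                          # step 3: first q
    X[i + tau] = v                    # step 4: value tau tokens later
    X[L - 1] = q                      # step 3: second q at the end
    return X, v                       # step 6: label Y = v
```

A batch generator for minibatch SGD then just stacks repeated calls; a separate `numpy.random.Generator` (distinct from the seed-0 embedding) keeps data sampling independent of the embedding construction.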
Test evaluation: The test dataset is generated the same way as above. However, we will
evaluate on all sequence lengths from τ + 1 to 3L. Note that τ + 2 is the shortest
possible sequence.
Empirical Evidence from the Mamba Paper: Table 2 of [1] demonstrates that Mamba does a
good job on the induction head problem, i.e., AR with a single query. Additionally, Mamba
is the only model that exhibits length generalization: even if you train it up to context
length L, it can still solve AR for context lengths beyond L. On the other hand, since
Mamba is inherently a recurrent model, it may not solve the AR problem in its full
generality. This motivates the question: what are the tradeoffs between Mamba and
transformers, and can hybrid models improve performance over both?
Your assignments are as follows. For each problem, make sure to return the associated
code. The code can be in separate, clearly commented cells of a single Jupyter/Python file.
Grading structure:
• Problem 1 will count as your HW3 grade. This only involves Induction Head
experiments (i.e. M = 1).
• Problems 2 and 3 will count as your HW4 grade.
• You will make a single submission.
Problem 1 (50=25+15+10pts). Set K = 16, d = 8, L = ** or L = 64.
• Train all models on the induction heads problem (M = 1, τ = 1). After training,
evaluate the test performance and plot the accuracy of all models as a function of
the context length (similar to Table 2 of [1]). In total, you will be plotting 5 curves
(3 Transformers, 1 Mamba, 1 Hybrid). Comment on the findings and compare the
performance of the models including length generalization ability.
• Repeat the experiment above with delay τ = 5. Comment on the impact of delay.
• Which models converge faster during training? Provide a plot of the convergence rate
where the x-axis is the number of iterations and the y-axis is the AR accuracy over a
test batch. Make sure to specify the batch size you are using (ideally use ** or 64).
Problem 2 (30pts). Set K = 16, d = 8, L = ** or L = 64. We will train Mamba, Transformer
with RoPE, and Hybrid. Set τ = 1 (standard AR).
• Train Mamba models for M = 4, 8, 16. Note that M = 16 is the full AR (retrieve any
query). Comment on the results.
• Train Transformer models for M = 4, 8, 16. Comment on the results and compare
them against Mamba’s behavior.
• Train the Hybrid model for M = 4, 8, 16. Comment and compare.
Problem 3 (20=15+5pts). Set K = 16, d = 64, L = ** or L = 64. We will only train
Mamba models.
• Set τ = 1 (standard AR). Train Mamba models for M = 4, 8, 16. Compare against the
corresponding results of Problem 2. How does embedding d impact results?
• Train a Mamba model for M = 16 with τ = 10. Comment on any differences.



