CAP 4611 Final Exam
      Instructor: Amrit Singh Bedi
      Instructions
This exam is worth a total of 100 points. Please answer all questions clearly
and concisely. Show all your work and justify your answers.
• For Questions 1 and 2, submit the PDF version of your solution via
Webcourses. You may either write it in LaTeX or work on paper and
submit a scanned version. If you work on paper, you are responsible
for ensuring the scan is legible and complete; work that is not clearly
written or scanned will receive zero marks.
• The total time to complete the exam is 24 hours, and it is due at 4:00
pm EST on Friday, April 25th, 2025. This is a take-home exam. Please
do not use AI tools such as ChatGPT to complete the exam; doing so
will result in zero marks (believe me, we would know if you use it).
Question 1 (50 marks)
      Context: In supervised learning, understanding the bias-variance tradeoff
      is crucial for developing models that generalize well to unseen data.
Problem 1 (10 marks)
Define the terms bias, variance, and irreducible error in the context of
supervised learning. Explain how each contributes to the total expected
error of a model.
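For reference, the standard textbook definitions in the notation of Problem 2
(offered here as orientation, not as the graded answer) are:

\[
\mathrm{Bias}[\hat{f}(x)] = \mathbb{E}_D[\hat{f}(x)] - f(x), \qquad
\mathrm{Var}[\hat{f}(x)] = \mathbb{E}_D\!\big[(\hat{f}(x) - \mathbb{E}_D[\hat{f}(x)])^2\big], \qquad
\sigma^2 = \mathrm{Var}[\varepsilon].
\]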
Problem 2 (20 marks)
Derive the bias-variance decomposition of the expected squared error for a
regression problem. That is, show that

E_{D,ε}[(y − f̂(x))²] = Bias[f̂(x)]² + Var[f̂(x)] + σ²,

where f̂(x) is the prediction of the model trained on dataset D, y = f(x) + ε,
and σ² is the variance of the noise ε.
Hint: You can start by taking y = f(x) + ε, where E[ε] = 0 and
Var[ε] = σ². Let f̂(x) be a learned function from the training set D, then
proceed towards the derivation.
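For orientation, a sketch of the standard derivation route in LaTeX (the key
step is adding and subtracting E_D[f̂(x)] inside the square and using the
independence of ε from the training data; filling in the intermediate algebra
is the point of the problem):

\begin{align*}
\mathbb{E}_{D,\varepsilon}\big[(y - \hat{f}(x))^2\big]
  &= \mathbb{E}\big[(f(x) + \varepsilon - \hat{f}(x))^2\big] \\
  &= \mathbb{E}_D\big[(f(x) - \hat{f}(x))^2\big] + \sigma^2
     \quad \text{(since } \mathbb{E}[\varepsilon] = 0
     \text{ and } \varepsilon \text{ is independent of } \hat{f}\text{)} \\
  &= \big(f(x) - \mathbb{E}_D[\hat{f}(x)]\big)^2
     + \mathbb{E}_D\big[(\hat{f}(x) - \mathbb{E}_D[\hat{f}(x)])^2\big] + \sigma^2 \\
  &= \mathrm{Bias}[\hat{f}(x)]^2 + \mathrm{Var}[\hat{f}(x)] + \sigma^2.
\end{align*}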
Problem 3 (10 marks)
      Consider two models trained on the same dataset:
      • Model A: A simple linear regression model.
      • Model B: A 10th-degree polynomial regression model.
      Discuss, in terms of bias and variance, the expected performance of each
      model on training data and unseen test data. Which model is more likely
      to overfit, and why?
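As an empirical illustration of the contrast (a minimal numpy sketch; the
sine target, noise level, and sample size are toy assumptions of mine, not
part of the exam):

import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: y = sin(2*pi*x) + Gaussian noise, 15 training points.
x_train = np.sort(rng.uniform(0, 1, 15))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

for degree in (1, 10):  # Model A (linear) vs. Model B (10th-degree polynomial)
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")

Typically the degree-10 fit drives the training error far below the linear
fit's while its test error is worse, which is the overfitting signature the
problem asks you to explain.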
Problem 4 (10 marks)
Explain how increasing the size of the training dataset affects the bias and
variance of a model. Provide reasoning for your explanation.
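A hedged numerical sketch of the variance effect (the linear data-generating
process and noise level are assumptions chosen for illustration): refitting a
line on repeated samples of growing size shows the spread of the fitted slope
shrinking while its average stays near the true value.

import numpy as np

rng = np.random.default_rng(1)

# Assumed toy setup: y = 2x + noise. For each sample size n, fit a line 500
# times and report the variance of the estimated slope across refits.
for n in (10, 100, 1000):
    slopes = []
    for _ in range(500):
        x = rng.uniform(-1, 1, n)
        y = 2 * x + rng.normal(0, 0.5, n)
        slopes.append(np.polyfit(x, y, 1)[0])  # leading coefficient = slope
    print(f"n = {n:4d}: mean slope {np.mean(slopes):.3f}, "
          f"slope variance {np.var(slopes):.5f}")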
Question 2: Using Transformer Attention (50 marks)
Context. Consider a simplified Transformer with a vocabulary of six tokens:
• I (ID 0): embedding (1.0, 0.0)
• like (ID 1): embedding (0.0, 1.0)
• to (ID 2): embedding (1.0, 1.0)
• eat (ID 3): embedding (0.5, 0.5)
• apples (ID 4): embedding (0.6, 0.4)
• bananas (ID 5): embedding (0.4, 0.6)
All three projection matrices are the 2 × 2 identity:
WQ = WK = WV = I2.
When predicting the next token, the model uses masked self-attention: the
query comes from the last position, while keys and values come from all
positions up to and including the current one. (Note: show step-by-step
calculations for all questions below; a numpy sketch of parts (a)–(c)
appears after part (d).)
      (a) (10 marks) For the input sequence [I, like, to] (IDs [0, 1, 2]),
      compute the query, key and value vectors for each token.
      (b) (15 marks) Let Q be the query of the last token and K, V the keys
      and values of all three tokens.
• Compute the row vector of raw attention scores qK⊤, where q is
the query of the last token and K is the 3 × 2 matrix of keys.
• Divide by √dk (with dk = 2) and apply softmax to obtain the
attention weights.
• Compute the context vector as the weighted sum of the values.
(c) (15 marks) Given the context vector c ∈ R² from part (b), compute
the unnormalized score for each vocabulary embedding via c · embed(w),
i.e. the dot product.
• Apply softmax over these six scores to get a probability distribution.
• Which token has the highest probability? [Note: Because the six
embeddings are synthetic and not trained on real text, the token
that receives the highest probability may look ungrammatical in
normal English; this is an artifact of the toy setup.]
      (d) (10 marks) Explain why the model selects the token you found in
      (c). In your answer, discuss:
      • How the attention weights led to that choice.
• Why keys/values may include the current token but never future
tokens.
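Because parts (a)–(c) are purely mechanical, a short numpy sketch can serve
as a sanity check on hand calculations. It mirrors the setup above; the
softmax helper and the masking convention are implemented as I read them, so
treat this as an assumption-laden check, not the graded answer:

import numpy as np

# Embeddings and W_Q = W_K = W_V = I_2 taken from the problem statement.
embed = np.array([
    [1.0, 0.0],  # I       (ID 0)
    [0.0, 1.0],  # like    (ID 1)
    [1.0, 1.0],  # to      (ID 2)
    [0.5, 0.5],  # eat     (ID 3)
    [0.6, 0.4],  # apples  (ID 4)
    [0.4, 0.6],  # bananas (ID 5)
])
tokens = ["I", "like", "to", "eat", "apples", "bananas"]

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

seq = [0, 1, 2]  # input sequence [I, like, to]
X = embed[seq]

# (a) With identity projections, queries/keys/values equal the embeddings.
Q, K, V = X, X, X

# (b) Raw scores for the last position's query, scaled by sqrt(d_k) = sqrt(2),
# then softmaxed; the context vector is the weighted sum of the values.
q = Q[-1]
weights = softmax((q @ K.T) / np.sqrt(2))
context = weights @ V

# (c) Dot the context vector against every vocabulary embedding, then softmax.
probs = softmax(embed @ context)

print("attention weights:", np.round(weights, 4))
print("context vector:  ", np.round(context, 4))
for t, p in zip(tokens, probs):
    print(f"P({t}) = {p:.4f}")

Running this reproduces the repeated-token artifact flagged in the note to
part (c), which is what part (d) asks you to explain.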
