CS439 Programming Help: Java Assignment Writing Service

Date: 2024-10-13 Source: 合肥網 hfw.cc Author: hfw.cc

CS439: Introduction to Data Science Fall 2024

Problem Set 1

Due: 11:59pm Friday, October 11, 2024

Late Policy: The homework is due on 10/11 (Friday) at 11:59pm. We will release the solutions
of the homework on Canvas on 10/16 (Wednesday) at 11:59pm. If your homework is submitted to
Canvas before 10/11 11:59pm, there will be no late penalty. If you submit to Canvas after 10/11
11:59pm and before 10/16 11:59pm (i.e., before we release the solutions), your score will be
multiplied by $0.9^k$, where k is the number of days of late submission. For example, if you
submitted on 10/14 and your original score is 80, then your final score will be
$80 \times 0.9^3 = 58.32$ for 14 − 11 = 3 days of late submission. If you submit to Canvas after
10/16 11:59pm (i.e., after we release the solutions), then you will earn no score for the homework.
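
The late-penalty arithmetic above can be checked with a short sketch. The function name `late_score` is hypothetical, and the sketch covers only the window before the solutions are released (after that, the policy gives no score).

```python
def late_score(original_score: float, days_late: int) -> float:
    """Apply the 0.9**k multiplier for k days of late submission."""
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    return original_score * 0.9 ** days_late


# Worked example from the policy: submitted on 10/14, i.e. 3 days late.
print(round(late_score(80, 3), 2))  # 58.32
```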

General Instructions

Submission instructions: These questions require thought but do not require long answers.
Please be as concise as possible. Submit your answers as a writeup in PDF format.
For questions that require coding, write your code for each question in a single source code
file and name the file after the question number (e.g., question_1.java or question_1.py). Finally,
put your PDF answer file and all the code files in a folder named with your name and NetID (i.e.,
Firstname-Lastname-NetID.pdf), compress the folder as a zip file (e.g., Firstname-Lastname-NetID.zip),
and submit the zip file via Canvas.

For the answer writeup PDF file, we have provided both a Word template and a LaTeX template.
After you finish writing, save the file as a PDF and submit both the original
file (Word or LaTeX) and the PDF file.

Questions

1. Map-Reduce (35 pts)

Write a MapReduce program in Hadoop that implements a simple “People You Might Know”
social network friendship recommendation algorithm. The key idea is that if two people have a
lot of mutual friends, then the system should recommend that they connect with each other.

Input: Use the provided input file hw1q1.zip.

The input file contains the adjacency list and has multiple lines in the following format:
<User><TAB><Friends>
Here, <User> is a unique integer ID corresponding to a unique user and <Friends> is a comma-separated
list of unique IDs corresponding to the friends of the user with the unique ID <User>.
Note that the friendships are mutual (i.e., edges are undirected): if A is a friend of B, then B is
also a friend of A. The data provided is consistent with that rule, as there is an explicit entry for
each side of each edge.
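
As an illustration of the input format only, here is a minimal Python sketch that parses one adjacency-list line. The function name `parse_adjacency_line` is hypothetical, and treating a missing friend list as empty is an assumption about how users without friends appear in the file.

```python
def parse_adjacency_line(line: str) -> tuple[int, list[int]]:
    """Parse '<User><TAB><Friends>' into (user_id, [friend_ids])."""
    user, _, friends = line.rstrip("\n").partition("\t")
    # Assumption: a user with no friends has an empty <Friends> field.
    friend_ids = [int(f) for f in friends.split(",") if f]
    return int(user), friend_ids


print(parse_adjacency_line("0\t1,2,3"))  # (0, [1, 2, 3])
print(parse_adjacency_line("7\t"))       # (7, [])
```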

Algorithm: Let us use a simple algorithm such that, for each user U, the algorithm recommends
N = 10 users who are not already friends with U, but have the largest number of mutual friends
in common with U.

Output: The output should contain one line per user in the following format:

<User><TAB><Recommendations>

where <User> is a unique ID corresponding to a user and <Recommendations> is a comma-separated
list of unique IDs corresponding to the algorithm’s recommendation of people that
<User> might know, ordered by decreasing number of mutual friends. Even if a user has
fewer than 10 second-degree friends, output all of them in decreasing order of the number of
mutual friends. If a user has no friends, you can provide an empty list of recommendations. If
there are multiple users with the same number of mutual friends, ties are broken by ordering
them in a numerically ascending order of their user IDs.
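
The ordering rule can be sketched in isolation. The sketch assumes the mutual-friend counts for one user's candidates have already been computed; `rank_recommendations` and the `mutual_counts` mapping are hypothetical names, and the code is plain Python rather than a MapReduce job.

```python
def rank_recommendations(mutual_counts: dict[int, int], n: int = 10) -> list[int]:
    """Order candidates by decreasing mutual-friend count, then ascending ID."""
    ordered = sorted(mutual_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [candidate for candidate, _ in ordered[:n]]


# Candidates 1 and 3 tie on 4 mutual friends; 5 and 9 tie on 2.
counts = {5: 2, 3: 4, 9: 2, 1: 4}
print(rank_recommendations(counts))  # [1, 3, 5, 9]
```

Sorting on the key `(-count, id)` handles both the primary order and the tie-break in one pass.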

Also, please provide a description of how you are going to use MapReduce jobs to solve this
problem. We only need a very high-level description of your strategy to tackle this problem.

Note: It is possible to solve this question with a single MapReduce job. But if your solution
requires multiple MapReduce jobs, then that is fine too.

What to submit:

(i) The source code as a single source code file named as the question number (e.g.,
question_1.java).

(ii) Include in your writeup a short paragraph describing your algorithm to tackle this problem.

(iii) Include in your writeup the recommendations for the users with following user IDs:
924, 8941, 8942, **19, **20, **21, **22, 99**, 9992, 9993.

2. Association Rules (35 pts)

Association Rules are frequently used for Market Basket Analysis (MBA) by retailers to
understand the purchase behavior of their customers. This information can then be used for
many different purposes such as cross-selling and up-selling of products, sales promotions,
loyalty programs, store design, discount plans and many others.

Evaluation of item sets: Once you have found the frequent itemsets of a dataset, you need to
choose a subset of them as your recommendations. Commonly used metrics for measuring
significance and interest for selecting rules for recommendations are:

1. Confidence (denoted as conf(A → B)): Confidence is defined as the probability of
occurrence of B in the basket if the basket already contains A:

$$\mathrm{conf}(A \to B) = \Pr(B \mid A),$$

where Pr(B|A) is the conditional probability of finding item set B given that item set A is
present.

2. Lift (denoted as lift(A → B)): Lift measures how much more “A and B occur together” than
“what would be expected if A and B were statistically independent”:

$$\mathrm{lift}(A \to B) = \frac{\mathrm{conf}(A \to B)}{S(B)},$$

where $S(B) = \frac{\mathrm{Support}(B)}{N}$ and N is the total number of transactions (baskets).

3. Conviction (denoted as conv(A → B)): it compares the “probability that A appears without B if
they were independent” with the “actual frequency of the appearance of A without B”:

$$\mathrm{conv}(A \to B) = \frac{1 - S(B)}{1 - \mathrm{conf}(A \to B)}.$$
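
To make the three measures concrete, here is a small sketch on toy baskets. It assumes the standard definitions lift(A → B) = conf(A → B) / S(B) and conv(A → B) = (1 − S(B)) / (1 − conf(A → B)), with S(B) = Support(B) / N. All function names and the toy data are hypothetical.

```python
def support(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for b in baskets if items <= b)


def confidence(a, b, baskets):
    return support(set(a) | set(b), baskets) / support(a, baskets)


def lift(a, b, baskets):
    s_b = support(b, baskets) / len(baskets)
    return confidence(a, b, baskets) / s_b


def conviction(a, b, baskets):
    # Undefined when confidence is exactly 1 (zero denominator);
    # the toy data below avoids that case.
    s_b = support(b, baskets) / len(baskets)
    return (1 - s_b) / (1 - confidence(a, b, baskets))


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
print(round(confidence({"bread"}, {"milk"}, baskets), 3))  # 0.667
print(round(lift({"bread"}, {"milk"}, baskets), 3))        # 0.889
print(round(conviction({"bread"}, {"milk"}, baskets), 3))  # 0.75
```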

(a) [5 pts]

A drawback of using confidence is that it ignores Pr(B). Why is this a drawback? Explain why lift
and conviction do not suffer from this drawback.

(b) [5 pts]

A measure is symmetrical if measure(A → B) = measure(B → A). Which of the measures
presented here are symmetrical? For each measure, please provide either a proof that the
measure is symmetrical, or a counterexample that shows the measure is not symmetrical.

(c) [5 pts]

A measure is desirable if its value is maximal for rules that hold 100% of the time (such rules are
called perfect implications). This makes it easy to identify the best rules. Which of the above
measures have this property? Explain why.

Product Recommendations: The action or practice of selling additional products or services to
existing customers is called cross-selling. Giving product recommendations is one example of
cross-selling that is frequently used by online retailers. One simple method to
give product recommendations is to recommend products that are frequently browsed
together by the customers.

Suppose we want to recommend new products to the customer based on the products they
have already browsed on the online website. Write a program using the A-priori algorithm to
find products which are frequently browsed together. Fix the support to s = 100 (i.e. product
pairs need to occur together at least 100 times to be considered frequent) and find itemsets of
size 2 and 3.

Use the provided browsing behavior dataset browsing.txt. Each line represents a browsing
session of a customer. On each line, each string of 8 characters represents the id of an item
browsed during that session. The items are separated by spaces.
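
The first two A-priori passes can be sketched on a few made-up sessions in the same space-separated format. The item IDs, the function names, and the choice to treat each session as a set (ignoring repeated items within a session) are all assumptions for illustration. The threshold here is s = 2 rather than 100 so the toy data yields a result.

```python
from collections import Counter
from itertools import combinations


def read_baskets(lines):
    """Split each session line on whitespace; a set drops repeated items."""
    return [set(line.split()) for line in lines if line.strip()]


def frequent_pairs(baskets, s):
    # Pass 1: count single items and keep those meeting the threshold.
    item_counts = Counter(item for b in baskets for item in b)
    frequent_items = {i for i, c in item_counts.items() if c >= s}
    # Pass 2: count only pairs whose members are both frequent.
    pair_counts = Counter()
    for b in baskets:
        kept = sorted(b & frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {p: c for p, c in pair_counts.items() if c >= s}


sessions = [
    "AAA11111 BBB22222 CCC33333",
    "AAA11111 BBB22222",
    "BBB22222 CCC33333",
    "AAA11111 DDD44444",
]
print(frequent_pairs(read_baskets(sessions), s=2))
# {('AAA11111', 'BBB22222'): 2, ('BBB22222', 'CCC33333'): 2}
```

For the real dataset, the same functions could be fed the lines of an opened `browsing.txt` file. Sorting each basket before generating pairs keeps every pair in one canonical order.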

Note: for the following questions (d) and (e), the writeup will require a specific rule ordering
but the program need not sort the output.

(d) [10 pts]

Identify pairs of items (X, Y) such that the support of {X, Y} is at least 100. For all such pairs,
compute the confidence scores of the corresponding association rules: X ⇒ Y, Y ⇒ X. Sort the
rules in decreasing order of confidence scores and list the top 5 rules in the writeup. Break ties,
if any, by lexicographically increasing order on the left hand side of the rule.

(e) [10 pts]

Identify item triples (X, Y, Z) such that the support of {X, Y, Z} is at least 100. For all such triples,
compute the confidence scores of the corresponding association rules: (X, Y) ⇒ Z, (X, Z) ⇒ Y,
and (Y, Z) ⇒ X. Sort the rules in decreasing order of confidence scores and list the top 5 rules in
the writeup. Order the left-hand-side pair lexicographically and break ties, if any, by
lexicographical order of the first then the second item in the pair.

What to submit:

Include your properly named code file (e.g., question_2.java or question_2.py), and include the
answers to the following questions in your writeup:

(i) Explanation for 2(a).

(ii) Proofs and/or counterexamples for 2(b).

(iii) Explanation for 2(c).

(iv) Top 5 rules with confidence scores for 2(d).

(v) Top 5 rules with confidence scores for 2(e).

3. Locality-Sensitive Hashing (30 pts)

When simulating a random permutation of rows, as described in Sec. 3.3.5 of the MMDS textbook,
we could save a lot of time if we restricted our attention to a randomly chosen k of the n rows,
rather than hashing all the row numbers. The downside of doing so is that if none of the k rows
contains a 1 in a certain column, then the result of the min-hashing is “don’t know,” i.e., we get
no row number as a min-hash value. It would be a mistake to assume that two columns that
both min-hash to “don’t know” are likely to be similar. However, if the probability of getting
“don’t know” as a min-hash value is small, we can tolerate the situation, and simply ignore such
min-hash values when computing the fraction of min-hashes in which two columns agree.
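
The restricted scheme described above can be illustrated with a short simulation. This is a sketch under assumptions: rows are numbered 0 to n−1, a column is given as the set of rows holding a 1, and `sampled_minhash` is a hypothetical name. A return value of `None` stands for “don’t know.”

```python
import random


def sampled_minhash(column_ones, n, k, rng):
    """Min-hash over a random order of k sampled rows; None means "don't know"."""
    sampled = rng.sample(range(n), k)  # k distinct rows, in random order
    for row in sampled:
        if row in column_ones:
            return row
    return None


rng = random.Random(0)
ones = {3, 17, 42}
trials = 10_000
unknown = sum(
    sampled_minhash(ones, n=100, k=10, rng=rng) is None for _ in range(trials)
)
print(unknown / trials)  # empirical "don't know" rate for this toy setting
```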

(a) [10 pts]

Suppose a column has m 1’s and therefore (n − m) 0’s. Prove that the probability we get
“don’t know” as the min-hash value for this column is at most $\left(\frac{n-k}{n}\right)^m$.

(b) [10 pts]

Suppose we want the probability of “don’t know” to be at most $e^{-10}$. Assuming n and m are
both very large (but n is much larger than m or k), give a simple approximation to the smallest
value of k that will assure this probability is at most $e^{-10}$. Hints: (1) You can use
$\left(\frac{n-k}{n}\right)^m$ as the exact value of the probability of “don’t know.”
(2) Remember that for large x, $\left(1 - \frac{1}{x}\right)^x \approx 1/e$.

(c) [10 pts]

Note: This question should be considered separate from the previous two parts, in that we are
no longer restricting our attention to a randomly chosen subset of the rows.

When min-hashing, one might expect that we could estimate the Jaccard similarity without
using all possible permutations of rows. For example, we could only allow cyclic permutations,
i.e., start at a randomly chosen row r, which becomes the first in the order, followed by rows
r+1, r+2, and so on, down to the last row, and then continuing with the first row, second row,
and so on, down to row r−1. There are only n such permutations if there are n rows. However,
these permutations are not sufficient to estimate the Jaccard similarity correctly.
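
The mechanics of cyclic-permutation min-hashing can be sketched as follows, assuming rows are numbered 0 to n−1 and each column is given as the non-empty set of rows holding a 1. The function names are hypothetical. The usage lines only show the trivial case of two identical columns, where both quantities are 1.

```python
def cyclic_minhash(column_ones, r, n):
    """First row holding a 1 in the order r, r+1, ..., n-1, 0, ..., r-1."""
    for offset in range(n):
        row = (r + offset) % n
        if row in column_ones:
            return row
    return None  # the column has no 1's


def cyclic_agreement(s1, s2, n):
    """Fraction of the n cyclic permutations on which both min-hashes agree."""
    agree = sum(
        cyclic_minhash(s1, r, n) == cyclic_minhash(s2, r, n) for r in range(n)
    )
    return agree / n


def jaccard(s1, s2):
    return len(s1 & s2) / len(s1 | s2)


print(cyclic_agreement({0, 2}, {0, 2}, 4))  # 1.0
print(jaccard({0, 2}, {0, 2}))              # 1.0
```

Comparing the two values for other pairs of columns is one way to test a candidate example by hand.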

Give an example of two columns such that the probability (over cyclic permutations only) that
their min-hash values agree is not the same as their Jaccard similarity. In your answer, please
provide (a) an example of a matrix with two columns (let the two columns correspond to sets
denoted by S1 and S2) (b) the Jaccard similarity of S1 and S2, and (c) the probability that a
random cyclic permutation yields the same min-hash value for both S1 and S2.

What to submit:

Include the following in your writeup:

(i) Proof for 3(a)

(ii) Derivation and final answer for 3(b)

(iii) Example for 3(c)
