## Pregunta de entrevista

Entrevista de Data Scientist

-

# Data challenge was very similar to the ads analysis challenge on the book the collection of data science takehome challenge, so that was easy (if you have done your homework). SQL was: you have a table where you have date, user_id, song_id and count. It shows at the end of each day how many times in her history a user has listened to a given song. So count is cumulative sum. You have to update this on a daily basis based on a second table that records in real time when a user listens to a given song. Basically, at the end of each day, you go to this second table and pull a count of each user/song combination and then add this count to the first table that has the lifetime count. If it is the first time a user has listened to a given song, you won't have this pair in the lifetime table, so you have to create the pair there and then add the count of the last day. Onsite: lots of ads related and machine learning questions. How to build an ad model, how to test it, describe a model. I didn't do well in some of these.

Respuesta

## Respuestas de entrevistas

18 respuestas

44

Can't tell you the solution of the ads analysis challenge. I would recommend getting in touch with the book author though. It was really useful to prep for all these interviews. SQL is a full outer join between life time count and last day count and then sum the two.

Anónimo en

66

Can you post here your solution for the ads analysis from the takehome challenge book. I also bought the book and was interested in comparing the solutions. Also can you post here how you solved the SQL question?

Anónimo en

24

for the SQL, I think both should work. Outer join between lifetime count and new day count and then sum columns replacing NULLs with 0, or union all between those two, group by and then sum.

Anónimo en

11

REPLACE INTO summary_table(name,song,tot_count) SELECT d_name,d_song,d_cnt+ IFNULL(tot_count,0) FROM (select name d_name,song d_song,count(*) d_cnt from today_table t where to_date ='12-26-2013' group by name,song) LEFT JOIN (select name s_name,song s_song,tot_count from summary_table s) ON s_name=d_name and s_song=d_song;

Anónimo en

12

I had the interview and queries are like this. Not hard, but boy, coding live in front of a screen is very different than reading a question on glassdoor and trying to answer it!

Anónimo en

9

Thanks so much for this! So helpful!

Anónimo en

5

insert into summary_table select d.* from (select b.user_id, b._date, b.song, b.todaysum+coalesce(c.maxsum,0) as cumsum from today_summary b left join (select a.user_id, a.song, max(cumsum) as maxsum from summary_table a group by a.user_id, a.song) c on b.user_id=c.user_id and b.song=c.song) d;

Anónimo en

14

Main query works, having trouble with update statement though... SELECT current_date, t.user_id, t.song_id, SUM(t.count) FROM ( SELECT * FROM cumu UNION ALL SELECT * FROM curr) t GROUP BY t.user_id, t.song_id;

Anónimo en

7

I've seen many reviews where the reviewer referred to this book, but didn't explain much beyond that. I'd say they're trying to make you buy the book.

2

Could you please explain a little bit more about "ads analysis challenge". What is given and what is the objective?

Erin en

2

Could you please explain a little bit more about "ads analysis challenge". What is given and what is the objective?

Erin en

2

I made a composite primary key (user_id, song_id) REPLACE INTO usersong1 SELECT f.datert, f.user_id, f.song_id, f.daily+f.counts FROM (select * from usersong1 a JOIN (select datert, user_id, song_id, COUNT(user_id) as daily from rt group by rt.user_id, rt.song_id ) e ON (a.user_id = e.user_id) AND (a.song_id = e.song_id)) f;

Anónimo en

25

for the SQL, I think Union all should be used instead of outer join

Anónimo en

3

Update cuml set cuml.cnt = (cuml.cnt + cur.cnt) From Cuml cuml inner Join Curr cur On cuml.Userid = cur.userid and cuml.songid = cur.songid

Anónimo en

0

INSERT INTO History(date, user_id, song_id, count) SELECT date, user_id, song_id, SUM(count) FROM Daily WHERE Daily.user_id = History.user_id AND Daily.song_id = History.song_id ON DUPLICATE KEY UPDATE SET History.count = History.count + Daily.count;

RC en

0

INSERT INTO History(date, user_id, song_id, count) SELECT date, user_id, song_id, SUM(count) FROM Daily GROUP BY date, user_id, song_id WHERE Daily.user_id = History.user_id AND Daily.song_id = History.song_id ON DUPLICATE KEY UPDATE SET History.count = History.count + Daily.count;

RC en

0

Note: In Facebook they want you to write efficient queries and union all is not efficient. SELECT (CASE WHEN C.User_id IS NOT NULL THEN C.User_id ELSE D.User_id END) AS USERS, (CASE WHEN C.song_id IS NOT NULL THEN C.song_id ELSE D.song_id END) AS SONG, sum(ISNULL(Cum,0))+sum(isnull(daily_count,0)) as sum FROM [dbo].[Q_15_Cum] AS C FULL JOIN (SELECT User_id, song_id, COUNT(*) AS DAILY_COUNT FROM [dbo].[Q_15_Daily] GROUP BY User_id, song_id )D ON C.User_id=D.User_id AND C.song_id=D.song_id group by(CASE WHEN C.User_id IS NOT NULL THEN C.User_id ELSE D.User_id END) , (CASE WHEN C.song_id IS NOT NULL THEN C.song_id ELSE D.song_id END) order by 1,2

SQL_Master en

0

-> first update cumm table for the user_id,song_id combinations that exist in both tables UPDATE cumm AS v SET date = s.date, count = v.count + s.count FROM (select date,user_id,song_id,count(*) as count from curr group by 1,2,3) AS s where v.user_id=s.user_id and v.song_id=s.song_id -> then insert in cumm table the combinations that dont exist in cumm table Insert into cumm select a.* from (select date,user_id,song_id,count(*) as count from curr group by 1,2,3) a left join cumm b on a.user_id=b.user_id and a.song_id=b.song_id where b.user_id is null

MM en