Data Mining Analyst interview questions
243 Data Mining Analyst interview questions shared by candidates
Top interview questions

Given the set a a b b a a c c b b of unknown length, write an algorithm that figures out which occurs most frequently and with the most continuous repetition.
2 answers↳
Doesn't matter, no answer is right!
↳
Maintain a hash map that stores each element's overall frequency, and track the longest continuous run as you scan.
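
The hash map idea in the answer above could look roughly like this in Python. This is only a sketch: the function name `frequency_and_longest_run` and the choice to return the first winner on ties are assumptions, not something the interviewer specified. It reads the input once, so it also works on an iterator of unknown length.

```python
from collections import Counter

def frequency_and_longest_run(stream):
    """Return (most_frequent, count, longest_run_item, run_length) in one pass."""
    counts = Counter()
    best_run_item, best_run_len = None, 0
    prev, run_len = object(), 0  # sentinel that never equals a real item
    for item in stream:
        counts[item] += 1
        # Extend the current run if the item repeats, otherwise start a new run.
        run_len = run_len + 1 if item == prev else 1
        prev = item
        if run_len > best_run_len:  # strict '>' keeps the earliest run on ties
            best_run_item, best_run_len = item, run_len
    if not counts:
        return None, 0, None, 0
    most_frequent, count = counts.most_common(1)[0]
    return most_frequent, count, best_run_item, best_run_len
```

The most frequent element and the element with the longest run can differ, which is why the sketch reports them separately.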


List the strings that are anagrams of each other in a set of strings.
2 answers↳
Sorting the strings is not optimal because each sort is O(N log N), where N is the number of characters in the word. A better solution is to write a function that encodes each word as a hash table of character frequencies, which is O(N) per word.
↳
Sort the characters of each string and compare.
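
A minimal sketch of the character-frequency approach from the first answer, in Python. The name `group_anagrams` and the decision to return only groups with more than one member are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def group_anagrams(words):
    """Group words that share the same character counts."""
    groups = defaultdict(list)
    for word in words:
        # A frozenset of (char, count) pairs is hashable and order-independent,
        # so all anagrams of a word map to the same key.
        key = frozenset(Counter(word).items())
        groups[key].append(word)
    # Keep only the groups that actually contain anagrams.
    return [group for group in groups.values() if len(group) > 1]
```

Replacing the key with `"".join(sorted(word))` gives the sort-and-compare variant from the second answer. For short words the practical difference between the two is small.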

How would you design a recommendation system (like Amazon's)?
2 answers↳
Use collaborative filtering to compare one person's preferences with others'. If A and B are similar, we can recommend B's preferred items to A.
↳
Why the downvote on the other answer? He/she is right. Collaborative filtering is the most common strategy for recommendation systems. You see that user A bought these things and user B bought the same things, but user B also bought this other thing, so let's show that thing to user A.
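
The "user B also bought this other thing" idea can be sketched as a tiny user-based collaborative filter. Everything here is an assumption for illustration: purchases are modeled as a dict of sets, similarity is Jaccard overlap, and the names `jaccard` and `recommend` are hypothetical. A production system would differ considerably.

```python
def jaccard(a, b):
    """Overlap between two item sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, purchases, top_n=3):
    """Rank items the target has not bought by the similarity of their buyers."""
    owned = purchases[target]
    scores = {}
    for user, items in purchases.items():
        if user == target:
            continue
        sim = jaccard(owned, items)
        if sim == 0:
            continue  # users with nothing in common contribute nothing
        for item in items - owned:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; break ties alphabetically for stable output.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]
```

With `purchases = {"A": {"book", "lamp"}, "B": {"book", "lamp", "desk"}, "C": {"phone"}}`, calling `recommend("A", purchases)` suggests only the desk, because C shares nothing with A.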

there really were none.
2 answers↳
They seemed to want to hear what I had to say about my past assignments and their relevance to the opening. I think they were not impressed.
↳
Intuited

We pre-screen the data to remove fraud threats, so how do we find a data sample that gives a true representation of fraud events?
2 answers↳
Remove the screen and look at the unbiased data.
↳
Yes, remove the prescreen and look at the unbiased sample. If the unbiased sample becomes too big, just randomly choose half of it, or a smaller fraction, to represent the fraud events.
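
The random subsampling step suggested above might be sketched as follows. The function name `subsample`, the `fraction` parameter, and the optional `seed` are assumptions for illustration. It assumes `records` is a list of events that have not been pre-screened.

```python
import random

def subsample(records, fraction=0.5, seed=None):
    """Return a uniform random subset containing `fraction` of the records."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    k = int(len(records) * fraction)
    # sample() draws without replacement, so no record appears twice.
    return rng.sample(records, k)
```

Because every record is equally likely to be chosen, the fraud rate in the subset should approximate the rate in the full unscreened sample.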

Why do you think you should be chosen for this position?
2 answers↳
I'm hard working, great team player, reliable, quick learner etc etc
↳
cuz i got a 10inch and great performer in front of the camera - porn industry

Implement a sampling function with nominal distribution.
2 answers↳
I think you mean normal distribution! If you are using R, call set.seed() first. You can then use rnorm() with the sample size, mean and SD, e.g. set.seed(123); rnorm(100, 2, 5)
↳
I'm the original poster, sorry for my typo. I actually meant multinomial distribution. And the advanced question was: if the probabilities form a skewed distribution, how would you speed up your algorithm? You can find both answers on Wikipedia. :)
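
For the multinomial version of the question, one possible sketch in Python uses inverse-CDF sampling: build cumulative probabilities once, then locate each uniform draw with a binary search. The name `sample_multinomial` and its parameters are assumptions, and this is not necessarily the answer the interviewer had in mind.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def sample_multinomial(n, probs, rng=None):
    """Run n trials over the categories in probs; return one count per category."""
    rng = rng or random.Random()
    cumulative = list(accumulate(probs))  # running totals of the probabilities
    total = cumulative[-1]                # tolerates weights that do not sum to 1
    counts = [0] * len(probs)
    for _ in range(n):
        u = rng.random() * total          # uniform draw in [0, total)
        # First category whose cumulative probability exceeds u.
        idx = bisect_right(cumulative, u)
        counts[min(idx, len(probs) - 1)] += 1  # guard against float rounding
    return counts
```

Each draw costs O(log k) for k categories. For the skewed follow-up, two commonly cited speed-ups are sorting categories by descending probability so a linear scan usually stops after a step or two, and the alias method, which gives O(1) draws after O(k) setup.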

Tell me about your past experience in engineering.
1 answer↳
Provided examples from my education and work.
