Wikimedia Foundation

Interview question

Research Software Engineer interview - Wikimedia Foundation

Build a program to process data from an emitter. The data arrives in order, and your program may take from 0.1 to 5 seconds to process each record it receives. The processed data must be passed to a stream, in order and in real time. For the sake of the example, the processing time is simulated by sleeping for a random duration between 0.1 and 5 seconds.

Interview answers

2 answers

Build a queue-based system with multiple record processors that work in parallel. Make sure the processing really happens in parallel, not just concurrently: in the real world the CPU will be doing work, not just sleeping.
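A minimal sketch of that idea in Python, assuming the executor's internal work queue plays the role of the queue and that `process` and `process_unordered` are hypothetical names, not part of the original answer. It only shows the parallel fan-out; results come back in completion order, so ordering is not handled here.

```python
import random
import time
from concurrent.futures import Executor, ThreadPoolExecutor, as_completed


def process(record, min_delay=0.1, max_delay=5.0):
    """Hypothetical stand-in for real work: sleep, then return a result."""
    time.sleep(random.uniform(min_delay, max_delay))
    return record * 2


def process_unordered(records, executor: Executor, min_delay=0.1, max_delay=5.0):
    """Submit every record to the executor and collect results as they finish.

    The returned list is in completion order, not arrival order.
    """
    futures = [executor.submit(process, r, min_delay, max_delay) for r in records]
    return [f.result() for f in as_completed(futures)]


if __name__ == "__main__":
    # Short delays so the demo finishes quickly.
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(process_unordered(range(8), pool, 0.01, 0.1))
```

Threads are enough while the work is a sleep. For CPU-bound work on a standard CPython build with the global interpreter lock, passing a `concurrent.futures.ProcessPoolExecutor` instead should give real parallelism, provided `process` is a picklable module-level function and the entry point is guarded with `if __name__ == "__main__":`.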

Anonymous on

As an addition to the answer above: parallelising the processing of elements without extra logic around it would cause the processed elements to be published downstream in a non-deterministic order. To maintain both order and parallelism, one solution is to have a (circular) atomic auto-incrementing integer `i`. Assign the next `i` to each element `e` as it is received (the index has to be taken on arrival, since an index taken on completion would only record completion order), and after processing put the result into a map from `i` to `e`. Keep track of the latest `i` that has been published downstream; let's call it `latest`. Whenever an element finishes processing, check whether its `i` is the successor of `latest`. If it is, you can publish that element downstream, and you can also publish all the elements in the map that follow it consecutively (clearing them from the map as you go). With this approach, in some cases (e.g. when processing one element produces a lot of data), you should make sure the queue is bounded, so as not to risk running out of memory while processing too many elements in parallel.
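The reordering idea can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: `ReorderBuffer`, `run_ordered`, `workers` and `max_in_flight` are hypothetical names; sequence numbers are assumed to be assigned when a record is received; the buffer stores the next number to publish (`latest + 1`) rather than `latest`; and Python integers do not overflow, so no circular counter is needed. Error handling is omitted.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ReorderBuffer:
    """Publishes results in sequence order even if they complete out of order."""

    def __init__(self, publish):
        self._publish = publish   # hypothetical downstream callback
        self._pending = {}        # seq -> result, finished but not yet publishable
        self._next = 0            # next sequence number to publish (latest + 1)
        self._lock = threading.Lock()

    def complete(self, seq, result):
        with self._lock:
            self._pending[seq] = result
            # Flush the contiguous run that starts at the next expected number.
            while self._next in self._pending:
                self._publish(self._pending.pop(self._next))
                self._next += 1


def run_ordered(records, process, publish, workers=4, max_in_flight=16):
    """Process records in parallel while publishing results in arrival order."""
    # Each slot is one record that was accepted but not yet published, so the
    # semaphore bounds the executor backlog and the pending map together.
    slots = threading.BoundedSemaphore(max_in_flight)

    def publish_and_release(result):
        try:
            publish(result)
        finally:
            slots.release()

    buffer = ReorderBuffer(publish_and_release)

    def task(seq, record):
        # Assumption: process() does not raise. A failure would leave a gap
        # in the sequence and stall publishing; real code must handle that.
        buffer.complete(seq, process(record))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for seq, record in enumerate(records):  # seq is assigned on arrival
            slots.acquire()                     # back-pressure on the reader
            pool.submit(task, seq, record)
```

Calling `run_ordered(emitter, process, stream.write)` with a hypothetical `emitter` iterable and `stream` object would publish each result as soon as every earlier record has been published. Releasing a slot on publication rather than on completion is a deliberate choice: one slow record then stalls intake after `max_in_flight` records instead of letting the map grow without limit.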

Anonymous on
