
Data Engineer

Full-time · Helixa · Milan


At Cubeyou we work on Artificial Social Intelligence, training machines to understand, predict and generate social patterns. Artificial minds are becoming extremely smart, but they need to be capable of much more than functional and cognitive tasks: they need to read people and understand human needs, motivations and emotions. In short, they need empathy.

We work on the right brain of AI to make sure that our society not only becomes more efficient but actually becomes a better one. This means developing algorithms that understand how humans function, predict what they will do, and generate affinities and behaviours for the good.

We do a great deal of research, build leading-edge technologies, and deploy them to solve practical everyday problems. Cubeyou is packed with extremely talented, hard-working people who enjoy solving hard problems and building great products.

We have offices in the US and Europe and we are looking for more talent to help us teach machines how to become part of our society!

AI team

The AI team at Cubeyou is at the forefront of innovation in advanced analytics and machine learning, including deep learning and more traditional algorithms.

Our team is composed of top-class scientists, engineers and domain experts who drive the data science capabilities of our products end-to-end, including data ingestion and processing, exploratory analysis, modeling, validation, visualization, tuning and automation.
We alternate the development workflow with research spikes on state-of-the-art machine intelligence applied to understanding and generalizing people's behaviours and affinities.
Our R&D strives to cover a full spectrum of topics: Bayesian inference, cross-domain adaptation, natural language processing, computer vision, generative models, scalable distributed systems and GPU hardware optimizations, to name just a few.

Artificial Intelligence at Cubeyou raises both scientific and engineering challenges. We deal with very large amounts of data and rely on various data management and distributed computing technologies (Spark, MongoDB) deployed on cloud infrastructure (AWS). We use a diverse set of languages (Python, Scala) and tools (PyData stack, MLlib, TensorFlow, Keras, PyTorch). Periodically, we work side by side with the core development team to maintain the data lake and to deploy solutions in production with minimal overhead.

The ultimate goal is to help the company grow in three major areas:

  1. Understanding consumer characteristics, preferences and lifestyle.
  2. Predicting latent variables and future outcomes (e.g. hidden traits, migrations, market indicators).
  3. Generating unobserved patterns (e.g. synthetic consumer populations, augmented reality, personalizations).

In addition to the core business, 10% of the team's time each week is devoted to learning about and experimenting with the latest advances in Artificial Intelligence, Deep Learning, Sociology and Data Technologies.


A Data Engineer at Cubeyou expands and optimizes our data architecture and infrastructure, manages ingestion pipelines from multiple sources into our data lake, maintains the large datasets consumed daily, and supports both the R&D of our AI technology and the operations required to deploy algorithms at scale.

Our scientists and engineers work as a team and are responsible for the entire end-to-end process, from research to production.

The ideal candidate is a senior software/data engineer with experience in building robust data pipelines and large-scale deployment of data-driven products in the cloud.

If you love coding and can’t wait to master new and upcoming technologies in Big Data and Artificial Intelligence, this is the right place for you.

The position is based in our rapidly growing R&D office in Milan (Italy) @ Talent Garden Merano.


Responsibilities

  • Design and implement scalable agents for downloading large datasets.
  • Assemble and transform terabytes of structured and unstructured data.
  • Create and maintain data pipeline architectures.
  • Identify, design, and implement internal process improvements and automations.
  • Maintain a clean codebase for production and dev environments.
  • Build sanity checks and dashboards for monitoring data quality and ensuring a healthy infrastructure.
  • Promote sound engineering and programming practices across the entire team.
  • Build robust workflows for training, evaluating and deploying algorithms at scale.
  • Invent and implement smart strategies for providing high-quality labelled data for training.
  • Work closely with the Chief Scientist, CEO and Product Owner to implement and improve the functionalities and user experience of the platform and design new features.
  • Adopt a Continuous Learning process to remain up to date with the latest and most productive technologies.

Required Skills

  • Master’s degree or above in computer science or software/computer/IT engineering fields.
  • 3+ years of experience, or a comparable industry career, building production systems for software development, data engineering, data science or similar.
  • Experience with designing distributed architectures leveraging modern cloud infrastructures (AWS or other providers).
  • Experience with Docker containers, orchestration systems (e.g. Kubernetes), continuous integration and job schedulers.
  • Working knowledge of Python and software development.
  • Experience with distributed computing and NoSQL technologies (e.g. Spark, Hadoop, Flink, HBase, MongoDB, Cassandra...).
  • Experience with large volume ETL or data streaming.
  • Enthusiasm for agile development and lean principles.
  • Ability to quickly prototype and test rough solutions, then iterate toward a final product that can be deployed in production.

Desired Skills

  • A PhD with a proven track record of publications or applications in scalable data architectures is preferred.
  • Familiarity with functional programming and Scala.
  • Familiarity with other languages such as Go, JavaScript, Groovy.
  • Knowledge of Artificial Intelligence and ability to apply it to real problems.
  • Knowledge of serverless architectures (Lambda, Kinesis, Glue).
  • Contributor or owner of GitHub repositories.

Benefits

  • Competitive salary.
  • Equity.
  • Free lunch delivered daily.
  • Personal budget for conferences and training.
  • Flexible working hours.
  • Startup atmosphere with the usual perks.
  • Regular team building activities.

published: Jan. 9, 2019
