Big data ops

Consider a team of experienced data scientists and engineers working in a start-up or an SME that requires large-scale big data processing for ML training or inference. They use big data tools such as Kafka and Spark (or Hadoop) to build and orchestrate their data pipelines, and high-powered processors such as GPUs or TPUs to speed up data processing and ML training. The development of ML models is led by the data scientists, while deployment is orchestrated by the data/software engineers. The focus is strongly on developing models; much less attention is paid to monitoring them.
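To make the pipeline layer concrete, the sketch below shows roughly what a Kafka-to-Spark ingestion job of this kind can look like. The broker address, topic name, event schema and output paths are illustrative assumptions, not details from the scenario.

```python
# Minimal sketch of a Kafka -> Spark data pipeline (PySpark Structured Streaming).
# Broker, topic, schema and paths are assumptions made for the example; running it
# also requires the spark-sql-kafka connector package on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("feature-ingest").getOrCreate()

# Assumed schema of the incoming JSON events.
event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("feature_a", DoubleType()),
    StructField("feature_b", DoubleType()),
])

# Read raw events from a Kafka topic as a streaming DataFrame.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Parse the JSON payload into typed columns for downstream feature tables.
features = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(from_json(col("json"), event_schema).alias("e"))
    .select("e.*")
)

# Persist the parsed features (here as Parquet) for training or inference jobs.
query = (
    features.writeStream
    .format("parquet")
    .option("path", "/data/features")
    .option("checkpointLocation", "/data/checkpoints/features")
    .start()
)
```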

As they continue with their operations, the team may:

  • Experience a lack of traceability for model training and monitoring.

  • Experience a lack of reproducible artifacts (see the sketch after this list).

  • Incur higher costs than expected due to mundane, repeated work.

  • Find that code and data start to grow independently of each other.
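As a hypothetical illustration of what the missing traceability and reproducibility amount to, the sketch below records one JSON file per training run that ties parameters, a hash of the input data and the resulting metrics together. All field names and paths are assumptions made for the example.

```python
# Hypothetical run-record sketch: one JSON file per training run linking
# parameters, the exact input data version and the resulting metrics.
import hashlib
import json
import time
from pathlib import Path


def file_sha256(path: str) -> str:
    """Hash a data file so a run can be tied to an exact input version."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def log_training_run(params: dict, data_path: str, metrics: dict,
                     out_dir: str = "runs") -> Path:
    """Write a single JSON record describing one training run."""
    record = {
        "timestamp": int(time.time()),
        "params": params,
        "data_sha256": file_sha256(data_path),
        "metrics": metrics,
    }
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    run_file = out / f"run_{record['timestamp']}.json"
    run_file.write_text(json.dumps(record, indent=2))
    return run_file


# Example usage with made-up values:
# log_training_run({"lr": 0.01, "epochs": 5}, "data/train.parquet", {"auc": 0.91})
```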

The following characteristics are typical of big data ops:

  • The team consists of data scientists/engineers.

  • There are high requirements for big data processing capacity.

  • Databricks is a key platform for sharing and collaborating within teams and between organisations.

  • ML model development happens in the cloud, using one of the many ML libraries and workflow tools available there, such as Spark MLlib (a minimal sketch follows this list).

  • There is little need to support open-source deep learning technologies such as PyTorch and TensorFlow.
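As an illustration of the Spark MLlib point above, the following is a minimal training sketch. The input table, column names, hyperparameters and output path are assumptions made for the example, not details from the scenario.

```python
# Minimal Spark MLlib training sketch; columns, paths and hyperparameters are
# illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-training").getOrCreate()

# Assumed training table with two numeric feature columns and a binary "label".
df = spark.read.parquet("/data/features")

# Assemble the feature columns into the single vector column MLlib expects.
assembler = VectorAssembler(inputCols=["feature_a", "feature_b"],
                            outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label", maxIter=20)

# Fit the pipeline and persist the model so an engineer can deploy the same artifact.
model = Pipeline(stages=[assembler, lr]).fit(df)
model.write().overwrite().save("/models/example_lr")
```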