Netflix: Our Metaflow Python library for faster data science is now open source

Netflix's Metaflow Python tool helps data scientists deploy machine-learning models to production faster.
Written by Liam Tung, Contributing Writer

Netflix's data-science team has open-sourced its Metaflow Python library, a key part of the 'human-centered' machine-learning infrastructure it uses for building and deploying data-science workflows. 

The video-streaming giant uses machine learning across all aspects of its business, from screenplay analysis, to optimizing production schedules, predicting churn, pricing, translation, and optimizing its giant content distribution network. 

According to Netflix software engineers, Metaflow was built to boost the productivity of its data scientists, who like to express business logic in Python code but don't want to spend much time on engineering issues such as object hierarchies, packaging, or obscure APIs unrelated to their work.

The idea behind Metaflow was to let Netflix data scientists see early on whether a prototyped model would fail in production, so they could fix the problem and, ideally, deploy sooner. Netflix revealed in February that Metaflow had helped reduce median deployment times from four months to just seven days.

Netflix offers this nutshell description of its Python library on the new metaflow.org website: "Metaflow helps you design your workflow, run it at scale, and deploy it to production. It versions and tracks all your experiments and data automatically. It allows you to inspect results easily in notebooks."

It can also be used with popular Python data-science libraries, including PyTorch, TensorFlow, and scikit-learn.

Netflix is one of the largest users of Amazon Web Services (AWS), so it's not surprising that Metaflow integrates with numerous AWS services. Among them is the ability to snapshot all code and data in Amazon S3, which Netflix uses as its 'data lake'. This should help users quickly scale up models using AWS's storage, compute, and machine-learning services.

The ability to snapshot code and data in S3 is what enables Metaflow's automated versioning and experiment tracking, so developers can safely inspect and restore Metaflow executions.
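
The idea of snapshotting every step's data so that earlier runs can be inspected or restored can be illustrated with a small toy. The sketch below is not Metaflow's implementation or API: `SnapshotStore`, `save`, `load`, and `blob_count` are hypothetical names, the run and step names are invented, and an in-memory dictionary stands in for S3. It stores each artifact under a hash of its content and records which hashes belong to which run and step.

```python
import hashlib
import pickle


class SnapshotStore:
    """Toy content-addressed store; a dict stands in for an S3 bucket."""

    def __init__(self):
        self._blobs = {}   # content hash -> pickled bytes
        self._index = {}   # (run_id, step, name) -> content hash

    def save(self, run_id, step, name, value):
        """Snapshot one artifact produced by a step of a run."""
        blob = pickle.dumps(value)
        key = hashlib.sha256(blob).hexdigest()
        self._blobs.setdefault(key, blob)  # identical content is stored once
        self._index[(run_id, step, name)] = key
        return key

    def load(self, run_id, step, name):
        """Restore an artifact exactly as it was when the step finished."""
        return pickle.loads(self._blobs[self._index[(run_id, step, name)]])

    def blob_count(self):
        """Number of distinct blobs actually stored."""
        return len(self._blobs)


store = SnapshotStore()

# Two runs of the same hypothetical flow with different parameters.
for run_id, alpha in [("run-1", 0.1), ("run-2", 0.5)]:
    data = [1, 2, 3, 4]
    store.save(run_id, "start", "data", data)
    store.save(run_id, "train", "score", alpha * sum(data))

# Either run can be inspected later without re-executing it.
old_score = store.load("run-1", "train", "score")
new_score = store.load("run-2", "train", "score")
```

Because the input data is identical in both runs, the toy store keeps a single copy of it, while each run's score remains separately retrievable. Keying snapshots by run and step is the design choice that makes later inspection possible in this sketch.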

Metaflow is also bundled with a "high-performance S3 client, which can load data up to 10Gbps".

The client allows any organization's data scientists to achieve what Netflix data scientists have done for the past few years. Netflix revealed in April that it used Metaflow to "push the limits of Python", enabling it to use "parallelized and optimized Python code to fetch data at 10Gbps, handle hundreds of millions of data points in memory, and orchestrate computation over tens of thousands of CPU cores". 

"This client has been massively popular among our users, who can now load data into their workflows an order of magnitude faster than before, enabling faster iteration cycles," Netflix software engineers said today. 
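
Fetching many objects concurrently is one common way to raise throughput from an object store, since each request spends most of its time waiting on the network. The following sketch shows that general pattern using only the Python standard library. It is not the Metaflow S3 client: `fetch_object` is a hypothetical stand-in that simulates latency instead of calling S3, and the key names and worker count are invented for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_object(key):
    """Hypothetical stand-in for one object download; sleeps to mimic latency."""
    time.sleep(0.05)
    return key, f"contents of {key}".encode()


def fetch_many(keys, max_workers=16):
    """Fetch all keys concurrently and return a {key: bytes} mapping."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fetch_object, keys))


keys = [f"data/part-{i:04d}" for i in range(32)]

start = time.perf_counter()
objects = fetch_many(keys)
elapsed = time.perf_counter() - start
# 32 sequential fetches would take about 1.6 s here; the threaded version
# overlaps the waits, so it finishes in a fraction of that.
```

Threads suit this case because the work is I/O-bound; a real client would also need retries and error handling, which the sketch leaves out.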

Metaflow also integrates with Batch, the AWS container-based compute platform. 

Netflix argues that Metaflow on AWS lets developers combine the speed of developing on a laptop with the deeper compute resources available in the cloud.

"Metaflow makes it easy to move back and forth between the local and remote modes of execution" because switching between them requires no changes to code or libraries, which in turn should make troubleshooting easier.
