October 10 · Issue #27
Curated essays about the future of Data Science, along with production Data Science and resources for continuous learning. Covers Data Science, Data Engineering, MLOps & DataOps. Curated by people at https://nibble.ai/
|
|
The Morning Paper is great, and this post is no exception: it succinctly summarizes the lessons learned from integrating around 150 successful customer-facing applications of machine learning at Booking.com.
|
|
Why is building machine learning systems so hard?
Building machine learning systems in 2019 feels like stacking together Lego. Except, you have to construct all the Lego pieces from scratch. In the dark. This article discusses why machine learning systems have high essential complexity: complexity that cannot be reduced, and is inherent to the system.
|
The Politics of Images in Machine Learning Training Sets
You open up a database of pictures used to train artificial intelligence systems. At first, things seem straightforward. You’re met with thousands of images: apples and oranges, birds, dogs, horses, mountains, clouds, houses, and street signs. But as you probe further into the dataset, people begin to appear: cheerleaders, scuba divers, welders, Boy Scouts, fire walkers, and flower girls. Things get strange: A photograph of a woman smiling in a bikini is labeled a “slattern, slut, slovenly woman, trollop.” Fascinating essay on bias (or politics) in AI through an investigation into the politics of training sets, and the fundamental problems with classifying humans.
|
MLOps Tooling: an overview of some tools
Anyone in the industry knows how confusing it can be to discern between a data scientist, data engineer, model ops engineer, research scientist, and the list goes on. Titles aside, getting models into production (i.e. serving an ML model that provides predictions) is frankly where the money is. An overview of some tools explored by the author at the last MLOps NYC conference: Kubeflow, MLflow, SageMaker, Dask and RAPIDS.
|
Data science is not a science project
There are many barriers to success in data-driven initiatives. Chief among them is the difficulty many organizations face in operationalizing analytics: deploying, monitoring, and managing analytics and AI in business processes. To prove business value, data-driven initiatives need to move past the pilot phase. That’s when the hard work of deploying, monitoring, and managing analytics begins.
|
Pixelation to represent endangered species counts
In 2008, the World Wildlife Fund ran a campaign that used pixelation to represent the number of animals left for endangered species. One pixel represents an animal, so an image appears more pixelated when there are fewer animals left. Data visualization is extremely powerful, yet I feel we don’t share nearly enough of it here. This one, from the excellent FlowingData, is a particularly good example.
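For the curious, here is a minimal sketch of the "one pixel per animal" idea in plain Python. It is not how the campaign images were actually produced: the square-block sizing rule, the helper names `block_size` and `pixelate`, and the tiny grayscale grid are all assumptions made for illustration.

```python
import math


def block_size(width, height, animals):
    """Side of a square block so the image holds roughly `animals` blocks.

    Assumption: blocks are square and tile the whole image.
    """
    if animals <= 0:
        raise ValueError("animals must be positive")
    return max(1, round(math.sqrt(width * height / animals)))


def pixelate(image, size):
    """Replace every size x size block of a 2D grayscale grid with its mean."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, height, size):
        for left in range(0, width, size):
            cells = [
                (r, c)
                for r in range(top, min(top + size, height))
                for c in range(left, min(left + size, width))
            ]
            mean = sum(image[r][c] for r, c in cells) // len(cells)
            for r, c in cells:
                out[r][c] = mean
    return out


# Fewer animals left means bigger blocks, hence a coarser picture.
coarse = block_size(600, 400, 2400)    # hypothetical population of 2,400
fine = block_size(600, 400, 240000)    # hypothetical population of 240,000
tiny = pixelate([[0, 2], [4, 6]], 2)   # one 2x2 block averaged to a single value
```

With these made-up numbers, a 600×400 image shrinks to 10-pixel blocks at 2,400 animals, while 240,000 animals leaves every pixel untouched.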
|
|
Which flavor of data professional are you?
There are not enough hours in the day for one person to do everything alone. A field guide to the expanding data science universe. As this universe is expanding rapidly, it’s crucially important to get a mental map of the different roles involved at each stage of a project.
|
|
Release of Streamlit, an app framework to build beautiful ML tools
In my experience, every nontrivial machine learning project is eventually stitched together with bug-ridden and unmaintainable internal tools. Streamlit is a free and open-source app framework that aims to let machine learning engineers create beautiful apps without needing a dedicated tools team.
|
Hugging Face releases Transformers 2.0 🤗
Transformers provides state-of-the-art general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural Language Generation (NLG). The library glues these together through an abstraction layer that speeds up experimentation. Formerly pytorch-transformers, transformers is a popular open-source NLP library. The new release brings support for TensorFlow 2.0 and interoperability between PyTorch and TensorFlow (hence the new name).
|
What’s new in MLflow 1.3?
A bunch of bug fixes, performance enhancements and some new features, including:
- TensorFlow 2.0 support
- GeoJSON artifact previews
- HTML artifact previews
|
|
SQL queries don't start with SELECT
What order do SQL queries actually run in?
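As a quick illustration of the question, the sketch below annotates a query with the commonly described logical order of evaluation (FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT). The `orders` table and its rows are made up for the example, and real database engines are free to reorder work internally as long as the result is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 5), ("ana", 50), ("ana", 70), ("bob", 200), ("cy", 60), ("cy", 90)],
)

# The numbers give the logical order, not the order the clauses are written in.
query = """
SELECT customer, SUM(amount) AS total   -- 5. compute the output columns
FROM orders                             -- 1. pick the source rows
WHERE amount >= 10                      -- 2. filter individual rows
GROUP BY customer                       -- 3. form groups
HAVING COUNT(*) >= 2                    -- 4. filter whole groups
ORDER BY total DESC                     -- 6. sort the output rows
LIMIT 5                                 -- 7. keep at most five rows
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

Because WHERE runs before grouping, ana's order of 5 never reaches the SUM. Because HAVING runs after grouping, bob is dropped for having a single remaining order. ORDER BY can refer to `total` because it runs after SELECT.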
|
Ensemble Methods for Machine Learning: AdaBoost
Simple intuition and maths behind AdaBoost.
|
Applied Machine Learning - Columbia W4995
A very good introductory course taught by Andreas C Müller, associate research scientist at Columbia University Data Science Institute.
|
|
If you found something particularly useful, I would love to know. Please reach out to me at [email protected] with “nibble.ai weekly” in the subject. Have a great week. Florent