
nibble.ai weekly - Issue #26: Data Tooling Market 2019, Data Science is boring, Full stack Deep Learning and more...


nibble.ai dispatch

October 2 · Issue #26
Curated essays about the future of Data Science. Production Data Science and learning resources for continuous learning. Covers Data Science, Data Engineering, MLOps & DataOps. Curated by people at https://nibble.ai/

This week’s top pick is Jon Ma’s Data Tooling Market — 2019: an overview of the state of data tools in 2019, along with interesting opportunities for investors and founders in the data tooling stack.

Is data science boring?
Data science is boring
A provocative title for an interesting read.
Observations, opinions and advice from a Data Science manager who leads teams deploying ML systems at Fortune-100 enterprises.
Many people cherry-pick the exciting parts of doing Data Science (or ML, Machine Learning) to motivate themselves and others. But we must face a reality: the real work is often “boring”, at least compared to what people romanticize.
What happened to Hadoop
Hadoop was the next big thing in enterprise IT, until it wasn’t. Other, bigger forces, from the cloud to AI, picked away at its utility.
Hadoop opened people’s eyes to what was possible with big data, but it’s also a reminder that no single technology is going to remake the world of enterprise IT, at least not anymore.
What makes a good data science practice
Banana Data Podcast #10 🎙
Episode 10 of The Banana Data Podcast is an interesting conversation between its hosts. From intentional data to getting outside perspectives, they walk us through how to build an AI practice that is not only scalable, but also responsible, ethical, and interpretable. buzzsprout.com
Technical
Machine learning and analytics for time series data
O'Reilly Data Show Podcast 🎙
A very interesting conversation about time series data, and, specifically, anomaly detection and forecasting. oreilly.com
Scaling a Mature Data Pipeline — Managing Overhead
There is often a hidden performance cost tied to the complexity of data pipelines: overhead. This post introduces the concept and examines the techniques Airbnb uses to avoid it in its data pipelines. medium.com
News
TensorFlow 2.0 is now available!
TensorFlow 2.0 just got out of beta with a focus on simplicity and ease of use, featuring updates like:
  • Easy model building with Keras and eager execution.
  • Robust model deployment in production on any platform.
  • Powerful experimentation for research.
  • API simplification by reducing duplication and removing deprecated endpoints.
medium.com (or check the full changelog)
GitHub introduces the CodeSearchNet challenge
GitHub also releases the data: a large dataset of functions with associated documentation written in Go, Java, JavaScript, PHP, Python, and Ruby from open source projects on GitHub. github.blog
Learning resources
Full Stack Deep Learning
Training the model is just one part of shipping a Deep Learning project. This course teaches the full stack of production Deep Learning. fullstackdeeplearning.com
Handling Imbalanced Datasets with SMOTE in Python
A step-by-step tutorial with Python code to deal with imbalanced datasets with SMOTE. kite.com
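For readers who want the core idea before opening the tutorial, here is a minimal sketch of SMOTE-style oversampling in plain Python. It is not the tutorial's code: the function name `smote_sketch` and its parameters are hypothetical, and it assumes numeric features and at least two minority samples. In practice a maintained implementation such as `imblearn.over_sampling.SMOTE` from imbalanced-learn is the safer choice.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric tuples (the under-represented class).
    k: how many nearest minority neighbours to choose from.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        neighbours = sorted(
            others,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority_points, n_new=5, k=2)
```

Because every synthetic point lies on a segment between two real minority samples, the new data stays inside the region the minority class already occupies instead of simply duplicating rows.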
Productionizing machine learning: from deployment to drift detection
This post discusses how to detect and address model drift.
As they say, “Change is the only constant in life”. This also holds true for machine learning models: over time their accuracy or predictive power can deteriorate, which is often referred to as model drift.
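One simple way to make drift concrete is to compare the distribution of a feature at training time with what the model sees in production. The sketch below uses the Population Stability Index (PSI), a technique we picked for illustration and not necessarily the one used in the post. The function name, the bin count, and the `eps` floor are all assumptions.

```python
import math
import random


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """Return the PSI between two samples of a single numeric feature."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 if it is constant.
    width = (hi - lo) / n_bins or 1.0

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp so values outside the reference range land in the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor each fraction at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


random.seed(0)
training = [random.gauss(0.0, 1.0) for _ in range(5000)]
production = [random.gauss(1.0, 1.0) for _ in range(5000)]  # simulated shift
drift_score = population_stability_index(training, production)
```

A score near zero means the two samples look alike; an often-quoted rule of thumb treats values above roughly 0.25 as a significant shift, though the right threshold depends on the feature and the use case.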
A gentle introduction to Kubernetes
A guided tutorial to learn how to deploy Kubernetes services and Ambassador API gateway. Github repo included. medium.com
Mathematics for Machine Learning
Course notes for CS 189 by Garrett Thomas, a student at the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley.
Machine learning uses tools from a variety of mathematical fields. This document is an attempt to provide a summary of the mathematical background needed for an introductory class in machine learning, which at UC Berkeley is known as CS 189/289A.