|
|
October 17 · Issue #28
Curated essays about the future of Data Science. Production Data Science and learning resources for continuous learning. Covers Data Science, Data Engineering, MLOps & DataOps. Curated by people at https://nibble.ai/
|
|
Hello everyone,
There is a recurring theme in this week's edition: the limits of deep neural networks. With so much hype around deep learning, it's good to take a step back and see both their strengths and weaknesses. If that's something you're interested in, all these resources are tagged with a ⚠️.
On a side note, I'm thrilled to receive such great support from all of you. You can help us make this even bigger: if you like what you read, forward this issue to a friend so that they can enjoy it as well.
Cheers! Florent
|
|
Why Deep Learning models are so easy to fool (Illustration by Edgar Bąk)
|
⚠️ Why Deep Learning models are so easy to fool
These problems are more concerning than idiosyncratic quirks in a not-quite-perfect technology, […] the most striking illustration that DNNs are fundamentally brittle: brilliant at what they do until, taken into unfamiliar territory, they break in unpredictable ways. Although deep neural networks are extremely good at finding correlations and patterns in massive amounts of data, we're discovering they're far from perfect and suffer from serious flaws.
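The canonical illustration of this brittleness is the adversarial example: a perturbation too small for a human to notice that flips a model's prediction. Below is a rough sketch of the fast gradient sign method in PyTorch; the toy model, random input and epsilon value are placeholders for illustration, not anything from the article.

import torch
import torch.nn as nn

# Toy classifier and a random "image" batch, purely for illustration.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28, requires_grad=True)
label = torch.tensor([3])

# Forward pass and loss with respect to the (arbitrary) true label.
loss = nn.functional.cross_entropy(model(x), label)
loss.backward()

# FGSM: nudge every pixel a tiny step in the direction that increases the loss.
epsilon = 0.05
x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

print("original prediction:", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())

On a properly trained image classifier, a perturbation of this size is typically invisible to a human yet often enough to change the predicted class.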
|
⚠️ Yoshua Bengio says deep learning needs to be fixed
I think it would be a good thing if there's a correction in the business world because that's where the hype is. While some people have been advocating for the importance of causal reasoning in building AI systems for quite a while (Judea Pearl and Gary Marcus among others), the fact that Yoshua Bengio, a Turing award winner for his contribution to the development of deep learning, is now advocating for similar ideas is a big step in moving beyond the hype.
tl;dr. Deep learning is good at finding patterns in reams of data, but can't explain how they're connected. Turing Award winner Yoshua Bengio wants to change that.
|
The State of Machine Learning Frameworks in 2019
In 2019, the war for ML frameworks has two remaining main contenders: PyTorch and TensorFlow.
tl;dr. PyTorch is ahead in research while TensorFlow is still dominant in industry. Meanwhile, new frameworks such as JAX are emerging; JAX natively supports forward-mode auto-differentiation, useful for quantizing and distilling models.
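For context on that last point, forward-mode auto-differentiation computes a directional derivative alongside the forward pass, and JAX exposes it directly through jax.jvp. A minimal sketch, with a made-up function and inputs:

import jax
import jax.numpy as jnp

def f(x):
    # A simple scalar function of a vector input.
    return jnp.sum(jnp.sin(x) ** 2)

x = jnp.array([0.1, 0.2, 0.3])
v = jnp.array([1.0, 0.0, 0.0])  # tangent (direction) vector

# jax.jvp evaluates f(x) and the directional derivative along v in one forward pass.
value, directional_derivative = jax.jvp(f, (x,), (v,))
print(value, directional_derivative)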
|
How to Manage Machine Learning Products
Compared to software engineering, ML is still in its infancy and therefore lacks industry standards, metrics, infrastructure, and tools. An overview of the challenges in managing ML Products from a Product Manager working for a startup that builds machine learning software for robotic vision and control.
tl;dr. Machine learning is a young field that keeps evolving. Deploying ML is a highly iterative, experimentation-driven process that faces structural challenges beyond purely technical ones, including a lack of buy-in from stakeholders due to explainability and interpretability issues.
|
Considerations for producing the best AI and Machine Learning models
The actual best way to decide whether or not a model is good is to have the business interest directly plugged into that decision. A short O'Reilly interview with Jed Dougherty (Dataiku) about the things that are often overlooked when building Machine Learning systems.
|
|
Best practices for data modeling
The data in your data warehouse are only valuable if they are actually used. To make your data usable, you need to consider how the data are presented to end users and how quickly users can answer their questions. A set of guidelines to help you build better data models that are maintainable, useful and performant: how to contextualize your model given your constraints, plus design considerations like grain, naming, materialization, permissioning and governance.
|
⚠️ Toward a Hybrid of Deep Learning and Symbolic AI
Just because you can build a better ladder doesn't mean you can build a ladder to the moon. A conversation with Gary Marcus on the Artificial Intelligence podcast discussing the limits of deep learning.
|
⚠️ What BERT is Not
Allyson Ettinger, a computational linguist, joins the Data Skeptic podcast to discuss her work in computational linguistics, specifically in exploring some of the ways in which the popular natural language processing approach BERT has limitations.
tl;dr. Although it displays impressive performance, BERT also suffers from surprising weaknesses, from extreme vulnerability to adversarial inputs to exploiting flaws in the benchmark or completely failing to show a generalizable understanding of the meaning of negation.
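If you want to poke at this yourself, a cloze-style probe is a quick way in. A rough sketch using the Hugging Face transformers fill-mask pipeline; the model choice and the example sentences are assumptions for illustration, not taken from the episode.

from transformers import pipeline

# Cloze-style probe: ask BERT to fill the blank with and without negation.
fill = pipeline("fill-mask", model="bert-base-uncased")

for sentence in ["A robin is a [MASK].", "A robin is not a [MASK]."]:
    predictions = fill(sentence)
    top = ", ".join(p["token_str"] for p in predictions[:3])
    print(f"{sentence} -> {top}")

Comparing the two completions is a quick, informal way to see whether the model's predictions actually change in the presence of negation.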
|
|
Facebook Debuts PyTorch 1.3 With PyTorch Mobile, Quantization, TPU Support and More
The release includes PyTorch Mobile, quantization, and TPU support, addressing some of PyTorch's biggest weaknesses until now.
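As a taste of the quantization support, here is a minimal sketch of post-training dynamic quantization with torch.quantization.quantize_dynamic; the toy model is made up for illustration.

import torch
import torch.nn as nn

# Toy model standing in for a real network, purely for illustration.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Post-training dynamic quantization: weights of nn.Linear layers are stored
# as int8 and dequantized on the fly during inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.rand(1, 128)
print(quantized(x).shape)  # same interface, smaller weights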
|
Cool New Features in Python 3.8
Python 3.8 is out, and the people at Real Python explored its biggest changes to show you how to take advantage of them:
- Assignment expressions with the walrus operator
- Positional-only arguments
- More precise types for type annotations
- Simpler debugging with f-strings
- More cool features and some details about Python's new governance
Assignment expressions were such a contentious proposal that they led Guido van Rossum to step down as Python's BDFL. What do you think of these new changes?
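To make the first few items concrete, here is a small made-up sketch combining the walrus operator, positional-only arguments and the f-string = specifier (Python 3.8+):

# Requires Python 3.8+

def clamp(value, lo, hi, /):
    # The "/" makes value, lo and hi positional-only arguments.
    return max(lo, min(value, hi))

readings = [3, 12, 7, 25, 1]

# Walrus operator: bind and test in a single expression.
if (n_high := sum(1 for r in readings if clamp(r, 0, 10) == 10)) > 1:
    # The "=" specifier prints both the name and the value, handy for debugging.
    print(f"{n_high=} readings were clamped to the upper bound")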
|
|
An Introduction to the Bootstrap Method
A well-documented introduction to the bootstrap resampling method, along with the motivation behind its introduction by Bradley Efron (1979). With Python code and math equations, and based on some good resources (links provided in the article).
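The core idea fits in a few lines: resample the observed data with replacement many times and use the spread of the recomputed statistic to estimate its uncertainty. A minimal NumPy sketch, with simulated data and an arbitrary 1,000 resamples:

import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=50)  # observed data (simulated here)

# Bootstrap: resample with replacement and recompute the statistic each time.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(1000)
])

# The spread of the bootstrap distribution estimates the statistic's uncertainty.
print("sample mean:", sample.mean())
print("bootstrap standard error:", boot_means.std(ddof=1))
print("95% CI:", np.percentile(boot_means, [2.5, 97.5]))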
|
Data Structures Easy to Advanced Course
A full, 8-hour-long video tutorial from a Google Engineer that teaches data structures to beginners using high-quality animations to represent the data structures visually. You will learn how to code various data structures together with simple, step-by-step instructions. Every data structure presented is accompanied by some working source code (in Java) to solidify your understanding.
|
Analyzing Your MLflow Data with DataFrames
Learn more about the two new APIs that give you direct access to your experiment data, making it possible to analyze your MLflow data via a pandas or Spark DataFrame.
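For a flavour of what this looks like, a small sketch built around mlflow.search_runs, which returns a pandas DataFrame of runs; the experiment ID, metric and parameter names below are placeholders that depend on what you actually logged.

import mlflow

# Pull every run of an experiment into a pandas DataFrame.
# "1", "metrics.rmse" and "params.alpha" are placeholder names for illustration.
runs = mlflow.search_runs(experiment_ids=["1"])

# Regular pandas from here on: sort runs by a logged metric.
best = runs.sort_values("metrics.rmse").head(5)
print(best[["run_id", "metrics.rmse", "params.alpha"]])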
|
|
Call for speakers in Paris 🇫🇷 We're looking for speakers for community events in Paris to share good practices about operationalizing data science. If you're working on improving the lifecycle of data science projects within your organization and want to share your experience, reply to this email so we can set something up. Have a great week! Florent
|