October 31 · Issue #30
Curated essays about the future of Data Science, production Data Science, and learning resources for continuous learning. Covers Data Science, Data Engineering, MLOps & DataOps. Curated by the people at https://nibble.ai/
Hi everyone! Here are the cherry-picks for this edition: an investigation into OpenAI following the recent controversy around their “solving the Rubik’s cube” announcement, the risks of AutoML and how to avoid them from the Harvard Business Review, a new release of Hyperopt with support for distributed tuning on Apache Spark, the launch of a Data Streaming Nanodegree by Udacity, summaries of some very interesting research papers, and more…
Would you spend $1 billion for a Rubik's cube?
I spent $1 billion and all I got was a Rubik's cube
Following the recent controversy around OpenAI’s “Solving the Rubik’s cube” announcement, Vicki started investigating to figure out for herself whether the cube was hype or not. Along the way she got sidetracked and took a deep look at OpenAI itself. “Anyone who believed that Elon Musk and Sam Altman were forming something for the good of mankind with a non-profit were fooling themselves.”
The Risks of AutoML and How to Avoid Them
As machine learning becomes more automated, human judgment still matters. In some respects, advancements in machine learning have given us a false sense that large volumes of data coupled with machine learning magic are all that’s needed — the variety and complementarity of the data matter, too.
Glue Work in Analytics
Glue work is valuable - without it, projects fall apart - tasks get dropped, teams miscommunicate, and it is just harder to get things done. However, there is a difference between valuable and valued. If glue work makes the team function more smoothly and effectively, but it is not valued, then that is a problem. […] in analytics we’re in the midst of a period of transition where the dominant tools, processes, and expectations are shifting. We’re adopting some of the best practices of engineering, but it’s a work in progress.
Military artificial intelligence can be easily and dangerously fooled
AI warfare is beginning to dominate military strategy in the US and China, but is the technology ready?
How notebooks are changing the way we develop code
An overview of the ecosystem and tools shaping the future of notebooks.
Your Data Analysts Need These 4 Qualities For Optimal Success
Many folks who find themselves in analytics as a profession have somewhat contrarian instincts. They’re not fans of following the herd, and they’re often convinced the herd is going to its doom.
Hyperopt 0.2.1 includes distributed tuning via Apache Spark
Hyperopt is a hyperparameter optimization library in Python. This new release includes a SparkTrials class that lets you scale out hyperparameter tuning across a Spark cluster. This obviously covers random search, but the Tree of Parzen Estimators (TPE) algorithm is also included, which is no easy feat considering TPE is a sequential model-based optimization method. Looking forward to giving it a try.
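To give a rough idea of what the API looks like, here is a minimal sketch of my own (not from the release notes), assuming pyspark is installed and a Spark cluster is reachable from the driver; the objective is a toy quadratic and the parallelism value is arbitrary:

from hyperopt import fmin, tpe, hp, SparkTrials

def objective(params):
    # toy objective: a quadratic with its minimum at x = 3
    x = params["x"]
    return (x - 3) ** 2

search_space = {"x": hp.uniform("x", -10, 10)}

# SparkTrials farms trial evaluation out to Spark executors;
# parallelism bounds how many trials run concurrently
trials = SparkTrials(parallelism=4)

best = fmin(
    fn=objective,
    space=search_space,
    algo=tpe.suggest,  # TPE; use hyperopt.rand.suggest for plain random search
    max_evals=64,
    trials=trials,
)
print(best)  # best value of x found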
Udacity launches Data Streaming Nanodegree
Udacity just launched a new Nanodegree covering the fundamentals of stream processing, including how to work with the Apache Kafka ecosystem, data schemas, Apache Avro, Kafka Connect and the REST Proxy, KSQL, and stream processing with Faust.
Stop explaining black-box machine learning models for high stakes decisions and use interpretable models instead
Another great summary by the Morning Paper, with two main takeaways. First, a sharpening of your understanding of the difference between explainability and interpretability, and why the former may be problematic: let us stop calling approximations to black box model predictions explanations. For a model that does not use race explicitly, an automated explanation “This model predicts you will be arrested because you are black” is not a model of what the model is actually doing, and would be confusing to a judge, lawyer or defendant. Second, some great pointers to techniques for creating truly interpretable models, since the belief that there is always a trade-off between accuracy and interpretability has led many researchers to forgo the attempt to produce an interpretable model.
Exponentially Growing Learning Rate for Deep Learning?
A nice overview of a rather technical paper. This short post explores the implications of a recent research paper about using an exponentially growing learning rate to train a neural network. You might be wondering how this does not blow up the weights of the net: it all comes down to the scale invariance induced by Batch Normalization.
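To make that scale invariance concrete, here is a small NumPy sketch of my own (not from the post): for a linear layer followed by batch normalization, multiplying the weights by a positive constant leaves the normalized activations essentially unchanged, which is one way to see why growing weight norms can be traded off against a growing learning rate.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 16))  # a batch of 128 inputs
W = rng.normal(size=(16, 8))    # weights of a linear layer

def batch_norm(Z, eps=1e-5):
    # normalize each feature over the batch (no learned scale/shift)
    return (Z - Z.mean(axis=0)) / np.sqrt(Z.var(axis=0) + eps)

out = batch_norm(X @ W)
out_scaled = batch_norm(X @ (10.0 * W))  # rescale the weights by a positive constant

# close to zero: the normalized output is invariant to the scale of W (up to eps)
print(np.max(np.abs(out - out_scaled)))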
Top three mistakes with K-Means Clustering during data analysis
This post explores a few cases where K-Means clustering does not perform well or may produce unintuitive results: a wrong guess for the number of clusters, a high-dimensional feature space, and clusters with strange or irregular shapes.
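If you want to see the irregular-shapes failure mode for yourself, here is a quick scikit-learn sketch of my own (not from the post): k-means assumes roughly spherical, similarly sized clusters, so it splits two interleaved half-moons along the wrong boundary.

from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# two interleaved, non-convex clusters
X, y_true = make_moons(n_samples=500, noise=0.05, random_state=0)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# well below 1.0, even though we gave k-means the right number of clusters
print(adjusted_rand_score(y_true, labels))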
If you liked this edition, it’s quite likely your colleagues would enjoy it too! You can simply forward this email or share it so they can enjoy it as well. On a side note, we’re organizing events to connect data professionals in Paris. If you’re reading this, we probably have a lot to talk about and we’d love for you to join us. The next networking event is scheduled for Wednesday, November 13th; it’s free but requires an invitation. If you’re interested or want to know more, shoot me an email at [email protected]. Cheers! Florent
If you don't want these updates anymore, please unsubscribe here.
If you were forwarded this newsletter and you like it, you can subscribe here.