Probably one of the most important takeaways is that ML introduces a lot of uncertainty into Product Roadmaps. These uncertainties can relate to model performance, data source availability, tooling, processing time, modeling issues, and so on. This is highlighted in the following slide:
The point is that the more Product Management tries to enforce Software Development patterns (which have a more deterministic nature) onto Machine Learning (which has a more probabilistic nature), the more failed ML projects we will face.
Early warning: Daniel Tse, a researcher at Google, developed an algorithm that beat a number of trained radiologists in testing. Tse and colleagues trained a deep-learning algorithm to detect malignant lung nodules in more than 42,000 CT scans. The resulting algorithm turned up 11% fewer false positives and 5% fewer false negatives than its human counterparts. The work is described in a paper published in the journal Nature today.
That reminds me of a lot of hate, defensiveness, confirmation bias, and especially a lack of understanding of these technologies and their potential to help people worldwide. I will not cite most of it here, but you can check my Twitter @flavioclesio.
Some people from academic circles, especially from Statistics and Epidemiology, started bashing the automation of statistical methods (Machine Learning) in several different ways, using questionable methods to assess ML, even relying on one of the worst systematic reviews in history to create a false dichotomy between Stats and ML researchers.
Most of the time this kind of criticism, lacking a consistent argument around the central point, sounds more like pedantry, where these people tell us in a subliminal way: “- Hey, look at those nerds, they do not know what they are doing. Trust us, <<Classical Methods Professors>>; we have <<Number of Papers>> in that field, and those folks are only coders who don’t have all the training that we have.“
This situation is so common that in April I had to join a thread with Frank Harrell to argue that an awful/pointless systematic review should not be used to create that kind of pointless dichotomy:
My point is: Statistics, Machine Learning, Artificial Intelligence, Python, R, and so on are tools and should be treated as such.
I invite all my 5 readers to exercise the following paradigm shift: instead of thinking
“This AI in Health will take Doctors out of their jobs?”
let’s change the question to
“Hey, you’re telling me that using this very easy-to-implement free software with commodity CPU power, we can democratize health exams for the less favored people, together with the Doctors?“
About the relative certification provided by the conference Peer Review process:
So my first suggestion is this: change from a relative metric to a standalone evaluation. Conferences should accept or reject each paper by some fixed criteria, regardless of how many papers get submitted that year. If there end up being too many papers to physically fit in the venue, select a subset of accepted papers, at random, to invite. This mitigates one major source of randomness from the certification process: the quality of the other papers in any given submission pool.
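The proposal above is essentially a two-step protocol: accept by a fixed bar, then draw a random subset to fit the venue. A minimal sketch in Python (the `select_papers` function, threshold, and capacity values are all hypothetical, just to make the mechanics concrete):

```python
import random

def select_papers(papers, threshold=7.0, capacity=3, seed=42):
    """Accept every paper that meets a fixed score threshold,
    then randomly invite a subset that fits the venue capacity."""
    accepted = [name for name, score in papers if score >= threshold]
    rng = random.Random(seed)  # fixed seed only to make the sketch deterministic
    if len(accepted) <= capacity:
        invited = accepted
    else:
        invited = rng.sample(accepted, capacity)
    return accepted, invited

# Hypothetical submission pool: (paper, review score)
papers = [("A", 8.2), ("B", 6.1), ("C", 9.0), ("D", 7.5), ("E", 7.9)]
accepted, invited = select_papers(papers)
print(accepted)      # everyone above the fixed bar, regardless of pool size
print(len(invited))  # bounded by venue capacity
```

Note that acceptance depends only on each paper's own score, never on the rest of the pool, which is exactly the source of randomness the quoted passage wants to remove.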
And the most important piece is the creation of a rejection board to disincentivize low-quality submissions:
This means that if you submit to NeurIPS and they give you an F (rejection), it’s a matter of public record. The paper won’t be released, and you can resubmit that work elsewhere, but the failure will always live on. (Ideally we’ll develop community norms around academic integrity that mandate including a section on your CV to report your failures. But if not, we can at least make it easy for potential employers to find that information.) Why would this be beneficial? Well, it should be immediately obvious that this will directly disincentivize people from submitting half-done work. Each submission will have to be hyper-polished to the best it can possibly be before being submitted. It seems impossible that the number of papers polished to this level will be anywhere close to the number of submissions that we see at major conferences today. Those who choose to repeatedly submit poor-quality work anyways will have their CVs marred with a string of Fs, cancelling out any certification benefits they had hoped to achieve.
I personally bet €100 that if any conference adopts this mechanism, at least 98% of all of these flag-planting papers will vanish forever.
We define reproducibility as the ability to recompute data analytic results given an observed dataset and knowledge of the data analysis pipeline. The replicability of a study is the chance that an independent experiment targeting the same scientific question will produce a consistent result (1). Concerns among scientists about both have gained significant traction recently due in part to a statistical argument that suggested most published scientific results may be false positives (2). At the same time, there have been some very public failings of reproducibility across a range of disciplines from cancer genomics (3) to economics (4), and the data for many publications have not been made publicly available, raising doubts about the quality of data analyses. Popular press articles have raised questions about the reproducibility of all scientific research (5), and the US Congress has convened hearings focused on the transparency of scientific research (6). The result is that much of the scientific enterprise has been called into question, putting funding and hard won scientific truths at risk.
So far, so good. But the problem is with the following sentence:
Unfortunately, the mere reproducibility of computational results is insufficient to address the replication crisis because even a reproducible analysis can suffer from many problems—confounding from omitted variables, poor study design, missing data—that threaten the validity and useful interpretation of the results.
If we think that enforcing replication/reproduction standards in experiments will prevent or eliminate all methodological problems, that assumption is not only wrong but naive, for lack of a better word.
The point of replication/reproducibility is to hold science to a higher standard, where we can ensure that: 1) the whole process follows a methodology that explains how a solution was transformed into the final result; 2) with that, we have a better chance to remove biases (e.g. cognitive, publication, systematic, etc.); and 3) if the methodology is wrong, it can be verified, checked, and fixed by the entire scientific community.
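To make point 1 concrete, here is a minimal sketch of what “recompute the results given the data and the pipeline” can mean in code: pin every source of randomness, fingerprint the input data, and record both alongside the result. The `run_pipeline` function and its two steps are hypothetical, not any specific paper's analysis:

```python
import hashlib
import json
import random

def run_pipeline(data, seed=2019):
    """A fully recomputable toy analysis: fixed seed, recorded input
    fingerprint, and an explicit, ordered sequence of steps."""
    random.seed(seed)  # pin the only source of randomness
    fingerprint = hashlib.sha256(
        json.dumps(data, sort_keys=True).encode()
    ).hexdigest()
    sample = random.sample(data, 3)      # step 1: subsample
    result = sum(sample) / len(sample)   # step 2: summary statistic
    # Record everything needed to recompute the result later.
    return {"seed": seed, "data_sha256": fingerprint, "result": result}

data = [4, 8, 15, 16, 23, 42]
first = run_pipeline(data)
second = run_pipeline(data)
assert first == second  # same data + same pipeline -> same result
```

Notice what this does and does not buy you: anyone can rerun the pipeline and get byte-identical output, but nothing here detects confounding, poor study design, or missing data, which is exactly the limitation the quoted passage is making.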
We should use more inductive biases, but we have to work out what are the most suitable ways to integrate them into neural architectures such that they really lead to expected improvements. We have to enhance pattern-matching state-of-the-art models with some notion of human-like common sense that will enable them to capture the higher-order relationships among facts, entities, events or activities. But mining common sense is challenging, so we are in need of new, creative ways of extracting common sense. Finally, we should deal with unseen distributions and unseen tasks, otherwise “any expressive model with enough data will do the job.” Obviously, training such models is harder and results will not immediately be impressive. As researchers we have to be bold with developing such models, and as reviewers we should not penalize work that tries to do so. This discussion within the field of NLP reflects a larger trend within AI in general—reflection on the flaws and strengths of deep learning. Yuille and Liu wrote an opinion titled Deep Nets: What have they ever done for Vision? in the context of vision, and Gary Marcus has long championed using approaches beyond deep learning for AI in general. It is a healthy sign that AI researchers are very much clear eyed about the limitations of deep learning, and working to address them.
Today, in practical terms, NLP at best gives us a good way to compute word frequencies and perform some structural language analysis. For the rest, we are far away from human capacity.
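To show how little machinery the “word frequencies plus some structure” level actually requires, here is a sketch using only the Python standard library (the sample text and the crude regex tokenizer are my own illustration, not from any NLP library):

```python
import re
from collections import Counter

text = ("Machine learning automates statistical methods. "
        "Statistical methods benefit from machine learning tooling.")

# Crude tokenization: lowercase alphabetic runs only.
tokens = re.findall(r"[a-z]+", text.lower())

word_freq = Counter(tokens)                 # word frequencies
bigrams = Counter(zip(tokens, tokens[1:]))  # a minimal "structural" signal

print(word_freq.most_common(3))
print(bigrams.most_common(2))
```

Counting words and adjacent pairs like this already supports many practical applications (keyword extraction, crude collocations), which is part of the point: the workhorse techniques are simple, and the distance from here to human-level understanding is enormous.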
If a good exploratory technique gives you more data, then maybe good exploratory data analysis gives you more questions, or better questions. More refined, more focused, and with a sharper point. The benefit of developing a sharper question is that it has a greater potential to provide discriminating information. With a vague question, the best you can hope for is a vague answer that may not lead to any useful decisions. Exploratory data analysis (or maybe just data analysis) gives you the tools that let the data guide you towards a better question.
Several different methods can be found, and most of the papers include the implementation code, which helps reproduce the results and, most importantly, can be forked and customized for practical applications.
This document aims to track the progress in Natural Language Processing (NLP) and give an overview of the state-of-the-art (SOTA) across the most common NLP tasks and their corresponding datasets.
It aims to cover both traditional and core NLP tasks such as dependency parsing and part-of-speech tagging as well as more recent ones such as reading comprehension and natural language inference. The main objective is to provide the reader with a quick overview of benchmark datasets and the state-of-the-art for their task of interest, which serves as a stepping stone for further research. To this end, if there is a place where results for a task are already published and regularly maintained, such as a public leaderboard, the reader will be pointed there.