October 19, 2017
Human: What do we want?
Computer: Natural language processing!
Human: When do we want it?
Computer: When do we want what?
Artificial intelligence, especially machine learning, will overhaul big industries, including manufacturing, finance, and healthcare, potentially adding up to $126 billion to the US economy by 2025. A June 2017 McKinsey report emphasizes that digital-native firms such as Google and Baidu are betting vast amounts of money on AI – between $20 billion and $30 billion in 2016, including significant M&A activity. Baidu has partnered with the Chinese government to build the country’s first national AI research laboratory in Beijing. The company increased its R&D spending to $464 million from April to July this year, 28% more than in the same period a year ago, according to the Financial Times.
Unlike humans, AI benefits from aging. McKinsey concludes that advances in GPU speed have made the training of deep learning systems five to six times faster in each of the last two years.
More data – the world creates about 2.2 exabytes of it every day – translates into more insights and higher accuracy because it exposes algorithms to more examples they can use to identify correct and reject incorrect answers. Machine learning systems enabled by these torrents of data have reduced computer error rates in some applications – for example, in image identification – to about the same as the rate for humans.
AML does not stand for anti-money laundering; it stands for automated machine learning.
As Sebastian Raschka, a Ph.D. student at Michigan State University, describes it: if computer programming is about automation, and machine learning is about automating automation, then automated machine learning is the automation of automating automation.
Programming relieves us of rote tasks; machine learning allows computers to learn how best to perform these tasks; automated machine learning allows computers to learn how to optimize the outcome of that learning.
This is a very powerful idea: where we previously had to tune parameters and hyperparameters by hand, automated machine learning systems can learn how best to tune them for optimal outcomes, using any of several methods.
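To make the idea concrete, here is a minimal sketch of one such method, random search over hyperparameters. It is an illustration under stated assumptions, not the approach used by any system named in this article: the data is synthetic, the model is a one-variable linear fit, and every name (`make_data`, `train`, `random_search`) is hypothetical.

```python
import random

def make_data(n, rng):
    """Synthetic 1-D regression data (an assumption): y = 3x + 2 plus noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys

def train(xs, ys, learning_rate, epochs):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def mse(params, xs, ys):
    """Mean squared error of the fitted line on the given data."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def random_search(n_trials=20, seed=0):
    """Sample hyperparameters at random; keep the best validation score."""
    rng = random.Random(seed)
    train_x, train_y = make_data(80, rng)
    valid_x, valid_y = make_data(40, rng)
    best = None
    for _ in range(n_trials):
        config = {
            # Learning rate is sampled on a log scale, epochs uniformly.
            "learning_rate": 10 ** rng.uniform(-3, -0.5),
            "epochs": rng.randint(10, 200),
        }
        params = train(train_x, train_y, **config)
        score = mse(params, valid_x, valid_y)
        if best is None or score < best[0]:
            best = (score, config)
    return best

if __name__ == "__main__":
    score, config = random_search()
    print(f"best validation MSE: {score:.4f} with {config}")
```

The human only chooses the search space and the number of trials; the loop does the tuning. Production tools typically replace the random sampling with smarter strategies, but the structure (propose a configuration, train, score on held-out data, keep the best) stays the same.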
While not new, the idea has found application in a range of high-profile projects. In May 2017, Google unveiled a new approach to machine learning in which neural networks are used to build better neural networks. Google designed its AutoML project as an artificial intelligence that could help humans create other systems that are more powerful and efficient than those human engineers can build. Google research suggests that AutoML might be better than human experts at recognizing the best approaches to solving a problem.
Tom Simonite, MIT Technology Review San Francisco bureau chief, shares that in one experiment, researchers at the Google Brain artificial intelligence research group had software design a machine-learning system to take a test used to benchmark software that processes language. What it came up with surpassed previously published results from software designed by humans.
In 2012, the idea of getting machines to effectively teach machines resonated with Jeremy Achin, data scientist and the CEO of DataRobot. Achin saw an opportunity to automate machine learning while taking part in data-science competitions on the crowdsourcing platform Kaggle (since acquired by Google), which offered prizes for the algorithm that performed best at making a specific prediction from a large data set. Achin, one of the best early Kaggle contestants, realized he was already automating a lot of the steps involved in each competition.
“I thought that if we collected enough data sets, enough problems, and ran enough experiments, we could do machine learning on machine learning. That was the original idea,” Achin told Will Knight of MIT Technology Review.
Airbnb was an unexpected find in research on automated machine learning applications and projects.
“We have found AML tools to be most useful for regression and classification problems involving tabular datasets; however, the state of this area is quickly advancing. In summary, we believe that in certain cases AML can vastly increase a data scientist’s productivity, often by an order of magnitude,” says Hamel Husain, a data scientist at Airbnb.
Husain shares that the company has leveraged AML for the following tasks:
Unbiased presentation of challenger models: AML can quickly present a plethora of challenger models using the same training set as your incumbent model. This can aid the data scientist in choosing the best model family.
Detecting Target Leakage: Because AML builds candidate models extremely fast in an automated way, we can detect data leakage earlier in the modeling life cycle.
Diagnostics: As mentioned earlier, canonical diagnostics can be automatically generated such as learning curves, partial dependence plots, feature importance, etc.
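The target-leakage check in the list above can be sketched in a few lines. This is a hedged illustration, not Airbnb's tooling: it fits a quick one-feature model (a decision stump) per feature and flags any feature that predicts the label almost perfectly on held-out rows. The synthetic columns (`signal`, `noise`, `leaky`), the `LEAK_THRESHOLD` cutoff, and all function names are assumptions made for the example.

```python
import random

LEAK_THRESHOLD = 0.99  # assumed cutoff; a real project would pick its own

def make_rows(n, rng):
    """Synthetic binary-classification rows. 'leaky' copies the label,
    imitating a feature that would not exist at prediction time."""
    rows = []
    for _ in range(n):
        signal = rng.gauss(0.0, 1.0)
        label = 1 if signal + rng.gauss(0.0, 1.0) > 0 else 0
        rows.append({
            "signal": signal,                # genuinely predictive, but noisy
            "noise": rng.gauss(0.0, 1.0),    # unrelated to the label
            "leaky": float(label),           # the label in disguise
            "label": label,
        })
    return rows

def stump_predict(value, threshold, sign):
    """One-feature rule: predict 1 on one side of the threshold."""
    return 1 if sign * (value - threshold) >= 0 else 0

def accuracy(rows, feature, threshold, sign):
    """Share of rows where the stump's prediction matches the label."""
    hits = sum(
        stump_predict(r[feature], threshold, sign) == r["label"] for r in rows
    )
    return hits / len(rows)

def fit_stump(rows, feature):
    """Try every observed value as a threshold, in both directions."""
    candidates = [
        (t, s) for t in sorted({r[feature] for r in rows}) for s in (1, -1)
    ]
    return max(candidates, key=lambda c: accuracy(rows, feature, *c))

def leakage_report(seed=0):
    """Score each feature alone on held-out rows; flag suspicious ones."""
    rng = random.Random(seed)
    train, holdout = make_rows(200, rng), make_rows(100, rng)
    report = {}
    for feature in ("signal", "noise", "leaky"):
        threshold, sign = fit_stump(train, feature)
        report[feature] = accuracy(holdout, feature, threshold, sign)
    flagged = [f for f, acc in report.items() if acc >= LEAK_THRESHOLD]
    return report, flagged

if __name__ == "__main__":
    report, flagged = leakage_report()
    print(report)
    print("possible target leakage:", flagged)
```

Because each candidate model is cheap to build, a check like this can run at the very start of a project. A feature that is "too good to be true" on its own is a prompt for a human to ask where that column comes from, not proof of leakage.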
Tasks like exploratory data analysis, pre-processing of data, hyper-parameter tuning, model selection and putting models into production can be automated to some extent with an automated machine learning framework.
Whether there is one AI expert or a million, a talent shortage will inevitably hold back AI capabilities and adoption. AI proliferation without the constraint of the human factor would mean systems able to produce other, more efficient and powerful systems.
“The goal is to make this technology more accessible,” says John Giannandrea, SVP of Engineering at Google, who leads Google’s AI efforts. “So anybody could say ‘build me a predictive model’ and it goes off and does it.”
If self-starting AI techniques become practical, they could increase the pace at which machine-learning software is implemented across the economy. Companies must currently pay a premium for machine-learning experts, who are in short supply, emphasizes Simonite.