At Google I/O 2016, CEO Sundar Pichai showed a future built around the company's artificial intelligence (AI). It is all very interesting, but I do have some questions.
How much data does Google’s AI need?
Google’s AI is backed by enormous amounts of data about us. Data collected from photos that are publicly posted on the Internet, and from photos that we upload to Google Cloud services from our mobile phones. Data from our messages on Gmail and our events on Google Calendar. Data from the GPS on our Android phones, which tells Google where we are at every hour of the day. Data from our browsers, which tell Google (often without our knowledge) which websites we have been visiting. No other company has access to comparable amounts of private information.
However, what has not been answered is how much data Google’s AI actually needs.
Can effective AI be created without too much data?
A recent article by Steve Kovach on Apple’s next generation AI system is very interesting.
Siri brings in 1 billion queries per week from users to help it get better. But VocalIQ was able to learn with just a few thousand queries and still beat Siri.
This suggests that it is possible to construct an advanced AI system with orders-of-magnitude smaller data sets; data sets that do not have to be aggregates of private user information, but can simply be collated from a relatively small number of people who were paid for the work.
Of course, we need to see the results to be sure. At the same time, I find it interesting that IBM’s Watson was able to win Jeopardy! without tapping into huge data sets like those that Google uses.
Does an intelligent assistant mean you have to give up your privacy?
Apple tries hard not to see your private data. Apple believes that your private data belongs to you alone, and that you should be the only one who holds the keys. Many people have questioned this approach, on the assumption that widespread, server-side access to private information from millions of people is the only way to create a sufficiently good AI system.
Apple’s approach does not preclude the storage and analysis of personal data, as long as it happens in a way that Apple itself cannot see. One way to do this is to perform the analysis on the smartphone itself. This is what the NSDataDetector class in the Mac/iOS API does: it scans text on the device for structured data such as dates, addresses and links, without sending anything to a server. It’s actually pretty neat, and Apple has a patent on it. Similar but more advanced approaches could easily be implemented in iOS, given the performance of today’s CPUs.
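To make the idea concrete, here is a minimal sketch of this kind of on-device analysis using NSDataDetector from Foundation (the sample text is invented; nothing leaves the device):

```swift
import Foundation

// NSDataDetector scans text locally for structured data such as
// dates, links, addresses and phone numbers — no server involved.
let text = "Dinner with Alice on Friday at 7pm, details at https://example.com"

let types: NSTextCheckingResult.CheckingType = [.date, .link]
let detector = try NSDataDetector(types: types.rawValue)

let range = NSRange(text.startIndex..., in: text)
detector.enumerateMatches(in: text, options: [], range: range) { match, _, _ in
    switch match?.resultType {
    case .some(.date):
        print("Found a date:", match?.date ?? "")
    case .some(.link):
        print("Found a link:", match?.url ?? "")
    default:
        break
    }
}
```

An assistant built this way could, for example, turn the detected date into a calendar suggestion entirely on the handset, which is the privacy-preserving trade-off discussed above.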
The question is, is this approach sufficient? Will analysing your private data on your device always be much less powerful than analysing it on the server? Furthermore, will there be a significant benefit in collating the private data from strangers to analyse your own? If so, then Google’s approach (which sacrifices your privacy) will remain significantly superior. If not, then Apple’s approach will suffice. That is, you will not necessarily have to give up your privacy to benefit from intelligent assistants.
Does Google need the data for other purposes?
Let us assume that there existed a technology that allowed you to create an effective intelligent assistant, but that did not require that you give up your personal data. Would Google still collect your personal data?
The answer to this question is quite obviously YES. Google ultimately needs your private information for ad targeting purposes.
Could Google be using the big data/AI argument to justify the collection of huge amounts of private data for ad targeting purposes? I think, very possibly YES.