Google’s Justification

At Google I/O 2016, CEO Sundar Pichai showed a future filled with Google’s artificial intelligence (AI). It is all very interesting, but I do have some questions.

How much data does Google’s AI need?

Google’s AI is backed by enormous amounts of data about us. Data that is collected from photos that are publicly posted on the Internet, and from photos that we upload to the Google Cloud services from our mobile phones. Data from our messages on Gmail or events on Google Calendar. Data from the GPS receivers on our Android phones, which tell Google where we are every hour of the day. Data from our browsers, which tell Google (often without our knowing it) which websites we have been visiting. No other company has access to similar amounts of private information.

However, what has not been answered is how much data Google’s AI actually needs.

Can effective AI be created without too much data?

A recent article by Steve Kovach on Apple’s next-generation AI system makes an interesting claim:

Siri brings in 1 billion queries per week from users to help it get better. But VocalIQ was able to learn with just a few thousand queries and still beat Siri.

This suggests that it is possible to construct an advanced AI system with data sets that are orders of magnitude smaller; data sets that do not have to be aggregates of private user information, but can simply be collated from a relatively small number of people who were paid for the work.

Of course we need to see the results to be sure. At the same time, I find it interesting that IBM Watson was able to win Jeopardy without tapping into huge data sets like those that Google uses.

Does an intelligent assistant mean you have to give up your privacy?

Apple tries hard not to see your private data. Apple believes that your private data belongs to you only, and that you should be the only one who holds the keys. Many people have questioned this approach, based on the assumption that widespread access to private information from millions of people on the server level is the only way to create a sufficiently good AI system.

Apple’s approach does not preclude the storage and analysis of personal data, as long as it happens in a way that Apple itself cannot see. One way to do this is to handle analysis on the smartphone. This is what the NSDataDetector class in the Mac/iOS API does. It’s actually pretty neat, and Apple has a patent on it. Similar but more advanced approaches could easily be implemented in iOS, given the performance of today’s CPUs.

The question is, is this approach sufficient? Will analysing your private data on your device always be much less powerful than analysing it on the server? Furthermore, will there be a significant benefit in collating the private data from strangers to analyse your own? If so, then Google’s approach (which sacrifices your privacy) will remain significantly superior. If not, then Apple’s approach will suffice. That is, you will not necessarily have to give up your privacy to benefit from intelligent assistants.

Does Google need the data for other purposes?

Let us assume that a technology existed that allowed you to create an effective intelligent assistant without requiring you to give up your personal data. Would Google still collect your personal data?

The answer to this question is quite obviously YES. Google ultimately needs your private information for ad targeting purposes.

Could Google be using the big data/AI argument to justify the collection of huge amounts of private data for ad targeting purposes? I think, very possibly YES.

  • obarthelemy

    I think you’re vastly over-estimating how uninterested Apple is in its users’ data. Here’s a breakdown: http://motherboard.vice.com/blog/what-apple-does-and-doesnt-know-about-you . There have been other hints it does care a lot: http://www.cbsnews.com/news/apple-activates-the-reality-distortion-field-the-iphone-isnt-tracking-you-really-update/

    Their main line is “we collect all we can, but we anonymize some of it”. Well, how much is “some”? Apple has full access to iCloud and location and browsing and app usage and keyboard data. Also, “anonymous” data only remains “anonymous” as long as nobody links it to a user, which is typically trivial to do. They have the data, they even make specific efforts to collect it. Then they supposedly anonymize some of it and/or don’t use it. Color me dubious: why strive to collect it in the first place?

    Since this is at the mercy of PR embellishment, any tech glitch, internal/external enforcement lapse, and policy change, I would put Apple’s privacy in the exact same category as Google’s. They get whatever they can, they use it to improve their service to you… and their revenues from you. The difference seems to be they’re mildly trying to anonymize a bit of it. Policy subject to change and w/o warranty.

    • Thanks for the links. They are very informative, although they aren’t actually news to anybody who understands Apple’s attitude and current status towards privacy.

      As for underestimating Apple’s interest in users’ data, there is a fine distinction. The distinction is so subtle and nuanced that one can easily ignore it or skim over it. However, if you focus on Apple’s intention (or lack of it), then I find it much easier to understand.

      Yes, iPhones collect data and on the iPhones, they link it to the user. However, Apple does not (or at least, tries hard not to). In one of your articles, it mentions how the iPhone is tracking your data (which is hardly a surprise because Apple has a section in the “privacy” settings that shows you where you’ve been). However, the article fails to shed light on the fact that the data is only accessible from iTunes backups (on the Mac) which are unencrypted (a legacy setting). Most users nowadays would use an iCloud backup (which is encrypted). Even on iTunes, they would mostly turn on encryption (in fact, without encryption, iTunes backups decline to back up your health data).

      Therefore, I would say that it is true to say *iPhones* collect data but false to say that *Apple* collects (or intends to collect) data. Whatever data Apple collects will be anonymous and/or encrypted (at least I believe that is the plan). The all-important links connecting the dots will reside on your iPhone, but not at Apple.

      Technically, making sure that Apple does not see the data is harder than storing everything on the servers in a way that Apple can see. It tends to make the experience and seamlessness of their services worse. Because of this, before privacy was a thing at Apple, they were much more relaxed; hence the legacy non-encryption setting for iTunes backups. Apple is still in the process of making sure that they do not see identifiable customer data, and that is why there are still glitches that people can point to and accuse Apple of hypocrisy.

      I hope what I’m saying makes it clear that any holes in Apple’s services that still exist today regarding privacy, are not necessarily indicative of Apple’s attitude. Privacy and full encryption are still works in progress. Therefore, it is not helpful to find holes and accuse Apple of being essentially the same as Google.

      This is what I mean about it being difficult to understand the subtle nuances in Apple’s position. This is why I steer away from the technical discussion about each Apple service, and instead try to look at the intentions of Apple, Google and other industry players going forward. The current state of technology is probably not a good predictor here. I believe understanding intentions and business models will be much more useful.

      • obarthelemy

        The iCloud encryption is purely PR; Apple can and does decrypt it at will. Basically, it’s just a passcode Apple asks for when restoring the backup, and Apple knows that passcode. Then there’s really no difference with an unencrypted backup, only Apple’s assurances that they won’t look at the data, or only when asked, or only anonymously, or only when it’s good for us, and sometimes when it’s good for them but not often.

        That’s what bothers me: it’s mostly talk, and light on facts. I’ve never seen a serious comparison between Apple, Google, Facebook and MS of
        1- which data is technically collected (my guess: they all collect the same: location, browsing, usage, searches incl. voice, typing…)
        2- which data is officially being used by the first party for services (assistants…) and ads
        3- which data is sold/shared with others
        4- a recap of the frequency and gravity of leaks/breaches.

        The whole discussion is shaped by PR, with Google being rather transparent and arrogant about grabbing and using everything, but Apple being very fishy in presenting themselves as privacy advocates when in fact they seem to collect quite a bit of data and use PR-encryption that’s not actually encryption, hoping users won’t notice (and mostly, they’re right…). That deviousness makes taking their word for it a bit naive.

        • Full encryption with even Apple not knowing the keys is not simple. You have to balance this with a situation where the customer might have forgotten their password entirely. Maximising privacy will tend to make the service less convenient for users, and it takes technology and some innovation to make privacy less painful.

          Apple has not yet found the golden formula, but at least they are trying hard.

          http://mobile.eweek.com/security/apple-to-hand-icloud-encryption-key-management-to-account-holders.html

          I understand why you would think it naive to take their word for their privacy stance. However, given the current technical limitations, I think it is better and more predictive than the alternative of focusing on their current holes.

          • obarthelemy

            Full encryption was “solved” over 20 years ago, see PGP https://en.wikipedia.org/wiki/Pretty_Good_Privacy. There’s no technical reason for not having strong encryption everywhere, only business reasons. Including for Apple.

            And just as Apple loudly advertises its on-device encryption while whitewashing its backups’ fake encryption, focusing on user data encryption is myopic: what about data collection, what about non-user data (location history, searches, Siri queries…), what about metadata? How safe (technically and from internal/external politics) is “anonymized” data which a single database query or a bit of cross-referencing can de-anonymize? Until we see a careful and audited analysis by an expert, all of this must be considered (and has been proven to be!) PR.

          • Encryption technology per se is not the issue in this case. It is, who owns the keys? What happens if the keys are lost?

            Apple does hand the keys to the end user for keychain password encryption, with a warning that users will not get their data back if they lose the keys. They haven’t done that for the rest of iCloud, presumably because people DO lose their keys quite a lot.

          • One more point that I want to add. I don’t really care whether or not Apple’s attitude is PR or not. Apple has to make money too, so using privacy as a PR tool is totally fair game.

            The issue is whether or not Apple is really serious about privacy, willing to invest in the necessary technologies, and even sometimes open to worsening the user experience because of this.

            Whether they are doing this to protect human rights, or whether they are doing this for money, isn’t the point. The only issue for me is whether they are genuinely committed or not.

          • obarthelemy

            Agreed, what matters is not the “why”, which is the same for every company: profits. Management that doesn’t go for profits will/should be fired, and analysts/pundits that think it’s about anything but profits are at best naive, at worst idiots.

            What matters is the “what”. Privacy at Apple is clearly a PR talking point, so it should not be taken at face value, yet it routinely is. A lot of people are unaware that Apple’s stance on local phone data evaporates as soon as that data is backed up, only covers *some* of that local data, and is designed to hide a whole lot of unnecessary data collection, analysis, and sharing that does go on.

          • As I think I have mentioned before, the fact that Apple holds the encryption keys for iCloud backups is most likely due to people losing their passwords. For the most sensitive data like Keychain passwords, Apple does not hold the keys, but it makes you very, very aware that Apple can no longer help you if you lose them.

            In a broad sense, I consider Apple holding the encryption keys to be due to technical limitations. If Apple can find a way that they can let go of the encryption keys, still allow access to the user in case of password loss, and thwart the FBI, I believe they will do just that. I don’t think that it is a good idea to accuse Apple of hypocrisy, when it could easily be due to technical inability.

          • obarthelemy

            Well, we’re talking about the “holding it wrong” magic company that re-activated beacon tracking during an OS update and badmouthed phablets and pens until they made those, so hypocrisy is far from out of character.

            It could be due to technical inability indeed, but more likely, to UX and PR issues. 2FA and local backdoors are fairly standard, but they both detract from the overall message and complicate use. I’m not taking Apple’s PR word for it, and I have seen none other.

            And the focus on encryption is probably a red herring to distract from tracking and intrusions. That’s the bulk of the collected data, it’s not the same data as the encrypted one, and I haven’t seen a recap of what’s happening on the iOS side either. But gosh are those passwords strongly encrypted ^^