How Useful Will Google Now Be?

With Google announcing Google Now on Tap at Google I/O 2015 and Apple announcing Proactive at WWDC 2015, there has been a lot of discussion about how useful these predictive personal assistants will be. In particular, people are debating how much data these assistants will need to collect about you, and whether they need to send this data to the cloud to be analysed.

The problem I have with these arguments is that they do not go into specifics. Instead of saying “everything is going to be cool”, we should be having a detailed discussion of how each predictive recommendation is actually made, and whether it could easily be computed on your local device or needs to be done in the cloud.

Here, I would like to dig into a pretty good article comparing Apple’s approach and Google’s approach, and look at the examples given there.

Exhibit 1

For instance, if it were possible for Google Photos to figure out that I have a Tesla, and Tesla wanted to alert me to a recall, that would be a service that we would consider offering, with appropriate controls and disclosure to the user.

It’s hard to imagine that Tesla would not have your email address, or that it could not contact you through its dealer network. In fact, I imagine that in many cases recall information would preferably be sent through dealerships rather than directly to you, given the complex relationships between manufacturers and dealers. In this case, the benefit you gain in exchange for giving up your privacy is trivial.

Exhibit 2

If you’re texting a friend about dinner, Google will give you restaurant reviews and directions automatically. In the future, it might make a reservation and call a driverless car.

The first step here is for the AI to understand that you are texting about dinner. The algorithm could look at keywords (like “dinner” or “eat”), the time of day, and perhaps a few other signals. It should be pretty simple for the AI to work out that you are thinking about dinner. Next, it needs to show you some reviews, which can easily be fetched through an anonymous connection to Yelp’s services. Calling a car can also be done through an Uber app installed on your device, without telling any cloud service that you are going to have dinner with a particular person. What I’m saying is that in this example, there is no need to give any service more information than is absolutely necessary. Nobody except your device needs a comprehensive picture of who you are texting, when you are going to have dinner and where you are. Each cloud service only needs to know a small portion of this information.
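To make the argument concrete, here is a minimal Python sketch of the two steps, assuming a simple keyword-plus-time-of-day heuristic rather than whatever model Google actually uses. `OnDeviceContext`, `looks_like_dinner_plan`, `review_query`, `ride_request`, the keyword list and the 15:00 to 22:00 window are all hypothetical, and the request dictionaries are not the real Yelp or Uber APIs. The point is only that the full context stays on the phone and each service receives a narrow slice of it.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical keyword list; a real assistant would likely use a trained model.
DINNER_KEYWORDS = {"dinner", "eat", "restaurant", "supper"}


@dataclass
class OnDeviceContext:
    """Everything the phone knows locally; this object never leaves the device."""
    contact_name: str
    message_text: str
    sent_at: datetime
    latitude: float
    longitude: float


def looks_like_dinner_plan(ctx: OnDeviceContext) -> bool:
    """Keyword + time-of-day heuristic for 'this conversation is about dinner'."""
    words = {w.strip(".,!?").lower() for w in ctx.message_text.split()}
    has_keyword = bool(words & DINNER_KEYWORDS)
    # Assumed window: messages sent between 15:00 and 21:59 count as dinner-relevant.
    in_window = 15 <= ctx.sent_at.hour < 22
    return has_keyword and in_window


def review_query(ctx: OnDeviceContext) -> dict:
    """Request for a hypothetical review service: only a coarse location."""
    # Rounding to 2 decimal places (roughly 1 km) blurs the exact position.
    return {
        "lat": round(ctx.latitude, 2),
        "lon": round(ctx.longitude, 2),
        "category": "restaurants",
    }


def ride_request(ctx: OnDeviceContext, destination: str) -> dict:
    """Request for a hypothetical ride service: pickup and destination only."""
    # No contact name and no message text are included.
    return {"pickup": (ctx.latitude, ctx.longitude), "destination": destination}
```

For example, a message like “Want to grab dinner tonight?” sent at 18:30 is flagged as a dinner plan, and neither outgoing request contains the friend’s name or the message text.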

The only place in the article that details what Apple can and cannot do is this:

Apple is giving you recommendations based on the phone in your pocket; Google is giving you recommendations based on everything you’ve done that it has recorded.

The assumption is that your phone will not know what you did on your Mac, and that this will degrade the service Apple can provide. Well, first, there is Bluetooth and Wi-Fi. Apple could use Bluetooth or Wi-Fi to sync the personal information on your Mac with your iPhone, so it is easy for Apple to keep your devices in sync without ever storing information in the cloud. Apple could even sync your information to the cloud in an encrypted format that would be very difficult to decipher. Therefore, the fact that Apple respects privacy does not mean that your information cannot be shared between your devices. This can easily be done.
Second, there is the question of whether any information that stays only on your Mac is important at all. Your email, calendar and reminders are already synced between your Mac and your iPhone. Very little relevant information stays only on your Mac.
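
Device-to-device sync of this kind does not need a server to decide which copy of a record is current. Here is a minimal Python sketch, assuming each record carries a last-modified timestamp and using a simple last-writer-wins merge (one common technique, not necessarily what Apple does). `Record` and `merge` are hypothetical names, and the Bluetooth/Wi-Fi transport is left out entirely.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    value: str
    modified_at: float  # seconds since epoch on the device that last edited it


def merge(local: dict[str, Record], remote: dict[str, Record]) -> dict[str, Record]:
    """Last-writer-wins merge of two devices' records, keyed by record ID."""
    merged = dict(local)  # copy so the caller's dict is not mutated
    for key, rec in remote.items():
        # Take the remote copy if it is new or strictly newer; ties keep the local copy.
        if key not in merged or rec.modified_at > merged[key].modified_at:
            merged[key] = rec
    return merged
```

Last-writer-wins is the simplest choice; it relies on reasonably synchronized device clocks, and when two timestamps are exactly equal this version keeps the local copy.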

Although I certainly need to dig into this in more detail, I am skeptical that invading your privacy is essential for providing a better personal assistant. I would welcome any examples where the assistant absolutely must send everything it knows about you to servers in the cloud to be analysed.

  • Good topic, Naofumi. I definitely agree there’s been a lot of general discussion on this but little that is specific.

    I agree with your assessment of each example, and with your broader point that in general a “device-centric” approach should have access to enough individual data, e.g. iOS should theoretically be able to know whether an iPhone user views a photo, creates a calendar event, receives an email, goes to a location, downloads an app, etc.

    Potentially Apple would be blind to activities done within 3rd-party apps, e.g. videos watched within the YouTube app, photos seen in Facebook, messages sent on Line. It’s also possible that, if those apps mostly use standard iOS APIs, Apple could glean some information anyway. I think it’s hard for us on the outside to know that level of detail. Similarly, while iOS would know if a user has events in the native Calendar, or in 3rd-party apps that integrate with the native calendar (e.g. Fantastical), if someone uses Google Calendar exclusively then iOS may know much less, if anything.

    By the same token, even on Android Google likely has the same issue. They likely have good data on Android users’ calendars, Google Play downloads, email, maps usage etc. but probably don’t have great data of activity within 3rd party apps. I also believe that other than Nexus devices, Google receives more limited data from most manufacturers’ handsets, although as they push more functionality into GMS perhaps that changes a bit.

    That said, I think these are some of the real issues:

    1) When Apple says intelligence based on on-device data, does that mean they are averse to gleaning any cross-device data, or that if you get a new iPhone that learning is lost?

    I don’t think we know the answer. You seem to lean towards Apple being open to cross-device data as long as it’s handled securely, and that’s my feeling also. (Interestingly, I believe iCloud backups didn’t originally include Health data, but they do now, which may support this theory.)

    2) How is the ML trained?

    I believe this is the core issue that Ben Thompson, for one, is focused on, and I think he has a point. Telling one user, based on that user’s data, “there’s a storm in the city your flight is departing from, so we estimate a 40% chance your flight will be delayed” is the end point, but the ML first has to be developed and trained. Ben’s position, for example, is that although iCloud Photos will presumably hold trillions of photos, Apple may choose not to train its ML on them.

    My view is similar to the above: we don’t know if that’s the case, and there’s not enough indication that Apple would be so remiss as to ignore all that rich training data. They do mention that they anonymize your ID, throw away the placeholder ID after 3 months, etc., but that could well be plenty, or they may still keep enough of a representation for training purposes.
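
A rotating placeholder ID of the kind mentioned above can be sketched in a few lines of Python. This is a hypothetical illustration, not Apple’s implementation: `RotatingPseudonym` and the 90-day `ROTATION_PERIOD` (taken from the “3 months” figure) are assumptions.

```python
import secrets
from datetime import datetime, timedelta

# Assumed rotation period, based on the "3 months" figure mentioned above.
ROTATION_PERIOD = timedelta(days=90)


class RotatingPseudonym:
    """Random placeholder ID that is replaced once it is ROTATION_PERIOD old."""

    def __init__(self, now: datetime):
        self._rotate(now)

    def _rotate(self, now: datetime) -> None:
        # A fresh random token has no link to the account or to earlier tokens.
        self.current_id = secrets.token_hex(16)
        self.issued_at = now

    def id_for_request(self, now: datetime) -> str:
        """Return the ID to attach to a server request, rotating if it has expired."""
        if now - self.issued_at >= ROTATION_PERIOD:
            self._rotate(now)
        return self.current_id
```

Within one 90-day window every request carries the same ID, so a server can still link those requests together; that is exactly the kind of representation that could remain useful for training.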

    3) Even if you allow parity on #2, will Google simply have more data, or more types of data?

    Let’s look at a few key data types:

    Address Book, app installs, URLs opened, location, maps usage, calendar, in-app actions e.g. deep linking

    These seem likely to be roughly at parity. Probably more iOS users will have deep linking (given that iOS 9 will be on more handsets than Android L/M), while perhaps Google will get more calendar data because some iOS users may use Google’s own Calendar app rather than adding their Google account through iOS’s calendar settings. But overall these all seem very close.

    Email

    Potentially an area where Google would have more data. Certainly more people use Gmail than iCloud mail. And while you can set up iOS Mail to work with your Gmail account, Apple only has access through the narrow straw of IMAP, so perhaps it is limited. Even so, it can probably see the vast majority of your emails over time. So only users who install the Gmail app and don’t use Mail.app might be lost.

    Overall hard to say but probably advantage Google.

    Photos

    Clearly this is the one that set off the debate. In terms of raw volume of photos taken on mobile, Apple probably has a volume close to what Google Photos may have, as I’d guess that iPhone users take more photos on average. However, since Google Photos offers unlimited storage for free (with many caveats, the main point being that most users could store e.g. 50GB of photos for free), that is likely to lead to more photos actually being stored there than in iCloud. On the other hand, iOS users not using iCloud Photo Library likely still have Photo Stream backing up their photos to iCloud, so if Apple’s terms allow, it could do training with that data.

    Older/offline photos are probably a bigger difference, e.g. if I have a laptop with 10 years’ worth of old photos, I’m much less likely to pay Apple $100/yr. or whatever to store those in iCloud.

    So I think Google likely gets the advantage here for training data. That said, while Apple surprisingly did not adjust iCloud storage pricing at WWDC, it seems likely they will do so soon, and within 2-3 years I’d certainly expect storage to be virtually free and unlimited, even for iCloud Photo Library.

    Speech

    Last I heard, Google does much of its speech recognition on-device while Siri was still going up to the cloud, which is part of why Google’s speech rec is (or was) faster. So this is actually an area where theoretically Apple gets more data, though I expect Google also stores and sends speech to the cloud.

    Another aspect is that on Android, enabling ‘OK Google’ turns on continuous background listening. So that is one area where Google may get a lot more data, as I don’t expect Apple to run the microphone continuously anytime soon; that seems pretty invasive.

    Other data types?

    Overall it seems that while Google may have a bit more data in some cases, things are fairly even so it is mainly a question of how willing Apple is to use data they have access to and how smartly they can use it for training while preserving some clear privacy or anonymity.

    • I agree that having more data is generally better. However, I’m sure there is a law of diminishing returns in play. Also the quality of data is probably very important.

      This leads to a question: will you be more likely to feed quality personal information to your phone if you know that it will not be shared with anybody, including Apple? If you know that Google is going to use that info to show you ads (which anybody might accidentally see), will you be less likely to put, for example, your blood glucose readings into your phone?

      If people are comfortable putting their blood glucose readings into their iPhones but not into their Androids, which will end up being more useful as an assistant?

      I once had a laugh when a sales rep showed me his Facebook page, which had an ad that betrayed his precise age. It’s a good thing that it wasn’t an ad for medication.

  • As for Tim Cook’s speech, while again all the analysis has been pretty vague, I believe what he’s essentially alluding to is, e.g.:

    – do you trust Google Photos when it’s likely Google will see and be able to fully parse photos of your legal documents e.g. divorce/tax/etc. agreements?

    – since end-to-end encryption is a nonstarter for Google (else they couldn’t gather data from you), not to mention their cozy relationship with the White House including two top US cabinet posts occupied by Xooglers, do you trust that the NSA isn’t more likely to get your info from Google than from Apple?

    – while it’s true that Google’s server apps like Gmail seem fairly secure, arguably (though debatable) more so than Apple’s server apps, other endpoints like Android and Google+ clearly are much less secure, so do you trust that others won’t gain access to your data through those holes?

    – will they try some ‘experiments’ like showing you ads with generated faces that subtly resemble your baby or your mother to increase response rate?

    – given Google’s repeated offenses in this area (wifi collection, street view, Safari security circumvention, etc.) and grand ambitions (e.g. as of a couple of years ago they literally manufacture killer robots for the US government), can you really know what else they may do with your data?

    – a bit farther out, since it seems clear that the real goal Larry and Sergey have is building generalized AI by gathering as big a data set as possible, what will be the ramifications (economic, jobs-wise, social, political etc.) if/when they achieve that?

    Of course very few people know about these topics, and even very smart people who surely do, like @monkbent and @praxtime, don’t seem to consider them, so ultimately I think they’re both right that consumers won’t hear about, remember, or care enough about any of the above for it to matter, unless perhaps there is some specific major incident or breach.

    • Public awareness of privacy issues is something that I struggle with.

      I get the argument that few people know about it, and the people who do don’t care. On the other hand, our Japanese schools are very sensitive about showing any pictures of our students on any material that might find its way out. Email phishing is getting more sophisticated each day, targeting users based on leaked emails and associated information.

      It looks to me like a very unstable situation. I wouldn’t be surprised if the public opinion suddenly swayed to the extreme. In fact, I would even dare to say that public opinion is bound to change; we just don’t know when.

      Long-term, say 10-30 years out, I wouldn’t bet on privacy continuing to be a non-issue. Anything could happen, including a cyber-war between powerful nations, which would certainly change our perceptions.
