Google’s Justification

At Google I/O 2016, CEO Sundar Pichai showed a future filled with Google's artificial intelligence (AI). It is all very interesting, but I do have some questions.

How much data does Google’s AI need?

Google's AI is backed by enormous amounts of data about us: data collected from photos that are publicly posted on the Internet, and from photos that we upload to the Google Cloud services from our mobile phones; data from our messages on Gmail and our events on Google Calendar; data from the GPS on our Android phones, which tells Google where we are every hour of the day; data from our browsers, which tell Google (often without our knowing it) which websites we have been visiting. No other company has access to similar amounts of private information.

However, what has not been answered is how much data Google’s AI actually needs.

Can effective AI be created without too much data?

A recent article by Steve Kovach on Apple's next-generation AI system is very interesting.

Siri brings in 1 billion queries per week from users to help it get better. But VocalIQ was able to learn with just a few thousand queries and still beat Siri.

This suggests that it is possible to construct an advanced AI system with data sets that are orders of magnitude smaller; data sets that do not have to be aggregates of private user information, but can simply be collated from a relatively small number of people who are paid for the work.

Of course, we need to see the results to be sure. At the same time, I find it interesting that IBM Watson was able to win Jeopardy! without tapping into huge data sets like those that Google uses.

Does an intelligent assistant mean you have to give up your privacy?

Apple tries hard not to see your private data. Apple believes that your private data belongs to you alone, and that you should be the only one who holds the keys. Many people have questioned this approach, based on the assumption that server-level access to the private information of millions of people is the only way to create a sufficiently good AI system.

Apple's approach does not preclude the storage and analysis of personal data, as long as it happens in a way that Apple itself cannot see. One way to do this is to handle the analysis on the smartphone itself. This is what the NSDataDetector class in the Mac/iOS API does. It's actually pretty neat, and Apple has a patent on it. Similar but more advanced approaches could easily be implemented in iOS, given the performance of today's CPUs.
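To make this concrete, here is a minimal sketch in Swift of the kind of on-device analysis NSDataDetector performs. The message string is an invented example, and this is just an illustration of the idea, not Apple's actual assistant pipeline: the text is scanned for dates and links locally, with no server round-trip.

```swift
import Foundation

// A made-up message, analysed entirely on the device.
let message = "Lunch in Shibuya on June 3 at 12:30? Details: https://example.com"

// Ask the detector to look for dates and links in the text.
let types: NSTextCheckingResult.CheckingType = [.date, .link]
let detector = try! NSDataDetector(types: types.rawValue)
let range = NSRange(location: 0, length: message.utf16.count)

for match in detector.matches(in: message, options: [], range: range) {
    if let date = match.date {
        print("date:", date)  // could become a calendar event suggestion
    }
    if let url = match.url {
        print("link:", url)   // could become a tappable link
    }
}
```

Nothing in this sketch ever leaves the phone, which is the point: the raw private data stays with the user, and only locally derived results (a suggested event, a highlighted link) are surfaced.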

The question is whether this approach is sufficient. Will analysing your private data on your device always be much less powerful than analysing it on the server? Furthermore, will there be a significant benefit in collating the private data of strangers to analyse your own? If so, then Google's approach (which sacrifices your privacy) will remain significantly superior. If not, then Apple's approach will suffice. That is, you will not necessarily have to give up your privacy to benefit from intelligent assistants.

Does Google need the data for other purposes?

Let us assume that a technology existed that allowed you to create an effective intelligent assistant without requiring you to give up your personal data. Would Google still collect your personal data?

The answer to this question is quite obviously YES. Google ultimately needs your private information for ad targeting purposes.

Could Google be using the big data/AI argument to justify the collection of huge amounts of private data for ad targeting purposes? I think, very possibly YES.

Clontech ProteoTuner System

Clontech has released a kit that directly controls the amount of a protein inside the cell at the protein level [Japanese] [English]. It is a very interesting product, exactly what you would expect from Clontech.

The usual way to control the level of a protein inside a cell is to regulate the expression level of its mRNA. The representative product here is, again from Clontech, the Tet System. However, the amount of a protein in a cell is ultimately determined not only by mRNA expression levels, but also by mRNA stability, the rate of protein synthesis, and the stability of the protein itself. As a result, even if you change the mRNA expression level, the protein level may hardly change at all; and even when it does change, it may take many hours to do so.

The ProteoTuner system works by adding a membrane-permeable factor from outside the cell, which changes the stability of the target protein and directly raises its level. Because this controls protein levels far more directly than regulating mRNA expression, it should enable all kinds of interesting experiments. And since the synthesized protein itself has low stability, it is degraded as soon as the factor is removed, so you can also spike protein levels for just a short time.

It looks especially useful for research where a fast response is important, such as studies of the cell cycle.