DALI Meeting + Views on Machine Learning and Artificial Intelligence

I just got back from the first DALI meeting, held in La Palma. I was a co-organiser with Zoubin Ghahramani, Thomas Hoffman and Bernhard Schoelkopf. The original vision was mainly driven by Bernhard, and the meeting is an attempt to recapture the spirit of some of the early NIPS conferences and the Snowbird meeting: a smaller meeting with some focus and a lot of informal debate, with a schedule designed to encourage discussion and to get people to engage across different fields and sub-fields.

The meeting was run as a day of workshops, followed by a day of plenary sessions and a further day of workshops. Zoubin organised the workshop schedule, and Thomas the plenary sessions. For the workshops we decided on topics and invited organisers, who themselves invited the attendees. We heard about Probabilistic Programming, Networks and Causality, Deep Learning for Vision, Probabilistic Numerics and Statistical Learning Theory. We had plenaries from experts in machine learning, as well as one by Metin Sitti on Mini/Micro/Nanorobotics. The plenary session ended with a panel discussion, chaired by Thomas, with Alex Graves, Ralf Herbrich, Yann LeCun, Bernhard Schoelkopf, Zoubin Ghahramani and myself.

Thomas seeded the panel discussion by asking us to make three-minute statements. He asked about several things, but the one that caught my attention was machine learning and artificial intelligence. Everyone had interesting things to say, and I don’t want to paraphrase them too much, but being asked to summarise in three minutes distilled some of my thinking, so I wanted to reflect that here.

I will only mention others’ views briefly, because I don’t want to misrepresent what they might have said, and that’s easy to do. But I’m happy for any of them to comment on the below. They also had many interesting things to say about the topics (probably much more so than me!).

I only had two ‘notes’ for the discussion which I spoke to off the cuff, so I’ll split the thoughts into those two sections. Those who know me know I can talk for a long time, and I was trying to limit this tendency!

Note 1: Perception and Self Perception

By this note I meant that perception is an area where we’ve been successful, but self-perception less so. I’ll try to clarify.

I’m probably using these terms too loosely, so let me define what I mean by ‘perception’. I mean the sensing of objects and our environment. The particular recent success of deep learning has been on sensing the environment, categorising objects, locating pedestrians. I’ve always felt the mathematical theory of how we should aim to do this was fairly clear: it’s summarised by Bayes’ rule, which is widely used in robotics, vision, speech, etc. The big recent change from the deep learning community has been the complexity of the mappings that we use to form this perception and our ability to learn from data. So I see this as a success.
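As a toy illustration of that Bayesian view of perception (the classes and numbers here are invented, not from any real system): the posterior over object categories is just the prior reweighted by the likelihood of the sensor observation.

```python
def posterior(prior, likelihood, obs):
    """Bayes' rule: p(class | obs) is proportional to p(obs | class) * p(class)."""
    unnorm = {c: prior[c] * likelihood[c][obs] for c in prior}
    z = sum(unnorm.values())  # normalising constant, p(obs)
    return {c: v / z for c, v in unnorm.items()}

# A detector deciding whether an image blob is a pedestrian.
prior = {"pedestrian": 0.1, "background": 0.9}
likelihood = {"pedestrian": {"blob": 0.8}, "background": {"blob": 0.2}}
post = posterior(prior, likelihood, "blob")  # pedestrian posterior is 0.08/0.26, about 0.31
```

The deep learning contribution is then in the complexity of the likelihood mapping, not in the rule itself.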

By self-perception I mean the sensing of our selves: our prediction of our own interactions with the environment, of how what we might do could affect the environment, and of how we will react to those effects. This has an interesting flavour of infinite regress. If we try to model ourselves and the environment, we need a model that is larger than ourselves and the environment. However, that model is part of us, so we need another model on top of that. This is the infinite regress, and it’s non-convergent. It strikes me that the only way we can get around it is to use a ‘compression’ of ourselves, i.e. have a model within our model in order to predict our interactions with the environment. This compressed model of ourselves will not be entirely accurate, and may mis-predict our own behaviour, but it is necessary to make the problem tractable.
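A crude way to see the role of compression here (a toy calculation, not a serious model of anything): if each level of self-model has to contain a full copy of agent plus environment, the total size grows without bound, but if each level is attenuated by a compression factor, the series converges.

```python
def self_model_size(env, agent, depth, compression=1.0):
    """Total size of a nested self-model in which each level contains a
    (possibly compressed) copy of the agent-plus-environment model."""
    return sum((env + agent) * compression ** k for k in range(depth))

exact = self_model_size(10, 5, depth=100)                    # 1500: grows linearly with depth
lossy = self_model_size(10, 5, depth=100, compression=0.5)   # approaches 30 as depth grows
```

With `compression=1.0` the regress never converges; with any factor below one, the geometric series gives a finite (if inaccurate) self-model.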

A further complication is that our environment also contains other complex intelligent entities that try to second-guess our behaviour. We also need to model them. I think one way we do this is by projecting our own model of ourselves onto them, i.e. using our own model of our own motivations, with appropriate modifications, to incorporate other people in our predictions. I see this as some form of ‘self-sensing’ and also sensing of others. I think doing it well may lead naturally to good planning algorithms, and planning was something that Yann mentioned we do badly. I don’t think we’re very good at this yet, and I think we would benefit from more open interaction with cognitive scientists and neuroscientists in understanding how humans do this. I know there’s a lot of research in this area in those fields, but I’m not an expert. Having a mathematical framework which shows how we can avoid this infinite regress through compression would be great.

These first thoughts were very much my thoughts about challenges for AI. The next thought tries to address AI in society.

Note 2: Creeping and Creepy AI

I think what we are seeing with successful AI is that it is emerging slowly, and without most people noticing. Much of our interaction with computers is dictated by machine learning algorithms. We were lucky to have Lars Backstrom at the meeting, who leads the team at Facebook that decides how to rank our news feed on the site. This is done by machine learning, but most people would be unaware that there is some ‘Artificial Intelligence’ underpinning it. Similarly, the ads we view across all sites are ranked by AI. Machine learning also recommends products on Amazon. Machine learning is becoming a core computational technique. I was sitting next to Ralf when Amazon launched their new machine learning services on AWS. Driverless cars are another good example: they are underpinned by a lot of machine learning ideas, and those technologies are also already appearing in normal cars. ‘Creeping AI’ is enhancing human abilities, improving us rather than replacing us, and allowing a seamless transition between what is human and what is computer. It demands better interaction between the human and computer, and better understanding between them.

However, this leads to another effect that could be seen as ‘creepy AI’. When the transition between computer and human is done well, it can be difficult to see when the human stops and the machine learning starts. Learning systems are already very capable of understanding our personalities and desires. They do this in very different ways to how humans do it (see self-perception above!). They use large amounts of data about our previous behaviour, and that of other humans, to make predictions about our future behaviour. This can be seen as creepy. How do we avoid this? We need to improve people’s understanding of when AI is being used and what it is doing, and improve their ability to control it. Improving our control of our data and developing legislation to protect us are things I think we need to do to address that.

We can avoid AI being creepy by remaining open to debate, understanding what users want, but also giving them what they need. In the long term they need a better understanding of our methodologies and their implications, as well as better control of how their data is being used. This is one of the motivations of our open data science agenda.

Questions from the Audience

There were several questions from the audience, but the two that stuck out most for me were from Uli von Luxburg and Chris Watkins. Uli asked if we had a responsibility to worry about the moral side when developing these methods. I believe she phrased her question as to how much we should be worrying about ‘creepy AI’. I didn’t get my answer in initially, and before I could there was a follow-up question from Chris about how we deal with the natural data monopoly. I’ve addressed these ideas before in the digital oligarchies post. Uli’s question is coming up more often, and a common answer to it is “this is something for society to decide”. I want to react strongly against that answer. Society is made up of people, who include experts. Those experts have a deeper understanding of the issues and implications than the general population. It’s true that there are philosophers and social scientists who can make important contributions to the debate, but it’s also true that amongst those with the best understanding of the implications of technology are those who are expert in it. If some of us don’t engage in the debate, then others will fill the vacuum. Uli’s question was probably more about whether an individual researcher should worry about these issues, rather than whether we should engage in debate. However, even if we don’t choose to contribute to the debate, I feel there is an obligation on us to be considering these issues in our research. In particular, the challenges we are creating by developing and sharing these technologies will require technological solutions as well as legislative change. These go hand in hand. Certainly those of us who are academics, and funded by the public, would not be doing our job well if we weren’t anticipating these needs and driving the technology towards answering them.

The good news is that meetings like DALI are excellent for having such debates and engaging with different communities. I think when Bernhard initially envisaged the meeting, this atmosphere was what he was hoping for. That is also what got Thomas, Zoubin and myself excited about it. I think the meeting really achieved that.

The Meeting as a Whole

I haven’t mentioned too many of the thoughts of others, because they were offered informally, and often as a means to developing debate, but if I’ve misrepresented anything above please feel free to comment below. I also apologise for omitting all the interesting ideas others spoke about, but again I didn’t want to endanger the open atmosphere of the meeting by mistakenly misrepresenting someone else’s point of view (which may also have been presented in the spirit of the devil’s advocate). I think the meeting was a great success and we were already talking about the venue for next year.


Comments

  1. Yunqing says:

    Looks like an interesting meeting! I have some thoughts about your first note though.

    First, a side note on compression: DL itself can be viewed as compression.
    Unsupervised training: minimize the information needed to perfectly reconstruct input
    Supervised training: minimize information needed to perfectly reconstruct labels
    Such “information” to be minimized consists of a) the model b) the error residuals.
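    (To make that two-part view concrete, a sketch under an assumed Gaussian noise model; the choice of coding scheme is free:)

```python
import math

def two_part_code_bits(model_bits, residuals, sigma=1.0):
    # Two-part code: bits to describe the model, plus bits to describe
    # the residuals under a Gaussian noise model (negative log-likelihood,
    # converted from nats to bits).
    resid_bits = sum(
        0.5 * math.log2(2 * math.pi * sigma ** 2)
        + r * r / (2 * sigma ** 2 * math.log(2))
        for r in residuals
    )
    return model_bits + resid_bits
```

    Minimizing this total trades model complexity against residual error.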

    ======
    As for self-perception, I think you’re messing up some concepts. There are actually 2 questions here:
    1. Can a machine perceive itself?
    2. Can a machine model itself?

    For the first question, simple recurrent nets can achieve that by definition.

    As for the second question. Think about this: can a Turing machine simulate a Turing machine? Of course. But how? Let’s use a bit of Virtual Machine vocabulary here, since it helps:
    Host: The Turing machine that runs in hardware. It simulates the behaviour of the Client
    Client: The Turing machine that runs in software. It is simulated by the Host

    The definition of the Host’s behaviour exists in the Host’s action table
    The definition of the Client’s behaviour exists in the Host’s tape

    So long as the “tape” is not an internal component of the Host, and the Host does not try to model the tape, there is no recursion here. And this would suffice for all practical purposes. When the Host wants to simulate its own behaviour, it:
    1. perceives its internal state
    2. dumps its internal state into the tape (also making some assumptions about external input)
    3. runs the simulated machine to figure out what comes out
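    In code, those three steps might look like this (a toy machine, purely illustrative):

```python
import copy

class Machine:
    """A trivial 'host' whose entire internal state is one number."""
    def __init__(self, state=0):
        self.state = state
    def step(self, inp):
        self.state += inp
        return self.state

host = Machine(3)
tape = copy.deepcopy(host)  # steps 1-2: perceive and dump internal state to the "tape"
predicted = tape.step(4)    # step 3: run the simulation with an assumed external input
# predicted is 7, while host.state is still 3: no recursion needed.
```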

    I think we humans actually have a “tape”: our working memory, our paper, whiteboard, and the Internet.

    Going so far as to think that we need a complete “model in model” would probably be less productive.

  2. Hi Yunqing,

    I should probably have emphasised better that I’m relating ideas from the perspective of a machine learning researcher. My thinking is from an information theory and probabilistic perspective. I’m afraid I’ve never been very good at seeing how thinking about things as Turing machines helps in the sort of modelling I do. The Turing machine is such a general model of computation that, without a program, it is non-predictive about action and environment and their inter-effects. Your description to me seems to relate to emulation of one machine by another. Of course it’s nice that this is possible, but it doesn’t give me clues as to how to go about the modelling (e.g. from a probabilistic perspective).

    Of course, I accept that my perspective is narrow, and there are also likely limitations with it.

    I meant particular things by ‘perception’ and ‘self-perception’ in the text, and tried to define them; perhaps I should have used other terms. So from my perspective a recurrent net doesn’t ‘self-perceive’, it is just an interesting model of time series data.

    Neil
