EPSRC College of Reviewers

Yesterday, I resigned from the EPSRC college of reviewers.

The EPSRC is the national funding body in the UK for Engineering and Physical Sciences. The college of reviewers is responsible for reading grant proposals and making recommendations to panels with regard to the quality, feasibility and utility of the underlying work.

The EPSRC aims to fund international quality science, but the college of reviewers is a national body of researchers. Allocation of proposals to reviewers is done within the EPSRC.

In 2012 I was asked to review only one proposal; in 2013 so far I have received none. The average number of review requests per college member in 2012 was 2.7.

It’s not that I haven’t been doing any proposal reviewing over the last 18 months: I’ve reviewed for the Dutch research councils, the EU, the Academy of Finland, the National Science Foundation (USA), BBSRC and MRC, and I’m contracted as part of a team to provide a major review for the Canadian Institute for Advanced Research. I’d estimate that I’ve reviewed around 20 international applications in the area of machine learning and computational biology across this period.

I resigned from the EPSRC College of Reviewers because I don’t wish people to read the list of names in the college and assume that, as a member of the college, I am active in assessing the quality of work the EPSRC is funding. Looking back over the last ten years all the proposals I have reviewed come from a very small body of researchers, all of whom, I know, nominate me as a reviewer.

Each submitted proposal nominates a number of reviewers who the proposers consider to be appropriate. The EPSRC chooses one of these nominated reviewers, and selects the remainder from the wider college.

Over a 12 year period as an academic, I have never been selected to review an EPSRC proposal unless I’ve been nominated by the proposers to do so.

So in many senses this resignation changes nothing, but by resigning from the college I’m highlighting the fact that if you do think I am appropriate for reviewing your proposal, then the only way it will happen is if you nominate me.

Machine Learning as Engine Design

Originally posted on 5th July 2012: http://uspace.shef.ac.uk/blogs/profneil/2012/07/05/machine-learning-as-engine-design

Just back from ICML 2012 this week. As usual it was good to see everyone, and as ever it was difficult to keep track of all the innovations across what is a very diverse field.

One talk that was submitted as a paper but presented across the conference has triggered this blog entry. The talk was popular amongst many attendees and seemed to reflect concerns some researchers in the field have. However, I felt it didn’t reflect my own perspective, and if it had done I wouldn’t have been at the conference in the first place. It was Kiri Wagstaff’s “Machine Learning that Matters”. Kiri made some good points and presented some evidence to back up her claims. Her main argument was that ICML doesn’t care enough about the applications. Kiri’s paper can be found here: http://arxiv.org/pdf/1206.4656.pdf. A comment from one audience member also seemed to indicate that he (the audience member) felt we (the ICML conference) don’t do enough to engage with application areas.

As a researcher who spends a large amount of time working directly in application areas, I must admit to feeling a little peeved by what Kiri said. Whilst she characterized a portion of machine learning papers correctly, I believe that these papers are in a minority. And I suspect an even larger proportion of such papers are submitted to the conference and then rightly rejected. The reason I attend machine learning conferences is that there are a large number of researchers who are active in trying to make a real difference in important application areas.

It was ironic that the speaker previous to Kiri was Yann LeCun, who presented tailored machine learning hardware for real time segmentation of video images. Rather than focussing on this aspect of Yann’s work, Kiri chose to mention the league table he maintains for the MNIST digits (something Yann does as a community service; I think he last published a paper on MNIST over 10 years ago). She presented the community’s use of the MNIST digits and UCI data sets as being indicative that we don’t care about ‘real applications’. Kiri found that 23% of ICML papers present results only on UCI and/or simulated data. However, given that ICML is a mainly methodological conference, I do not find this surprising at all. I did find it odd that Kiri focussed only on classification as an application. I attended no talks on ‘classical’ classification at the conference (i.e. discriminative classification of vectorial data without missing labels, missing inputs or any other distinguishing features). I see that very much as ‘yesterday’s problem’. An up to date critique might have focussed on deep learning, latent topic models, compressed sensing or Bayesian nonparametrics (and I’m sure we could make similar claims about those methods too).

However, even if the talk had focussed on more contemporary machine learning, I would still find Kiri’s criticisms misdirected. I’d like to use an analogy to explain my thinking. Machine learning is very much like the early days of engine design. From steam engines to jet engines, the aim of engines is to convert heat into kinetic energy. The aim of machine learning algorithms is to convert data into actionable knowledge. Engine designers are concerned with aspects like power to weight ratio. They test these features through prescribed tests (such as maximum power output). These tests can only be indicative. For example, high power output for an internal combustion engine (as measured on a ‘rig’) doesn’t give you the ‘drivability’ of that engine in a family car. That is much more difficult to gauge and involves user tests. The MNIST data is like a power test: it is indicative only (perhaps a necessary condition rather than a sufficient one); however, it is still informative.

My own research is very much inspired by applications. I spend a large portion of my time working in computational biology and have always been inspired by challenges in computer vision. In my analogy this is like a particular engine designer being inspired by the challenges of aircraft engine design. Kiri’s talk seemed to be proposing that designing engines in itself isn’t worthwhile unless we simultaneously build the airplane around our engine. I’d think of such a system as a demonstrator for the engine, and building demonstrators is a very worthwhile endeavour (many early computers, such as the Manchester Baby, were built as demonstrators of an important component such as memory systems). In my group we do try to do this: we make our methods available immediately on submission, often via MATLAB, and later in a (slightly!) more polished fashion through software such as Bioconductor. These are our demonstrators (of varying quality). However, I’d argue that in many cases the necessary characteristics of the engine being designed (power, efficiency, weight for engines; speed, accuracy, storage for ML) are so well understood that you don’t need the demonstrator. This is why I think Kiri’s criticisms, whilst well-meaning, were misdirected. They were equivalent to walking into an engine development laboratory and shouting at them for not producing finished cars. An engine development lab’s success is determined by the demand for their engines. Our success is determined by the demand for our methods, which is high and sustained. It is absolutely true that we could do more to explain our engines to our user community, but we are a relatively small field (in terms of numbers, 700 at our international conference) and the burden of understanding our engines will also, necessarily, fall upon our potential users.

I know that you can find poorly motivated and undervalidated models in machine learning, but I try to avoid those papers. I would have preferred a presentation that focussed on successful machine learning work that makes a serious difference in the real world. I hope that is a characteristic of my work, but I know it is a characteristic of many of my colleagues’.

Personal Thoughts on Computer Science Degrees

Originally posted on my uSpace blog on May 20th 2012: http://uspace.shef.ac.uk/blogs/profneil/2012/05/20/personal-thoughts-on-computer-science-degrees

Background

Computer science has evolved as a subject. The early days of computer science focused on languages and compilers. Ease of programming and reusability of code were key objectives. Ensuring the quality of the resulting software for reliability and safety concerns was a cornerstone of computer science research. The early days of the field were dominated by breakthroughs in these areas. The needs of modern Computer Science are very different. The very success of Computer Science has meant that computing is now pervasive; the consequence is vast realms of data. Automatically extracting knowledge from this data should now be the main goal of modern Computer Science.

Introduction

I cannot say when I became a computer scientist. My undergraduate degree was in Mechanical Engineering, and whilst my PhD in Machine Learning was in a Computer Science department, I was isolated there in terms of my research field and was more closely related in my research to colleagues in Engineering and Physics.

My first postdoctoral position was with Microsoft, but I programmed only in MATLAB, and my second postdoctoral position was as a Computer Science Lecturer in Sheffield, but I still felt somewhat out of place. At the time Sheffield was rare in that it had a speech processing group based in Computer Science, and there was also a large and successful language processing group which overlapped more with my research.

My initial focus on arriving in the department was refining my own knowledge of computer science. I taught Networks, and two classic books on Operating Systems (Tanenbaum) and Compilers (Aho, Sethi and Ullman) still sit on my shelf. I still thought of myself as an Engineer in the classical sense, only one who was interested in data. A Data Engineer, if you will. My contemporaries in machine learning research were mostly from Physics, Engineering or Mathematics backgrounds. There were more Psychologists than Computer Scientists.

Machine Learning Today

Today the situation has very much changed. I am a convinced computer scientist. So what has happened in the intervening 10 years? Did I dedicate myself so much to the teaching of Networks and the reading of Operating Systems and Compilers that I forsook my original research field? No, in fact it turned out that I didn’t have to conquer the mountain of computer science: the mountain chose to come to me. Today machine learning is at the core of Computer Science. The big four US institutions in Computer Science (MIT, Stanford, Berkeley and CMU) all have very large groups in machine learning. In all of these cases these groups have grown in size significantly since 1996 when I started my PhD. Whilst MIT was active then, that was mostly through their Brain Sciences unit. CMU already had a very large group, and has since moved machine learning to a separate department, but Berkeley and Stanford were yet to grow such large groups.

At the first NIPS (the premier machine learning conference) I attended there were no industry stands; in recent years we have had stands from Google, Yahoo! and Microsoft, as well as a range of financial institutions and even airline booking companies.

Machine learning is now at the core of computer science. Of my current cohort of PhD students 4 out of 5 have computer science undergraduate degrees. The fifth has an undergraduate statistics education. The quality of these students is excellent. They combine mathematical strengths with an excellent technical understanding of their machines and what they are physically capable of. They are trained in programming, but they use their programming like they use their ability to write English, as a means to an end: not as the end in itself.

Modern Computer Science

The research effort to standardize machines, simplify language, encourage code reuse and formalize software specification has to a large extent been successful. Whilst it is not the case that everyone can program (as was envisaged by the inventors of BASIC), today you do not need a degree in Computer Science to implement very complex systems. You can capitalize on the years of experience integrated in modern high level programming languages and their associated software libraries. There is a large demand for programmers who can combine PHP with MySQL to provide a complex retail interface. But there are many individuals who implement these systems without ever having attended a University. Indeed, the prevailing wisdom seems to be that such skills (implementation of a well specified system in standard programming languages) will be subject to a worldwide labour market, causing the UK IT workforce to be undercut on cost by countries with a large portion of highly educated people, where labour costs are lower (e.g. India). In the UK (and more widely in Europe and the US) our target should not be to produce graduates who can only implement software to known specifications. What, then, is the role of computer science in a developed country like the UK? What graduates should we be producing?

Historically we would have hoped to produce graduates who had a developed understanding of operating systems and compilers, graphics, and perhaps formal methods. We would have produced people who could have designed the next generation of computer languages. We would have produced people who could conceive and design protocols for the internet. That would have been our target. These goals are still at the computer science core. But today we need to be much more ambitious. The success of the preceding generations has now meant that computer science is pervasive, far beyond the technical domain where it has previously dominated. The internet and social networking mean that computers are affecting our everyday lives in ways that were only imagined even 15 years ago.

This presents a major technical challenge. In the past, some of the most advanced uses of computers were in other technical fields (engine management systems, control, etc.). Those fields had technical expertise which they were able to bring to bear. The software engineer provided a service role to the engineering experts. Today, there are very few technical experts in the vast realms of data that computers have facilitated. Even in technical domains such as Formula 1, the amount of data being produced means that it is technical expertise that is required in data analysis rather than engineering systems. To a large extent, we made this mess, and now it is time for us to clean it up.

A modern Computer Science degree must contain a very large component of analysis of unstructured data. What do I mean by unstructured data? Data that is not well curated: it was not collected with a particular question in mind; it is just there. Traditional statistics worked by designing an experiment: carefully deciding what to measure in order to answer a specific question. The need for modern data analysis is very different. We need to be able to assimilate unstructured data sources to translate them into a system that can be queried. It may not be clear what questions we’ll be able to answer with the resulting system, and we are only likely to have minimal control over what variables are measured.

Examples of data of this type include health records of patients and associated genomic information; connectivity data, such as links between friends in social networks or links between documents such as web pages; purchase information for customers of large supermarkets or web retailers; and preference information for consumers of films. These data sets will contain millions or billions of records that will be ‘uncurated’ in the sense that the size of the data sets means that no single individual will have been through all the data consistently removing outliers or dealing with missing or corrupted values. The data may also not be in a traditional vectorial form; it could be in the form of images, text or recorded speech. We need algorithms that deal with these challenges automatically.

The Next Generation of Graduates

To address this situation we need to train a generation of computer scientists to deal with these challenges. The fundamentals they will require are language processing: extracting information from unstructured documents. Speech processing: extracting information from informal meetings, conversations or direct speech interaction with a computer. Bioinformatics: extracting information from biological experiments or medical tests. Computer vision: extracting information from images or videos. Sitting at the core of each of these areas is machine learning: the art of processing and assimilating a range of unstructured data sources into a single model.

These areas must form the basis of a modern computer science course if we are to provide added value over what will be achievable by farming out software engineering. At the core of each of the areas outlined above is a deep mathematical understanding. Mathematics is more important to computer science than at any time previously. The algorithms used in all the areas outlined above are derived and implemented through mathematics. The modern computer science education needs to be based on solid principles: probability and logic. These areas are at the core of mathematics and it is the responsibility of computer science to drive forward research in these areas. A modern computer science graduate must be fluent in programming languages and systems. Not as an end in themselves, but as a means to an end: the construction of complex interacting systems for extracting knowledge from data. Teaching programming alone is like teaching someone how to write without giving them something to say.

It must be the target of a leading Computer Science undergraduate course to produce students who can address these challenges. All modern Computer Science courses should have a significant basis of data analysis beginning from the very first year. Computer science graduates should understand text and language processing: extracting meaning from documents. Rapid evolution of the language through internet media requires flexible algorithms for decoding meaning. The large volume of text on the internet presents major analysis challenges, but the wider challenge of understanding video (images and speech, gesture and emotion recognition) has hardly had its surface scratched.

A depth of understanding of probabilistic modelling, language modelling and signal processing must be built up over the second and third years of the degree. Our best graduates would have at least a four year education where they are given an opportunity in their final year to put the ideas they have learned into practice through thesis work on cutting edge research questions. Our graduates must be adaptable; they need to be able to build on the analysis skills we equip them with to address new challenges. If Computer Science doesn’t produce graduates in this mold there is no other field that will.

Conclusion

Many grand visions of Computer Science have largely been realized to, perhaps, a greater degree than was even anticipated: a computer on every desktop has become a computer in every pocket. Social interfaces through the internet that connect across the world and through generations. International commerce conducted with a click of the mouse. All of these successes have created an enormous challenge in processing of uncurated data. Computer Science research has developed the first generation of tools to address these challenges; it is time for Computer Science departments to produce the first generation of graduates who will wield these tools with confidence.

The computer has changed the world, and I believe now it is time for the world to change the way we study computers.