
Statistics and Machine Learning (EPSRC CDT)


About the course

The Statistics and Machine Learning (StatML) Centre for Doctoral Training (CDT) is a four-year DPhil research programme (or eight years if studying part-time). It will train the next generation of researchers in statistics and machine learning, who will develop widely applicable novel methodology and theory and create application-specific methods, leading to breakthroughs in real-world problems in government, medicine, industry and science.

This is the Oxford component of StatML, a CDT in Statistics and Machine Learning, co-hosted by Imperial College London and the University of Oxford. The programme will provide you with training in both cutting-edge research methodologies and the development of business and transferable skills – essential elements required by employers in industry and business.

You will undertake a significant, challenging and original research project, leading to the award of a DPhil. Given the breadth and depth of the research teams at Imperial College and at the University of Oxford, the proposed projects will range from theoretical to computational and applied aspects of statistics and machine learning, with a large number of projects involving strong methodological/theoretical developments together with a challenging real problem. A significant number of projects will be co-supervised with industry.

You will pursue two mini-projects during your first year (specific timings may vary for part-time students), with the expectation that one of them will lead to your main research project. These mini-projects are proposed by the department's supervisory pool and industrial partners, and you will choose your first mini-project at the admissions stage. You will be based at the home institution of the main supervisor of your first mini-project.

If your studentship is funded or co-funded by an external partner, the second mini-project will be with the same external partner but will explore a different question.

Alongside your research projects you will engage with taught courses, each lasting two weeks. Core topics are taught at the beginning of your first year (specific timings may vary for part-time students) and are:

  • Modern Statistical Theory;
  • Statistical Machine Learning;
  • Causality; and
  • Bayesian methods and computation.

You will then begin your main DPhil project at the beginning of the third term (at the beginning of the fourth term for part-time students), which can be based on one of the two mini-projects. Where appropriate for the research, your project will be run jointly with the CDT's leading industrial partners, and you will have the chance to undertake a placement in data-intensive statistics with some of the strongest statistics groups in the USA, Europe and Asia.

If you are studying full-time, starting in the second year, you will teach approximately twelve contact hours per year in undergraduate and graduate courses in your host department. If you are studying part-time, teaching will begin in the third year and you will teach approximately six hours per year. This is mentored teaching: you will begin with simple marking and progress to leading whole classes of ten or twelve undergraduate students. Students will have the support of a mentor and receive written feedback at the end of each block of teaching.

You will also be required to take a number of optional courses throughout the four years of the course, which could be made up of choices from the following list: Bayesian nonparametrics; high-dimensional statistics; advanced optimisation; networks; reinforcement learning; large language models; conformal inference; variational Bayes and advanced Bayesian computation; dynamical and graphical modelling of multivariate time series; modelling events; and deep learning. Optional modules last two weeks and are delivered in a similar format to the core modules.

Many events bring StatML students and staff together across different peer groups and research groups, ranging from full cohort days and group research skills sessions to summer schools. These events support research and involve staff and students from both Oxford and Imperial coming together at both locations.

The Department of Statistics runs a seminar series in statistics and probability, and a graduate lecture series involving snapshots of the research interests of the department. Several journal-clubs run each term, reading and discussing new research papers as they emerge. These events bring research students together with academic and other research staff in the department to hear about on-going research, and provide an opportunity for networking and socialising.

Further information about part-time study

As a part-time student you will be required to attend modules and other cohort activities in Oxford (or sometimes London) for a minimum of 30 days each year. There will be no flexibility in the dates of modules or cohort events, though it is possible to spread your attendance at modules over the course of the four-year programme (with the agreement of your supervisor and the programme Directors). Attendance will be required during term-time (on a pro-rata basis) for cohort activities, which often take place on Mondays and Thursdays; attendance will occasionally be required outside of term-time.

You will have the opportunity to tailor your part-time study and skills training in liaison with your supervisor and programme Directors, and agree your pattern of attendance.

Supervision

The allocation of graduate supervision for this course is the responsibility of the Department of Statistics (Oxford) and/or the Department of Mathematics (Imperial). It is not always possible to accommodate the preferences of incoming graduate students to work with a particular member of staff. A supervisor may be found outside these departments.

You are matched to your supervisor for the first mini-project at the start of the course. Within the first year of the course, you will have the opportunity to work with an alternative supervisor for a second mini-project. It is normal for one of these mini-projects to lead to the full DPhil project, with the same supervisory team as was in place for the chosen mini-project.

Typically, as a research student, you should expect to have meetings with your supervisor or a member of the supervisory team with a frequency of at least once every two weeks averaged across the year. The regularity of these meetings may be subject to variations according to the time of the year, and the stage that you are at in your research programme.

Each mini-project will be assessed by researchers from Imperial and Oxford on the basis of a report written by the student.

Modules are assessed by a presentation in small groups on some material studied during the two-week module (known as micro-projects within the programme).

All students will be initially admitted to the status of Probationer Research Student (PRS). Within a maximum of six terms as a full-time PRS student or twelve terms as a part-time PRS student, you will be expected to apply for transfer of status from Probationer Research Student to DPhil status. This application is normally made by the fourth term for full-time students and by the eighth term for part-time students.

A successful transfer of status from PRS to DPhil status will require the submission of a thesis outline. Students who are successful at transfer will also be expected to apply for and gain confirmation of DPhil status, to show that their work continues to be on track. This will need to be done within nine terms of admission for full-time students and eighteen terms of admission for part-time students.

Both milestones normally involve an interview with two assessors (other than your supervisor) and therefore provide important experience for the final oral examination.

Full-time students will be expected to submit a thesis four years from the date of admission. If you are studying part-time, you will be required to submit your thesis after six or, at most, eight years from the date of admission. To be successfully awarded a DPhil in Statistics you will need to defend your thesis orally (viva voce) in front of two appointed examiners.

The final thesis is normally submitted for examination during the fourth year (or eighth year if studying part-time) and is followed by the viva examination. The final award for Oxford based students will be a DPhil awarded by the University of Oxford.

Graduate destinations

This is a new course and there are no alumni yet. StatML is dedicated to providing the organisation, environment and personnel needed to develop the future industrial and academic researchers doing world-leading research in statistics for modern-day science, engineering and commerce, all exemplified by ‘big data’.

Changes to this course and your supervision

The University will seek to deliver this course in accordance with the description set out in this course page. However, there may be situations in which it is desirable or necessary for the University to make changes in course provision, either before or after registration. The safety of students, staff and visitors is paramount and major changes to delivery or services may have to be made in circumstances of a pandemic, epidemic or local health emergency. In addition, in certain circumstances, for example due to visa difficulties or because the health needs of students cannot be met, it may be necessary to make adjustments to course requirements for international study.

Where possible your academic supervisor will not change for the duration of your course. However, it may be necessary to assign a new academic supervisor during the course of study or before registration for reasons which might include illness, sabbatical leave, parental leave or change in employment.

For further information please see our page on changes to courses and the provisions of the student contract regarding changes to courses.

Entry requirements for entry in 2024-25

Proven and potential academic excellence.

The requirements described below are specific to this course and apply only in the year of entry that is shown. You can use our interactive tool to help you evaluate whether your application is likely to be competitive.

Please be aware that any studentships that are linked to this course may have different or additional requirements and you should read any studentship information carefully before applying. 

Degree-level qualifications

As a minimum, applicants should hold or be predicted to achieve the following UK qualifications or their equivalent:

  • a first-class or strong upper second-class undergraduate degree with honours in mathematics, statistics, physics, computer science, engineering or a closely related subject. 

However, entrance is very competitive and most successful applicants have a first-class degree or the equivalent.

For applicants with a degree from the USA, the minimum GPA sought is 3.6 out of 4.0.

If your degree is not from the UK or another country specified above, visit our International Qualifications page for guidance on the qualifications and grades that would usually be considered to meet the University’s minimum entry requirements.

GRE General Test scores

No Graduate Record Examination (GRE) or GMAT scores are sought.

Other qualifications, evidence of excellence and relevant experience 

Publications are not expected but details of any publications may be included with the application.

English language proficiency

This course requires proficiency in English at the University's standard level. If your first language is not English, you may need to provide evidence that you meet this requirement.

Your test must have been taken no more than two years before the start date of your course. Our Application Guide provides further information about the English language test requirement .

Declaring extenuating circumstances

If your ability to meet the entry requirements has been affected by the COVID-19 pandemic (eg you were awarded an unclassified/ungraded degree) or any other exceptional personal circumstance (eg other illness or bereavement), please refer to the guidance on extenuating circumstances in the Application Guide for information about how to declare this so that your application can be considered appropriately.

You will need to register three referees who can give an informed view of your academic ability and suitability for the course. The How to apply section of this page provides details of the types of reference that are required in support of your application for this course and how these will be assessed.

Supporting documents

You will be required to supply supporting documents with your application. The  How to apply  section of this page provides details of the supporting documents that are required as part of your application for this course and how these will be assessed.

Performance at interview

Interviews are held as part of the admissions process for applicants who, on the basis of their written application, best meet the selection criteria.

Interviews may be held in person or over video link such as Zoom, normally with at least two interviewers. Interviews will include some technical questions on statistical topics relating to the StatML CDT. These questions will be adapted as far as possible to the applicant's own background training in statistics or machine learning.

How your application is assessed

Your application will be assessed purely on your proven and potential academic excellence and other entry requirements described under that heading.

References  and  supporting documents  submitted as part of your application, and your performance at interview (if interviews are held) will be considered as part of the assessment process. Whether or not you have secured funding will not be taken into consideration when your application is assessed.

An overview of the shortlisting and selection process is provided below. Our 'After you apply' pages provide more information about how applications are assessed.

Shortlisting and selection

Students are considered for shortlisting and selected for admission without regard to age, disability, gender reassignment, marital or civil partnership status, pregnancy and maternity, race (including colour, nationality and ethnic or national origins), religion or belief (including lack of belief), sex, sexual orientation, as well as other relevant circumstances including parental or caring responsibilities or social background. However, please note the following:

  • socio-economic information may be taken into account in the selection of applicants and award of scholarships for courses that are part of  the University’s pilot selection procedure  and for  scholarships aimed at under-represented groups ;
  • country of ordinary residence may be taken into account in the awarding of certain scholarships; and
  • protected characteristics may be taken into account during shortlisting for interview or the award of scholarships where the University has approved a positive action case under the Equality Act 2010.

Processing your data for shortlisting and selection

Information about  processing special category data for the purposes of positive action  and  using your data to assess your eligibility for funding , can be found in our Postgraduate Applicant Privacy Policy.

Admissions panels and assessors

All recommendations to admit a student involve the judgement of at least two members of the academic staff with relevant experience and expertise, and must also be approved by the Director of Graduate Studies or Admissions Committee (or equivalent within the department).

Admissions panels or committees will always include at least one member of academic staff who has undertaken appropriate training.

Other factors governing whether places can be offered

The following factors will also govern whether candidates can be offered places:

  • the ability of the University to provide the appropriate supervision for your studies, as outlined under the 'Supervision' heading in the  About  section of this page;
  • the ability of the University to provide appropriate support for your studies (eg through the provision of facilities, resources, teaching and/or research opportunities); and
  • minimum and maximum limits to the numbers of students who may be admitted to the University's taught and research programmes.

Offer conditions for successful applications

If you receive an offer of a place at Oxford, your offer will outline any conditions that you need to satisfy and any actions you need to take, together with any associated deadlines. These may include academic conditions, such as achieving a specific final grade in your current degree course. These conditions will usually depend on your individual academic circumstances and may vary between applicants. Our 'After you apply' pages provide more information about offers and conditions.

In addition to any academic conditions which are set, you will also be required to meet the following requirements:

Financial Declaration

If you are offered a place, you will be required to complete a  Financial Declaration  in order to meet your financial condition of admission.

Disclosure of criminal convictions

In accordance with the University’s obligations towards students and staff, we will ask you to declare any  relevant, unspent criminal convictions  before you can take up a place at Oxford.

In January 2016 the Department of Statistics moved to occupy a newly-refurbished building in St Giles, near the centre of Oxford. The building has spaces for study and collaborative learning, including the library and large interaction and social area on the ground floor, as well as an open research zone on the second floor.

You will be provided with a computer and desk space in a shared office. You will have access to the Department of Statistics computing facilities and support, the department’s library, the Radcliffe Science Library and other University libraries, centrally-provided electronic resources and other facilities appropriate to your research topic. The provision of other resources specific to your DPhil project should be agreed with your supervisor as a part of the planning stages of the agreed project.

Tea and coffee facilities are provided in the Department. There are also opportunities for sporting interaction such as football and cricket.

The University's Department of Statistics is a world leader in research in probability, bioinformatics, mathematical genetics and statistical methodology, including computational statistics, machine learning and data science. 

You will be actively involved in a vibrant academic community by means of seminars, lectures, journal clubs, and social events. Research students are offered training in modern probability, stochastic processes, statistical methodology, computational methods and transferable skills, in addition to specialised topics relevant to specific application areas.

Much of the research in the Department of Statistics is either explicitly interdisciplinary or draws motivation from application areas, ranging from genetics, immunoinformatics, bioinformatics and cheminformatics, to finance and the social sciences.

The department is located on St Giles, in a building providing excellent teaching facilities and creating a highly visible centre for statistics in Oxford. Oxford’s Mathematical Sciences submission came first in the UK on all criteria in the 2021 Research Excellence Framework (REF).


We expect that the majority of applicants who are offered a place on this course will also be offered a fully-funded scholarship specific to this course, covering course fees for the duration of their course and a living stipend.

For further details about searching for funding as a graduate student visit our dedicated Funding pages, which contain information about how to apply for Oxford scholarships requiring an additional application, details of external funding, loan schemes and other funding sources.

Please ensure that you visit individual college websites for details of any college-specific funding opportunities. Please note that not all colleges accept students on this course; for details of those which do, please refer to the College preference section of this page.

Annual fees for entry in 2024-25

Further details about fee status eligibility can be found on the fee status webpage.

Course fees are payable each year, for the duration of your fee liability (your fee liability is the length of time for which you are required to pay course fees). For courses lasting longer than one year, please be aware that fees will usually increase annually. For details, please see our guidance on changes to fees and charges .

Course fees cover your teaching as well as other academic services and facilities provided to support your studies. Unless specified in the additional information section below, course fees do not cover your accommodation, residential costs or other living costs. They also don’t cover any additional costs and charges that are outlined in the additional information below.

Continuation charges

Following the period of fee liability , you may also be required to pay a University continuation charge and a college continuation charge. The University and college continuation charges are shown on the Continuation charges page.

Where can I find further information about fees?

The Fees and Funding section of this website provides further information about course fees, including information about fee status and eligibility and your length of fee liability.

Additional information

There are no compulsory elements of this course that entail additional costs beyond fees (or, after fee liability ends, continuation charges) and living costs. However, please note that, depending on your choice of research topic and the research required to complete it, you may incur additional expenses, such as travel expenses, research expenses, and field trips. You will need to meet these additional costs, although you may be able to apply for small grants from your department and/or college to help you cover some of these expenses.

Please note that you are required to attend in Oxford for a minimum of 30 days each year, and you may incur additional travel and accommodation expenses for this.

Living costs

In addition to your course fees, you will need to ensure that you have adequate funds to support your living costs for the duration of your course.

For the 2024-25 academic year, the range of likely living costs for full-time study is between c. £1,345 and £1,955 for each month spent in Oxford. Full information, including a breakdown of likely living costs in Oxford for items such as food, accommodation and study costs, is available on our living costs page. The current economic climate and high national rate of inflation make it very hard to estimate potential changes to the cost of living over the next few years. When planning your finances for any future years of study in Oxford beyond 2024-25, it is suggested that you allow for potential increases in living expenses of around 5% each year – although this rate may vary depending on the national economic situation. UK inflationary increases will be kept under review and this page updated.

If you are studying part-time your living costs may vary depending on your personal circumstances but you must still ensure that you will have sufficient funding to meet these costs for the duration of your course.

Students enrolled on this course will belong to both a department/faculty and a college. Please note that ‘college’ and ‘colleges’ refers to all 43 of the University’s colleges, including those designated as societies and permanent private halls (PPHs). 

If you apply for a place on this course you will have the option to express a preference for one of the colleges listed below, or you can ask us to find a college for you. Before deciding, we suggest that you read our brief  introduction to the college system at Oxford  and our  advice about expressing a college preference . For some courses, the department may have provided some additional advice below to help you decide.

The following colleges accept students for full-time study on this course:

  • Balliol College
  • Corpus Christi College
  • Exeter College
  • Hertford College
  • Jesus College
  • Keble College
  • Kellogg College
  • Lady Margaret Hall
  • Linacre College
  • Mansfield College
  • New College
  • Reuben College
  • St Cross College
  • St Edmund Hall
  • Worcester College

The following colleges accept students for part-time study on this course:

Before you apply

Our  guide to getting started  provides general advice on how to prepare for and start your application. You can use our interactive tool to help you  evaluate whether your application is likely to be competitive .

If it's important for you to have your application considered under a particular deadline – eg under a December or January deadline in order to be considered for Oxford scholarships – we recommend that you aim to complete and submit your application at least two weeks in advance. Check the deadlines on this page and the information about deadlines in our Application Guide.

Application fee waivers

An application fee of £75 is payable per course application. Application fee waivers are available for the following applicants who meet the eligibility criteria:

  • applicants from low-income countries;
  • refugees and displaced persons; 
  • UK applicants from low-income backgrounds; and 
  • applicants who applied for our Graduate Access Programmes in the past two years and met the eligibility criteria.

You are encouraged to  check whether you're eligible for an application fee waiver  before you apply.

Readmission for current Oxford graduate taught students

If you're currently studying for an Oxford graduate taught course and apply to this course with no break in your studies, you may be eligible to apply to this course as a readmission applicant. The application fee will be waived for an eligible application of this type. Check whether you're eligible to apply for readmission .

Application fee waivers for eligible associated courses

If you apply to this course and up to two eligible associated courses from our predefined list during the same cycle, you can request an application fee waiver so that you only need to pay one application fee.

The list of eligible associated courses may be updated as new courses are opened. Please check the list regularly, especially if you are applying to a course that has recently opened to accept applications.

Do I need to contact anyone before I apply?

Before submitting an application, you may find it helpful to contact a potential supervisor or supervisors from among the online profiles of StatML academics based in Oxford. This will allow you to discuss the matching of your interests with those of the centre, although there is no guarantee that a specific individual will become your supervisor if you are accepted. Please ensure that you have researched the specialisms of the department and those of your potential supervisor(s) before making contact. More information can be found on the StatML website.

You can either contact the academic staff member directly or route your enquiry via the Admissions Administrator using the contact details provided on this page.

Completing your application

You should refer to the information below when completing the application form, paying attention to the specific requirements for the supporting documents .

For this course, the application form will include questions that collect information that would usually be included in a CV/résumé. You should not upload a separate document; if a separate CV/résumé is uploaded, it will be removed from your application.

If any document does not meet the specification, including the stipulated word count, your application may be considered incomplete and not assessed by the academic department.

You will also need to  complete the declaration form  once you have applied for this course.  

Proposed field and title of research project

Proposed supervisor

Under 'Proposed supervisor name' enter the name of the academic(s) who you would like to supervise your research. 

Referees: Three overall, academic preferred

Whilst you must register three referees, the department may start the assessment of your application if two of the three references are submitted by the course deadline and your application is otherwise complete. Please note that you may still be required to ensure your third referee supplies a reference for consideration.

Your references should generally be academic, though up to one professional reference will be accepted.

Your references will support intellectual ability, academic achievement, motivation and your ability to work in a group.

Official transcript(s)

Your transcripts should give detailed information of the individual grades received in your university-level qualifications to date. You should only upload official documents issued by your institution and any transcript not in English should be accompanied by a certified translation.

More information about the transcript requirement is available in the Application Guide.

Statement of purpose/personal statement: A maximum of 1,100 words

Your statement should be written in English and should specify the broad areas in which your research interests lie, what motivates your interest in these fields, and why you think you will succeed in the programme.

The personal statement should describe your academic and career plans, as well your motivation and your scientific interests. When writing your personal statement, please make sure it answers the following questions:

  • What are your machine learning/statistical interests?
  • Why do you think the Statistics and Machine Learning CDT is the right choice for you?

If possible, please ensure that the word count is clearly displayed on the document.

Your statement will be assessed for:

  • your reasons for applying
  • evidence of understanding of the proposed area of study
  • your ability to present a coherent case in proficient English
  • your commitment to the subject, beyond the requirements of the degree course
  • your preliminary knowledge of the subject area and research techniques
  • your capacity for sustained and intense work
  • your reasoning ability
  • your ability to absorb new ideas, often presented abstractly, at a rapid pace.

Start or continue your application

You can start or return to an application using the relevant link below. As you complete the form, please refer to the requirements above and consult our Application Guide for advice. You'll find the answers to most common queries in our FAQs.

As the admissions process for StatML will be run in parallel with Imperial College London, we ask that you please complete the declaration form once you have applied to one or both of the institutions.

Application Guide   Apply - FT   Apply - PT   Declaration Form

ADMISSION STATUS

Open - applications are still being accepted

Up to a week's notice of closure will be provided on this page - no other notification will be given

12:00 midday UK time on:

Friday 1 March 2024

Applications may remain open after this deadline if places are still available: if so, a later deadline will be shown under 'Admission status' above.


This course was previously known as Modern Statistics and Statistical Machine Learning 

Further information and enquiries

This course is offered by the University's Department of Statistics, in partnership with Imperial College London.

  • Course page on the centre's website
  • Funding information from the centre
  • Academic and research staff  (incl. Imperial)
  • Departmental research in Oxford
  • Mathematical, Physical and Life Sciences
  • Residence requirements for full-time courses
  • Postgraduate applicant privacy policy

Course-related enquiries

Advice about contacting the department can be found in the How to apply section of this page

✉ [email protected] ☎ +44 (0)1865 272876  (Oxford)

Application-process enquiries

See the application guide

Visa eligibility for part-time study

We are unable to sponsor student visas for part-time study on this course. Part-time students may be able to attend on a visitor visa for short blocks of time only (and leave after each visit) and will need to remain based outside the UK.

University of Oxford Department of Computer Science

Oxford Applied and Theoretical Machine Learning Group

OATML

Pragmatic Approaches to Fundamental Research


Kelsey Doerksen featured in Vice-Chancellor's International Women's Day Event

06 Mar 2024

DPhil student Kelsey Doerksen will be a speaker at the Vice-Chancellor’s 2024 International Women’s Day Event, a discussion featuring panellists from across the University working in AI, who will share their views on making AI a force for gender equality and inclusion.


Lisa Schut gives talk at the Stanford HAI Fall Conference

24 Oct 2023

Together with Been Kim, Lisa Schut gave a talk at the Stanford HAI Fall Conference on New Horizons in Generative AI: Science, Creativity, and Society on ‘Leveraging AlphaZero to Improve our Understanding & Creativity in Chess’.


Yarin Gal Appointed Director of Research at Frontier AI Taskforce

20 Sep 2023

Yarin Gal has been appointed as the Director of Research at the new government Frontier AI Taskforce. As the Taskforce’s first progress report explains, Yarin’s appointment reflects his status as ‘a globally recognised leader in Machine Learning’.


OATML Student in TIME Magazine

15 Aug 2023

DPhil student Jan Brauner has published an op-ed in TIME Magazine. He discusses the similarities between extinction risk from AI and present-day harms from AI.


Publications from the group

Publications


Reproducibility and Code


See who's at the group

Group Members

We are researchers coming from varied backgrounds, including Computer Science, Maths & Stats, Engineering, and Physics. We come from academia (Oxford, Cambridge, MILA, Manchester, U of Amsterdam, U of Toronto, U of Cape Town, Yale, and others) and industry (Google, DeepMind, Twitter, Qualcomm, and startups). We include 5 Rhodes Scholars, 3 Clarendon Scholars, 3 DeepMind Scholars, and one Cancer Research UK Scholar, with our students funded by many additional sources (AIMS CDT, Cyber CDT, industry, and more). If you'd like to join us, take a look here. Current group members: Yarin Gal (Associate Professor) ● Freddie Kalaitzis (Senior Research Fellow, 2020) ● Tiarnan Doherty (Postdoc, 2022) ● Hazel Kim (PhD, 2023) ● Luckeciano Carvalho Melo (PhD, 2023) ● Matthew Kearney (PhD, 2023) ● Yonatan Gideoni (PhD, 2023) ● Katrina Dickson (Associate Member (Program Manager), 2023) ● Kunal Handa (MSc by Research, 2023) ● Ilia Shumailov (Associate Member (Senior Research Fellow), 2022) ● Gunshi Gupta (PhD, 2021) ● Kelsey Doerksen (PhD, 2021) ● Ruben Weitzman (PhD, 2021) ● Shreshth Malik (PhD, 2021) ● Jannik Kossen (PhD, 2020) ● Lisa Schut (PhD, 2020) ● Muhammed Razzak (PhD, 2020) ● Tuan Nguyen (Associate Member (PhD), 2020) ● Gabriel Jones (Associate Member (PhD), 2020) ● Freddie Bickford Smith (Associate Member (PhD), 2020) ● Angus Nicolson (Associate Member (PhD), 2020) ● Atılım Güneş Baydin (Associate Member (Faculty), 2020) ● Andrew Jesson (PhD, 2019) ● Jan Brauner (PhD, 2019) ● Jishnu Mukhoti (Associate Member (PhD), 2019) ● Pascal Notin (Associate Member (Senior Research Fellow), 2019) ● Tom Rainforth (Associate Member (Faculty), 2019) ● Tim G. J. Rudner (PhD, 2017) ● Sebastian Farquhar (Associate Member (Senior Research Fellow), 2017)

Group Invited Talks

Group Collaborators and Affiliates


Apply to the Group

Scholarships and how to apply.


Statistics and Machine Learning (DPhil)

University of Oxford

Different course options

  • Key information

Course Summary

Tuition fees, entry requirements, similar courses at different universities

Key information (data source: IDP Connect)

Qualification type

PhD/DPhil - Doctor of Philosophy

Subject areas

Statistics; Artificial Intelligence (AI)

Course type

About the course

The Modern Statistics and Statistical Machine Learning CDT is a four-year DPhil research programme (or eight years if studying part-time). It will train the next generation of researchers in statistics and statistical machine learning, who will develop widely-applicable novel methodology and theory and create application-specific methods, leading to breakthroughs in real-world problems in government, medicine, industry and science.

This is the Oxford component of StatML, an EPSRC Centre for Doctoral Training (CDT) in Modern Statistics and Statistical Machine Learning, co-hosted by Imperial College London and the University of Oxford. The CDT will provide students with training in both cutting-edge research methodologies and the development of business and transferable skills – essential elements required by employers in industry and business.

Each student will undertake a significant, challenging and original research project, leading to the award of a DPhil. Given the breadth and depth of the research teams at Imperial College and at the University of Oxford, the proposed projects will range from theoretical to computational and applied aspects of statistics and machine learning, with a large number of projects involving strong methodological/theoretical developments together with a challenging real problem. A significant number of projects will be co-supervised with industry.

The students will pursue two mini-projects during their first year (specific timings may vary for part-time students), with the expectation that one of them will lead to their main research project. At the admissions stage students will choose a mini-project. These mini-projects are proposed by our supervisory pool and industrial partners. Students will be based at the home institution of their main supervisor of the first mini-project.

Each mini-project will be assessed by researchers from Imperial and Oxford on the basis of a report written by the student.

All students will initially be admitted to the status of Probationer Research Student (PRS). Within a maximum of six terms as a full-time PRS student or twelve terms as a part-time PRS student, you will be expected to apply for transfer from Probationer Research Student to DPhil status. This application is normally made by the fourth term for full-time students and by the eighth term for part-time students.

A successful transfer from PRS to DPhil status will require the submission of a thesis outline. Students who are successful at transfer will also be expected to apply for and gain confirmation of DPhil status to show that their work continues to be on track. This will need to be done within nine terms of admission for full-time students and eighteen terms of admission for part-time students.

Both milestones normally involve an interview with two assessors (other than your supervisor) and therefore provide important experience for the final oral examination.

Graduate destinations

This is a new course and there are no alumni yet. StatML is dedicated to providing the organisation, environment and personnel needed to develop the future industrial and academic researchers doing world-leading work in statistics for modern-day science, engineering and commerce, all exemplified by 'big data'.

UK fees: course fees for UK students, for this course (per year)

International fees: course fees for EU and international students

As a minimum, applicants should hold or be predicted to achieve a first-class or strong upper second-class undergraduate degree with honours (or its equivalent) in mathematics, statistics, physics, computer science, engineering or a closely related subject. However, entrance is very competitive and most successful applicants have a first-class degree or the equivalent. For applicants with a degree from the USA, the minimum GPA sought is 3.6 out of 4.0.

Applied Statistics in Health Sciences (online) MSc

University of Strathclyde: Applied Statistics MSc, Applied Statistics in Finance MSc, Applied Statistics in Finance (online) MSc, Applied Statistics with Data Science (online) MSc

Prospective Students

We are always looking for talented DPhil (PhD) students. However, due to high demand, we can only admit a small number of students each year from a large applicant pool.

Frequently Asked Questions

What backgrounds do you expect DPhil students to have?

DPhil students typically have a top undergraduate or master's degree from a good university, with very strong skills in the relevant fields, typically a combination of computer science, mathematics, statistics or information engineering, depending on the area of study. Many have research experience before starting their DPhil.

How can I apply to do a DPhil with you?

Please check out this page at the Department of Statistics website. Apply through the official channel as linked to from the Department of Statistics page.

What sources of funding are available?

The Department has funding for UK, European and international students, although funding for international students is significantly more competitive. In addition, there is a range of competitive scholarships available from Oxford colleges, as well as from overseas national governments.

Should I contact potential supervisors?

This is not necessary, though you are welcome to email potential supervisors if you have specific research-related questions. Many supervisors receive more enquiries than they have time for, so please do not expect a reply to every email; a lack of reply does not mean that your application will be received any less favourably.

Is a DPhil different from a PhD?

DPhil is simply Oxford's terminology for a PhD. We tend to refer to students admitted directly to work with a specific supervisor or supervisors as DPhil students, and to those admitted through centres for doctoral training as CDT students.

What is a CDT?

A CDT is a centre for doctoral training. These are specialised four-year programmes funded by EPSRC, the UK's Engineering and Physical Sciences Research Council. The Department of Statistics, along with Imperial College London, runs a CDT on Modern Statistics and Statistical Machine Learning called StatML, covering all research areas of our group.

What is the difference between the DPhil and the StatML programmes?

We admit students through two streams: the standard Department of Statistics DPhil programme and the StatML CDT. The CDT programme is one year longer; it involves intensive teaching modules that bring students to the forefront of research, and begins with two 10-week mini-projects in which students explore two topics in depth. After their mini-projects, students decide on their research topic and supervisor(s).

Do you accept interns?

We may accept a small number of interns; however, we do not have funding for internships. In addition, UKBA rules around overseas internships mean we will likely not be able to accept interns from outside the EU.

Machine Learning in Mathematics & Theoretical Physics

ML in MTP Conference Photo

Programme description

Machine Learning in Mathematics & Theoretical Physics is an intensive one-week research school primarily designed for PhD students; master's students and more senior researchers are very welcome to apply as well. The programme will be held from 17 to 21 July 2023 at the University of Oxford, hosted by the Department of Physics.

Topics will include:

  • A quick review of ML basics
  • Knot Theory with Reinforcement Learning
  • String Theory Compactifications
  • Calabi-Yau Manifolds and Ricci-flat Metrics
  • Heuristic Search Techniques: Genetic Algorithms and Quantum Annealing
  • Classification of Fano Varieties
  • Lattice Field Theories and Generative Models

Programme Details

Dates: 17-21 July 2023

Venue:  Martin Wood Lecture Theatre, Clarendon Laboratory, Oxford OX1 3PU

Programme Lecturers: Steve Abel (Durham University), Lara Anderson (Virginia Tech), Miranda Cheng (Academia Sinica Taiwan, Amsterdam University), James Gray (Virginia Tech), Andras Juhasz (Oxford University), Alexander Kasprzyk (Nottingham University), Magdalena Larfors (Uppsala University), Fabian Ruehle (Northeastern University)

Tutors: Callum Brodie (Virginia Tech), Thomas Harvey (Oxford University), Elli Heyes (City University London), Edward Hirst (Queen Mary University London), Luca Nutricati (Durham University), Yidi Qi (Northeastern University), Sara Veneziale (Imperial College London)

Organisers:  Andrei Constantin  (University of Oxford),  Yang-Hui He  (London Institute for Mathematical Sciences)

Administration Contact:  Michelle Jose  (Rudolf Peierls Centre for Theoretical Physics, University of Oxford)

Application procedure:

To attend the five-day programme, you should first make an application here. Numbers will be limited and those interested are advised to apply early.

Application deadline: Wednesday, 31st May 2023. We will inform you about the outcome of your application shortly after the deadline. Students are asked to provide a letter of recommendation, e.g. from their PhD advisor. Letters of recommendation should be emailed to [email protected] with the applicant's name clearly indicated in the subject line by 31st May 2023. Postdoctoral and more senior applicants are not required to provide letters of recommendation.

Fees (payable upon confirmation of successful application): Research students £150, Early career researchers £250. There will be no charge for subsistence costs. All other participants (e.g. those working in industry) will be charged a registration fee of £250 and will be asked to make their own accommodation arrangements. 

All UK-based participants must pay their own travel costs. For overseas-based participants, support will be available to contribute towards travel costs.

If you have questions regarding the programme, please send an email to: [email protected].


The Machine Learning Research group's work and people involved


Machine Learning Research Group

Our Research

The Machine Learning Research Group (MLRG) sits within Information Engineering in the Department of Engineering Science of the University of Oxford. We are one of the core groupings that make up the wider community of Oxford Machine Learning & AI (Artificial Intelligence). The MLRG is particularly well integrated with the Oxford-Man Institute of Quantitative Finance, being co-located in the same building and with many faculty having joint affiliation.

The sub-groups that constitute the MLRG are united in the development of robust machine learning and in its principled application to problems in science, engineering and commerce. Our work encompasses fundamental research into Bayesian theory, machine learning on graphs, physics-inspired inference and optimisation. Our methodology is similarly broad, with active research in probabilistic numerics, reinforcement learning, neural networks, Bayesian nonparametrics, Bayesian optimisation, learning theory, natural language processing, and maximum-entropy methods. Applications span numerous domains including astronomy, automation & employment, control, ecology, disaster response, finance, signal processing & multi-agent systems.

Machine Learning - AI - Data Science - Bayesian modelling


Who's involved

Professor Natalia Ares
Associate Professor

Dr Jan-Peter Calliess
Senior Research Fellow

Dr Xiaowen Dong
Departmental Lecturer

Professor Michael Osborne
Professor of Engineering Science

Dr Steven Reece

Professor Stephen Roberts
RAEng/Man Group Chair in Machine Learning

Dr Stefan Zohren
Senior Research Fellow, Oxford-Man Institute

More about this group

EPSRC CDT in Statistics and Machine Learning at Imperial College London and the University of Oxford

Doctoral project

Each student will undertake a significant, challenging and original research project, leading to the award of a PhD (at Imperial) or a DPhil (at Oxford). Given the breadth and depth of the research teams at Imperial College and at the University of Oxford, the proposed projects will range from theoretical to computational and applied aspects of statistics and machine learning, with a large number of projects involving strong methodological/theoretical developments together with a challenging real problem. A significant number of projects will be co-supervised with industry.

The students will pursue two mini-projects during their first year, with the expectation that one of them will lead to their main research project.

The process for students and projects to be matched up will be as follows. At the admission stage, students will choose one individual mini-project. These mini-projects are proposed by our supervisory pool and our industrial partners. Students will be based at the home institution of the main supervisor of the first mini-project.

During their first 3 months in the CDT, students will be working on this mini-project. During months 4-6 of their PhD, students will be working on a second mini-project. For students whose studentship is funded or co-funded by an external partner, the second mini-project will be with the same external partner – but it will explore a different question. Each mini-project will be assessed, on the basis of a report written by the student, by researchers from Imperial and Oxford. The students will then begin their main PhD/DPhil project, which can be based on one of the two mini-projects.

Course details

Artificial Intelligence Concepts: Introduction to Machine Learning (online)

There are no timetabled sessions on this course. Using a specially designed virtual learning environment, this online course guides students through weekly pathways of directed readings and learning activities. Students interact with their tutor and the other course participants through tutor-guided, text-based forum discussions. There are no 'live-time' video meetings, meaning you can study flexibly in your own time under the direct tuition of an expert. For further information please click here

artificial intelligence,  n.

The capacity of computers or other machines to exhibit or simulate intelligent behaviour; the field of study concerned with this.

source: Oxford English Dictionary

Artificial Intelligence (AI) has become ingrained in the fabric of our society, often in seamless and pervasive ways that may escape our attention day-to-day. The ability of machines to sense, process information, make decisions and learn from experience is a transformative tool for organisations, from governments to big business. However, these technologies pose challenges, including social and ethical dilemmas.

This Introduction to Machine Learning sheds light on the methods at the heart of the AI revolution, introducing fundamental concepts motivating Machine Learning and differentiating types of Machine Learners. This course studies various approaches covering supervised and unsupervised learning, including clustering, neural networks, and deep learning. It considers the application of Machine Learning to long-standing problems like natural language processing and the challenges and opportunities Machine Learning presents for the global economy. It is aimed at a general audience, including professionals whose work brings them into contact with AI and those with no more than a passing acquaintance with AI.

This is part of a series of courses that aim to confer an appreciation of how AI has already transformed our world, explain the fundamental concepts and workings of AI, and equip us with a better understanding of how AI will shape our society so that we can converse fluently in the language of the future.

This course makes extensive use of mathematical notation consistent with its level as a first-year undergraduate course (FHEQ level 4) in a mathematically-inclined discipline, such as economics, engineering, or computer science. The course does not involve any coding and instead focuses on concepts in Artificial Intelligence for a general audience.

Programme details

Unit 1: Computational learning theory

  • Mathematically formalising the process of learning
  • The theory behind machine learning algorithms
  • PAC: Probably Approximately Correct learning
  • Hypothesis spaces and model accuracy
  • The theoretical lower bound in learning theory
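
The course itself involves no coding, but the finite-hypothesis PAC bound covered in this unit can be evaluated directly. As an illustrative sketch (the function name is our own), this computes the standard sample-complexity bound m >= (ln|H| + ln(1/delta)) / epsilon for a consistent learner over a finite hypothesis class:

```python
import math

def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Samples sufficient for a consistent learner over a finite hypothesis
    class to have error <= epsilon with probability >= 1 - delta."""
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)

# 1,000 hypotheses, 5% error tolerance, 95% confidence:
print(pac_sample_bound(1000, 0.05, 0.05))  # 199
```

Note how the bound grows only logarithmically in the size of the hypothesis class but linearly in 1/epsilon.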

Unit 2: Associative memories

  • Listing versus associative memories
  • Data storage in biological neural networks
  • The Hopfield model with the Hebbian and Storkey learning rules
  • Capacity and Hopfield networks
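
As an illustrative aside (not course material), a Hopfield network with the Hebbian learning rule fits in a few lines of NumPy; the Storkey rule covered in this unit differs only in how the weights are computed. Storing two patterns and recalling one from a corrupted cue:

```python
import numpy as np

def hebbian_weights(patterns):
    # patterns: shape (p, n), entries in {-1, +1}
    p, n = patterns.shape
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0)                 # no self-connections
    return W

def recall(W, state, steps=10):
    # Synchronous updates; ties at zero resolve to +1.
    for _ in range(steps):
        state = np.where(W @ state >= 0, 1, -1)
    return state

patterns = np.array([[1, 1, -1, -1, 1, -1],
                     [-1, 1, 1, -1, -1, 1]])
W = hebbian_weights(patterns)
noisy = np.array([1, 1, -1, -1, -1, -1])   # first pattern with one bit flipped
print(recall(W, noisy))                    # recovers the first stored pattern
```

The capacity results discussed in the unit bound how many such patterns can be stored before recall starts to fail.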

Unit 3: Supervised learning

  • Identifying and formulating supervised learning problems
  • Passing inputs and outputs to learning algorithms
  • Performance metrics for learning algorithms
  • The theoretical limitations of supervised learning
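
To make the performance-metrics bullet concrete, here is a minimal sketch (function and variable names are our own) of precision and recall computed from a classifier's predictions:

```python
def precision_recall(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1, 0, 1, 1, 0, 0, 1]   # ground-truth labels
y_pred = [1, 1, 1, 0, 0, 0, 1]   # a classifier's predictions
print(precision_recall(y_true, y_pred))  # (0.75, 0.75)
```

Precision asks "of the positives I predicted, how many were right?", while recall asks "of the true positives, how many did I find?".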

Unit 4: Unsupervised learning

  • Supervised versus unsupervised learning
  • K-means clustering
  • Methods of hierarchical clustering
  • The Apriori algorithm
  • Principal component analysis
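
As an illustrative sketch of the K-means algorithm from this unit (naive initialisation, no empty-cluster handling; all names are our own), alternating assignment and update steps until the centres stop moving:

```python
import numpy as np

def kmeans(X, k, iters=100):
    # Naive init: take the first k points as centres (k-means++ is better).
    centers = X[:k].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centre to the mean of its points.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two well-separated blobs of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centers, labels = kmeans(X, 2)
print(labels)  # the two blobs end up in separate clusters
```

Each iteration can only decrease the within-cluster sum of squares, which is why the loop is guaranteed to terminate.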

Unit 5: Reinforcement learning

  • Concepts and terminologies in reinforcement learning
  • Markov decision processes and Bellman equations
  • Implementing reinforcement learning
  • Understanding Q-learning
  • Identifying applications of reinforcement learning
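
The Q-learning topic above can be sketched on a toy problem. The corridor MDP below is our own illustrative example (not course material): an agent learns, from reward alone, that walking right is the optimal policy.

```python
import random

# A 1-D corridor MDP: states 0..4, actions 0 = left, 1 = right.
# Reaching state 4 yields reward 1 and ends the episode.
N_STATES, GOAL = 5, 4

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def q_learning(episodes=2000, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] >= Q[s][1] else 1
            s2, r, done = step(s, a)
            # Q-learning update: bootstrap off the greedy value of s2.
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()
policy = [0 if Q[s][0] >= Q[s][1] else 1 for s in range(GOAL)]
print(policy)  # the greedy policy heads right from every state
```

The update is the Bellman optimality equation of the previous bullet applied as a stochastic approximation step.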

Unit 6: Neural networks

  • The analogy between the human brain and artificial neural nets
  • The McCulloch-Pitts neuron
  • The architecture of a multilayer perceptron
  • Loss functions and the backpropagation algorithm
  • Training neural networks
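
Tying the unit together, here is a minimal sketch (our own, not course material) of a multilayer perceptron trained by backpropagation on the classic XOR problem, with the loss gradient written out by hand:

```python
import numpy as np

# A tiny multilayer perceptron trained by backpropagation on XOR.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)   # hidden layer
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)   # output layer
sigmoid = lambda z: 1 / (1 + np.exp(-z))

lr = 1.0
for _ in range(5000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: cross-entropy loss with a sigmoid output
    # gives the simple output delta (out - y).
    d_out = out - y
    d_h = (d_out @ W2.T) * (1 - h ** 2)           # tanh derivative
    W2 -= lr * h.T @ d_out / len(X); b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X);   b1 -= lr * d_h.mean(axis=0)

pred = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2).round()
print(pred.ravel())
```

XOR is the standard demonstration that a hidden layer is necessary: no single-layer perceptron can separate these four points.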

Unit 7: Deep learning

  • Deep learning methods and representation learning
  • Convolution and pooling layers for image recognition
  • The inner workings of recurrent neural networks
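
The convolution and pooling layers mentioned above can be sketched directly in NumPy (an illustrative toy, not course material): a Sobel kernel responding to a vertical edge, followed by max-pooling.

```python
import numpy as np

def conv2d(img, kernel):
    # 'Valid' 2-D convolution (strictly cross-correlation, as in most
    # deep-learning libraries): slide the kernel over the image.
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

def max_pool(x, size=2):
    H, W = x.shape
    x = x[:H - H % size, :W - W % size]   # trim to a multiple of the window
    h, w = x.shape
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

img = np.zeros((6, 6)); img[:, 3] = 1.0   # an image containing a vertical edge
sobel_x = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
resp = conv2d(img, sobel_x)               # strong response along the edge
pooled = max_pool(np.abs(resp))
```

In a trained convolutional network the kernel weights are learned rather than fixed, but the sliding-window mechanics are the same.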

Unit 8: Natural Language Processing

  • Defining natural language processing
  • Classifying natural language processing applications
  • Real-world applications of natural language processing

Unit 9: Where Machine Learning can go wrong

  • How biases arise in data collection and analysis
  • The misuse and malicious use of machine learning
  • Poor model fitting: underfitting and overfitting
  • Defending against attacks on machine learning algorithms

Unit 10: Effects of AI on the world economy

  • The historical impact of innovation on job markets and the global economy
  • Expert predictions on the effects of Artificial Intelligence innovation
  • Winners and losers: fields and occupations impacted by Artificial Intelligence
  • Global domestic product and the economic impact of Artificial Intelligence
  • The AI revolution, policymakers and governments

Recommended reading

There is no essential reading associated with this course.

Digital Certification

Credit Application Transfer Scheme (CATS) points 

To earn credit (CATS points) for your course you will need to register and pay an additional £10 fee for each course you enrol on. You can do this by ticking the relevant box at the bottom of the enrolment form or when enrolling online. If you do not register when you enrol, you have up until the course start date to register and pay the £10 fee. 

See more information on CATS points

Coursework is an integral part of all online courses and everyone enrolled will be expected to do coursework, but only those who have registered for credit will be awarded CATS points for completing work at the required standard. If you are enrolled on the Certificate of Higher Education, you need to indicate this on the enrolment form but there is no additional registration fee. 

Digital credentials

All students who pass their final assignment, whether registered for credit or not, will be eligible for a digital Certificate of Completion. Upon successful completion, you will receive a link to download a University of Oxford digital certificate. Information on how to access this digital certificate will be emailed to you after the end of the course. The certificate will show your name, the course title and the dates of the course you attended. You will be able to download your certificate or share it on social media if you choose to do so. 

Please note that assignments are not graded but are marked either pass or fail. 

Dr Noureddin Sadawi

Dr Noureddin Sadawi specialises in machine/deep learning and data science. He has several years’ experience in various areas involving data manipulation and analysis. He received his PhD from the University of Birmingham. He is the winner of two international scientific software development contests - at TREC2011 and CLEF2012.

Noureddin is an avid scientific software researcher and developer with a passion for learning and teaching new technologies. He is an experienced scientific software developer and data analyst. Over the last few years, he has been using R and Python as his preferred programming languages.

He has also been involved in several projects spanning a variety of fields such as bioinformatics, textual/image/video data analysis, drug discovery, omics data analysis and computer network security. He has taught at multiple universities in the UK and has worked as a software engineer in different roles. Currently he holds the following part-time roles: senior content developer and lecturer at the University of London; international trainer with O'Reilly and Pearson; short course trainer and instructor at Goldsmiths, University of London; as well as lecturer at the University of Oxford. He is the founder of SoftLight LTD, a London-based company that specialises in data science and machine/deep learning, where he works as a consultant providing advice and expertise in these areas. Currently he is a member of the organising committee of this international conference: https://ilcict.ly/ . A list of his publications can be found here.

Course aims

  • To introduce fundamental concepts underpinning Machine Learning approaches
  • To study a wide variety of Machine Learning tools, from clustering to Deep Learning
  • To identify the opportunities and challenges presented by modern Machine Learning methods

Learning outcomes

By the end of this course, students should:

  • Understand the concepts and theories underpinning Machine Learning
  • Understand the different types of Machine Learners
  • Have detailed knowledge of specific Machine Learning methodologies
  • Be able to identify suitable Machine Learning approaches for real-world applications
  • Be able to critically assess the potential impact and pitfalls of Machine Learning tools

Assessment methods

You will be set two pieces of work for the course. The first, of 500 words, is due halfway through the course. It does not count towards your final outcome, but preparing for it, and the feedback you receive, will help you prepare for your assessed piece of work of 1,500 words due at the end of the course. The assessed work is marked pass or fail.

English Language Requirements

We do not insist that applicants hold an English language certification, but they may be at a disadvantage if their language skills are not of a level comparable to the qualifications listed on our website. If you are confident in your proficiency, please feel free to enrol. For more information on English language requirements, please see https://www.conted.ox.ac.uk/about/english-language-requirements

Application

Please use the 'Book' or 'Apply' button on this page. Alternatively, please complete the Department for Continuing Education's enrolment form for short courses.

Level and demands

FHEQ Level 4. The course runs for 10 weeks at approximately 10 hours per week, a total of about 100 study hours.

IT requirements

This course is delivered online; to participate you must be familiar with using a computer for purposes such as sending email and searching the Internet. You will also need regular access to the Internet and a computer meeting our recommended minimum computer specification.

Terms & conditions for applicants and students

Information on financial support

View a sample page to see if this course is for you


Michael Hutchinson

PhD student in statistical machine learning, Department of Statistics, University of Oxford.

Hi I’m Michael! I’m interested in all things Machine Learning!

I am currently a PhD student at the University of Oxford through the StatML programme, supervised by Yee Whye Teh and Max Welling. Before that I completed a Master of Engineering at the University of Cambridge, supervised by Dr Rich E. Turner.

Recently I have been working on two main topics: diffusion models and related methods, and geometric and equivariant deep/probabilistic learning.

Previously I’ve worked on COVID-19 statistical modelling efforts, federated learning, architecture search for Bayesian neural networks, and differential privacy for federated and continual Bayesian learning.

Broadly, I’m interested in the interface between statistical and deep learning, and in bringing more principled statistical methods into deep learning.

  • Geometric Learning
  • Bayesian Machine Learning
  • Climbing, Hockey, Rowing
  • Reading Fiction and Philosophy

PhD in Statistical Machine Learning, 2019-2023

University College, University of Oxford

MEng in Information and Computer Engineering, 2018-2019

Christ's College, University of Cambridge

BA in Engineering, 2015-2018

Publications

  • Geometric Neural Diffusion Processes
  • Metropolis Sampling for Constrained Diffusion Models
  • Diffusion Models for Constrained Domains
  • Riemannian Score-Based Generative Modelling
  • Spectral Diffusion Processes
  • Riemannian Diffusion Schrödinger Bridge
  • Federated Functional Variational Inference
  • Vector-Valued Gaussian Processes on Riemannian Manifolds via Gauge Equivariant Projected Kernels
  • Efficient Bayesian Inference of Instantaneous Reproduction Numbers at Fine Spatial Scales, with an Application to Mapping and Nowcasting the COVID-19 Epidemic in British Local Authorities
  • LieTransformer: Equivariant Self-Attention for Lie Groups
  • Equivariant Learning of Stochastic Fields: Gaussian Processes and Steerable Conditional Neural Processes
  • Age Groups That Sustain Resurging COVID-19 Epidemics in the United States
  • Technical Document 3: Effectiveness and Resource Requirements of Test, Trace and Isolate Strategies
  • State-Level Tracking of COVID-19 in the United States
  • Report 21: Estimating COVID-19 Cases and Reproduction Number in Brazil
  • A Sub-national Analysis of the Rate of Transmission of COVID-19 in Italy
  • Differentially Private Federated Variational Inference

Statistical Machine Learning

Prof. Pier Palamara, University of Oxford, Hilary Term 2020

Course offered to Part B students (SB2b) and MSc students (SM4)

New: Revisions (Trinity Term)

  • Part B 2017 Q1, Q2 (excluding c), Q3
  • Part C 2016 Q1 (a-b), Q2 (a-b)
  • Part C 2015 Q1, Q2 (a-c), Q3 (b)
  • MSc 2017 Q7, Q8 (excluding b-c)
  • MSc 2016 Q6 (a-b), Q7
  • MSc 2014 Q7
  • MSc 2012 Q6, Q7
  • MSc and Part B 2018
  • MSc and Part B 2019

General Information

Recommended prerequisites: Part A A8 Probability and A9 Statistics. SB2a Foundations of Statistical Inference is useful but not essential.

Aims and Objectives: Machine learning studies methods that can automatically detect patterns in data, and then use these patterns to predict future data or other outcomes of interest. It is widely used across many scientific and engineering disciplines. This course covers statistical fundamentals of machine learning, with a focus on supervised learning and empirical risk minimisation. Both generative and discriminative learning frameworks are discussed and a variety of widely used classification algorithms are overviewed.
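As a toy illustration of empirical risk minimisation, the core idea behind the supervised learning framework mentioned above, the sketch below fits a simple threshold classifier by minimising the average 0-1 loss on a one-dimensional sample (all data and names here are illustrative, not taken from the course materials):

```python
# Empirical risk minimisation (ERM) on a made-up 1-D dataset:
# pick the threshold classifier that minimises the average 0-1
# loss over the training sample.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]

def empirical_risk(threshold):
    # Fraction of training points misclassified by the rule
    # "predict 1 if x > threshold, else 0".
    preds = [1 if x > threshold else 0 for x in xs]
    return sum(p != y for p, y in zip(preds, ys)) / len(ys)

# Candidate thresholds: midpoints between consecutive data points.
candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
best = min(candidates, key=empirical_risk)
print(best, empirical_risk(best))  # → 2.25 0.0
```

The same recipe, with a richer hypothesis class and a surrogate loss in place of 0-1 loss, underlies logistic regression and most of the discriminative methods in the synopsis.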

Synopsis: Visualisation and dimensionality reduction: principal components analysis, biplots and singular value decomposition. Multidimensional scaling. K-means clustering. Introduction to supervised learning. Evaluating learning methods with training/test sets. Bias/variance trade-off, generalisation and overfitting. Cross-validation. Regularisation. Performance measures, ROC curves. K-nearest neighbours as an example classifier. Linear models for classification. Discriminant analysis. Logistic regression. Generative vs discriminative learning. Naive Bayes models. Decision trees, bagging, random forests, boosting. Neural networks and deep learning.
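To make one synopsis topic concrete, k-fold cross-validation of a 1-nearest-neighbour classifier can be sketched as follows (a minimal illustration on toy data, using only the standard library; the data and helper names are my own, not course code):

```python
import math

# Toy 2-D dataset: (point, class label).
data = [((0.0, 0.0), 0), ((0.2, 0.1), 0), ((0.1, 0.3), 0),
        ((1.0, 1.0), 1), ((0.9, 1.2), 1), ((1.1, 0.8), 1)]

def nn_predict(train, point):
    # 1-NN: return the label of the closest training point
    # under Euclidean distance.
    return min(train, key=lambda t: math.dist(t[0], point))[1]

def cv_accuracy(data, k=3):
    # Split into k folds; each fold is held out once while the
    # classifier is "trained" on (i.e. memorises) the rest.
    folds = [data[i::k] for i in range(k)]
    correct = total = 0
    for i, fold in enumerate(folds):
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        correct += sum(nn_predict(train, x) == y for x, y in fold)
        total += len(fold)
    return correct / total

print(cv_accuracy(data))  # → 1.0
```

The cross-validated accuracy estimates generalisation performance without touching a separate test set, which is why the synopsis pairs it with the bias/variance trade-off and model selection.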

Reading: C. Bishop, Pattern Recognition and Machine Learning, Springer, 2007. T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning, Springer, 2009. K. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.

Further Reading: B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, 1996. G. James, D. Witten, T. Hastie and R. Tibshirani, An Introduction to Statistical Learning, Springer, 2013.

Problem Sheets

Resources for Python

  • Google's Python Class

Resources for R

  • Installation
  • Part A Statistical Programming at Oxford
  • DataCamp tutorial
  • Coursera R programming course
  • Intro to tidyverse (advanced)

Background Review Aids

  • Matrix and Gaussian identities - short useful reference for machine learning.
  • Linear Algebra Review and Reference - useful selection for machine learning.
  • Video reviews on Linear Algebra by Zico Kolter.
  • Video reviews on Multivariate Calculus and SVD by Aaditya Ramdas.
  • The Matrix Cookbook - extensive reference.

Leave anonymous feedback for the instructors: click here

OxML Summer School 2024

  • Are OxML summer schools organised by the University of Oxford? No. OxML schools are organised by AI for Global Goals (an independent company). Note that CIFAR and Oxford University's Deep Medicine Program are collaborators/partners in running this year's school: they contribute expertise in areas such as designing the programme and providing speakers.
  • Who can apply for OxML school? Everyone is welcome to apply for our machine learning summer schools regardless of their origin, nationality and country of residence. Our primary target audiences are (1) PhD students with a good technical background whose research topics relate to ML, and (2) researchers and engineers in academia and industry with a similar or more advanced level of technical knowledge. All applicants go through a selection process; we aim to select strongly motivated participants who are interested in broadening their knowledge of advanced topics in ML/DL and their applications.
  • Can I apply for multiple/all of the courses offered by AI for Global Goals this year? Yes. Everyone who believes they would benefit from attending multiple or all of our courses is welcome to apply for our upcoming ML courses.
  • Do you issue a certificate of completion? Yes, participants will receive an electronic certificate of attendance at the end of the school, listing the subjects covered during the OxML school and the number of hours of lectures, workshops, unconference sessions, etc. This information can be used by each university to calculate the corresponding number of ECTS credits. Note: certificates of attendance for OxML participants are issued by AI for Global Goals and are not endorsed (e.g., as a degree certificate) by any other organisation or university.
  • I've attended previous editions of the OxML schools. Can I apply again? Yes, you can still attend OxML schools this year if your application is successful. Attending previous editions of our ML schools will not influence your chance of being accepted this year.
  • What is the registration fee and what does it cover? There is no application fee, however, successful participants will need to pay a registration fee to secure their seats at the school. The registration fee for participants attending the in-person event in Oxford (UK) will cover access to all the lectures, refreshments, lunch, and workshops during the event. All participants, regardless of attending in-person or online, will have access to our community portal containing the lecture notes and lecture recordings for a period of time after the event. More details about the registration fee will be shared in the application form.
  • How can I book accommodation in Oxford? All participants are responsible for securing their own accommodation in Oxford during the OxML Summer School. If you are looking for accommodation close to the Oxford Mathematical Institute, there are a number of options: you can visit the Conference Oxford website (https://conference-oxford.com/bb-self-catering), which offers a range of B&B options at the University, or book one of the nearby hotels through various online booking platforms. We recommend booking your accommodation as early as possible to secure your preferred option, as Oxford is a popular destination for tourists and students alike.
  • Are there any scholarships available? Who is eligible to apply for this grant? Successful applicants who are currently in full-time employment (or are full-time students) at organisations based in low- and middle-income countries (LMIC) are entitled to a 50% discount on the ONLINE registration fee. There are a limited number of scholarships (discounts on the registration fee) available for full-time students from under-represented groups in AI who would like to attend the school in person. You can apply for a scholarship by emailing us at [email protected] with a short justification for your request. The funding varies on a case-by-case basis and covers a portion of the registration fee.
  • Can the organisers waive the registration fee? All successful applicants are required to pay the registration fee. However, those employed by/studying in organisations that are based in LMIC, will be entitled to a 50% discount on the registration fee for the ONLINE format of the school.
  • How can I add my OxML experience/certificate to my LinkedIn page? On your LinkedIn profile, select "Licenses & Certifications", right below the Education section. There, enter your OxML certificate of participation details. A menu of companies will appear as you type; select AI for Global Goals (or OxML) as the issuing organisation, not the University of Oxford.
  • Do I need a visa to attend the summer school? Depending on your nationality, you might need a visa to travel to the UK. This would be a short-stay, single or multiple entry visa. Please check this page to see if you need a UK visa. If you get accepted to the school and you need a visa, contact us at [email protected] and we will send you an invitation letter.
  • Can I download the lecture recordings? All lecture recordings are available for OxML participants to watch online for a limited time (about 2 months) after the end of each school. You can download all the lecture notes/handouts, however the recordings are not downloadable.
  • My application was not successful this year. Why have I not been accepted? We want to ensure an engaging and diverse audience. Our diverse global participants are selected using a ranking system that favours applicants who are graduate students, whose research areas are closer to the scope of the school, and would benefit more from the offered courses.
  • What is the cancellation policy? If, for any reason, you have to cancel your registration, 30 days or more prior to the start of the school, you can be refunded up to 50% of the registration fee. If you cancel your registration less than 30 days prior to the start of the event, then you might not be entitled to receive any refunds. Note: if you need a visa to travel to the UK, please apply for it as soon as you receive the notification of acceptance. If you delay your visa application and won't be able to attend the school, you might not be eligible to receive a refund.
  • Welfare and Benefit Systems
  • Browse content in Sociology
  • Childhood Studies
  • Community Development
  • Comparative and Historical Sociology
  • Economic Sociology
  • Gender and Sexuality
  • Gerontology and Ageing
  • Health, Illness, and Medicine
  • Marriage and the Family
  • Migration Studies
  • Occupations, Professions, and Work
  • Organizations
  • Population and Demography
  • Race and Ethnicity
  • Social Theory
  • Social Movements and Social Change
  • Social Research and Statistics
  • Social Stratification, Inequality, and Mobility
  • Sociology of Religion
  • Sociology of Education
  • Sport and Leisure
  • Urban and Rural Studies
  • Browse content in Warfare and Defence
  • Defence Strategy, Planning, and Research
  • Land Forces and Warfare
  • Military Administration
  • Military Life and Institutions
  • Naval Forces and Warfare
  • Other Warfare and Defence Issues
  • Peace Studies and Conflict Resolution
  • Weapons and Equipment

The Oxford Handbook of Computational Linguistics (2nd edn)


13 Machine Learning

Raymond J. Mooney is a Professor in the Department of Computer Science at the University of Texas at Austin. He received his PhD in 1988 from the University of Illinois at Urbana/Champaign. He is an author of over 170 published research papers, primarily in the areas of machine learning and natural language processing. He was President of the International Machine Learning Society from 2008 to 2011 and is a Fellow of AAAI, ACM, and ACL.

  • Published: 05 April 2018

This chapter introduces symbolic machine learning in which decision trees, rules, or case-based classifiers are induced from supervised training examples. It describes the representation of knowledge assumed by each of these approaches and reviews basic algorithms for inducing such representations from annotated training examples and using the acquired knowledge to classify future instances. It also briefly reviews unsupervised learning, in which new concepts are formed from unannotated examples by clustering them into coherent groups. These techniques can be applied to learn knowledge required for a variety of problems in computational linguistics ranging from part-of-speech tagging and syntactic parsing to word sense disambiguation and anaphora resolution. Applications to a variety of these problems are reviewed.

13.1 Introduction

Broadly interpreted, machine learning is the study of computational systems that improve performance on some task with experience ( Langley 1996 ; Mitchell 1997 ). However, the term is sometimes used to refer specifically to methods that represent learned knowledge in a declarative, symbolic form as opposed to more numerically orientated statistical or neural-network training methods (see Chapter 11 ). In particular, we will review supervised learning methods that represent learned knowledge in the form of interpretable decision trees, logical rules, and stored instances. Supervised learning methods acquire knowledge from data that a human expert has explicitly annotated with category labels or structural information such as parse trees. Decision trees are classification functions represented as trees in which the nodes are feature tests, the branches are feature values, and the leaves are class labels. Rules are implications in either propositional or predicate logic used to draw deductive inferences from data. A variety of algorithms exist for inducing knowledge in both of these forms from training examples. In contrast, instance-based (case-based, memory-based) methods simply remember past training instances and make a decision about a new case based on its similarity to specific past examples. This chapter reviews basic methods for each of these three supervised approaches to symbolic machine learning. Specifically, we review top-down induction of decision trees, rule induction (including inductive logic programming), and nearest-neighbour instance-based learning methods. We also review a couple of methods for unsupervised learning , which does not require expert human annotation of examples, but rather forms its own concepts by clustering unlabelled examples into coherent groups.

As described in previous chapters, understanding natural language requires a large amount of knowledge about morphology, syntax, semantics, and pragmatics as well as general knowledge about the world. Acquiring and encoding all of this knowledge is one of the fundamental impediments to developing effective and robust language-processing systems. Like the statistical methods (see Chapters 11 and 12 ), machine learning methods offer the promise of automating the acquisition of this knowledge from annotated or unannotated language corpora. A potential advantage of symbolic learning methods over statistical methods is that the acquired knowledge is represented in a form that is more easily interpreted by human developers and more similar to representations used in manually developed systems. Such interpretable knowledge potentially allows for greater scientific insight into linguistic phenomena, improvement of learned knowledge through human editing, and easier integration with manually developed systems. Each of the machine learning methods we review has been applied to a variety of problems in computational linguistics, including morphological generation and analysis, part-of-speech tagging, syntactic parsing, word-sense disambiguation, semantic analysis, information extraction, and anaphora resolution. We briefly survey some of these applications and summarize the current state of the art in the application of symbolic machine learning to computational linguistics.

13.2 Supervised Learning for Categorization

Most machine learning methods concern the task of categorizing examples described by a set of features. Supervised learning methods train on a set of specific examples which a human expert has labelled with the correct category and induce a general function for categorizing future unlabelled examples. It is generally assumed that a fixed set of n discrete-valued or real-valued features, {f_1, …, f_n}, is used to describe examples, and that the task is to assign an example to one of m disjoint categories {c_1, …, c_m}. For example, consider the task of deciding which of the following three sense categories is the correct interpretation of the semantically ambiguous English noun ‘interest’ given a full sentence in which it appears as context (see Chapter 27).

c_1: readiness to give attention

c_2: advantage, advancement, or favour

c_3: money paid for the use of money

The following might be a reasonable set of features for this problem:

W+i: the word appearing i positions after ‘interest’, for i = 1, 2, 3

W−i: the word appearing i positions before ‘interest’, for i = 1, 2, 3

K_i: a binary-valued feature for a selected keyword, for i = 1, …, k, where K_i is true if the i-th keyword appears anywhere in the current sentence. For example, some relevant keywords for ‘interest’ might be ‘attracted’, ‘expressed’, ‘payments’, and ‘bank’.
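As a concrete illustration (ours, not the chapter's), these features can be extracted from a sentence in a few lines of Python; the keyword list below is just the hypothetical one suggested in the text:

```python
# Sketch of extracting the W+i, W-i, and keyword features described above
# for the ambiguous noun 'interest'. The keyword list is the hypothetical
# example set mentioned in the text.

KEYWORDS = ["attracted", "expressed", "payments", "bank"]

def extract_features(sentence, target="interest", window=3):
    """Return a feature dict for one occurrence of `target` in `sentence`."""
    tokens = [t.strip(".,!?'\"").lower() for t in sentence.split()]
    pos = tokens.index(target)
    features = {}
    for i in range(1, window + 1):
        # W+i / W-i: the word i positions after/before the target (None if absent)
        features[f"W+{i}"] = tokens[pos + i] if pos + i < len(tokens) else None
        features[f"W-{i}"] = tokens[pos - i] if pos - i >= 0 else None
    for kw in KEYWORDS:
        # K_i: true if the keyword appears anywhere in the sentence
        features[f"K:{kw}"] = kw in tokens
    return features

feats = extract_features("John expressed a strong interest in computers.")
```

Running this on the first training sentence below yields, for example, W+1 = ‘in’ and K:expressed = true.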

The learning system is given a set of supervised training examples for which the correct category is given. For example:

c_1: ‘John expressed a strong interest in computers.’

c_2: ‘War in East Timor is not in the interest of the nation.’

c_3: ‘Acme Bank charges very high interest.’

[Figure 13.1: Learning curves for disambiguating ‘line’]

In this case, the values of the relevant features must first be determined in a straightforward manner from the text of the sentence. From these labelled examples, the system must produce a procedure for accurately categorizing future examples.

Categorization systems are typically evaluated on the accuracy of their predictions as measured by the percentage of examples that are correctly classified. Experiments for estimating this accuracy for a particular task are performed by randomly splitting a representative set of labelled examples into two sets, a training set used to induce a classifier, and an independent and disjoint test set used to measure its classification accuracy. Averages over multiple splits of the data into training and test sets provide more accurate estimates and give information on the variation in performance across training and test samples. Since labelling large amounts of training data can be a time-consuming task, it is also useful to look at learning curves in which the accuracy is measured repeatedly as the size of the training set is increased, providing information on how well the system generalizes from various amounts of training data. Figure 13.1 shows sample learning curves for a variety of systems on a related task of semantically disambiguating the word ‘line’ into one of six possible senses ( Mooney 1996 ). Mitchell (1997) provides a basic introduction to machine learning, including discussion on experimental evaluation.
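The evaluation protocol just described can be sketched as follows (our illustration; the trivial majority-class ‘learner’ stands in for a real induction algorithm):

```python
# Sketch of accuracy estimation by repeated random train/test splits:
# shuffle the labelled data, hold out a test fraction, train on the rest,
# and average test accuracy over the splits.

import random

def majority_learner(train):
    labels = [y for _, y in train]
    top = max(set(labels), key=labels.count)
    return lambda x: top  # predict the most frequent training label

def estimate_accuracy(data, learner, n_splits=10, test_frac=0.3, seed=0):
    rng = random.Random(seed)
    accs = []
    for _ in range(n_splits):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * test_frac)
        test, train = shuffled[:cut], shuffled[cut:]
        model = learner(train)
        correct = sum(1 for x, y in test if model(x) == y)
        accs.append(correct / len(test))
    return sum(accs) / len(accs)

# Toy data: 8 examples of class c1, 2 of class c2
data = [(i, "c1") for i in range(8)] + [(i, "c2") for i in range(2)]
acc = estimate_accuracy(data, majority_learner)
```

A learning curve is obtained by calling such a routine repeatedly with training sets of increasing size.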

13.2.1 Decision Tree Induction

Decision trees are classification functions represented as trees in which the nodes are feature tests, the branches are feature values, and the leaves are class labels. Here we will assume that all features are discrete-valued; however, the approach has been extended to continuous features as well. An example is classified by starting at the root and recursively traversing the tree to a leaf by following the path dictated by its feature values. A sample tree for the ‘interest’ problem is shown in Figure 13.2. For simplicity, assume that all of the unseen extra branches for W+1 and W+2 are leaves labelled c_1. This tree can be paraphrased as follows: If the word ‘bank’ appears anywhere in the sentence, assign sense c_3; otherwise if the word following ‘interest’ is ‘rate’, assign sense c_3, but if the following word is ‘of’ and the word two before is ‘in’ (as in ‘… in the interest of …’), then assign sense c_2; in all other cases assign sense c_1.
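The traversal just paraphrased can be made concrete with a small sketch (our own encoding of a tree fragment as nested dicts, not the chapter's notation):

```python
# A hand-built fragment of the 'interest' tree paraphrased above: internal
# nodes are (feature, {value: subtree}) pairs, leaves are class labels, and
# '*' marks the default branch for unlisted feature values.

TREE = ("K:bank", {
    True: "c3",
    False: ("W+1", {
        "rate": "c3",
        "of": ("W-2", {"in": "c2", "*": "c1"}),
        "*": "c1",
    }),
})

def classify(tree, example):
    """Recursively follow the branch dictated by the example's feature values."""
    if isinstance(tree, str):          # leaf: return the class label
        return tree
    feature, branches = tree
    value = example.get(feature)
    subtree = branches.get(value, branches.get("*"))
    return classify(subtree, example)

label = classify(TREE, {"K:bank": False, "W+1": "of", "W-2": "in"})
```

Here the example ‘… in the interest of …’ follows the K:bank = false, W+1 = ‘of’, W−2 = ‘in’ path to sense c_2.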

[Figure 13.2: Sample decision tree for disambiguating ‘interest’]

The goal of learning is to induce a decision tree that is consistent with the training data. Since there are many trees consistent with a given training set, most approaches follow ‘Occam’s razor’ and try to induce the simplest tree according to some complexity measure, such as the number of leaves or the depth of the tree. Since computing a minimal tree according to such measures is an NP-hard problem (i.e. a computational problem for which there is no known efficient, polynomial-time algorithm), most algorithms perform a fairly simple greedy search to efficiently find an approximately minimal tree. The standard approach is a divide-and-conquer algorithm that constructs the tree top-down, first picking a feature for the root of the tree and then recursively creating subtrees for each value of the selected splitting feature. Pseudocode for such an algorithm is shown in Figure 13.3 .

The size of the constructed tree critically depends on the heuristic used to pick the splitting feature. A standard approach is to pick the feature that maximizes the expected reduction in the entropy, or disorder, of the data with respect to category ( Quinlan 1986 ).

The entropy of a set of data, S, with respect to category is defined as:

Entropy(S) = −∑_{i=1}^{m} (|S_i| / |S|) log_2(|S_i| / |S|)   (13.1)

where S_i is the subset of S in category i (1 ≤ i ≤ m). The closer the data is to consisting purely of examples in a single category, the lower the entropy. A good splitting feature fractions the data into subsets with low entropy. This is because the closer the subsets are to containing examples in only a single category, the closer they are to terminating in a leaf, and the smaller will be the resulting subtree. Therefore, the best split is selected as the feature, f_i, that results in the greatest information gain, defined as:

Gain(S, f_i) = Entropy(S) − ∑_j (|S_ij| / |S|) Entropy(S_ij)   (13.2)

[Figure 13.3: Decision tree induction algorithm]

where j ranges over the possible values v_ij of f_i, and S_ij is the subset of S with value v_ij for feature f_i. The expected entropy of the resulting subsets is computed by weighting their entropies by their relative size |S_ij|/|S|.
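As a concrete sketch of these definitions (ours, for examples represented as feature dictionaries):

```python
# Entropy and information gain for examples given as (feature-dict, label)
# pairs, following the definitions in the text.

from math import log2

def entropy(examples):
    n = len(examples)
    counts = {}
    for _, label in examples:
        counts[label] = counts.get(label, 0) + 1
    # -sum over categories of (|S_i|/|S|) log2(|S_i|/|S|)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def information_gain(examples, feature):
    # Gain(S, f) = Entropy(S) - sum_j (|S_ij|/|S|) Entropy(S_ij)
    subsets = {}
    for x, y in examples:
        subsets.setdefault(x[feature], []).append((x, y))
    remainder = sum(len(s) / len(examples) * entropy(s)
                    for s in subsets.values())
    return entropy(examples) - remainder

# Toy data: splitting on 'f' separates the two categories perfectly
S = [({"f": "a"}, "c1"), ({"f": "a"}, "c1"),
     ({"f": "b"}, "c2"), ({"f": "b"}, "c2")]
```

For this toy data the entropy is 1 bit and splitting on f recovers the full bit of information.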

The resulting algorithm is computationally very efficient (linear in the number of examples) and in practice can quickly construct relatively small trees from large amounts of data. The basic algorithm has been enhanced to handle many practical problems that arise when processing real data, such as noisy data, missing feature values, and real-valued features ( Quinlan 1993 ). Consequently, decision tree methods are widely used in data mining applications where very large amounts of data need to be processed ( Fayyad et al. 1996 ). The most effective recent improvements to decision tree algorithms have been methods for constructing multiple alternative decision trees from the same training data, and then classifying new instances based on a weighted vote of these multiple hypotheses ( Quinlan 1996 ).

[Figure 13.4: Sample rules for disambiguating ‘interest’]

13.2.2 Rule Induction

Classification functions can also be symbolically represented by a set of rules, or logical implications. This is equivalent to representing each category in disjunctive normal form (DNF), i.e. a disjunction of conjunctions of feature-value pairs, where each rule is a conjunction corresponding to a disjunct in the formula for a given category. For example, the decision tree in Figure 13.2 can also be represented by the rules in Figure 13.4 , assuming that c 1 is the default category that is assigned if none of the rules apply.

Decision trees can be translated into a set of rules by creating a separate rule for each path from the root to a leaf in the tree ( Quinlan 1993 ). However, rules can also be directly induced from training data using a variety of algorithms ( Langley 1996 ; Mitchell 1997 ). The general goal is to construct the smallest rule set (the one with the least number of symbols) that is consistent with the training data. Again, the problem of learning the minimally complex hypothesis is NP-hard, and therefore heuristic search is typically used to induce only an approximately minimal definition. The standard approach is to use a form of greedy set-covering, where at each iteration, a new rule is learned that attempts to cover the largest set of examples of a particular category without covering examples of other categories. These examples are then removed, and additional rules are learned to cover the remaining examples of the category. Pseudocode for this process is shown in Figure 13.5 , where ConstructRule( P, N, Features ) attempts to learn a conjunction covering as many of the positive examples in P as possible without covering any of the negative examples in N .

[Figure 13.5: Rule induction covering algorithm]

There are two basic approaches to implementing ConstructRule. Top-down (general-to-specific) approaches start with the most general ‘empty’ rule (True → c_i), and repeatedly specialize it until it no longer covers any of the negative examples in N. Bottom-up (specific-to-general) approaches start with a very specific rule whose antecedent consists of the complete description of one of the positive examples in P, and repeatedly generalize it until it begins to cover negative examples in N. Since top-down approaches tend to construct simpler (more general) rules, they are generally more popular. Figure 13.6 presents a top-down algorithm based on the approach used in the Foil system (Quinlan 1990). At each step, a new condition, f_i = v_ij, is added to the rule and the examples that fail to satisfy this condition are removed. The best specializing feature-value pair is selected based on preferring to retain as many positive examples as possible while removing as many negatives as possible. A gain heuristic analogous to the one used in decision trees can be defined as follows:

Gain(f_i = v_ij) = |P_ij| (log_2(|P_ij| / (|P_ij| + |N_ij|)) − log_2(|P| / (|P| + |N|)))   (13.3)

where P_ij and N_ij are as defined in Figure 13.6. The first term, |P_ij|, encourages coverage of a large number of positives, and the second term encourages an increase in the percentage of covered examples that are positive (a decrease in the percentage of covered examples that are negative).
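A minimal sketch of the covering loop together with a top-down ConstructRule using such a gain heuristic (our propositional simplification, not the actual Foil implementation):

```python
# Greedy set-covering rule induction for propositional examples given as
# feature dicts. construct_rule specializes an empty rule by repeatedly
# adding the feature=value test with the highest Foil-style gain.

from math import log2

def gain(P, N, Pij, Nij):
    if not Pij:
        return float("-inf")
    before = log2(len(P) / (len(P) + len(N)))
    after = log2(len(Pij) / (len(Pij) + len(Nij)))
    return len(Pij) * (after - before)

def construct_rule(P, N, features):
    rule = {}                          # conjunction of feature=value tests
    while N:                           # specialize until no negatives covered
        best = None
        for f in features:
            if f in rule:
                continue
            for v in {x[f] for x in P}:
                Pij = [x for x in P if x[f] == v]
                Nij = [x for x in N if x[f] == v]
                g = gain(P, N, Pij, Nij)
                if best is None or g > best[0]:
                    best = (g, f, v, Pij, Nij)
        _, f, v, P, N = best           # add the best condition to the rule
        rule[f] = v
    return rule, P                     # rule plus the positives it covers

def learn_rules(P, N, features):
    rules = []
    while P:                           # cover remaining positives
        rule, covered = construct_rule(P, N, features)
        rules.append(rule)
        P = [x for x in P if x not in covered]
    return rules

P = [{"a": 1, "b": 0}, {"a": 1, "b": 1}]
N = [{"a": 0, "b": 0}, {"a": 0, "b": 1}]
rules = learn_rules(P, N, ["a", "b"])
```

On this toy data a single rule, a = 1, covers all positives and no negatives.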

This and similar rule-learning algorithms have been demonstrated to efficiently induce small and accurate rule sets from large amounts of realistic data. Like decision tree methods, rule-learning algorithms have also been enhanced to handle noisy data and real-valued features ( Clark and Niblett 1989 ; Cohen 1995 ). More significantly, they also have been extended to learn rules in first-order predicate logic, a much richer representation language. Predicate logic allows for quantified variables and relations and can represent concepts that are not expressible using examples described as feature vectors. For example, the following rules, written in Prolog syntax (where the conclusion appears first), define the relational concept of an uncle:

uncle(X,Y) :- brother(X,Z), parent(Z,Y).
uncle(X,Y) :- husband(X,Z), sister(Z,W), parent(W,Y).

[Figure 13.6: Top-down rule construction algorithm]

The goal of inductive logic programming (ILP) or relational learning is to infer rules of this sort, given a database of background facts and logical definitions of other relations ( Lavrač and Džeroski 1994 ). For example, an ILP system can learn the above rules for uncle (the target predicate ), given a set of positive and negative examples of uncle relationships and a set of facts for the relations parent, brother, sister, and husband (the background predicates ) for the members of a given extended family, such as:

uncle(Tom,Frank), uncle(Bob,John), ¬uncle(Tom,Cindy), ¬uncle(Bob,Tom),
parent(Bob,Frank), parent(Cindy,Frank), parent(Alice,John), parent(Tom,John),
brother(Tom,Cindy), sister(Cindy,Tom), husband(Tom,Alice), husband(Bob,Cindy).

Alternatively, logical definitions for brother and sister could be supplied and these relations could be inferred from a more complete set of facts about only the ‘basic’ predicates: parent, spouse, and gender.

The rule construction algorithm in Figure 13.6 is actually a simplification of the method used in the Foil ILP system (Quinlan 1990). In the case of predicate logic, Foil starts with an empty rule for the target predicate (P(X_1, …, X_r) :- .) and repeatedly specializes it by adding conditions to the antecedent of the rule, chosen from the space of all possible literals of the following forms:

Q_i(V_1, …, V_r)

not(Q_i(V_1, …, V_r))

not(X_i = X_j)

where Q_i are the background predicates, X_i are the existing variables used in the current rule, and V_1, …, V_r are a set of variables where at least one is an existing variable (one of the X_i) but the others can be newly introduced. A slight modification of the Gain heuristic in equation (13.3) is used to select the best literal.

ILP systems have been used to successfully acquire interesting and comprehensible rules for a number of realistic problems in engineering and molecular biology, such as determining the cancer-causing potential of various chemical structures ( Bratko and Muggleton 1995 ). Unlike most methods which require ‘feature engineering’ to reformat examples into a fixed list of features, ILP methods can induce rules directly from unbounded data structures such as strings, stacks, and trees (which are easily represented in predicate logic). However, since they are searching a much larger space of possible rules in a more expressive language, they are computationally more demanding and therefore are currently restricted to processing a few thousand examples compared to the millions of examples that can be potentially handled by feature-based systems.

13.2.3 Instance-Based Categorization

Unlike most approaches to learning for categorization, instance-based learning methods (also called case-based or memory-based methods) do not construct an abstract function definition, but rather categorize new examples based on their similarity to one or more specific training examples ( Stanfill and Waltz 1986 ; Aha et al. 1991 ). Training generally requires just storing the training examples in a database, although it may also require indexing the examples to allow for efficient retrieval. Categorizing new test instances is performed by determining the closest examples in the database according to some distance metric.

For real-valued features, the standard approach is to use Euclidean distance, where the distance between two examples is defined as:

d(x, y) = √( ∑_{i=1}^{n} (f_i(x) − f_i(y))² )

where f_i(x) is the value of the feature f_i for example x. For discrete-valued features, the difference (f_i(x) − f_i(y)) is generally defined to be 0 if they have the same value for f_i and 1 otherwise (i.e. the Hamming distance). In order to compensate for differences in scale between different features, the values of all features are frequently rescaled to the interval [0,1]. An alternative metric widely used in information retrieval and other language applications is cosine similarity (Manning et al. 2008), which uses the cosine of the angle between two examples’ feature vectors as a measure of their similarity. Cosine similarity is easily computed using the following formula:

CosSim(x, y) = ∑_{i=1}^{n} f_i(x) f_i(y) / ( √(∑_{i=1}^{n} f_i(x)²) √(∑_{i=1}^{n} f_i(y)²) )

Since 0 ≤ CosSim(x, y) ≤ 1 (for the non-negative feature values typical of these applications), we can transform cosine similarity into a distance measure by using d(x, y) = 1 − CosSim(x, y). Such an angle-based distance metric has been found to be more effective for the high-dimensional sparse feature vectors frequently found in language applications. Intuitively, such distance measures are intended to measure the dissimilarity of two examples.

A standard algorithm for categorizing new instances is the k-nearest-neighbour method ( Cover and Hart 1967 ). The k closest examples to the test example according to the distance metric are found, and the example is assigned to the majority class for these examples. Pseudocode for this process is shown in Figure 13.7 . The reason for picking k examples instead of just the closest one is to make the method robust by basing decisions on more evidence than just one example, which could be noisy. To avoid ties, an odd value for k is normally used; typical values are 3 and 5.
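The k-nearest-neighbour procedure can be sketched as follows (our illustration, using Hamming distance over discrete feature tuples):

```python
# k-nearest-neighbour categorization: find the k stored training examples
# closest to the query and assign the majority class among them.

def hamming(x, y):
    # number of positions at which the two feature tuples differ
    return sum(1 for a, b in zip(x, y) if a != b)

def knn_classify(train, query, k=3, dist=hamming):
    # train: list of (feature-tuple, label) pairs; query: feature-tuple
    neighbours = sorted(train, key=lambda ex: dist(ex[0], query))[:k]
    labels = [y for _, y in neighbours]
    return max(set(labels), key=labels.count)   # majority vote

# Toy 'interest' examples described by two collocation features
train = [
    (("bank", "rate"), "c3"), (("bank", "pays"), "c3"),
    (("strong", "in"), "c1"), (("keen", "in"), "c1"),
    (("public", "of"), "c2"),
]
pred = knn_classify(train, ("bank", "high"), k=3)
```

With k = 3 the two ‘bank’ examples outvote the single closer c_1 neighbour, yielding sense c_3.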

[Figure 13.7: K-nearest-neighbour categorization algorithm]

The basic nearest-neighbour method has been enhanced with techniques for weighting features in order to emphasize features that are most useful for categorization, and for selecting a subset of examples for storage in order to reduce the memory requirements of retaining all training examples ( Stanfill and Waltz 1986 ; Aha et al. 1991 ; Cost and Salzberg 1993 ).

13.3 Unsupervised Learning by Clustering

In many applications, obtaining labelled examples for supervised learning is difficult or expensive. Unlike supervised categorization, clustering is a form of unsupervised learning that creates its own categories by partitioning unlabelled examples into sets of similar instances. There are many clustering methods based on either a measure of instance similarity or a generative probabilistic model ( Manning and Schütze 1999 ; Jain et al. 1999 ). This section briefly reviews two widely used clustering methods, hierarchical agglomerative and k-means, that are both based on instance similarity metrics such as those used in supervised instance-based categorization as discussed in section 13.2.3 .

13.3.1 Hierarchical Agglomerative Clustering

Hierarchical agglomerative clustering (HAC) is a simple iterative method that builds a complete taxonomic hierarchy of classes given a set of unlabelled instances. HAC constructs a binary-branching hierarchy bottom-up, starting with an individual instance in each group and repeatedly merging the two most similar groups to form larger and larger clusters until all instances are grouped into a single category at the root. For example, the hierarchy in Figure 13.8 might be constructed from six examples of the word ‘interest’ used in context. In this simplified example, the three senses of ‘interest’ discussed earlier have been automatically ‘discovered’ as the three lowest internal nodes in this tree.

[Figure 13.8: Sample hierarchical clustering for the word ‘interest’]

Pseudocode for the HAC algorithm is shown in Figure 13.9. The function Distance(c_i, c_j) measures the dissimilarity of two clusters, which are both sets of examples. There are several alternative methods for determining the distance between clusters based on the distances between their individual instances. Assuming that the dissimilarity or distance between two instances is measured using the function d(x, y), the three standard approaches are:

[Figure 13.9: Hierarchical agglomerative clustering algorithm]

Single Link: Cluster distance is based on the closest instances in the two clusters:

d(c_i, c_j) = min { d(x, y) : x ∈ c_i, y ∈ c_j }

Complete Link: Cluster distance is based on the farthest instances in the two clusters:

d(c_i, c_j) = max { d(x, y) : x ∈ c_i, y ∈ c_j }

Group Average: Cluster distance is based on the average distance between two instances in the merged cluster:

d(c_i, c_j) = (1 / (|c| (|c| − 1))) ∑_{x ∈ c} ∑_{y ∈ c, y ≠ x} d(x, y), where c = c_i ∪ c_j

The distance measure between instances, d(x, y) , can be any of the standard metrics used in instance-based learning as discussed in section 13.2.3 .

One of the primary limitations of HAC is its computational complexity. Since it requires comparing all pairs of instances, the first minimization step takes O(n²) time, where n is the number of examples. For the large text corpora typically needed to perform useful unsupervised language learning, this quadratic complexity can be problematic.
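A minimal single-link HAC sketch (ours; to keep the example short it stops at a requested number of clusters rather than building the full binary hierarchy):

```python
# Bottom-up agglomerative clustering: start with singleton clusters and
# repeatedly merge the closest pair under single-link distance.

def single_link(ci, cj, d):
    # distance between the closest instances in the two clusters
    return min(d(x, y) for x in ci for y in cj)

def hac(points, d, n_clusters=1):
    clusters = [[p] for p in points]          # one singleton cluster each
    while len(clusters) > n_clusters:
        # find the closest pair of clusters (the O(n^2) step noted above)
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]], d),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters)
                    if k not in (i, j)] + [merged]
    return clusters

d = lambda x, y: abs(x - y)                   # 1-D distance for illustration
clusters = hac([1, 2, 10, 11], d, n_clusters=2)
```

On the four 1-D points the two tight pairs {1, 2} and {10, 11} are recovered.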

13.3.2 K-Means Clustering

A common alternative approach to clustering is the k-means algorithm, which, as its name implies, computes a set of k mean vectors, each representing a prototypical instance of one of the clusters. Similar to instance-based learning, an example is assigned to a cluster based on which of the k prototype vectors it is closest to. Rather than building a hierarchical taxonomy like HAC, k-means produces a flat clustering, where the number of clusters, k, must be provided by the user. The algorithm employs an iterative refinement approach, producing in each iteration a new and improved partitioning of the instances into k clusters, until it converges to a fixed point. To initialize the process, the mean vectors are simply set to a set of k instances, called seeds, randomly selected from the training data.

[Figure 13.10: K-means clustering algorithm]

Given a set of instance vectors, C, representing a particular cluster, the mean vector for the cluster (a.k.a. the prototype or centroid), μ(C), is computed as follows:

μ(C) = (1 / |C|) ∑_{x ∈ C} x

An instance x is assigned to the cluster C_j whose mean vector m_j minimizes d(x, m_j), where, again, d(x, y) can be any of the standard distance metrics discussed in section 13.2.3.

The iterative refinement algorithm for k-means is shown in Figure 13.10. K-means is an anytime algorithm, in that it can be stopped after any iteration and return an approximate solution. Assuming a fixed number of iterations, the time complexity of the algorithm is clearly O(n), i.e. linear in the number of training examples n, unlike HAC, which is O(n²). Therefore, k-means scales to large data sets more effectively than HAC.

The goal of k-means is to form a coherent set of categories by forming groups whose instances are tightly clustered around their centroid. Therefore, k-means tries to optimize the total distance between instances and their cluster means as defined by:

J = ∑_{j=1}^{k} ∑_{x ∈ C_j} d(x, μ(C_j))

Truly minimizing this objective function is computationally intractable (i.e. NP-hard); however, it can be shown for several distance functions (e.g. Euclidean and cosine) that each iteration of k-means improves the value of this objective and that the algorithm converges to a local minimum (Dhillon and Modha 2001). Since the random seed instances determine the algorithm’s starting place, running k-means multiple times with different random initializations (called random restarts) can improve the quality of the best solution that is found.
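The refinement loop and random restarts can be sketched as follows (our illustration, on one-dimensional points with squared Euclidean distance):

```python
# k-means iterative refinement with random restarts. Each iteration assigns
# every point to its nearest mean, then recomputes each mean as the centroid
# of its cluster; restarts keep the solution with the lowest objective.

import random

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    means = rng.sample(points, k)             # seed with k random instances
    for _ in range(iters):
        # assignment step: put each point in the cluster of its nearest mean
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda j: (p - means[j]) ** 2)
            clusters[j].append(p)
        # update step: recompute each mean as its cluster's centroid
        means = [sum(c) / len(c) if c else means[j]
                 for j, c in enumerate(clusters)]
    return means, clusters

def objective(means, clusters):
    # total squared distance between instances and their cluster means
    return sum((p - m) ** 2 for m, c in zip(means, clusters) for p in c)

# random restarts: keep the run with the lowest objective value
best = min((kmeans([1.0, 2.0, 10.0, 11.0], 2, seed=s) for s in range(5)),
           key=lambda mc: objective(*mc))
means, clusters = best
```

On these four points every restart converges to the means 1.5 and 10.5, the local (and here global) minimum of the objective.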

13.4 Applications to Computational Linguistics

The supervised and unsupervised learning methods we have discussed in this chapter have been applied to a range of problems in computational linguistics. This section surveys applications to a variety of problems in language processing, starting with morphological and lexical problems and ending with discourse-level tasks.

13.4.1 Morphology

Symbolic learning has been applied to several problems in morphology (see Chapter 2 ). In particular, decision tree and ILP methods have been applied to the problem of generating the past tense of an English verb, a task frequently studied in cognitive science and neural networks as a touchstone problem in language acquisition. In fact, there has been significant debate whether or not rule-learning is an adequate cognitive model of how children learn this task ( Rumelhart and McClelland 1986 ; Pinker and Prince 1988 ; Macwhinney and Leinbach 1991 ). Typically, the problem is studied in its phonetic form, in which a string of phonemes for the present tense is mapped to a string of phonemes for the past tense. The problem is interesting since one must learn the regular transformation of adding ‘ed’, as well as particular irregular patterns such as that illustrated by the examples ‘sing’ → ‘sang’, ‘ring’ → ‘rang’, and ‘spring’ → ‘sprang’.

Decision tree algorithms were applied to this task and found to significantly outperform previous neural-network models in terms of producing correct past-tense forms for independent test words ( Ling and Marinov 1993 ; Ling 1994 ). In this study, verbs were restricted to 15 phonemes encoded using the UNIBET ASCII standard, and 15 separate trees were induced, one for producing each of the output phoneme positions using all 15 of the input phonemes as features. Below is the encoding for the mapping ‘act’ → ‘acted’, where underscore is used to represent a blank.

Input:  & k t _ _ _ _ _ _ _ _ _ _ _ _
Output: & k t I d _ _ _ _ _ _ _ _ _ _

ILP rule-learning algorithms have also been applied to this problem and shown to outperform decision trees ( Mooney and Califf 1995 ). In this case, a definition for the predicate Past(X,Y) was learned for mapping an unbounded list of UNIBET phonemes to a corresponding list for the past tense (e.g. Past([&,k,t],[&,k,t,I,d])) using a predicate for appending lists as part of the background. A definition was learned in the form of a decision list in which rules are ordered and the first rule that applies is chosen. This allows first checking for exceptional cases and falling through to a default rule if none apply. The ILP system learns a very concise and comprehensible definition for the past-tense transformation using this approach. Similar ILP methods have also been applied to learning morphology in other European languages ( Manandhar et al. 1998 ; Kazakov and Manandhar 1998 ; Kazakov and Manandhar 2001 ).

13.4.2 Part-of-Speech Tagging

Tagging each word with its appropriate part-of-speech (POS) based on context is a useful first step in syntactic analysis (see Chapter 23 ). In addition to statistical methods that have been successfully applied to this task, decision tree induction ( Màrquez et al. 1999 ), rule induction ( Brill 1995 ), and instance-based categorization ( Daelemans et al. 1996 ) have also been successfully used to learn POS taggers.

The features used to determine the POS of a word generally include the POS tags in a window of two to three words on either side. Since during testing these tags must also be determined by the classifier, either only the previous tags are used or an iterative procedure is used to repeatedly update all tags until convergence is reached. For known words, a dictionary provides a set of possible POS categories. For unknown words, all POS categories are possible but additional morphological features, such as the last few characters of the word and whether or not it is capitalized, are typically used as additional input features. Using such techniques, symbolic learning systems can obtain high accuracies similar to those obtained by other POS tagging methods, i.e. in the range of 96–97%.
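A minimal sketch of such a feature extractor, assuming strictly left-to-right tagging so that only previous tags are available (all names here are illustrative, not from any particular tagger):

```python
# Feature vector for tagging word i: previous tags in the window plus
# morphological cues, with a dictionary restricting known words.

def pos_features(words, prev_tags, i, dictionary):
    word = words[i]
    return {
        'prev_tag_1': prev_tags[i - 1] if i >= 1 else '<s>',
        'prev_tag_2': prev_tags[i - 2] if i >= 2 else '<s>',
        # morphological features, mainly useful for unknown words
        'capitalized': word[0].isupper(),
        'suffix_2': word[-2:],
        'suffix_3': word[-3:],
        # known words are restricted to their dictionary categories
        'candidates': tuple(dictionary.get(word.lower(), ('ANY',))),
    }

feats = pos_features(['The', 'dogs', 'barked'], ['DT', 'NNS'], 2,
                     {'the': ('DT',), 'dogs': ('NNS',)})
```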

13.4.3 Word-Sense Disambiguation

As illustrated by the ‘interest’ problem introduced earlier, machine learning methods can be applied to determining the sense of an ambiguous word based on context (see Chapter 27 ). As also illustrated by this example, a variety of features can be used as helpful cues for this task. In particular, collocational features that specify words that appear in specific locations a few words before or after the ambiguous word are useful features, as are binary features indicating the presence of particular words anywhere in the current or previous sentence. Other potentially useful features include the parts-of-speech of nearby words, and general syntactic information, such as whether an ambiguous noun appears as a subject, direct object, or indirect object.
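These two feature types can be sketched as follows (a hypothetical extractor; the function and feature names are assumptions, not from any particular system):

```python
# Collocational features at fixed offsets around the target word, plus
# binary presence features over the surrounding context.

def wsd_features(tokens, target_index, window=2):
    feats = {}
    # collocations: the word at each specific nearby position
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = target_index + offset
        feats[f'word_at_{offset:+d}'] = tokens[j] if 0 <= j < len(tokens) else None
    # presence features: any word anywhere in the context
    for tok in set(tokens) - {tokens[target_index]}:
        feats[f'has({tok})'] = True
    return feats

f = wsd_features(['the', 'interest', 'rate', 'rose'], 1)
```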

Instance-based methods have been applied to disambiguating a variety of words using a combination of all of these types of features ( Ng and Lee 1996 ). A feature-weighted version of nearest neighbour was used to disambiguate 121 different nouns and 70 verbs chosen for being both frequent and highly ambiguous. Fine-grained senses from WordNet were used, resulting in an average of 7.8 senses for the nouns and 12 senses for the verbs. The training set consisted of 192,800 instances of these words found in text sampled from the Brown corpus and the Wall Street Journal and labelled with correct senses. Testing on an independent set of 14,139 examples from the Wall Street Journal gave an accuracy of 68.6% compared to an accuracy of 63.7% from choosing the most common sense, a standard baseline for comparison. Since WordNet is known for making fine sense distinctions, these results may seem somewhat low. For some easier problems the results were more impressive, such as disambiguating ‘interest’ into one of six fairly fine-grained senses with an accuracy of 90%.
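The exemplar-based approach can be illustrated with a minimal feature-weighted nearest-neighbour classifier (the overlap metric and the weights below are simple illustrative choices, not Ng and Lee's exact scheme):

```python
# Feature-weighted nearest neighbour over symbolic features: a
# mismatch on a feature costs that feature's weight.

def weighted_distance(x, y, weights):
    return sum(w * (x[f] != y[f]) for f, w in weights.items())

def classify(query, training, weights):
    _, sense = min(training,
                   key=lambda ex: weighted_distance(query, ex[0], weights))
    return sense

train = [({'left': 'bank', 'right': 'rate'}, 'money'),
         ({'left': 'great', 'right': 'in'}, 'attention')]
weights = {'left': 1.0, 'right': 2.0}   # 'right' context weighted higher
sense = classify({'left': 'low', 'right': 'rate'}, train, weights)
```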

Decision tree and rule induction have also been applied to sense disambiguation. Figure 13.1 shows results for disambiguating the word ‘line’ into one of six senses using only binary features representing the presence or absence of all known words in the current and previous sentence ( Mooney 1996 ). Tree learning (C4.5), rule learning (PFOIL), and nearest neighbour perform comparably on this task and somewhat worse than simple neural network (Perceptron) and statistical (Naive Bayes) methods. A more recent project presents results on learning decision trees to disambiguate all content words in a financial corpus with an average accuracy of 77% ( Paliouras et al. 1999 ). Additional information on supervised learning approaches to word-sense disambiguation can be found in Chapter 27 .

In addition, unsupervised learning has been used to automatically discover or induce word senses from unannotated text by clustering featural descriptions of the contexts in which a particular word appears. As illustrated in Figure 13.8 for the word ‘interest’, clustering word occurrences can automatically create groups of instances that form coherent senses. Such discovered senses can be evaluated by comparing them to dictionary senses created by lexicographers. Variations of both HAC and k-means clustering have been used to induce word senses and have been shown to produce meaningful distinctions that agree with traditional senses (Schütze 1998; Purandare and Pedersen 2004).
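A minimal k-means sketch over bag-of-words context vectors illustrates the idea (the contexts and three-word vocabulary are toy assumptions; real systems cluster large corpora with richer features):

```python
# Minimal k-means over context vectors as a sketch of unsupervised
# sense induction for an ambiguous word.

def kmeans(vectors, k, iters=10):
    centroids = vectors[:k]                  # naive initialization
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(v, centroids[c])))
            clusters[nearest].append(v)
        centroids = [tuple(sum(col) / len(cl) for col in zip(*cl))
                     if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return clusters

# dimensions: counts of 'rate', 'bank', 'hobby' near 'interest'
contexts = [(2, 1, 0), (3, 2, 0), (0, 0, 2), (0, 1, 3)]
clusters = kmeans(contexts, 2)
```

The financial contexts and the hobby contexts end up in separate clusters, which is exactly the kind of grouping that is then compared against lexicographers' senses.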

13.4.4 Syntactic Parsing

Perhaps the most well-studied problem in computational linguistics is the syntactic analysis of sentences (see Chapters 4 and 25 ). In addition to statistical methods that have been successfully applied to this task, decision tree induction ( Magerman 1995 ; Hermjakob and Mooney 1997 ; Haruno et al. 1999 ), rule induction ( Brill 1993 ), and instance-based categorization ( Cardie 1993 ; Argamon et al. 1998 ) have also been successfully employed to learn syntactic parsers.

One of the first learning methods applied to parsing the Wall Street Journal (WSJ) corpus of the Penn treebank ( Marcus et al. 1993 ) employed statistical decision trees ( Magerman 1995 ). Using a set of features describing the local syntactic context, including the POS tags of nearby words and the node labels of neighbouring (previously constructed) constituents, decision trees were induced for determining the next parsing operation. Instead of growing the tree to completely fit the training data, pruning was used to create leaves for subsets that still contained a mixture of classes. These leaves were then labelled with class probability distributions estimated from the subset of the training data reaching that leaf. During testing, the system performs a search for the highest probability parse, where the probability of a parse is estimated by the product of the probabilities of its individual parsing actions (as determined by the decision tree). After training on approximately 40,000 WSJ sentences and testing on 1,920 additional ones, the system obtained a labelled precision (percentage of constructed constituents whose span and grammatical phrase label are both correct) of 84.5% and labelled recall (percentage of actual constituents that were found with both the correct span and grammatical label) of 84.0%.
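The scoring scheme can be shown directly: a parse's probability is estimated as the product of the probabilities the decision tree assigns to its individual actions (the action probabilities below are invented for illustration):

```python
# A parse is scored by multiplying the decision tree's probability
# estimates for each of its parsing actions; the search keeps the
# highest-scoring derivation.

from math import prod

def parse_probability(action_probs):
    return prod(action_probs)

candidate_a = [0.9, 0.8, 0.7]    # three actions, P = 0.504
candidate_b = [0.95, 0.5, 0.6]   # P = 0.285
best = max([candidate_a, candidate_b], key=parse_probability)
```

In practice such systems sum log-probabilities rather than multiplying, to avoid underflow on long action sequences.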

13.4.5 Semantic Parsing

Learning methods have also been applied to mapping sentences directly into logical form (see Chapter 5 ) by inducing a parser from training examples consisting of sentences paired with semantic representations. Below is a sample training pair from an application involving English queries about a database of US geography:

What is the capital of the state with the highest population?
answer(C, (capital(S,C), largest(P, (state(S), population(S,P))))).

Unfortunately, since constructing useful semantic representations for sentences is very difficult unless restricted to a fairly specific application, there is a noticeable lack of large corpora annotated with detailed semantic representations.

However, ILP has been used to induce domain-specific semantic parsers written in Prolog from examples of natural-language questions paired with logical Prolog queries (Zelle and Mooney 1996; Ng and Zelle 1997). In this project, parser induction is treated as a problem of learning rules to control the actions of a generic shift-reduce parser. During parsing, the current context is maintained in a stack and a buffer containing the remaining input. When parsing is complete, the stack contains the representation of the input sentence. There are three types of operators used to construct logical forms. One is the introduction onto the stack of a predicate needed in the sentence representation due to the appearance of a word or phrase at the front of the input buffer. A second type of operator unifies variables appearing in different stack items. Finally, an item on the stack may be embedded as an argument of another one. ILP is used to learn the conditions under which each of these operators should be applied, using the complete stack and input buffer as context, so that the resulting parser deterministically produces the correct semantic output for all of the training examples.
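A toy trace of the three operator types on the parser's stack may help (the structures and predicate names are simplified illustrations of the scheme described above, not the system's actual code):

```python
# Toy shift-reduce trace: predicates are introduced from the input
# buffer onto the stack, then combined into the final logical form.

stack, buffer = [], ['what', 'capital', 'texas']

def introduce(predicate):
    """Operator 1: push a predicate triggered by the front word."""
    buffer.pop(0)
    stack.append(predicate)

introduce(('answer', 'C'))           # triggered by 'what'
introduce(('capital', 'S', 'C'))     # triggered by 'capital'
introduce(('const', 'S', 'texas'))   # triggered by 'texas'

# Operators 2 and 3: unify shared variables across stack items, then
# embed items as arguments of another; here the three items collapse
# into a single representation of the query.
stack = [('answer', 'C', (('capital', 'S', 'C'), ('const', 'S', 'texas')))]
```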

This technique has been used to induce natural-language interfaces to several database query systems, such as the US geography application illustrated above. In one experiment using a corpus of 250 queries annotated with logical form, after training on 225 examples, the system was able to answer an average of 70% of novel queries correctly compared to 57% for an interface developed by a human programmer. Similar results were obtained for semantic parsing of other languages after translating the corpus into Spanish, Turkish, and Japanese ( Thompson and Mooney 1999 ). More recently, a variety of statistical learning methods have also been used to learn even more accurate semantic parsers in multiple languages ( Zettlemoyer and Collins 2005 ; Mooney 2007 ; Lu et al. 2008 ).

13.4.6 Information Extraction

Information extraction is a form of shallow text processing that locates a specified set of relevant items in a natural-language document (see Chapter 38 ). Figure 13.11 shows an example of extracting values for a set of labelled slots from a job announcement posted to an Internet newsgroup. Information extraction systems require significant domain-specific knowledge and are time-consuming and difficult to build by hand, making them a good application for machine learning.

Figure 13.11 Information extraction example

A number of rule induction methods have recently been applied to learning patterns for information extraction ( Freitag 1998 ; Soderland 1999 ; Califf and Mooney 1999 ). Given training examples of texts paired with filled templates, such as that shown in Figure 13.11 , these systems learn pattern-matching rules for extracting the appropriate slot fillers from text. Some systems assume that the text has been preprocessed by a POS tagger or a syntactic parser; others use only patterns based on unprocessed text. Figure 13.12 shows a sample rule constructed for extracting the transaction amount from a newswire article about corporate acquisition ( Califf and Mooney 1999 ). This rule extracts the value ‘undisclosed’ from phrases such as ‘sold to the bank for an undisclosed amount’ or ‘paid Honeywell an undisclosed price’. The pre-filler pattern consists of two pattern elements: (1) a word whose POS is noun or proper noun, and (2) a list of at most two unconstrained words. The filler pattern requires the word ‘undisclosed’ tagged as an adjective. The post-filler pattern requires a word in the WordNet semantic category ‘price’.
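The pre-filler/filler/post-filler scheme can be sketched as a small matcher over (word, POS tag, semantic class) triples (an illustrative approximation of the rule format, not the system's actual pattern language):

```python
# Matcher for the sample rule: pre-filler = a noun plus at most two
# free words; filler = 'undisclosed' tagged as an adjective;
# post-filler = a word in the semantic class 'price'.

def match_rule(tokens, i):
    if i >= len(tokens) or tokens[i][1] not in ('NN', 'NNP'):
        return None                      # pre-filler: noun required
    j = i + 1
    for _ in range(2):                   # skip up to two free words
        if j < len(tokens) and tokens[j][0] != 'undisclosed':
            j += 1
    if j >= len(tokens) or tokens[j][:2] != ('undisclosed', 'JJ'):
        return None                      # filler constraint
    if j + 1 < len(tokens) and tokens[j + 1][2] == 'price':
        return tokens[j][0]              # post-filler constraint
    return None

sent = [('paid', 'VBD', None), ('Honeywell', 'NNP', None),
        ('an', 'DT', None), ('undisclosed', 'JJ', None),
        ('price', 'NN', 'price')]

filler = None
for i in range(len(sent)):
    filler = match_rule(sent, i)
    if filler:
        break
```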

Such systems have acquired extraction rules for a variety of domains, including apartment ads, university web pages, seminar announcements, terrorist news stories, and job announcements. After training on a couple of hundred examples, such systems are generally able to learn rules as accurate as those resulting from a time-consuming human development effort. The standard metrics for evaluating information extraction are precision , the percentage of extracted fillers that are correct, and recall , the percentage of correct fillers that are successfully extracted. On most tasks that have been studied, current systems are generally able to achieve precisions in the mid 80% range and recalls in the mid 60% range.
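Both metrics can be computed directly from the sets of extracted and correct fillers (the filler values here are invented toy data):

```python
# Precision: fraction of extracted fillers that are correct.
# Recall: fraction of correct fillers that were extracted.

def precision_recall(extracted, correct):
    true_positives = len(extracted & correct)
    return true_positives / len(extracted), true_positives / len(correct)

extracted = {'undisclosed', '40,000', 'Austin'}
correct = {'undisclosed', '40,000', 'Boston', 'programmer'}
p, r = precision_recall(extracted, correct)   # p = 2/3, r = 1/2
```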

Figure 13.12 Sample learned information extraction rule

13.4.7 Anaphora Resolution

Resolving anaphora, or identifying multiple phrases that refer to the same entity, is another difficult language-processing problem (see Chapter 30 ). Anaphora resolution can be treated as a categorization problem by classifying pairs of phrases as either co-referring or not. Given a corpus of texts tagged with co-referring phrases, positive examples can be generated as all co-referring phrase pairs and negative examples as all phrase pairs within the same document that are not marked as co-referring. Both decision tree ( Aone and Bennett 1995 ; McCarthy and Lehnert 1995 ) and instance-based methods ( Cardie 1992 ) have been successfully applied to resolving various types of anaphora.
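The construction of training examples can be sketched as follows (a simplified version of the pair-generation scheme described above; the mention chains are toy data):

```python
# Turn annotated coreference chains into classification examples:
# pairs within a chain are positive, pairs across chains are negative.

from itertools import combinations

def make_pairs(chains):
    chain_of = {m: i for i, chain in enumerate(chains) for m in chain}
    mentions = [m for chain in chains for m in chain]
    positives, negatives = [], []
    for a, b in combinations(mentions, 2):
        (positives if chain_of[a] == chain_of[b] else negatives).append((a, b))
    return positives, negatives

chains = [('Mr. Smith', 'he', 'the chairman'), ('the company', 'it')]
pos, neg = make_pairs(chains)
```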

In particular, decision tree induction has been used to construct systems for general noun phrase co-reference resolution. Examples are described using features of the individual phrases, such as the semantic and syntactic category of the head noun, as well as features describing the relationship between the two phrases, such as whether the first phrase precedes the second and whether the semantic class of the first phrase subsumes that of the second. In one experiment (Aone and Bennett 1995), after training on 1,971 anaphora from 295 texts and testing on 1,359 anaphora from an additional 200 texts, the learned decision tree obtained a precision (percentage of co-references found that were correct) of 86.7% and a recall (percentage of true co-references that were found) of 69.7%. These results were superior to those obtained using a previous, hand-built co-reference procedure (precision 72.9%, recall 66.5%).

Ng (2010) provides a survey of supervised machine learning approaches to coreference resolution.

Unsupervised learning has also been applied to anaphora resolution. By clustering the phrases in a document that describe entities, co-references can be determined without using any annotated training data. Such an unsupervised clustering approach to anaphora resolution has been shown to be competitive with supervised approaches (Cardie and Wagstaff 1999). Ng (2008) describes a generative model for unsupervised coreference resolution that views coreference as an expectation-maximization (EM) clustering process.

Further studies of anaphora and coreference resolution employing machine learning techniques are outlined in Chapter 30 of this Handbook.

13.5 Further Reading and Relevant Resources

Introductory textbooks on machine learning include Mitchell (1997), Langley (1996), and Bishop (2006). Russell and Norvig's (2016) textbook on artificial intelligence features chapters covering, or relevant to, machine learning. An online course is available at Coursera (https://www.coursera.org/learn/machine-learning). The major conference in machine learning is the International Conference on Machine Learning (ICML), organized by the International Machine Learning Society (IMLS; <http://www.machinelearning.org/>). Other relevant conferences include the Conference on Neural Information Processing Systems (NeurIPS), which covers machine learning and computational neuroscience and is held every December, and the International Conference on Learning Representations (ICLR), whose focus is representation learning (see also Chapter 14 of this volume). Papers on machine learning for computational linguistics are regularly published at the Annual Meeting of the Association for Computational Linguistics (ACL; <http://www.aclweb.org/>), as well as at the Conference on Empirical Methods in Natural Language Processing (EMNLP), organized by the ACL Special Interest Group on Linguistic Data (SIGDAT; <http://www.cs.jhu.edu/~yarowsky/sigdat.html>), and the Conference on Computational Natural Language Learning (CoNLL), organized by the ACL Special Interest Group on Natural Language Learning (SIGNLL; <http://ifarm.nl/signll/>). The Journal of Machine Learning Research (JMLR) is a major journal in the field. For online resources and communities, Kaggle (www.kaggle.com) is the largest online community of machine learning practitioners, statisticians, and data miners. ‘Deep learning’ is a neural-network approach that is attracting significant attention; more information is available at <http://deeplearning.net/> (see also Chapter 15 of this volume). Recent applications of neural networks to machine translation (Sutskever et al. 2014) and lexical semantics (Mikolov et al. 2013) are particularly notable.

Aha, David W. , Dennis F. Kibler , and Marc K. Albert ( 1991 ). ‘ Instance-Based Learning Algorithms ’, Machine Learning , 6(1): 37–66.

Aone, Chinatsu and Scott W. Bennett (1995). ‘Evaluating Automated and Manual Acquisition of Anaphora Resolution Strategies’. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (ACL-95) , Cambridge, MA, 122–129. Stroudsburg, PA: Association for Computational Linguistics.

Argamon, Shlomo , Ido Dagan , and Yuval Krymolowski (1998). ‘A Memory-Based Approach to Learning Shallow Natural Language Patterns’. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics (joint with the 17th International Conference on Computational Linguistics) (ACL/COLING-98) , Montreal, Quebec, 67–73. Stroudsburg, PA: Association for Computational Linguistics.

Bishop, Christopher M. ( 2006 ). Pattern Recognition and Machine Learning . New York: Springer.

Bratko, Ivan and Stephen Muggleton ( 1995 ). ‘ Applications of Inductive Logic Programming ’, Communications of the Association for Computing Machinery , 38(11): 65–70.

Brill, Eric (1993). ‘Automatic Grammar Induction and Parsing Free Text: A Transformation-Based Approach’. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics (ACL-93) , Columbus, OH, 259–265. Stroudsburg, PA: Association for Computational Linguistics.

Brill, Eric ( 1995 ). ‘ Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging ’, Computational Linguistics , 21(4): 543–565.

Califf, Mary Elaine and Raymond J. Mooney (1999). ‘Relational Learning of Pattern-Match Rules for Information Extraction’. In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99) , Orlando, FL, 328–334. Palo Alto, CA: AAAI Press.

Cardie, Claire (1992). ‘Learning to Disambiguate Relative Pronouns’. In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92) , San Jose, CA, 38–43. Palo Alto, CA: AAAI Press.

Cardie, Claire (1993). ‘A Case-Based Approach to Knowledge Acquisition for Domain-Specific Sentence Analysis’. In Proceedings of the Eleventh National Conference on Articial Intelligence (AAAI-93) , Washington, DC, 798–803. Palo Alto, CA: AAAI Press.

Cardie, Claire and Kiri Wagstaff (1999). ‘Noun Phrase Coreference as Clustering’. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-99) , University of Maryland, 82–89. San Francisco, CA: Morgan Kaufmann.

Clark, Peter and Tim Niblett ( 1989 ). ‘ The CN2 Induction Algorithm ’, Machine Learning , 3: 261–284.

Cohen, William W. (1995). ‘Fast Effective Rule Induction’. In Proceedings of the Twelfth International Conference on Machine Learning (ICML-95) , 115–123. San Francisco, CA: Morgan Kaufmann.

Cost, Scott and Steven Salzberg ( 1993 ). ‘ A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features ’, Machine Learning , 10(1): 57–78.

Cover, Thomas M. and Peter E. Hart ( 1967 ). ‘ Nearest Neighbor Pattern Classification ’, IEEE Transactions on Information Theory , 13: 21–27.

Daelemans, Walter , Jakub Zavrel , Peter Berck , and Steven Gillis ( 1996 ). ‘ MBT: A Memory-Based Part of Speech Tagger-Generator ’. In Proceedings of the Fourth Workshop on Very Large Corpora , 14–27. Copenhagen: ACL SIGDAT.

Dhillon, Inderjit S. and Dharmendra S. Modha ( 2001 ). ‘ Concept Decompositions for Large Sparse Text Data Using Clustering ’, Machine Learning , 42: 143–175.

Fayyad, Usama M. , Gregory Piatetsky-Shapiro , and Padhraic Smyth ( 1996 ). ‘From Data Mining to Knowledge Discovery’. In Usama M. Fayyad , Gregory Piatetsky-Shapiro , Padhraic Smyth , and Ramasamy Uthurusamy (eds), Advances in Knowledge Discovery and Data Mining , 1–34. Cambridge, MA: MIT Press.

Freitag, Dayne (1998). ‘Toward General-Purpose Learning for Information Extraction’. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics (joint with the 17th International Conference on Computational Linguistics) (ACL/COLING-98) , Montreal, Quebec, 404–408. Stroudsburg, PA: Association for Computational Linguistics.

Haruno, Masahiko , Satoshi Shirai , and Yoshifumi Ooyama ( 1999 ). ‘ Using Decision Trees to Construct a Practical Parser ’, Machine Learning , 34: 131–150.

Hermjakob, Ulf and Raymond J. Mooney (1997). ‘Learning Parse and Translation Decisions from Examples with Rich Context’. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL-97) , Madrid, Spain, 482–489. Stroudsburg, PA: Association for Computational Linguistics.

Jain, Anil K. , M. N. Murty , and Patrick J. Flynn ( 1999 ). ‘ Data Clustering: A Review ’, ACM Computing Surveys , 31(3): 264–323.

Kazakov, Dimitar and Suresh Manandhar (1998). ‘A Hybrid Approach to Word Segmentation’. In Proceedings of the 8th International Workshop on Inductive Logic Programming (ILP-98) , 125–134. London: Springer.

Kazakov, Dimitar and Suresh Manandhar ( 2001 ). ‘ Unsupervised learning of word segmentation rules with genetic algorithms and inductive logic programming ’, Machine Learning , 43, 121–162.

Langley, Pat ( 1996 ). Elements of Machine Learning . San Francisco, CA: Morgan Kaufmann.

Lavrač, Nada and Saso Džeroski ( 1994 ). Inductive Logic Programming: Techniques and Applications . New York: Ellis Horwood.

Ling, Charles X. ( 1994 ). ‘ Learning the Past Tense of English Verbs: The Symbolic Pattern Associator vs. Connectionist Models ’, Journal of Artificial Intelligence Research , 1: 209–229.

Ling, Charles X. and Marin Marinov ( 1993 ). ‘ Answering the Connectionist Challenge: A Symbolic Model of Learning the Past Tense of English Verbs ’, Cognition , 49(3): 235–290.

Lu, Wei , Hwee Tou Ng , Wee Sun Lee , and Luke S. Zettlemoyer (2008). ‘A Generative Model for Parsing Natural Language to Meaning Representations’. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP-08) , Honolulu, HI. Stroudsburg, PA: Association for Computational Linguistics.

MacWhinney, Brian and Jared Leinbach ( 1991 ). ‘ Implementations Are Not Conceptualizations: Revising the Verb Model ’, Cognition , 40: 291–296.

Magerman, David M. (1995). ‘Statistical Decision-Tree Models for Parsing’. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (ACL-95) , Cambridge, MA, 276–283. Stroudsburg, PA: Association for Computational Linguistics.

Manandhar, Suresh , Saso Dzeroski , and Tomaz Erjavec (1998). ‘Learning Multilingual Morphology with CLOG’. In Proceedings of the 8th International Workshop on Inductive Logic Programming (ILP-98) , 135–144. London: Springer.

Manning, Christopher D. , Prabhakar Raghavan , and Hinrich Schütze ( 2008 ). Introduction to Information Retrieval . Cambridge: Cambridge University Press.

Manning, Christopher D. and Hinrich Schütze ( 1999 ). Foundations of Statistical Natural Language Processing . Cambridge, MA: MIT Press.

Marcus, Mitchell , Beatrice Santorini , and Mary Ann Marcinkiewicz ( 1993 ). ‘ Building a Large Annotated Corpus of English: The Penn Treebank ’, Computational Linguistics , 19(2): 313–330.

Màrquez, Lluís , Lluís Padró , and Horacio Rodríguez ( 1999 ). ‘A Machine Learning Approach to POS Tagging’, Machine Learning , 39(1): 59–91.

McCarthy, John and Wendy Lehnert (1995). ‘Using Decision Trees for Coreference Resolution’. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI-95) , 1050–1055. San Francisco, CA: Morgan Kaufmann.

Mikolov, Tomas , Kai Chen , Greg Corrado , and Jeffrey Dean (2013). ‘Efficient Estimation of Word Representations in Vector Space’. In International Conference on Learning Representations (ICLR) .

Mitchell, Tom ( 1997 ). Machine Learning . New York: McGraw-Hill.

Mooney, Raymond J. (1996). ‘Comparative Experiments on Disambiguating Word Senses: An Illustration of the Role of Bias in Machine Learning’. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-96) , Philadelphia, PA, 82–91. Stroudsburg, PA: Association for Computational Linguistics.

Mooney, Raymond J. ( 2007 ). ‘Learning for Semantic Parsing’. In A. Gelbukh (ed.), Computational Linguistics and Intelligent Text Processing: Proceedings of the 8th International Conference (CICLing 2007) , Mexico City, 311–324. Berlin: Springer.

Mooney, Raymond J. and Mary Elaine Califf ( 1995 ). ‘ Induction of First-Order Decision Lists: Results on Learning the Past Tense of English Verbs ’, Journal of Artificial Intelligence Research , 3: 1–24.

Ng, Hwee Tou and Hian Beng Lee (1996). ‘Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-Based Approach’. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (ACL-96) , Santa Cruz, CA, 40–47. Stroudsburg, PA: Association for Computational Linguistics.

Ng, Hwee Tou and John Zelle ( 1997 ). ‘ Corpus-Based Approaches to Semantic Interpretation in Natural Language Processing ’, AI Magazine , 18(4): 45–64.

Ng, Vincent (2008). ‘Unsupervised Models for Coreference Resolution’. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) , Honolulu, HI, 640–649.

Ng, Vincent (2010). ‘Supervised Noun Phrase Coreference Resolution: The First Fifteen Years’. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL) , Uppsala, Sweden, 1396–1411.

Paliouras, Georgios , Vangelis Karkaletsis , and Constantine D. Spyropoulos (1999). ‘Learning Rules for Large Vocabulary Word Sense Disambiguation’. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99) , Stockholm, Sweden, 674–679. San Francisco, CA: Morgan Kaufmann.

Pinker, Steven and Alan Prince ( 1988 ). ‘On Language and Connectionism: Analysis of a Parallel Distributed Model of Language Acquisition’. In Steven Pinker and Jacques A. Mehler (eds), Connections and Symbols , 73–193. Cambridge, MA: MIT Press.

Purandare, Amruta and Ted Pedersen (2004). ‘SenseClusters—Finding Clusters that Represent Word Senses’. In Proceedings of Human Language Technology Conference/North American Association for Computational Linguistics Annual Meeting (HLT-NAACL-2004) , Boston, MA, 26–29. Palo Alto, CA: AAAI Press.

Quinlan, J. Ross ( 1986 ). ‘ Induction of Decision Trees ’, Machine Learning , 1(1): 81–106.

Quinlan, J. Ross ( 1990 ). ‘ Learning Logical Definitions from Relations ’, Machine Learning , 5(3): 239–266.

Quinlan, J. Ross ( 1993 ). C4.5: Programs for Machine Learning . San Mateo, CA: Morgan Kaufmann.

Quinlan, J. Ross (1996). ‘Bagging, Boosting, and C4.5’. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96) , Portland, OR, 725–730. Palo Alto, CA: AAAI Press.

Rumelhart, David E. and James L. McClelland ( 1986 ). ‘On Learning the Past Tense of English Verbs’. In David E. Rumelhart and James L. McClelland (eds), Parallel Distributed Processing , vol. II, 216–271. Cambridge, MA: MIT Press.

Russell, Stuart and Peter Norvig ( 2016 ). Artificial Intelligence: A Modern Approach (3rd, global edition). Harlow, England: Pearson.

Schütze, Hinrich ( 1998 ). ‘ Automatic Word Sense Discrimination ’, Computational Linguistics , 24(1): 97–123.

Soderland, Stephen ( 1999 ). ‘ Learning Information Extraction Rules for Semi-Structured and Free Text ’, Machine Learning , 34: 233–272.

Stanfill, Craig and David L. Waltz ( 1986 ). ‘ Toward Memory-Based Reasoning ’, Communications of the Association for Computing Machinery , 29: 1213–1228.

Sutskever, Ilya , Oriol Vinyals , and Quoc V. Le ( 2014 ). ‘Sequence to Sequence Learning with Neural Networks’. In Z. Ghahramani , M. Welling , C. Cortes , N. D. Lawrence , and K. Q. Weinberger (eds), Advances in Neural Information Processing Systems 27: 28th Annual Conference on Neural Information Processing Systems (NIPS) , 2690–2698. New York: Curran Associates.

Thompson, Cynthia A. and Raymond J. Mooney ( 1999 ). ‘Automatic Construction of Semantic Lexicons for Learning Natural Language Interfaces’. In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99) , Orlando, FL, 487–493. Palo Alto, CA: AAAI Press.

Zelle, John M. and Raymond J. Mooney ( 1996 ). ‘Learning to Parse Database Queries Using Inductive Logic Programming’. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96) , Portland, OR, 1050–1055. Palo Alto, CA: AAAI Press.

Zettlemoyer, Luke S. and Michael Collins ( 2005 ). ‘Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars’. In Proceedings of 21st Conference on Uncertainty in Artificial Intelligence (UAI-2005) , 658–666. Edinburgh, Scotland: AUAI Press.

  • About Oxford Academic
  • Publish journals with us
  • University press partners
  • What we publish
  • New features  
  • Open access
  • Institutional account management
  • Rights and permissions
  • Get help with access
  • Accessibility
  • Advertising
  • Media enquiries
  • Oxford University Press
  • Oxford Languages
  • University of Oxford

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide

  • Copyright © 2024 Oxford University Press
  • Cookie settings
  • Cookie policy
  • Privacy policy
  • Legal notice

This Feature Is Available To Subscribers Only

Sign In or Create an Account

This PDF is available to Subscribers Only

For full access to this pdf, sign in to an existing account, or purchase an annual subscription.

Machine Learning & Data Science Foundations

Online Graduate Certificate

Be a Game Changer

Harness the power of big data with skills in machine learning and data science, your pathway to the ai workforce.

Organizations know how important data is, but they don’t always know what to do with the volume of data they have collected. That’s why Carnegie Mellon University designed the online Graduate Certificate in Machine Learning & Data Science Foundations; to teach technically-savvy professionals how to leverage AI and machine learning technology for harnessing the power of large scale data systems.   

Computer-Science Based Data Analytics

When you enroll in this program, you will learn foundational skills in computer programming, machine learning, and data science that will allow you to leverage data science in various industries including business, education, environment, defense, policy and health care. This unique combination of expertise will give you the ability to turn raw data into usable information that you can apply within your organization.  

Throughout the coursework, you will:

  • Practice mathematical and computational concepts used in machine learning, including probability, linear algebra, multivariate differential calculus, algorithm analysis, and dynamic programming.
  • Learn how to approach and solve large-scale data science problems.
  • Acquire foundational skills in solution design, analytic algorithms, interactive analysis, and visualization techniques for data analysis.

An online Graduate Certificate in Machine Learning & Data Science from Carnegie Mellon will expand your possibilities and prepare you for the staggering amount of data generated by today’s rapidly changing world. 

A Powerful Certificate. Conveniently Offered. 

The online Graduate Certificate in Machine Learning & Data Science Foundations is offered 100% online to help computer science professionals conveniently fit the program into their busy day-to-day lives. In addition to a flexible, convenient format, you will experience the same rigorous coursework for which Carnegie Mellon University’s graduate programs are known. 

For Today’s Problem Solvers

This leading certificate program is best suited for:

  • Industry professionals looking to deliver value to their companies by acquiring in-demand data science, AI, and machine learning skills. Graduates leave with the technical know-how to build machine learning models and the ability to analyze trends.
  • Recent computer science degree graduates seeking to expand their skill set and become even more marketable in a growing field. Over the past few years, data sets have grown tremendously. Today’s top companies need data science professionals who can leverage machine learning technology.   

At a Glance

Start Date May 2024

Application Deadlines Rolling Admissions

We are still accepting applications for a limited number of remaining spots to start in Summer 2024. Apply today to secure your space in the program.

Program Length 12 months

Program Format 100% online

Live-Online Schedule 1x per week for 90 minutes in the evening

Taught By School of Computer Science

Request Info

Questions? There are two ways to contact us: call 412-501-2686 or email [email protected] with your inquiries.

Program Name Change

To better reflect the emphasis on machine learning in the curriculum, the name of this certificate has been updated from Computational Data Science Foundations to Machine Learning & Data Science Foundations.

Although the name has changed, the course content, faculty, online experience, admissions requirements, and everything else has remained the same. Questions about the name change? Please contact us.

Looking for information about CMU's on-campus Master of Computational Data Science degree? Visit the program's website to learn more.  Admissions consultations with our team will only cover the online certificate program.

A National Leader in Computer Science

Carnegie Mellon University is world renowned for its technology and computer science programs. Our courses are taught by leading researchers in the fields of Machine Learning, Language Technologies, and Human-Computer Interaction. 


Number One in the nation for our artificial intelligence programs.


Number One in the nation for our programming language courses.


Number Four in the nation for the caliber of our computer science programs.

William & Mary

  • Arts & Sciences
  • Computer Science
  • About/Contact Us

2024 Virtual Summer Bootcamp on Deep Learning and its Applications: Register by May 20

This year's theme is "Exploring Fundamental Deep Learning Models and their Applications in Healthcare, Physics, and Autonomous Driving." The bootcamp is designed to equip both undergraduate and graduate students with the essential knowledge and skills in deep learning models, enabling them to apply these technologies in real-world scenarios. Participants will engage in lectures, hands-on tutorials, and projects to gain a comprehensive understanding of deep learning concepts and their practical applications. No background in machine learning is required. For the full schedule and further details, please visit our website: AI Bootcamp 2024. To register for the bootcamp, please fill out the form at: Registration Form.

If you have any questions, please do not hesitate to contact Prof. Huajie Shao at [[hshao]].

2024 Summer Bootcamp


Two key brain systems are central to psychosis, Stanford Medicine-led study finds

When the brain has trouble filtering incoming information and predicting what’s likely to happen, psychosis can result, Stanford Medicine-led research shows.

April 11, 2024 - By Erin Digitale


People with psychosis have trouble filtering relevant information (mesh funnel) and predicting rewarding events (broken crystal ball), creating a complex inner world. Emily Moskal

Inside the brains of people with psychosis, two key systems are malfunctioning: a “filter” that directs attention toward important external events and internal thoughts, and a “predictor” composed of pathways that anticipate rewards.

Dysfunction of these systems makes it difficult to know what’s real, manifesting as hallucinations and delusions. 

The findings come from a Stanford Medicine-led study , published April 11 in  Molecular Psychiatry , that used brain scan data from children, teens and young adults with psychosis. The results confirm an existing theory of how breaks with reality occur.

“This work provides a good model for understanding the development and progression of schizophrenia, which is a challenging problem,” said lead author  Kaustubh Supekar , PhD, clinical associate professor of psychiatry and behavioral sciences.

The findings, observed in individuals with a rare genetic disease called 22q11.2 deletion syndrome who experience psychosis as well as in those with psychosis of unknown origin, advance scientists’ understanding of the underlying brain mechanisms and theoretical frameworks related to psychosis.

During psychosis, patients experience hallucinations, such as hearing voices, and hold delusional beliefs, such as thinking that people who are not real exist. Psychosis can occur on its own and is a hallmark of certain serious mental illnesses, including bipolar disorder and schizophrenia. Schizophrenia is also characterized by social withdrawal, disorganized thinking and speech, and a reduction in energy and motivation.

It is challenging to study how schizophrenia begins in the brain. The condition usually emerges in teens or young adults, most of whom soon begin taking antipsychotic medications to ease their symptoms. When researchers analyze brain scans from people with established schizophrenia, they cannot distinguish the effects of the disease from the effects of the medications. They also do not know how schizophrenia changes the brain as the disease progresses. 

To get an early view of the disease process, the Stanford Medicine team studied young people aged 6 to 39 with 22q11.2 deletion syndrome, a genetic condition with a 30% risk for psychosis, schizophrenia or both. 


Kaustubh Supekar

Brain function in 22q11.2 patients who have psychosis is similar to that in people with psychosis of unknown origin, they found. And these brain patterns matched what the researchers had previously theorized was generating psychosis symptoms.

“The brain patterns we identified support our theoretical models of how cognitive control systems malfunction in psychosis,” said senior study author  Vinod Menon , PhD, the Rachael L. and Walter F. Nichols, MD, Professor; a professor of psychiatry and behavioral sciences; and director of the  Stanford Cognitive and Systems Neuroscience Laboratory .

Thoughts that are not linked to reality can capture the brain’s cognitive control networks, he said. “This process derails the normal functioning of cognitive control, allowing intrusive thoughts to dominate, culminating in symptoms we recognize as psychosis.”

Cerebral sorting  

Normally, the brain’s cognitive filtering system — aka the salience network — works behind the scenes to selectively direct our attention to important internal thoughts and external events. With its help, we can dismiss irrational thoughts and unimportant events and focus on what’s real and meaningful to us, such as paying attention to traffic so we avoid a collision.

The ventral striatum, a small brain region, and associated brain pathways driven by dopamine, play an important role in predicting what will be rewarding or important. 

For the study, the researchers assembled as much functional MRI brain-scan data as possible from young people with 22q11.2 deletion syndrome, totaling 101 individuals scanned at three different universities. (The study also included brain scans from several comparison groups without 22q11.2 deletion syndrome: 120 people with early idiopathic psychosis, 101 people with autism, 123 with attention deficit/hyperactivity disorder and 411 healthy controls.) 

The genetic condition, characterized by deletion of part of the 22nd chromosome, affects 1 in every 2,000 to 4,000 people. In addition to the 30% risk of schizophrenia or psychosis, people with the syndrome can also have autism or attention deficit hyperactivity disorder, which is why these conditions were included in the comparison groups.

The researchers used a type of machine learning algorithm called a spatiotemporal deep neural network to characterize patterns of brain function in all patients with 22q11.2 deletion syndrome compared with healthy subjects. With a cohort of patients whose brains were scanned at the University of California, Los Angeles, they developed an algorithmic model that distinguished brain scans from people with 22q11.2 deletion syndrome versus those without it. The model predicted the syndrome with greater than 94% accuracy. They validated the model in additional groups of people with or without the genetic syndrome who had received brain scans at UC Davis and Pontificia Universidad Católica de Chile, showing that in these independent groups, the model sorted brain scans with 84% to 90% accuracy.
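The study's spatiotemporal deep neural network is not described here in code, but the train-at-one-site, validate-at-independent-sites workflow described above can be sketched generically. The sketch below uses synthetic data and a simple nearest-centroid classifier purely for illustration; every variable name and number in it is an assumption, not the study's actual model or data:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_site(n, shift):
    # Two classes drawn from Gaussians separated along every feature axis;
    # `shift` mimics a small site-specific distribution shift (e.g. scanner
    # differences between institutions).
    x0 = rng.normal(loc=-1 + shift, scale=1.0, size=(n, 5))
    x1 = rng.normal(loc=+1 + shift, scale=1.0, size=(n, 5))
    X = np.vstack([x0, x1])
    y = np.array([0] * n + [1] * n)
    return X, y

# A "training site" and an independent "validation site".
X_train, y_train = make_site(200, shift=0.0)
X_val, y_val = make_site(200, shift=0.1)

# Nearest-centroid classifier: fit class means on the training site only.
c0 = X_train[y_train == 0].mean(axis=0)
c1 = X_train[y_train == 1].mean(axis=0)

def predict(X):
    # Assign each sample to the closer class centroid.
    d0 = np.linalg.norm(X - c0, axis=1)
    d1 = np.linalg.norm(X - c1, axis=1)
    return (d1 < d0).astype(int)

# Out-of-site accuracy: evaluated on data the model never saw.
acc = (predict(X_val) == y_val).mean()
print(f"out-of-site accuracy: {acc:.2f}")
```

Validating on cohorts scanned at other institutions, as the researchers did with the UC Davis and Chilean data, guards against the model learning scanner-specific quirks rather than genuine biological signal.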

The researchers then used the model to investigate which brain features play the biggest role in psychosis. Prior studies of psychosis had not given consistent results, likely because their sample sizes were too small. 


Vinod Menon

Comparing brain scans from 22q11.2 deletion syndrome patients who had and did not have psychosis, the researchers showed that the brain areas contributing most to psychosis are the anterior insula (a key part of the salience network or “filter”) and the ventral striatum (the “reward predictor”); this was true for different cohorts of patients.

In comparing the brain features of people with 22q11.2 deletion syndrome and psychosis against people with psychosis of unknown origin, the model found significant overlap, indicating that these brain features are characteristic of psychosis in general.

A second mathematical model, trained to distinguish all subjects with 22q11.2 deletion syndrome and psychosis from those who have the genetic syndrome but without psychosis, selected brain scans from people with idiopathic psychosis with 77.5% accuracy, again supporting the idea that the brain’s filtering and predicting centers are key to psychosis.

Furthermore, this model was specific to psychosis: It could not classify people with idiopathic autism or ADHD.

“It was quite exciting to trace our steps back to our initial question — ‘What are the dysfunctional brain systems in schizophrenia?’ — and to discover similar patterns in this context,” Menon said. “At the neural level, the characteristics differentiating individuals with psychosis in 22q11.2 deletion syndrome are mirroring the pathways we’ve pinpointed in schizophrenia. This parallel reinforces our understanding of psychosis as a condition with identifiable and consistent brain signatures.” However, these brain signatures were not seen in people with the genetic syndrome but no psychosis, holding clues to future directions for research, he added.

Applications for treatment or prevention

In addition to supporting the scientists’ theory about how psychosis occurs, the findings have implications for understanding the condition — and possibly preventing it.

“One of my goals is to prevent or delay development of schizophrenia,” Supekar said. The fact that the new findings are consistent with the team’s prior research on which brain centers contribute most to schizophrenia in adults suggests there may be a way to prevent it, he said. “In schizophrenia, by the time of diagnosis, a lot of damage has already occurred in the brain, and it can be very difficult to change the course of the disease.”

“What we saw is that, early on, functional interactions among brain regions within the same brain systems are abnormal,” he added. “The abnormalities do not start when you are in your 20s; they are evident even when you are 7 or 8.”

Our discoveries underscore the importance of approaching people with psychosis with compassion.

The researchers plan to use existing treatments, such as transcranial magnetic stimulation or focused ultrasound, targeted at these brain centers in young people at risk of psychosis, such as those with 22q11.2 deletion syndrome or with two parents who have schizophrenia, to see if they prevent or delay the onset of the condition or lessen symptoms once they appear. 

The results also suggest that using functional MRI to monitor brain activity at the key centers could help scientists investigate how existing antipsychotic medications are working. 

Although it’s still puzzling why someone becomes untethered from reality — given how risky it seems for one’s well-being — the “how” is now understandable, Supekar said. “From a mechanistic point of view, it makes sense,” he said.

“Our discoveries underscore the importance of approaching people with psychosis with compassion,” Menon said, adding that his team hopes their work not only advances scientific understanding but also inspires a cultural shift toward empathy and support for those experiencing psychosis. 

“I recently had the privilege of engaging with individuals from our department’s early psychosis treatment group,” he said. “Their message was clear and powerful: ‘We share more similarities than differences. Like anyone, we experience our own highs and lows.’ Their words were a heartfelt appeal for greater empathy and understanding toward those living with this condition. It was a call to view psychosis through a lens of empathy and solidarity.”

Researchers contributed to the study from UCLA, Clinica Alemana Universidad del Desarrollo, Pontificia Universidad Católica de Chile, the University of Oxford and UC Davis.

The study was funded by the Stanford Maternal and Child Health Research Institute’s Uytengsu-Hamilton 22q11 Neuropsychiatry Research Program, FONDECYT (the National Fund for Scientific and Technological Development of the government of Chile), ANID-Chile (the Chilean National Agency for Research and Development) and the U.S. National Institutes of Health (grants AG072114, MH121069, MH085953 and MH101779).

Erin Digitale

About Stanford Medicine

Stanford Medicine is an integrated academic health system comprising the Stanford School of Medicine and adult and pediatric health care delivery systems. Together, they harness the full potential of biomedicine through collaborative research, education and clinical care for patients. For more information, please visit med.stanford.edu .

Artificial intelligence

Exploring ways AI is applied to health care

Stanford Medicine Magazine: AI

