
CS7052 - Machine Learning (2023/24)

Module specification: Module approved to run in 2023/24
Module title: Machine Learning
Module level: Masters (07)
Credit rating for module: 20
School: School of Computing and Digital Media
Total study hours: 200
  52 hours: Assessment Preparation / Delivery
  100 hours: Guided independent study
  48 hours: Scheduled learning & teaching activities
Assessment components
Type | Weighting | Qualifying mark | Description
Coursework | 60% | | Coursework (2,500 words + artefact)
Unseen Examination | 40% | | Unseen exam (2 hours)
Running in 2023/24

(Please note that module timeslots are subject to change)
Period | Campus | Day | Time | Module Leader
Autumn semester | North | Wednesday | Morning |

Module summary

This module provides a comprehensive overview of machine learning and data analytics methods suitable for data analysis and Big Data analytics. It also develops practical skills in working with tools for data analysis and Big Data analytics, both on and off a Big Data platform, such as Python, R and Spark. The knowledge and skills gained apply to many tasks where data analysis is crucial to the competitiveness and effectiveness of a business: customer profiling, product recommendation, market trend analysis, cybersecurity, etc. Basic programming skills in a language used for numerical processing, such as Python or R, are an advantage.

Prior learning requirements

N/A

Syllabus

1. Machine Learning techniques for numerical and symbolic data. Classification, Regression, Clustering, Reduction.
2. Machine Learning techniques for text data, based on linguistic methods and the naive Bayes approach.
3. Using Python or other capable languages for analysing data. Data structures and control operations for streamlined data processing.
4. Data visualisation using Python or other capable languages, based on a visualisation library.
5. Machine learning using Python or other capable languages based on Machine Learning libraries such as TensorFlow and scikit-learn.
6. Natural language processing using Python or other capable languages based on Natural Language Processing libraries such as NLTK.
7. Machine Learning using Spark on Hadoop based on Machine Learning library MLlib.
8. Legal, ethical and professional issues and the impact of Machine Learning on society.
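
As an illustration of syllabus item 2, the following is a minimal sketch of a naive Bayes text classifier written in plain Python. It is not taken from the module materials: the function names, the add-one smoothing and the toy training sentences are all assumptions made for this example, and the workshops may well use library implementations instead.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in documents:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log posterior score for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, doc_count in label_counts.items():
        # Log prior: share of training documents carrying this label.
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data, invented purely for illustration.
training_data = [
    ("win a free prize now", "spam"),
    ("free offer claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]
model = train_naive_bayes(training_data)
prediction = predict(model, "claim your free prize")
```

Working in log space keeps the product of many small word probabilities from underflowing, which is why the scores are summed rather than multiplied.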

Balance of independent study and scheduled teaching activity

The module will combine lectures, which define the concepts, explain the methods and describe the tools, with practical hands-on workshops, which allow the students to apply the methods in real-life data analysis scenarios. The students will practise in the lab of the Cyber Security Research Centre, working on real datasets using the tools available.

Blended learning: the module uses the university’s VLE and online tools to deliver content, assessment and feedback, to encourage active learning and to enhance students’ engagement and learning experience.

Learning outcomes

After completing the module, the students should be able to choose suitable methods and available tools, and to construct algorithms, programs and components for data analysis using general programming languages, such as Python or R, and specialised tools for machine learning, such as the Python scikit-learn library or Spark.

LO1 Demonstrate a deep understanding of the different methods for data analysis and machine learning, and competently assess their advantages and limitations.
LO2 Develop the competence and confidence to choose suitable methods and tools for data analysis in various business scenarios to drive organisational success.
LO3 Display familiarity with the tools and technologies for analysing real-life datasets, such as Python, Spark and R.
LO4 Develop competent skills in data visualisation and its tools, such as matplotlib.
LO5 Understand the legal, ethical and professional issues and estimate the impact of Machine Learning on society.

Assessment strategy

Assessment is carried out through coursework and an unseen examination. The examination tests the students’ retention, understanding and insight drawn from the entire course (LO1-LO4).

The coursework focuses on applying different methods for machine learning and data analysis to industrial data obtained from public sources. The students will be required to collect data, write programs for data analysis in Python, R, Java and/or Scala, and visualise the results (LO1-LO5). Plagiarism will be limited because each student works on a different dataset and uses different tools for analysis. Some aspects of the coursework will also prepare students for their curricular projects.

The module will be passed on the aggregate mark of all assessment items.
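
The aggregate mark can be sketched as a weighted sum of the component marks. The snippet below is only an illustration: the weightings come from the assessment components listed in this specification (coursework 60%, unseen examination 40%), while the function name and the example marks are hypothetical. The pass threshold is not stated in this specification, so it is not encoded here.

```python
# Weightings (in percent) taken from the assessment components of this module.
WEIGHTS = {"coursework": 60, "exam": 40}


def aggregate_mark(marks):
    """Return the weighted aggregate of component marks, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS) / 100


# Hypothetical marks, invented for illustration only.
example = aggregate_mark({"coursework": 65, "exam": 50})
```

With these hypothetical marks the coursework contributes 39 points and the examination 20, giving an aggregate of 59.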

Bibliography

Textbooks:

https://londonmet.rl.talis.com/modules/cs7052

Core

1. Ethem Alpaydin. Introduction to Machine Learning. The MIT Press, Cambridge, Massachusetts and London, England, 3rd edition (2015). ISBN-10: 8120350782

2. Chris Albon. Machine Learning with Python Cookbook. O'Reilly (23 Mar. 2018). ISBN-13: 9781491989388

Additional
1. Igor Milovanovic, Dimitry Foures and Giuseppe Vettigli. Python Data Visualization Cookbook. Packt Publishing; 2nd revised edition (30 Nov. 2015). ISBN-10: 1784396699

2. Steven Bird, Ewan Klein and Edward Loper. Natural Language Processing with Python. O'Reilly Media; 1st edition (10 July 2009). ISBN-10: 0596516495

3. Holden Karau and Rachel Warren. High Performance Spark. O'Reilly; 1st edition (2 Jun. 2017). ISBN-13: 9781491943205

Handbooks
1. Trevor Hastie, Robert Tibshirani and Jerome Friedman (2009). The Elements of Statistical Learning - Data Mining, Inference, and Prediction. 2nd edition. ISBN-10: 0387848576

2. Andrew Kelleher and Adam Kelleher. Applied Machine Learning for Data Scientists and Software Engineers: Framing the First Steps Toward Successful Execution. Addison Wesley, 1st edition (2017). ISBN-10: 0134116542

3. Jake VanderPlas. Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly, 1st edition (2016). ISBN-10: 1491912057

4. Benjamin Bengfort, Rebecca Bilbro and Tony Ojeda. Applied Text Analysis with Python. O'Reilly; 1st edition (22 Jun. 2018). ISBN-13: 9781491963043

Journals:
IEEE Transactions on Evolutionary Computation
IEEE Transactions on Pattern Analysis and Machine Intelligence
IEEE Transactions on Neural Networks and Learning Systems

Websites:

Python: https://docs.python.org/3/tutorial/
scikit-learn: https://scikit-learn.org/stable/tutorial/index.html
NLTK: https://www.guru99.com/nltk-tutorial.html
Spark: https://spark.apache.org/docs/latest/quick-start.html
Spark: https://www.tutorialspoint.com/apache_spark/
MLlib: http://spark.apache.org/docs/1.2.1/mllib-guide.html
Matplotlib: https://matplotlib.org/tutorials/index.html
Pyplot: https://matplotlib.org/tutorials/introductory/pyplot.html