Module catalogue

CS7079 - Data Warehousing and Big Data (2020/21)

Module specification

Module approved to run in 2020/21

Module title

Data Warehousing and Big Data

Module level

Masters (07)

Credit rating for module

School

School of Computing and Digital Media

Total study hours

200

48 hours	Scheduled learning & teaching activities
50 hours	Assessment Preparation / Delivery
102 hours	Guided independent study

Assessment components

Type	Weighting	Qualifying mark	Description
Coursework	60%		Based on a given business scenario develop a data warehouse scheme, populate it with test data, demonstrate OLAP-basedda
Unseen Examination	40%		Show knowledge of the technologies and tools behind Data warehousing and Big Data processing, demonstrate skills for fea

Running in 2020/21

(Please note that module timeslots are subject to change)

Period	Campus	Day	Time	Module Leader
Autumn semester	North	Thursday	Morning

Module summary

The module aims to strengthen students’ skills in data technologies ranging from databases and data warehousing to Big Data. First, it will provide students with a good understanding of database concepts and database management systems with reference to modern enterprise-level database development. Having gained good skills in database development, students will be able to study and gain an in-depth understanding of data warehousing, including its concepts and analytical foundations as well as data warehouse development. Through intensive hands-on sessions, students will become familiar with related technological trends and developments in the field. The module will use a portfolio of SQL Server tools, including the SQL Server DBMS, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS) and SQL Server Analysis Services (SSAS), to provide hands-on experience of implementing a reporting solution through a combination of assignments and lab exercises.

The module also introduces the foundations of Big Data management based on the Apache Hadoop platform and provides students with a broad introduction to Big Data technologies. This will involve hands-on sessions designed for data analysts, business intelligence specialists, developers, administrators or anyone who wants to learn how to process and manage massive and complex data in order to infer knowledge from it. Topics include Hadoop, HDFS, MapReduce, Spark, Sqoop, Hive, Pig and MLlib.

Prior learning requirements

N/A

Syllabus

A brief outline of the indicative syllabus in narrative form identifying key subject areas to be addressed in discrete elements of the course

• Introduction to database models and system architecture. LO1

• Data processing using a DBMS; data definition and manipulation using SQL. LO1

• Introduction to data warehouse concepts and analytical foundations. LO2

• Data warehouse development; system architecture and data transformation; investigation of techniques for distributing and mining data. LO2

• An introduction to the Big Data technology stack, emerging trends and use cases where Big Data outperforms the traditional data warehouse. LO3

• An overview of the functional components of the Big Data technology stack, including open source tools such as Hadoop, HDFS, MapReduce, YARN, Spark, Storm and Hive for massively parallel on-disk data processing. LO4

• An overview of batch and real-time data ingestion patterns using Apache Flume and Kafka, data transformation techniques and generation of summary statistics using Apache Spark. LO3, LO4

• Data analytics on the Hadoop platform, using Apache Spark to analyse data stored on HDFS. LO4
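
As an informal illustration of the data warehousing items above (not part of the approved specification), the sketch below builds a very small star schema, loads a few test rows and runs an OLAP-style roll-up query. It is written in Python with the built-in sqlite3 module so that it is self-contained; the module itself uses SQL Server tools. All table names, column names and figures are hypothetical.

```python
import sqlite3


def build_sales_warehouse():
    """Create a tiny in-memory star schema and load test rows.

    All table and column names are hypothetical, chosen for illustration.
    """
    conn = sqlite3.connect(":memory:")
    # Data definition: two dimension tables and one fact table.
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            category   TEXT NOT NULL
        );
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY,
            year    INTEGER NOT NULL,
            quarter INTEGER NOT NULL
        );
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id    INTEGER REFERENCES dim_date(date_id),
            amount     REAL NOT NULL
        );
        """
    )
    # Data manipulation: populate the schema with made-up test data.
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?)",
        [(1, "Books"), (2, "Games")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(1, 2020, 3), (2, 2020, 4)],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 1, 10.0), (1, 2, 15.0), (2, 1, 20.0), (2, 2, 5.0)],
    )
    conn.commit()
    return conn


def rollup_by_category(conn):
    """Roll the fact table up from (product, date) grain to category totals."""
    return conn.execute(
        """
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()


if __name__ == "__main__":
    for category, total in rollup_by_category(build_sales_warehouse()):
        print(category, total)
```

Joining the fact table to a dimension and grouping by a dimension attribute is the basic pattern behind roll-up operations; grouping by year and quarter from the hypothetical dim_date table would follow the same shape.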

Balance of independent study and scheduled teaching activity

The material will be delivered through a mixture of lectures, workshops, laboratory sessions and tutorials, under the following strategy. The first hour of each lecture will introduce the concepts and principles of the module’s topics. The second hour will run as a workshop that explains the approaches further through real-life examples. Each lecture will be followed by either a laboratory or a seminar. Seminar time may be used to facilitate group meetings that cultivate research-oriented skills, or to introduce students to state-of-the-art developments not covered in the syllabus. For the self-study exercises and assessment, students are expected to spend time on unsupervised work in the computer laboratories, on searching primary sources of information in the library, and on private study. Students are also expected to dedicate time to the coursework and case study implementation and to preparing for the summative assessment (final exam). The teaching and learning methods will encourage open and self-directed learning, deepen students’ understanding and emphasise analytical skills.

Blended learning: the university’s VLE and online tools will be used to provide and deliver content, assessment and feedback, to encourage active learning and to enhance students’ engagement and learning experience.

Learning outcomes

LO1 Understand and demonstrate familiarity with the operation of database management systems and appreciate the complexity of developing real-life applications.

LO2 Learn about the process of developing, configuring, utilising and managing data warehouse applications in a variety of contexts.

LO3 Display an understanding of the principles of organising, validating, transforming and analysing large volumes of data on specialised platforms (Big Data) from various data sources such as files, databases and server logs.

LO4 Demonstrate comprehension of the advantages and limitations of Big Data technologies, including predictive analytics, and build the confidence to interpret data as insights that drive organisational success.

Assessment strategy

The assessment will consist of one coursework and an unseen examination. The examination will test students’ retention, understanding and insight drawn from the entire course (LO1, LO2, LO3, LO4). The coursework will comprise a single assignment that assesses part of the practical aspects of the module: students will be given a case study that is a scaled-down version of a real-life Big Data application. In the coursework, students will also be required to demonstrate their awareness of recent research developments and current Big Data technology trends by writing an essay in which they contrast new approaches with conventional ones (LO3, LO4). Some aspects of the coursework will also prepare students for their curricular projects.

The module will be passed on the aggregate mark of all assessment items.

Bibliography

Core Textbook:

1. Ponniah, P. (2001), Data Warehousing Fundamentals, John Wiley & Sons; ISBN: 0-471-41254-6
2. White, T. (2015), Hadoop: The Definitive Guide. Sebastopol: O’Reilly & Associates. ISBN-10: 1491901632

Other Texts:

3. Inmon, W.H. (2005), Building the Data Warehouse, 4th Edition, John Wiley; ISBN: 978-0-764-59944-6
4. Rainardi, V. (2007), Building a Data Warehouse: With Examples in SQL Server, Apress; ISBN-13: 978-1-59059-931-0
5. Grover, M., Malaska, T., Seidman, J., Shapira, G. (2015), Hadoop Application Architectures: Designing Real-World Big Data Applications, O'Reilly; ISBN: 1491900083

Journals:

1. International Journal of Data Warehousing and Mining; ISSN: 1548-3924
2. International Journal of Data Warehousing and Mining, DOI:10.4018/IJDWM.2018010102

Websites:

1. https://www.tutorialspoint.com/dwh/dwh_data_warehousing.htm
2. https://www.sas.com/en_gb/insights/big-data/what-is-big-data.html

Electronic Databases:

https://www.sas.com/en_us/insights/data-management/data-warehouse.html

Social Media Sources:

https://insidebigdata.com/2018/10/06/4-major-ways-big-data-impacting-social-media-marketing/
