CLO | Cloud Computing and Big Data
Cloud computing and big data techniques are changing the way we collect, analyse, store and use data. This course looks at the theory and the practical technologies behind big data and cloud computing.
Frequency
This subject has been discontinued; no further courses are planned.
Objectives
This course aims to show how cloud computing and big data techniques can be used to solve massive-scale problems. It introduces students to both the theoretical background of cloud computing and its practical applications. A major focus is the processing of large datasets using big data techniques, including map-reduce. In addition, the course covers approaches to building applications and managing them on the cloud.
Contents
The course will cover the following topics:
- Origins and background of Cloud Computing: Grids, Parallel computing, Functional programming, Infrastructure as a Service
- Using Cloud services: Amazon EC2 fundamentals, Concepts of IaaS, OpenStack and Private Cloud
- Map-reduce and Big Data analytics: Map-reduce theory, Hadoop, Hive and Pig, Functional decomposition
- Theory of Cloud Computing: CAP Theorem, Eventual Consistency, Shared Nothing architectures, Dynamo algorithm; Amdahl’s Law, Gustafson’s Law, Karp-Flatt Metric; Lambda Architecture, Multi-tenancy, PaaS and SaaS models
- NoSQL databases and scalable data storage alternatives, graph databases, Mongo and Cassandra
- Case studies and examples
- Futures and alternatives to Map Reduce: Real-time stream analytics, Generalized functional decomposition, Apache Spark and Storm, Futures
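To give a feel for the map-reduce topic listed above, the following is a minimal single-machine sketch of a word count split into map, shuffle and reduce phases. It is an illustration only, not course material: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample documents are assumptions made for this example, and a real framework such as Hadoop would distribute these phases across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values for each key into a single count."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    docs = ["the cloud stores data", "the cloud scales"]
    print(word_count(docs))
```

Because each call to `map_phase` depends only on its own document, and each key is reduced independently, both phases could in principle run in parallel, which is the property that map-reduce frameworks exploit.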
Requirements
The course introduces the fundamentals of creating big data processing systems in the cloud.
Practicals require programming in Python and use of the UNIX command line / bash shell. Students do not need significant experience in Python itself, but substantial programming experience is required, because the course exercises involve writing big data analytics code. This course is not suitable for students who have no practical experience in writing code.
Other aspects that will be helpful for the course are:
- Functional programming experience: the course makes extensive use of lambda expressions and functional patterns.
- Simple SQL expressions
- Distributed computing and basics of IP networking
Students who have not programmed in Python are expected to use the resources on Python to gain experience before the class. There are also pointers to resources on the command line and SQL. All students are expected to complete the pre-study exercise, which looks at lambda expressions in Python.
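As a rough indication of the kind of lambda expressions and functional patterns referred to above, here is a short sketch using Python's built-in `filter`, `map` and `sorted` together with `functools.reduce`. It is not the pre-study exercise itself; the `readings` data and the threshold value are invented for illustration.

```python
from functools import reduce

# Hypothetical (sensor name, value) records, invented for this example.
readings = [("sensor-a", 12.5), ("sensor-b", 7.0), ("sensor-a", 3.5), ("sensor-c", 9.0)]

# filter: keep only the readings whose value exceeds a threshold.
high = list(filter(lambda r: r[1] > 5.0, readings))

# map: project each remaining record down to its numeric value.
values = list(map(lambda r: r[1], high))

# reduce: fold the values into a single total, starting from 0.0.
total = reduce(lambda acc, v: acc + v, values, 0.0)

# sorted with a key function: order all readings by value, largest first.
by_value = sorted(readings, key=lambda r: r[1], reverse=True)

if __name__ == "__main__":
    print(high)
    print(total)
    print(by_value)
```

Students who can read this snippet comfortably, and could rewrite it with list comprehensions or named functions, probably have the functional programming background the list above describes.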