Synthetic Population Generation

Supervisors

Imran Hashmi
(Associate Professor, National University of Sciences and Technology (NUST), Islamabad; Senior Member, IEEE Society; ACM Professional Member; Society of Computer Simulation)

Suitable for

MSc in Advanced Computer Science
Mathematics and Computer Science, Part C
Computer Science and Philosophy, Part C
Computer Science, Part C

Abstract

Abstract:

This project proposes a novel approach to generating synthetic populations for a selected region using Census data (and other available statistics). Traditional methods focus primarily on statistical modelling, but they often lack accuracy and the ability to fit the marginals (summary tables). The proposed approach combines combinatorial optimization for the generation of persons [1] with graph neural networks (GNNs) for predicting household compositions. This combination promises to yield more realistic synthetic populations, which are vital for applications ranging from urban planning to market research.

Motivation:

The need to create synthetic populations, synthetic mobility, and synthetic environments stems from an urgent need to understand and adapt to the complex dynamics of contemporary urban spaces, and to establish tools and techniques for next-generation urban intelligence [6]. They are therefore necessary ingredients for city-scale Digital Twin development. Synthetic populations enable an understanding of demographic distributions and behavioural patterns, serving as a pivotal tool for policy formulation and urban planning. Synthetic mobility patterns, grounded in reliable datasets such as travel surveys, offer predictive insights into daily movement and help optimize transportation infrastructure and services. Synthetic environments, meanwhile, provide a computational playground where urban planners and policymakers can experiment with and envisage urban transformations in a controlled, risk-free setting, aiding the creation of more sustainable and inclusive urban spaces. Together, these synthetic artefacts offer a data-driven way to envision, analyse, and shape future urban landscapes, a convergence of technology and urbanism with the potential to transform city planning and development.

Related Work:

Traditional methods for synthetic population generation, while effective at reproducing statistical summaries, often struggle to capture complex relationships and to generate realistic micro-level data. Mahmood et al. (2023) propose an approach that applies combinatorial optimization to generate individual persons and to predict household compositions [1]. Similar projects such as UrbanSim [2] and SynPop2 [3] emphasize microsimulation, while deep learning approaches such as GeoDa explore GNNs for household structures [4]. This project builds on these advances, offering a potentially powerful framework for future synthetic population generation.

Goals and Objectives:

Interested students will work with a research group at the Department of Computer Science, University of Oxford, towards the following objectives:

• To develop (or extend) a method that synthesizes populations while preserving the statistical and relational fidelity of the original Census data.

• To extend the combinatorial optimization method of [1] for individual person generation, ensuring maximal attribute representation.

• To harness the power of GNNs to predict and design realistic household structures and compositions.

• To evaluate the efficacy and accuracy of the proposed method against traditional synthetic population generation techniques.
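One way the evaluation objective above could be made concrete is to compare attribute distributions in the synthetic population against a reference sample. The following is a minimal sketch, not the project's prescribed metric: the records, the attribute names, and the choice of total variation distance are all illustrative assumptions. It shows why checking marginals alone is not enough, since two populations can agree on every marginal and still differ in the joint distribution.

```python
from collections import Counter

def distribution(records, attrs):
    """Empirical joint distribution of the given attributes."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical reference sample and synthetic population (illustrative only).
reference = [{"sex": "F", "age": "18-64"}, {"sex": "M", "age": "18-64"},
             {"sex": "F", "age": "65+"}, {"sex": "M", "age": "0-17"}]
synthetic = [{"sex": "F", "age": "18-64"}, {"sex": "M", "age": "18-64"},
             {"sex": "M", "age": "65+"}, {"sex": "F", "age": "0-17"}]

# Each marginal matches exactly, but the joint (sex, age) table does not.
sex_tvd = total_variation(distribution(reference, ["sex"]),
                          distribution(synthetic, ["sex"]))
age_tvd = total_variation(distribution(reference, ["age"]),
                          distribution(synthetic, ["age"]))
joint_tvd = total_variation(distribution(reference, ["sex", "age"]),
                            distribution(synthetic, ["sex", "age"]))
```

Here the marginal distances are zero while the joint distance is not, which is the kind of relational mismatch a marginal-only check would miss.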

Technical Details:

1. Data Source: The primary data source is Census data. This encompasses demographics, socio-economic metrics, household compositions, and geographic distributions.

2. Generating Persons using Combinatorial Optimization:

• Approach: Adopt combinatorial optimization techniques to select and create individual synthetic persons. The challenge lies in the vast number of potential combinations.

• Method: Using optimization algorithms, the aim is to determine the best combination of attributes for each synthetic person to ensure statistical consistency with the original Census data.
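The idea of searching over attribute combinations to match Census marginals can be illustrated with a simple local search. This is a minimal sketch under stated assumptions, not the multi-objective framework of [1]: the target counts, the attribute names, and the use of single-objective hill climbing are hypothetical choices made for illustration.

```python
import random
from collections import Counter

# Hypothetical target marginals (counts per category) for one small area.
TARGETS = {
    "sex": {"F": 52, "M": 48},
    "age": {"0-17": 20, "18-64": 60, "65+": 20},
}

def marginal_error(population, targets):
    """Total absolute difference between synthetic and target marginal counts."""
    error = 0
    for attr, target in targets.items():
        counts = Counter(person[attr] for person in population)
        error += sum(abs(counts[cat] - n) for cat, n in target.items())
    return error

def generate_persons(targets, size, steps=5000, seed=0):
    """Hill climbing: change one attribute of one person, keep non-worsening moves."""
    rng = random.Random(seed)
    # Start from persons with uniformly random attribute values.
    population = [
        {attr: rng.choice(list(cats)) for attr, cats in targets.items()}
        for _ in range(size)
    ]
    best = marginal_error(population, targets)
    for _ in range(steps):
        if best == 0:
            break
        person = rng.choice(population)
        attr = rng.choice(list(targets))
        old = person[attr]
        person[attr] = rng.choice(list(targets[attr]))
        new = marginal_error(population, targets)
        if new <= best:
            best = new
        else:
            person[attr] = old  # revert a worsening move
    return population, best

population, error = generate_persons(TARGETS, size=100)
```

With independent one-way marginals like these, an improving move always exists while the error is non-zero, so the search is not trapped in local minima. Real Census tables with cross-tabulated constraints are harder, which is where a more capable optimization method becomes necessary.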

3. Generating Household Composition using GNNs and Edge Predictions:

• Approach: Recognizing households as graphs, where members are nodes and relationships are edges, we use GNNs to predict the structure and composition of these households.

• Method: Train a GNN on existing household data. Post-training, the network can predict edge formations (relationships) between nodes (persons), thus forming a synthetic household. The edge prediction mechanism helps to ensure realistic household structures.
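The household-as-graph idea can be sketched with one round of message passing followed by a dot-product edge decoder. This is a dependency-free illustration only, assuming hypothetical person features and randomly initialised, untrained weights. A real implementation would use PyTorch or TensorFlow and learn the weights from existing household data.

```python
import math
import random

def mean_aggregate(features, adjacency):
    """One message-passing step: average each node's features with its neighbours'."""
    out = []
    for node, feats in enumerate(features):
        group = [feats] + [features[n] for n in adjacency[node]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out

def linear_tanh(features, weights):
    """Apply a shared linear map followed by tanh to every node's features."""
    return [
        [math.tanh(sum(x * w for x, w in zip(feats, unit))) for unit in weights]
        for feats in features
    ]

def edge_probability(embeddings, u, v):
    """Dot-product decoder: sigmoid of the inner product of two node embeddings."""
    score = sum(a * b for a, b in zip(embeddings[u], embeddings[v]))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical persons (nodes): [normalised age, is_adult].
features = [[0.40, 1.0], [0.38, 1.0], [0.08, 0.0], [0.70, 1.0]]
# Hypothetical known relationships (edges): 0-1 partners, 0-2 parent and child.
adjacency = {0: [1, 2], 1: [0], 2: [0], 3: []}

rng = random.Random(0)
# Untrained weights: 3 output units, each with one weight per input feature.
weights = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]

embeddings = linear_tanh(mean_aggregate(features, adjacency), weights)
# Score a candidate relationship between persons 1 and 2.
candidate_score = edge_probability(embeddings, 1, 2)
```

After training, candidate edges whose probability exceeds a chosen threshold would be kept, and the connected groups of persons would form the synthetic households.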

Technical Requirements and Prerequisites:

1. Software & Tools:

- Programming environments: Python, TensorFlow or PyTorch for GNNs.

2. Hardware:

- High-performance computing cluster for training GNNs and optimization tasks.

- Adequate storage for Census data and synthetic populations.

3. Data:

- Complete and cleaned Census data.

- Training and test splits for GNNs.

4. Skills and Knowledge:

- Familiarity with combinatorial optimization techniques.

- Proficiency in deep learning, especially GNNs, and knowledge of edge prediction techniques.

- Understanding of Census data structure and attributes.

Potential Outcomes:

  1. A comprehensive system capable of generating synthetic populations that statistically and relationally match the original Census data.
  2. Demonstration of the advantages of integrating combinatorial optimization with GNNs over traditional methods.
  3. Validation of the synthetic populations against real-world scenarios to test for realism and utility.
  4. A potential framework for adapting the methodology to other datasets beyond Census data, making it a universal tool for synthetic population generation.

With the ever-increasing need for realistic data in research and industry, such approaches are pivotal. This project promises advances in synthetic data generation and fosters interdisciplinary collaboration between optimization, deep learning, and the social sciences. Its outcome will provide a foundation for developing robust agent-based models and digital twins at scale.

References:

[1] Imran Mahmood, Nicholas Bishop, Ioannis Zachos, Anisoara Calinescu, and Michael Wooldridge, "A Multi-Objective Combinatorial Optimisation Framework for Large-scale Hierarchical Population Synthesis", 37th Annual European Simulation and Modelling Conference, Toulouse, France, 2023.

[2] UrbanSim: https://cloud.urbansim.com/docs/general/documentation/urbansim.html

[3] SynPop2: https://github.com/cran/synthpop/blob/master/R/syn.r

[4] GeoDa: http://geodacenter.github.io/

[5] UK Census Data: https://www.nomisweb.co.uk/

[6] Ahmed Alaa, Boris van Breugel, Evgeny S. Saveliev, and Mihaela van der Schaar, "How Faithful is Your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models", International Conference on Machine Learning, pp. 290-306, PMLR, 2022.