
Research Project
College of AI, Tsinghua University
Nov 2025
Designed and evaluated an efficiency-oriented coding-agent framework for software engineering tasks, aiming to reduce token usage and latency on SWE-Bench/Lite without sacrificing fix accuracy. Conducted a systematic 4×4 benchmark across major agent scaffolds and frontier LLMs under unified budgets and tools, and built a reproducible evaluation pipeline with structured trajectory logging for thoughts, actions, results, token consumption, and runtime. Defined practical efficiency KPIs and analyzed bottlenecks such as redundant retrieval, repeated test execution, and patch regeneration loops to inform more resource-efficient agent design.

Independent Research Project Poster
Microsoft / Imperial College London IRP, Advised by Prof. Lee Stott
Sep 2025
University campuses rely on multiple systems to maintain safety, but these often remain reactive and fragmented. This project developed a Security Agent as part of a broader multi-agent campus management framework, using Microsoft Azure OpenAI Services and cloud-based APIs. The agent processes access logs, sensor data, and camera inputs in real time to detect intrusions, assess anomalies, and coordinate incident responses. A web-based dashboard allows staff to visualize incidents, monitor alerts, track access logs, and interact with the system through a natural-language AI assistant embedded via OpenAI’s tool-calling API. In contrast to traditional rule-based surveillance setups, the system incorporates autonomous decision-making, inter-agent communication, and long-term incident tracking. The system was deployed on Azure App Services using synthetic, privacy-preserving data generated from realistic campus scenarios. In testing, the agent flagged unauthorized access, visualized spatiotemporal alert patterns, and helped bridge communication between departments through structured incident logs. We observed that modular cloud-based agents could ease daily campus operations by streamlining communication and reducing repetitive tasks. The project demonstrates the feasibility of building useful AI tools with minimal code and suggests that the same approach could be adapted to domains such as hospitals, factories, or public infrastructure systems.

Research Project
May 2025
A data-driven ride-hailing recommendation system for NYC that uses TLC trip data to uncover the most profitable times and locations for drivers. By combining demand forecasting, hourly pay prediction, spatial-temporal analysis, and interactive visualizations, the project helps drivers make smarter operating decisions and improve earnings.

Coursework Project
Big Data Analytics, Advised by Prof. Rossella Arcucci
May 2025
Implemented a five-part wildfire analytics project on the Ferguson Fire dataset, covering linear and nonlinear data compression, satellite data fusion, and data assimilation in both compressed and latent spaces. Built separate notebooks for PCA/autoencoder-based compression and BLUE-style assimilation pipelines, and evaluated performance using reconstruction MSE, latent/reduced-space error, and execution time. The project focused on balancing compression efficiency with physical information preservation for real-world wildfire modelling and satellite-informed state updates.
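For readers unfamiliar with BLUE-style assimilation, the update step can be sketched as below. This is a minimal illustration, not the project's notebook code: the function name `blue_update` and the toy matrices are assumptions, and the same formula applies whether the state lives in physical, PCA-reduced, or autoencoder latent space.

```python
import numpy as np

def blue_update(x_b, B, y, H, R):
    """Return the BLUE analysis x_a = x_b + K (y - H x_b).

    x_b: background state, B: background error covariance,
    y: observation, H: observation operator, R: observation error covariance.
    All names are illustrative assumptions.
    """
    S = H @ B @ H.T + R                  # innovation covariance
    # K = B H^T S^{-1}; valid because B and S are symmetric.
    K = np.linalg.solve(S, H @ B).T
    return x_b + K @ (y - H @ x_b)

# Toy example: two-component state, only the first component is observed.
x_b = np.array([0.0, 0.0])
B = np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
y = np.array([2.0])
x_a = blue_update(x_b, B, y, H, R)
```

With equal background and observation variances, the observed component moves halfway toward the observation, while the unobserved, uncorrelated component stays at its background value.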

Research Project
Advanced Programming
Mar 2025
A C++ project for 2D and 3D image processing, designed to handle image filters, orthographic projections, and volume slicing for inputs such as standard images and CT scan data. The project implemented a modular class-based architecture covering images, volumes, filters, projections, and slices, with command-line interaction and generated image outputs for both 2D and 3D tasks.

Coursework Project
Inversion and Optimisation
Feb 2025
A numerical optimization project on nonlinear timestepping and adjoint methods for a coupled ODE system. By combining implicit solvers, adjoint gradient computation, and quasi-Newton optimization, the project recovered initial conditions from observations and analyzed the system’s sensitivity and long-term dynamical behaviour.

Coursework Project
Inversion and Optimisation
Feb 2025
A numerical methods project on the two-dimensional steady-state advection–diffusion equation, focusing on finite difference discretisation and linear solver analysis. The project constructed sparse linear systems using a five-point stencil and upwind scheme, examined key matrix properties such as symmetry, rank, nullity, condition number, and sparsity, and compared direct and iterative solution strategies for large-scale problems. It also analyzed how the diffusion coefficient affects matrix conditioning and numerical stability, highlighting the role of iterative solvers and preconditioning in advection-dominated regimes.
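The discretisation described above can be sketched as follows, assuming a unit square with homogeneous Dirichlet boundaries, a uniform grid of n by n interior points, and constant velocity components. The function name `advection_diffusion_matrix` and all parameter values are hypothetical, chosen only to illustrate the five-point stencil with first-order upwinding.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def advection_diffusion_matrix(n, D, vx, vy):
    """Sparse matrix for -D * laplacian(u) + v . grad(u) on an n x n interior grid."""
    h = 1.0 / (n + 1)
    I = sp.identity(n, format="csr")
    # 1D second-difference operator (central differences).
    lap = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2

    def upwind(v):
        # Backward difference for v >= 0, forward difference for v < 0.
        if v >= 0:
            return v * sp.diags([-1.0, 1.0], [-1, 0], shape=(n, n)) / h
        return v * sp.diags([-1.0, 1.0], [0, 1], shape=(n, n)) / h

    Ax = D * lap + upwind(vx)
    Ay = D * lap + upwind(vy)
    # Kronecker sum assembles the 2D operator from the two 1D operators.
    return (sp.kron(I, Ax) + sp.kron(Ay, I)).tocsr()

n = 8
A_diff = advection_diffusion_matrix(n, D=1.0, vx=0.0, vy=0.0)   # pure diffusion
A_adv = advection_diffusion_matrix(n, D=0.01, vx=1.0, vy=0.5)   # advection-dominated
u = spsolve(A_adv.tocsc(), np.ones(n * n))                      # direct sparse solve
```

The pure-diffusion matrix is symmetric, while the upwind terms break symmetry. On a grid this small, `np.linalg.cond(A_adv.toarray())` gives a quick way to see how conditioning changes with `D`.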

Coursework Project
Inversion and Optimisation, Advised by Dr. Simon Warder
Feb 2025
A scientific computing project on acoustic full waveform inversion using Devito to reconstruct subsurface velocity models from seismic shot records. The project implemented gradient-based inversion and analyzed how space order, absorbing boundaries, step size, and grid resolution affect convergence, stability, and reconstruction quality.

Coursework Project
Inversion and Optimisation
Feb 2025
A numerical optimization project comparing Newton’s method and gradient descent on a nonlinear objective function. The project implemented gradient and Hessian calculations, studied step-size effects on convergence, and used Hessian eigenvalue analysis and visualization to explain optimization speed and stability.
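The comparison can be illustrated with a small sketch. The objective below, f(x, y) = exp(x) - x + 2y^2, is a stand-in chosen for this example (the coursework objective is not given here), and the function names, step size, and tolerances are likewise assumptions.

```python
import numpy as np

def grad(p):
    x, y = p
    return np.array([np.exp(x) - 1.0, 4.0 * y])

def hess(p):
    x, _ = p
    return np.array([[np.exp(x), 0.0], [0.0, 4.0]])

def gradient_descent(p0, step=0.1, tol=1e-8, max_iter=10_000):
    """Fixed-step gradient descent; returns the iterate and the iteration count."""
    p = np.asarray(p0, dtype=float)
    for k in range(max_iter):
        g = grad(p)
        if np.linalg.norm(g) < tol:
            return p, k
        p = p - step * g
    return p, max_iter

def newton(p0, tol=1e-8, max_iter=100):
    """Newton's method: solve H d = g at each iterate and step by -d."""
    p = np.asarray(p0, dtype=float)
    for k in range(max_iter):
        g = grad(p)
        if np.linalg.norm(g) < tol:
            return p, k
        p = p - np.linalg.solve(hess(p), g)
    return p, max_iter

p_gd, k_gd = gradient_descent([1.0, 1.0])
p_nt, k_nt = newton([1.0, 1.0])
# Hessian eigenvalues at the minimiser (0, 0) bound the usable gradient-descent step.
eigvals = np.linalg.eigvalsh(hess(p_nt))
```

Both methods reach the minimiser at the origin, but Newton's method needs far fewer iterations because it rescales the step by the local curvature. Gradient descent converges linearly at a rate governed by the step size and the Hessian eigenvalues.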

Research Project
Deep Learning, Advised by Prof. Thomas M. Davison
Jan 2025
A deep learning project for forecasting lightning strikes across the U.S. using multi-channel storm imagery and lightning time-series data. The project explored ConvLSTM and U-Net architectures to predict future VIL radar frames, reconstruct missing VIL information from satellite channels, and generate probability maps of lightning strikes, followed by Poisson-based post-processing to recover discrete strike events.

Coursework Project
Deep Learning
Dec 2024
A deep learning project on recovering missing measurements in multi-decade sequential weather data. The project built PyTorch data pipelines for paired corrupted and uncorrupted daily records, designed and trained a Transformer-based imputation model, and reconstructed missing values in a held-out test decade. It further explored feature engineering with first- and second-order differences, normalization, masking strategies, and hyperparameter tuning to improve imputation quality across multiple weather variables.

Coursework Project
Deep Learning, Advised by Prof. Carlos Cueto Mondejar
Dec 2024
A deep learning project on recovering missing regions in corrupted brain MRI images. The project used diffusion-generated synthetic data for training, designed corruption patterns to match the test set, and compared an autoencoder baseline with a U-Net model. The final U-Net achieved improved pixel-level reconstruction quality on unseen MRI images.

Research Project
Applied Computational / Data Science, Advised by Dr. James Percival
Nov 2024
A flood risk prediction and analysis tool for the UK that estimates flood risk from rivers, seas, and surface water, as well as median house prices and local authority information for postcodes and arbitrary locations. The project combines predictive modeling, geospatial analysis, rainfall and water-level lookup, and interactive visualizations to support flood risk assessment and identify areas at immediate risk.

Coursework Project
Data Science and Machine Learning
Nov 2024
Built and evaluated classification models to predict passenger transportation outcomes using demographic, spending, and travel features. Compared tuned Logistic Regression and KNN models with 5-fold cross-validation, and selected Logistic Regression as the final model based on stronger AUC-ROC and overall classification performance. Further optimized decision thresholds for different operational scenarios, including balanced error control and high-recall settings where false negatives were more costly.
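The high-recall threshold selection can be sketched as below. This is an illustrative assumption about how such a threshold might be chosen, not the coursework code: the helper `threshold_for_recall` and the toy labels and scores are invented for the example.

```python
import numpy as np

def threshold_for_recall(y_true, scores, target_recall):
    """Return the highest threshold whose recall is at least target_recall.

    A sample is predicted positive when its score is >= the threshold.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos_scores = scores[y_true]
    # Scan candidate thresholds from strictest to most permissive.
    for t in np.sort(np.unique(scores))[::-1]:
        recall = np.mean(pos_scores >= t)
        if recall >= target_recall:
            return float(t)
    return float(scores.min())

# Toy predicted probabilities, standing in for a fitted classifier's output.
y = [0, 0, 1, 1, 1, 0, 1]
s = [0.1, 0.4, 0.35, 0.8, 0.65, 0.5, 0.9]
t_75 = threshold_for_recall(y, s, 0.75)
t_100 = threshold_for_recall(y, s, 1.0)
```

Choosing the highest threshold that still meets the recall target keeps false positives as low as possible for that level of recall, which suits settings where false negatives are the costlier error.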

Coursework Project
Data Science and Machine Learning
Nov 2024
Built an end-to-end regression pipeline to predict significant wave height (Hsig) from oceanic and environmental variables for maritime risk assessment. The project combined data preprocessing, feature engineering, and model selection, including temperature binning, wind–temperature interaction terms, imputation, scaling, and categorical encoding. A linear regression baseline was compared against ensemble methods using 5-fold cross-validation and RandomizedSearchCV, with a tuned Random Forest selected as the final model, achieving strong predictive performance on unseen test data.

Coursework Project
Computational Mathematics, Advised by Prof. Matthew Piggott
Oct 2024
Investigated core ideas in computational mathematics through a series of theoretical and numerical studies, including model verification and validation, polynomial and spline interpolation, numerical quadrature on truncated infinite domains, and higher-order Taylor-series solvers for ordinary differential equations. Implemented idealized examples in Python to compare methods, analyze convergence behavior, and evaluate error using RMS and max norms. The project emphasized not only obtaining accurate numerical results, but also understanding why a model or algorithm works, when it fails, and how verification, validation, and calibration differ in practice.

Coursework Project
Numerical Programming, Advised by Dr. Marijan Beg
Oct 2024
Developed a Metropolis–Hastings Monte Carlo simulator for a two-dimensional Heisenberg spin lattice to explore magnetic skyrmion formation. The project implemented core energy terms including Zeeman, uniaxial anisotropy, exchange, and Dzyaloshinskii–Moriya interactions, alongside spin-lattice visualization, code optimization, documentation, and packaging. The final workflow used simulation-driven energy minimization to search for equilibrium spin configurations and investigate whether a magnetic skyrmion could emerge.
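A stripped-down sketch of the sampling loop is shown below. It keeps only the exchange and Zeeman terms (anisotropy and Dzyaloshinskii–Moriya terms are omitted for brevity), assumes periodic boundaries, and recomputes the full energy after each proposal for clarity rather than speed. All function names and parameter values are assumptions made for this example. Because the proposal (a fresh uniformly random unit vector) is symmetric, the Metropolis–Hastings acceptance rule reduces to the plain Metropolis rule.

```python
import numpy as np

def random_spins(n, rng):
    """n x n lattice of unit vectors drawn uniformly on the sphere."""
    s = rng.normal(size=(n, n, 3))
    return s / np.linalg.norm(s, axis=-1, keepdims=True)

def energy(s, J=1.0, B=(0.0, 0.0, 0.1)):
    """Exchange plus Zeeman energy with periodic boundaries (each bond counted once)."""
    exchange = -J * (np.sum(s * np.roll(s, 1, axis=0)) + np.sum(s * np.roll(s, 1, axis=1)))
    zeeman = -np.sum(s @ np.asarray(B, dtype=float))
    return exchange + zeeman

def metropolis(s, n_steps, T, rng):
    """Run n_steps single-spin Metropolis updates at temperature T."""
    s = s.copy()
    e = energy(s)
    n = s.shape[0]
    for _ in range(n_steps):
        i, j = rng.integers(0, n, size=2)
        old = s[i, j].copy()
        new = rng.normal(size=3)
        s[i, j] = new / np.linalg.norm(new)
        e_new = energy(s)
        dE = e_new - e
        # Accept downhill moves always, uphill moves with probability exp(-dE / T).
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            e = e_new
        else:
            s[i, j] = old
    return s, e

rng = np.random.default_rng(0)
s0 = random_spins(8, rng)
e0 = energy(s0)
s1, e1 = metropolis(s0, 2000, T=0.01, rng=rng)
```

A practical implementation would compute only the local energy change around the updated site instead of the full lattice energy, which is where most of the optimization effort in such a simulator tends to go.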

Entrepreneurial Project National First Prize
China-US Young Maker Competition
Aug 2024
An entrepreneurial project focused on a liquid-cooled underwater thruster for diving and underwater imaging scenarios. The project proposes a self-developed split-type liquid-cooled motor design that removes dynamic shaft sealing to improve waterproof reliability and diving depth, while using active liquid cooling to reduce motor temperature by over 35% and increase effective payload by 30%. The product targets recreational divers, professional divers, and underwater photographers, and won the National First Prize in the China-US Young Maker Competition.

Bachelor's Thesis Best Paper Award
Undergraduate Program
May 2024
Cardiovascular disease is a leading cause of mortality globally, making precise and efficient diagnosis crucial. Electrocardiography (ECG) is a conventional diagnostic tool for cardiovascular disease, capable of detecting various cardiac rhythm abnormalities. However, relying solely on physicians for diagnosis is complex and time-consuming. Computer-aided diagnosis, which uses machine learning algorithms to preliminarily identify cardiac rhythm abnormalities from ECG data, therefore holds substantial practical significance. While convolutional neural network (CNN) models have demonstrated efficacy in ECG recognition, challenges persist due to insufficient local data in healthcare institutions and privacy concerns that prohibit data sharing across institutions. Federated learning, in which clients jointly train a global model while keeping raw data local, offers a solution to this problem. Nevertheless, traditional centralized federated learning still faces risks of inefficiency and unreliability because it depends on a central server. This paper introduces an efficient privacy-preserving arrhythmia detection method based on decentralized federated learning, which removes that dependence. Simulation experiments on real-world datasets demonstrate the method's accuracy and efficiency, with detection accuracy exceeding 95% and training efficiency significantly surpassing that of centralized federated learning under equivalent parameter settings.

Course Paper
Machine Learning
Jan 2024
Lung cancer mortality has been rising steadily and has become a major public health concern. In research on lung cancer, the relationship between environmental quality and lung cancer has received increasing attention. This paper uses a neural network regression algorithm to predict lung cancer mortality under environmental quality factors and compares its performance with other baseline methods. It also examines the relative impact of different air pollutants on lung cancer mortality. Experiments on a real-world dataset show that the neural network regression algorithm outperforms other methods in this task, and that PM2.5 has the greatest influence on lung cancer mortality among the environmental factors considered.

Course Paper
Data Analytics for Smart Cities
Oct 2023
With the continuous development of information technology, marketing strategies have become increasingly diverse. In Macau, as a tourism-oriented city, travelers’ and users’ responses to different promotional strategies can strongly influence advertising effectiveness. This course paper focuses on advertisement strategy analysis and presents a system based on face detection and eye tracking to evaluate user attention from the consumer perspective. The project includes an MTCNN-based face detection method, compared against the YOLO series of object detectors, and a real-time eye-tracking method based on pupil localization with geometric shape features under a 2D mapping model. The results show that the proposed approach achieves better accuracy and generalizability than the compared methods.

Course Paper
Big Data Analytics for Tourism Industry
May 2022
This paper investigates the relationship between provincial internet popularity and local tourist volume by using internet popularity data from 2011 to 2018 to predict tourist arrivals in the following year. It provides a quantitative comparison of linear regression, logistic regression, and multilayer perceptron models for this forecasting task, and further compares these approaches with time-series forecasting methods. The results show that internet popularity is correlated with tourist volume, and that with a sufficient amount of data, the multilayer perceptron performs slightly better than linear regression for this type of prediction problem.

Course Paper
Big Data Processing and Applications
Apr 2022
Friend recommendation systems in social networks are still at an early stage of development. Most existing approaches rely primarily on the number of mutual friends between users, which often leads to limited accuracy and a narrow recommendation range. This paper studies a basic friend recommendation algorithm based on mutual-friend counts. Starting from the original dataset model, it applies MapReduce to process, aggregate, and organize the dataset, extract and count mutual friends, and rank the results accordingly. In addition, a monkey algorithm is introduced to optimize recommendation accuracy and expand the recommendation range. The proposed method simulates real interpersonal relationship networks and achieves higher accuracy together with broader recommendation coverage.
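The mutual-friend counting step can be sketched in plain Python as a map phase followed by a reduce phase. This is an in-memory illustration only, not a Hadoop job, and it does not cover the monkey-algorithm optimization; the function names and the toy graph are assumptions.

```python
from collections import Counter
from itertools import combinations

def map_phase(graph):
    """For each user, emit every pair of their friends: that pair shares this user."""
    for user, friends in graph.items():
        for a, b in combinations(sorted(friends), 2):
            yield (a, b), 1

def reduce_phase(pairs):
    """Sum the emitted counts per pair to get the number of mutual friends."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts

def recommend(graph, user):
    """Rank non-friends of `user` by mutual-friend count, highest first."""
    counts = reduce_phase(map_phase(graph))
    scored = []
    for (a, b), n in counts.items():
        if user in (a, b):
            other = b if a == user else a
            if other not in graph[user]:
                scored.append((other, n))
    return sorted(scored, key=lambda item: (-item[1], item[0]))

# Toy undirected friendship graph (adjacency sets).
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}
recs_a = recommend(graph, "A")
```

Here A and D are not yet friends but share B and C, so D is the only recommendation for A, with a mutual-friend count of 2.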