Projects - Taozhi Chen

2025

Efficient Software Development Agent Framework

Research Project

College of AI, Tsinghua University

Nov 2025

Designed and evaluated an efficiency-oriented coding-agent framework for software engineering tasks, aiming to reduce token usage and latency on SWE-Bench/Lite without sacrificing fix accuracy. Conducted a systematic 4×4 benchmark across major agent scaffolds and frontier LLMs under unified budgets and tools, and built a reproducible evaluation pipeline with structured trajectory logging for thoughts, actions, results, token consumption, and runtime. Defined practical efficiency KPIs and analyzed bottlenecks such as redundant retrieval, repeated test execution, and patch regeneration loops to inform more resource-efficient agent design.
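
For illustration, here is a minimal Python sketch of how structured trajectory logs might be reduced to efficiency KPIs. The `Step`/`Trajectory` schema, the field names, and the KPI definitions (tokens per resolved instance, mean runtime, repeated identical actions as a redundancy signal) are hypothetical. They are not the project's actual logging format or metric set.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Step:
    thought: str
    action: str        # hypothetical: the tool call or shell command the agent issued
    result: str
    tokens: int        # prompt + completion tokens spent on this step
    runtime_s: float   # wall-clock seconds for this step


@dataclass
class Trajectory:
    instance_id: str
    resolved: bool     # did the final patch pass the benchmark's tests?
    steps: list = field(default_factory=list)


def efficiency_kpis(trajs):
    """Aggregate efficiency KPIs over a batch of trajectories."""
    n = len(trajs)
    n_resolved = sum(t.resolved for t in trajs)
    total_tokens = sum(s.tokens for t in trajs for s in t.steps)
    total_runtime = sum(s.runtime_s for t in trajs for s in t.steps)
    # Redundancy signal: an identical action issued more than once within one
    # trajectory (e.g. re-running the same test command).
    repeated = sum(
        count - 1
        for t in trajs
        for count in Counter(s.action for s in t.steps).values()
        if count > 1
    )
    return {
        "resolve_rate": n_resolved / n,
        "tokens_per_resolved": total_tokens / n_resolved if n_resolved else float("inf"),
        "mean_runtime_s": total_runtime / n,
        "repeated_actions": repeated,
    }
```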

Smart Campus Management System Using Microsoft Azure AI Agent Services

Independent Research Project Poster

Microsoft / Imperial College London IRP, Advised by Prof. Lee Stott

Sep 2025

University campuses rely on multiple systems to maintain safety, but these systems often remain reactive and fragmented. This project developed a Security Agent as part of a broader multi-agent campus management framework, built on Microsoft Azure OpenAI Service and cloud-based APIs. The agent processes access logs, sensor readings, and camera inputs in real time to detect intrusions, assess anomalies, and coordinate incident responses. A web-based dashboard allows staff to visualize incidents, monitor alerts, track access logs, and interact with the system through a natural-language AI assistant embedded via OpenAI’s tool-calling API. In contrast to traditional rule-based surveillance setups, the system incorporates autonomous decision-making, inter-agent communication, and long-term incident tracking. The system was successfully deployed on Azure App Service using synthetic, privacy-preserving data generated from realistic campus scenarios. In testing, the agent flagged unauthorized access, visualized spatial-temporal alert patterns, and helped bridge communication between departments through structured incident logs. Through this project, we observed that modular cloud-based agents could ease daily campus operations by streamlining communication and reducing repetitive tasks. The project demonstrates the feasibility of building useful AI tools with minimal code and suggests that the same approach could be adapted to domains such as hospitals, factories, or public infrastructure systems.

NYC Taxi Trip Analysis - Interactive Spatial-Temporal Recommendation System

Research Project

Data Observatory

May 2025

A data-driven ride-hailing recommendation system for NYC that uses TLC trip data to uncover the most profitable times and locations for drivers. By combining demand forecasting, hourly pay prediction, spatial-temporal analysis, and interactive visualizations, the project helps drivers make smarter operating decisions and improve earnings.
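
To make the "hourly pay" idea concrete, here is a small sketch that ranks (pickup zone, hour) cells by fare earned per hour of driving. It assumes trips have already been reduced to hypothetical `(zone, hour, fare_usd, duration_min)` tuples; the real TLC column names and the project's forecasting models are not reproduced here.

```python
from collections import defaultdict


def hourly_pay_by_cell(trips):
    """trips: iterable of (pickup_zone, pickup_hour, fare_usd, duration_min).

    Returns {(zone, hour): fare per hour of driving}, computed as a ratio of
    sums so that a few very short trips do not dominate the average.
    """
    fare = defaultdict(float)
    hours = defaultdict(float)
    for zone, hour, amount, minutes in trips:
        if minutes <= 0:          # drop zero/negative durations (bad records)
            continue
        fare[(zone, hour)] += amount
        hours[(zone, hour)] += minutes / 60.0
    return {cell: fare[cell] / hours[cell] for cell in fare}


def top_cells(trips, k=3):
    """Rank (zone, hour) cells by hourly pay, highest first."""
    pay = hourly_pay_by_cell(trips)
    return sorted(pay.items(), key=lambda kv: kv[1], reverse=True)[:k]
```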

Ferguson Wildfire Data Compression and Assimilation

Coursework Project

Big Data Analytics, Advised by Prof. Rossella Arcucci

May 2025

Implemented a five-part wildfire analytics project on the Ferguson Fire dataset, covering linear and nonlinear data compression, satellite data fusion, and data assimilation in both compressed and latent spaces. Built separate notebooks for PCA/autoencoder-based compression and BLUE-style assimilation pipelines, and evaluated performance using reconstruction MSE, latent/reduced-space error, and execution time. The project focused on balancing compression efficiency with physical information preservation for real-world wildfire modelling and satellite-informed state updates.
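
As a rough illustration of the two building blocks, the sketch below shows PCA compression via an SVD and a BLUE update, x_a = x_b + K(y - H x_b) with K = B Hᵀ(H B Hᵀ + R)⁻¹. In a reduced-space scheme one would assume `xb` and `y` are the compressed model state and compressed satellite observation, with `H` the identity. Array shapes and the error covariances `B` and `R` are illustrative assumptions, not the coursework's actual settings.

```python
import numpy as np


def pca_fit(X, r):
    """X: (n_snapshots, n_features). Return the mean and top-r principal axes (r, n_features)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:r]


def compress(X, mean, V):
    return (X - mean) @ V.T          # (n, r) reduced coordinates


def decompress(Z, mean, V):
    return Z @ V + mean              # back to the full state space


def blue(xb, y, H, B, R):
    """Best linear unbiased estimate: xa = xb + K (y - H xb), K = B H^T (H B H^T + R)^-1."""
    S = H @ B @ H.T + R
    K = np.linalg.solve(S, H @ B).T  # equals B H^T S^-1 because B and S are symmetric
    return xb + K @ (y - H @ xb)
```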

Image Filters, Projections and Slices

Research Project

Advanced Programming

Mar 2025

A C++ project for 2D and 3D image processing, designed to handle image filters, orthographic projections, and volume slicing for inputs such as standard images and CT scan data. The project implemented a modular class-based architecture covering images, volumes, filters, projections, and slices, with command-line interaction and generated image outputs for both 2D and 3D tasks.
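
The project itself is C++. The following NumPy sketch only illustrates what the three operation families compute. It does not reflect the project's class design. It assumes volumes are indexed `(z, y, x)`. The maximum, minimum, and mean projections are common choices for CT data, used here as assumptions.

```python
import numpy as np


def box_blur(img, k=3):
    """Mean filter with a k x k kernel (k odd) on a 2D image; borders use edge replication."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def project(volume, mode="max", axis=0):
    """Orthographic projection of a (z, y, x) volume along one axis."""
    reducers = {"max": np.max, "min": np.min, "mean": np.mean}
    return reducers[mode](volume, axis=axis)


def slice_plane(volume, plane, index):
    """Extract a 2D slice from a (z, y, x) volume."""
    if plane == "xy":
        return volume[index, :, :]
    if plane == "xz":
        return volume[:, index, :]
    if plane == "yz":
        return volume[:, :, index]
    raise ValueError(f"unknown plane {plane!r}")
```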

Nonlinear Timestepping and Adjoint-Based Inversion

Coursework Project

Inversion and Optimisation

Feb 2025

A numerical optimization project on nonlinear timestepping and adjoint methods for a coupled ODE system. By combining implicit solvers, adjoint gradient computation, and quasi-Newton optimization, the project recovered initial conditions from observations and analyzed the system’s sensitivity and long-term dynamical behaviour.
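
To show how an adjoint gradient for the initial condition is obtained through an implicit solver, here is a simplified sketch. It swaps the coursework's nonlinear coupled system for a linear one, dx/dt = A x, stepped with backward Euler. The misfit is J = ½‖x_N − y‖², and the adjoint runs the transposed implicit solves backward in time. Plain gradient descent stands in for the quasi-Newton optimizer; the same gradient could instead be passed as `jac` to `scipy.optimize.minimize(..., method="L-BFGS-B")`.

```python
import numpy as np


def forward(x0, A, dt, n_steps):
    """Backward (implicit) Euler for dx/dt = A x: solve (I - dt A) x_{k+1} = x_k."""
    M = np.eye(len(x0)) - dt * A
    xs = [x0]
    for _ in range(n_steps):
        xs.append(np.linalg.solve(M, xs[-1]))
    return xs


def cost(x0, A, dt, n_steps, y):
    xN = forward(x0, A, dt, n_steps)[-1]
    return 0.5 * np.sum((xN - y) ** 2)


def adjoint_gradient(x0, A, dt, n_steps, y):
    """dJ/dx0 via the discrete adjoint: lambda_N = x_N - y, then solve M^T lambda_k = lambda_{k+1}."""
    M = np.eye(len(x0)) - dt * A
    lam = forward(x0, A, dt, n_steps)[-1] - y
    for _ in range(n_steps):
        lam = np.linalg.solve(M.T, lam)   # propagate sensitivity backward in time
    return lam


def recover_initial_condition(y, A, dt, n_steps, x_init, lr=1.0, iters=300):
    """Fit x0 to the final-time observation y by gradient descent on J."""
    x = x_init.copy()
    for _ in range(iters):
        x = x - lr * adjoint_gradient(x, A, dt, n_steps, y)
    return x
```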

Finite Difference Solvers for the 2D Steady-State Advection-Diffusion Equation

Coursework Project

Inversion and Optimisation

Feb 2025

A numerical methods project on the two-dimensional steady-state advection–diffusion equation, focusing on finite difference discretisation and linear solver analysis. The project constructed sparse linear systems using a five-point stencil and upwind scheme, examined key matrix properties such as symmetry, rank, nullity, condition number, and sparsity, and compared direct and iterative solution strategies for large-scale problems. It also analyzed how the diffusion coefficient affects matrix conditioning and numerical stability, highlighting the role of iterative solvers and preconditioning in advection-dominated regimes.
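
A compact sketch of the assembly step follows. It discretizes −D(u_xx + u_yy) + a u_x + b u_y = f on the unit square with n × n interior nodes and homogeneous Dirichlet boundaries. It uses the five-point stencil for diffusion and first-order upwind differences for advection, assuming a, b ≥ 0. A dense array keeps the example small. At scale, the same stencil would go into a `scipy.sparse` format and be solved iteratively, for example with GMRES for the nonsymmetric case.

```python
import numpy as np


def assemble(n, D, a, b):
    """Matrix for -D*Laplace(u) + a*u_x + b*u_y on n x n interior nodes (assumes a, b >= 0)."""
    h = 1.0 / (n + 1)
    A = np.zeros((n * n, n * n))
    for j in range(n):            # y index
        for i in range(n):        # x index
            k = j * n + i
            A[k, k] = 4 * D / h**2 + (a + b) / h
            if i > 0:
                A[k, k - 1] = -D / h**2 - a / h   # west neighbour (upwind for a > 0)
            if i < n - 1:
                A[k, k + 1] = -D / h**2           # east neighbour
            if j > 0:
                A[k, k - n] = -D / h**2 - b / h   # south neighbour (upwind for b > 0)
            if j < n - 1:
                A[k, k + n] = -D / h**2           # north neighbour
    return A


def matrix_report(A):
    """Properties discussed in the project: symmetry, rank, nullity, conditioning, sparsity."""
    rank = np.linalg.matrix_rank(A)
    return {
        "symmetric": bool(np.allclose(A, A.T)),
        "rank": int(rank),
        "nullity": int(A.shape[0] - rank),
        "cond": float(np.linalg.cond(A)),
        "sparsity": 1.0 - np.count_nonzero(A) / A.size,
    }
```

Comparing `matrix_report(assemble(n, D, a, b))["cond"]` across several values of `D` at fixed `a`, `b` is one way to explore how the diffusion coefficient affects conditioning.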

Full Waveform Inversion for Acoustic Velocity Model Reconstruction

Coursework Project

Inversion and Optimisation, Advised by Dr. Simon Warder

Feb 2025

A scientific computing project on acoustic full waveform inversion using Devito to reconstruct subsurface velocity models from seismic shot records. The project implemented gradient-based inversion and analyzed how space order, absorbing boundaries, step size, and grid resolution affect convergence, stability, and reconstruction quality.

Newton and Gradient Descent Methods for Nonlinear Optimization

Coursework Project

Inversion and Optimisation

Feb 2025

A numerical optimization project comparing Newton’s method and gradient descent on a nonlinear objective function. The project implemented gradient and Hessian calculations, studied step-size effects on convergence, and used Hessian eigenvalue analysis and visualization to explain optimization speed and stability.
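
The link between Hessian eigenvalues and step size can be seen on a simple stand-in objective. Here f(x, y) = x² + 10y², chosen for illustration because the original objective is not given, has Hessian eigenvalues 2 and 20. Gradient descent with a fixed step α is stable only for 0 < α < 2/λ_max = 0.1, while Newton's method reaches the minimum of a quadratic in one step.

```python
import numpy as np


def grad(p):
    return np.array([2.0 * p[0], 20.0 * p[1]])


def hess(p):
    return np.array([[2.0, 0.0], [0.0, 20.0]])   # constant for this quadratic


def max_stable_step():
    """Fixed-step gradient descent on a quadratic needs alpha < 2 / lambda_max."""
    return 2.0 / np.linalg.eigvalsh(hess(None)).max()


def newton(p, iters=10, tol=1e-12):
    for k in range(iters):
        g = grad(p)
        if np.linalg.norm(g) < tol:
            return p, k
        p = p - np.linalg.solve(hess(p), g)       # Newton step
    return p, iters


def gradient_descent(p, alpha, iters=500, tol=1e-8):
    for k in range(iters):
        g = grad(p)
        if np.linalg.norm(g) < tol:
            return p, k
        p = p - alpha * g                         # fixed-step update
    return p, iters
```

With α = 0.05 the iteration converges slowly, limited by the small eigenvalue. With α = 0.11 the component along the large eigenvalue is multiplied by 1 − 20α = −1.2 each step and diverges.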

Deep Learning for Lightning Strike Forecasting

Research Project

Deep Learning, Advised by Prof. Thomas M. Davison

Jan 2025

A deep learning project for forecasting lightning strikes across the U.S. using multi-channel storm imagery and lightning time-series data. The project explored ConvLSTM and U-Net architectures to predict future vertically integrated liquid (VIL) radar frames, reconstruct missing VIL information from satellite channels, and generate probability maps of lightning strikes, followed by Poisson-based post-processing to recover discrete strike events.
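
One possible form of Poisson post-processing is sketched below. The per-pixel counts are assumed to be Poisson(λ), so a predicted probability p of at least one strike satisfies p = 1 − e^(−λ), which gives λ = −ln(1 − p). Counts are then sampled and expanded into discrete (row, col) events. The probability-to-rate mapping is an assumption, not necessarily the mapping used in the project.

```python
import numpy as np


def prob_to_rate(p, eps=1e-9):
    """Invert p = P(N >= 1) = 1 - exp(-lam) for a Poisson count N."""
    p = np.clip(p, 0.0, 1.0 - eps)
    return -np.log1p(-p)


def sample_strikes(rate, rng):
    """Draw Poisson counts per pixel and expand them into (row, col) strike events."""
    counts = rng.poisson(rate)
    rows, cols = np.nonzero(counts)
    reps = counts[rows, cols]
    return np.repeat(rows, reps), np.repeat(cols, reps)
```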

2024

Transformer-Based Imputation for Sequential Weather Data

Coursework Project

Deep Learning

Dec 2024

A deep learning project on recovering missing measurements in multi-decade sequential weather data. The project built PyTorch data pipelines for paired corrupted and uncorrupted daily records, designed and trained a Transformer-based imputation model, and reconstructed missing values in a held-out test decade. It further explored feature engineering with first- and second-order differences, normalization, masking strategies, and hyperparameter tuning to improve imputation quality across multiple weather variables.
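
The preprocessing ideas mentioned above can be sketched in NumPy, independent of the Transformer itself. The sketch covers first- and second-difference features, NaN-aware standardization, random masking of observed entries for self-supervised training, and a loss restricted to the hidden positions. The function names and the masking scheme are illustrative assumptions rather than the coursework's exact pipeline.

```python
import numpy as np


def add_difference_features(x):
    """x: (T, F) daily records -> (T, 3F) = [values, 1st differences, 2nd differences].
    The first row of each difference block is zero (series padded with its first value)."""
    d1 = np.diff(x, axis=0, prepend=x[:1])
    d2 = np.diff(d1, axis=0, prepend=d1[:1])
    return np.concatenate([x, d1, d2], axis=1)


def standardize(x, mean=None, std=None):
    """Z-score each column, ignoring NaNs; reuse training statistics for held-out data."""
    if mean is None:
        mean = np.nanmean(x, axis=0)
    if std is None:
        std = np.nanstd(x, axis=0)
    return (x - mean) / np.where(std > 0, std, 1.0), mean, std


def mask_for_training(x, frac, rng):
    """Hide a random fraction of the observed entries. Returns the model input
    (hidden and missing entries set to 0) and a boolean mask of hidden positions."""
    observed = ~np.isnan(x)
    hidden = observed & (rng.random(x.shape) < frac)
    x_in = np.where(observed & ~hidden, x, 0.0)
    return x_in, hidden


def masked_mse(pred, target, mask):
    """Loss only where values were hidden, i.e. where ground truth is known."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))
```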

U-Net for MRI Image Inpainting

Coursework Project

Deep Learning, Advised by Prof. Carlos Cueto Mondejar

Dec 2024

A deep learning project on recovering missing regions in corrupted brain MRI images. The project used diffusion-generated synthetic data for training, designed corruption patterns to match the test set, and compared an autoencoder baseline with a U-Net model. The final U-Net achieved better pixel-level reconstruction quality than the autoencoder baseline on unseen MRI images.

Flood Risk Prediction Tool

Research Project

Applied Computational / Data Science, Advised by Dr. James Percival

Nov 2024

A flood risk prediction and analysis tool for the UK that estimates flood risk from rivers, seas, and surface water, and also provides median house price estimates and local authority information for postcodes and arbitrary locations. The project combines predictive modeling, geospatial analysis, rainfall and water-level lookup, and interactive visualizations to support flood risk assessment and identify areas at immediate risk.

Passenger Transportation Classification and Threshold Optimization

Coursework Project

Data Science and Machine Learning

Nov 2024

Built and evaluated classification models to predict passenger transportation outcomes using demographic, spending, and travel features. Compared tuned Logistic Regression and KNN models with 5-fold cross-validation, and selected Logistic Regression as the final model based on stronger AUC-ROC and overall classification performance. Further optimized decision thresholds for different operational scenarios, including balanced error control and high-recall settings where false negatives were more costly.
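
The two threshold settings can be sketched directly from predicted probabilities. In the balanced setting, the threshold maximizes Youden's J = TPR − FPR. In the high-recall setting, it is the largest threshold whose recall still meets a target, which yields the fewest false positives among qualifying thresholds. These selection rules are common choices used here as assumptions; the project's exact criteria may differ.

```python
import numpy as np


def rates(y_true, scores, t):
    """True- and false-positive rates when predicting positive for scores >= t."""
    pred = scores >= t
    pos, neg = y_true == 1, y_true == 0
    tpr = np.sum(pred & pos) / np.sum(pos)
    fpr = np.sum(pred & neg) / np.sum(neg)
    return tpr, fpr


def youden_threshold(y_true, scores):
    """Balanced error control: maximise TPR - FPR over the observed scores."""
    candidates = np.unique(scores)
    j = [tpr - fpr for tpr, fpr in (rates(y_true, scores, t) for t in candidates)]
    return candidates[int(np.argmax(j))]


def recall_threshold(y_true, scores, min_recall):
    """High-recall setting: the largest threshold whose recall still meets min_recall."""
    for t in np.unique(scores)[::-1]:     # from strictest to most lenient
        tpr, _ = rates(y_true, scores, t)
        if tpr >= min_recall:
            return t
    return None                           # only reached if min_recall > 1
```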

Predicting Significant Wave Heights for Ocean Exploration

Coursework Project

Data Science and Machine Learning

Nov 2024

Built an end-to-end regression pipeline to predict significant wave height (Hsig) from oceanic and environmental variables for maritime risk assessment. The project combined data preprocessing, feature engineering, and model selection, including temperature binning, wind–temperature interaction terms, imputation, scaling, and categorical encoding. A linear regression baseline was compared against ensemble methods using 5-fold cross-validation and RandomizedSearchCV, with a tuned Random Forest selected as the final model, achieving strong predictive performance on unseen test data.

Verification, Interpolation, Quadrature, and ODE Solvers

Coursework Project

Computational Mathematics, Advised by Prof. Matthew Piggott

Oct 2024

Investigated core ideas in computational mathematics through a series of theoretical and numerical studies, including model verification and validation, polynomial and spline interpolation, numerical quadrature on truncated infinite domains, and higher-order Taylor-series solvers for ordinary differential equations. Implemented idealized examples in Python to compare methods, analyze convergence behavior, and evaluate error using RMS and max norms. The project emphasized not only obtaining accurate numerical results, but also understanding why a model or algorithm works, when it fails, and how verification, validation, and calibration differ in practice.
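
As a small example of quadrature on a truncated infinite domain, the sketch integrates e^(−x²) over [0, ∞), whose exact value is √π/2, by applying composite Simpson's rule on [0, L]. The truncation error is the tail (√π/2)·erfc(L), which separates cleanly from the quadrature error. The integrand is chosen for illustration and is not necessarily one used in the coursework.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson's rule with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3


def gaussian(x):
    return math.exp(-x * x)


EXACT = math.sqrt(math.pi) / 2    # integral of exp(-x^2) over [0, inf)


def tail(L):
    """Truncation error from cutting [0, inf) at L: integral of exp(-x^2) over [L, inf)."""
    return EXACT * math.erfc(L)
```

Truncating at L = 1 leaves an error of about 0.14, almost all of it from the tail. At L = 6 the tail is negligible and the remaining error is the quadrature error.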

Skyrmion Impossible: A Monte Carlo Quest for a Magnetic Quasi-Particle

Coursework Project

Numerical Programming, Advised by Dr. Marijan Beg

Oct 2024

Developed a Metropolis–Hastings Monte Carlo simulator for a two-dimensional Heisenberg spin lattice to explore magnetic skyrmion formation. The project implemented core energy terms including Zeeman, uniaxial anisotropy, exchange, and Dzyaloshinskii–Moriya interactions, alongside spin-lattice visualization, code optimization, documentation, and packaging. The final workflow used simulation-driven energy minimization to search for equilibrium spin configurations and investigate whether a magnetic skyrmion could emerge.
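
A condensed sketch of single-spin Metropolis updates on a periodic 2D lattice of unit spins follows. It works in dimensionless units with exchange J, a DMI term written as D r_ij·(m_i × m_j), uniaxial anisotropy K along z, and a Zeeman field h. The DMI convention, uniform random proposals, and units are assumptions for illustration, not the coursework's exact Hamiltonian. Because each bond term is symmetric under swapping its two ends, the energy change of a trial move only needs the four bonds and on-site terms of the flipped spin.

```python
import numpy as np

# (lattice offset, bond vector r_ij) for the four nearest neighbours; entries
# 0 and 2 (+x, +y) enumerate every bond exactly once.
NEIGHBOURS = [((1, 0), np.array([1.0, 0.0, 0.0])), ((-1, 0), np.array([-1.0, 0.0, 0.0])),
              ((0, 1), np.array([0.0, 1.0, 0.0])), ((0, -1), np.array([0.0, -1.0, 0.0]))]


def bond_energy(s, t, r, J, D):
    """Exchange plus DMI between spin s and its neighbour t at offset r."""
    return -J * np.dot(s, t) + D * np.dot(r, np.cross(s, t))


def onsite_energy(s, K, h):
    """Uniaxial anisotropy along z plus Zeeman energy in field h."""
    return -K * s[2] ** 2 - np.dot(h, s)


def local_energy(m, i, j, s, J, D, K, h):
    """All energy terms that involve the spin at (i, j) if it were set to s."""
    nx, ny, _ = m.shape
    e = onsite_energy(s, K, h)
    for (di, dj), r in NEIGHBOURS:
        e += bond_energy(s, m[(i + di) % nx, (j + dj) % ny], r, J, D)
    return e


def total_energy(m, J, D, K, h):
    nx, ny, _ = m.shape
    e = 0.0
    for i in range(nx):
        for j in range(ny):
            e += onsite_energy(m[i, j], K, h)
            for (di, dj), r in NEIGHBOURS[::2]:      # +x and +y bonds only
                e += bond_energy(m[i, j], m[(i + di) % nx, (j + dj) % ny], r, J, D)
    return e


def random_unit_vectors(rng, shape=()):
    v = rng.normal(size=shape + (3,))
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def metropolis_sweep(m, T, J, D, K, h, rng):
    """One sweep = nx*ny single-spin trial moves with Metropolis acceptance."""
    nx, ny, _ = m.shape
    accepted = 0
    for _ in range(nx * ny):
        i, j = rng.integers(nx), rng.integers(ny)
        s_new = random_unit_vectors(rng)
        dE = local_energy(m, i, j, s_new, J, D, K, h) - local_energy(m, i, j, m[i, j], J, D, K, h)
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            m[i, j] = s_new
            accepted += 1
    return accepted
```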

Liquid-Cooled Underwater Thruster

Entrepreneurial Project National First Prize

China-US Young Maker Competition

Aug 2024

An entrepreneurial project focused on a liquid-cooled underwater thruster for diving and underwater imaging scenarios. The project proposes a self-developed split-type liquid-cooled motor design that eliminates the dynamic shaft seal to improve waterproof reliability and diving depth, while using active liquid cooling to reduce motor temperature by over 35% and increase effective payload by 30%. The product targets recreational divers, professional divers, and underwater photographers, and won the National First Prize in the China-US Young Maker Competition.

Efficient Privacy-Preserving Arrhythmia Detection Method: A Decentralized Federated Learning Approach

Bachelor's Thesis Best Paper Award

Undergraduate Program

May 2024

Cardiovascular disease is a leading cause of mortality globally, making precise and efficient diagnosis crucial. Electrocardiography (ECG) is a conventional diagnostic tool for cardiovascular disease and can detect a variety of cardiac rhythm abnormalities. However, relying solely on physicians to interpret ECGs is demanding and time-consuming. Computer-aided diagnosis, which uses machine learning algorithms to make a preliminary identification of rhythm abnormalities from ECG data, therefore has substantial practical value. While convolutional neural network (CNN) models have proven effective for ECG recognition, two challenges persist: individual healthcare institutions often lack sufficient local data, and privacy concerns prohibit sharing data across institutions. Federated learning, a paradigm in which clients jointly train a global model while keeping their raw data local, addresses both problems. However, conventional centralized federated learning depends on a central server for aggregation, which can become a communication bottleneck and a single point of failure. This paper introduces an efficient privacy-preserving arrhythmia detection method based on decentralized federated learning, removing the reliance on a central server. Simulation experiments on real-world datasets show high detection accuracy and efficiency: detection accuracy exceeds 95%, and training efficiency is significantly higher than that of centralized federated learning under equivalent parameter settings.
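
To illustrate how training can proceed without a central server, here is a generic gossip-averaging sketch. Clients on a ring train locally and then average parameters only with their two neighbours through a doubly stochastic mixing matrix, which drives all clients toward consensus on the global average. This is a standard decentralized scheme named here as a stand-in; it is not necessarily the thesis's protocol. `local_step` is a hypothetical placeholder for a client's local CNN training on its own ECG data.

```python
import numpy as np


def ring_mixing_matrix(n, self_weight=1 / 3):
    """Doubly stochastic mixing matrix for a ring: each client keeps self_weight
    and splits the remainder evenly between its two neighbours."""
    W = np.zeros((n, n))
    nb = (1 - self_weight) / 2
    for i in range(n):
        W[i, i] = self_weight
        W[i, (i - 1) % n] += nb
        W[i, (i + 1) % n] += nb
    return W


def decentralized_round(params, W, local_step):
    """params: (n_clients, d). Each client updates locally, then mixes with neighbours."""
    updated = np.stack([local_step(i, p) for i, p in enumerate(params)])
    return W @ updated
```

Because `W` is doubly stochastic, each round preserves the average of the client parameters. Only neighbouring clients exchange parameters, and raw data never leaves a client.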

Neural Network Regression for Lung Cancer Mortality Prediction

Course Paper

Machine Learning

Jan 2024

Lung cancer mortality has been rising steadily and has become a major public health concern. In research on lung cancer, the relationship between environmental quality and lung cancer has received increasing attention. This paper uses neural network regression to predict lung cancer mortality from environmental quality factors and compares its performance with baseline methods. It also examines the relative impact of different air pollutants on lung cancer mortality. Experiments on a real-world dataset show that neural network regression outperforms the baseline methods on this task and that, among the environmental factors considered, PM2.5 has the greatest influence on lung cancer mortality.

2023

Marketing Analysis in Macau Based on Face Detection and Eye Tracking

Course Paper

Data Analytics for Smart Cities

Oct 2023

With the continuous development of information technology, marketing strategies have become increasingly diverse. Because Macau is a tourism-oriented city, the responses of travelers and other users to different promotional strategies can strongly influence advertising effectiveness. This course paper focuses on advertisement strategy analysis and presents a system based on face detection and eye tracking to evaluate user attention from the consumer perspective. The project includes an MTCNN-based face detection method, compared against the YOLO series of object detectors, and a real-time eye-tracking method based on pupil localization with geometric shape features under a 2D mapping model. The results show that the proposed approach achieves better accuracy and generalizability than the compared methods.
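
A common way to realize a 2D mapping model is a second-order polynomial fitted by least squares on calibration points. It maps pupil coordinates (x, y) to screen coordinates through the terms 1, x, y, xy, x², and y². The polynomial form below is an assumption used for illustration; the paper's exact mapping is not specified here.

```python
import numpy as np


def design_matrix(pupil):
    """pupil: (n, 2) pupil-centre coordinates -> (n, 6) second-order polynomial terms."""
    x, y = pupil[:, 0], pupil[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])


def fit_mapping(pupil, screen):
    """Least-squares fit from calibration pairs; needs at least 6 well-spread points."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(pupil), screen, rcond=None)
    return coeffs                      # shape (6, 2): one column per screen axis


def predict(coeffs, pupil):
    """Map new pupil positions to estimated screen (gaze) coordinates."""
    return design_matrix(pupil) @ coeffs
```

A 3 × 3 calibration grid (9 fixation targets) is a typical choice, since it gives more points than the 6 unknown coefficients per axis.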

2022

Predicting Tourist Volume from Internet Popularity: A Comparative Study of Forecasting Models

Course Paper

Big Data Analytics for Tourism Industry

May 2022

This paper investigates the relationship between provincial internet popularity and local tourist volume by using internet popularity data from 2011 to 2018 to predict tourist arrivals in the following year. It provides a quantitative comparison of linear regression, logistic regression, and multilayer perceptron models for this forecasting task, and further compares these approaches with time-series forecasting methods. The results show that internet popularity is correlated with tourist volume, and that with a sufficient amount of data, the multilayer perceptron performs slightly better than linear regression for this type of prediction problem.

Friend Recommendation and Data Analysis Using MapReduce

Course Paper

Big Data Processing and Applications

Apr 2022

Current friend recommendation systems in social networks are still at an early stage of development. Most existing approaches rely primarily on the number of mutual friends between users, which often leads to limited accuracy and a narrow recommendation range. This paper studies a basic friend recommendation algorithm based on mutual-friend counts. Starting from the original dataset, it applies MapReduce to process, aggregate, and organize the data, extract and count mutual friends, and rank the candidates accordingly. In addition, the monkey algorithm, a swarm-intelligence optimization method, is introduced to improve recommendation accuracy and expand the recommendation range. The proposed method simulates real interpersonal relationship networks and achieves higher accuracy together with broader recommendation coverage.
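
The mutual-friend counting step follows a well-known MapReduce pattern, simulated in-process below. For each user's friend list, the mapper emits every pair of that user's friends with the user as a mutual friend. It also emits a marker for pairs that are already friends. The reducer drops marked pairs and counts the rest. The graph format and function names are illustrative, and the monkey-algorithm optimization is not shown.

```python
from collections import defaultdict
from itertools import combinations


def mapper(user, friends):
    """Emit ((a, b), value) records for one adjacency-list line."""
    for f in friends:
        yield tuple(sorted((user, f))), None      # a and b are already friends
    for a, b in combinations(sorted(friends), 2):
        yield (a, b), user                        # `user` is a mutual friend of a and b


def shuffle(records):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Drop pairs that are already friends; otherwise count mutual friends."""
    if None in values:
        return None
    return key, len(values)


def recommend(graph, top_k=2):
    """graph: {user: [friends]} (symmetric). Returns top-k suggestions per user."""
    records = (r for user, friends in graph.items() for r in mapper(user, friends))
    suggestions = defaultdict(list)
    for key, values in shuffle(records).items():
        out = reducer(key, values)
        if out is None:
            continue
        (a, b), count = out
        suggestions[a].append((count, b))
        suggestions[b].append((count, a))
    # Rank by mutual-friend count (descending), breaking ties by name.
    return {u: [v for _, v in sorted(c, key=lambda t: (-t[0], t[1]))[:top_k]]
            for u, c in suggestions.items()}
```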
