Projects

MTet: Multi-domain English-Vietnamese Translation

MTet is a project founded by VietAI to curate high-quality data and train a state-of-the-art English-Vietnamese translation model. I worked as a data collector on this project, responsible for collecting text from 30,000 English-Vietnamese medical research journals using Python, ChromeDriver, and BeautifulSoup4. I also implemented quality control for the collected text through testing and validation.
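
As an illustration of the kind of validation that quality control on scraped parallel text can involve, here is a minimal sketch. The function names, the specific checks, and the length-ratio threshold of 3.0 are assumptions made for this example, not the checks used in MTet.

```python
def validate_pair(en, vi, max_ratio=3.0):
    """Return a list of problems found in one English-Vietnamese pair.

    The checks and the max_ratio default are illustrative assumptions.
    """
    en, vi = en.strip(), vi.strip()
    if not en or not vi:
        # One side is missing, so nothing else can be checked.
        return ["empty side"]
    problems = []
    if en == vi:
        # Identical sides usually mean the text was never translated.
        problems.append("untranslated (identical sides)")
    ratio = max(len(en), len(vi)) / min(len(en), len(vi))
    if ratio > max_ratio:
        # A large character-length gap suggests misaligned sentences.
        problems.append("length ratio too large")
    return problems


def filter_pairs(pairs, max_ratio=3.0):
    """Keep pairs that pass validate_pair, dropping exact duplicates."""
    seen = set()
    kept = []
    for en, vi in pairs:
        key = (en.strip(), vi.strip())
        if key in seen or validate_pair(en, vi, max_ratio):
            continue
        seen.add(key)
        kept.append(key)
    return kept
```

Calling `filter_pairs` on a list of `(english, vietnamese)` tuples returns only the pairs that pass every check, in their original order.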


The MTet project has made significant progress in improving the quality of English-Vietnamese translation. The team has released a new dataset of 4.2M examples across 11 domains, and the MTet model has achieved state-of-the-art results on multiple test sets.


Skills: Data collection & cleaning, Web scraping, Python, ChromeDriver, BeautifulSoup4, Selenium

Melbourne Housing Market Analysis - Unveiling Trends

This project investigated the Melbourne housing market using a dataset of property listings. The goal was to understand factors influencing housing prices and predict their values, while also exploring neighborhood trends.


Key Findings:

  • Identified significant predictors of housing prices: number of rooms, distance from CBD, year built, number of bathrooms, building area, seller, property type, and suburb.

  • Developed a linear regression model with an adjusted R-squared of 0.733, indicating good explanatory power.

  • Built a logistic regression model to predict properties with private pools, achieving an AUC score of 0.73.

  • Analyzed the relationship between suburbs and housing prices.

  • Compared mean housing prices between the suburbs east and west of Melbourne University.
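
For readers unfamiliar with the two metrics quoted above, the sketch below shows how adjusted R-squared and AUC are defined. It is a plain-Python illustration with hypothetical function names, not the code used in this project, and the inputs in the usage note are toy values rather than the Melbourne data.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R-squared: R-squared penalized for the number of predictors."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Labels are expected to be 0 or 1.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75, because three of the four positive-negative pairs are ranked correctly. The pairwise loop is quadratic in the number of samples, which is fine for illustration; a library routine is the better choice on real data.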


Skills: Data analysis and visualization, Linear and logistic regression, Feature selection, Model evaluation and interpretation, Python

Figure: Comparison of pretrained models and the MTet model. Source: VietAI


