Projects
MTet: Multi-domain English-Vietnamese Translation
MTet is a project founded by VietAI to curate high-quality data and train a state-of-the-art English-Vietnamese translation model. As a data collector on this project, I scraped text from 30,000 English-Vietnamese medical research journals using Python, ChromeDriver, and BeautifulSoup4. I also implemented quality control for the collected text through testing and validation.
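To give a sense of what validating scraped parallel text can look like, here is a minimal sketch. It is not the project's actual code: the function names `validate_pair` and `filter_pairs`, the specific checks, and the `max_ratio` threshold are all assumptions chosen for illustration.

```python
def validate_pair(en, vi, max_ratio=3.0):
    """Return a list of problems found in one English-Vietnamese pair.

    The checks and the max_ratio threshold are illustrative assumptions.
    """
    en, vi = en.strip(), vi.strip()
    if not en or not vi:
        return ["empty side"]
    problems = []
    if en == vi:
        # Identical sides usually mean the text was never translated.
        problems.append("untranslated (identical sides)")
    ratio = max(len(en), len(vi)) / min(len(en), len(vi))
    if ratio > max_ratio:
        # Very different lengths suggest a misaligned pair.
        problems.append("length ratio too large")
    return problems


def filter_pairs(pairs):
    """Keep pairs that pass validate_pair, dropping exact duplicates."""
    seen = set()
    kept = []
    for en, vi in pairs:
        key = (en.strip(), vi.strip())
        if key in seen or validate_pair(en, vi):
            continue
        seen.add(key)
        kept.append(key)
    return kept
```

Calling `filter_pairs` on a list of `(english, vietnamese)` tuples returns only the pairs that pass every check, in their original order.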
​
The MTet project has improved the quality of English-Vietnamese translation: the team released a new dataset of 4.2M examples across 11 domains, and the MTet model achieved state-of-the-art results on multiple test sets.
​
Skills: Data collection & cleaning, Web scraping, Python, ChromeDriver, BeautifulSoup4, Selenium
Melbourne Housing Market Analysis - Unveiling Trends
This project investigated the Melbourne housing market using a dataset of property listings. The goal was to understand factors influencing housing prices and predict their values, while also exploring neighborhood trends.
​
Key Findings:
- Identified significant predictors of housing prices: number of rooms, distance from CBD, year built, number of bathrooms, building area, seller, property type, and suburb.
- Developed a linear regression model with an adjusted R-squared of 0.733, indicating good explanatory power.
- Built a logistic regression model to predict properties with private pools, achieving an AUC score of 0.73.
- Analyzed the relationship between suburbs and housing prices.
- Compared mean housing prices between the suburbs to the east and west of Melbourne University.
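For readers unfamiliar with the two metrics quoted above, the sketch below shows how each can be computed in plain Python. It is illustrative only, not the code used in the analysis: the function names are hypothetical, and AUC is computed from its pairwise-ranking definition rather than with a library routine.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


def auc_score(labels, scores):
    """AUC as the probability that a random positive example is scored
    higher than a random negative one (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.73 means the model ranks a property with a pool above one without a pool about 73% of the time.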
​
Skills: Data analysis and visualization, Linear and logistic regression, Feature selection, Model evaluation and interpretation, Python