by Sevara Nastritdinova
mentor: Fiorenzo Omenetto, Biomedical Engineering; funding source: Gatof Summer Scholars Fund
As part of the Summer Scholars 2020 Program, I worked in Dr. Fiorenzo Omenetto’s lab, assisting with the development of a mobile system that quantitatively analyzes the pH of sweat as indicated by color changes in a wearable patch. My role involved analyzing the Mean Absolute Error (MAE) of two machine learning models: Multilayer Perceptron (MLP) and Ridge Regression. Given that I had no knowledge of either Advanced Statistical Analysis or Machine Learning, my journey encompassed not only getting numerical results for my mini-project, but also gaining tangible knowledge in the aforementioned fields.
The first step of my plan required me to study machine learning in more depth. Before that, however, I needed at least an introduction to Python programming to understand some of the syntax used by authors on the web. Therefore, I took and completed an introductory course, which also taught me web scraping and basic statistical analysis.
Next, I was confused by the names and approaches of the different models used in machine learning. What is more, I did not know how machine learning worked! Michael Pine, a graduate Tufts Computer Scientist, held an educational session in which he explained the concept and procedure. He taught Ordinary Least Squares Regression, Support Vector Machines, Random Forest and Neural Networks, as well as the pros and cons of each. The lecture also covered evaluation methods, including Mean Absolute Error and Mean Squared Error.
However, given the expansive nature of the overall project, I started to focus on Principal Component Analysis, trying to find the Optimal Principal Component. I discussed the topic with Michael, who reassured me that the results had already been produced. Hence, I would only need to confirm them. Once again, however, I did not know what Principal Component Analysis was. Consequently, I took an Advanced Statistical Analysis course on Coursera to get an idea of the concept. Although I learned that topic and many others, I was not able to apply my knowledge in Python because the introductory Python course had not covered that!
Therefore, given my very limited knowledge in the area, I personally decided to cover more material and calculate the MAE for two of the models used in the analysis: the Neural Network MLP and Ridge Regression, a regularized variant of Ordinary Least Squares Regression. First, I at least knew the concepts from Michael’s mini-lecture. Second, I knew what Mean Absolute Error was from my Engineering Mathematics classes. Thankfully, Michael’s team had already obtained the results for that too. After a couple of weeks of struggling with the syntax, I obtained the Mean Absolute Error for both training and validation data by referring to Michael’s results and Python notebook. My values matched Michael’s.
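For readers unfamiliar with the metric, the Mean Absolute Error is the average absolute difference between the true values and a model's predictions. The following is only a minimal sketch of that calculation in plain Python, not the notebook used in the lab: the function name `mean_absolute_error` and all pH values are made up for illustration, and the predictions are hard-coded rather than produced by an MLP or Ridge Regression model.

```python
def mean_absolute_error(y_true, y_pred):
    """Return the average of |true - predicted| over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical pH values and model predictions (illustrative numbers only,
# not data or results from the project).
train_true = [5.0, 6.0, 7.0, 8.0]
train_pred = [5.5, 6.0, 6.5, 8.0]
val_true = [5.5, 7.5]
val_pred = [6.0, 7.0]

# Computing the metric separately on training and validation data.
train_mae = mean_absolute_error(train_true, train_pred)
val_mae = mean_absolute_error(val_true, val_pred)

print(f"training MAE: {train_mae:.3f}")
print(f"validation MAE: {val_mae:.3f}")
```

Comparing the training and validation values side by side is a common way to see whether a model fits unseen data about as well as the data it was trained on.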
To conclude, this summer was one of my most challenging learning experiences. It was not necessarily my lack of knowledge, but the lack of in-person teaching or mentoring, that limited my opportunities to teach myself various new concepts. I could have reached out to my team and Principal Investigator more frequently for guidance, but my idea of a self-starting, independent researcher dampened my proactiveness. Regardless, I gained substantial knowledge in Statistics and exposure to the Machine Learning field, and I am now able to understand some of the main concepts discussed by computer scientists in the field.