Front end for GitHub repositories.
Developed a content-based “comparison shopping tool” car recommender using an approximate nearest neighbor algorithm. The tool aims to make car hunting a less daunting endeavor: the shopper enters a favorite or budgeted-for car brand and model, and the recommender returns cars with similar characteristics such as fuel economy, performance, cost and size.
Try it out for yourself by clicking on this link: Car Recommender Streamlit App
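The write-up does not say which approximate nearest neighbor (ANN) library or index the recommender uses. As a minimal sketch under that gap, the code below uses random-hyperplane locality-sensitive hashing, one standard ANN technique for cosine similarity, swapped in here for illustration: features are standardized, each car is hashed into buckets according to which side of a few random hyperplanes it falls on, and a query is ranked exactly only against cars that share a bucket with it. The parameters `n_planes` and `n_tables` and any feature set you feed it (for example fuel economy, horsepower, price and length) are hypothetical.

```python
import math
import random


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class HyperplaneLSH:
    """Approximate nearest neighbours under cosine similarity (random-hyperplane LSH)."""

    def __init__(self, vectors, n_planes=4, n_tables=8, seed=0):
        rng = random.Random(seed)
        dim = len(vectors[0])
        self.vectors = vectors
        # Each hash table gets its own set of random hyperplanes.
        self.planes = [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
                        for _ in range(n_planes)] for _ in range(n_tables)]
        self.tables = [{} for _ in range(n_tables)]
        for idx, vec in enumerate(vectors):
            for table, planes in zip(self.tables, self.planes):
                table.setdefault(self._signature(vec, planes), []).append(idx)

    @staticmethod
    def _signature(vec, planes):
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple(sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in planes)

    def query(self, vec, k=3, exclude=None):
        """Return up to k indices of similar items, most similar first."""
        candidates = set()
        for table, planes in zip(self.tables, self.planes):
            candidates.update(table.get(self._signature(vec, planes), []))
        candidates.discard(exclude)
        # Exact ranking, but only over items that shared a bucket with the query.
        return sorted(candidates, key=lambda i: cosine(vec, self.vectors[i]), reverse=True)[:k]
```

For example, with `X = standardize(rows)`, `HyperplaneLSH(X).query(X[i], k=5, exclude=i)` returns up to five cars similar to car `i`. More tables raise recall at the cost of memory; more planes per table make buckets smaller and queries faster but can miss true neighbors.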
The CORD-19 research challenge dataset contains a large collection of literature on coronaviruses made available for data mining, including more than 85,000 PMC papers. This project classifies those papers by topic, in alignment with the research topics of the WHO's Research and Development Blueprint for the COVID-19 Public Health Emergency of International Concern (PHEIC), to help tackle the spread of COVID-19. Two LDA topic models are built and their performance compared: one using Gensim's core estimation, which is based on the onlineldavb.py script by Hoffman, Blei and Bach (Online Learning for Latent Dirichlet Allocation), and one using MALLET's (optimized) collapsed Gibbs sampling.
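MALLET's estimator is an optimized collapsed Gibbs sampler. The sketch below is a plain, unoptimized collapsed Gibbs sampler for LDA in pure Python, written only to show the update each token receives; it is not MALLET's or Gensim's code, and the hyperparameters `alpha` and `beta` are illustrative defaults rather than the project's settings. Documents are assumed to be already tokenized into integer word ids.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of integer word ids in range(V).
    Returns (doc-topic counts, topic-word counts, per-token topic assignments).
    """
    rng = random.Random(seed)
    n_words = 1 + max(w for doc in docs for w in doc)
    ndk = [[0] * n_topics for _ in docs]             # tokens of topic k in doc d
    nkw = [[0] * n_words for _ in range(n_topics)]  # times word w is assigned to topic k
    nk = [0] * n_topics                              # tokens assigned to topic k
    z = []
    # Start from random topic assignments.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(z_doc)
    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts ...
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # ... weight each topic by p(z = t | all other assignments), up to a constant ...
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + n_words * beta)
                           for t in topics]
                # ... and sample a new topic for it.
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return ndk, nkw, z


def top_words(nkw, vocab, n=3):
    """Highest-count words per topic under the final assignments."""
    return [[vocab[w] for w in sorted(range(len(row)), key=lambda w: -row[w])[:n]]
            for row in nkw]
```

Each token's new topic is drawn with weight proportional to (occurrences of topic t in its document + `alpha`) times (assignments of its word to t + `beta`), divided by the smoothed size of topic t; `top_words` then lists the most frequent words per topic for inspection.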
Google's pre-trained BERT model is recognized as one of the most powerful tools for NLP. One useful task it can be fine-tuned for is classifying whether a pair of passages has high lexical overlap. This could be used by a question-answering platform such as Stack Overflow that wants to use machine learning to determine whether the answers given to a question contain context relevant to answering it. In this project I adapted a TensorFlow tutorial to carry out this classification task on the MS MARCO question-answering dataset.
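BERT handles a passage pair by packing both passages into one input sequence, `[CLS] A [SEP] B [SEP]`, with segment ids marking which passage each token belongs to and an attention mask separating real tokens from padding; the classification head then reads the final `[CLS]` representation. The sketch below builds those three inputs. The whitespace tokenizer and the tiny vocabulary are assumptions for illustration only: real BERT inputs use WordPiece tokenization with the model's own vocabulary.

```python
def encode_pair(text_a, text_b, vocab, max_len=16):
    """Pack two passages into BERT-style input ids, segment ids and attention mask.

    Uses a naive lower-case whitespace tokenizer (an illustrative assumption;
    BERT itself uses WordPiece). vocab must contain [PAD], [UNK], [CLS], [SEP].
    """
    tok_a = text_a.lower().split()
    tok_b = text_b.lower().split()
    # Trim the longer passage first until [CLS] A [SEP] B [SEP] fits in max_len.
    while len(tok_a) + len(tok_b) > max_len - 3:
        if len(tok_a) >= len(tok_b):
            tok_a.pop()
        else:
            tok_b.pop()
    tokens = ["[CLS]"] + tok_a + ["[SEP]"] + tok_b + ["[SEP]"]
    # Segment 0 covers [CLS], passage A and the first [SEP]; segment 1 the rest.
    segment_ids = [0] * (len(tok_a) + 2) + [1] * (len(tok_b) + 1)
    input_ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
    mask = [1] * len(input_ids)
    # Pad every sequence to the same fixed length; padding is masked out.
    pad = max_len - len(input_ids)
    input_ids += [vocab["[PAD]"]] * pad
    segment_ids += [0] * pad
    mask += [0] * pad
    return input_ids, segment_ids, mask
```

Trimming the longer passage first is one common heuristic for pair inputs, since it keeps as much of both passages as the length budget allows.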
The aim of this project is to try different classification models on the fake news datasets provided by Kaggle. The same models can be used on other datasets for the same purpose; I have tried them on several similar datasets and they have consistently scored high in accuracy. The models used are:
1. A neural network with a bidirectional recurrent LSTM layer, using Stanford's GloVe 300d embeddings in the input layer.
2. A neural network with convolutional, max-pooling and recurrent (LSTM) layers, using Stanford's GloVe 300d embeddings in the input layer.
3. The pre-trained NNLM 128 model (https://tfhub.dev/google/nnlm-en-dim128/2) as the embedding layer, followed by a dense classifier layer.
4. Naive Bayes.
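Models 1 and 2 initialize their input (embedding) layer with GloVe vectors. A common way to do that, sketched below under the assumption that the tokenizer yields a 1-based word index (as Keras's `Tokenizer.word_index` does), is to build a matrix whose row i holds the GloVe vector of word i, leaving padding row 0 and out-of-vocabulary words as zeros. GloVe's text files store one word per line followed by its vector components separated by spaces; the loader accepts any iterable of such lines, so an open file works.

```python
def load_glove(lines):
    """Parse GloVe text-format lines ("word v1 v2 ... vd") into a dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:  # skip blank or malformed lines
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector for the word whose index is i.

    Row 0 is reserved for padding; words without a GloVe vector stay all-zero.
    Returns the matrix and the number of words that had no vector.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    missing = 0
    for word, i in word_index.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[i] = list(vec)
        else:
            missing += 1
    return matrix, missing
```

For the 300d vectors, something like `build_embedding_matrix(word_index, load_glove(open(path, encoding="utf-8")), 300)` would produce the matrix, where `path` is a placeholder for the local GloVe file; the result can then serve as the initial weights of the embedding layer. The `missing` count is a quick check on how much of the vocabulary GloVe covers.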
The classifier determines a movie's genre from its script. Scripts of movies from three genres (romance, horror and fantasy) were web-scraped; these genres were chosen because they were expected to overlap the least. Using a Multinomial Naive Bayes model, the accuracy achieved was 32% above that of the baseline model.
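Multinomial Naive Bayes scores each genre by its log prior plus the summed log probabilities of the script's words under that genre, with add-alpha (Laplace) smoothing so that a word never seen with a genre does not zero out its score. The pure-Python sketch below illustrates the model; the write-up does not say which implementation the project used (scikit-learn's `MultinomialNB` is one library option), and the toy snippets you would train it on here are invented, not real scripts.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        """docs: list of token lists; labels: one genre per document."""
        self.classes = sorted(set(labels))
        self.vocab = {w for doc in docs for w in doc}
        counts = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            counts[y].update(doc)
        n_docs = Counter(labels)
        self.log_prior = {c: math.log(n_docs[c] / len(labels)) for c in self.classes}
        self.log_lik = {}
        for c in self.classes:
            # Smoothed per-genre word distribution: (count + alpha) / (total + alpha * |V|).
            total = sum(counts[c].values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {w: math.log((counts[c][w] + self.alpha) / total)
                               for w in self.vocab}
        return self

    def predict(self, doc):
        # Words never seen in training carry no evidence, so they are skipped.
        scores = {c: self.log_prior[c] + sum(self.log_lik[c][w] for w in doc if w in self.vocab)
                  for c in self.classes}
        return max(scores, key=scores.get)
```

Working in log space avoids numeric underflow when a long script contributes thousands of small word probabilities.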
This project assesses how accurately a CNN built from scratch and a transfer-learning CNN based on the pre-trained ResNet50 model distinguish leukemic B-lymphoblast cells from healthy B-lymphoid precursors. The data are microscopic images of blood smear samples from healthy patients and from patients diagnosed with leukemia.
A CNN model that classifies images as showing either dogs or cars.
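At the core of a CNN such as the dog/car classifier is a stack of convolution, ReLU and pooling operations. The sketch below implements one forward pass of each in pure Python on a tiny grayscale image with a hand-picked vertical-edge kernel; in a trained model the kernels are learned, each layer has many of them, and frameworks run these operations on batched tensors rather than nested lists.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation (what deep-learning libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Element-wise max(0, x)."""
    return [[max(0.0, x) for x in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that don't fill a window are dropped."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]
```

On a 5x5 image whose right three columns are bright, the kernel `[[-1, 0, 1]] * 3` responds strongly where the dark-to-bright edge sits and gives zero over the uniform region, and 2x2 max pooling keeps the strongest response while halving the resolution.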
The project builds univariate models in Python to forecast the price of the NASDAQ stock index. The data and its source are first introduced, and the need for such a tool is explored. The target variable is the closing stock price. The data is made stationary by differencing the log of the closing price. Time-series decomposition shows that the data has a strong trend and weak seasonality. The data is then modeled using the Average, Naïve, Simple Exponential Smoothing, Holt’s linear, Holt-Winters seasonal, ARMA and ARIMA methods. After experimenting with each model, the best one-step forecast comes from Holt’s linear method and the best h-step predictor is ARIMA with order (3,1,4).
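Two of the steps above are easy to show in isolation: differencing the log price, and Holt's linear trend method. A minimal pure-Python sketch follows; `alpha` and `beta` are hypothetical smoothing parameters (the project's fitted values are not given), and initializing the level with the first observation and the trend with the first difference is one common simple choice, not necessarily the one used here.

```python
import math


def log_diff(prices):
    """First difference of log prices (the period-to-period log return)."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]


def holt_linear(series, alpha, beta, h=1):
    """Holt's linear trend method.

    level_t = alpha * y_t + (1 - alpha) * (level_{t-1} + trend_{t-1})
    trend_t = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
    Forecast k steps ahead: level_T + k * trend_T, for k = 1..h.
    """
    level, trend = series[0], series[1] - series[0]  # simple initialisation
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + k * trend for k in range(1, h + 1)]
```

On an exactly linear series such as `[10, 12, 14, 16, 18]`, `holt_linear(series, alpha=0.5, beta=0.5, h=3)` forecasts 20, 22 and 24, since level and trend never deviate from the line. Larger `alpha` and `beta` make the level and trend react faster to recent prices.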
Created an ensemble decision tree classification algorithm that helps entrepreneurs seeking funds on crowdfunding websites predict whether their project will succeed or fail on Kickstarter. Users enter the project’s category (Film, Music, Fashion, etc.), the country where the project is based, the creation and deadline dates, and the desired budget. I am currently applying a similar algorithm to gofundme.com and indiegogo.com in order to create a comparison tool that helps users pick the right website for their project.
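The write-up does not say which ensemble method was used; random forests and gradient boosting are both ensembles of decision trees. As one minimal, swapped-in illustration, the sketch below bags depth-1 trees (decision stumps): each stump is trained on a bootstrap resample of the data, and the ensemble predicts by majority vote. Any numeric features you pass in (for example a funding goal in USD and a campaign length in days) are hypothetical, and categorical inputs such as category and country would need to be encoded first.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y):
    """Depth-1 decision tree: the single feature/threshold split with the fewest errors."""
    default = majority(y)
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            # Each side predicts its own majority label (the overall one if it is empty).
            l_lab = majority(left) if left else default
            r_lab = majority(right) if right else default
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, l_lab, r_lab)
    return best[1:]  # (feature, threshold, left label, right label)


def predict_stump(stump, row):
    f, thr, l_lab, r_lab = stump
    return l_lab if row[f] <= thr else r_lab


def fit_bagged_stumps(X, y, n_estimators=25, seed=0):
    """Bagging: each stump is trained on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_ensemble(stumps, row):
    """Majority vote across all stumps."""
    return majority(predict_stump(s, row) for s in stumps)
```

Bootstrap resampling gives each stump a slightly different view of the data, so the vote is less sensitive to any single split than one tree would be; deeper trees and random feature subsets (as in a random forest) extend the same idea.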
Predicts the price of used cars from 43 brands, for models from 2014 to 2018, using linear regression.
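Linear regression models price as an intercept plus a weighted sum of features, fitted by ordinary least squares. The sketch below solves the normal equations (X^T X) w = X^T y with Gauss-Jordan elimination in pure Python. Any features you feed it (for example model year and mileage) are hypothetical, and categorical inputs such as the 43 brands would typically be one-hot encoded into extra columns. It is an illustration only: a library least-squares solver is numerically more robust, and this version can fail with a division by zero if features are perfectly collinear.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    An intercept column of ones is prepended, so w[0] is the intercept.
    """
    A = [[1.0] + list(row) for row in X]
    n = len(A[0])
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * t for r, t in zip(A, y)) for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting on the augmented system.
    M = [row + [b] for row, b in zip(xtx, xty)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def predict_price(w, row):
    """Intercept plus weighted sum of the features."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))
```

When the data lie exactly on a plane, for example prices generated as 20000 + 1500 × (years since 2014) − 80 × (mileage in thousands), the fitted weights recover 20000, 1500 and −80 up to floating-point error.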