Hello! Hola! नमस्ते!

     

About me

Hello! My name is Shubham Dubey. I work as a Senior Systems Engineer at Infosys Limited. I graduated with a Bachelor of Technology in Computer Science & Engineering from SRM University, Delhi-NCR.

Jack inspires me the most, hence I am a person of multiple disciplines. I have experience in development as well as data research. Creativity and failure are my best friends, but learning helps me keep going. Being skill-based and goal-oriented are a few of my selling traits. I find myself standing at the intersection of Data, Design and Product.

This is me - shubham

Experience

My work

I have worked on dozens of projects, so I have picked only the latest for you.

Movie Review Analysis

Uses big data tools and technologies to analyse movies.

Tech stack: Hive, HDFS, Python, Hive Thrift Server

The Internet Movie Database (IMDb) is one of the world’s most popular sources for movie, TV and celebrity content, with more than 100 million unique visitors per month. IMDb has a huge database of movies that includes various details of each movie along with ratings and user reviews. These movie reviews affect everyone, from audiences and film critics to production companies. The idea of the project is to analyse the movie and ratings dataset (source: https://www.simplilearn.com/) using big data technologies.

View Code

Log Analysis Using Spark

Analyses server logs with over 1 million records using Apache Spark.

Tech stack: Spark, Databricks, Python, Databricks Notebook

The Log Analysis using PySpark project focuses on learning the basics of Apache Spark. The data is taken from the EDGAR Log File Data Set, which provides structured log data. The data set contains around 1 lakh (100,000) records with 14 columns {IP, Date, Time, Zone, CIK, Accession, Extension, Code, Size, Index, Noref, NoAgent, Find, Crawler}. The analysis is mainly done on Size, File type, Status Code and IP. Through this project we learned the following:

  • Working with PySpark and the internal workings of spark.sql
  • Working with DataFrames
  • Handling different file formats like CSV and JSON
  • Visualizing data using matplotlib, one of the popular plotting libraries

View Code
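To give a feel for the kind of aggregation described above (requests per status code, bytes served per IP), here is a minimal sketch in plain Python rather than Spark, so it runs without a cluster. The sample rows, the `SAMPLE` constant and the `summarize` helper are hypothetical and only reuse a few of the column names listed in the project description; in PySpark the same result would come from a grouped aggregation on a DataFrame.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample rows; only a subset of the 14 columns is shown.
SAMPLE = """IP,Date,Time,Code,Size,Extension
10.0.0.1,2017-01-01,00:00:01,200,1024,.txt
10.0.0.2,2017-01-01,00:00:02,404,0,.htm
10.0.0.1,2017-01-01,00:00:03,200,2048,.txt
"""


def summarize(csv_text):
    """Count requests per status code and sum response sizes per IP."""
    status_counts = Counter()
    bytes_per_ip = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        status_counts[row["Code"]] += 1
        # Size may be written as a float in some log exports, so parse via float.
        bytes_per_ip[row["IP"]] += int(float(row["Size"]))
    return status_counts, dict(bytes_per_ip)


status_counts, bytes_per_ip = summarize(SAMPLE)
print(status_counts.most_common())
print(bytes_per_ip)
```

The plain-Python version keeps everything in memory, which is fine for a small sample; the point of using Spark in the project is that the same grouping scales to the full log file.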

    Movie Recommender System

    Uses the concept of correlation to build a recommender system.

    It is a Python-based project that suggests related movies based on a movie you previously watched. The movies are suggested based on the reviews given by other users. We used Pearson correlation to calculate the relation between two sets of ratings. If the movies are related it gives a positive result, if they are inversely related it gives a negative result, and if there is no relation it gives zero. This is just a prototype with a fixed data set, chosen to show the power of Pearson correlation, but with tweaks it could be useful on other data sets.
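As an illustration of the idea, here is a minimal sketch of Pearson correlation applied to movie ratings. The ratings, the movie titles and the `pearson` and `related` helpers are hypothetical and are not taken from the project's code; each list holds the ratings given by the same four users, in the same order.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length rating lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Constant ratings: correlation is undefined, treated as 0 in this sketch.
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical fixed data set: ratings by the same four users for each movie.
ratings = {
    "Movie A": [5, 4, 1, 2],
    "Movie B": [4, 5, 2, 1],
    "Movie C": [1, 2, 5, 4],
}


def related(title, ratings):
    """Rank the other movies by their correlation with the given title."""
    target = ratings[title]
    scores = [
        (other, pearson(target, values))
        for other, values in ratings.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


for movie, score in related("Movie A", ratings):
    print(movie, round(score, 2))
```

With this toy data, users who liked Movie A also tended to like Movie B, so B ranks first with a positive score, while Movie C, rated in the opposite pattern, gets a negative score.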

    View Code

    Fingerprint Authenticated Voting System (F.A.V.S)

    Use a fingerprint to cast a vote.

    The project aims to improve the voting system in India. Many people in India do not cast their vote. There could be different reasons for this, and it leads to the winning party being selected not by the majority but by the minority who take the time to vote. This project could improve the situation and increase the number of voters.

    As team leader, together with two college mates, I introduced the concept of using a voter's fingerprint for voting. The basic idea was to develop a mobile application as well as an Electronic Voting Machine (with a fingerprint scanner). Citizens with busy lives can cast their vote using the mobile application, and those without such a facility can go to the voting booth.

    View Code

    Education

    Bachelor of Technology

    SRM University (2020)

    • Major: Computer Science and Engineering


    Intermediate

    Sun Valley International School (2016)

    • Major: Science

    Certification

    Big Data Hadoop and Spark Developer

    Simplilearn

    Introduction To Data Science in Python

    DataCamp

    Applied Machine Learning

    Coursera