Prakhar Agrawal
3+ articles published
Knowledge Leader / Conceptual Explorer / Thought Leader
Domain:
upGrad
Current role in the industry:
AVP, Data Science / Machine Learning at EXL
Educational Qualification:
Master of Science (MS) in Data Science from the University of San Francisco (2018 - 2019)
Certifications:
Instructional Design for E-Learning (Udemy)
FRM Level I (Global Association of Risk Professionals)
Fundamentals of Deep Learning
About
Prakhar is a senior content strategist at upGrad. He co-leads the team developing content for the DS and ML/AI programs. He also leads the team that prepares visualizations for all of upGrad’s programs and performs language reviews for them.
Published
Most Popular
30472
Most Common Examples of Data Mining
Talk about extracting knowledge from large datasets, and you’re talking about data mining! Data mining, knowledge discovery, and predictive analysis are terms that are often used interchangeably. In simpler words, they refer to a set of techniques for discovering patterns in a large dataset. These patterns help in creating predictive models that anticipate future behaviour. Today, most organisations, irrespective of their domain, are looking to capitalise on their Big Data and are hence using sophisticated analytical methods. As the consumption of Big Data grew, so did the need for data mining, and today we can see examples of it everywhere around us. Let’s look at some examples of Data Mining that you come across frequently in your day-to-day life:

Artificial Intelligence and Machine Learning

Both Artificial Intelligence and Machine Learning are gaining a lot of relevance in the world today, and much of the credit goes to Data Mining. How else do you make a system “artificially intelligent” without feeding it relevant data and patterns? And how do you extract relevant patterns if not by Data Mining?

One of the most common examples of AI and Machine Learning that you most likely come across every day is the recommendation system. Has it ever happened that after buying a product from Amazon, you’re shown a list of recommended products, and you end up buying one of those in the blink of an eye? How did Amazon accomplish this? By thoroughly studying and analysing your past data and behaviours. Using your behavioural trends, Amazon can rank products by the probability of your purchasing them. While Amazon and other e-commerce websites use AI to show product recommendations, video and music streaming platforms like Netflix and Spotify use the same approach to curate your suggestions and playlists.

The examples mentioned above use Artificial Intelligence on top of the mined data.
However, the reverse is also possible: you can develop a theory first and then use data mining to test it. For example, if a self-driving car sees a red Maruti travelling at twice the speed limit, it might develop a theory that all red Marutis overspeed. The AI can then use Data Mining methods to strengthen or weaken that theory.

Service Providers

Service providers have been using Data Mining to retain customers for a very long time now. The techniques of Business Intelligence and Data Mining allow these service providers to predict “churn”, a term used for when a customer leaves them for another service provider. Today, every service provider has terabytes of data on its customers. This data includes things like your billing information, customer service interactions, website visits, and so on. By mining and analysing this data, service providers assign a probability score to each customer. This score reflects how likely you are to switch vendors. The companies then target the customers at higher risk with incentives and personalised attention in order to retain them.

Supermarkets and Retail Stores

Data mining allows supermarket owners to know your choices and preferences even better than you do. If you don’t believe us, you’ll be amazed by what Target did a few years back. By following the purchase history and behaviour of one of its female customers, Target reportedly inferred that she was pregnant. Oh, and let us tell you: as the story is usually told, this was even before her own family knew. Such is the power of data, patterns, and analysis.

In general, these retail stores divide customers into what they call “recency, frequency, monetary” (RFM) groups and target specific groups with different campaigns and strategies.
So, a customer who spends a lot but infrequently will be treated differently from a customer who spends little but often. The latter may receive loyalty, upsell, or cross-sell offers, whereas the former might be offered a win-back deal, for instance.

Science, Engineering, and Education

The areas of science and engineering have seen a massive overhaul since the application of data mining techniques. Let’s look at some specific fields that make use of them:

Sequence mining finds extensive use in the study of human genetics. It helps in understanding the relationship between variations in DNA sequence and variability in susceptibility to disease. Simply put, it aims to find out how changes in DNA correspond to the risk of developing common diseases, which will aid significantly in improving methods of diagnosing, preventing, and treating them.

Data mining is used in educational research to understand the factors that lead students to engage in behaviours which reduce their learning and efficiency.

In electrical power engineering, data mining methods have been widely used for condition monitoring of high-voltage electrical equipment. The aim is to obtain valuable information on safety-related parameters, such as the status of insulation, to avoid mishaps.
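Returning to the RFM grouping described for retail stores above, here is a minimal sketch of how such groups might be computed. The `Purchase` record, the customer names, and the cut-off values (60 days, 4 purchases, 500 in spend) are all hypothetical and chosen only for illustration; a real retailer would derive its thresholds from its own data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Purchase:
    customer: str
    day: date
    amount: float


def rfm_table(purchases, today):
    """Return {customer: (recency_days, frequency, monetary)}."""
    table = {}
    for p in purchases:
        recency, frequency, monetary = table.get(p.customer, (None, 0, 0.0))
        days_ago = (today - p.day).days
        # Recency is the number of days since the most recent purchase.
        recency = days_ago if recency is None else min(recency, days_ago)
        table[p.customer] = (recency, frequency + 1, monetary + p.amount)
    return table


def segment(recency_days, frequency, monetary):
    """Map an RFM triple to a campaign, using made-up cut-offs."""
    lapsed = recency_days > 60   # assumption: two months without a visit
    frequent = frequency >= 4    # assumption: four or more purchases
    big = monetary >= 500        # assumption: total spend threshold
    if big and (lapsed or not frequent):
        return "win-back offer"
    if frequent and not big:
        return "loyalty, upsell, or cross-sell offer"
    return "standard campaign"


today = date(2018, 3, 29)
purchases = [Purchase("asha", date(2018, 1, 5), 600.0)] + [
    Purchase("ravi", date(2018, 3, d), 20.0) for d in (1, 8, 15, 22, 28)
]
table = rfm_table(purchases, today)
for customer, rfm in table.items():
    print(customer, rfm, segment(*rfm))
```

Here the big-but-infrequent spender lands in the win-back group, and the small-but-frequent one in the loyalty group, mirroring the distinction drawn in the text.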
Crime Prevention Agencies

The use of Data Mining and analytics is not restricted to corporate applications or to education and technology, and the last example on this list proves it. Crime prevention agencies also use data analytics to spot trends across vast amounts of data, including details of all the major criminal activities that have happened. Mining this data and studying its patterns and trends allows these agencies to predict future events with much better accuracy. With the help of Data Mining and analytics, they can work out everything from where to deploy maximum police manpower (where is the next crime most likely to happen, and when?), to whom to search at a border crossing (based on the type or age of the vehicle, the number or age of occupants, or border crossing history), to which intelligence to take seriously in counter-terrorism activities.

What we’ve discussed above are just a few of the many examples of Data Mining. If this article has left you fascinated and wanting more, we recommend you dive deeper into concepts like data mining, data analytics, business intelligence, and artificial intelligence. This will broaden your knowledge base and help you make a more informed career choice if you’re looking to jump ship to data.
Business Intelligence is the present and the future, and Data Mining underpins much of it. So, make sure you’re thorough with the basics if you’re looking for a rewarding and fulfilling career!

If you are curious to learn about data science, check out IIIT-B & upGrad’s Executive PG Programme in Data Science, which is created for working professionals and offers 10+ case studies and projects, practical hands-on workshops, mentorship with industry experts, 1-on-1 sessions with industry mentors, 400+ hours of learning, and job assistance with top firms.
29 Mar 2018
9636
A Beginner’s Guide to Data Science and Its Applications
The words Data, Science, or Data Science are not enough to incite a feeling of fear or dread among readers. To be honest, they’re too cute to be off-putting, let alone horrid, unlike words such as tessellation, k-means, k-nearest neighbours, Euclidean Minimum Spanning Tree, and more of this sort: words that you’ll encounter on your journey through Data Science. While “Data Science” doesn’t inspire fear, it also doesn’t explain anything about the field. Everybody knows what data is, at least in a layman’s sense. Data is essentially just raw bits of information. Science, on the other hand, can be used to mean any group of activities following a scientific method. So, going by this logic, we can conclude that Data Science is a field that uses scientific methods on large chunks of data. But for what? And what exactly is Data Science? That’s our topic for discussion today. After reading this article, you’ll be able to answer the following questions:

What is Data Science?
What are the different phases of a Data Science pipeline?
Where can I see Data Science at work?

What is Data Science?

Wikipedia defines Data Science as a field focused on extracting knowledge and insights from data by using scientific methods. What it doesn’t tell you, however, is that we humans are born data scientists. How? Let’s see. You’re observing the world around you no matter what you’re doing. At every waking moment, you’re taking in details from your surroundings and feeding them to your brain. You then process these observations into data and use it to understand things around you, by finding meaning in it and predicting what is likely to happen next.

When you’re late to leave for work by an hour, you call in to tell them you’ll be working from home.
You’re using your past observations of traffic and stoppages on the way to conclude that you’d lose more time stuck in traffic than you’d gain by being in the office. When you come into your room and see chocolate wrappers lying around, a casual analysis will tell you that someone’s been eating your chocolates in your absence.

In either of these cases, if you do the calculations and predictions in your mind without noting anything down, you’re a normal human being. On the other hand, suppose you record these data points (in a machine-readable format, of course) and then devise an algorithm (or procedure) and a computer program to run on them. If the output of this “hypothetical” system is that “the traffic is going to suck” or “your roommates ate your chocolates”, then bingo! You’re a data scientist. It’s just as simple (in theory) as the analogy makes it sound. At the end of the day, you have data, procedures, algorithms, and tools. You just need to extract knowledge from them. To do that efficiently, there’s a workflow, or pipeline, you must follow. Let’s see what is included in a typical Data Science pipeline.

Data Science Pipeline

The data science pipeline describes the flow of the entire process, from obtaining the desired data to making accurate calculations and predictions. Let’s have a look at the elements of this pipeline:

Obtain Your Data

This is by default the first thing you need to do to practise Data Science: get the data! Just a little heads-up: there are some things you must take into consideration while obtaining your data. You must first identify all of your datasets (they can come from the internet or from internal/external databases). You should then extract the data into a usable format (CSV, XML, JSON, etc.).

Skills Required
Database Management: Either SQL or NoSQL, depending on your needs and requirements.
Querying these databases
Retrieving unstructured data in the form of videos, audio, text, documents, etc.
Distributed storage and processing: Hadoop, Apache Spark, or Apache Flink.

Scrubbing / Cleaning Your Data

Cleaning the data should be given utmost importance because the final output of your system is only as good as the data you put into it. Cleaning refers to removing anomalies, filling in empty or missing values, checking that the data is consistent, and other things of this nature.

Skills Required
Scripting language: Python, R, SAS
Data wrangling tools: Python pandas, R
Distributed processing: Hadoop MapReduce, Spark

Exploring (Exploratory Data Analysis)

Now that the data is clean, you can begin to understand what patterns it holds. Different types of visualisations and statistical models come into use in this phase. Basically, this phase aims to derive the hidden meaning from your data. There’s a lot that goes on in the field of Exploratory Data Analysis. If you feel it’s something you’d enjoy, don’t forget to read our article on the same. To perform better in this phase, you need to have your “spidey senses” tingling. Go crazy and spot weird patterns or trends; always be on the lookout for something out of the box. However, while doing that, don’t forget the problem you’re aiming to solve. Don’t go too far out of the box. Exploratory data analysis is an art, and an artist should always keep the audience in mind.
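The cleaning and exploring steps above can be sketched in a few lines. This is only an illustration using Python’s standard library: the commute-time records, the 180-minute plausibility limit, and the choice to fill gaps with the median are all assumptions made for the example, not a recommended recipe. In practice you would more likely reach for pandas.

```python
import statistics

# Hypothetical raw records: commute times in minutes, as text.
raw = [
    {"day": "Mon", "minutes": "42"},
    {"day": "Tue", "minutes": ""},      # missing value
    {"day": "Wed", "minutes": "45"},
    {"day": "Thu", "minutes": "400"},   # implausible reading (anomaly)
    {"day": "Fri", "minutes": "39"},
]


def clean(records, max_plausible=180):
    """Parse values, treat anomalies as missing, fill gaps with the median."""
    parsed = []
    for r in records:
        text = r["minutes"].strip()
        value = float(text) if text else None
        if value is not None and value > max_plausible:
            value = None  # assumption: anything above the limit is an error
        parsed.append({"day": r["day"], "minutes": value})
    known = [r["minutes"] for r in parsed if r["minutes"] is not None]
    fill = statistics.median(known)
    for r in parsed:
        if r["minutes"] is None:
            r["minutes"] = fill
    return parsed


def summarise(records):
    """A first exploratory look: basic descriptive statistics."""
    values = [r["minutes"] for r in records]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


cleaned = clean(raw)
summary = summarise(cleaned)
print(summary)
```

Whether to drop, cap, or fill an anomalous value depends on the problem you’re solving, which is exactly why cleaning and exploring tend to go back and forth.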
Skills Required
Python libraries: NumPy, Matplotlib, pandas, SciPy
R libraries: ggplot2, dplyr
Inferential statistics
Data Visualisation
Experimental design

Modeling (Machine Learning)

This is the fun part. Models are simply general rules in a statistical sense, and a machine learning model is simply a tool in your toolkit. You have access to so many algorithms with different use cases and objectives that a little research will usually lead you to one that fits your business needs. After cleaning the data and finding the essential features (in the EDA phase), using a statistical model as a predictive tool will enhance your overall decision making. Instead of looking back to see “what happened?”, predictive analytics aims to answer “what next?” and “how should we go about it?”.

Skills Required
Machine Learning: Supervised, unsupervised, and reinforcement learning algorithms
Evaluation methods
Machine Learning libraries: Python (scikit-learn) / R (caret)
Linear algebra and multivariate calculus

Interpreting (Data Storytelling)

This is one of the more challenging tasks in the pipeline. Here, you aim to explain your findings through communication.
At the end of the day, it’s all about connecting with your audience, and that is what makes storytelling key. Your findings are hardly useful if you are not able to convey their significance to the non-tech bunch at your office, or even your boss, for that matter. A good way to get things under control is to rehearse a lot. Try framing a story around your findings and telling it to a layman (preferably a kid). If they understand it, so will your boss. And if they don’t, well, you know the saying often attributed to Einstein: “If you can’t explain it to a six-year-old, you don’t understand it yourself.” This phase aims to derive true business insights. Your main challenge here is to visualise your findings and display them in a beautiful and understandable way.

Skills Required
Knowledge of your business domain
Data Visualisation tools: Tableau, D3.js, Matplotlib, ggplot2, Seaborn, etc.
Communication: Presentation skills, both verbal and written.

This isn’t the end of our pipeline. If you’re to truly bring the best out of your system, you need to make sure you’re updating your model as and when the need arises. In Data Science, one size does not fit all, and you’ll need to keep revisiting and updating your model.

Applications of Data Science

As is clear by now, Data Science is a broad term, and so are its applications. Almost every application on your smartphone thrives on data. So, it’s only fair to say that it’s practically impossible to list all the applications of data science, given its sheer omnipresence. Let’s have a look at the broad fields that are using the magic of Data Science:

1. Internet Search
How does Google return such *accurate* search results within a fraction of a second? Data Science!

2. Recommendation Systems
From “people you may know” on Facebook or LinkedIn to “people who’ve bought this product also liked…” on Amazon to your daily curated playlists on Spotify to even “suggested videos” on YouTube, everything is fueled by Data Science.

3. Image/Speech/Character Recognition
This pretty much goes without saying. What do you think is the brain behind “Siri”, if not Data Science? Also, how do you think Facebook recognizes your friend when you upload a photo with them? It’s not magic; it’s science: Data Science.

4. Gaming
EA Sports, Sony, Nintendo, Zynga, and other giants in this domain have taken it upon themselves to take your gaming experience to an altogether new level. Games are now developed and improved using Machine Learning algorithms so that they can adapt as you move up to higher levels.

5. Price Comparison Websites
These websites are fueled by data. For them, the more the merrier. The data is fetched from the relevant websites using APIs. PriceGrabber, PriceRunner, Junglee, and Shopzilla are some such websites.

Wrapping Up…

If you’re from a tech background and have a soft spot for data, then Data Science may well be your true calling. The best part? There’s so much to do and explore in and around Data Science. It’s an umbrella term that covers a number of tools and technologies, and mastering any one of them will make you an asset in the ever-growing Data Science market. upGrad offers various courses on Data Science to keep you ahead of the curve. Don’t forget to check them out!
23 Feb 2018
6419
Data Visualisation: The What, The Why, and The How!
In this article, we’ll walk you through the world of Data Visualisation. We’ll begin by understanding what Data Visualisation is, after which we’ll see the actual need for DV tools and some of the common Data Visualisations used in practice today. Going further, we’ll talk about the essential tools you must be aware of if you’re setting foot in the world of Big Data visualisation. But before we get to that, let’s understand the importance of Data Visualisation using a very common example. Take a look at the images below:

Which of the above two arrangements makes it easier for you to browse through all the books quickly and efficiently? The second one, isn’t it? That’s the power of visualisation. Now, think a step further. In our example, we were just looking at a handful of books. In the real world, on the other hand, the problem of visualisation is HUGE. Organisations hold so much data at present that it’s impossible to make sense of it without a proper representation of it all. That’s exactly where Data Visualisation and its tools come in!

By now you’ve understood what exactly Data Visualisation is. Yet, for the sake of a formal definition, here it goes: Data Visualisation is, quite simply, the process of converting huge datasets into concise and unambiguous patterns and shapes (graphs, charts, scatter plots, and such things) to make them easier for people to understand. Data Visualisation can be carried out in many modes, depending on the requirement. Some of them are graphs, column charts, Venn diagrams, pie charts, network/colour maps, trees, frequency polygons, box-and-whisker plots, and line, surface, and volume scatter plots.

Data Visualisation: Need of the hour!

Now that we know what Data Visualisation is, let’s try and understand why it is the “need of the hour”. We’ve understood that it helps organisations get insights into their data; now, let’s see how!
Helps the organisation absorb data quickly: Your Big Data will look like gibberish to your organisation if you don’t present it in a concise and understandable way. As you know, a picture is worth a thousand words, or, in this case, a gazillion lines of data. A presentable display of data will help all the verticals of your organisation understand it with ease. That, in turn, will allow them to absorb the data better without having to spend a lot of time on it.

Helps you plan your next steps better: Think of DV as solving a jigsaw puzzle. If you have a thousand puzzle pieces, it’s quite a task to get going with arranging them. But once you have even half of your pieces in place, you can easily figure out the next steps. Likewise, from visual trends you can easily figure out your next best steps without spending too much time or energy on data analysis. You can save a lot of time and money by looking at the big picture instead of at a thousand puzzle pieces.

Get your audience interested in your data: Nowadays, people are said to have an attention span shorter than that of a goldfish. Keeping that in mind, it’s important to present your audience with something they can grasp quickly, even with a cursory glance. Converting your data into graphics engages your audience because they feel in control of the situation: they can understand the representation without having to understand the whole dataset. “Graphs? That sounds good!”

Find the outliers in your dataset: This is probably the most important use case of Data Visualisation. It helps you quickly find the outliers, if any, in your datasets. If you think about it, you’ll realise this is indeed a challenge without proper visualisation. Outliers tend to drag averages in the wrong direction, so it’s essential to find them and eliminate them from your analysis when they skew the results.
Graphics make it easier to spot an outlier and take any required steps against it.

Act quickly on your findings: Visualising data in the form of graphics helps you make much faster decisions. By using Data Visualisations, you can review your strategies, make updates, and achieve success, all without wasting a lot of time and energy. Analysing the graphical representation of a dataset will allow you to act on your findings more readily than analysing the whole dataset.

Ten Data Visualisation Tools You Should Master

QlikView

QlikView markets itself as a “business discovery platform”. Its ability to process data in-memory makes it a perfect tool for quick-and-dirty processing of data. Talking about sources, QlikView can read data from almost any source, from CSV files to SQL databases. It also performs data integration (combining data from various sources) and generates composite data sources for better analysis. QlikView targets businesses that are looking to get deeper insights into the data generated by their endeavours.

Tableau

Tableau, too, is a business intelligence tool for visual analysis of data.
It allows users to create and distribute a very intuitive dashboard which depicts the variations, trends, and density of the data in the form of charts or graphs. Tableau can read data from files, relational databases, and Big Data sources. A notable feature is that it allows real-time collaboration. It’s put to use by academic researchers, businesses, and many government organisations.

Wolfram Alpha

You can’t talk about numbers, statistics, and visualisations without mentioning Wolfram Alpha. It is a computational search/calculation engine which can also produce beautiful, informative, and customisable representations in the form of charts and graphs. If you are using publicly available data in your analysis, the charts generated can very easily be embedded in your website using widgets.

MS-Excel

We often forget the old warhorses in our search for specific tools. How can we talk about data visualisation and not mention the classic MS-Excel? Chances are, you’ve had some experience with Excel, irrespective of your background. Excel has stood the harsh test of time and is still extensively used. You must be aware of the famous spreadsheet visualisation. Excel can turn out to be quite a powerful tool, almost as powerful as the other mentions, if the requirements don’t go beyond the basics. However, a major drawback of Excel is that customised data visualisation is difficult, so it’s a poor candidate for work that has specific requirements.
CartoDB

All the other tools in this list deal primarily with processing quantitative data. But what if you have to integrate this data with maps? CartoDB is the tool you need. It allows seamless integration of tabular data with maps. To see the magic, you can upload a CSV file containing a list of addresses to CartoDB, and it’ll convert them to latitudes and longitudes and plot them on a map. The only disadvantage is that you need to pay for it after using it five times.

Apart from the tools mentioned above, there are some other tools, too, that deserve a mention:

Matplotlib: A multi-platform library built for Data Visualisation using Python.
ChartBlocks: A web app that lets you create beautiful, customisable, and shareable charts; you can also download them as vector graphics.
Charted: Charted automatically builds beautiful charts; you just need to provide it with the link to your data file.
D3.js: A JavaScript library that helps you build visualisations using HTML, SVG, and CSS.
Dygraphs: A fast, open-source JavaScript charting library.

Wrapping Up…

If you were paying attention, you’d have realised that data visualisation is by no means a “new” technology. We’ve been doing it for ages: take the example of a 2-D Cartesian plane or the 3-D coordinate system, which is a visualisation of data as well.
It’s just that businesses are waking up to the need for Data Visualisation in the context of Big Data Analytics. So, if you’re looking to begin a career in Big Data, mastering Data Visualisation is sure to take you a long way!

Check out IIIT-B & upGrad’s PG Diploma in Data Science, which is created for working professionals and offers 10+ case studies and projects, practical hands-on workshops, mentorship with industry experts, 1-on-1 sessions with industry mentors, 400+ hours of learning, and job assistance with top firms.
23 Feb 2018