Anomaly detection problems have long been tackled using supervised learning algorithms. These include Random Forest classifiers, Support Vector Machines (SVMs), and Logistic Regression. They are mainly used in scenarios where the data is clearly labeled.
One class SVM has emerged as an effective approach for anomaly detection. Unlike regular SVMs, which require labeled data from both classes, one class SVM just learns from the majority class, which usually represents normal cases. This novel method makes it ideal for situations where anomalies are uncommon, and labeled cases of anomalies are difficult to come by.
In this guide, you will learn all you need to know about one class SVM anomaly detection. We will discuss its applications, how it is implemented, and much more.
One-class support vector machines (OCSVMs) are a powerful family of algorithms for anomaly detection. They learn what normal data looks like and flag points that fall outside it, which makes OCSVM a valuable anomaly detection tool for spotting outliers in data.
First, let us discuss what an SVM is.
Support Vector Machines (SVMs) are one of the most popular supervised machine learning algorithms. Unlike OCSVMs, which specialize in anomaly detection by learning from a single class, SVMs operate in a broader context. They excel at discerning patterns and making predictions based on labeled data. SVMs are adaptable and are used for both classification and regression tasks.
Let’s say you're working on a project to predict whether an email is spam or not. When you use SVM, you can train a model on labeled email data, teaching it to discern patterns that distinguish spam from legitimate messages.
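As a rough illustration of the spam scenario, the sketch below trains a linear SVM on a handful of made-up emails. The example messages, the TF-IDF features, and the name spam_clf are assumptions for illustration only; a real spam filter would need far more data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Tiny, made-up training set (hypothetical): 1 = spam, 0 = legitimate
emails = [
    "win a free prize now",
    "claim your free cash reward",
    "free lottery winner click now",
    "meeting agenda for monday",
    "please review the attached report",
    "lunch with the project team tomorrow",
]
labels = [1, 1, 1, 0, 0, 0]

# Turn text into TF-IDF vectors, then fit a linear SVM on the labeled examples
spam_clf = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
spam_clf.fit(emails, labels)

# Classify two unseen messages
predictions = spam_clf.predict(
    ["free prize click now", "project report for the meeting"]
)
print(predictions)
```

Note that both classes must be present in the labels here, which is exactly the requirement that one class SVM relaxes.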
One Class SVM is a specialized variant of SVM, tailored specifically for outlier, anomaly, or novelty detection. The objective behind adopting OCSVM is to identify instances that deviate significantly from the norm.
Typical uses include monitoring network traffic for suspicious activity and identifying potential cyber threats within a data stream. Here, OCSVM flags deviations from normal patterns of behavior, such as unexpected spikes in data transfer or irregular access attempts.
Unlike traditional machine learning models geared toward classification tasks, OCSVM doesn't concern itself with categorizing data into multiple classes. Instead, its only concern is the identification of exceptions in a dataset. This enables you to highlight data misfits that could indicate security attacks or system errors.
Here are some key working principles of OCSVM:
OCSVM anomaly detection is a powerful and versatile tool for distinguishing typical patterns and irregular occurrences.
Let’s say you're overseeing a network infrastructure and are responsible for ensuring security and integrity. Among the myriad of data flowing through the systems, detecting anomalies—those unexpected deviations from normal behavior—becomes paramount. Here's where OCSVM steps in.
During training, OCSVM focuses on the majority class, which represents normal instances. This means you don't need many labeled anomalies to get started, a significant advantage in real-world scenarios where anomalies are sparse and obtaining labeled data is challenging.
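A minimal sketch of this idea with scikit-learn's OneClassSVM is given here. The synthetic two-dimensional data, the three hand-placed anomalies, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical data: 0 = normal, 1 = anomaly (only a handful are labeled)
normal_rows = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
anomaly_rows = np.array([[8.0, 8.0], [-9.0, 7.5], [7.0, -8.5]])
X = np.vstack([normal_rows, anomaly_rows])
y = np.array([0] * len(normal_rows) + [1] * len(anomaly_rows))

# Fit on the normal rows only; the anomalies are never shown to the model
detector = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05)
detector.fit(X[y == 0])

# predict() returns +1 for inliers and -1 for outliers
anomaly_flags = detector.predict(anomaly_rows)
print(anomaly_flags)
```

The labels are used only to select the normal rows for fitting, so the same approach works when no labeled anomalies exist at all.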
So, how does OCSVM go about detecting anomalies?
Let's walk through the key steps of one class SVM anomaly detection:
1. Conceptual Foundation
OCSVM operates under the assumption that the majority of data in real-world scenarios is inherently normal. Anomalies, therefore, are rare deviations from these typical patterns. OCSVM's mission is to establish a boundary around these normal instances. This creates a familiar region within which most data points reside.
2. Outlier Boundary Definition
The region enclosed by this boundary is called the "normality region." The boundary is positioned to maximize the margin around normal data points, which gives a clear distinction between what is considered usual and what is an anomaly.
3. Margin Maximization
OCSVM places the boundary so that the margin is as wide as possible. In the standard formulation, the training data is mapped into a kernel feature space and separated from the origin with maximum margin. A wider margin gives a clearer buffer between normal instances and everything else, which makes deviations from the norm easier to flag during testing.
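One way to look at the learned boundary is through decision_function, which in scikit-learn returns a signed score: positive inside the boundary and negative outside. The sketch below uses synthetic data and hypothetical probe points, so the exact scores are illustrative only.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)

# Synthetic "normal" data centered on the origin (an assumption for illustration)
normal_points = rng.normal(size=(300, 2))
detector = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05).fit(normal_points)

# Probe points at increasing distance from the center of the normal data
probe_points = np.array([[0.0, 0.0], [3.0, 3.0], [6.0, 6.0]])
scores = detector.decision_function(probe_points)

for point, score in zip(probe_points, scores):
    status = "inside" if score >= 0 else "outside"
    print(point, round(float(score), 3), status)
```

The further a point sits from the bulk of the training data, the lower its score, which is what lets the model rank candidates by how anomalous they look rather than only labeling them.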
4. Training Process and Hyperparameter Tuning
During training, OCSVM learns exclusively from the majority class, that is, the normal instances. This one-class approach sets it apart from traditional SVMs, which typically require examples from both classes. The hyperparameter 'nu' adds a layer of adaptability: it sets an upper bound on the fraction of training points treated as outliers and a lower bound on the fraction of support vectors. By adjusting 'nu,' you can tune how sensitive the model is to anomalies, trading missed anomalies against false alarms.
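To get a feel for what 'nu' does, the following sketch fits the same synthetic data with three values of nu and reports the share of training points that end up flagged as outliers. The data, the gamma value, and the name flagged_fraction are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(7)

# Synthetic "normal" training data (hypothetical)
normal_points = rng.normal(size=(500, 2))

flagged_fraction = {}
for nu in (0.01, 0.05, 0.2):
    detector = OneClassSVM(kernel="rbf", gamma=0.1, nu=nu).fit(normal_points)
    # Share of training points the fitted model places outside its boundary
    flagged_fraction[nu] = float(np.mean(detector.predict(normal_points) == -1))
    print(f"nu={nu}: {flagged_fraction[nu]:.3f} of training points flagged")
```

Larger values of nu allow more training points to fall outside the boundary, so the model becomes more aggressive about flagging; smaller values make it more tolerant.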
5. Testing and Anomaly Identification
During testing, OCSVM, equipped with its learned normality region, serves as a watchful guardian. Any cases that go outside of the defined range are immediately reported as possible anomalies.
Let's go through the process of implementing anomaly detection using OCSVM in Python. The example is the detection of fraudulent credit card transactions, treating fraud as the anomaly within the dataset.
Dataset Overview
Before we dive into the implementation, let's take a brief look at the dataset we'll be working with. The dataset includes credit card transactions made by cardholders across Europe over a two-day period. Of the 284,807 transactions in the dataset, there are 492 instances of fraud. Each transaction in the dataset is described by several features, including time, amount, and various anonymized features generated through principal component analysis transformation due to privacy concerns.
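To put that imbalance in numbers, here is a quick calculation using only the two counts quoted above:

```python
# Counts quoted in the dataset overview
total_transactions = 284_807
fraud_cases = 492

fraud_rate = fraud_cases / total_transactions
print(f"{fraud_rate:.4%}")  # prints 0.1727%
```

Fraud makes up well under 1% of the data, which is the kind of rarity that makes one-class methods attractive and that is worth keeping in mind when choosing how many points the model is allowed to flag.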
You can download the dataset from Kaggle, where it is listed as the Credit Card Fraud Detection Dataset.
Implementation
Now that we have an overview of the dataset, let's proceed with the step-by-step implementation of one class SVM for anomaly detection.
We will implement one class SVM for anomaly detection using Python and use the Scikit-learn library for this purpose.
Step 1: Import necessary libraries
# Import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM
from sklearn.metrics import classification_report, confusion_matrix
Step 2: Load the dataset
data = pd.read_csv("creditcard.csv")
Step 3: Explore the dataset
print(data.head())
print(data.info())
print(data.describe())
Step 4: Preprocess the data
# Remove any null values and unnecessary columns
data = data.dropna()
X = data.drop(['Time', 'Class'], axis=1) # Features
y = data['Class'] # Labels
Step 5: Divide the data into sets for testing and training.
X_trained, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=30)
Step 6: Feature scaling
scaler = StandardScaler()
X_trained = scaler.fit_transform(X_trained)
X_test = scaler.transform(X_test)
Step 7: Train the One Class SVM model
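model = OneClassSVM(nu=0.01, kernel='rbf', gamma=0.01)
# Labels are not used here; to fit on normal transactions only, pass X_trained[y_train.to_numpy() == 0]
model.fit(X_trained)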
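Step 8: Make predictions
# predict() returns +1 for inliers and -1 for outliers; map these to the dataset's 0 (normal) / 1 (fraud) labels
y_pred_train = (model.predict(X_trained) == -1).astype(int)
y_pred_test = (model.predict(X_test) == -1).astype(int)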
Step 9: Evaluate the model
print("Training Performance:")
print(confusion_matrix(y_train, y_pred_train))
print(classification_report(y_train, y_pred_train))
print("Testing Performance:")
print(confusion_matrix(y_test, y_pred_test))
print(classification_report(y_test, y_pred_test))
Explanation of each step:
Output (illustrative; exact counts depend on the split and hyperparameters):
Training Performance:
| | Predicted Negative | Predicted Positive |
| Actual Negative | 200000 (TN) | 2000 (FP) |
| Actual Positive | 500 (FN) | 1000 (TP) |
Classification Report:
| | Precision | Recall | F1-Score | Support |
| Class 0 | 0.99 | 0.99 | 0.99 | 202000 |
Testing Performance:
| | Predicted Negative | Predicted Positive |
| Actual Negative | 49900 (TN) | 100 (FP) |
| Actual Positive | 100 (FN) | 100 (TP) |
Classification Report:
| | Precision | Recall | F1-Score | Support |
| Class 0 | 1.00 | 1.00 | 1.00 | 50000 |
In the example above, the following output was achieved:
The model achieved high precision, F1-score and recall for class 0 (normal transactions). This indicates effective detection of normal instances.
For class 1 (fraudulent transactions), the model achieved lower precision, recall, and F1-score, suggesting some misclassification of fraudulent transactions as normal or vice versa.
Although the model does a commendable job of identifying normal transactions, it may require additional tuning, for example of nu and gamma, to identify fraudulent transactions reliably.
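One hedged way to explore such tuning is a small grid search over nu and gamma, scored on a validation set that contains a few known anomalies. The sketch below uses synthetic stand-in data rather than the credit card dataset, and the grid values, variable names, and choice of F1 on the anomaly class are all assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)

# Synthetic stand-in data (an assumption, not the credit card dataset)
train_normal = rng.normal(size=(500, 2))
val_normal = rng.normal(size=(300, 2))
angles = rng.uniform(0, 2 * np.pi, size=30)
radii = rng.uniform(5, 7, size=30)
val_anomalies = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

X_val = np.vstack([val_normal, val_anomalies])
y_val = np.array([0] * len(val_normal) + [1] * len(val_anomalies))

# Try each (nu, gamma) pair and score it by F1 on the anomaly class
results = []
for nu in (0.01, 0.05, 0.1):
    for gamma in (0.01, 0.1, 1.0):
        model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(train_normal)
        # Map -1 (outlier) to 1 and +1 (inlier) to 0 to match y_val
        y_pred = (model.predict(X_val) == -1).astype(int)
        results.append((f1_score(y_val, y_pred), nu, gamma))

best_f1, best_nu, best_gamma = max(results)
print(f"best F1={best_f1:.2f} with nu={best_nu}, gamma={best_gamma}")
```

On real data the validation anomalies would come from the labeled fraud cases, and the best setting found on toy data like this should not be expected to carry over.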
1. Finance: Detecting Fraud in Financial Transactions
OCSVM can spot rare patterns associated with fraudulent activity in financial transactions. Trained on normal transaction behavior and then tested against deviations from it, the model flags anomalies and helps organizations combat fraud.
2. Fault Detection in Commercial Systems
By applying OCSVM to real-time monitoring, companies can detect anomalies in complex systems, such as irregular sensor readings.
3. Early Detection of Anomalies in Patient Health Data
OCSVM enables healthcare professionals to find abnormalities in patients' health data, which can support early intervention and treatment of illnesses.
4. Cybersecurity: Detecting Malicious Activities and Intrusions in Network Traffic
Through network traffic analysis, OCSVM spots anomalous patterns suggestive of harmful activity, assisting enterprises in strengthening their cybersecurity defenses against online attacks.
5. Quality Control in Manufacturing
Applying OCSVM on sensor data or product characteristics enables early detection of deviations from quality standards in manufacturing processes.
OCSVM is a powerful algorithm for anomaly detection across various domains. It supports the identification of fraud in financial transactions, fault monitoring in commercial systems, and quality control in manufacturing.
One class SVM anomaly detection also has applications in the medical domain, where it is applied to patient health data to spot anomalies early.
Overall, OCSVM is a valuable algorithm for safeguarding systems and supporting reliable operation across various sectors.
Yes, one class SVM excels in anomaly detection by learning from normal instances and identifying deviations, making it effective for detecting outliers.
OCSVM is specifically designed for anomaly detection, learning from a single class of data to distinguish anomalies from normal instances.
One-class classification for anomaly detection involves training a model on a single class of data, typically representing normal instances. The model then distinguishes between normal patterns and anomalies based on deviations from the learned normal behavior.
OCSVM is primarily used for anomaly detection, pinpointing instances significantly different from the norm within datasets.
Algorithms like one class SVM, Isolation Forest, k-means, and autoencoders are commonly effective for anomaly detection tasks.
SVM handles binary and multiclass classification, while OCSVM specializes in one-class classification for anomaly detection, learning from a single class of data.