Machine Learning Blog

 


Unit 1: Understanding Artificial Intelligence, Machine Learning, and Deep Learning


Technology is evolving faster than ever, and at the center of this revolution are three buzzwords: Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). These fields are shaping everything from healthcare and entertainment to finance and education.


        

🧠 What is Artificial Intelligence?

Artificial Intelligence (AI) refers to the ability of machines to mimic human intelligence.

It enables systems to think, reason, and make decisions — just like humans, but much faster and with huge amounts of data.

🧩 Example:

  • AI powers voice assistants like Siri and Alexa.

  • It helps Netflix recommend what you might like to watch next.

In short: AI is the broad concept of machines doing tasks that normally require human intelligence.



🤖 What is Machine Learning?

Machine Learning (ML) is a subset of AI.
It focuses on teaching computers to learn from data and improve automatically without being explicitly programmed.

🧠 Example:

When you tag your friends on Facebook, the system learns to recognize faces using previous data — that’s machine learning at work.

In short: ML is how machines learn patterns from data to make predictions or decisions.



🧬 What is Deep Learning?

Deep Learning (DL) is a specialized branch of Machine Learning that uses artificial neural networks — models inspired by the human brain.
It’s especially powerful for tasks like image recognition, natural language processing, and speech translation.

🧠 Example:

  • Deep Learning helps self-driving cars detect pedestrians and traffic signs.

  • It’s used in Google Translate to understand context in multiple languages.

In short: Deep Learning is ML with more layers of learning — capable of handling very complex data.



💡 Applications of AI, ML, and DL

These fields power applications across industries, including several mentioned above:

  • Healthcare: detecting whether a tumor is benign or malignant.

  • Entertainment: recommendation systems like Netflix's.

  • Finance and communication: spam and fraud detection.

  • Transportation: self-driving cars recognizing pedestrians and traffic signs.

  • Language: translation tools such as Google Translate.

🧮 Types of Machine Learning

  1. Supervised Learning:
    The model learns from labeled data — input-output pairs.
    Example: Predicting house prices from known datasets.

  2. Unsupervised Learning:
    The model finds hidden patterns in unlabeled data.
    Example: Customer segmentation in marketing.

  3. Reinforcement Learning:
    The system learns by interacting with its environment and receiving rewards or penalties.
    Example: Training a robot to walk or an AI to play chess.




📊 Data Formats in Machine Learning

Machine learning models rely on data in structured or unstructured forms:

  • Structured Data: Tables, CSVs, databases (numeric, categorical values).

  • Unstructured Data: Text, images, audio, videos.

  • Semi-Structured Data: JSON, XML files.

Example:
An email dataset may contain structured fields (sender, time) and unstructured text (email body).
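The three data formats can be sketched with pandas. The file contents below are made up for the example; in practice the data would come from real files or a database:

```python
# Illustrative sketch: loading structured (CSV) and semi-structured (JSON) data.
import io
import pandas as pd

# Structured data: rows and columns with fixed fields
csv_data = io.StringIO("sender,time\nalice@example.com,09:15\nbob@example.com,10:42")
structured = pd.read_csv(csv_data)

# Semi-structured data: key-value records (JSON)
json_data = io.StringIO('[{"sender": "alice@example.com", "body": "Hi there"}]')
semi_structured = pd.read_json(json_data)

print(structured.shape)                    # (2, 2)
print(semi_structured.columns.tolist())    # ['sender', 'body']
```

Unstructured data (raw text, images, audio) typically needs extra preprocessing, such as tokenization or pixel encoding, before a model can use it.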


🧩 Learnability and Statistical Learning Approach

Learnability refers to how easily a model can learn patterns from data and generalize to new data.
A model is “learnable” if it performs well not only on training data but also on unseen examples.

The Statistical Learning Approach views learning as finding a function that best maps input (X) to output (Y).
It uses mathematical models and probability theory to minimize prediction errors.

Example: Linear regression finds the best-fitting line that predicts Y from X.
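The linear regression example can be sketched in scikit-learn. The data points here are synthetic (roughly y = 2x, assumed for illustration):

```python
# Minimal sketch of the statistical-learning idea: fit a function mapping X to Y.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # input feature
y = np.array([2.1, 3.9, 6.2, 7.8])          # noisy outputs, roughly y = 2x

model = LinearRegression()
model.fit(X, y)                 # finds the best-fitting line by minimizing error

print(model.coef_[0])           # learned slope, close to 2
print(model.predict([[5.0]]))   # prediction for a new, unseen input
```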


🧹 Data Cleaning

Before training any model, data must be clean and reliable.
Data cleaning involves:

  • Handling missing values

  • Removing duplicates

  • Fixing inconsistent formatting

  • Detecting outliers

Example:
If your dataset contains “N/A” or “null” values, replacing them with the mean or median ensures consistent input for models.
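The cleaning steps above can be sketched with pandas on a toy column (the values are assumed for illustration):

```python
# Sketch: handling missing values, duplicates, and outliers with pandas.
import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [85.0, np.nan, 90.0, 90.0, 500.0]})

df = df.drop_duplicates()                               # remove duplicate rows
df["score"] = df["score"].fillna(df["score"].median())  # fill missing with median

# Flag potential outliers: values more than 3 standard deviations from the mean
z = (df["score"] - df["score"].mean()) / df["score"].std()
outliers = df[z.abs() > 3]
print(len(df), df["score"].isna().sum())
```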


🔍 Exploratory Data Analysis (EDA)

EDA helps understand the dataset’s structure, patterns, and relationships before modeling.
It involves both visual and statistical analysis.

Key steps include:

  • Summary statistics (mean, median, variance)

  • Data visualization (histograms, scatter plots, box plots)

  • Correlation analysis

Example:
A scatter plot between “study hours” and “exam scores” can visually show if higher study time leads to better performance.
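The same study-hours example can be explored numerically with pandas (the values below are made up for illustration):

```python
# Sketch: summary statistics and correlation analysis for EDA.
import pandas as pd

df = pd.DataFrame({
    "study_hours": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 65, 71, 80],
})

print(df.describe())   # summary statistics: mean, std, min, quartiles, max

# Pearson correlation between the two columns
corr = df["study_hours"].corr(df["exam_score"])
print(corr)            # close to 1: strong positive relationship
```

A value near +1 confirms what the scatter plot would show visually: more study hours go with higher scores in this toy data.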



🎯 Feature Selection Techniques

Feature selection is the process of identifying and selecting a subset of relevant features for use in model construction. The goal is to enhance the model's performance by reducing overfitting, improving accuracy, and reducing training time.

1️⃣ SelectKBest

Selects the top ‘k’ features based on statistical tests like Chi-square or ANOVA.
Example: Selecting the top 10 features most related to a student’s marks.

Code Snippet:

from sklearn.feature_selection import SelectKBest, mutual_info_regression

# Using mutual_info_regression as score_func and selecting the top 5 features
# (X is the feature matrix and y the target, defined elsewhere)
selector = SelectKBest(score_func=mutual_info_regression, k=5)
selector.fit(X, y)
X_selected = selector.transform(X)  # keep only the selected columns


2️⃣ Variance Threshold

Variance in data represents how spread out the values of a feature are. Features with low variance (e.g., nearly constant values) contain little information because their values remain almost constant across samples. Removing them helps reduce noise and computational cost.

Removes features with very low variance (little change across data).
Example: A column where all values are 1 adds no learning value.

Python's scikit-learn library offers a straightforward implementation of VarianceThreshold:

from sklearn.feature_selection import VarianceThreshold

# Initialize VarianceThreshold (the default threshold of 0.0 drops constant features)
selector = VarianceThreshold()

# Fit and transform the data (X is the feature matrix, defined elsewhere)
X_selected = selector.fit_transform(X)


3️⃣ Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is one of the most powerful and widely used techniques in data preprocessing and dimensionality reduction.
It helps simplify large datasets by converting them into smaller sets of variables, called principal components, without losing much important information.


⚙️ How PCA Works 

Let’s break it down into simple steps:

  1. Standardize the Data:
    Since PCA is affected by scale, all features are converted to the same scale (e.g., z-score normalization).

  2. Compute the Covariance Matrix:
    It measures how features vary with respect to each other.

  3. Calculate Eigenvalues and Eigenvectors:

    • Eigenvalues represent the amount of variance captured by each component.

    • Eigenvectors represent the directions (new axes) of maximum variance.

  4. Select Principal Components:
    Sort eigenvalues from largest to smallest — the top components capture the most information.

  5. Transform the Data:
    The original dataset is projected onto these principal components to form a new dataset with fewer dimensions.

Example: Compressing 100 features into 10 principal components while preserving 90% of the original information.
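The five steps above can be sketched with scikit-learn; PCA handles steps 2-5 internally once the data is standardized. The Iris dataset is used here purely as a convenient example:

```python
# Sketch: standardize, then reduce 4 features to 2 principal components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data                           # 150 samples, 4 features

# Step 1: standardize, since PCA is sensitive to feature scale
X_scaled = StandardScaler().fit_transform(X)

# Steps 2-5: covariance, eigendecomposition, selection, and projection
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                         # (150, 2)
print(pca.explained_variance_ratio_.sum())     # share of variance retained
```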




Unit 3: Supervised Learning


🧠 What is Supervised Learning?

Supervised learning is the process of teaching a machine using labeled datasets. In simple terms, the algorithm learns the relationship between input features (X) and output labels (Y). Once trained, it can predict outcomes for new, unseen data.

Examples:

  • Predicting whether an email is spam or not.

  • Classifying handwritten digits.

  • Detecting if a tumor is benign or malignant.



🔹 1. Logistic Regression

Despite its name, Logistic Regression is used for classification, not regression.
It predicts the probability that an instance belongs to a certain class.

Mathematical Model:

P(Y=1|X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}

Here, \beta_0, \beta_1, \dots, \beta_n are model parameters learned from data.
If the output probability > 0.5, the class is labeled 1, otherwise 0.

Intuition: Logistic regression draws a boundary (line or curve) that separates different classes using the sigmoid function.


Code Example:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# X and y are the features and labels, defined elsewhere
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

🔹 2. Naïve Bayes Classifier

A probabilistic model based on Bayes’ Theorem, assuming independence among features.

P(C|X) = \frac{P(X|C)\,P(C)}{P(X)}

It’s widely used for text classification and spam detection.

Python Syntax:

from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)





📙 3. Support Vector Machine (SVM)

Definition:
SVM is a supervised machine learning algorithm that finds the optimal hyperplane to classify data into two categories by maximizing the margin between the classes.

It can handle both linear and non-linear data and is used for tasks like image recognition and spam detection.

Key Terms:

  • Margin: Distance between separating line and nearest data points.

  • Kernel: Function that transforms data into higher dimensions (linear, RBF, polynomial).

Python Syntax:

from sklearn.svm import SVC

svm = SVC(kernel='linear')
svm.fit(X_train, y_train)






📒 4. Decision Trees

Definition:
A decision tree in machine learning is a supervised learning algorithm that uses a tree-like model of decisions and their possible consequences to predict an outcome. It can be used for both classification and regression tasks.

Key Concepts:

  • Entropy (H): Measures impurity.

    H(S) = -p_1 \log_2(p_1) - p_2 \log_2(p_2)

  • Information Gain (IG):

    IG = H(S) - \sum \frac{|S_i|}{|S|} H(S_i)
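The two formulas above can be checked with a small worked example. The class counts here are made up for illustration: a parent set S of 10 samples (5 positive, 5 negative) split into two subsets S1 and S2 of 5 samples each (4/1 and 1/4).

```python
# Worked example of the entropy and information-gain formulas (toy counts, assumed).
import math

def entropy(p1, p2):
    """H(S) = -p1*log2(p1) - p2*log2(p2) for a two-class node."""
    terms = [p * math.log2(p) for p in (p1, p2) if p > 0]
    return -sum(terms)

print(entropy(0.5, 0.5))   # maximum impurity: 1.0
print(entropy(1.0, 0.0))   # pure node: 0.0

# Information gain for splitting S (5/5) into S1 (4/1) and S2 (1/4)
H_S = entropy(0.5, 0.5)
H_S1 = entropy(0.8, 0.2)
H_S2 = entropy(0.2, 0.8)
ig = H_S - (5 / 10) * H_S1 - (5 / 10) * H_S2
print(round(ig, 3))        # about 0.278
```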

Python Syntax:

from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(criterion='entropy')
tree.fit(X_train, y_train)

📕 5. Ensemble Learning

Definition:
Ensemble learning is a machine learning technique that combines multiple models to produce a single, more accurate prediction. Instead of relying on one model, it aggregates the predictions from several "weak" or "base" learners to create a "strong" learner that is more robust and less prone to errors.

Common Techniques:

  • Bagging: Uses multiple models trained on random subsets (e.g., Random Forest).

  • Boosting: Sequential models correct previous errors (e.g., AdaBoost, XGBoost).





Python Syntax:

from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

rf = RandomForestClassifier(n_estimators=100)
ab = AdaBoostClassifier(n_estimators=50)
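A minimal end-to-end sketch of bagging with a random forest, using a synthetic dataset from scikit-learn's make_classification (the data and parameters are assumptions for illustration):

```python
# Sketch: training a bagging ensemble (random forest) on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class dataset: 200 samples, 8 features
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
accuracy = rf.score(X_test, y_test)  # mean accuracy on held-out data
print(accuracy)
```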



📗 6. Confusion Matrix

Definition:
A table that shows the comparison between actual and predicted values:

                     Predicted Positive     Predicted Negative
  Actual Positive    True Positive (TP)     False Negative (FN)
  Actual Negative    False Positive (FP)    True Negative (TN)

Metrics:
  • Accuracy = (TP + TN)/(TP + TN + FP + FN)

  • Precision = TP / (TP + FP)

  • Recall = TP / (TP + FN)

  • F-score = 2 × (Precision × Recall)/(Precision + Recall)

AUC-ROC = Area under the Receiver Operating Characteristic curve.


Python Syntax:

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
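The metric formulas above can also be computed directly with scikit-learn. The labels and predictions below are made-up toy values (TP = 3, TN = 3, FP = 1, FN = 1), so every metric works out to 0.75:

```python
# Sketch: computing accuracy, precision, recall, and F-score from toy predictions.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # actual labels (assumed)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions (assumed)

acc = accuracy_score(y_true, y_pred)    # (TP + TN) / total
prec = precision_score(y_true, y_pred)  # TP / (TP + FP)
rec = recall_score(y_true, y_pred)      # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall

print(acc, prec, rec, f1)  # all 0.75 for this toy data
```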

