RNN vs. CNN vs. Autoencoder vs. Attention/Transformer: A Practical Guide with PyTorch
Deep learning has evolved rapidly, offering a toolkit of neural architectures for various data types and tasks. Among the most influential are Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), Autoencoders, and the modern Attention/Transformer models.
But how do they differ? When should you use each? Let’s break them down with simple PyTorch code examples!
Table of Contents
- RNNs (Recurrent Neural Networks)
- CNNs (Convolutional Neural Networks)
- Autoencoders
- Attention & Transformer Models
- Summary Table
1. RNNs: Sequential Data Specialists

RNNs are designed for sequence modeling, where inputs are ordered and past context matters—think text, speech, time series.
Core idea:
- Maintain a hidden state that is updated as the sequence progresses.
- Handle variable-length input, but struggle with long-range dependencies (plain RNNs can "forget" information from far back; gated variants like LSTM and GRU mitigate this).
Common use-cases:
- Language modeling, text generation, sentiment analysis, speech recognition, forecasting.
PyTorch Example: Simple Character-level RNN
```python
import torch
import torch.nn as nn

class SimpleRNN(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.rnn = nn.RNN(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = self.embedding(x)         # [batch, seq_len] -> [batch, seq_len, embed_size]
        out, _ = self.rnn(x)          # [batch, seq_len, hidden_size]
        out = self.fc(out[:, -1, :])  # classify from the last time step
        return out

# Example usage
model = SimpleRNN(vocab_size=100, embed_size=32, hidden_size=64, num_classes=10)
inputs = torch.randint(0, 100, (8, 20))  # batch_size=8, seq_len=20
outputs = model(inputs)
print(outputs.shape)  # torch.Size([8, 10])
```
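As noted above, plain RNNs struggle with long-range dependencies. In practice, `nn.LSTM` (or `nn.GRU`) is a near drop-in replacement whose gated memory cells mitigate vanishing gradients. A minimal sketch of the swap (the `SimpleLSTM` name is ours, not a library class):

```python
import torch
import torch.nn as nn

class SimpleLSTM(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        # Same interface as nn.RNN; nn.LSTM additionally returns a cell state
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = self.embedding(x)            # [batch, seq_len, embed_size]
        out, (h_n, c_n) = self.lstm(x)   # h_n: [num_layers, batch, hidden_size]
        return self.fc(h_n[-1])          # classify from the final hidden state

model = SimpleLSTM(vocab_size=100, embed_size=32, hidden_size=64, num_classes=10)
inputs = torch.randint(0, 100, (8, 20))
outputs = model(inputs)
print(outputs.shape)  # torch.Size([8, 10])
```

The only changes from `SimpleRNN` are the layer swap and unpacking the extra cell state.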
2. CNNs: The Grid Data Pros

CNNs shine on grid-like data, especially images, but also 1D signals (audio, time series) and even text (for local feature extraction).
Core idea:
- Use convolutional filters to extract local patterns, hierarchically combining them.
- Exploit spatial/local correlations.
Common use-cases:
- Image classification, object detection, medical imaging, audio, some text tasks.
PyTorch Example: Simple Image CNN
```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))  # [batch, 16, 14, 14]
        x = self.pool(torch.relu(self.conv2(x)))  # [batch, 32, 7, 7]
        x = x.view(x.size(0), -1)                 # flatten
        x = self.fc1(x)
        return x

# Example usage
model = SimpleCNN(num_classes=10)
inputs = torch.randn(8, 1, 28, 28)  # batch_size=8, 1 channel, 28x28 image
outputs = model(inputs)
print(outputs.shape)  # torch.Size([8, 10])
```
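Where does the `32 * 7 * 7` input size of `fc1` come from? Each 3x3 convolution with `padding=1` preserves height and width, and each 2x2 max-pool halves them, so 28x28 becomes 14x14 and then 7x7. A quick shape check confirms the arithmetic:

```python
import torch
import torch.nn as nn

# Trace the spatial sizes that determine fc1's input dimension (32 * 7 * 7)
conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)  # padding=1 keeps H, W unchanged
conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
pool = nn.MaxPool2d(2, 2)                           # halves H and W

x = torch.randn(1, 1, 28, 28)
x = pool(conv1(x))
print(x.shape)  # torch.Size([1, 16, 14, 14])
x = pool(conv2(x))
print(x.shape)  # torch.Size([1, 32, 7, 7])
```

If you change the input resolution or add layers, rerun a trace like this to get the new flattened size.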
3. Autoencoders: Unsupervised Compressors

Autoencoders learn to encode data to a compact representation and reconstruct it—great for dimensionality reduction, denoising, and unsupervised feature learning.
Core idea:
- Consists of an encoder (compresses input) and decoder (reconstructs input).
- Forces learning of salient features in the bottleneck.
Common use-cases:
- Denoising images, anomaly detection, unsupervised pre-training, generative tasks.
PyTorch Example: Simple MLP Autoencoder for MNIST
```python
import torch
import torch.nn as nn

class SimpleAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 32)
        )
        self.decoder = nn.Sequential(
            nn.Linear(32, 128),
            nn.ReLU(),
            nn.Linear(128, 28 * 28),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = x.view(x.size(0), -1)  # flatten
        z = self.encoder(x)        # 32-dim bottleneck
        out = self.decoder(z)
        out = out.view(x.size(0), 1, 28, 28)
        return out

# Example usage
model = SimpleAutoencoder()
inputs = torch.randn(8, 1, 28, 28)  # batch_size=8
outputs = model(inputs)
print(outputs.shape)  # torch.Size([8, 1, 28, 28])
```
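Training an autoencoder needs no labels: the target is the input itself. A minimal training sketch on random stand-in data, assuming inputs normalized to [0, 1] to match the `Sigmoid` output (in practice you would iterate over a `DataLoader` of MNIST batches):

```python
import torch
import torch.nn as nn

# Compact equivalent of the encoder/decoder above, built inline so the
# sketch is self-contained; nn.Flatten handles the [8, 1, 28, 28] input.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 32),    # encoder
    nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 28 * 28),    # decoder
    nn.Sigmoid(),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()  # nn.BCELoss also pairs well with Sigmoid outputs

batch = torch.rand(8, 1, 28, 28)  # values in [0, 1], like normalized MNIST
target = batch.view(8, -1)        # reconstruction target = flattened input
for step in range(5):
    recon = model(batch)
    loss = criterion(recon, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Anomaly detection follows the same recipe: train on normal data only, then flag inputs whose reconstruction error is unusually high.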
4. Attention & Transformer: Long-Context Masters

Transformers (powered by attention mechanisms) revolutionized NLP and are now conquering vision, speech, and more.
Core idea:
- Self-attention: Each element attends to all others, modeling global dependencies directly.
- Processes sequences in parallel (unlike RNNs), which makes training fast on modern hardware.
- Enables transfer learning via large pretrained models (BERT, GPT, ViT); note that vanilla attention cost grows quadratically with sequence length, so very long contexts get expensive.
Common use-cases:
- Language modeling, translation, summarization, question answering, code completion, vision transformers.
PyTorch Example: Tiny Transformer for Classification
```python
import torch
import torch.nn as nn

class SimpleTransformer(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_heads, num_classes, max_len=32):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.pos_embedding = nn.Parameter(torch.randn(1, max_len, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.fc = nn.Linear(embed_dim, num_classes)
        self.max_len = max_len

    def forward(self, x):
        # x: [batch, seq_len]
        x = self.embedding(x)                        # [batch, seq_len, embed_dim]
        seq_len = x.size(1)
        x = x + self.pos_embedding[:, :seq_len, :]   # add learned positional encoding
        x = x.transpose(0, 1)                        # default layout is [seq_len, batch, embed_dim]
        out = self.transformer(x)
        out = out[0]                                 # output at position 0 (mean pooling also works)
        out = self.fc(out)
        return out

# Example usage
model = SimpleTransformer(vocab_size=100, embed_dim=64, num_heads=4, num_classes=10)
inputs = torch.randint(0, 100, (8, 32))  # batch_size=8, seq_len=32
outputs = model(inputs)
print(outputs.shape)  # torch.Size([8, 10])
```
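The core operation inside each encoder layer is scaled dot-product self-attention. Stripped of multi-head bookkeeping, it fits in a few lines; this is an illustrative sketch (the `self_attention` function and weight names are ours), not what `nn.TransformerEncoderLayer` literally runs:

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # Project the input into queries, keys, and values
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.size(-1)
    # Compare every position's query with every position's key
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # [batch, seq_len, seq_len]
    weights = F.softmax(scores, dim=-1)             # each row sums to 1
    return weights @ v                              # weighted mix of all values

batch, seq_len, d = 2, 5, 16
x = torch.randn(batch, seq_len, d)
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # torch.Size([2, 5, 16])
```

The [seq_len, seq_len] score matrix is where both the power (every position sees every other) and the quadratic cost come from.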
5. Summary Table
| Architecture | Data Type | Pros | Cons | Example Use-case |
|---|---|---|---|---|
| RNN | Sequences (text, time series) | Captures order, works for variable length | Hard to train on long sequences (vanishing gradients) | Language modeling |
| CNN | Images, grid-like | Efficient, local feature detection | Not suited for long dependencies | Image classification |
| Autoencoder | Any (usually images/tabular) | Unsupervised, learns features | Reconstruction objective may ignore task-relevant details | Denoising, anomaly detection |
| Transformer | Sequences (NLP, vision, audio) | Captures long-range/global dependencies, parallelizable | Data- and compute-hungry; quadratic attention cost | Translation, summarization, ViT |
Which to Use When?
- RNN: Time-ordered, sequence tasks (text, audio) when sequence length isn’t huge.
- CNN: Images or short, fixed-length signals.
- Autoencoder: When you want to compress, denoise, or learn representations unsupervised.
- Transformer: Most NLP tasks, especially with long dependencies or need for transfer learning; now strong in vision, audio, and more.