Activation Function

By Naweeparb Wangwanich (Safem0de)

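An activation function transforms a neuron's weighted sum into an output value before that value is passed to the next layer. Because the activation functions covered here are nonlinear, they let the model learn complex patterns in the data that a purely linear model could not represent.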
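
To illustrate why this step matters, the sketch below stacks two single-input neurons with no activation and compares the result with one equivalent linear neuron. The weights `w1`, `b1`, `w2`, `b2` and the helper names are made-up values for illustration, not taken from any trained model. It then inserts ReLU (one of the functions described later in this article) between the two neurons to show that the mapping no longer matches the single linear neuron.

```python
def linear(w, b, x):
    """A one-input neuron with no activation: w * x + b."""
    return w * x + b


def relu(z):
    """ReLU activation: passes positive values through, clips the rest to 0."""
    return max(0.0, z)


# Hypothetical weights and biases, chosen only for illustration.
w1, b1 = 2.0, -1.0
w2, b2 = -3.0, 0.5


def two_linear_layers(x):
    # Two neurons stacked with no activation in between.
    return linear(w2, b2, linear(w1, b1, x))


def collapsed_layer(x):
    # The same mapping rewritten as one linear neuron.
    return linear(w2 * w1, w2 * b1 + b2, x)


def with_relu(x):
    # The same two neurons, with ReLU applied after the first one.
    return linear(w2, b2, relu(linear(w1, b1, x)))


xs = [-2.0, 0.0, 0.5, 3.0]
stacked = [two_linear_layers(x) for x in xs]
single = [collapsed_layer(x) for x in xs]
nonlinear = [with_relu(x) for x in xs]

print("stacked   :", stacked)
print("single    :", single)     # identical to stacked
print("with ReLU :", nonlinear)  # differs wherever the first neuron's output is negative
```

Without the activation, adding depth adds no expressive power in this toy setup; with it, the two-neuron stack computes something a single linear neuron cannot.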

Types of Activation Functions

Sigmoid (used in Binary Classification)
Equation:

$$ \begin{equation} f(x) = \frac{1}{1 + e^{-x}} \end{equation} $$
```
from keras.layers import Dense
Dense(1, activation='sigmoid')  # used in the output layer for binary classification
```
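
As a rough illustration of the Sigmoid equation above, here is a minimal pure-Python sketch. The function name `sigmoid` and the sample inputs are illustrative choices; the two-branch form is one common way to avoid overflow in `math.exp`, not necessarily how Keras computes it internally.

```python
import math


def sigmoid(x: float) -> float:
    """Sigmoid f(x) = 1 / (1 + e^(-x)), arranged to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, e^(-x) can overflow, so use the equivalent form e^x / (1 + e^x).
    z = math.exp(x)
    return z / (1.0 + z)


# Outputs always fall between 0 and 1, with sigmoid(0) = 0.5 at the midpoint.
for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"sigmoid({x:+.1f}) = {sigmoid(x):.4f}")
```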
ReLU (Rectified Linear Unit)
Equation:

$$ \begin{equation}f(x) = \max(0, x)\end{equation} $$
```
from keras.layers import Dense
Dense(64, activation='relu')  # used in hidden layers
```
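
The ReLU equation above can be sketched in a few lines of plain Python. The names `relu` and `relu_grad` are illustrative, and setting the derivative at exactly `x = 0` to 0 is a convention assumed here. The gradient function hints at why a neuron can "die": whenever its input is negative, the gradient is 0 and no update flows through it.

```python
def relu(x: float) -> float:
    """ReLU f(x) = max(0, x)."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Derivative of ReLU; the value at x = 0 is taken as 0 by convention here."""
    return 1.0 if x > 0 else 0.0


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x = {x:+.1f}  relu = {relu(x):.1f}  gradient = {relu_grad(x):.1f}")
```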
Leaky ReLU (addresses the dying ReLU problem)
Equation:

$$ \begin{equation} f(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \leq 0 \end{cases} \end{equation} $$
```
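from keras.layers import Dense, LeakyReLU
# alpha is the slope applied to negative inputs (Keras 3 names this argument negative_slope)
Dense(64, activation=LeakyReLU(alpha=0.01))  # negative inputs are scaled by alpha instead of zeroed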
```
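
A minimal pure-Python sketch of the Leaky ReLU equation above, assuming the same `alpha = 0.01` used in the Keras snippet. The function names are illustrative. Unlike plain ReLU, the gradient for negative inputs is `alpha` rather than 0, so the neuron can still receive small updates.

```python
def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU: x for x > 0, alpha * x otherwise."""
    return x if x > 0 else alpha * x


def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    """Derivative of Leaky ReLU: 1 for x > 0, alpha otherwise."""
    return 1.0 if x > 0 else alpha


for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x = {x:+.1f}  leaky_relu = {leaky_relu(x):+.4f}  gradient = {leaky_relu_grad(x):.2f}")
```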
Tanh (Hyperbolic Tangent)
Equation:

$$ \begin{equation} f(x)=\frac{e^x - e^{-x}}{e^x + e^{-x}} \end{equation} $$
```
from keras.layers import Dense
Dense(64, activation='tanh')  # output is centered on zero, in the range -1 to 1
```
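
The sketch below writes the Tanh equation above out directly and checks it against Python's built-in `math.tanh`. The name `tanh_from_definition` is illustrative. The direct form is shown only to mirror the equation: `math.exp` overflows for very large `|x|`, so `math.tanh` is the safer choice in practice.

```python
import math


def tanh_from_definition(x: float) -> float:
    """Tanh written directly as (e^x - e^-x) / (e^x + e^-x)."""
    ex, enx = math.exp(x), math.exp(-x)
    return (ex - enx) / (ex + enx)


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    direct = tanh_from_definition(x)
    builtin = math.tanh(x)
    print(f"x = {x:+.1f}  from definition = {direct:+.6f}  math.tanh = {builtin:+.6f}")
```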
Softmax (used in Multi-class Classification)

![Figure 1: The Softmax activation function process](attachment:106464ae-caad-45bb-b827-8a3edb0c5603:image.png)

Figure 1: The Softmax activation function process, as used in the output layer of models for multi-class classification.

Credit: https://towardsdatascience.com/softmax-activation-function-explained-a7e1bc3ad60/
Equation:

$$ \begin{equation} f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}} \end{equation} $$
```
from keras.layers import Dense
Dense(10, activation='softmax')  # used in the output layer for multi-class classification (10 classes here)
```
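
As a hedged illustration of the Softmax equation above, the following pure-Python sketch converts a list of raw scores into probabilities. The `scores` values are made up for the example. Subtracting the maximum score before exponentiating is a common numerical-stability trick assumed here; it does not change the result, because the shift cancels between numerator and denominator.

```python
import math


def softmax(logits):
    """Softmax over a list of raw scores, shifted by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]  # hypothetical raw outputs for three classes
probs = softmax(scores)

# The predicted class is the index with the highest probability.
predicted = max(range(len(probs)), key=probs.__getitem__)

print("probabilities:", [round(p, 4) for p in probs])
print("sum:", sum(probs))
print("predicted class index:", predicted)
```

Each output lies between 0 and 1 and the outputs sum to 1 (up to floating-point rounding), which is what lets them be read as class probabilities.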

Comparison Table of Activation Functions

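| Activation | Output Range | Used In | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| Sigmoid (1) | 0 to 1 | Binary Classification | Output lies between 0 and 1, so it can be read as a probability | Vanishing gradient |
| ReLU (2) | 0 to +∞ | Hidden Layers | Fast to compute; mitigates the vanishing gradient problem | Can suffer from dead neurons (output stuck at 0) |
| Leaky ReLU (3) | −∞ to +∞ | Hidden Layers | Prevents the dead neurons that ReLU can produce | May be slightly slower to compute |
| Tanh (4) | -1 to 1 | NLP, Time Series | Output lies between -1 and 1 (zero-centered) | Vanishing gradient (though less severe than Sigmoid) |
| Softmax (5) | 0 to 1 | Multi-class Classification | Outputs sum to 1, so the most probable class can be picked directly | Generally not used in hidden layers; slower to compute than Sigmoid |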
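
The "vanishing gradient" entries in the table can be made concrete with a small sketch. It compares the derivatives of Sigmoid, Tanh, and ReLU, then multiplies the best-case derivative across a hypothetical chain of 10 layers. This is a deliberate simplification: it ignores the weights, which also scale gradients in a real network, and the depth of 10 is an arbitrary choice for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of Sigmoid: s * (1 - s); its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x: float) -> float:
    """Derivative of Tanh: 1 - tanh(x)^2; its maximum is 1 at x = 0."""
    t = math.tanh(x)
    return 1.0 - t * t


def relu_grad(x: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


depth = 10  # hypothetical number of stacked layers

# Best case for each function: every layer sits at its maximum derivative.
best_case_sigmoid = sigmoid_grad(0.0) ** depth
best_case_tanh = tanh_grad(0.0) ** depth
best_case_relu = relu_grad(1.0) ** depth

print("sigmoid, best case over 10 layers:", best_case_sigmoid)
print("tanh,    best case over 10 layers:", best_case_tanh)
print("relu,    best case over 10 layers:", best_case_relu)

# Away from 0, Sigmoid and Tanh saturate and their derivatives shrink further.
print("sigmoid_grad(5.0):", sigmoid_grad(5.0))
print("tanh_grad(5.0):   ", tanh_grad(5.0))
```

Even in its best case, the Sigmoid derivative is at most 0.25 per layer, so the product shrinks quickly with depth. Tanh starts from a larger maximum but still saturates for large inputs, while ReLU keeps a derivative of 1 for any positive input.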

Summary: Choosing an Activation Function

Input Layer → no activation function
Hidden Layers → use ReLU or Leaky ReLU
Output Layer
- Regression → use Linear
- Binary Classification → use Sigmoid
- Multi-class Classification → use Softmax
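
The selection rules above could be sketched as a small helper plus a Keras model builder. The names `output_layer_config` and `build_model`, the task labels (`"regression"`, `"binary"`, `"multiclass"`), and the two hidden layers of 64 units are all assumptions made for this example, not a prescribed architecture. Keras is imported inside `build_model` so the helper can be used without it.

```python
def output_layer_config(task: str, n_classes: int = 2):
    """Return (units, activation) for the output layer, following the summary above."""
    if task == "regression":
        return 1, "linear"
    if task == "binary":
        return 1, "sigmoid"
    if task == "multiclass":
        return n_classes, "softmax"
    raise ValueError(f"unknown task: {task!r}")


def build_model(n_features: int, task: str, n_classes: int = 2):
    """Build a small Keras model whose output layer matches the task."""
    # Imported here so that output_layer_config works even without Keras installed.
    from keras import Input, Sequential
    from keras.layers import Dense

    units, activation = output_layer_config(task, n_classes)
    return Sequential([
        Input(shape=(n_features,)),           # input layer: no activation function
        Dense(64, activation="relu"),         # hidden layer
        Dense(64, activation="relu"),         # hidden layer
        Dense(units, activation=activation),  # output layer chosen by task
    ])


for task in ("regression", "binary", "multiclass"):
    print(task, "->", output_layer_config(task, n_classes=10))
```

For example, `build_model(20, "multiclass", n_classes=10)` would end in `Dense(10, activation="softmax")`, matching the Softmax snippet earlier in the article.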

References:

https://www.geeksforgeeks.org/what-are-logits-what-is-the-difference-between-softmax-and-softmax-cross-entropy-with-logits/

https://wandb.ai/amanarora/Written-Reports/reports/Understanding-Logits-Sigmoid-Softmax-and-Cross-Entropy-Loss-in-Deep-Learning--Vmlldzo0NDMzNTU3