17
Lesson 17 of 20 ยท Patterns and Data
Natural language as data
Text requires special preprocessing: tokenization, stop word removal, stemming, and encoding. Modern NLP uses subword tokenization like BPE.
- Text preprocessing converts language to numbers.
- BPE tokenization handles subwords efficiently.
Think about it
What is the ROC curve?
