17

Lesson 17 of 20 ยท Patterns and Data

Natural language as data

Text requires special preprocessing: tokenization, stop word removal, stemming, and encoding. Modern NLP uses subword tokenization like BPE.

  • Text preprocessing converts language to numbers.
  • BPE tokenization handles subwords efficiently.

Think about it

What is the ROC curve?

Your Cart (0)

Your cart is empty

Browse our shop to find activities your kids will love