nltk => Stemming

Introduction

Stemming is a form of text normalization. Many variants of a word carry the same meaning and differ only in tense or other inflection. We stem in order to shorten lookups and to normalize sentences. In essence, stemming finds the root of a word by stripping its inflectional endings, such as verb and tense suffixes. One of the most popular stemming algorithms is the Porter stemmer, which has been around since 1979.
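To make the "shorten lookups" point concrete, here is a minimal sketch that groups several surface forms under their shared stem. The word list and the name `forms_by_stem` are made up for illustration; only `PorterStemmer` and its `stem` method come from NLTK.

```python
from collections import defaultdict
from nltk.stem import PorterStemmer

ps = PorterStemmer()

# Hypothetical word list, chosen only for illustration.
words = ["connect", "connected", "connecting", "connection", "connections"]

# Group every surface form under the stem it reduces to.
forms_by_stem = defaultdict(list)
for w in words:
    forms_by_stem[ps.stem(w)].append(w)

for stem, forms in forms_by_stem.items():
    print(stem, "<-", forms)
```

All five forms end up under the single key "connect", so an index keyed by stem needs one entry instead of five.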

Porter stemmer

Import PorterStemmer and initialize it

 from nltk.stem import PorterStemmer
 from nltk.tokenize import word_tokenize
 ps = PorterStemmer()

Stem a list of words

 example_words = ["python","pythoner","pythoning","pythoned","pythonly"]

 for w in example_words:
     print(ps.stem(w))

Result:

 python
 python
 python
 python
 pythonli
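As the output shows, a stem need not be a dictionary word ("pythonli"). What matters is that related forms collapse to the same string. A small sketch of that idea, reusing the word list from above and collecting the distinct stems (the names `stems` and `distinct_stems` are only illustrative):

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()
example_words = ["python", "pythoner", "pythoning", "pythoned", "pythonly"]

# Stem every word, then keep each distinct stem once.
stems = [ps.stem(w) for w in example_words]
distinct_stems = sorted(set(stems))
print(distinct_stems)
```

Five input words reduce to two distinct stems, "python" and "pythonli".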

Stem a sentence after tokenizing it

 new_text = "It is important to by very pythonly while you are pythoning with python. All pythoners have pythoned poorly at least once."

 word_tokens = word_tokenize(new_text)
 for w in word_tokens:
     print(ps.stem(w))   # Passing word tokens into stem method of Porter Stemmer

Result:

 It
 is
 import
 to
 by
 veri
 pythonli
 while
 you
 are
 python
 with
 python
 .
 all
 python
 have
 python
 poorli
 at
 least
 onc
 .
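Note that word_tokenize relies on NLTK tokenizer data that has to be fetched once with nltk.download (the resource is named punkt, or punkt_tab in recent releases, as far as I know). If only the words matter, a rough alternative is to tokenize with a regular expression. The sketch below assumes that dropping punctuation is acceptable; `stem_sentence` is a hypothetical helper, not part of NLTK.

```python
import re
from nltk.stem import PorterStemmer

ps = PorterStemmer()

def stem_sentence(text):
    """Return the Porter stems of the alphabetic words in text."""
    # Assumption: a plain regex stands in for word_tokenize here, so
    # punctuation is dropped instead of being returned as separate tokens.
    words = re.findall(r"[A-Za-z]+", text)
    return [ps.stem(w) for w in words]

print(" ".join(stem_sentence("All pythoners have pythoned poorly at least once.")))
```

Joining the stems back with spaces gives a normalized form of the sentence that can be compared or indexed.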

Modified text is an extract of the original Stack Overflow Documentation

Licensed under CC BY-SA 3.0

Not affiliated with Stack Overflow
