nltk => Stemming

Introduction

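Stemming is a kind of normalization. Many variations of a word carry the same meaning, apart from differences such as tense. We stem words to shorten lookups and to normalize sentences. In essence, stemming strips suffixes such as verb and tense endings to reduce a word to a common base form (the stem), which is not always a dictionary word. One of the most popular stemming algorithms is the Porter stemmer, which has been around since 1979.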

Porter stemmer

Import and initialize PorterStemmer

 from nltk.stem import PorterStemmer
 from nltk.tokenize import word_tokenize
 ps = PorterStemmer()

Stem a list of words

 example_words = ["python","pythoner","pythoning","pythoned","pythonly"]

 for w in example_words:
     print(ps.stem(w))

Result:

 python
 python
 python
 python
 pythonli
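To make the normalization effect easier to see, the sketch below groups the same example words by their stem. It only uses `PorterStemmer.stem` from the snippet above plus the standard library; the `groups` dictionary is a name introduced here for illustration and is not part of NLTK.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

ps = PorterStemmer()
example_words = ["python", "pythoner", "pythoning", "pythoned", "pythonly"]

# Map each stem to the original words that reduce to it.
# `groups` is an illustrative helper structure, not an NLTK feature.
groups = defaultdict(list)
for w in example_words:
    groups[ps.stem(w)].append(w)

for stem, words in groups.items():
    print(stem, "<-", words)
```

Four of the five words share the stem python, while pythonly ends up under pythonli, which shows that a stem is a lookup key rather than a guaranteed dictionary word.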

Stem a sentence after tokenizing it

 new_text = "It is important to by very pythonly while you are pythoning with python. All pythoners have pythoned poorly at least once."

 # word_tokenize relies on NLTK's Punkt tokenizer data, which can be installed with nltk.download()
 word_tokens = word_tokenize(new_text)
 for w in word_tokens:
     print(ps.stem(w))   # Passing word tokens into stem method of Porter Stemmer

Result:

 It
 is
 import
 to
 by
 veri
 pythonli
 while
 you
 are
 python
 with
 python
 .
 all
 python
 have
 python
 poorli
 at
 least
 onc
 .
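The introduction gives shorter lookups as the main reason to stem. One rough way to check that on the sentence above is to compare the number of distinct tokens with the number of distinct stems. This is a minimal sketch under one assumption: a simple regular expression stands in for `word_tokenize` so that no tokenizer data needs to be downloaded, and it may split other texts differently than `word_tokenize` would.

```python
import re

from nltk.stem import PorterStemmer

ps = PorterStemmer()
new_text = (
    "It is important to by very pythonly while you are pythoning with python. "
    "All pythoners have pythoned poorly at least once."
)

# Assumption: this regex is a stand-in for word_tokenize. It keeps runs of
# word characters together and emits each punctuation mark as its own token.
tokens = re.findall(r"\w+|[^\w\s]", new_text)
stems = [ps.stem(t) for t in tokens]

print(len(set(tokens)), "distinct tokens")
print(len(set(stems)), "distinct stems")
```

The distinct-stem count is lower because pythoning, python, pythoners and pythoned all collapse to the same stem, so an index keyed on stems needs fewer entries.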

Modified text is an extract of the original Stack Overflow Documentation, licensed under CC BY-SA 3.0. Not affiliated with Stack Overflow.
