nltk => Stop Words

Introduction

Stop words are words that are mostly used as fillers and carry hardly any useful meaning. We should avoid letting these words take up space in a database or consume valuable processing time. We can easily build a list of words to treat as stop words and then filter those words out of the data we want to process.
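The idea of building a list and filtering against it can be sketched without any library. The snippet below is a minimal illustration only: the stop list and the sample text are hypothetical and are not taken from NLTK.

```python
# Hypothetical hand-made stop list; the entries are illustrative only.
custom_stop_words = {"a", "an", "the", "is", "of", "and"}

text = "the cat is a friend of the dog"

# Split on whitespace and keep only the words that are not in the stop list.
# A set makes each membership check O(1) on average.
kept_words = [word for word in text.split() if word not in custom_stop_words]

print(kept_words)
```

Storing the stop list as a set rather than a list keeps the filter fast when the text is long.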

Filtering out stop words

NLTK ships with a default list of words that it treats as stop words. The list can be accessed through the NLTK corpus:

from nltk.corpus import stopwords

To check the list of stop words stored for the English language:

stop_words = set(stopwords.words("english"))
print(stop_words)
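If the stop word corpus has not been downloaded yet, stopwords.words() raises a LookupError. The sketch below shows one possible way to guard against that. The helper name load_stop_words and its tiny fallback set are assumptions made for illustration; they are not part of NLTK.

```python
def load_stop_words(language="english", fallback=frozenset({"a", "the", "is"})):
    """Return NLTK's stop words for `language`, or `fallback` if unavailable.

    `load_stop_words` and `fallback` are hypothetical names used for this
    sketch; only `stopwords.words` comes from NLTK.
    """
    try:
        from nltk.corpus import stopwords
        return set(stopwords.words(language))
    except (ImportError, LookupError):
        # NLTK is not installed, or the corpus is missing locally.
        # Running nltk.download("stopwords") once normally fetches it.
        return set(fallback)


stop_words = load_stop_words()
print(len(stop_words), "the" in stop_words)
```

The fallback only keeps the example runnable everywhere. In real use it is usually better to download the corpus once than to rely on a hand-written substitute.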

Example that uses the stop_words set to remove stop words from a given text:

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

example_sent = "This is a sample sentence, showing off the stop words filtration."
stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(example_sent)

# Keep only the tokens that are not stop words.
filtered_sentence = [w for w in word_tokens if w not in stop_words]

# The same filter written as an explicit loop; it produces the same list.
filtered_sentence = []
for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)

print(word_tokens)
print(filtered_sentence)
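The NLTK English stop words are stored in lowercase, so a capitalized token such as "This" passes through the filter above, and so do punctuation tokens. One possible refinement is sketched below. To stay self-contained it hard-codes the tokens and uses a small illustrative subset of the stop list, both of which are assumptions; in practice the values would come from word_tokenize and stopwords.words("english").

```python
# Tokens written out by hand, as a tokenizer would split the sample sentence.
word_tokens = ["This", "is", "a", "sample", "sentence", ",", "showing",
               "off", "the", "stop", "words", "filtration", "."]

# Small illustrative subset of the English stop list (an assumption made
# so that the example runs without downloading any corpus).
stop_words = {"this", "is", "a", "off", "the"}

# Lowercase each token before the lookup, and drop tokens that are not
# purely alphabetic (this removes "," and ".").
filtered = [
    token for token in word_tokens
    if token.isalpha() and token.lower() not in stop_words
]

print(filtered)
```

Whether to drop punctuation or numbers depends on the task, so the isalpha() check is a design choice and not a requirement.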

Modified text is an extract of the original Stack Overflow Documentation

Licensed under CC BY-SA 3.0

Not affiliated with Stack Overflow
