Stopword removal with NLTK

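I am trying to process user-entered text by removing stopwords with the nltk toolkit, but stopword removal also removes words like 'and', 'or', and 'not'. I want these words to still be present afterwards, because they are operators that I need later when the text is processed as a query. I don't know which words can act as operators in a text query, and I also want to remove unnecessary words from my text.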


NLTK has a built-in stopword list made up of 2,400 stopwords for 11 languages (Porter et al.), see http://nltk.org/book/ch02.html

>>> from nltk import word_tokenize
>>> from nltk.corpus import stopwords
>>> stop = set(stopwords.words('english'))
>>> sentence = "this is a foo bar sentence"
>>> print([i for i in sentence.lower().split() if i not in stop])
['foo', 'bar', 'sentence']
>>> [i for i in word_tokenize(sentence.lower()) if i not in stop]
['foo', 'bar', 'sentence']

I recommend looking at using tf-idf to remove stopwords, see Effects of Stemming on the term frequency?
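
One way to read the tf-idf suggestion: a word that occurs in nearly every document has a low inverse document frequency, which makes it a stopword candidate for that corpus. The sketch below uses only the standard library; the function name low_idf_terms, the plain whitespace tokenization, and the max_idf cut-off are assumptions made for illustration, not something taken from the linked question.

```python
import math
from collections import Counter


def low_idf_terms(documents, max_idf=0.5):
    """Return terms whose inverse document frequency is at most max_idf.

    documents: list of strings; tokenization is a plain lowercase split.
    max_idf: hypothetical cut-off, tune it for your own corpus.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document.
        doc_freq.update(set(doc.lower().split()))
    # idf = log(N / df); a term present in every document scores 0.
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    return {term for term, score in idf.items() if score <= max_idf}


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang in the tree",
    "the fish swam",
]
candidates = low_idf_terms(docs)
print(sorted(candidates))
```

On a corpus this small only the word that appears in every document falls under the cut-off; on real data the threshold needs tuning, and operator words such as 'and' or 'not' would still have to be excluded by hand.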

I suggest you create your own list of operator words and take them out of the stopword list. Sets can be subtracted conveniently, so:

operators = set(('and', 'or', 'not'))
stop = set(stopwords...) - operators

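Then you can simply test whether a word is in or not in the set, without relying on whether your operators are part of the stopword list. Later you can switch to another stopword list or add an operator.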

if word.lower() not in stop:
    pass  # use word
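
The set-subtraction idea can be sketched end to end as follows. To keep the example self-contained, EXAMPLE_STOPWORDS is a small stand-in for set(stopwords.words('english')), and filter_query is a hypothetical helper name; both are assumptions for illustration.

```python
# Stand-in stop list for illustration only; in practice this would be
# set(stopwords.words('english')) from nltk.corpus.
EXAMPLE_STOPWORDS = {"the", "a", "is", "and", "or", "not", "of", "in", "to"}
OPERATORS = {"and", "or", "not"}

# Set difference drops the operators from the stop list.
stop = EXAMPLE_STOPWORDS - OPERATORS


def filter_query(text, stop_set):
    """Drop stop words from a query while preserving word order."""
    return [word for word in text.lower().split() if word not in stop_set]


query = "the cat and the dog or not a bird"
print(filter_query(query, stop))
```

Because the operators are subtracted explicitly, the result does not depend on whether a given stopword list happens to contain them.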

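@alvas has a good answer. But again, it depends on the nature of the task. For example, suppose that in your application you want to treat every conjunction (e.g. and, or, but, if, while) and every determiner (e.g. the, a, some, most, every, no) as a stop word, and consider all other parts of speech legitimate. Then you might want to look into this solution, which uses the part-of-speech tagset to discard words; see Table 5.1: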

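import nltk

# Tags to discard. 'DET' and 'CONJ' are the universal-tagset names for
# determiners and conjunctions ('CNJ' comes from an older simplified tagset).
STOP_TYPES = {'DET', 'CONJ'}

text = "some data here "
# Request universal tags explicitly: by default pos_tag returns Penn Treebank
# tags such as 'DT' and 'CC', which would never match STOP_TYPES.
tokens = nltk.pos_tag(nltk.word_tokenize(text), tagset='universal')
good_words = [w for w, wtype in tokens if wtype not in STOP_TYPES]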

@alvas's answer does the job, but it can be done much faster. Assume that you have documents: a list of strings.

from nltk.corpus import stopwords
from nltk.tokenize import wordpunct_tokenize


stop_words = set(stopwords.words('english'))
stop_words.update(['.', ',', '"', "'", '?', '!', ':', ';', '(', ')', '[', ']', '{', '}']) # remove it if you need punctuation


for doc in documents:
    list_of_words = [i.lower() for i in wordpunct_tokenize(doc) if i.lower() not in stop_words]

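Note that because you are searching in a set here (not in a list), each lookup takes roughly constant time instead of scanning the list, so in theory it is about len(stop_words)/2 times faster. That is significant if you need to process many documents.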

For 5000 documents of approximately 300 words each, the difference is 1.8 seconds for my example versus 20 seconds for @alvas's.
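
The set-versus-list difference can be checked with a rough timing sketch like the one below. The stop list here is synthetic (200 made-up entries) and the token stream is hypothetical, so the absolute numbers will not match the figures quoted above; only the relative gap is of interest.

```python
import timeit

# Hypothetical stand-in for a stop word list of a few hundred entries.
stop_list = [f"word{i}" for i in range(200)]
stop_set = set(stop_list)

tokens = ["word199", "missing", "word0"] * 1000


def filter_with(container):
    """Keep the tokens that are not in the given container."""
    return [t for t in tokens if t not in container]


# Both containers give the same result; only the lookup cost differs.
same = filter_with(stop_list) == filter_with(stop_set)

list_time = timeit.timeit(lambda: filter_with(stop_list), number=5)
set_time = timeit.timeit(lambda: filter_with(stop_set), number=5)
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

A list lookup compares the token against entries one by one, while a set lookup hashes it once, which is why the gap grows with the size of the stop list.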

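Also, in most cases you split the text into words in order to perform some other classification task that uses tf-idf. So you will most likely want to use a stemmer as well: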

from nltk.stem.porter import PorterStemmer
porter = PorterStemmer()

and use [porter.stem(i.lower()) for i in wordpunct_tokenize(doc) if i.lower() not in stop_words] inside the loop.

You can use string.punctuation together with the built-in NLTK stopword list:

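from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from string import punctuation


def tokenize(text):
    # Split into sentences, then into words, and flatten into a single list
    # so that every item is a string that can be tested against the set.
    sents = sent_tokenize(text)
    return [word for sent in sents for word in word_tokenize(sent)]


def removeStopWords(words):
    customStopWords = set(stopwords.words('english') + list(punctuation))
    return [word for word in words if word not in customStopWords]


# text is the input string, assumed to be defined before this point.
words = tokenize(text)
wordsWOStopwords = removeStopWords(words)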
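
One caveat with list(punctuation): it adds only single characters to the set, so multi-character tokens that a tokenizer may emit, such as "..." or the `` and '' quote tokens, would not be removed. A minimal sketch of a stricter check, where is_punctuation is a hypothetical helper and the token list is made up for illustration:

```python
from string import punctuation

PUNCT_CHARS = set(punctuation)


def is_punctuation(token):
    """True if the token is non-empty and made only of punctuation characters."""
    return bool(token) and all(ch in PUNCT_CHARS for ch in token)


tokens = ["Wait", "...", "is", "this", "``", "quoted", "''", "?"]
kept = [t for t in tokens if not is_punctuation(t)]
print(kept)
```

Stopword filtering can then be applied to the remaining tokens as before.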

Complete list of NLTK stopwords

Removing stop words from a string

Here I have also added a custom stop word list

import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords                    # Stop words


stop_words = set(stopwords.words('english'))
stop_words.update(list(set(['zero'    , 'one'     , 'two'      ,
'three'   , 'four'    , 'five'     ,
'six'     , 'seven'   , 'eight'    ,
'nine'    , 'ten'     ,
               

'may'     , 'also'    , 'across'   ,
'among'   , 'beside'  , 'however'  ,
'yet'     , 'within'  ,
               

'jan'     ,  'feb'    , 'mar'      ,
'apr'     ,  'may'    , 'jun'      ,
'jul'     ,  'aug'    , 'sep'      ,
'oct'     ,  'nov'    , 'dec'      ,
               

'january' , 'february', 'march'    ,
'april'   , 'may'     , 'june'     ,
'july'    , 'august'  , 'september',
'october' , 'november', 'december' ,
               

'summer'  , 'winter'  , 'fall'     ,
'spring'  ,


"a"         , "about"     ,   "above"  , "after"   ,
"again"     , "against"   ,   "ain"    , "aren't"  ,
"all"       , "am"        ,   "an"     , "and"     ,
"any"       , "are"       ,   "aren"   ,  "as"     ,
"at"        ,
               

"be"        , "because"   ,   "been"   , "before"  ,
"being"     , "below"     ,   "between", "both"    ,
"but"       , "by"        ,
               

"can"       , "couldn"    , "couldn't" , "could"   ,
               

"d"         , "did"       , "didn"     , "didn't"  ,
"do"        , "does"      , "doesn"    , "doesn't" ,
"doing"     , "don"       , "don't"    , "down"    ,
"during"    ,
               

"each"      ,
               

"few"       , "for"      , "from"      , "further" ,
               

"had"       , "hadn"     , "hadn't"    , "has"     ,
"hasn"      , "hasn't"   , "have"      , "haven"   ,
"haven't"   , "having"   , "he"        , "her"     ,
"here"      , "hers"     , "herself"   , "him"     ,
"himself"   , "his"      , "how"       ,
"he'd"      , "he'll"    , "he's"      , "here's"  ,
"how's"     ,
               

"i"         , "if"       , "in"        , "into"    ,
"is"        , "isn"      , "isn't"     , "it"      ,
"it's"      , "its"      , "itself"    , "i'd"     ,
"i'll"      , "i'm"      , "i've"      ,
               

"just"      ,
               

"ll"        , "let's"    ,
               

"m"         , "ma"       ,"me"         ,
"mightn"    , "mightn't" , "more"      , "most"    ,
"mustn"     , "mustn't"  , "my"        , "myself"  ,
"needn"     , "needn't"  , "no"        , "nor"     ,
"not"       , "now"      ,
               

"o"         , "of"       , "off"       , "on"      ,
"once"      , "only"     , "or"        , "other"   ,
"our"       , "ours"     , "ourselves" , "out"     ,
"over"      , "own"      , "ought"     ,
               

"re"        ,
               

"s"         , "same"     , "shan"      , "shan't"   ,
"she"       , "she's"    , "should"    , "should've",
"shouldn"   , "shouldn't", "so"        , "some"     ,
"such"      , "she'd"    , "she'll"    ,
               

"t"         , "than"     , "that"      , "that'll"  ,
"the"       , "their"    , "theirs"    , "them"     ,
"themselves", "then"     , "there"     , "these"    ,
"they"      , "this"     , "those"     , "through"  ,
"to"        , "too"      , "that's"    , "there's"  ,
"they'd"    , "they'll"  , "they're"   , "they've"  ,
               

"under"     , "until"    , "up"        ,
               

"ve"        , "very"     ,
               

"was"       , "wasn"     , "wasn't"    , "we"       ,
"were"      , "weren"    , "weren't"   , "what"     ,
"when"      , "where"    , "which"     , "while"    ,
"who"       , "whom"     , "why"       , "will"     ,
"with"      , "won"      , "won't"     , "wouldn"   ,
"wouldn't"  , "we'd"     , "we'll"     , "we're"    ,
"we've"     , "what's"   , "when's"    , "where's"  ,
"who's"     , "why's"    , "would"     ,
               

"y"         , "you"      , "you'd"     , "you'll"   ,
"you're"    , "you've"   , "your"      , "yours"    , "yourself",
"yourselves",
               

'a',"able", "abst", "accordance", "according", "accordingly", "across", "act", "actually"          ,
"added", "adj", "affected", "affecting", "affects", "afterwards", "ah",      "almost"          ,
"alone", "along", "already", "also", "although", "always", "among", "amongst", "anyone"        ,
"announce", "another", "anybody", "anyhow", "anymore",  "anything", "anyway", "anyways"        ,
"anywhere", "apparently", "approximately", "arent", "arise", "around", "aside", "ask"          ,
"asking", "auth", "available", "away", "awfully", "a's", "ain't", "allow", "allows", "apart"   ,
"appear", "appreciate", "appropriate", "associated"                                            ,
               

"b", "back", "became", "become", "becomes", "becoming", "beforehand", "begin", "beginning"     ,
"beginnings", "begins", "behind", "believe", "beside", "besides", "beyond", "biol", "brief"    ,
"briefly"                                                                                      ,
               

"c", "ca", "came", "cannot", "can't", "cause", "causes", "certain", "certainly", "co", "com"   ,
"come", "comes", "contain", "containing", "contains", "couldnt"                                ,
               

'd',"date", "different", "done", "downwards", "due"                                                ,
               

"e", "ed", "edu", "effect", "eg", "eight", "eighty", "either", "else", "elsewhere", "end"      ,
"ending", "enough", "especially", "et", "etc", "even", "ever", "every", "everybody","except"   ,
"everyone", "everything", "everywhere", "ex"                                                   ,
               

"f", "far", "ff", "fifth", "first", "five", "fix", "followed", "following", "follows", "four"  ,
"former", "formerly", "forth", "found",  "furthermore"                                         ,
               

"g", "gave", "get", "gets", "getting", "give", "given", "gives",  "go", "goes", "got","gone"   ,
"gotten", "giving"                                                                             ,
               

"h", "happens", "hardly", "hed", "hence", "hereafter", "hereby", "herein", "heres", "however"  ,
"hereupon", "hes", "hi", "hid", "hither", "home", "howbeit",  "hundred"                        ,
               

"id", "ie", "im", "immediately", "importance", "important", "inc", "indeed", "itd", "index"    ,
'i',"information", "instead", "invention",   "it'll", "inward", "immediate"                        ,
               

"j",
               

"k", "keep", "keeps", "kept", "kg", "km", "know", "known", "knows"                             ,
               

"l", "largely", "last", "lately", "later", "latter", "latterly", "least", "less", "lest", "ltd",
"let", "lets", "like", "liked", "likely", "line", "little", "'ll", "look", "looking", "looks"  ,
               

'm',"made", "mainly", "make", "makes", "many", "maybe", "mean", "means", "meantime", "merely", "mg",
"might", "million", "miss", "ml", "moreover", "mostly", "mr", "mrs", "much", "mug", "must"     ,
"meanwhile", "may"                                                                             ,
               

"n", "na", "name", "namely", "nay", "nd", "near", "nearly", "necessarily", "necessary", "need" ,
"needs", "neither", "never", "nevertheless", "new", "next", "nine", "ninety", "nobody", "non"  ,
"none", "nonetheless", "noone", "normally", "nos", "noted", "nothing", "nowhere", "n2", "nc"   ,
"nd", "ne", "ng", "ni", "nj", "nl", "nn", "nr", "ns", "nt", "ny"                               ,
               

'o',"obtain", "obtained", "obviously", "often", "oh", "ok", "okay", "old", "omitted", "one", "ones",
"onto", "ord", "others", "otherwise", "outside", "overall", "owing",  "oa", "ob", "oc", "od"   ,
"of", "og", "oi", "oj", "ol", "om", "on", "oo", "oq", "or", "os", "ot", "ou", "ow", "ox", "oz" ,
               

"p", "page", "pages", "part", "particular", "particularly", "past", "per", "perhaps", "placed" ,
"please", "plus", "poorly", "possible", "possibly", "potentially", "pp", "predominantly"       ,
"present", "previously", "primarily", "probably", "promptly", "proud", "provides", "put"       ,
"p1", "p2", "p3", "pc", "pd", "pe", "pf", "ph", "pi", "pj", "pk", "pl", "pm", "pn", "po", "pq" ,
"pr", "ps", "pt", "pu", "py"                                                                   ,
               

"q", "que", "quickly", "quite", "qv",  "qj", "qu"                                              ,
               

'r',"readily", "really", "recent", "recently", "ref", "refs", "regarding", "regardless", "regards" ,
"related", "relatively", "research", "respectively", "resulted", "resulting", "results", "run" ,
"right",  "r2", "ra", "rc", "rd", "rf", "rh", "ri", "rj", "rl", "rm", "rn", "ro", "rq", "rr"   ,
"rs", "rt", "ru", "rv", "ry" "r", "ran", "rather", "rd"                                        ,
               

's',"said", "saw", "say", "saying", "says", "sec", "section", "see", "seeing", "seem", "seemed"    ,
"seeming", "seems", "seen", "self", "selves", "sent", "seven", "several", "shall", "shed"      ,
"shes", "show", "showed", "shown", "showns", "shows", "significant", "significantly"           ,
"similar", "similarly", "since", "six", "slightly", "somebody", "somehow", "someone", "soon"   ,
"somewhat", "somewhere", "specifically", "specified", "specify", "specifying", "still", "stop" ,
"strongly", "sub", "substantially", "successfully", "sufficiently", "suggest", "sup", "sure"   ,
"s2", "sa", "sc", "sd", "se", "sf", "si", "sj", "sl", "sm", "sn", "sp", "sq", "sr", "ss", "st" ,
"sy", "sz",   "sorry", "sometime", "somethan", "something", "sometimes"                        ,
               

't',"take", "taken", "taking", "tell", "tends", "thank", "thanx", "that've", "thence", "thereafter",
"thereby", "therefore", "therein", "there'll", "thereof", "therere", "thereto", "thereupon"    ,
"there've", "theyd", "theyre", "think", "thou", "though", "thoughh", "thousand", "throug"      ,
"throughout", "thru", "thus", "til", "tip", "together", "took", "toward", "towards", "tried"   ,
"tries", "truly", "try", "trying", "ts", "twice", "two", "thats",  "thanks",  "th",  "thered"  ,
"theres" "t1", "t2", "t3", "tb", "tc", "td", "te", "tf", "th", "ti", "tj", "tl", "tm", "tn"    ,
"tp", "tq", "tr", "ts", "tt", "tv", "tx"                                                       ,
               

"u", "un", "unfortunately", "unless", "unlike", "unlikely", "unto", "upon", "ups", "us", "use" ,
"used", "useful", "usefully", "usefulness", "uses", "using", "usually", "ue", "ui", "uj", "uk" ,
"um", "un", "uo", "ur", "ut",
               

"v", "value", "various", "'ve", "via", "viz", "vol", "vols", "vs", "va", "vd", "vj", "vo", "vq",
"vt", "vu"                                                                                     ,
               

"w", "want", "wants", "wasnt", "way", "wed", "welcome", "went", "werent", "whatever", "what'll",
"whats", "whence", "whenever", "whereas", "whereby", "wherein", "wheres", "wherever", "whether",
"whim", "whither", "whod", "whoever", "whole", "who'll", "whomever", "whos", "whose", "widely" ,
"whereupon", "willing", "wish", "within", "without", "wont", "words", "world", "wouldnt", "www",
"wi", "wa", "wo",
               

"x", "x1", "x2", "x3", "xf", "xi", "xj", "xk", "xl", "xn", "xo", "xs", "xt", "xv", "xx",
               

"yes", "yet", "youd", "youre", "y2", "yj", "yl", "yr", "ys", "yt",
               

"z", "zero", "zi", "zz"
               

"best", "better", "c'mon", "c's", "cant", "changes", "clearly", "concerning", "consequently", "consider", "considering", "corresponding", "course", "currently", "definitely", "described", "despite", "entirely", "exactly", "example", "going", "greetings", "hello", "help", "hopefully", "ignored", "inasmuch", "indicate", "indicated", "indicates", "inner", "insofar", "it'd", "keep", "keeps", "novel", "presumably", "reasonably", "second", "secondly", "sensible", "serious", "seriously", "sure", "t's", "third", "thorough", "thoroughly", "three", "well", "wonder", "a", "about", "above", "above", "across", "after", "afterwards", "again", "against", "all", "almost", "alone", "along", "already", "also", "although", "always", "am", "among", "amongst", "amoungst", "amount", "an", "and", "another", "any", "anyhow", "anyone", "anything", "anyway", "anywhere", "are", "around", "as", "at", "back", "be", "became", "because", "become", "becomes", "becoming", "been", "before", "beforehand", "behind", "being", "below", "beside", "besides", "between", "beyond", "bill", "both", "bottom", "but", "by", "call", "can", "cannot", "cant", "co", "con", "could", "couldnt", "cry", "de", "describe", "detail", "do", "done", "down", "due", "during", "each", "eg", "eight", "either", "eleven", "else", "elsewhere", "empty", "enough", "etc", "even", "ever", "every", "everyone", "everything", "everywhere", "except", "few", "fifteen", "fify", "fill", "find", "fire", "first", "five", "for", "former", "formerly", "forty", "found", "four", "from", "front", "full", "further", "get", "give", "go", "had", "has", "hasnt", "have", "he", "hence", "her", "here", "hereafter", "hereby", "herein", "hereupon", "hers", "herself", "him", "himself", "his", "how", "however", "hundred", "ie", "if", "in", "inc", "indeed", "interest", "into", "is", "it", "its", "itself", "keep", "last", "latter", "latterly", "least", "less", "ltd", "made", "many", "may", "me", "meanwhile", "might", "mill", "mine", "more", &
quot;moreover", "most", "mostly", "move", "much", "must", "my", "myself", "name", "namely", "neither", "never", "nevertheless", "next", "nine", "no", "nobody", "none", "noone", "nor", "not", "nothing", "now", "nowhere", "of", "off", "often", "on", "once", "one", "only", "onto", "or", "other", "others", "otherwise", "our", "ours", "ourselves", "out", "over", "own", "part", "per", "perhaps", "please", "put", "rather", "re", "same", "see", "seem", "seemed", "seeming", "seems", "serious", "several", "she", "should", "show", "side", "since", "sincere", "six", "sixty", "so", "some", "somehow", "someone", "something", "sometime", "sometimes", "somewhere", "still", "such", "system", "take", "ten", "than", "that", "the", "their", "them", "themselves", "then", "thence", "there", "thereafter", "thereby", "therefore", "therein", "thereupon", "these", "they", "thickv", "thin", "third", "this", "those", "though", "three", "through", "throughout", "thru", "thus", "to", "together", "too", "top", "toward", "towards", "twelve", "twenty", "two", "un", "under", "until", "up", "upon", "us", "very", "via", "was", "we", "well", "were", "what", "whatever", "when", "whence", "whenever", "where", "whereafter", "whereas",                   "whereby", "wherein", "whereupon", "wherever", "whether", "which", "while", "whither", "who", "whoever", "whole", "whom", "whose", "why", "will", "with", "within", "without", "would", "yet", "you", "your", "yours", "yourself", "yourselves", "the", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "co", "op", "research-articl", "pagecount", "cit", "ibid", "les", "le", "au", "que", "est", "pas", "vol", &qu
ot;el", "los", "pp", "u201d", "well-b", "http", "volumtype", "par",
"0o", "0s", "3a", "3b", "3d", "6b", "6o",
"a1", "a2", "a3", "a4", "ab", "ac", "ad", "ae", "af", "ag", "aj", "al", "an", "ao", "ap", "ar", "av", "aw", "ax", "ay", "az",
"b1", "b2", "b3", "ba", "bc", "bd", "be", "bi", "bj", "bk", "bl", "bn", "bp", "br", "bs", "bt", "bu", "bx",
"c1", "c2", "c3", "cc", "cd", "ce", "cf", "cg", "ch", "ci", "cj", "cl", "cm", "cn", "cp", "cq", "cr", "cs", "ct", "cu", "cv", "cx", "cy", "cz",
"d2", "da", "dc", "dd", "de", "df", "di", "dj", "dk", "dl", "do", "dp", "dr", "ds", "dt", "du", "dx", "dy",
"e2", "e3", "ea", "ec", "ed", "ee", "ef", "ei", "ej", "el", "em", "en", "eo", "ep", "eq", "er", "es", "et", "eu", "ev", "ex", "ey",
"f2", "fa", "fc", "ff", "fi", "fj", "fl", "fn", "fo", "fr", "fs", "ft", "fu", "fy",
"ga", "ge", "gi", "gj", "gl", "go", "gr", "gs", "gy",
"h2", "h3", "hh", "hi", "hj", "ho", "hr", "hs", "hu", "hy",
"i", "i2", "i3", "i4", "i6", "i7", "i8", "ia", "ib", "ic", "ie", "ig", "ih", "ii", "ij", "il", "in", "io", "ip", "iq", "ir", "iv", "ix", "iy", "iz",
"jj", "jr", "js", "jt", "ju",
"ke", "kg", "kj", "km", "ko",
"l2", "la", "lb", "lc", "lf", "lj", "ln", "lo", "lr", "ls", "lt",
"m2", "ml", "mn", "mo", "ms", "mt", "mu",
               

'i',  'ii', 'iii', 'iv', 'v', 'vi', 'vii', 'viii','ix', 'x',
'xi', 'xii', 'xiii', 'xiv', 'xv', 'xvi', 'xvii', 'xviii', 'xix', 'xx',
'xxi', 'xxii', 'xxiii', 'xxiv', 'xxv', 'xxvi', 'xxvii', 'xxviii', 'xxix', 'xxx',
'xxxi', 'xxxii', 'xxxiii', 'xxxiv', 'xxxv', 'xxxvi', 'xxxvii', 'xxxviii', 'xxxix', 'xl',
'xli', 'xlii', 'xliii', 'xliv', 'xlv', 'xlvi', 'xlvii', 'xlviii', 'xlix', 'l',
'li', 'lii', 'liii', 'liv', 'lv', 'lvi', 'lvii', 'lviii', 'lix', 'lx',
'lxi', 'lxii', 'lxiii', 'lxiv', 'lxv', 'lxvi', 'lxvii', 'lxviii', 'lxix', 'lxx',
'lxxi', 'lxxii', 'lxxiii', 'lxxiv', 'lxxv', 'lxxvi', 'lxxvii', 'lxxviii', 'lxxix', 'lxxx',
'lxxxi', 'lxxxii', 'lxxxiii', 'lxxxiv', 'lxxxv', 'lxxxvi', 'lxxxvii', 'lxxxviii', 'lxxxix', 'xc',
'xci', 'xcii', 'xciii', 'xciv', 'xcv', 'xcvi', 'xcvii', 'xcviii', 'xcix', 'c',
               

"one", "first", "two", "second", "three", "third",
"four", "fourth", "five", "fifth", "six",  "sixth", "seven",
"seventh", "eight", "eighth", "nine", "ninth", "ten",
"tenth", "eleven", "eleventh", "twelve", "twelfth", "thirteen",
"thirteenth", "fourteen", "fourteenth", "fifteen", "fifteenth",
"sixteen", "sixteenth",  "seventeen", "seventeenth", "eighteen",
"eighteenth", "nineteen", "nineteenth", "twenty", "twentieth",
"one", "22nd", "second", "nd", "st", "rd", "th",
               

"1","2","3","4","5","6","7","8","9","10th","11th","12th","13th","14th","15th",
"16th","17th","18th","19th","20th","21st","22nd","23rd","24th","25th","26th","27th",
"28th","29th","30th","31st","32nd","33rd","34th","35th","36th","37th","38th","39th",
"40th","41st","42nd","43rd","44th","45th","46th","47th","48th","49th","50th","51st",
"52nd","53rd","54th","55th","56th","57th","58th","59th","60th","61st","62nd","63rd",
"64th","65th","66th","67th","68th","69th","70th","71st","72nd","73rd","74th","75th",
"76th","77th","78th","79th","80th","81st","82nd","83rd","84th","85th","86th","87th",
"88th","89th","90th", "91st", "92nd", "93rd", "94th", "95th", "96th","97th", "98th",
"99th","100th","thirty","forty","fifty","thirty","thirtieth","forty","fortieth",
"fifty", "fiftiethiftieth","sixty","sixtieth","seventy","seventieth", "eighty",
"eightieth", "ninety", "ninetieth","one", "hundred", "100th", "hundredth",
"order","state","page","file",
                

"'d","'ll",  "'m",  "'re",  "'s",  "'ve",  'a',
'about',  'above',  'across',  'after',  'afterwards',  'again',  'against',  'all',
'almost',  'alone',  'along',  'already',  'also',  'although',  'always',  'am',
'among',  'amongst',  'amount',  'an',  'and',  'another',  'any',  'anyhow',  'anyone',
'anything',  'anyway',  'anywhere',  'are',  'around',  'as',  'at',  'back',  'be',
'became',  'because',  'become',  'becomes',  'becoming',  'been',  'before',  'beforehand',
'behind',  'being',  'below',  'beside',  'besides',  'between',  'beyond',  'both',
'bottom',  'but',  'by',  'ca',  'call',  'can',  'cannot',  'could',  'did',  'do',  'does',
'doing',  'done',  'down',  'due',  'during',  'each',  'eight',  'either',  'eleven',
'else',  'elsewhere',  'empty',  'enough',  'even',  'ever',  'every',  'everyone',
'everything',  'everywhere',  'except',  'few',  'fifteen',  'fifty',  'first',
'five',  'for',  'former',  'formerly',  'forty',  'four',  'from',  'front',  'full',
'further',  'get',  'give',  'go',  'had',  'has',  'have',  'he',  'hence',  'her',
'here',  'hereafter',  'hereby',  'herein',  'hereupon',  'hers',  'herself',  'him',  'himself',
'his',  'how',  'however',  'hundred',  'i',  'if',  'in',  'indeed',  'into',  'is',  'it',
'its',  'itself',  'just',  'keep',  'last',  'latter',  'latterly',  'least',  'less',  'made',
'make',  'many',  'may',  'me',  'meanwhile',  'might',  'mine',  'more',  'moreover',  'most',
'mostly',  'move',  'much',  'must',  'my',  'myself',  "n't",  'name',  'namely',  'neither',
'never',  'nevertheless',  'next',  'nine',  'no',  'nobody',  'none',  'noone',  'nor',  'not',
'nothing',  'now',  'nowhere',  'n‘t',  'n’t',  'of',  'off',  'often',  'on',  'once',  'one',
'only',  'onto',  'or',  'other',  'others',  'otherwise',  'our',  'ours',  'ourselves',  'out',
'over',  'own',  'part',  'per',  'perhaps',  'please',  'put',  'quite',  'rather',  're',  'really',
'regarding',  'same',  'say',  'see',  'seem',  'seemed',  'seeming',  'seems',  'serious',  'several',
'she',  'should',  'show',  'side',  'since',  'six',  'sixty',  'so',  'some',  'somehow',  'someone',
'something',  'sometime',  'sometimes',  'somewhere',  'still',  'such',  'take',  'ten',  'than',
'that',  'the',  'their',  'them',  'themselves',  'then',  'thence',  'there',  'thereafter',
'thereby',  'therefore',  'therein',  'thereupon',  'these',  'they',  'third',  'this',  'those',
'though',  'three',  'through',  'throughout',  'thru',  'thus',  'to',  'together',  'too',  'top',
'toward',  'towards',  'twelve',  'twenty',  'two',  'under',  'unless',  'until',  'up',  'upon',  'us',
'used',  'using',  'various',  'very',  'via',  'was',  'we',  'well',  'were',  'what',  'whatever',  'when',
'whence',  'whenever',  'where',  'whereafter',  'whereas',  'whereby',  'wherein',  'whereupon',  'wherever',
'whether',  'which',  'while',  'whither',  'who',  'whoever',  'whole',  'whom',  'whose',  'why',  'will',
'with',  'within',  'without',  'would',  'yet',  'you',  'your',  'yours',  'yourself',  'yourselves',  '‘d',
'‘ll',  '‘m',  '‘re',  '‘s',  '‘ve',  '’d',  '’ll',  '’m',  '’re',  '’s',  '’ve'


                       

])))






import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
from nltk.tokenize import word_tokenize


stop_words = stopwords.words("english")


sentence = "PDF.co is a website that contains different tools to read, write and process PDF documents"
words = word_tokenize(sentence)


sentence_wo_stopwords = [word for word in words if word not in stop_words]


print(" ".join(sentence_wo_stopwords))