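From "Python: tf-idf-cosine: to find document similarity", I know it is possible to calculate document similarity using tf-idf cosine. Without importing external libraries, is there any way to calculate the cosine similarity between two strings?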
s1 = "This is a foo bar sentence ."
s2 = "This sentence is similar to a foo bar sentence ."
s3 = "What is this string ? Totally not related to the other two lines ."
cosine_sim(s1, s2) # Should give high cosine similarity
cosine_sim(s1, s3) # Shouldn't give high cosine similarity value
cosine_sim(s2, s3) # Shouldn't give high cosine similarity value
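
For reference, here is a minimal sketch of what such a `cosine_sim` could look like using only the standard library. It is an assumption about the intended behavior, not a definitive implementation: it uses raw term counts rather than tf-idf weights, lowercases the text, and treats runs of word characters as tokens (so punctuation such as `.` and `?` is dropped). The helper name `text_to_vector` and the `WORD` pattern are hypothetical names introduced for illustration.

```python
import math
import re
from collections import Counter

# Assumption: a "word" is any run of word characters; punctuation is ignored.
WORD = re.compile(r"\w+")


def text_to_vector(text):
    """Map a string to a bag-of-words vector of lowercase term counts."""
    return Counter(WORD.findall(text.lower()))


def cosine_sim(a, b):
    """Cosine similarity between the term-count vectors of two strings."""
    v1, v2 = text_to_vector(a), text_to_vector(b)

    # Only words present in both strings contribute to the dot product.
    dot = sum(v1[w] * v2[w] for w in set(v1) & set(v2))

    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))

    # An empty string has a zero vector; define its similarity as 0.0.
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)
```

With this sketch, the pair `s1`/`s2` scores noticeably higher than either pair involving `s3`, which matches the ordering asked for above. Because there is no idf weighting, common words such as "is" and "this" still count toward the score.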