How to force Logstash to reparse a file?

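I installed Logstash to parse Apache log files. It took me quite a while to get the setup right, and I always tested against real logs. I noticed (as the documentation says) that Logstash "remembers" where it was in a file. Now that my setup is OK, I would like Logstash to "forget". This seems harder than I thought. I have already done the following: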

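  • used start_position => "beginning"

  • deleted the complete "data" folder from Elasticsearch (after stopping it first)

  • looked at which files Logstash has open with lsof -p PID and deleted everything that looked promising (in my case /tmp/jffi*.tmp)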

Logstash still won't forget; it only parses "fresh" files in the folder where the logs reside.

Any ideas?


The file plugin stores its "tailing" history in a sincedb file, by default under $HOME/.sincedb*; see http://logstash.net/docs/1.3.3/inputs/file#sincedb_path

The sincedb file contains lines that look like:

[inode] [major device number] [minor device number] [byte offset]
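To make the line format concrete, here is a minimal Python sketch that splits one sincedb line into its four fields. The names SincedbEntry and parse_sincedb_line are made up for this illustration, and the sketch assumes the four whitespace-separated integer columns described above; any further columns are simply ignored.

```python
from typing import NamedTuple


class SincedbEntry(NamedTuple):
    """One sincedb line (hypothetical helper type, not part of Logstash)."""
    inode: int
    major: int
    minor: int
    offset: int


def parse_sincedb_line(line: str) -> SincedbEntry:
    # Assumption: "[inode] [major device number] [minor device number] [byte offset]",
    # separated by whitespace. Extra trailing columns, if any, are ignored.
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"not a sincedb line: {line!r}")
    inode, major, minor, offset = (int(field) for field in fields[:4])
    return SincedbEntry(inode, major, minor, offset)
```

The offset field is the position Logstash would resume from for the file with that inode.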

So, if you want to parse a complete file again, you need to:

  • delete the sincedb files
  • OR delete only the corresponding line in the sincedb file; check the inode number of your file first (ls -i yourFile | awk '{print $1}')
  • and restart Logstash
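The "delete only the corresponding line" option can be sketched in Python as follows. This is an illustration rather than an official tool: drop_inode_entries and forget_file are hypothetical names, the inode is assumed to be the first whitespace-separated field of each line, and Logstash should be stopped before the sincedb file is rewritten.

```python
import os


def drop_inode_entries(sincedb_text: str, inode: int) -> str:
    """Return the sincedb text without the lines whose first field is `inode`."""
    kept = []
    for line in sincedb_text.splitlines():
        fields = line.split()
        # Assumption: the first field of every sincedb line is the inode.
        if fields and fields[0] == str(inode):
            continue
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


def forget_file(sincedb_path: str, log_path: str) -> None:
    """Rewrite the sincedb file so it no longer remembers `log_path`."""
    inode = os.stat(log_path).st_ino  # same number that `ls -i` prints
    with open(sincedb_path) as handle:
        text = handle.read()
    with open(sincedb_path, "w") as handle:
        handle.write(drop_inode_entries(text, inode))
```

A call such as forget_file("/path/to/.sincedb_xyz", "/tmp/logfile_to_analyse"), with placeholder paths, followed by a Logstash restart corresponds to the second and third bullets.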

With the setting start_position => "beginning", Logstash will then read the whole file from the start.

Example of a sincedb file :

Logstash keeps the record in $HOME/.sincedb_*. You can delete all the .sincedb files and restart Logstash; it will then reparse the file.
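Before deleting anything, it can help to list what would be removed. The Python sketch below only lists candidates; find_sincedb_files is a hypothetical helper, and it assumes the files live directly in the given directory (the home directory by default) with names starting with .sincedb.

```python
import glob
import os
from typing import List, Optional


def find_sincedb_files(directory: Optional[str] = None) -> List[str]:
    """List files named .sincedb* in `directory` (default: the home directory)."""
    if directory is None:
        directory = os.path.expanduser("~")
    # glob.escape guards against special characters in the directory name.
    pattern = os.path.join(glob.escape(directory), ".sincedb*")
    return sorted(glob.glob(pattern))
```

Removing each returned path with os.remove while Logstash is stopped, then restarting it, matches the procedure described above.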

By default, Logstash writes the position it was last at to a file that usually resides in $HOME/.sincedb. Logstash can be fooled into believing it never parsed the log file by specifying /dev/null as sincedb_path.

Here is the relevant part of the file input documentation:

Where to write the since database (keeps track of the current position of monitored log files). Defaults to the value of environment variable "$SINCEDB_PATH" or "$HOME/.sincedb".

Config Example

input {
  file {
    path => "/tmp/logfile_to_analyse"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

I found it in my home dir but after deleting it, logstash refused to re-pick the existing log files. The way I got it to work was to add

sincedb_path => "/opt/elk/sincedb/"

to my file plugin. I think that to reset each time, you can just change the value of sincedb_path.

If you are using logstash-forwarder, check your home directory for the .logstash-forwarder file instead:

{
"/var/log/messages": {
"source": "/var/log/messages",
"offset": 43715,
"inode": 12967,
"device": 51776
}
}
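If only one file should be forgotten, the corresponding entry can be removed from that JSON state instead of deleting the whole file. The following Python sketch assumes the state has exactly the shape shown above (an object keyed by source path); forget_source is a hypothetical helper, and logstash-forwarder should be stopped while the file is edited.

```python
import json


def forget_source(state_text: str, source: str) -> str:
    """Return the state JSON without the entry for `source`."""
    state = json.loads(state_text)
    # Assumption: top-level keys are the watched file paths, as in the sample above.
    state.pop(source, None)
    return json.dumps(state, indent=2)
```

Writing the returned text back to the .logstash-forwarder file leaves the offsets of all other sources untouched.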

After deleting $HOME/.sincedb_* it still wasn't ingesting data for me.

After trying a bunch of things I removed all but the main .conf file from /etc/logstash/conf.d and restarted Logstash, and everything worked. I can only assume there was something in one of the .conf files that logstash was silently hanging on.

Reparsing every time is actually very costly if the file is large, so be careful before doing this. If you want to force Logstash to reparse, set this parameter inside the file input block:

sincedb_path => "/dev/null"

With this option no .sincedb file is stored, and Logstash reparses the file on every start. If you only want to reparse occasionally rather than every time, manually delete the .sincedb file that was created when the file was parsed. It is generally a hidden file in your home directory if you are not the root user, otherwise in the root directory. You can also set sincedb_path to some other location to make the file easier to find.

sincedb_path => "/home/shubham/sinceDB/productsSince.db"

If you want to avoid messing with the logstash options I've found that renaming or removing the existing log file and creating a new file from the old file contents will trick logstash into re-indexing.
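A rough Python sketch of that trick follows. It presumably works because the sincedb is keyed by inode, so a copy of the file looks like a file Logstash has never seen. recreate_with_new_inode is a hypothetical name, and the sketch assumes a POSIX filesystem and that nothing is writing to the log while it runs.

```python
import os
import shutil
import tempfile


def recreate_with_new_inode(path: str) -> int:
    """Replace `path` with a copy of itself and return the new inode number."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the copy in the same directory so it lives on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    shutil.copyfile(path, tmp_path)  # copies contents; the copy gets a fresh mtime
    shutil.copymode(path, tmp_path)  # keep the original permission bits
    os.replace(tmp_path, path)       # atomically swap the copy into place
    return os.stat(path).st_ino
```

Because the copy exists alongside the original before the swap, the two are guaranteed to have different inode numbers.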

Combining all the answers, I guess this is the best way to reparse files. I did the same for my testing.

input {
  file {
    path => "/tmp/access_log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    ignore_older => 0
  }
}

For a quick test, instead of using ignore_older, you can also touch /tmp/access_log to update the file's timestamp.
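The same timestamp bump can be scripted. This minimal Python sketch mirrors what touch does to an existing file; the function name touch is just a local helper, and the point is only that ignore_older skips files whose modification time is too old.

```python
import os


def touch(path: str) -> None:
    """Set the access and modification times of an existing file to now."""
    # Passing None as `times` means "use the current time".
    os.utime(path, None)
```

Calling touch("/tmp/access_log") before starting Logstash has the same effect as the shell command mentioned above.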

In Logstash version 5, the new sincedb directory is

<path.data>/plugins/inputs/file

path.data is defined in logstash.yml.

If you installed Filebeat from the tar.gz, you can delete the file $FilebeatPath/data/registry/filebeat/data.json and rerun Filebeat.

Try deleting the /var/lib/logstash folder in your environment.

As seen on: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html#plugins-inputs-file-sincedb_path

There you can see that Logstash saves a sincedb file that keeps track of which files it has already seen and up to which point it has processed them.

If you want to get rid of the existing sincedb file and you have not defined sincedb_path yourself, you can find it in

<path.data>/plugins/inputs/file

By default <path.data> holds the value

LOGSTASH_HOME/data

By default LOGSTASH_HOME holds the value

/var/lib/logstash

It is best to define sincedb_path yourself if you want full control over it.

I would suggest:

sincedb_clean_after => 0
start_position => "beginning"