* I had copy/pasted this from my discord-markov source and accidentally
used the discord bot's terminology in one place. This is fixed.
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
* !markov help will display a pretty crude help message. It's funny to
me, I dunno.
* help_timeout config option limits how often `!markov help` is allowed
to run. The help message is a lot of lines, so it could cause problems
if `!markov help` is spammed or abused. Set it to 0 or less to disable
the limit.
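
A minimal sketch of how a timeout like this could work, assuming
help_timeout is a number of seconds between allowed runs. The
HelpCooldown class and its clock parameter are hypothetical names for
illustration, not the plugin's actual code:

```python
import time


class HelpCooldown:
    """Allow a command at most once per `timeout` seconds (hypothetical helper)."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self._clock = clock  # injectable so the behavior can be tested
        self._last_run = None  # time of the last allowed run, if any

    def allow(self):
        # A timeout of 0 or less disables the limit entirely.
        if self.timeout <= 0:
            return True
        now = self._clock()
        if self._last_run is not None and now - self._last_run < self.timeout:
            return False  # still cooling down, so skip the help message
        self._last_run = now
        return True
```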
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
Sometimes an incoming message won't have any content (for some reason).
I haven't looked too deeply into the origins of it. This just bypasses
that case and returns early.
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
* Remove toml dependency since a TOML parser (tomllib) comes with Python
as of 3.11
* Update toml usage to tomllib in config.py
* Update `with open(...)` for toml file reading to use mode 'rb', since
tomllib.load() only accepts binary file objects
* Update Pipfile.lock for locked dependencies to work with Python 3.12
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
We should ignore the data directory because it can get quite large and
makes for annoying rebuilds, since all of that data has to be sent over
to the daemon even though we don't need it there.
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
For specific error messages that come in, we should give back an
error-level log instead of logging to debug or trace.
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
This is useful for when we want important debug messages, but not
necessarily to be flooded with every message that comes through IRC
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
* TODO.md - for small, easier tasks that shouldn't require any (or many)
new components on the codebase
* WISHLIST.md - for larger tasks that require more work, thought, and
code to be added. Probably deserving of their own branches.
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
* Add .gitignore to individual data directories, so they can manage
their own ignored files
* Remove redundant ignored patterns from the root .gitignore that are
now handled by these individual gitignores (see previous note)
* Remove pickle files from .gitignore since those aren't being used by
any plugin anymore
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
Last patch was creating a set of the list of all matches, instead of a
set of all matches. This fixes that, and also makes the regex a raw
string.
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
If there was an unmatched word 'foo', and someone wrote 'foo. bar.',
wordbot would not recognize the match because it was only splitting the
string on spaces, not punctuation.
This treats as a word any contiguous sequence of letters, numbers (just
in case) and hyphens (just in case).
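
A rough sketch of this kind of word matching, assuming ASCII letters
only. WORD_RE and words_in are hypothetical names, and wordbot's real
pattern may differ:

```python
import re

# Raw string, so any backslashes added later reach the regex engine as-is.
WORD_RE = re.compile(r"[A-Za-z0-9-]+")


def words_in(message):
    # findall() returns a list of matches; set() over that list gives a
    # set of the individual words.
    return set(WORD_RE.findall(message))
```

With this, 'foo. bar.' yields the words 'foo' and 'bar', with the
punctuation dropped.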
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
The allchain has been a source of headaches because it takes up a lot of
memory and slows everything down. However, with the new database
model, we can generate markov sentences using all of the rows since they
are a flat collection. This helps reduce disk space and increases the
import speed significantly.
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
* save_every is no longer necessary so it is removed
* sql_path is added if you need to specify the location of the database
SQL
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
Markov now uses a sqlite3 database instead of flat JSON files. This
should significantly speed up saving time, plus reduce the amount of RAM
that it uses. Saving and loading large JSON files was very slow and
caused issues with other plugins, especially when messages were
received. Additionally, in order to save RAM, a cache was used and
periodically flushed when not used, adding some complications to the
implementation. This has all been removed since things get committed on
the fly with the database implementation. The main trade-off we have to
make is the disk space used by the database. This is OK though, because
disk space is cheap while RAM is not.
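
To illustrate the committed-on-the-fly idea, here is a small sketch
using the sqlite3 module. The table name, the columns, and the train and
next_word functions are all assumptions made up for this example, and
the plugin's real schema is likely different:

```python
import random
import sqlite3

# Hypothetical schema: one row per (word, next_word) pair with a count.
SCHEMA = """
CREATE TABLE IF NOT EXISTS chain (
    word TEXT NOT NULL,
    next_word TEXT NOT NULL,
    weight INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (word, next_word)
)
"""


def train(db, sentence):
    words = sentence.split()
    with db:  # commits when the block succeeds, rolls back on error
        for word, following in zip(words, words[1:]):
            db.execute(
                "INSERT OR IGNORE INTO chain (word, next_word, weight) "
                "VALUES (?, ?, 0)",
                (word, following),
            )
            db.execute(
                "UPDATE chain SET weight = weight + 1 "
                "WHERE word = ? AND next_word = ?",
                (word, following),
            )


def next_word(db, word):
    rows = db.execute(
        "SELECT next_word, weight FROM chain WHERE word = ?", (word,)
    ).fetchall()
    if not rows:
        return None  # nothing has ever followed this word
    choices, weights = zip(*rows)
    # Weighted random pick among the words seen after `word`.
    return random.choices(choices, weights=weights)[0]
```

Nothing has to be held in a cache between messages: each train() call
is its own small transaction.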
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
This is a lot simpler from a concurrency perspective. Training values
can get committed to the database immediately, rather than in
long-running flat file batches.
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
If an HTML title was parsed with surrounding whitespace, linkbot would
not strip that whitespace. This fixes that.
Also, there are some new debug log messages in linkbot. Hooray!
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
Previously, the environment variable would take priority over the
command line argument. This is now reversed.
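
The new precedence can be sketched as follows. The --config flag, the
BOT_CONFIG variable, and resolve_config_path are hypothetical names
chosen for the example, not necessarily the ones this project uses:

```python
import argparse
import os


def resolve_config_path(argv, environ=os.environ):
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default=None)
    args = parser.parse_args(argv)
    # The command line argument wins when it is given...
    if args.config is not None:
        return args.config
    # ...and the environment variable is only a fallback.
    return environ.get("BOT_CONFIG")
```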
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
Just like the actual chain data structure, this value is now loaded
lazily, since it's stored in the filesystem.
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
This allows markov to save (hopefully) in parallel using a
ProcessPoolExecutor. Since objects are sent over-the-wire and copied,
pruning in parallel is not an issue.
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
This moves the self.__touch() call around in markov's Chain class such
that it will only access truly available data.
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
* Chain.__touch() is a new function that updates the last time a markov
chain was accessed
* Fix a bug where the last access time of the chain was not reliably
updated during Chain.add()
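
Roughly, the shape of the change is shown below. This is a simplified
stand-in for the real Chain class: the data layout, the last_accessed
attribute, and the clock parameter are assumptions for illustration:

```python
import time


class Chain:
    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable for testing (hypothetical)
        self.data = {}
        self.last_accessed = self._clock()

    def __touch(self):
        # Double underscore: name-mangled to _Chain__touch, so it is only
        # meant to be called from inside the class.
        self.last_accessed = self._clock()

    def add(self, word, following):
        # Every add() goes through __touch(), so the access time is
        # always updated.
        self.__touch()
        followers = self.data.setdefault(word, {})
        followers[following] = followers.get(following, 0) + 1
```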
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
* Log levels can now be set via the command line and the configuration
file.
* ServerConfig.load() function takes a file-like object now, rather than
a string
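
As a sketch of the log level handling, a level name coming from either
the command line or the configuration file can be turned into a logging
constant like this. parse_log_level is a hypothetical helper, not
necessarily how this commit does it:

```python
import logging


def parse_log_level(name):
    # getLevelName() maps a known level name to its number; for an
    # unknown name it returns a string instead, which we reject.
    level = logging.getLevelName(name.strip().upper())
    if not isinstance(level, int):
        raise ValueError(f"unknown log level: {name!r}")
    return level
```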
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
Markov chains used to prune themselves from memory, but that behavior
is now delegated up to the Bot structure instead.
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
* Linkbot parser also looks for <meta> tags and uses an actual HTML
parser.
* Inner title HTML is decoded before being displayed.
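
A simplified sketch of this approach using the standard library's
html.parser. The TitleParser class, the extract_title function, and the
choice of og:title as the <meta> tag to look for are assumptions for
illustration, and linkbot's real parser may check other tags:

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; in text.
        super().__init__(convert_charrefs=True)
        self._in_title = False
        self._title_parts = []
        self.meta_title = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("property") == "og:title":
            self.meta_title = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    @property
    def title(self):
        # Strip surrounding whitespace, falling back to the <meta> title.
        text = "".join(self._title_parts).strip()
        return text or (self.meta_title or "").strip() or None


def extract_title(html_text):
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title
```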
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
There's a long explanation in the code of this commit that says this:
> TL;DR OF THE BELOW: if the first parameter looks like a channel in
> addition to message type, then filter by channel. Otherwise, don't
> filter by channel.
>
> Here's the issue: plugins are *usually* multiplexed by channel. But
> that's only for messages that target channels, such as PRIVMSG and JOIN.
> For non-channel messages, such as server status messages (such as 001 on
> connect, or 372 for MOTD, etc) we want to ignore the channel aspect of
> plugin multiplexing. In order to accomplish this, we just check if the
> first parameter looks like a channel - i.e., starts with an octothorpe #.
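
In code, the rule quoted above comes out to roughly the following
sketch. should_dispatch and its arguments are hypothetical names, and
the real dispatch code is structured differently:

```python
def looks_like_channel(param):
    # Per the explanation above: a channel starts with an octothorpe.
    return param.startswith("#")


def should_dispatch(subscribed_commands, plugin_channels, command, params):
    # Always filter by message type.
    if command not in subscribed_commands:
        return False
    # Only filter by channel when the first parameter looks like one.
    if params and looks_like_channel(params[0]):
        return params[0] in plugin_channels
    # Server messages such as 001 or 372 skip the channel filter.
    return True
```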
Signed-off-by: Alek Ratzloff <alekratz@gmail.com>