This include contains functions used by the spider
returns a new database handle
deprecated - use doSQL instead.
Takes an SQL statement as argument and executes it. If no global database handle is available, one is created. Returns a two-dimensional array with the result set. For an INSERT statement, [0][0] holds the insert ID.
quotes a string for the database and converts Perl 5.8+ Unicode strings back to Latin-1
returns quoted string
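
The calling convention described above (lazily created global handle, rows returned as a two-dimensional array, insert ID in [0][0]) can be sketched roughly as follows. This is an illustration in Python, not the include's Perl code: the name `do_sql`, the `params` argument, and the in-memory SQLite database are all assumptions made for the example.

```python
import sqlite3

_dbh = None  # hypothetical global handle, created on first use


def do_sql(statement, params=()):
    """Execute one statement and return the result as a list of lists."""
    global _dbh
    if _dbh is None:
        # Assumption: an in-memory SQLite database stands in for the real one.
        _dbh = sqlite3.connect(":memory:")
    cur = _dbh.execute(statement, params)
    if statement.lstrip().upper().startswith("INSERT"):
        _dbh.commit()
        # Mirror the documented convention: the insert ID sits in [0][0].
        return [[cur.lastrowid]]
    rows = [list(row) for row in cur.fetchall()]
    _dbh.commit()
    return rows
```

A caller would read the new row's ID with `do_sql("INSERT ...")[0][0]` and iterate over the rows of a SELECT as `result[row][column]`.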
Inserts a keyword for a site into the database
returns keyid
Inserts (updates) a site and its data into the database
returns siteid
Deletes a site from the database
returns deleted siteid or 0 if site wasn't found
splits URL into parts
returns domain,path,page
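
One plausible way to split a URL into domain, path, and page is sketched below in Python. The function name `split_url` is hypothetical, and the handling of query strings (dropped here) and of URLs without a path (treated as "/") are assumptions; the include may behave differently.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (domain, path, page) for a URL."""
    parts = urlsplit(url)
    domain = parts.netloc
    # Everything up to the last "/" is the directory path; the rest is the page.
    # Assumption: query string and fragment are ignored.
    path, _, page = parts.path.rpartition("/")
    return domain, path + "/", page
```

For example, `split_url("http://example.com/docs/index.html")` yields `("example.com", "/docs/", "index.html")`.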
fetches URL and passes the content to the parsers
returns nothing
counts all keywords (counts are multiplied by $multiplier) and stores the number of found keywords in the given hash
returns nothing
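
A minimal sketch of weighted keyword counting, assuming that keywords are whitespace-separated words and that the multiplier weights each occurrence (for instance, to score title words higher than body words). The name `count_keywords` and that reading of the multiplier are assumptions, and Python is used only for illustration.

```python
def count_keywords(text, multiplier, counts):
    """Add multiplier to counts[word] for every word in text; returns nothing."""
    for word in text.split():
        counts[word] = counts.get(word, 0) + multiplier
```

Because the caller's dictionary is updated in place, several calls with different multipliers accumulate into one set of counts.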
parses header (HTTP::Headers) and document
returns nothing
strips tags and newlines from HTML
returns textonly
strips special chars and converts to lowercase
returns textonly
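
The two text-cleaning steps above could look roughly like this. It is a regex-based sketch in Python, not the include's code: the names `strip_tags` and `normalize` are hypothetical, and treating only ASCII letters and digits as non-special is an assumption (the original works on Latin-1 text and may keep accented characters).

```python
import re


def strip_tags(html):
    """Remove tags and newlines from HTML, leaving plain text."""
    text = re.sub(r"<[^>]*>", " ", html)           # replace each tag with a space
    text = text.replace("\r", " ").replace("\n", " ")
    return re.sub(r"\s+", " ", text).strip()       # collapse runs of whitespace


def normalize(text):
    """Strip special characters and convert to lowercase."""
    # Assumption: anything other than ASCII letters, digits, and whitespace is special.
    text = re.sub(r"[^0-9A-Za-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()
```

A regex is enough for a sketch, but it will mishandle things like `>` inside attribute values or script blocks; a real HTML parser is more robust.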
removes all hits for a site
returns nothing
finds all links and puts them into the db
returns nothing
adds a new url.
returns nothing
prints a message if $VERBOSITY is set high enough
returns nothing
deletes everything in database
returns nothing
fetches a URL whose last check is older than $reindexperiod (in hours)
returns true|false
updates the lastcheck field of a URL to make sure the next run doesn't pick that URL again
returns nothing
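
The interplay between the reindex period and the lastcheck field might be sketched as follows. This is an assumption-laden illustration in Python: a dictionary mapping URL to last-check time stands in for the database table, and the names `process_stale_url` and `fetch` are hypothetical.

```python
from datetime import datetime, timedelta


def process_stale_url(last_check, reindex_period_hours, now, fetch):
    """Fetch one URL not checked within the period; return True if one was found."""
    cutoff = now - timedelta(hours=reindex_period_hours)
    for url, checked in last_check.items():
        if checked < cutoff:
            # Update lastcheck first so the next run does not pick this URL again.
            last_check[url] = now
            fetch(url)
            return True
    return False
```

Calling this in a loop until it returns False would work through all URLs that are due, one per call.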
loads stopwords from $file
returns @stopwords
checks if $needle is in @haystack
returns true|false
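
Stopword loading and the membership check could be sketched like this, assuming a stopword file with one word per line in Latin-1 encoding. The file format and the names `load_stopwords` and `in_array` are assumptions for the example; Python is used only for illustration.

```python
def load_stopwords(path):
    """Return the non-empty lines of the file as a list of stopwords."""
    # Assumption: one stopword per line, Latin-1 encoded.
    with open(path, encoding="latin-1") as fh:
        return [line.strip() for line in fh if line.strip()]


def in_array(needle, haystack):
    """Return True if needle is an element of haystack."""
    return needle in haystack
```

For large stopword lists, converting the list to a set once would make each lookup constant-time instead of a linear scan.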
prints statistical data
returns nothing
shows the $limit most common keywords
returns nothing
prints all startURLs
returns nothing
reads a config file
returns hash with config data
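
The config file format is not described here, so the following is only a sketch under a stated assumption: plain `key = value` lines, with blank lines and lines starting with `#` ignored. The name `read_config` is hypothetical and Python is used for illustration.

```python
def read_config(path):
    """Parse a simple key = value file into a dictionary."""
    config = {}
    with open(path, encoding="latin-1") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, sep, value = line.partition("=")
            if sep:  # ignore lines without an "=" sign
                config[key.strip()] = value.strip()
    return config
```

All values come back as strings, so a caller would convert numeric settings such as a reindex period itself.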
removes duplicates and senseless data from the database
returns nothing
Deletes a startURL and all dependent sites and exits.
returns nothing