I was working on trying to optimize inserting data into one of my mysql databases today and came across a method for loading large amounts of data quickly. I was working on parsing through some logs and inserting the important pieces into a mysql database for reporting purposes. The logs, when put in the database amount to about 50 million rows of data, and what I was doing beforehand was a single prepare and execute statement for each line…which wasn’t very efficient. So, I did some research and found that it is much more efficient to send several entries to the database at a time to create less overhead. I found that this combined with turning off indexing beforehand and then turning it back on when I completing the logs increased the speed to at least 1/3rd of the time as before (still need to benchmark it), but quite a noticeable improvement.
Here is some code to show how to do it:
#!/usr/bin/perl
use strict;
use DBI;
my $database = whatever;
my $username = user;
my $password = password;
my $table = table_name;
my $count=0; # Count for database insert
my $max_rows=20000; # Count for bulk insert
my $base_query = qq{INSERT INTO $table VALUES };
my $bulk_query = $base_query;
# Connect to database
my $dbh = DBI->connect("dbi:mysql:database=$database;host=localhost;user=$username;password=$password") or die "Couldn't connect to database: $DBI::errstr\n";
# Turn off Indexes for the insert operation, and then turn it on after run is complete
$sth = $dbh->prepare("ALTER TABLE $table DISABLE KEYS") or die "Couldn't prepare statement: $dbh->errstr\n";
$sth->execute() or die "Couldn't execute query 'sql': $DBI::errstr\n";
while (<FILE>) {
$count++;
my @values = (0,"$name","$ip","$date"); #whatever values you load up from a file, I didn't go to detail
$bulk_query .= "," if ($count>1); # Use the comma after the first entry
$bulk_query .="("
. join(",", map { $dbh->quote($_) } @values )
. ")";
# Insert into database x rows at at time
if ($count > $max_rows) {
print "$count reached\n";
$dbh->do($bulk_query)
or die "You have an error ($DBI::errstr)";
$bulk_query = $base_query;
$count = 0; # Return the count to 0
}
}
&lastdbentry;
# Turn Indexes back on
$sth = $dbh->prepare("ALTER TABLE $table ENABLE KEYS") or die "Couldn't prepare statement: $dbh->errstr\n";
$sth->execute() or die "Couldn't execute query 'sql': $DBI::errstr\n";
# Left over entries
sub lastdbentry {
# Put the remaining entries in database
if ($count != 0) {
print "$count lines left over, inserting into database\n";
$dbh->do($bulk_query)
or die "something wrong ($DBI::errstr)";
}
}