Subscribe to RSS Subscribe to Comments

Programming stuff: Bash, Linux, SQL and Java

mysql-jdbc insert optimisation

Today, at work, we faced insert and delete problems in our mysql database.
usually we have bulks of inserts/updates/deletes from a backend engine, and selects from the website.
the inserts/updates/deletes bulks are usually around 100-500, but at peak times it goes up to 40 000.
the application that does it is a java application and it does it with PreparedStatement and batch.
When we first did this, we thought it would be enough, and no further optimisation could occur.
To my surprise, there is more to it. instead of 5000 "insert into table values (?,?,?,?,?)", this proves to be 5 times faster, in our case:
"insert into table values(?,?,?,?,?),(?,?,?,?,?),(?,?,?,?,?),….5000 times..,(?,?,?,?,?)".
there is some time wasted in building the string, but it is much better this way, as the load on our servers is on the mysql engine, the rest of the applications are low.
you can find the source code of a java class that tests the difference between the 2 approaches, so you can test it at home.
the times are: ~ 1235 ms for the thousand of inserts in batch and 211 ms for the one big insert with 5000 values() pairs.
the stringbuilding of the sql takes ~ 300 ms which is good for our purposes.

1. note: for deleting, always use delete from table where id IN (?,?,?…..?)

2. for the source code file, i did copy from eclipse and paste in vim on the web-server. the code got autoindenting and from 1 comment it commented all the way down. here is the trick that saves you: :set paste before pasting and :set nopaste after.

java GPL - my ideal ./configure line

for my swing applications:
./configure –without-corba –without-rmi –without-imageio –without-management –without-naming
and in java6, –without-derby
also: –enable-opengl and all those desktop and swing optimisations availbale.

but i figure i have to wait before i can do this.

starting work as a sysadmin

I just started my own hosting business, and after buying a server [-dedicated :)], I began configuring my environment.
The hosting company provided me with a 3.1rc3 debian 64 bit version - minimal install - on an AMD dual core 64 bit with 2GB of RAM.
That was BIG, for I was coming from a VPS 200MHz CPU, 200 MB RAM.
So I started to prepare the install phase of all servers I needed.
here is my list:
bind - for nameserver
cyrus-sasl
postfix
postgrey
dovecot
apache-httpd
php
eaccelerator
lighttpd
mysql
java
jetty
openssl
vsftpd
All sources, all compiled as I like - maybe i will tell the arguments passed to the ./configure command line in another post.
Then was the environment.
.bash_history 10000 lines -> HISTSIZE=10000
HISTIGNORE=’\&:fg:bg:ls:pwd:cd ..:cd ~:ls -al:cd:ps -A:mc:ls -l:%1:%2:top:exit:halt*’ , this is in order to save only the important lines
Ctrl-R then would come handy to search for those precious lines.
VISUAL=/usr/bin/vim needed for crontab, and other tools that need an external editor
vim is my editor. VIM rules.

mysql analyze and optimize from bash

You want to optimize your mysql databases on a regular bases [say every night].
this is the script that does it. just put it in the crontab and you will be fine:
You could say you could do it in php, but there are machines dedicated for mysql, so why put php on them when you have bash?
I assure you, in php there are more lines than in this small script.

#!/bin/sh
for table in $(mysql -u MysqlUser -pMysqlPassword -D DBName -e ’show tables’ | awk ‘!/Tables/ {print $1}’); do
mysql -s -s -u MysqlUser -pMysqlPassword -D DBName -e "analyze table $table"
mysql -s -s -u MysqlUser -pMysqlPassword -D DBName -e "optimize table $table"
echo " $table -> optimized"
done

explanations:
awk '!/Tables/ {print $1}' i want the header produced by mysql to not show, so i take Tables_in_DBName out of the output
mysql -s -s -> suppressing some formatting from mysql output

Now, your mysql will run faster and using the indexes you put in place.

eliminating bad emails from a file

You have a file, with an email on each line.
this is the newsletter database for a client.
the emails have been added in this file from a form on the client’s website.
Then he starts to send email to all his users.
20% of the email returned with serious problems. and you have to eliminate those addresses from the file.
So you end up with 2 files one with all emails, and another one with bad ones. One has 1000 lines, the other 200.
You want to automate it, and not to do it by hand. you can’t use diff, as the email addresses are not in the same order.
this script comes to rescue:

#!/bin/sh
declare -a emailsToBeDeleted
declare -a allEmails
emailsToBeDeleted=( `cat $1`)
allEmails=( `cat $2`)
declare -a idxDelete
let idx=0
for toBeDeleted in "${emailsToBeDeleted[@]}"; do
let ctx=0
for email in "${allEmails[@]}"; do
if [ "$email" == "$toBeDeleted" ]
then
idxDelete[$idx]=$ctx
echo "deleting… $ctx $email $toBeDeleted "
idx=$((idx+1))
fi
ctx=$((ctx+1))
done
done
for emailIdx in ${idxDelete[@]}; do
unset allEmails[$emailIdx]
done
for email in ${allEmails[@]}; do
echo $email;
done

$1 represents the argument for the file with all the emails, $2 the file with the email to be deleted.
usage: stripmails.sh allemails.dat bademails.dat > goodemails.dat

Based on FluidityTheme Redesigned by Kaushal Sheth Sponsored by Send Flowers