using vim instead of sed

Posted in Bash on February 2nd, 2007 by gmunteanu

in order to process a text file in any sense, and being used with vim, and not with sed
i use the following for deleting googlebot lines from a web log.

vim -e -c ":d/googlebot/g" -c :wq website.YYYYMMDD.log
-e doesn’t put vim in interactive mode, so you can script it and put it in crontab.

or if you need to pipe it:

cat website.YYYYMMDD.log | vim – -e –c ":d/googlebot/g" -c :wq

maybe i will add more vim issues as comments to this post.

fill a table with random data

Posted in sql on February 2nd, 2007 by gmunteanu

below is a case with only 2 varchar columns and an int
create table example(
id int unsigned auto_increment not null,
column1 varchar(200),
column2 varchar(200),
column3 int unsigned,
primary key(id)
);

mysql -u USERNAME -pPASSWORD DBNAME -e "insert into example (column1,column2,column3) values (char(FLOOR(65 + (RAND() * 20)),FLOOR(65 + (RAND() * 20)),FLOOR(65 + (RAND() * 20)),FLOOR(65 + (RAND() * 20)),FLOOR(65 + (RAND() * 20)),FLOOR(65 + (RAND() * 20))),char(FLOOR(65 + (RAND() * 20)),FLOOR(65 + (RAND() * 20)),FLOOR(65 + (RAND() * 20)),FLOOR(65 + (RAND() * 20)),FLOOR(65 + (RAND() * 20)),FLOOR(65 + (RAND() * 20))),FLOOR(1000 + (RAND() * 1000)))"

it will generate uppercase strings and integers between 1000 and 2000 ,and insert them into the table.
it is important to put unsigned, thus you will increase the range for the primary key. you don’t need any negative values for id anyway.

OR SQL operator in mysql and slow query

Posted in sql on February 2nd, 2007 by gmunteanu

….not anymore, if you use IF.

there are cases where you can get rid of OR and get better performance in mysql.
here is a case a table:
book
id
french_name
engl_name

the books have only one name and only one, the other being null
you have a form on your web page and want to do a search for the books that contain a certain substring of characters. the client wants to retrieve both english and french name that contain that substring.

first shot: select * from book where french_name like ‘%CHARS%’ or engl_name like ‘%CHARS%’

this is very expensive.
the better solution:
select * from book where if(french_name is null,engl_name,french_name) like ‘%CHARS%’

on my system is 2 times faster.

group_concat and NULL in mysql

Posted in Bash on January 9th, 2007 by gmunteanu

Well, today I spent 2 hours on a query that involved group_concat.
we had a table that looked like this:
Columns:
……….
client_name
phone
mobile
email
……….
and in a complex query we needed to have group_concat(client_name),  group_concat(phone), group_concat(mobile), group_concat(email) …
[note: actually we did : cast(group_concat(client_name) as char),  cast(group_concat(phone) as char), cast(group_concat(mobile) as char), cast(group_concat(email)  as char)... ]
the thing is that phone and mobile were missing for most of the clients, and the value was NULL.
now if you have for phone values:
NULL
123456789
NULL
NULL
then group_concat(phone) is 123456789
and i woud get:
 clients                                             phone
john,george,alex,vio                123456789

and i wouldn’t know which client had that phone.
what i needed was somethin like:
 clients                                             phone
john,george,alex,vio                _,123456789,_,_
and i would know that george had phone number 123456789.

the sql for that is:
select …. cast(group_concat(ifnull(client_name,’_')) as char),  cast(group_concat(ifnull(phone,’_')) as char), cast(group_concat(ifnull(mobile,’_')) as char) … from clients_table …

outlook – large email sending several times

Posted in sysadmin on December 12th, 2006 by gmunteanu

some of my clients were having trouble sending large attachements throush my postfix server.
It went like this: when the file attached was bigger than 1 MB, the receiver got the mail around 8 times.
First i thought it is the person who sent it, being tired of waiting [they were on a slow connection to the internet] it clicked the send button several times.
it wasn’t the case.
then i thought , maybe there is a microsoft thing that didn’t comply with the standards. No it wasn’t.
i disabled their anti-virus software, check my postfix configuration, analysed the logs. nothing.
then, i saw a setting in the outlook called "server timeout". Bingo. that did it.
it was set to 1 minute and a half, and i increased it to 8 minutes.
no duplicate sent emails any more.

another client satisfied. Next! :)

mysql-jdbc insert optimisation

Posted in java on November 21st, 2006 by gmunteanu

Today, at work, we faced insert and delete problems in our mysql database.
usually we have bulks of inserts/updates/deletes from a backend engine, and selects from the website.
the inserts/updates/deletes bulks are usually around 100-500, but at peak times it goes up to 40 000.
the application that does it is a java application and it does it with PreparedStatement and batch.
When we first did this, we thought it would be enough, and no further optimisation could occur.
To my surprise, there is more to it. instead of 5000 "insert into table values (?,?,?,?,?)", this proves to be 5 times faster, in our case:
"insert into table values(?,?,?,?,?),(?,?,?,?,?),(?,?,?,?,?),….5000 times..,(?,?,?,?,?)".
there is some time wasted in building the string, but it is much better this way, as the load on our servers is on the mysql engine, the rest of the applications are low.
you can find the source code of a java class that tests the difference between the 2 approaches, so you can test it at home.
the times are: ~ 1235 ms for the thousand of inserts in batch and 211 ms for the one big insert with 5000 values() pairs.
the stringbuilding of the sql takes ~ 300 ms which is good for our purposes.

1. note: for deleting, always use delete from table where id IN (?,?,?…..?)

2. for the source code file, i did copy from eclipse and paste in vim on the web-server. the code got autoindenting and from 1 comment it commented all the way down. here is the trick that saves you: :set paste before pasting and :set nopaste after.

java GPL – my ideal ./configure line

Posted in java on November 20th, 2006 by gmunteanu

for my swing applications:
./configure –without-corba –without-rmi –without-imageio –without-management –without-naming
and in java6, –without-derby
also: –enable-opengl and all those desktop and swing optimisations availbale.

but i figure i have to wait before i can do this.

starting work as a sysadmin

Posted in sysadmin on November 16th, 2006 by gmunteanu

I just started my own hosting business, and after buying a server [-dedicated :)], I began configuring my environment.
The hosting company provided me with a 3.1rc3 debian 64 bit version – minimal install – on an AMD dual core 64 bit with 2GB of RAM.
That was BIG, for I was coming from a VPS 200MHz CPU, 200 MB RAM.
So I started to prepare the install phase of all servers I needed.
here is my list:
bind – for nameserver
cyrus-sasl
postfix
postgrey
dovecot
apache-httpd
php
eaccelerator
lighttpd
mysql
java
jetty
openssl
vsftpd
All sources, all compiled as I like – maybe i will tell the arguments passed to the ./configure command line in another post.
Then was the environment.
.bash_history 10000 lines -> HISTSIZE=10000
HISTIGNORE=’\&:fg:bg:ls:pwd:cd ..:cd ~:ls -al:cd:ps -A:mc:ls -l:%1:%2:top:exit:halt*’ , this is in order to save only the important lines
Ctrl-R then would come handy to search for those precious lines.
VISUAL=/usr/bin/vim needed for crontab, and other tools that need an external editor
vim is my editor. VIM rules.

mysql analyze and optimize from bash

Posted in Bash on November 14th, 2006 by gmunteanu

You want to optimize your mysql databases on a regular bases [say every night].
this is the script that does it. just put it in the crontab and you will be fine:
You could say you could do it in php, but there are machines dedicated for mysql, so why put php on them when you have bash?
I assure you, in php there are more lines than in this small script.

#!/bin/sh
for table in $(mysql -u MysqlUser -pMysqlPassword -D DBName -e 'show tables' | awk '!/Tables/ {print $1}'); do
mysql -s -s -u MysqlUser -pMysqlPassword -D DBName -e "analyze table $table"
mysql -s -s -u MysqlUser -pMysqlPassword -D DBName -e "optimize table $table"
echo " $table -> optimized"
done

explanations:
awk '!/Tables/ {print $1}' i want the header produced by mysql to not show, so i take Tables_in_DBName out of the output
mysql -s -s -> suppressing some formatting from mysql output

Now, your mysql will run faster and using the indexes you put in place.

eliminating bad emails from a file

Posted in Bash on November 13th, 2006 by gmunteanu

You have a file, with an email on each line.
this is the newsletter database for a client.
the emails have been added in this file from a form on the client’s website.
Then he starts to send email to all his users.
20% of the email returned with serious problems. and you have to eliminate those addresses from the file.
So you end up with 2 files one with all emails, and another one with bad ones. One has 1000 lines, the other 200.
You want to automate it, and not to do it by hand. you can’t use diff, as the email addresses are not in the same order.
this script comes to rescue:

#!/bin/sh
declare -a emailsToBeDeleted
declare -a allEmails
emailsToBeDeleted=( `cat $1`)
allEmails=( `cat $2`)
declare -a idxDelete
let idx=0
for toBeDeleted in "${emailsToBeDeleted[@]}"; do
let ctx=0
for email in "${allEmails[@]}"; do
if [ "$email" == "$toBeDeleted" ]
then
idxDelete[$idx]=$ctx
echo "deleting... $ctx $email $toBeDeleted "
idx=$((idx+1))
fi
ctx=$((ctx+1))
done
done
for emailIdx in ${idxDelete[@]}; do
unset allEmails[$emailIdx]
done
for email in ${allEmails[@]}; do
echo $email;
done

$1 represents the argument for the file with all the emails, $2 the file with the email to be deleted.
usage: stripmails.sh allemails.dat bademails.dat > goodemails.dat