New Technologies


Saturday, March 28, 2009

Speed Up Your Javascript Load Time

Javascript is becoming increasingly popular on websites, from loading dynamic data via AJAX to adding special
effects to your page.

Unfortunately, these features come at a price: you must often rely on heavy Javascript libraries that can add dozens
or even hundreds of kilobytes to your page.

Users hate waiting, so here are a few techniques you can use to trim down your sites.

Find The Flab

Like any optimization technique, it helps to measure and figure out which parts are taking the longest. You might find
that your images and HTML outweigh your scripts. Here are a few ways to investigate:

1. The Firefox Web Developer toolbar lets you see a breakdown of file sizes for a page (Right Click > Web
Developer > Information > View Document Size). Look at the breakdown to see which files are eating the majority of
your bandwidth:


2. The Firebug Plugin also shows a breakdown of files - just go to the "Net" tab. You can also filter by file type:


3. OctaGate SiteTimer gives a clean, online chart of how long each file takes to download:


Disgusted by the bloat? Decided your javascript needs to go? Let's do it.

Compress Your Javascript

First, you can try to make the javascript file smaller itself. There are lots of utilities to "crunch" your files by
removing whitespace and comments.

You can do this, but these tools can be finicky and may make unwanted changes if your code isn't formatted
properly. Here's what you can do:

1. Run JSLint (online or downloadable version) to analyze your code and make sure it is well-formatted.

2. Use Rhino to compress your javascript. There are some online packers, but Rhino actually analyzes your source
code so it has a low chance of changing it as it compresses, and it is scriptable.

Install Rhino (it requires Java), then run it from the command-line:

java -jar custom_rhino.jar -c myfile.js > myfile.js.packed 2>&1


This compresses myfile.js and spits it out into myfile.js.packed. Rhino will remove spaces and comments, and shorten
variable names where appropriate. The "2>&1" part means "redirect standard error to the same location as the
output", so you'll see any error messages inside the packed file itself (cool, eh?).

Using Rhino, I pack the original javascript and deploy the packed version to my website.

Debugging Compressed Javascript

Debugging compressed Javascript can be really difficult. I suggest creating a "debug" version of your page that
references the original files. Once you test it and get the page working, pack it, test the packed version, and then
deploy.

If you have a unit testing framework like JsUnit, it shouldn't be hard to test the packed version.

Eliminating Tedium

Because typing these commands over and over can be tedious, you'll probably want to create a script to run the
packing commands. This .bat file will compress every .js file and create .js.packed:


for /F %%F in ('dir /b *.js') do java -jar custom_rhino.jar -c %%F > %%F.packed 2>&1


Of course, you can use a better language like perl or bash to make this suit your needs.

Optimize Javascript Placement

Place your javascript at the end of your HTML file if possible. Notice how Google Analytics and other stat
tracking software wants to be right before the closing </body> tag.

This allows the majority of page content (like images, tables, text) to be loaded and rendered first. The user sees
content loading, so the page looks responsive. At this point, the heavy javascripts can begin loading near the end.

I used to have all my javascript crammed into the <head> section, but this was unnecessary. Only core files that
are absolutely needed in the beginning of the page load should be there. The rest, like cool menu effects,
transitions, etc. can be loaded later. You want the page to appear responsive (i.e., something is loading) up front.

Load Javascript On-Demand

An AJAX pattern is to load javascript dynamically, or when the user runs a feature that requires your script.
You can load an arbitrary javascript file from any domain using the following import function:

function $import(src){
  var scriptElem = document.createElement('script');
  scriptElem.setAttribute('src', src);
  scriptElem.setAttribute('type', 'text/javascript');
  document.getElementsByTagName('head')[0].appendChild(scriptElem);
}

// import with a random query parameter to avoid caching
function $importNoCache(src){
  var ms = new Date().getTime().toString();
  var seed = "?" + ms;
  $import(src + seed);
}

The function $import('myfile.js') will add a script element to the head of your document, just like
including the file directly. The $importNoCache version adds a timestamp to the request to force your browser to
get a new copy.
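One detail to watch: calling $import twice for the same file adds two script tags. A small guard avoids that (this wrapper is my own illustration, not from the original article; the import function is injected so the guard is easy to test):

```javascript
// Track which URLs have been imported so each script is added only once.
var loadedScripts = {};

function importOnce(src, importFn) {
  if (loadedScripts[src]) return false;   // already requested, skip
  loadedScripts[src] = true;
  importFn(src);                          // e.g. the $import function above
  return true;
}
```

In the browser you would call it as importOnce('/js/menu.js', $import).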

To test whether a file has fully loaded, you can do something like this:

if (typeof myfunction != 'undefined'){
  // loaded
}
else {
  // not loaded yet
}

There is an AJAX version as well but I prefer this one because it is simpler and works for files in any domain.

Delay Your Javascript

Rather than loading your javascript on-demand (which can cause a gap), load your script in the background,
after a delay. Use something like

var delay = 5;

setTimeout("loadExtraFiles();", delay * 1000);


This will call loadExtraFiles() after 5 seconds, which should load the files you need (using $import). You can even
have a function at the end of these imported files that does whatever initialization is needed (or calls an existing
function to do the initialization).

The benefit of this is that you still get a fast initial page load, and users don't have a pause when they want to use
advanced features.

In the case of InstaCalc, there are heavy charting libraries that aren't used that often. I'm currently testing a method
to delay chart loading by a few seconds while the core functionality remains available from the beginning. You may
need to refactor your code to deal with delayed loading of components. Some ideas are to use setTimeout to poll
the loading status periodically, or to have a function called at the end of your included script that tells the main
program the script has been loaded.
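The polling idea can be sketched like this (whenLoaded and initCharts are illustrative names, not InstaCalc's actual code; in a browser the scope argument would be window):

```javascript
// Poll until scope[name] exists as a function, then run onReady.
function whenLoaded(scope, name, onReady, intervalMs) {
  if (typeof scope[name] === 'function') {  // already loaded: run immediately
    onReady();
    return;
  }
  var timer = setInterval(function () {
    if (typeof scope[name] === 'function') {
      clearInterval(timer);
      onReady();
    }
  }, intervalMs || 250);
}

// In a browser: whenLoaded(window, 'initCharts', function () { initCharts(); });
```
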

Cache Your Files

Another approach is to explicitly set the browser's cache expiration. In order to do this, you'll need access to PHP
so you can send back certain headers.

Rename myfile.js to myfile.js.php and add the following lines to the top:


               <?php
               header("Content-Type: text/javascript; charset=UTF-8");
               header("Cache-Control: must-revalidate");
               $offset = 60 * 60 * 24 * 3;
               $ExpStr = "Expires: " .
                         gmdate("D, d M Y H:i:s", time() + $offset) . " GMT";
               header($ExpStr);
               ?>

In this case, the cache will expire in (60 * 60 * 24 * 3) seconds or 3 days. Be careful with using this for your own
files, especially if they are under development. I'd suggest caching library files that you won't change often.
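For reference, the Expires value the PHP builds is just "now plus the offset" formatted as a GMT date. The same computation, sketched in JavaScript:

```javascript
// Build an Expires header value offsetSeconds in the future (GMT format).
function expiresHeader(offsetSeconds) {
  var d = new Date(Date.now() + offsetSeconds * 1000);
  return 'Expires: ' + d.toUTCString();
}

var threeDays = expiresHeader(60 * 60 * 24 * 3);
```
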

If you accidentally cache something for too long, you can use the $importNoCache trick to add a datestamp like
"myfile.js?123456" to your request (which is ignored). Because the filename is different, the browser will request a
new version.

Setting the browser cache doesn't speed up the initial download, but can help if your site references the same files
on multiple pages, or for repeat visitors.

Combine Your Files

A great method I initially forgot is merging several javascript files into one. Your browser can only have so many
connections to a website open at a time — given the overhead to set up each connection, it makes sense to
combine several small scripts into a larger one.

But you don't have to combine files manually! Use a script to merge the files — check out part 2 for an example
script to do this. Giant files are difficult to edit - it's nice to break your library into smaller components that can be
combined later, just like you break up a C program into smaller modules.

Should I Gzip It?

Probably not. Although some browsers can accept compressed javascript (myfile.js.gz) or files returned with the
"gzip" encoding header, this behavior is not consistent between browsers and can be problematic.

If you're an expert, feel free to experiment, but for the majority of us I don't think it's worth the effort or the
potential problems.
All done? Keep learning.

Once you've performed the techniques above, recheck your page size using the tools above to see the
before-and-after difference.

I'm not an expert on these methods — I'm learning as I go. Here are some additional references to dive in deeper:


JavaScript Guides: Beginner, Advanced
JavaScript Tutorials: Beginner, Advanced


Platform as a Service -- Project Caroline (Sun Microsystems)

Project Caroline Overview

Project Caroline is developing a horizontally scalable platform for the development and deployment of Internet services. The initial design center is to make the platform available as a utility, where a pool of virtualized resources is shared between many customers, each with one or more services. Through the utility model, services can dynamically flex their usage of the platform's distributed resources up and down, matching usage to observed load. The horizontal scalability of the platform allows for the efficient delivery of resources and supports a pay-for-use (versus pay-for-capacity) billing model. Customers of the utility are isolated from each other and mechanisms are provided for the isolation of services.

The primary resource provided by Project Caroline is a set of horizontally scaled machines for running customer processes. Customers specify for each process the program to run, what networked file systems it should have access to, and the IP addresses it should be reachable at. The platform takes care of the details of finding a machine for the process to run on, and configuring the machine, network, and Internet connectivity. Operating system-level virtualization is used to isolate processes sharing the same physical machine while keeping per-process overhead low. Customer programs are expressed in languages like Java bytecode, Perl, and Python that provide OS and instruction set independence. Other resources include IP subnets, network file systems, databases, external IP addresses, L4 and L7 load balancers, and DNS bindings. Applications can allocate, configure, and release these resources using the platform API. Through the platform API, applications can acquire and release resources in seconds.

In addition to the platform API, various tools and components have been layered on top of the platform API, including: cash, a Ruby-derived interactive shell; a standalone GUI and a NetBeans plug-in for direct manipulation of Project Caroline resources; a set of (J)Ruby centric tools for using Project Caroline; Apache Ant tasks for easily automating the creation and management of Project Caroline resources; and macro-components such as the Project Caroline Web Server (based on Project GlassFish v3) which automates setup and management of a horizontally-scaled web tier.


Oracle-like ROWNUM in MySQL


Sometimes you need to mimic Oracle's ROWNUM exactly, when there is no way to initialize a counter in a previous statement with SET @rownum:=0;.

It is still possible in a single SQL.

SELECT @rownum:=@rownum+1 rownum, t.*
FROM (SELECT @rownum:=0) r, mytable t;


Tuesday, March 24, 2009

Hibernate LazyInitializer error in tomcat



caused by:
org.apache.jasper.JasperException: java.lang.Object.getHibernateLazyInitializer()Lorg/hibernate/proxy/LazyInitializer;
caused by:
javax.servlet.ServletException: java.lang.Object.getHibernateLazyInitializer()Lorg/hibernate/proxy/LazyInitializer;
caused by:
java.lang.NoSuchMethodError: java.lang.Object.getHibernateLazyInitializer()Lorg/hibernate/proxy/LazyInitializer;

After using the web client for a few minutes.



This exception is usually due to the JVM not being run with the -server option. The startup.bat/sh scripts set up the appropriate JAVA_OPTS.


Thursday, March 12, 2009

The Art of Estimating a Project Plan (Agile Model)

Estimation of features is a critical part of the software development process, especially if you are using an Agile methodology. The ability to accurately estimate features is an often overlooked skill that every developer should have.

As a project engineer, I am responsible for making sure that we set realistic project timelines and that we actually carry them out. Managing expectations is one of my main responsibilities. This is where Agile is so nice, as long as you do it right.

When we start on a new project, usually one of the first things that happens is meeting with the client or whoever is sponsoring the project.

In these initial meetings, at some point we will probably be asked one of two things:

1) how long will it take to accomplish X?

2) how much can we get done before X (some date)?


Create a Virtual Directory in IIS 7.0


A virtual directory is a directory name, used in an address, which corresponds to a physical directory on the server. You can add a virtual directory that will include directory content in a site or an application without having to move the content physically into that site or application directory.

To add a virtual directory

You can perform this procedure by using the user interface (UI), by running Appcmd.exe commands in a command-line window, by editing configuration files directly, or by writing WMI scripts.
User Interface
To use the UI


1. Open IIS Manager. For information about opening IIS Manager, see IIS 7.0: Open IIS Manager.

2. In the Connections pane, expand the Sites node in the tree, and then click to select the site in which you want to create a virtual directory.

3. In the Actions pane, click View Virtual Directories.

4. On the Virtual Directories page, in the Actions pane, click Add Virtual Directory.

5. In the Add Virtual Directory dialog box, type a name in the Alias box. This alias is used to access the content from a URL.

6. In the Physical path box, type the physical path of the content folder, or click Browse to navigate through the file system to find the folder.

7. Optionally, click Connect as to specify credentials that have permission to access the physical path. If you do not use specific credentials, select the Application user (pass-thru authentication) option in the Connect As dialog box.

8. Optionally, click Test Settings to verify the settings that you specified for the virtual directory.

9. Click OK.

Command Line

To add a virtual directory to the root application in a site, use the following syntax:

appcmd add vdir /app.name:string /path:string /physicalPath:string

The variable app.name string is the site name, and the / following the name specifies that the virtual directory should be added to the root application of the site. The variable path string is the virtual path of the virtual directory, such as /photos, and physicalPath string is the physical path of the virtual directory's content in the file system.

For example, to add a virtual directory named photos with a physical location of c:\images to the root application in a site named contoso, type the following at the command prompt, and then press ENTER:

appcmd add vdir /app.name:contoso/ /path:/photos /physicalPath:c:\images

To add a virtual directory to an application other than a site's root application, type the name of the application after the trailing / in the app.name value. For example, to add the same virtual directory used in the previous example to an application named marketing in a site named contoso, type the following at the command prompt, and then press ENTER:

appcmd add vdir /app.name:contoso/marketing /path:/photos /physicalPath:c:\images


Wednesday, March 11, 2009

Setup and configuration for New Blogger Tag Cloud / Label Cloud


Comments and problems can be reported at the home post for this: Code for New Blogger Tag Cloud / Label Cloud.

Here is the code and setup information to use the Label Cloud in New Blogger.
First you obviously have to have a blog on New Blogger, and you MUST be using the layouts templates
(this isn't available for classic templates or FTP published blogs), and you must have some posts labeled already.
(There needs to be at least ONE label with more than ONE entry or the scripts hit a bug - so have at least one label
with more than one entry before starting.)

Make sure you backup your template before making any changes!

Log into Blogger and go to your layouts section. On the 'Page Elements' setup page
make sure you have a label widget already installed where you want it (it can be moved around later).
Then go to the Edit HTML settings and leave the widgets NOT expanded. It will make
things easier to deal with.

Now the code comes in 3 parts. A section for the stylesheet, a configurations section,
and then the actual widget itself.

The first part to put in is the stylesheet section. The following code needs to be copied
and inserted into your stylesheet, which in the layouts is marked out by the <b:skin> tags.
Easiest thing to do is find the closing skin tag

]]></b:skin>

and place the code right BEFORE that.
Here it is, copy and paste without modification right now. I'll explain what can be tweaked later.

/* Label Cloud Styles
----------------------------------------------- */
#labelCloud {text-align:center;font-family:arial,sans-serif;}
#labelCloud .label-cloud li{display:inline;background-image:none !important;padding:0 5px;margin:0;vertical-align:baseline !important;border:0 !important;}
#labelCloud ul{list-style-type:none;margin:0 auto;padding:0;}
#labelCloud a img{border:0;display:inline;margin:0 0 0 3px;padding:0}
#labelCloud a{text-decoration:none}
#labelCloud a:hover{text-decoration:underline}
#labelCloud li a{}
#labelCloud .label-cloud {}
#labelCloud .label-count {padding-left:0.2em;font-size:9px;color:#000}
#labelCloud .label-cloud li:before{content:"" !important}

This next section is the configuration section for the Cloud. It also goes in the head
of the template, but outside of the stylesheet part. Easiest thing to do again is to find
the closing stylesheet tag

]]></b:skin>

But this time place the code right AFTER that line, and BEFORE the </head> tag. Here it is.

<script type='text/javascript'>
// Label Cloud User Variables
var cloudMin = 1;
var maxFontSize = 20;
var maxColor = [0,0,255];
var minFontSize = 10;
var minColor = [0,0,0];
var lcShowCount = false;
</script>

All of these settings can be changed but I'll explain them in a moment. The defaults will work for now.

Now the widget itself. Scroll down and find the label widget in your sidebar. It should look
something like this.

<b:widget id='Label1' locked='false' title='Labels' type='Label'/>

Copy the following code (from beginning widget tag to ending widget tag) and replace
the line above with it.

<b:widget id='Label1' locked='false' title='Label Cloud' type='Label'>
<b:includable id='main'>
  <b:if cond='data:title'>
    <h2><data:title/></h2>
  </b:if>
  <div class='widget-content'>
  <div id='labelCloud'/>
<script type='text/javascript'>

// Don't change anything past this point -----------------
// Cloud function s() ripped from del.icio.us
function s(a,b,i,x){
    if(a &gt; b){
        var m=(a-b)/Math.log(x),v=a-Math.floor(Math.log(i)*m)
    }
    else{
        var m=(b-a)/Math.log(x),v=Math.floor(Math.log(i)*m+a)
    }
    return v
}

var c=[];
var labelCount = new Array();
var ts = new Object;
<b:loop values='data:labels' var='label'>
var theName = &quot;<data:label.name/>&quot;;
ts[theName] = <data:label.count/>;
</b:loop>

for (t in ts){
    if (!labelCount[ts[t]]){
        labelCount[ts[t]] = new Array(ts[t])
    }
}
var ta=cloudMin-1;
tz = labelCount.length - cloudMin;
lc2 = document.getElementById('labelCloud');
ul = document.createElement('ul');
ul.className = 'label-cloud';
for(var t in ts){
    if(ts[t] &lt; cloudMin){
        continue;
    }
    for (var i=0;3 &gt; i;i++) {
        c[i]=minColor[i]+Math.round((maxColor[i]-minColor[i])*(ts[t]-ta)/tz);
    }
    var fs = s(minFontSize,maxFontSize,ts[t]-ta,tz);
    li = document.createElement('li');
    li.style.fontSize = fs+'px';
    li.style.lineHeight = '1';
    a = document.createElement('a');
    a.title = ts[t]+' Posts in '+t;
    a.style.color = 'rgb('+c[0]+','+c[1]+','+c[2]+')';
    a.href = '/search/label/'+encodeURIComponent(t);
    a.appendChild(document.createTextNode(t));
    li.appendChild(a);
    if (lcShowCount){
        span = document.createElement('span');
        span.innerHTML = '('+ts[t]+') ';
        span.className = 'label-count';
        li.appendChild(span);
    }
    abnk = document.createTextNode(' ');
    ul.appendChild(li);
    ul.appendChild(abnk);
}
lc2.appendChild(ul);
</script>
<noscript>
  <ul>
    <b:loop values='data:labels' var='label'>
      <li>
        <b:if cond='data:blog.url == data:label.url'>
          <data:label.name/>
        <b:else/>
          <a expr:href='data:label.url'><data:label.name/></a>
        </b:if>
      </li>
    </b:loop>
  </ul>
</noscript>
<b:include name='quickedit'/>
</div>
</b:includable>
</b:widget>

Now if all has gone well, and you have posts already labeled, then if you preview the blog you should see
some form of the Cloud appearing. If it doesn't appear, then something went wrong. You should probably
back out and try it again from the start.

Update: I've found 2 things to check for first if the label cloud isn't showing. First, make sure that at least one of your labels has more than one entry; a bug in the script causes it to fail when all the labels have only one entry. (As soon as any label has more than one entry, it should be OK from then on.) Also, make sure that none of your labels contain quote marks ("). Apostrophes or single ticks (') are OK.

Most likely the cloud with its default settings won't be what you ultimately want. But all
the colors and sizes are configurable to match your tastes. If the cloud is appearing in preview
then you can go about changing some of the variables so they suit.

The settings in the Variables section will be where you make most of your adjustments. Here I'll
explain what each setting does.

var cloudMin= 1;

This setting you can use to limit the number of labels shown (for example if you have a lot of labels). Leave the setting at 1 to show ALL labels.  If you enter in a higher number, then only labels that have at least that number of entries will appear in the cloud.

var maxFontSize = 20;
var maxColor = [0,0,255];
var minFontSize = 10;
var minColor = [0,0,0];
var lcShowCount = false;

The lines for maxFontSize and maxColor do what you may think they do. The first one sets the size (in pixels) of the
label with the most entries. The maxColor sets the color of that entry (in RGB format). Similar with the next two,
minFontSize and minColor. These are for the label with the fewest entries. Again the size is in pixels,
the color is in RGB format. Any labels between the two will get their color/sizes based on
how many labels they are, and where their entry count falls, giving the much desired cloud effect.

From my experimenting, there are many factors that make up a pleasant looking cloud. From
color/size choice, to the number of actual labels, to how well dispersed the entries are among
the labels. 3 labels don't make a good cloud as there isn't much to work with. You just have
to experiment around to see what looks good with your setup.
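To see the weighting at work, here's a simplified version of the size calculation (my own sketch, not the widget's exact s() function): font size grows with the logarithm of the post count, so one runaway label doesn't dwarf everything else.

```javascript
// Map a label's post count to a font size between minFont and maxFont,
// scaling logarithmically (simplified sketch of the cloud weighting).
function fontSizeFor(count, minCount, maxCount, minFont, maxFont) {
  if (maxCount === minCount) return maxFont;
  var m = (maxFont - minFont) / Math.log(maxCount / minCount);
  return Math.round(minFont + m * Math.log(count / minCount));
}

fontSizeFor(1, 1, 50, 10, 20);   // least-used label: 10px
fontSizeFor(50, 1, 50, 10, 20);  // most-used label: 20px
```
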

IMPORTANT: when changing the color settings, keep them in the format supplied: in between the [ ] with
the numbers separated by commas. The default colors are BLUE for the max and BLACK for the min.
You can select any valid RGB color combination. If you don't know what RGB colors are, don't
worry. It's just a way of defining a color. You can use many charts on the Internet to
get the correct RGB value for the color you want to try. Here's one that is fairly good.

RGB Color Code Chart

Remember, if you change the 3 sets of numbers, enter them in correctly: inside the [ ], separated by commas.

Also experiment with different font sizes. Again it depends on how many entries, how dispersed
they are, and how much room for the cloud is available as to what looks good.

The last variable there is

var lcShowCount = false;
This can either be false (default) or true. All this does is turn off/on the post count displayed
next to the label. Usually in a 'traditional' cloud the count isn't used. But if you go to a
'flat' listing then it's sometimes useful to turn it on.

Now to the CSS section. Most people won't need to tweak these much, and it's not necessary to
understand what all those entries are for. Most are just to make sure that other styling
elements from the rest of your page don't inherit in and ruin the cloud. But there are a few
that you may want to change to suit.
The first line

    #labelCloud {text-align:center;font-family:arial,sans-serif;}

You could change the fonts used in the cloud here if you wanted.
Also, the text-align statement can also be changed. I have it set to center by default but you
could use left or right if those suit better.

The next line

     #labelCloud  .label-cloud li{display:inline;background-image:none !important;padding:0 5px;margin:0;vertical-align:baseline !important;border:0 !important;}

Well don't worry about most of it unless you are a hardcore CSS'er. The only one of real
importance is the first entry,

display:inline;

You can change that to

display:block;

To get the 'Flat' (each entry on its own separate line) listing of the weighted entries.
Usually if that is set to block you would probably want to change the sort frequency from
alphabetical to frequency. You do that by editing the widget from the Page Elements tab in Blogger.

And the last bit I'll mention is the line

#labelCloud .label-count

If you set the lcShowCount variable to true to show the post counts, you could change the
color/size of those numbered entries with that line.


Stored Procedures are EVIL



By Tony Marston

3rd September 2006

A lot of developers are taught to use database stored procedures, triggers and database constraints at every possible opportunity, and they cannot understand why an old dinosaur like me should choose to take an opposite view. The reason can be summed up quite simply:

You only know what you have been taught, whereas I know what I have learned.

I was weaned on file systems and databases which did not have any facilities for stored procedures and triggers, so I learned how to build applications without them. When such facilities became available my colleagues and I still never used them for practical reasons:

  • It meant learning a new language, and we didn't have the time.
  • It meant taking longer to implement and maintain, therefore cost more to develop. This is an important consideration for a software house which can only win business by providing cost-effective solutions.
  • There was no advantage in doing so, so why bother?

Our golden rule was:

Use stored procedures and triggers only when it is an absolute necessity.

This is in total conflict with the attitude of today's wet-behind-the-ears tenderfoot greenhorn who seems to think:

Use stored procedures and triggers at every possible opportunity simply because you can.

Amongst the arguments in favour of stored procedures are:

Stored procedures are not as brittle as dynamic SQL

Some people argue that putting ad-hoc SQL in your business layer (BL) code is not that good. Agreed, but who said that the only alternative is stored procedures? Why not have a DAL that generates the SQL query at runtime based on information passed to it by the BL? It is correct to say that small changes to the database can have severe impacts on the application. However, changes to a relational model will always have an impact on the application that targets that model: add a non-nullable column to a table and you will see what I mean. Whether you use stored procedures or ad-hoc queries, you have to change the calling code to make sure that column gets a value when a new row is inserted. For ad-hoc queries, you change the query, and you're set. For stored procedures, you have to change the signature of the stored procedure, since the INSERT/UPDATE procs have to receive a value for the new column. This can break other code targeting the stored procedure as well, which is a severe maintenance issue. A component which generates the SQL on the fly at runtime doesn't suffer from this: it will, for example, receive an entity which has to be saved to the database; that entity contains the new field, the SQL is generated, and the entity is saved. No maintenance problems. With a stored procedure this wouldn't be possible.
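The "generate the SQL at runtime" idea can be as simple as building the statement from whatever fields the entity carries, so a new column never changes any procedure signature. A minimal sketch (table and field names are made up for illustration):

```javascript
// Build a parameterized INSERT from an entity's own fields.
function buildInsert(table, entity) {
  var cols = Object.keys(entity);
  var marks = cols.map(function () { return '?'; }).join(', ');
  return {
    sql: 'INSERT INTO ' + table + ' (' + cols.join(', ') + ') VALUES (' + marks + ')',
    params: cols.map(function (c) { return entity[c]; })
  };
}

var q = buildInsert('orders', { id: 1, total: 9.99 });
// q.sql    → "INSERT INTO orders (id, total) VALUES (?, ?)"
// q.params → [1, 9.99]
```

Add a field to the entity and the generated SQL picks it up automatically; no stored procedure signature to update.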

Stored procedures are more secure

This is a common argument that many people echo without realising that it became defunct when role-based security was made available. A good DBA defines user-roles in the database, and users are added to those roles and rights are defined per role, not per user. This way, it is easy to control which users can insert / update and which users can for example select or delete or have access to views in an easy way.

With a view it is possible to control which data is accessed on a column basis or row basis. This means that if you want user U to select only 2 or so columns from a table, you can give that user access to a view, not the underlying table. The same goes for rows in one or more tables. Create a view which shows those rows, filtering out others. Give access rights to the view, not the table, obviously using user-roles. This way you can limit access to sensitive data without having to compromise your programming model because you have to move to stored procedures.

It is also said that stored procedures are more secure because they prevent SQL injection attacks. This argument is false for the simple reason that it is possible to have a stored procedure which concatenates strings together and therefore opens itself up to SQL injection attacks (generally seen in systems which use procedures and have to offer some sort of general search routine), while the use of parameterized queries removes this vulnerability, as no value can end up being part of the actual query string.
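To make the point concrete, here's what string concatenation does with a hostile value (illustration only; a parameterized query sends the value separately from the SQL text, so the value can never rewrite the statement):

```javascript
// Naive concatenation: the value becomes part of the SQL text itself.
function naiveQuery(name) {
  return "SELECT * FROM users WHERE name = '" + name + "'";
}

var evil = "x' OR '1'='1";
naiveQuery(evil);
// → SELECT * FROM users WHERE name = 'x' OR '1'='1'
// The WHERE clause now matches every row.
```
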

Stored procedures are more efficient

The execution of SQL statements in stored procedures may have been faster than with dynamic SQL in the early days of database systems, but that advantage has all but disappeared in the current versions. In some cases a stored procedure may even be slower than dynamic SQL, so this argument is as dead as a Dodo.

Performance should not be the first question. My belief is that most of the time you should focus on writing maintainable code. Then use a profiler to identify hot spots and then replace only those hot spots with faster but less clear code. The main reason to do this is because in most systems only a very small proportion of the code is actually performance critical, and it's much easier to improve the performance of well factored maintainable code.

While stored procedures may run faster, they take longer to build, test, debug and maintain, therefore this extra speed comes at a price. If the same function can be performed inside application code at an acceptable speed, what is the advantage of spending more money to make it run faster at a more-than-acceptable speed? It is OK to use stored procedures when you absolutely need a performance gain, but until then they're nothing but premature optimization.

The company has paid for them, so why not use them?

A similar argument is that by not using what the company has paid for, you are effectively wasting the company's money. I'm sorry, but using something because it's there is just not good enough. If I can achieve something inside my application with application code, then I must be given a very good reason to move it out of my application and into the database. Believe it or not there are costs involved in moving logic from one place to another, and those costs must be offset by measurable benefits.

Application code or database code - it's still code, isn't it?

No it's not. Application code is built using a programming language whereas SQL is nothing more than a data manipulation language, and is therefore very limited in its scope. There is absolutely nothing that can be done in a stored procedure that cannot also be done in application code, but the converse is not true.

Amongst the arguments against stored procedures are:

It mangles the 3 Tier structure

Instead of having a structure which separates concerns in a tried and trusted way - GUI, business logic and storage - you now have logic intermingling with storage, and logic on multiple tiers within the architecture. This causes potential headaches down the road if that logic has to change.

Stored procedures are a maintenance problem

The reason for this is that stored procedures form an API in themselves, and changing an existing API will break the code that calls it. Adding new functionality or new procedures is the "best" way to extend an existing API, and a set of stored procedures is no different. This means that when a table changes, or when the behaviour of a stored procedure changes and it requires a new parameter, a new stored procedure has to be added. This might sound like a minor problem, but it isn't, especially when your system is already large and has run for some time. Every system runs the risk of becoming a legacy system that has to be maintained for several years, and this takes a lot of time, because communication between the developers who maintain or write the stored procedures and the developers who write the DAL/BL code has to be intense: a new stored procedure will save without error, but it will not be called correctly until the DAL code is altered to match. With dynamic SQL in your BL this is not a problem: you change the code there, create a different filter, or do whatever fits the functionality you need to implement.

Microsoft also believes stored procedures are over: its next-generation business framework MBF is based on ObjectSpaces, which generates SQL on the fly.

Stored procedures take longer to test

Business logic in stored procedures is more work to test than the corresponding logic in the application. Referential integrity will often force you to set up a lot of other data just to be able to insert the data you need for a test (unless you're working in a legacy database without any foreign key constraints). Stored procedures are inherently procedural in nature, which makes isolated tests harder to create and leaves the code prone to duplication. Another consideration, and this matters a great deal in a sizable application, is that any automated test that hits the database is slower than a test that runs entirely inside the application, and slow tests lead to longer feedback cycles.
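When the business rule lives in application code, it can be tested against a throwaway in-memory database with no server, no shared state, and fast feedback. A minimal sketch, with a hypothetical discount rule and `orders` table:

```python
import sqlite3
import unittest

def apply_discount(price, percent):
    # Business rule kept in application code, so it can be tested in isolation.
    return round(price * (100 - percent) / 100, 2)

class OrderTest(unittest.TestCase):
    def setUp(self):
        # A fresh in-memory database per test: no server round-trips,
        # no fixture data left over from a previous run.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)"
        )

    def test_discount_is_applied_before_saving(self):
        total = apply_discount(200.0, 10)
        self.conn.execute("INSERT INTO orders (total) VALUES (?)", (total,))
        saved = self.conn.execute("SELECT total FROM orders").fetchone()[0]
        self.assertEqual(saved, 180.0)

suite = unittest.TestLoader().loadTestsFromTestCase(OrderTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

The same rule in a stored procedure would need a real database instance, its referential-integrity scaffolding, and a network round-trip for every assertion.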

BL in stored procedures does not scale

If all the business logic is held in the database instead of the application then the database becomes the bottleneck. Once the load starts increasing the performance starts dropping. With business logic in the application it is easy to scale up simply by adding another processor or two, but that option is not readily available if all that logic is held in the database.

If you have a system with hundreds of distributed databases it is far more difficult to keep all those stored procedures and triggers synchronized than it is to keep the application code synchronized.

Stored procedures are not customisable

This is a big issue if you want an application where the customer can insert their own business logic, or where different logic is required by different customers. Achieving this with application code is a piece of cake, but with database logic it is a can of worms.

Database triggers are hidden from the application

A big problem with database triggers is that the application does not know that they exist, therefore does not know whether they have run or not. This became a serious issue in one application (not written by me) which I was maintaining. A new DBA who was not aware of the existence of all these triggers did something which deactivated every trigger on the main database. The triggers were still there, they had not been deleted, but they had been turned off so did not fire and do what they were supposed to do. This mistake took several hours to spot and several days to fix.

Version Control

It is easy to control all changes to application code by running it through a proper version control system, but those facilities do not exist for stored procedures and triggers. How much damage could be caused if a stored procedure were to get out of sync with the application code? How easy is it to check that the application is running with the correct versions? How much more difficult would it be if the application you were supporting was running on a remote site with nothing more than a dial-up connection?

This is a reason why some teams avoid stored procedures like the plague - it eliminates an area of potentially disastrous screw-ups.

Vendor lock-in

You may think that this is not a problem if you build and maintain the applications for a single company where a change in database vendor is highly unlikely, but what happens should the company decide that their DBMS is no longer flavour of the month and they want to change to a different DBMS? This may be due to several factors, such as spiraling costs or poor performance, but when it happens you will find that a lot of code will have to be rewritten. Porting the data will be one exercise, but porting the stored procedures and triggers will be something else entirely. Now, if all that logic were held inside the application, how much simpler would it be?

Believe it or not there are people out there who write applications which are database-independent for the simple reason that the applications may be used by many different companies, and those many companies may not all use the same DBMS. Those that do use the same DBMS may not be using the same version, and stored procedures written for one version may not be compatible with another.

As far as I am concerned the use of stored procedures, database triggers and foreign key constraints is OPTIONAL, not MANDATORY, therefore I am free to exercise my option not to use them. That is my choice, and the software that I produce does not suffer in any way, therefore it cannot be defined as the wrong choice.

The web application framework that I have built using PHP does not use stored procedures, database triggers or foreign key constraints, yet it does not suffer from any lack of functionality. This is possible simply because I can do everything I want inside my application where it is instantly accessible and customisable. To those of you who instantly jump to the (wrong) conclusion that this must mean that I have to write a huge amount of duplicated SQL statements my answer is simple - I don't write any SQL statements at all, they are all generated dynamically at runtime. This is all due to the framework being built using the 3 Tier Architecture which has a clear separation of concerns:

  • There is a separate object in the Business Layer for each database table. This is where all business rules are applied as data passes from the Presentation Layer (UI), through the Business Layer to the Data Access Layer, and back again. The Business Layer does not have any direct communication with the database - this is all handled by the Data Access Layer.
  • There is a single object in the Data Access Layer known as the Data Access Object (DAO). The DAO receives a request from the Business Layer and dynamically constructs and executes the SQL query string to satisfy that request. This implementation means that I can easily switch to another DBMS simply by switching to another DAO, and without having to change a single line of code in any Business Layer object.
  • Referential integrity is also handled by standard code within the framework and requires no additional coding from any developer whatsoever. It uses information which is exported from the Data Dictionary which tells it what to do with every relationship, and the standard code in the framework simply performs the relevant processing. The advantage of this approach is that it is easy to amend or even turn off any of these rules at runtime, which makes the application infinitely more flexible.
  • All changes made to the database can be logged without using a single database trigger. How? By adding extra code into the DAO to write all relevant details out to the AUDIT database. This functionality is totally transparent to all the objects in the Business Layer, and they do not need any extra code to make it work.
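The DAO idea described above can be sketched in a few dozen lines. This is not the author's PHP implementation but a hypothetical Python equivalent: SQL is assembled at runtime from table and column names supplied as data (in the real framework these would come from the Data Dictionary, not from user input), values go through placeholders, and the audit trail is written by the DAO itself with no triggers:

```python
import sqlite3

class DAO:
    """Minimal sketch of a Data Access Object that builds SQL at runtime.

    Table and column names arrive as data (assumed to come from a trusted
    data dictionary, never from end-user input); values always go through
    placeholders. The audit log is produced by the DAO, not by triggers.
    """

    def __init__(self, conn):
        self.conn = conn
        self.audit = []  # stand-in for a separate AUDIT database

    def insert(self, table, values):
        cols = ", ".join(values)
        marks = ", ".join("?" for _ in values)
        sql = f"INSERT INTO {table} ({cols}) VALUES ({marks})"
        self.conn.execute(sql, tuple(values.values()))
        # Transparent auditing: callers need no extra code for this.
        self.audit.append(("INSERT", table, dict(values)))

    def select(self, table, where=None):
        sql = f"SELECT * FROM {table}"
        params = ()
        if where:
            sql += " WHERE " + " AND ".join(f"{col} = ?" for col in where)
            params = tuple(where.values())
        return self.conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
dao = DAO(conn)
dao.insert("customer", {"name": "Acme"})
rows = dao.select("customer", {"name": "Acme"})
```

Because every Business Layer object talks to the database only through this one class, switching DBMS means swapping the DAO, not touching the business logic.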


© Tony Marston
3rd September 2006



Tuesday, March 3, 2009

What's New in MySQL 5.1


Partitioning and the Event Scheduler are two big features introduced in this version, and a great deal of SQL optimization has been done.
MySQL 5.1 performs much better in all tests: it can handle more transactions per second and it does not deadlock with 256 threads, unlike 5.0.

The following features have been added to MySQL 5.1.


Partitioning. This capability enables distributing portions of individual tables across a file system, according to rules which can be set when the table is created. In effect, different portions of a table are stored as separate tables in different locations, but from the user point of view, the partitioned table is still a single table. Syntactically, this implements a number of new extensions to the CREATE TABLE, ALTER TABLE, and EXPLAIN … SELECT statements. As of MySQL 5.1.6, queries against partitioned tables can take advantage of partition pruning. In some cases, this can result in query execution that is an order of magnitude faster than the same query against a non-partitioned version of the same table. See Chapter 18, Partitioning, for further information on this functionality. (Author: Mikael Ronström)
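The idea behind partition pruning can be illustrated with a toy sketch; this is not MySQL's implementation, just the concept. Each key range maps to its own bucket (standing in for the separately stored table portions), and a query whose predicate falls inside one range never touches the other buckets:

```python
# Hypothetical range partitions on a "year" column: each bucket stands in
# for one of the separately stored portions of a partitioned table.
partitions = {
    (2000, 2004): [("row_a", 2001)],
    (2005, 2009): [("row_b", 2006), ("row_c", 2008)],
}

def pruned_scan(year):
    """Scan only the partition whose range covers the predicate."""
    hits, rows_scanned = [], 0
    for (lo, hi), rows in partitions.items():
        if lo <= year <= hi:  # pruning: skip non-matching partitions entirely
            for name, row_year in rows:
                rows_scanned += 1
                if row_year == year:
                    hits.append(name)
    return hits, rows_scanned

hits, rows_scanned = pruned_scan(2006)
print(hits, rows_scanned)
```

Here the 2000-2004 bucket is never read at all, which is why a pruned query over a large table can be an order of magnitude faster than a full scan of an unpartitioned one.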

Row-based replication. Replication capabilities in MySQL originally were based on propagation of SQL statements from master to slave. This is called statement-based replication. As of MySQL 5.1.5, another basis for replication is available. This is called row-based replication. Instead of sending SQL statements to the slave, the master writes events to its binary log that indicate how individual table rows are affected. As of MySQL 5.1.8, a third option is available: mixed. This will use statement-based replication by default, and only switch to row-based replication in particular cases. See Section 16.1.2, “Replication Formats”. (Authors: Lars Thalmann, Guilhem Bichot, Mats Kindahl)

Plugin API. MySQL 5.1 adds support for a very flexible plugin API that enables loading and unloading of various components at runtime, without restarting the server. Although the work on this is not finished yet, plugin full-text parsers are a first step in this direction. This allows users to implement their own input filter on the indexed text, enabling full-text search capability on arbitrary data such as PDF files or other document formats. A pre-parser full-text plugin performs the actual parsing and extraction of the text and hands it over to the built-in MySQL full-text search. See Section 22.2, “The MySQL Plugin Interface”. (Author: Sergey Vojtovich)

Event scheduler. MySQL Events are tasks that run according to a schedule. When you create an event, you are creating a named database object containing one or more SQL statements to be executed at one or more regular intervals, beginning and ending at a specific date and time. Conceptually, this is similar to the idea of the Unix crontab (also known as a “cron job”) or the Windows Task Scheduler. See Section 19.4, “Using the Event Scheduler”. (Author: Andrey Hristov)

Server log tables. Before MySQL 5.1, the server wrote general query log and slow query log entries only to log files. As of MySQL 5.1, the server’s logging capabilities for these logs are more flexible. Log entries can be written to log files (as before) or to the general_log and slow_log tables in the mysql database. If logging is enabled, either or both destinations can be selected. The --log-output option controls the destination or destinations of log output. See Section 5.2.1, “Selecting General Query and Slow Query Log Output Destinations”. (Author: Petr Chardin)

Upgrade program. The mysql_upgrade program (available as of MySQL 5.1.7) checks all existing tables for incompatibilities with the current version of MySQL Server and repairs them if necessary. This program should be run for each MySQL upgrade. See Section 4.4.8, “mysql_upgrade — Check Tables for MySQL Upgrade”. (Authors: Alexey Botchkov, Mikael Widenius)

MySQL Cluster. MySQL Cluster is now released as a separate product, based on MySQL 5.1 but with the addition of the NDBCLUSTER storage engine. Clustering support is no longer available in mainline MySQL 5.1 releases. MySQL Cluster releases are identified by a 3-part NDB version number; currently, the MySQL Cluster NDB 6.2 and MySQL Cluster NDB 6.3 release series are available for production use.

Some of the changes in MySQL Cluster since MySQL 5.0 are listed here:

MySQL Cluster replication. Replication between MySQL Clusters is now supported. It is now also possible to replicate between a MySQL Cluster and a non-cluster database. See Section 17.10, “MySQL Cluster Replication”.

MySQL Cluster disk data storage. Formerly, the NDBCLUSTER storage engine was strictly in-memory; now, it is possible to store Cluster data (but not indexes) on disk. This allows MySQL Cluster to scale upward with lower hardware (RAM) requirements than previously. In addition, the Disk Data implementation includes a new “no-steal” restoration algorithm for fast node restarts when storing very large amounts of data (terabyte range). See Section 17.11, “MySQL Cluster Disk Data Tables”, for more information.

Improved backups for MySQL Cluster. A fault arising in a single data node during a Cluster backup no longer causes the entire backup to be aborted, as occurred in previous versions of MySQL Cluster.

Many other new features and improvements have been made to the NDBCLUSTER storage engine in MySQL Cluster NDB 6.2 and MySQL Cluster NDB 6.3; for more information about these, see Section 17.14, “MySQL Cluster Development Roadmap”.

Backup of tablespaces. The mysqldump utility now supports an option for dumping tablespaces. Use -Y or --all-tablespaces to enable this functionality.

Improvements to INFORMATION_SCHEMA. MySQL 5.1 provides much more information in its metadata database than was available in MySQL 5.0. New tables in the INFORMATION_SCHEMA database include FILES, EVENTS, PARTITIONS, PROCESSLIST, ENGINES, and PLUGINS.

XML functions with XPath support. ExtractValue() returns the content of a fragment of XML matching a given XPath expression. UpdateXML() replaces the element selected from a fragment of XML by an XPath expression supplied by the user with a second XML fragment (also user-supplied), and returns the modified XML. See Section 11.10, “XML Functions”. (Author: Alexander Barkov)

Load emulator. The mysqlslap program is designed to emulate client load for a MySQL server and report the timing of each stage. It works as if multiple clients were accessing the server. See Section 4.5.7, “mysqlslap — Load Emulation Client”. (Authors: Patrick Galbraith, Brian Aker)

