Taming Tomcat, Day 2: Development and Administration
Author and presenter: Simon Brooke.
The full text of this presentation is online at <URL:
http://www.weft.co.uk/library/tomcat/>
Written March-April 2006; $Revision: 1.7 $ of $Date:
2006-04-28$
Changes to the presentation since your
handouts were printed are highlighted like this.
Simon Brooke, 21 Main Street, Auchencairn, DG7 1QU,
Scotland.
Programme for Today
- This afternoon
- Before break
- After break
- Development frameworks for Webapps
What I'm trying to achieve today
- By the end of today you should
- Understand what a Webapp is
- And have some introductory idea of how to develop them
- Know how to administer Tomcat
Anatomy of a Webapp
- A collection of resources
- Held in a single WAR file
- An archive file
- technically identical to a JAR file
- but containing a special layout
- Configured by a single configuration file:
web.xml
Anatomy of a Webapp (ii): Layout of resources
- Magic directory WEB-INF
- Content never directly served by Tomcat
- Which means you can stick things in here you don't
want users to be able to access
- Contains main configuration file web.xml
- Name is magic: it must be exactly that
- Contains subdirectories
- lib: any jar files - libraries you may
want to use in your application
- classes: Java class and resource files
you may want to use
- Other directories for servable content
- e.g. plain HTML, CSS, pictures, other media
- JSP files
the Webapp classpath
- WEB-INF/classes, which is searched first
- then all the jar files in WEB-INF/lib
- in alphabetical order
- if replacing e.g. mylib0.1.jar with mylib0.2.jar, don't
leave the old version in there!
the web.xml file
- The principal configuration file of a webapp
- In XML syntax, but...
- The most bizarrely and badly designed DTD I've ever seen
- must add children in the right order
- e.g. all context-param elements before all servlet
elements
- can't mix them
- No attributes at all - not even for things which are
obviously attributes
- Root element: web-app
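A minimal sketch of a complete web.xml, assuming the Servlet 2.3 DTD (which both Tomcat 4 and Tomcat 5 accept). The servlet name, the class com.example.HelloServlet and the parameter are hypothetical; what matters is the order of the children, with context-param before servlet before servlet-mapping.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE web-app PUBLIC
  "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
  "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
  <!-- Children must appear in the order the DTD prescribes -->
  <display-name>Example Webapp</display-name>
  <context-param>
    <param-name>adminEmail</param-name>
    <param-value>webmaster@example.com</param-value>
  </context-param>
  <servlet>
    <servlet-name>hello</servlet-name>
    <servlet-class>com.example.HelloServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>hello</servlet-name>
    <url-pattern>/hello</url-pattern>
  </servlet-mapping>
</web-app>
```

Swapping the servlet and context-param elements makes this invalid against the DTD, even though it means exactly the same thing.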
DTD and Schema
the <web-app> element
- Children, in this order:
- 0 or 1 icon
- 0 or 1 display-name
- 0 or 1 description
- 0 or 1 distributable (potentially
useful)
- 0 or many context-param (useful)
- 0 or many servlet (useful)
- 0 or many servlet-mapping (useful)
- 0 or 1 session-config (you might use
this)
- 0 or many mime-mapping
- 0 or 1 welcome-file-list (useful)
- 0 or many error-page (useful)
- 0 or many taglib
- 0 or many resource-ref
- 0 or many security-constraint
- 0 or 1 login-config
- 0 or many security-role
- 0 or many env-entry
- 0 or many ejb-ref
But wait! There's more!
- Added in version 2.3 (and still present in 2.4):
- 0 or many ejb-local-ref
- 0 or many filter
- 0 or many filter-mapping
TODO: and more
The icon element
- Intended to provide an icon for the webapp in GUI based
managers
- Don't know if anything uses it - the Manager
and Admin webapps certainly don't.
- Children, in this order
- 0 or 1 small-icon
- contents the location within the web application of
a file containing a small (16x16 pixel) icon
image.
- 0 or 1 large-icon
- contents the location within the web application of
a file containing a large (32x32 pixel) icon
image.
The display-name element
- The display-name element contains a short name
that is intended to be displayed by GUI tools
- The Manager webapp uses this
- No children
The description element
- The description element is used to provide
descriptive text about the parent element.
- Many other elements take description as a
child
- But some which did in the Servlet 2.2 DTD don't in the
Servlet 2.4 schema!
- No children
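A sketch of the first three optional children of web-app used together; the icon file names and the text are hypothetical.

```xml
<web-app>
  <!-- icon, display-name and description come first, in this order -->
  <icon>
    <small-icon>images/staff16.gif</small-icon>
    <large-icon>images/staff32.gif</large-icon>
  </icon>
  <display-name>Staff Directory</display-name>
  <description>Look up and edit staff contact details.</description>
  <!-- ...the remaining children follow here... -->
</web-app>
```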
The distributable element
- The distributable element, by its presence in
a web application deployment descriptor, indicates that this
web application is programmed appropriately to be deployed into
a distributed servlet container
- no local state
- Session data can't be shared between tomcat instances
- mod_jk2 was supposed to ensure that requests for
the same session got directed to the Tomcat instance
that originated that session, even in a cluster
- But mod_jk2 is deprecated
- If you're going to distribute apps over clustered
Tomcats, store your session identifier in a cookie on
the client, and your session state in the
database!
The context-param element
- The context-param element contains the
declaration of a web application's servlet context
initialization parameters.
- configuration parameters available to all servlets
within the web-app
- Children, in this order
- 1 param-name
- 1 param-value
- a value to associate with that name
- 0 or 1 description
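A sketch of a single context-param, with its children in the order listed above; the parameter name and value are hypothetical.

```xml
<context-param>
  <!-- any servlet can read this with
       getServletContext().getInitParameter("dataSource") -->
  <param-name>dataSource</param-name>
  <param-value>jdbc/StaffDB</param-value>
  <description>JNDI name of the database used by every servlet</description>
</context-param>
```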
The servlet element
- The servlet element contains the declarative
data of a servlet. If a jsp-file is specified and the
load-on-startup element is present, then the JSP should be
precompiled and loaded.
- Children, in this order
- 0 or 1 icon
- same as icon for web-app.
- You don't need this
- 1 servlet-name
- the canonical name of this servlet - keep it short
and simple
- You do need this!
- 0 or 1 display-name
- same as display-name for web-app
- You don't need this
- 0 or 1 description
- same as description for web-app
- You don't need this
- either 1 servlet-class
- fully qualified name of a class which implements
javax.servlet.Servlet
- must be on the Webapp classpath!
- or 1 jsp-file
- relative pathname of the file from the webapp
root
- either this or servlet-class must be present
- 0 or many init-param
- Configuration parameters available to this servlet
only
- Children in this order
- 1 param-name - a short distinct
name
- 1 param-value - a value to
associate with that name
- 0 or 1 description
- 0 or 1 load-on-startup
- The load-on-startup element indicates that this
servlet should be loaded on the startup of the web
application. The optional content of this element
must be a positive integer indicating the order in
which the servlet should be loaded. Lower integers are
loaded before higher integers. If no value is
specified, or if the value specified is not a positive
integer, the container is free to load it at any time
in the startup sequence.
- I've rarely found a need to use this
- 0 or many security-role-ref
- Maps a role name used in the servlet's code
(role-name) to a security-role defined for the
web-app (role-link)
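A sketch of a typical servlet declaration using the elements you actually need; the servlet name, class and parameter are hypothetical.

```xml
<servlet>
  <servlet-name>staff</servlet-name>
  <!-- must be on the Webapp classpath -->
  <servlet-class>com.example.StaffServlet</servlet-class>
  <init-param>
    <!-- visible only to this servlet, via
         getServletConfig().getInitParameter("pageSize") -->
    <param-name>pageSize</param-name>
    <param-value>25</param-value>
  </init-param>
  <!-- load this servlet when the webapp starts -->
  <load-on-startup>1</load-on-startup>
</servlet>
```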
context-params and
servlet-params
- Are both available to the Servlet at initialisation time,
as two separate namespaces.
- I prefer to merge the two namespaces with servlet-params
taking precedence over context-params
- This means I can set overall policy in the context-params
and fine-tune it in the servlet-params
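A sketch of the policy-then-fine-tune arrangement, as a fragment from inside web-app; the names and values are hypothetical. Tomcat itself keeps the two namespaces separate, so the merging (init-param winning over context-param) has to be done by your own servlet code.

```xml
<!-- Site-wide default, visible to every servlet -->
<context-param>
  <param-name>pageSize</param-name>
  <param-value>20</param-value>
</context-param>

<servlet>
  <servlet-name>report</servlet-name>
  <servlet-class>com.example.ReportServlet</servlet-class>
  <!-- Override for this servlet only; your code must prefer this
       value over the context-param of the same name -->
  <init-param>
    <param-name>pageSize</param-name>
    <param-value>100</param-value>
  </init-param>
</servlet>
```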
The servlet-mapping element
- The servlet-mapping element defines a mapping between a
servlet and a url pattern
- Important! You need this.
- This allows you to specify that the same servlet will
respond to different URLs
- Allows you to specify classes of URLs which a specific
servlet will respond to
- Children, in this order
- 1 servlet-name the canonical name you gave
the servlet
- 1 url-pattern the URL pattern you want it
to respond to
url-pattern: examples
- /*
- matches everything the path part of whose URL starts
with appname/
- /token
- matches only a URL whose path part is
exactly appname/token
- /token/*
- matches everything the path part of whose URL begins
with appname/token/ (and appname/token itself)
- *.token
- matches everything the path part of whose URL begins
with appname/ and ends with .token
- There is no other wildcard syntax: /token* and
/*.token do not work as wildcard patterns
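A sketch of servlet-mapping elements using the three wildcard-free and wildcard forms, as a fragment from inside web-app; the servlet names are hypothetical. Under the Servlet specification an exact match is tried first, then the longest /.../* prefix, then *.extension, so a /* mapping would shadow an extension mapping like the one below.

```xml
<!-- One servlet answering to two different URLs -->
<servlet-mapping>
  <servlet-name>staff</servlet-name>
  <url-pattern>/staff/*</url-pattern>  <!-- /staff and anything below it -->
</servlet-mapping>
<servlet-mapping>
  <servlet-name>staff</servlet-name>
  <url-pattern>/people</url-pattern>   <!-- exactly /people -->
</servlet-mapping>
<!-- A class of URLs handled by another servlet -->
<servlet-mapping>
  <servlet-name>report</servlet-name>
  <url-pattern>*.rpt</url-pattern>     <!-- any path ending in .rpt -->
</servlet-mapping>
```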
Sorry? The path part?
- Of course you remember that a full URL comprises
- protocol-part:[subprotocol-part:]//[username[:password]@]host-part[:port]/path-part[?query-part]
- Where
- Everything is optional, but some things are more
optional than others
- The protocol part selects the protocol to use
- The subprotocol part specifies the variant of that
protocol, if any (common in JDBC URLs, for example)
- The username is a username, and
- The password is a (hopefully matching) password
- The host part is the domain name or internet address of
a host
- The port is a port number
- The path part is a sequence of characters originally
representing the path to a file
- The query part is normally a sequence of token=value
pairs.
- You did remember all that, didn't you? Yes, I know you
did.
The session-config element
- Essentially only allows you to control the session
timeout
- One child, session-timeout, theoretically
optional
- content is an integer number of minutes
- no point in having a session-config
element unless you specify this
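A sketch of a session-config that sets the only thing worth setting; the value of 30 is just an example.

```xml
<session-config>
  <!-- sessions expire after 30 minutes of inactivity -->
  <session-timeout>30</session-timeout>
</session-config>
```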
The mime-mapping element
- Allows you to map file-extensions to mime types
- So that Tomcat serves appropriate mime-types for your
static content
- Tomcat isn't excellent at serving static content
- Better to serve your static content from (e.g.)
Apache, if this can
be arranged
- Children, in this order
- 1 extension (e.g. htm)
- 1 mime-type (e.g.
text/html)
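A sketch of two mime-mapping elements, as a fragment from inside web-app; the extensions chosen are just examples of types you might need to add.

```xml
<mime-mapping>
  <extension>svg</extension>
  <mime-type>image/svg+xml</mime-type>
</mime-mapping>
<mime-mapping>
  <extension>csv</extension>
  <mime-type>text/csv</mime-type>
</mime-mapping>
```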
The welcome-file-list element
- Allows you to define the files that should be used to
provide default content for a directory
- Children
- 1 or many welcome-file, content a file
name (e.g. index.html )
- As of the Servlet 2.4 specification (Tomcat 5), a
welcome-file element can specify a Servlet
name
- welcome-file elements are listed in order, so if you have
<welcome-file-list>
<welcome-file>index.jsp</welcome-file>
<welcome-file>index.html</welcome-file>
</welcome-file-list>
files called index.jsp will be preferred to
files called index.html
The error-page element
- The error-page element contains a mapping between an error
code or exception type and the path of a resource in the web
application
- Allows you to handle different errors differently...
- One of the benefits of writing a course like this
- Hadn't realised how flexible this was
- Will use it in future!
- Children, in this order
- either 1 error-code, content an HTTP error
code (e.g. 404 )
- or 1 exception-type, content a fully
qualified class name of a Java exception (e.g.
uk.co.weft.htform.DataStoreException )
- 1 location, content the relative pathname
of a resource within the webapp (e.g.
/errors/dontdothat.html )
- not a URL, can't point it at a resource outside the
webapp (shame)
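A sketch combining both kinds of error-page, reusing the examples above; /errors/notfound.html is a hypothetical page.

```xml
<!-- Handle an HTTP error code -->
<error-page>
  <error-code>404</error-code>
  <location>/errors/notfound.html</location>
</error-page>
<!-- Handle a Java exception thrown by a servlet -->
<error-page>
  <exception-type>uk.co.weft.htform.DataStoreException</exception-type>
  <location>/errors/dontdothat.html</location>
</error-page>
```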
The taglib element
- Actually redundant; you can embed your taglib declarations
in your JSP files, and most JSP developers do.
- Children, in this order
- 1 taglib-uri, content a URI which will be
used as a handle for this taglib description
- does not need to point to anything real
- 1 taglib-location, contents the relative
pathname of the Tag Library Description (TLD) file within
the webapp
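A sketch of a taglib element with a hypothetical URI and TLD path. A JSP page would then use the library with a directive such as <%@ taglib uri="http://www.example.com/tags/staff" prefix="staff" %>. Under the Servlet 2.4 schema the same element moves inside a jsp-config element.

```xml
<taglib>
  <!-- a handle only; nothing needs to exist at this URI -->
  <taglib-uri>http://www.example.com/tags/staff</taglib-uri>
  <taglib-location>/WEB-INF/tlds/staff.tld</taglib-location>
</taglib>
```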
The resource-ref element
- The resource-ref element contains a declaration of a Web
Application's reference to an external Java resource.
- Children, in this order
- 0 or 1 description
- 1 res-ref-name content a distinct name for
this reference
- 1 res-type content the name of the Java
class which is referred to
- 1 res-auth content either CONTAINER or
SERVLET
- 'The res-auth element indicates whether the
application component code performs resource signon
programmatically or whether the container signs onto
the resource based on the principle mapping information
supplied by the deployer.'
- You won't need this (at least, I can't imagine why you
should).
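One place where a resource-ref does turn up is in declaring a pooled database connection for JNDI lookup. A sketch, with a hypothetical reference name; note that the 2.2 DTD spells the res-auth values CONTAINER or SERVLET, while the 2.3 DTD uses Container or Application.

```xml
<resource-ref>
  <description>Staff database</description>
  <!-- looked up by code as java:comp/env/jdbc/StaffDB -->
  <res-ref-name>jdbc/StaffDB</res-ref-name>
  <res-type>javax.sql.DataSource</res-type>
  <res-auth>Container</res-auth>
</resource-ref>
```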
The security-constraint element
- The security-constraint element is used to associate
security constraints with one or more web resource
collections
- That is, it forces users wishing to access them to log in,
using usernames and passwords typically defined in
tomcat-users.xml
- Children, in this order
- 1 or many web-resource-collection, with
children
- 1 web-resource-name a distinct name
for this resource
- 0 or 1 description
- 0 or many url-pattern (we've seen
these before)
- 0 or many http-method contents e.g.
GET, POST, ...
- 0 or 1 auth-constraint, with children
- 0 or 1 description
- 1 role-name contents a role-name as
defined in tomcat-users.xml
- 0 or 1 user-data-constraint, with children
- 0 or 1 description
- 1 transport-guarantee : 'The
transport-guarantee element specifies that the
communication between client and server should be
NONE, INTEGRAL, or
CONFIDENTIAL. NONE means that
the application does not require any transport
guarantees. A value of INTEGRAL means that
the application requires that the data sent between the
client and server be sent in such a way that it can't
be changed in transit. CONFIDENTIAL means
that the application requires that the data be
transmitted in a fashion that prevents other entities
from observing the contents of the transmission. In
most cases, the presence of the INTEGRAL
or CONFIDENTIAL flag will indicate that
the use of SSL is required.'
- You rarely need this. If your webapp talks to a database,
it's best to do user authentication in the database layer.
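A sketch of a security-constraint protecting a hypothetical set of editing pages for a hypothetical editor role. Be aware that once you list http-method elements, only those methods are constrained; leave them out to protect all methods.

```xml
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Staff editing pages</web-resource-name>
    <url-pattern>/staff/edit/*</url-pattern>
    <http-method>GET</http-method>
    <http-method>POST</http-method>
  </web-resource-collection>
  <auth-constraint>
    <!-- must match a role in tomcat-users.xml -->
    <role-name>editor</role-name>
  </auth-constraint>
  <user-data-constraint>
    <!-- in practice, require SSL -->
    <transport-guarantee>CONFIDENTIAL</transport-guarantee>
  </user-data-constraint>
</security-constraint>
```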
The login-config element
- The login-config element is used to configure the
authentication method that should be used, the realm name that
should be used for this application, and the attributes that
are needed by the form login mechanism.
- That is, it determines how users are challenged to
authenticate
- Children, in this order
- 1 auth-method, content one of the HTTP
authentication methods, namely
- BASIC
- FORM
- DIGEST
- CLIENT-CERT
- 1 realm-name, the name of the realm which
users are challenged to login to (e.g. authorised
employees only)
- 0 or 1 form-login-config, with children
- 1 form-login-page contents the
relative pathname of the location in the web app where
the page that can be used for login can be found
- 1 form-error-page contents the
relative pathname of the location in the web app where
the page that is displayed when login is not successful
can be found
Only useful if auth-method is
FORM
- Once again, if you're protecting static content, you
probably shouldn't be serving it with Tomcat anyway. If you're
protecting an application which talks to a database, you
probably want to do authentication at the database layer
- Although CLIENT-CERT can be useful
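A sketch of a FORM login-config; the page names are hypothetical. The login page's HTML form must post to j_security_check, with fields named j_username and j_password.

```xml
<login-config>
  <auth-method>FORM</auth-method>
  <realm-name>Authorised employees only</realm-name>
  <form-login-config>
    <form-login-page>/login.html</form-login-page>
    <form-error-page>/login-failed.html</form-error-page>
  </form-login-config>
</login-config>
```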
The security-role element
- Really only here for documentation; allows you to supply a
description for roles you've defined in
tomcat-users.xml and used in
security-constraints.
- Children, in this order
- 0 or 1 description not required but the
whole element is redundant if you don't supply it
- 1 role-name as above
- Once again, generally, the database is the appropriate
layer to do this kind of thing
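A sketch documenting the hypothetical editor role used in a security-constraint.

```xml
<security-role>
  <description>Staff allowed to edit directory entries</description>
  <role-name>editor</role-name>
</security-role>
```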
The env-entry element
The ejb-ref element
[Break]
Tomcat and the database
- As of Tomcat 5, Tomcat includes a database connection pool
- Why is this significant?
- Do you need it?
- How do you use it?
Connections
- The database is where your persistent data is stored
- and much of your transient data
- the crown jewels
- Database layer security is the key to data security
- the layer you can't circumvent
- Database layer constraints are the key to data
integrity
- Connections provide access to your data with
authentication
Connection Lifecycle
- To use a Connection, you have to set it up
- authentication tokens across the network
- Each Connection uses resources at both ends
- database and Java application
- surprisingly greedy
- Closing Connections early risks data loss
- The contents of all open ResultSets are lost immediately
Connection Pool: objectives
- Create new Connections as rarely as possible
- Keep as few Connections open at one time as possible
- Don't let authenticated connections get hijacked.
Connection Pools: options
- Tomcat's connection pool is not the only one available
- It's new and people have needed them for a long time
- I use
my own
- Simple, lightweight, reliable
- Designed for use with Servlets
- No special configuration
- Not transparent - software must invoke it
explicitly
- Robust at preventing leaks
- Other open source connection pools include
- Primrose
- Designed explicitly for use with Tomcat
- Proxool
- Claimed to be transparent - no need for explicit
invocation.
- Jakarta Commons DBCP
- 'Jakarta Commons' is the Apache Software Foundation's
project to produce a collection of reliable and reusable
Java utility components
- To prevent too much wheel re-inventing
- Tomcat's connection pool is built on top of Jakarta
Commons DBCP
- Not robust at preventing leaks
Introducing Tomcat's connection pool
- A very quick introduction to JNDI
- Configuring JNDI
A very quick introduction to JNDI
- Java Naming and Directory Interface
- Abstraction layer for naming services
- Tomcat's connection pool is built on top of Tomcat's JNDI
implementation
The Tomcat Connection Pool
- Lots of awful warnings:
JNDI resource configuration has changed between
Tomcat 5.0.x and Tomcat 5.5.x
JNDI Datasource configuration is covered extensively
in the
JNDI-Resources-HOWTO. However, feedback from
tomcat-user has shown that specifics for individual
configurations can be rather tricky.
Please let us know if you have used DBCP [the Tomcat
connection pool] and its JDBC 3.0 features with a 1.4
JVM.
- I suspect this technology is not ready for prime time.
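For comparison, a sketch of a pooled DataSource declared the Tomcat 5.5 way, for example in the webapp's META-INF/context.xml, assuming a PostgreSQL database; the names, URL and credentials are hypothetical. In Tomcat 5.0 the same settings go in a separate ResourceParams element rather than as attributes, which is one of the changes the warnings refer to. The JDBC driver jar must be visible to Tomcat itself (common/lib in 5.5), and the webapp looks the pool up as java:comp/env/jdbc/StaffDB.

```xml
<!-- Tomcat 5.5 style: pool settings are attributes of Resource -->
<Context>
  <Resource name="jdbc/StaffDB" auth="Container"
            type="javax.sql.DataSource"
            driverClassName="org.postgresql.Driver"
            url="jdbc:postgresql://localhost/staff"
            username="staffapp" password="secret"
            maxActive="20" maxIdle="5" maxWait="10000"/>
</Context>
```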
[Lunch]
Administering Tomcat
- The Manager Webapp
- The Admin Webapp
The Manager Webapp
- Security, and enabling the Manager
- Enabling the GUI
- What the Manager Webapp can do
Manager webapp Security
- With the Manager webapp, you can manage your Tomcat over
the Web...
- ... so can anyone else who can intercept or guess your
password
- Not enabled by default
- Not included in some distributions
What I do (and suggest)
- Tomcat serves HTTP on port 8180
- Tomcat serves AJP13 on port 8009
- Apache serves public webapps over HTTP on port 80,
fetching them from Tomcat over AJP13 on port 8009
- Firewall only passes port 80
- So private Webapps (on port 8180) are only available
inside the firewall
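A sketch of the two Tomcat connectors for this setup, assuming Tomcat 5.5 server.xml syntax (Tomcat 4 names the connector classes explicitly instead); all other attributes are left at their defaults.

```xml
<Service name="Catalina">
  <!-- Plain HTTP on 8180 (HTTP/1.1 is the default protocol);
       the firewall blocks this port from outside -->
  <Connector port="8180" />
  <!-- AJP13 on 8009, used by Apache to forward public requests -->
  <Connector port="8009" protocol="AJP/1.3" />
  <!-- ... Engine and Host elements follow here ... -->
</Service>
```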
Enabling the Manager webapp
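In Tomcat 4 and 5 the Manager webapp only admits users who have the manager role, so enabling it means adding such a user to conf/tomcat-users.xml and restarting Tomcat. A sketch; the user name and password are placeholders.

```xml
<tomcat-users>
  <role rolename="manager"/>
  <!-- placeholder credentials: choose a strong password -->
  <user username="tcadmin" password="change-me" roles="manager"/>
</tomcat-users>
```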
Enabling the Web interface
- The Manager webapp has a (reasonably) user friendly
interface
- But that isn't enabled by default, either
- Edit webapps/manager/WEB-INF/web.xml
- Change
org.apache.catalina.servlets.ManagerServlet to
org.apache.catalina.servlets.HTMLManagerServlet
- Restart Tomcat
What the Manager Webapp can do (i)
- List known webapps
http://host:port/manager/list
- Start a webapp
http://host:port/manager/start?path=/appname
- Stop a webapp
http://host:port/manager/stop?path=/appname
- Reload a running webapp
http://host:port/manager/reload?path=/appname
- For example, when you've edited the web.xml or dropped
in a new version of a class or jar file
What the Manager Webapp can do (ii)
- Install a complete new webapp, from a war file
http://host:port/manager/install?war=fullurlofwar
- Provided
- If a file: URL, the file is on the
server
- And is readable by the user tomcat is running
as
- http: URLs don't work, so you can't
upload and install
- Not as useful as it sounds
- On later Tomcat 4 and Tomcat 5, you can upload a WAR
through the Web interface
- Deploy a new webapp whose war file has been dropped into
the webapps directory
http://host:port/manager/deploy?path=/appname
What the Manager Webapp does not always reliably do
- In my experience reloading a Webapp with the Manager Webapp
does not always reload all resources
- Neither does stopping and restarting a Webapp
- Sometimes restarting Tomcat is the only solution
The Admin Webapp
- Security issues much the same as for Manager
- No longer part of the standard distribution
- Download the Admin Webapp from here:
- Wide variety of configuration tasks
- Same things as you can do with the server.xml file
- If you use it, your configuration will be out of sync
with server.xml
- Many are low level and dangerous
- Pointy-clicky, few warnings
- Use carefully!
Exercise: Administering Tomcat
- Use the Manager Webapp to start and stop particular webapps
- For Debian users, the manager webapp is in the package
tomcat4-admin
- apt-get install tomcat4-admin
- For Windows users, you'll need to download the manager
webapp from here
- Use the Manager Webapp to deploy a new Webapp on someone
else's machine (with their agreement)
- Explore your settings with the Admin Webapp
- For Debian users, the admin webapp is in the same
package as the manager webapp, so you've already installed
it.
- For Windows users, you'll need to download the admin
webapp from here
Log file analysis
- Apache HTTPD logs
- The Tomcat logs
- The Catalina log
- The Host log
- Where your debugging output goes
Apache HTTPD logs
- Apache HTTPD by default has two main logs
- access.log
- error.log
- you may configure separate logs for separate virtual
hosts
- The access log logs exactly one line for every request
serviced
- the format is simple
- every line has the same number of fields
- The error log logs exactly one line for each error that
occurs
- the format is simple
- every line has the same number of fields
- Many tools exist to analyse these logs
The Tomcat logs
- Tomcat logs are not like Apache logs: they are much less
disciplined, and harder to analyse
- By default Tomcat does not log each request it services,
although your servlets may do so if you choose
- Tomcat's logging is changing radically with Tomcat 5.5,
documentation here
The Catalina log
- Appears as catalina_date.log
- Many entries are produced by the core Tomcat engine, and
are neatly timestamped
- But there's also a lot of extraneous junk
- anything printed to stdout ends up here
- any unhandled exception dump ends up here
- Can be messy and very hard to read
The Host log
- Normally localhost_date.log but name depends on the
Host element in server.xml
- Each entry is timestamped and labelled with the component
which logged it
- Contains information explicitly logged through the log()
method of javax.servlet.GenericServlet (or ServletContext)
- By your Servlets and by Tomcat's own internal
mechanisms
- Also contains Java exception dumps from exceptions thrown
by Servlets
- From Tomcat 5.5 the exception dumps appear in the
stdout.log
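A sketch of where the host log's name comes from in Tomcat 4 and 5.0 (the Logger element was removed in Tomcat 5.5); the attribute values shown are examples. FileLogger builds the file name from prefix, date and suffix.

```xml
<Host name="localhost" appBase="webapps">
  <!-- writes logs/localhost_log.YYYY-MM-DD.txt -->
  <Logger className="org.apache.catalina.logger.FileLogger"
          directory="logs" prefix="localhost_log." suffix=".txt"
          timestamp="true"/>
</Host>
```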
Where your debugging output goes
- If you use the GenericServlet.log() method, your messages will
appear in the host log
- you should do this
- but sometimes you don't have a handle on a Servlet
- If you print to stderr, your messages appear in the
Catalina log
- you can always do this
- I tend to do this rather too much
Reading Java exception dumps
- When Java throws an exception, by default it dumps
everything on the stack at the time
- In something as complex as Tomcat, this produces vast
amounts of output
- Usually only the first two or three lines are
interesting
- Usually, each line in the dump will show the source
line number of the Java source file which was being
executed at that point
- When a higher level component throws an exception, it often
causes a lower level component to throw an exception
- The lower level component's exception dump appears
first
- Scroll through it and find lines marked 'Root cause'
- start of the higher level exception dump
- Usually it is the highest level (last) dump that is
interesting.
When things go wrong
- Look at the last thousand lines of the host log and
catalina log
- Do not be overwhelmed by the size of exception dumps
- Remember the key part of any exception dump is the two or
three lines after the last 'Root cause' line
- Apache logs are better for doing traffic analysis
[Break]
Development frameworks for Webapps
Lots and lots of others...
A brief tantrum about buzzwords
- What is a 'business' rule, and how is it different from any
other sort of domain rule?
- What is a 'business' object, and how is it different from
any other sort of domain object?
- In my experience whenever you see the word 'business' or
'enterprise' tagged on to a generic software concept, you know
- That the marketdroids have trampled all over it
- That it is manufactured exclusively from the waste
product of stables and cattle sheds
- This applies especially to 'Enterprise Java
Beans'
- About which I shall say no more.
Struts
- A Struts application comprises a single controller servlet
(ActionServlet) which is essentially a switch
- Pluggable action handlers match paths and either succeed or
fail
- The action handler generates a redirect to one page on
success, to a different one on failure
- What the user sees is just JSP pages, supported by
taglibs
The ActionServlet
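A sketch of how the ActionServlet is typically declared in a Struts 1.x webapp's web.xml; the *.do extension mapping is a common convention rather than a requirement.

```xml
<servlet>
  <servlet-name>action</servlet-name>
  <servlet-class>org.apache.struts.action.ActionServlet</servlet-class>
  <init-param>
    <!-- where the action mappings live -->
    <param-name>config</param-name>
    <param-value>/WEB-INF/struts-config.xml</param-value>
  </init-param>
  <load-on-startup>1</load-on-startup>
</servlet>
<!-- every URL ending in .do goes through the single controller -->
<servlet-mapping>
  <servlet-name>action</servlet-name>
  <url-pattern>*.do</url-pattern>
</servlet-mapping>
```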
The Actions
- Are what you write to create your particular application
- Although Struts comes with some useful generic
ones
- If they succeed they return a token which advises the
ActionServlet where to redirect to next
- A token, not a hard-wired URL
- The token gets matched in a configuration file to
determine the actual URL to redirect to
- To fail, they just have to throw an untrapped
exception
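A sketch of the configuration file that turns the Actions' tokens into URLs, struts-config.xml in Struts 1.x. The action, form bean and JSP names are hypothetical; the Action would return mapping.findForward("success") or mapping.findForward("failure"), and the forward elements map those tokens to pages.

```xml
<struts-config>
  <form-beans>
    <form-bean name="loginForm" type="com.example.LoginForm"/>
  </form-beans>
  <action-mappings>
    <!-- handles /login.do; the Action picks a forward by name -->
    <action path="/login" type="com.example.LoginAction"
            name="loginForm" scope="request" input="/login.jsp">
      <forward name="success" path="/welcome.jsp"/>
      <forward name="failure" path="/login.jsp"/>
    </action>
  </action-mappings>
</struts-config>
```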
The Exception Handlers
The ActionForm
- In Struts, the forms users fill in are just HTML or
JSP
- The application layer has no access to their structure
- So you need an object which knows what the form should
return and how to validate it
- Essentially a Java Bean with a
validate(ActionMapping, HttpServletRequest) method
Struts: Summary
- Quite hard to get into - steep initial learning curve, easy
things are quite hard
- May make harder things easier to build, but I confess I
don't see this
- Pluggability is interesting and flexible, making component
re-use very easy
- Excellent separation of logic and presentation
- Architecturally rather teutonic and heavyweight
- Don't really see what is gained by the single
ActionServlet
- The Servlet Container (Tomcat) already does switching
based on URL substring match (see
servlet-mapping, this morning)
- Disappointing there isn't a richer action selection
vocabulary
Tapestry
an open-source framework for creating dynamic, robust,
highly scalable web applications in Java
- Yet another Jakarta project
- four key principles:
- Simplicity - web applications shouldn't be rocket
science!
- Consistency - what works in pages should work in
components. What works in small applications should work in
large applications. Different developers should find
similar solutions to similar problems.
- Efficiency - applications should be performant and
scalable
- Feedback - when things go wrong, the framework should
not get in the way; in fact, it should provide useful
diagnostics
Jacquard
- All my own work, so I'm not neutral about it
- Has grown like Topsy, so a bit undisciplined in places
- I've been trying to get time to write a version 2 for
years...
- Extremely easy and quick to build data driven applications
in
Three main subpackages
- htform
- components to build sophisticated web forms
quickly
- ancient - developed from 1997
- just won't die - too easy to use
- dbutil
- database (and potentially other data source)
abstraction layer
- with robust connection pooling
- also ancient, but architecture is good
- domutil
- components to generate and transform XML DOM
structures
- intended to replace htform
- hasn't done so, but enables very flexible content
generation
In use
- Specialise
TableWrapperForm
- Add appropriate
Widgets for all editable fields
- Use the
List Servlet; you very rarely need to write any code at
all
Jacquard: Summary
- Lightweight and opportunistic
- Easy things are easy, harder things are fairly easy
- Particular strengths in the data abstraction layer
- Presentation not as flexible as some of the
alternatives
Questions and Suggestions for tomorrow
- Any questions or things I haven't been clear enough
about?
- Any things you would like to change about the programme for
tomorrow?