© 2002 BEA Systems, Inc.


com.bea.p13n.content.document.ref.loader
Class MetaParser

java.lang.Object
  |
  +--com.bea.p13n.content.document.ref.loader.MetaParser

public class MetaParser
extends java.lang.Object
implements LoaderFilter

A utility which can pull META tags from an HTML file.

This will also pull the title of an HTML document from the <title></title> section, wherever it appears in the document. Whatever the casing of the title tag, the value is stored in the metadata properties as "title".


Constructor Summary
MetaParser()
           
 
Method Summary
static java.lang.String determineEncoding(java.io.BufferedReader reader)
          Try to determine the encoding from a <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=" > instruction.
static java.lang.String determineEncoding(java.io.File f, java.lang.String encoding)
          Try to determine the encoding from a <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=" > instruction.
static java.util.Properties load(java.io.BufferedReader reader, java.util.Properties p)
          Load the META tag name/value pairs from the input stream into p.
static java.util.Properties load(java.io.File f, java.util.Properties p)
          Load the META tag name/value pairs from f into p.
static java.util.Properties load(java.io.File f, java.util.Properties p, java.lang.String enc)
          Load the META tag name/value pairs from f into p.
 void loadProperties(java.io.File f, java.util.Properties p, BulkLoader loader)
          Invoke as a LoaderFilter.
static java.io.BufferedReader open(java.io.File f, java.lang.String encoding)
          Open a file with the given encoding.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MetaParser

public MetaParser()

Method Detail

loadProperties

public void loadProperties(java.io.File f,
                           java.util.Properties p,
                           BulkLoader loader)
Invoke as a LoaderFilter.
Specified by:
loadProperties in interface LoaderFilter


load

public static java.util.Properties load(java.io.File f,
                                        java.util.Properties p)
                                 throws java.io.IOException
Load the META tag name/value pairs from f into p.

Parameters:
f - the file.
p - the properties object (null to create new).
Returns:
the META tag name/values (p if p was not null).
Throws:
java.io.IOException - thrown on an error reading the file.

load

public static java.util.Properties load(java.io.File f,
                                        java.util.Properties p,
                                        java.lang.String enc)
                                 throws java.io.IOException
Load the META tag name/value pairs from f into p.

This method first looks for the encoding name in a <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=" > tag in the HTML, reading the file with the passed-in encoding while it searches. If a valid encoding name is found, the file is opened with that encoding and the "encoding" property is set in the Properties. If no valid encoding is found and an encoding was passed in, that encoding is used. If no encoding was passed in, the system default is used.

Parameters:
f - the file.
p - the properties object (null to create new).
enc - the file encoding name to try (null for system default).
Returns:
the META tag name/values (p if p was not null).
Throws:
java.io.IOException - thrown on an error reading the file.
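
The fallback order described above can be sketched with the JDK alone. This is an illustration, not the BEA implementation: the class EncodingFallbackSketch and its methods resolve and isValid are hypothetical names, and "valid" is assumed to mean "supported by this JVM".

```java
import java.nio.charset.Charset;

// Hypothetical sketch of the documented fallback order; not the BEA source.
final class EncodingFallbackSketch {

    // sniffed:  charset name found in the META tag, or null if none was found
    // passedIn: encoding supplied by the caller, or null for system default
    static String resolve(String sniffed, String passedIn) {
        if (sniffed != null && isValid(sniffed)) {
            return sniffed;                       // 1. valid name from the META tag
        }
        if (passedIn != null) {
            return passedIn;                      // 2. caller-supplied encoding
        }
        return Charset.defaultCharset().name();   // 3. system default
    }

    // Assumption: a name is "valid" when this JVM supports the charset.
    static boolean isValid(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            return false;                         // syntactically illegal charset name
        }
    }
}
```

For example, resolve("no-such-charset", "UTF-8") falls through to the passed-in "UTF-8" under these assumptions.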

load

public static java.util.Properties load(java.io.BufferedReader reader,
                                        java.util.Properties p)
                                 throws java.io.IOException
Load the META tag name/value pairs from the input stream into p.

This operates on a last-seen-is-returned algorithm for META tags with the same name. It will also find all META tags in the file, not just those in the head.

Parameters:
reader - the input reader.
p - the properties object (null to create new).
Returns:
the META tag name/values (p if p was not null).
Throws:
java.io.IOException - thrown on an error reading the file.
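
The last-seen-is-returned behaviour can be illustrated with a small regex-based sketch. It is not the BEA implementation: MetaTagSketch is a hypothetical class, and it assumes double-quoted attributes with name before content. How the real parser treats other tag layouts, or a META tag that is itself named "title", is not stated in this documentation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch only; the real MetaParser may parse HTML differently.
final class MetaTagSketch {

    // Assumption: attributes are double-quoted and name precedes content.
    private static final Pattern META = Pattern.compile(
        "<meta\\s+name\\s*=\\s*\"([^\"]*)\"\\s+content\\s*=\\s*\"([^\"]*)\"[^>]*>",
        Pattern.CASE_INSENSITIVE);

    // Matches the title element regardless of tag casing.
    private static final Pattern TITLE = Pattern.compile(
        "<title[^>]*>(.*?)</title>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static Properties load(BufferedReader reader, Properties p) throws IOException {
        Properties result = (p != null) ? p : new Properties(); // null creates a new object
        StringBuilder html = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            html.append(line).append('\n');
        }
        // Scan the whole document, not just the head.
        Matcher m = META.matcher(html);
        while (m.find()) {
            // setProperty overwrites, so the last tag with a given name wins.
            result.setProperty(m.group(1), m.group(2));
        }
        Matcher t = TITLE.matcher(html);
        if (t.find()) {
            result.setProperty("title", t.group(1).trim()); // always stored as "title"
        }
        return result;
    }
}
```

Passing an existing Properties object returns that same object with the parsed pairs added; passing null returns a fresh one.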

determineEncoding

public static final java.lang.String determineEncoding(java.io.File f,
                                                       java.lang.String encoding)
                                                throws java.io.IOException
Try to determine the encoding from a <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=" > instruction.

If an encoding is passed in and the file doesn't contain an appropriate META tag, that encoding will be returned.

Parameters:
f - the input file
encoding - the encoding to open the file with (null for default).
Returns:
the encoding to use, null for unknown.
Throws:
java.io.IOException - thrown on an error reading the file.

determineEncoding

public static final java.lang.String determineEncoding(java.io.BufferedReader reader)
                                                throws java.io.IOException
Try to determine the encoding from a <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=" > instruction.

Parameters:
reader - the input reader.
Returns:
the encoding to use, null for unknown.
Throws:
java.io.IOException - thrown on an error reading the file.
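
A minimal sketch of how a charset name might be sniffed from a reader, using only the JDK. CharsetSniffSketch is a hypothetical class, not the BEA source. It assumes the META tag sits on a single line and that an unsupported or illegal charset name counts as unknown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch only; the real determineEncoding may behave differently.
final class CharsetSniffSketch {

    // Assumption: the whole META tag fits on one line.
    private static final Pattern CHARSET = Pattern.compile(
        "<meta[^>]*http-equiv\\s*=\\s*[\"']?content-type[\"']?[^>]*charset\\s*=\\s*([A-Za-z0-9._:-]+)",
        Pattern.CASE_INSENSITIVE);

    // Returns the first supported charset name found, or null for unknown.
    static String determineEncoding(BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            Matcher m = CHARSET.matcher(line);
            if (m.find()) {
                String name = m.group(1);
                try {
                    if (Charset.isSupported(name)) {
                        return name;
                    }
                } catch (IllegalArgumentException e) {
                    // Illegal charset name: treat as unknown and keep scanning.
                }
            }
        }
        return null;
    }
}
```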

open

public static final java.io.BufferedReader open(java.io.File f,
                                                java.lang.String encoding)
                                         throws java.io.IOException
Open a file with the given encoding.

Parameters:
f - a file object.
encoding - the encoding to use, null for default.
Throws:
java.io.IOException - if invalid encoding or unable to open file.
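
For comparison, opening a file with a named encoding using only the JDK might look like the sketch below. OpenSketch is a hypothetical class, not the BEA source. It relies on the standard InputStreamReader constructors, which throw UnsupportedEncodingException (an IOException) for an unknown encoding name.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical sketch only; not the BEA implementation of open.
final class OpenSketch {

    // encoding == null means "use the platform default charset".
    static BufferedReader open(File f, String encoding) throws IOException {
        FileInputStream in = new FileInputStream(f); // FileNotFoundException if f cannot be opened
        try {
            return new BufferedReader(encoding == null
                ? new InputStreamReader(in)
                : new InputStreamReader(in, encoding)); // UnsupportedEncodingException if invalid
        } catch (IOException e) {
            in.close(); // do not leak the stream when the encoding is rejected
            throw e;
        }
    }
}
```

The caller is responsible for closing the returned reader, for example with try-with-resources.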

Copyright © 2002 BEA Systems, Inc. All Rights Reserved