com.bea.p13n.util
Class MetaParser

java.lang.Object
  extended by com.bea.p13n.util.MetaParser

public class MetaParser
extends Object

A utility which can pull META tags from an HTML file.

This will also pull the title of an HTML document from its <title></title> element, wherever that element appears in the document. Whatever the casing of the title tag, the value is stored in the metadata properties under the key "title".


Method Summary
static String determineEncoding(BufferedReader reader)
          Try to determine the encoding from a <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=" > instruction.
static String determineEncoding(File f, String encoding)
          Try to determine the encoding from a <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=" > instruction.
static Properties load(BufferedReader reader, Properties p)
          Load the META tag name/value pairs from the input stream into p.
static Properties load(File f, Properties p)
          Load the META tag name/value pairs from f into p.
static Properties load(File f, Properties p, String enc)
          Load the META tag name/value pairs from f into p.
static BufferedReader open(File f, String encoding)
          Open a file with the given encoding.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

load

public static Properties load(File f,
                              Properties p)
                       throws IOException
Load the META tag name/value pairs from f into p.

Parameters
f - the file.
p - the properties object (null to create new).
Returns
the META tag name/values (p if p was not null).
Throws
IOException - thrown on an error reading the file.

load

public static Properties load(File f,
                              Properties p,
                              String enc)
                       throws IOException
Load the META tag name/value pairs from f into p.

This looks for the encoding name to use by searching the HTML for a <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=" > tag; the file is first read with the passed-in encoding to perform that search. If a valid encoding name is found, the file is opened with that encoding and the "encoding" property in the Properties is set. If no valid encoding is found and an encoding was passed in, the passed-in encoding is used. If no encoding was passed in, the system default is used.

Parameters
f - the file.
p - the properties object (null to create new).
enc - the file encoding name to try (null for system default).
Returns
the META tag name/values (p if p was not null).
Throws
IOException - thrown on an error reading the file.
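
The fallback order described for this overload can be sketched with the JDK alone. This is a minimal illustration, not the MetaParser source: the class name EncodingChoiceSketch and its methods are hypothetical, and it assumes that "valid" means the JVM supports the charset name.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper illustrating the documented fallback order.
class EncodingChoiceSketch {

    // Assumption: an encoding name is "valid" when this JVM supports it.
    static boolean isValid(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are illegal in a charset name.
            return false;
        }
    }

    // detected: the charset found in the META tag, or null if none was found.
    // passedIn: the caller's encoding argument, or null for the system default.
    static String choose(String detected, String passedIn) {
        if (isValid(detected)) {
            return detected;                      // 1. encoding named in the document
        }
        if (passedIn != null) {
            return passedIn;                      // 2. encoding supplied by the caller
        }
        return Charset.defaultCharset().name();   // 3. system default
    }
}
```

With this sketch, choose("UTF-8", "ISO-8859-1") selects the detected name, while an unsupported detected name falls back to the caller's encoding.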

load

public static Properties load(BufferedReader reader,
                              Properties p)
                       throws IOException
Load the META tag name/value pairs from the input stream into p.

This operates on a last-seen-is-returned algorithm for META tags with the same name: if a name appears more than once, the value from the last tag wins. It also finds all META tags in the file, not just those in the head.

Parameters
reader - the input reader.
p - the properties object (null to create new).
Returns
the META tag name/values (p if p was not null).
Throws
IOException - thrown on an error reading from the reader.
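
To illustrate the last-seen-is-returned behavior, here is a minimal, self-contained sketch that uses only the JDK. It is not the MetaParser implementation: the class MetaLoadSketch and its regular expressions are assumptions, it only recognizes tags written as name="..." followed by content="...", and it keeps names exactly as written and stores the first title it finds.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in that mimics the documented behavior of load(reader, p).
class MetaLoadSketch {

    // Matches <meta name="..." content="..."> in that attribute order, any casing.
    private static final Pattern META = Pattern.compile(
        "<meta\\s+name\\s*=\\s*\"([^\"]*)\"\\s+content\\s*=\\s*\"([^\"]*)\"[^>]*>",
        Pattern.CASE_INSENSITIVE);

    // Matches <title>...</title> regardless of the casing of the tag.
    private static final Pattern TITLE = Pattern.compile(
        "<title[^>]*>(.*?)</title>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static Properties load(BufferedReader reader, Properties p) throws IOException {
        // A null argument means "create a new Properties object".
        Properties result = (p != null) ? p : new Properties();

        // Read the whole document so tags in the body are seen as well as the head.
        StringBuilder html = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            html.append(line).append('\n');
        }

        Matcher meta = META.matcher(html);
        while (meta.find()) {
            // setProperty overwrites, so a later tag with the same name wins.
            result.setProperty(meta.group(1), meta.group(2));
        }

        Matcher title = TITLE.matcher(html);
        if (title.find()) {
            // Always stored under the lowercase key "title".
            result.setProperty("title", title.group(1).trim());
        }
        return result;
    }
}
```

Given a document with an author META tag in the head and a second author tag in the body, the sketch returns the value from the body tag, because that tag is seen last.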

determineEncoding

public static final String determineEncoding(File f,
                                             String encoding)
                                      throws IOException
Try to determine the encoding from a <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=" > instruction.

If an encoding is passed in and the file doesn't contain an appropriate META tag, that encoding will be returned.

Parameters
f - the input file.
encoding - the encoding to open the file with (null for default).
Returns
the encoding to use, null for unknown.
Throws
IOException - thrown on an error reading the file.

determineEncoding

public static final String determineEncoding(BufferedReader reader)
                                      throws IOException
Try to determine the encoding from a <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=" > instruction.

Parameters
reader - the input reader.
Returns
the encoding to use, null for unknown.
Throws
IOException - thrown on an error reading from the reader.
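
The kind of scan this method describes can be sketched as follows. This is an assumption-laden illustration rather than the actual implementation: the class CharsetSniffSketch and its pattern are hypothetical, and the sketch examines one line at a time, so a META tag split across lines would be missed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of sniffing the charset from a Content-Type META tag.
class CharsetSniffSketch {

    // Finds a META tag with http-equiv="Content-Type" and captures the
    // characters that follow "charset=" in the same tag.
    private static final Pattern CHARSET = Pattern.compile(
        "<meta[^>]*http-equiv\\s*=\\s*[\"']?content-type[\"']?[^>]*charset\\s*=\\s*([\\w.:-]+)",
        Pattern.CASE_INSENSITIVE);

    // Returns the charset name from the first matching tag, or null if none is found.
    static String determineEncoding(BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            Matcher m = CHARSET.matcher(line);
            if (m.find()) {
                return m.group(1);
            }
        }
        return null; // unknown
    }
}
```

A tag whose charset value is empty does not match the pattern, so the sketch reports the encoding as unknown in that case.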

open

public static final BufferedReader open(File f,
                                        String encoding)
                                 throws IOException
Open a file with the given encoding.

Parameters
f - a file object.
encoding - the encoding to use, null for default.
Throws
IOException - if invalid encoding or unable to open file.
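
For comparison, opening a file with a named encoding using only the JDK might look like the sketch below. The class OpenSketch is hypothetical and is not the MetaParser source; it assumes that a null encoding means the platform default charset.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical sketch of opening a file with a given encoding.
class OpenSketch {

    static BufferedReader open(File f, String encoding) throws IOException {
        // FileNotFoundException is a subclass of IOException.
        FileInputStream in = new FileInputStream(f);
        try {
            if (encoding == null) {
                // Assumption: null selects the platform default charset.
                return new BufferedReader(new InputStreamReader(in, Charset.defaultCharset()));
            }
            // Throws UnsupportedEncodingException (an IOException) for an unknown name.
            return new BufferedReader(new InputStreamReader(in, encoding));
        } catch (IOException e) {
            in.close(); // do not leak the stream when the encoding is rejected
            throw e;
        }
    }
}
```

The caller is responsible for closing the returned reader, for example with a try-with-resources statement.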


Copyright © 2011, Oracle. All rights reserved.