Tika-file-formats

TIKA - File Formats

File Formats Supported by Tika

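The following table lists the file formats that Tika supports, together with the package (and any underlying library) and the parser class that handles each one.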

File format | Package and library | Class in Tika
XML | org.apache.tika.parser.xml | XMLParser
HTML | org.apache.tika.parser.html; uses the TagSoup library | HtmlParser
MS Office compound document (OLE2 until 2007, OOXML from 2007 onwards) | org.apache.tika.parser.microsoft and org.apache.tika.parser.microsoft.ooxml; uses the Apache POI library | OfficeParser (OLE2), OOXMLParser (OOXML)
OpenDocument Format (OpenOffice) | org.apache.tika.parser.odf | OpenDocumentParser
Portable Document Format (PDF) | org.apache.tika.parser.pdf; uses the Apache PDFBox library | PDFParser
Electronic Publication Format (digital books) | org.apache.tika.parser.epub | EpubParser
Rich Text Format | org.apache.tika.parser.rtf | RTFParser
Compression and packaging formats | org.apache.tika.parser.pkg; uses the Apache Commons Compress library | PackageParser, CompressorParser, and their subclasses
Text format | org.apache.tika.parser.txt | TXTParser
Feed and syndication formats | org.apache.tika.parser.feed | FeedParser
Audio formats | org.apache.tika.parser.audio and org.apache.tika.parser.mp3 | AudioParser, MidiParser, Mp3Parser (for MP3)
Image formats | org.apache.tika.parser.jpeg | JpegParser (for JPEG images)
Video formats | org.apache.tika.parser.mp4 and org.apache.tika.parser.video; the FLV parser internally uses a simple algorithm to parse Flash video | MP4Parser, FLVParser
Java class files and JAR files | org.apache.tika.parser.asm | ClassParser, CompressorParser
Mbox format (email messages) | org.apache.tika.parser.mbox | MboxParser
CAD formats | org.apache.tika.parser.dwg | DWGParser
Font formats | org.apache.tika.parser.font | TrueTypeParser
Executable programs and libraries | org.apache.tika.parser.executable | ExecutableParser
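
Each row of the table pairs a package with a parser class, so the fully qualified class name is simply the two joined with a dot. The sketch below is a hypothetical helper, not part of Tika: the names TikaParserTable, parserClassFor, and isAvailable and the short format keys such as "PDF" are assumptions made for illustration. It records a few rows from the table and uses plain Java reflection to check whether each parser class can be found on the classpath. It compiles without Tika and simply reports classes as missing when the Tika parser JARs are absent.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not a Tika class): a lookup built from rows of the table.
class TikaParserTable {

    // Short format key (an illustrative assumption) -> package + "." + class from the table.
    static final Map<String, String> PARSERS = new LinkedHashMap<>();
    static {
        PARSERS.put("XML",  "org.apache.tika.parser.xml.XMLParser");
        PARSERS.put("PDF",  "org.apache.tika.parser.pdf.PDFParser");
        PARSERS.put("EPUB", "org.apache.tika.parser.epub.EpubParser");
        PARSERS.put("RTF",  "org.apache.tika.parser.rtf.RTFParser");
        PARSERS.put("TXT",  "org.apache.tika.parser.txt.TXTParser");
        PARSERS.put("FEED", "org.apache.tika.parser.feed.FeedParser");
        PARSERS.put("DWG",  "org.apache.tika.parser.dwg.DWGParser");
    }

    // Returns the fully qualified parser class name, or null for an unknown key.
    static String parserClassFor(String format) {
        return PARSERS.get(format.toUpperCase(Locale.ROOT));
    }

    // True if the named class can be loaded; the class is not initialized.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, TikaParserTable.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> row : PARSERS.entrySet()) {
            String status = isAvailable(row.getValue()) ? "on classpath" : "not on classpath";
            System.out.printf("%-5s %s (%s)%n", row.getKey(), row.getValue(), status);
        }
    }
}
```

Running main with the Tika parser JARs on the classpath shows which of the listed classes the installed version actually ships. This is useful because class and package names have changed between Tika releases.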