author     jvoisin  2011-08-05 22:35:05 +0200
committer  jvoisin  2011-08-05 22:35:05 +0200
commit     10e3de8ad65f98804737e1d3ddb3c26b224d3f33
tree       137a15dc4416f469ec06a2100494b15450f05c06
parent     4b4c34d561c4ba81274c861de47c9807dbe76ca8
Complete the documentation
-rw-r--r--  README      89
-rw-r--r--  lib/mat.py   2
2 files changed, 86 insertions, 5 deletions
diff --git a/README b/README
index 9577afd..2b74d21 100644
--- a/README
+++ b/README
@@ -1,3 +1,16 @@
+METADATA:
+    Metadata consist of information that characterizes data.
+    Metadata are used to provide documentation for data products.
+    In essence, metadata answer who, what, when, where, why, and how about
+    every facet of the data that are being documented.
+
+METADATA AND PRIVACY:
+    Metadata within a file can tell a lot about you.
+    Cameras record data about when a picture was taken and what
+    camera was used. Office documents like pdf or Office automatically adds
+    author and company information to documents and spreadsheets.
+    Maybe you don't want to disclose those informations on the web.
+
 WARNING :
     Mat only remove metadata from your files, it does not anonymise their
     content, nor it can handle watermarking, steganography, or any too custom
@@ -25,9 +38,76 @@ USAGE:
 
 
 SUPPORTED FORMAT:
-    python cli -l
-    or
-    python gui.py -> help -> supported formats
+    Portable Network Graphics (.png)
+    support : full
+    metadata : textual metadata + date
+    method : removal of harmful fields is done with hachoir
+
+
+    Jpeg (.jpeg, .jpg)
+    support : full
+    metadata : comment + exif/photoshop/adobe
+    method : removal of harmful fields is done with hachoir
+
+
+    Open Document (.odt, .odx, .ods, ...)
+    support : full
+    metadata : a meta.xml file
+    method : removal of the meta.xml file
+
+
+    Office Openxml (.docx, .pptx, .xlsx, ...)
+    support : full
+    metadata : a docProps folder containings xml metadata files
+    method : removal of the docProps folder
+
+
+    Portable Document Fileformat (.pdf)
+    support : full
+    metadata : a lot
+    method : rendering of the pdf file on a cairo surface with the help of
+             poppler in order to remove all the internal metadata,
+             then removal of the remaining metadata fields of the pdf itself with
+             pdfrw (the next version of python-cairo will support metadata,
+             so we should get rid of pdfrw)
+
+
+    Tape ARchive (.tar, .tar.bz2, .tar.gz)
+    support : full
+    metadata : metadata from the file itself, metadata from the file contained
+               into the archive, and metadata added by tar to the file at then
+               creation of the archive
+    method : extraction of each file, treatement of the file, add treated file
+             to a new archive, right before the add, remove the metadata added by tar
+             itself. When the new archive is complete, remove all his metadata.
+
+
+    Zip (.zip)
+    support : .partial
+    metadata : metadata from the file itself, metadata from the file contained
+               into the archive, and metadata added by zip to the file when added to
+               the archive.
+
+    method : extraction of each file, treatement of the file, add treated file
+             to a new archive. When the new archive is complete, remove all his metadata
+
+
+    MPEG Audio (.mp3, .mp2, .mp1)
+    support : full
+    metadata : id3
+    method : removal of harmful fields is done with hachoir
+
+
+    Ogg Vorbis (.ogg)
+    support : full
+    metadata : Vorbis
+    method : removal of harmful fields is done with mutagen
+
+
+    Free Lossless Audio Codec (.flac)
+    support : full
+    metadata : Flac, Vorbis
+    method : removal of harmful fields is done with mutagen
 
 
 LICENSE:
@@ -57,4 +137,5 @@ THANKS:
 
 
 KNOWN BUGS:
-    Zipfiles are not totally cleaned
+    Zipfiles are not totally cleaned, I know.
+    I am working on a patch for zipfile.py
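The Tape ARchive, Zip, Open Document and Office Openxml entries in the README diff above share one pattern: read an archive member by member, drop or clean whatever carries metadata, and write a fresh archive. What follows is a minimal sketch of that pattern using only Python's standard `zipfile` and `tarfile` modules. It is not MAT's own code, and the names `strip_zip_members`, `rebuild_tar`, `ZIP_EPOCH` and the `treat` callback are hypothetical. It also does not rewrite files such as ODF's `META-INF/manifest.xml` or OOXML's relationship parts, which may still reference a removed part.

```python
import io
import tarfile
import zipfile

# Hypothetical constant: fixed timestamp given to every rewritten zip member
# (1980-01-01 is the earliest date the zip format can store).
ZIP_EPOCH = (1980, 1, 1, 0, 0, 0)


def strip_zip_members(src: bytes, drop) -> bytes:
    """Copy the zip archive `src` into a new one, skipping every member for
    which drop(name) is true (e.g. ODF's meta.xml, OOXML's docProps/).

    Each kept member gets a fresh ZipInfo: same name, order and compression
    type (ODF expects its uncompressed 'mimetype' entry first), but a fixed
    timestamp and no extra fields or comments carried over."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src)) as zin, \
            zipfile.ZipFile(out, "w") as zout:
        for info in zin.infolist():
            if drop(info.filename):
                continue
            clean = zipfile.ZipInfo(info.filename, date_time=ZIP_EPOCH)
            clean.compress_type = info.compress_type
            zout.writestr(clean, zin.read(info.filename))
    return out.getvalue()


def rebuild_tar(src: bytes, treat=lambda name, data: data) -> bytes:
    """Copy each regular file of the tar archive `src` (plain, .gz or .bz2)
    into a new uncompressed archive. `treat` is a hypothetical per-file
    cleaner. Each member is added with a fresh TarInfo, so the owner, group
    and modification time recorded by tar are reset to neutral defaults."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(src)) as tin, \
            tarfile.open(fileobj=out, mode="w") as tout:
        for member in tin.getmembers():
            if not member.isfile():
                continue  # directories, links, etc. are skipped in this sketch
            data = treat(member.name, tin.extractfile(member).read())
            clean = tarfile.TarInfo(member.name)  # uid/gid 0, empty names, mtime 0
            clean.size = len(data)
            clean.mode = member.mode  # keep permission bits only
            tout.addfile(clean, io.BytesIO(data))
    return out.getvalue()
```

For an Open Document file the call would be `strip_zip_members(data, lambda n: n == "meta.xml")`; for Office Openxml, `strip_zip_members(data, lambda n: n.startswith("docProps/"))`. Building new `ZipInfo`/`TarInfo` objects, rather than copying the originals, is what keeps per-member timestamps and ownership from leaking into the rebuilt archive.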
diff --git a/lib/mat.py b/lib/mat.py
index 23255d5..ad66d92 100644
--- a/lib/mat.py
+++ b/lib/mat.py
@@ -80,7 +80,7 @@ class XMLParser(xml.sax.handler.ContentHandler):
             self.list.append(self.dict.copy())
             self.dict.clear()
         else:
-            content = self.content.replace('\n', ' ')
+            content = self.content.replace('\s', ' ')
             self.dict[self.key] = content
             self.between = False
 
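The only code change in this commit swaps `'\n'` for `'\s'` in the content normalisation of `XMLParser`. Python does not treat `\s` as an escape sequence, so `'\s'` is the two characters backslash and `s`, and `str.replace` matches substrings literally rather than as a regular expression. The sketch below uses hypothetical function names and only the standard library; it contrasts the two literals with a `re.sub` call that collapses all whitespace, in case that is the behaviour wanted.

```python
import re


def replace_newlines(content: str) -> str:
    # Behaviour before this commit: each '\n' becomes a space.
    return content.replace("\n", " ")


def replace_backslash_s(content: str) -> str:
    # Behaviour after this commit: '\s' (written here as "\\s" to avoid an
    # invalid-escape warning) is a literal backslash followed by 's', so
    # str.replace only rewrites that exact two-character sequence.
    return content.replace("\\s", " ")


def collapse_whitespace(content: str) -> str:
    # Regex alternative: every run of spaces, tabs or newlines becomes one space.
    return re.sub(r"\s+", " ", content)


sample = "first line\n\tsecond line"
print(repr(replace_newlines(sample)))     # 'first line \tsecond line'
print(repr(replace_backslash_s(sample)))  # 'first line\n\tsecond line'
print(repr(collapse_whitespace(sample)))  # 'first line second line'
```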