| author | jvoisin | 2011-08-05 22:35:05 +0200 |
|---|---|---|
| committer | jvoisin | 2011-08-05 22:35:05 +0200 |
| commit | 10e3de8ad65f98804737e1d3ddb3c26b224d3f33 | |
| tree | 137a15dc4416f469ec06a2100494b15450f05c06 | |
| parent | 4b4c34d561c4ba81274c861de47c9807dbe76ca8 | |

Complete the documentation

| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | README | 89 |
| -rw-r--r-- | lib/mat.py | 2 |

2 files changed, 86 insertions, 5 deletions
```diff
--- a/README
+++ b/README
@@ -1,3 +1,16 @@
+METADATA:
+Metadata consist of information that characterizes data.
+Metadata are used to provide documentation for data products.
+In essence, metadata answer who, what, when, where, why, and how about
+every facet of the data that are being documented.
+
+METADATA AND PRIVACY:
+Metadata within a file can tell a lot about you.
+Cameras record data about when a picture was taken and what
+camera was used. Document formats such as PDF or Office automatically
+store author and company information in documents and spreadsheets.
+You may not want to disclose that information on the web.
+
 WARNING:
 Mat only removes metadata from your files; it does not anonymise their
 content, nor can it handle watermarking, steganography, or any too custom
```
```diff
@@ -25,9 +38,76 @@ USAGE:
 
 
 SUPPORTED FORMAT:
-python cli -l
-or
-python gui.py -> help -> supported formats
+Portable Network Graphics (.png)
+support : full
+metadata : textual metadata + date
+method : removal of harmful fields is done with hachoir
+
+
```
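MAT delegates PNG cleaning to hachoir. To show what removing "textual metadata + date" amounts to at the byte level, here is a standard-library sketch (not MAT's code), assuming those fields are the `tEXt`, `zTXt`, `iTXt` and `tIME` chunks. Every PNG chunk carries its own CRC, so the surviving chunks can be copied verbatim. `strip_png_text` and `make_chunk` are hypothetical names.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Ancillary chunks assumed to be "harmful": free text and the modification time.
METADATA_CHUNKS = {b"tEXt", b"zTXt", b"iTXt", b"tIME"}


def strip_png_text(data: bytes) -> bytes:
    """Return a copy of a PNG file without its textual and date chunks."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    out = [PNG_SIGNATURE]
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        # Chunk layout: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        end = pos + 12 + length
        if chunk_type not in METADATA_CHUNKS:
            out.append(data[pos:end])  # copied verbatim, its own CRC included
        pos = end
        if chunk_type == b"IEND":
            break
    return b"".join(out)


def make_chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk with a valid CRC (only used by the demo)."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


if __name__ == "__main__":
    # A 1x1 grayscale image carrying an "Author" text chunk.
    png = (PNG_SIGNATURE
           + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
           + make_chunk(b"tEXt", b"Author\x00Alice")
           + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
           + make_chunk(b"IEND", b""))
    clean = strip_png_text(png)
    print(b"Alice" in png, b"Alice" in clean)  # True False
```

Dropping whole chunks, rather than editing them, is what keeps this short: nothing has to be re-encoded or re-checksummed.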
```diff
+JPEG (.jpeg, .jpg)
+support : full
+metadata : comment + EXIF/Photoshop/Adobe
+method : removal of harmful fields is done with hachoir
+
+
```
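A similar stdlib-only sketch for JPEG, again standing in for hachoir rather than reproducing MAT: it walks the marker segments up to the start of scan and drops COM, APP1 (EXIF, also used for XMP), APP13 (Photoshop) and APP14 (Adobe). It assumes a well-formed file without fill bytes between segments. APP14 can also hold colour-transform information that some CMYK JPEGs rely on, so removing it is a trade-off. The function names are hypothetical.

```python
import struct

# Marker bytes (after 0xFF) assumed to carry metadata:
# COM (0xFE), APP1 (0xE1, EXIF/XMP), APP13 (0xED, Photoshop), APP14 (0xEE, Adobe).
METADATA_MARKERS = {0xFE, 0xE1, 0xED, 0xEE}
SOS = 0xDA  # start of scan: compressed image data follows


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Drop metadata segments from a well-formed JPEG byte string."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    out = [data[:2]]  # keep SOI
    pos = 2
    while pos < len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == SOS:
            out.append(data[pos:])  # copy the scan and everything after it verbatim
            break
        # The 2-byte length counts itself but not the two marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        if marker not in METADATA_MARKERS:
            out.append(data[pos:end])
        pos = end
    return b"".join(out)


def segment(marker: int, payload: bytes) -> bytes:
    """Build one marker segment (only used by the demo)."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


if __name__ == "__main__":
    # Synthetic byte string with a JPEG segment layout; not a decodable image.
    fake = (b"\xff\xd8"
            + segment(0xE0, b"JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00")
            + segment(0xE1, b"Exif\x00\x00camera-serial")
            + segment(0xFE, b"shot by Alice")
            + segment(SOS, b"\x00" * 10)
            + b"\x12\x34\xff\xd9")
    clean = strip_jpeg_metadata(fake)
    print(b"Exif" in clean, b"JFIF" in clean)  # False True
```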
```diff
+OpenDocument (.odt, .odx, .ods, ...)
+support : full
+metadata : a meta.xml file
+method : removal of the meta.xml file
+
+
+Office Open XML (.docx, .pptx, .xlsx, ...)
+support : full
+metadata : a docProps folder containing XML metadata files
+method : removal of the docProps folder
+
+
```
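OpenDocument and Office Open XML files are both zip containers, so both methods come down to copying the archive while leaving some members out. The sketch below shows that with Python's `zipfile`; `copy_zip_without`, `strip_odf` and `strip_ooxml` are hypothetical names. It does not edit files that may still reference the removed parts (`META-INF/manifest.xml` for OpenDocument, `[Content_Types].xml` and `_rels/.rels` for Office Open XML).

```python
import io
import zipfile
from typing import Callable


def copy_zip_without(data: bytes, drop: Callable[[str], bool]) -> bytes:
    """Rebuild a zip container, leaving out every member for which drop(name) is true."""
    buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(buf, "w") as dst:
        # Members keep their original order and compression method, so an
        # OpenDocument 'mimetype' entry stays first and uncompressed.
        for info in src.infolist():
            if not drop(info.filename):
                dst.writestr(info, src.read(info.filename))
    return buf.getvalue()


def strip_odf(data: bytes) -> bytes:
    """OpenDocument: remove the meta.xml file."""
    return copy_zip_without(data, lambda name: name == "meta.xml")


def strip_ooxml(data: bytes) -> bytes:
    """Office Open XML: remove everything under the docProps folder."""
    return copy_zip_without(data, lambda name: name.startswith("docProps/"))


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as odt:
        odt.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                     compress_type=zipfile.ZIP_STORED)
        odt.writestr("content.xml", "<office:document-content/>")
        odt.writestr("meta.xml", "<meta:initial-creator>Alice</meta:initial-creator>")
    with zipfile.ZipFile(io.BytesIO(strip_odf(buf.getvalue()))) as cleaned:
        print(cleaned.namelist())  # ['mimetype', 'content.xml']
```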
```diff
+Portable Document Format (.pdf)
+support : full
+metadata : a lot
+method : the PDF file is rendered on a cairo surface with the help of
+poppler in order to remove all the internal metadata; the remaining
+metadata fields of the PDF itself are then removed with pdfrw
+(the next version of python-cairo should support metadata, so pdfrw
+should no longer be needed)
+
+
```
```diff
+Tape ARchive (.tar, .tar.bz2, .tar.gz)
+support : full
+metadata : metadata of the archive itself, metadata of the files contained
+in the archive, and metadata added by tar to each file at the
+creation of the archive
+method : each file is extracted and cleaned, then added to a new archive;
+right before it is added, the metadata added by tar itself is removed.
+Once the new archive is complete, its own metadata is removed.
+
+
```
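The tar method can be sketched with the standard `tarfile` module, assuming the per-file cleaning step is a callback (`clean_file`, hypothetical, identity by default). Each regular file is re-added under a fresh `TarInfo`, whose owner fields default to uid/gid 0 and empty user and group names; the time and mode are set explicitly. For compressed output the gzip header's timestamp is zeroed. Links, directories and special files are simply skipped here, which MAT may handle differently.

```python
import gzip
import io
import tarfile
from typing import Callable


def keep_as_is(name: str, content: bytes) -> bytes:
    """Placeholder for the per-file cleaning step (hypothetical)."""
    return content


def rebuild_tar(data: bytes, clean_file: Callable[[str, bytes], bytes] = keep_as_is,
                gzip_output: bool = False) -> bytes:
    """Copy the regular files of a tar archive into a new one with neutral headers."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as src, \
            tarfile.open(fileobj=buf, mode="w") as dst:
        for member in src.getmembers():
            if not member.isfile():
                continue  # directories, links and devices are skipped in this sketch
            content = clean_file(member.name, src.extractfile(member).read())
            # A fresh TarInfo does not inherit uid/gid/uname/gname from the source.
            info = tarfile.TarInfo(member.name)
            info.size = len(content)
            info.mtime = 0
            info.mode = 0o644
            dst.addfile(info, io.BytesIO(content))
    tar_bytes = buf.getvalue()
    if gzip_output:
        # mtime=0 keeps the gzip header free of a timestamp (Python 3.8+).
        return gzip.compress(tar_bytes, mtime=0)
    return tar_bytes
```

Usage: `rebuild_tar(open("in.tar.gz", "rb").read(), gzip_output=True)` returns the bytes of a cleaned `.tar.gz`; `"r:*"` lets `tarfile` detect the input compression.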
```diff
+Zip (.zip)
+support : partial
+metadata : metadata of the archive itself, metadata of the files contained
+in the archive, and metadata added by zip to each file when it is added to
+the archive.
+
+method : each file is extracted and cleaned, then added to a new archive.
+Once the new archive is complete, its own metadata is removed.
+
+
```
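The zip method follows the same pattern with `zipfile`. In this sketch (function names hypothetical), every file is re-added under a fresh `ZipInfo` with a fixed 1980-01-01 timestamp, the creator-system field set to 0, and no per-entry comment or extra field; the archive comment is left empty. It makes no claim about whatever keeps MAT's own zip support at "partial".

```python
import io
import zipfile
from typing import Callable

NEUTRAL_DATE = (1980, 1, 1, 0, 0, 0)  # earliest timestamp a zip entry can hold


def keep_as_is(name: str, content: bytes) -> bytes:
    """Placeholder for the per-file cleaning step (hypothetical)."""
    return content


def rebuild_zip(data: bytes, clean_file: Callable[[str, bytes], bytes] = keep_as_is) -> bytes:
    """Copy the files of a zip archive into a new one with neutral per-entry headers."""
    buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(buf, "w") as dst:
        for old in src.infolist():
            if old.is_dir():
                continue  # directory entries are skipped in this sketch
            content = clean_file(old.filename, src.read(old.filename))
            # A fresh ZipInfo carries no comment and no extra field.
            info = zipfile.ZipInfo(old.filename, date_time=NEUTRAL_DATE)
            info.create_system = 0  # do not record the host operating system
            info.compress_type = zipfile.ZIP_DEFLATED
            dst.writestr(info, content)
        # dst.comment is never set, so the new archive has no archive comment.
    return buf.getvalue()
```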
```diff
+MPEG Audio (.mp3, .mp2, .mp1)
+support : full
+metadata : ID3
+method : removal of harmful fields is done with hachoir
+
+
```
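The layout of the two common ID3 versions is simple enough to show without hachoir. This stdlib sketch (function names hypothetical) drops an ID3v2 tag at the start of the file, whose size is a "syncsafe" integer with 7 significant bits per byte, and a 128-byte ID3v1 tag at the end. It ignores other tag types, such as APE tags or ID3v2 tags appended at the end of the file.

```python
def syncsafe_to_int(raw: bytes) -> int:
    """Decode an ID3v2 syncsafe integer: 7 significant bits per byte."""
    value = 0
    for byte in raw:
        value = (value << 7) | (byte & 0x7F)
    return value


def strip_id3(data: bytes) -> bytes:
    """Remove a leading ID3v2 tag and a trailing ID3v1 tag, if present."""
    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4-byte syncsafe size.
    if len(data) >= 10 and data[:3] == b"ID3":
        flags = data[5]
        size = syncsafe_to_int(data[6:10])  # excludes the 10-byte header
        footer = 10 if flags & 0x10 else 0  # optional ID3v2.4 footer
        data = data[10 + size + footer:]
    # ID3v1: exactly 128 bytes at the end of the file, starting with "TAG".
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        data = data[:-128]
    return data


if __name__ == "__main__":
    frame = b"TPE1" + b"\x00\x00\x00\x06" + b"\x00\x00" + b"\x00Alice"  # artist frame
    id3v2 = b"ID3\x03\x00\x00" + bytes([0, 0, 0, len(frame)]) + frame
    id3v1 = b"TAG" + b"\x00" * 125
    audio = b"\xff\xfb\x90\x00" + b"\x00" * 413  # stand-in for MPEG frames
    print(strip_id3(id3v2 + audio + id3v1) == audio)  # True
```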
```diff
+Ogg Vorbis (.ogg)
+support : full
+metadata : Vorbis comments
+method : removal of harmful fields is done with mutagen
+
+
+Free Lossless Audio Codec (.flac)
+support : full
+metadata : FLAC, Vorbis comments
+method : removal of harmful fields is done with mutagen
```
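In a FLAC stream the metadata sits in a chain of blocks between the `fLaC` marker and the audio frames. MAT uses mutagen here; as a stdlib illustration only, this sketch drops the VORBIS_COMMENT (type 4) and PICTURE (type 6) blocks, treating embedded cover art as metadata by assumption, and moves the "last block" flag onto the final surviving block. `strip_flac_tags` is a hypothetical name.

```python
FLAC_MARKER = b"fLaC"
# Metadata block types assumed to carry tags: 4 = VORBIS_COMMENT, 6 = PICTURE.
STRIP_TYPES = {4, 6}


def strip_flac_tags(data: bytes) -> bytes:
    """Drop Vorbis comment and picture blocks from a FLAC stream."""
    if data[:4] != FLAC_MARKER:
        raise ValueError("not a FLAC stream")
    pos = 4
    kept = []  # (block_type, body) of every block that survives
    while True:
        # Block header: 1-bit "last block" flag, 7-bit type, 24-bit big-endian length.
        header = data[pos]
        block_type = header & 0x7F
        length = int.from_bytes(data[pos + 1:pos + 4], "big")
        if block_type not in STRIP_TYPES:
            kept.append((block_type, data[pos + 4:pos + 4 + length]))
        pos += 4 + length
        if header & 0x80:
            break
    out = [FLAC_MARKER]
    for index, (block_type, body) in enumerate(kept):
        # Only the final kept block may carry the "last block" flag.
        last = 0x80 if index == len(kept) - 1 else 0
        out.append(bytes([last | block_type]) + len(body).to_bytes(3, "big") + body)
    out.append(data[pos:])  # audio frames are copied untouched
    return b"".join(out)


if __name__ == "__main__":
    def block(block_type, body, last=False):
        return bytes([(0x80 if last else 0) | block_type]) + len(body).to_bytes(3, "big") + body

    comment = b"\x05\x00\x00\x00vendr\x01\x00\x00\x00\x0c\x00\x00\x00ARTIST=Alice"
    flac = FLAC_MARKER + block(0, b"\x00" * 34) + block(4, comment, last=True) + b"\xff\xf8"
    clean = strip_flac_tags(flac)
    print(clean == FLAC_MARKER + block(0, b"\x00" * 34, last=True) + b"\xff\xf8")  # True
```

STREAMINFO (type 0) must come first and is never dropped, so at least one block always remains to carry the "last block" flag.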
```diff
 
 
 LICENSE:
@@ -57,4 +137,5 @@ THANKS:
 
 
 KNOWN BUGS:
-Zipfiles are not totally cleaned
+Zip files are not fully cleaned, I know.
+I am working on a patch for zipfile.py.
```
```diff
--- a/lib/mat.py
+++ b/lib/mat.py
@@ -80,7 +80,7 @@ class XMLParser(xml.sax.handler.ContentHandler):
             self.list.append(self.dict.copy())
             self.dict.clear()
         else:
-            content = self.content.replace('\n', ' ')
+            content = self.content.replace('\s', ' ')
             self.dict[self.key] = content
             self.between = False
 
```
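In the `lib/mat.py` hunk, the new line passes `'\s'` to `str.replace`. Python's `str.replace` matches its first argument as a literal substring, not as a regular expression, and `'\s'` is not a recognised string escape, so it stays a two-character string (a backslash followed by `s`). The new call therefore no longer touches newlines, which the old `'\n'` version did replace. If the intent was to collapse all whitespace, `re.sub` does that; the snippet below contrasts the two. `normalize_whitespace` is a hypothetical helper, and its `.strip()` is an extra choice of this sketch.

```python
import re

text = "Portable Network\nGraphics"

# str.replace matches its first argument literally. r"\s" is a backslash
# followed by "s", the same value as the '\s' literal in the diff.
literal = text.replace(r"\s", " ")


def normalize_whitespace(value: str) -> str:
    """Collapse each run of whitespace into one space and trim the ends (hypothetical helper)."""
    return re.sub(r"\s+", " ", value).strip()


print(repr(literal))                     # 'Portable Network\nGraphics'
print(repr(normalize_whitespace(text)))  # 'Portable Network Graphics'
```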
