Diffstat (limited to 'doc')
| -rw-r--r-- | doc/implementation_notes.md | 18 |
| -rw-r--r-- | doc/mat2.1 | 6 |
| -rw-r--r-- | doc/threat_model.md | 24 |
3 files changed, 24 insertions, 24 deletions
diff --git a/doc/implementation_notes.md b/doc/implementation_notes.md
index 7555d2e..e298646 100644
--- a/doc/implementation_notes.md
+++ b/doc/implementation_notes.md
@@ -4,7 +4,7 @@ Implementation notes
| 4 | Lightweight cleaning mode | 4 | Lightweight cleaning mode |
| 5 | ------------------------- | 5 | ------------------------- |
| 6 | | 6 | |
| 7 | Due to *popular* request, MAT2 is providing a *lightweight* cleaning mode, | 7 | Due to *popular* request, mat2 is providing a *lightweight* cleaning mode, |
| 8 | that only cleans the superficial metadata of your file, but not | 8 | that only cleans the superficial metadata of your file, but not |
| 9 | the ones that might be in **embedded** resources. Like for example, | 9 | the ones that might be in **embedded** resources. Like for example, |
| 10 | images in a PDF or an office document. | 10 | images in a PDF or an office document. |
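The lightweight mode described in this hunk amounts to cleaning only the top-level container and not recursing into embedded resources. A minimal Python sketch of that distinction, assuming a hypothetical `Resource` tree (this is not mat2's actual parser hierarchy):

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    """Hypothetical model of a file: its own metadata plus embedded resources."""
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)
    embedded: List["Resource"] = field(default_factory=list)


def clean(resource: Resource, lightweight: bool = False) -> Resource:
    """Return a cleaned copy of `resource`; the input object is not modified.

    Top-level metadata is always dropped. Embedded resources (for example
    images inside a PDF or an office document) are only cleaned when
    `lightweight` is False; in lightweight mode they are kept untouched.
    """
    if lightweight:
        children = list(resource.embedded)
    else:
        children = [clean(child, lightweight=False) for child in resource.embedded]
    return Resource(name=resource.name, metadata={}, embedded=children)
```

With `lightweight=True`, an image embedded in a document keeps its own metadata, which is exactly the trade-off this mode accepts.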
@@ -19,7 +19,7 @@ are entirely removed.
| 19 | deleted. For example journalists that are editing a document to erase | 19 | deleted. For example journalists that are editing a document to erase |
| 20 | mentions sources mentions. | 20 | mentions sources mentions. |
| 21 | | 21 | |
| 22 | - Or they are aware of it, and will likely not expect MAT2 to be able to keep | 22 | - Or they are aware of it, and will likely not expect mat2 to be able to keep |
| 23 | the revisions, that are basically traces about how, when and who edited the | 23 | the revisions, that are basically traces about how, when and who edited the |
| 24 | document. | 24 | document. |
| 25 | | 25 | |
@@ -27,15 +27,15 @@ are entirely removed.
| 27 | Race conditions | 27 | Race conditions |
| 28 | --------------- | 28 | --------------- |
| 29 | | 29 | |
| 30 | MAT2 does its very best to avoid crashing at runtime. This is why it's checking | 30 | mat2 does its very best to avoid crashing at runtime. This is why it's checking |
| 31 | if the file is valid __at parser creation__. MAT2 doesn't take any measure to | 31 | if the file is valid __at parser creation__. mat2 doesn't take any measure to |
| 32 | ensure that the file is not changed between the time the parser is | 32 | ensure that the file is not changed between the time the parser is |
| 33 | instantiated, and the call to clean or show the metadata. | 33 | instantiated, and the call to clean or show the metadata. |
| 34 | | 34 | |
| 35 | Symlink attacks | 35 | Symlink attacks |
| 36 | --------------- | 36 | --------------- |
| 37 | | 37 | |
| 38 | MAT2 output predictable filenames (like yourfile.jpg.cleaned). | 38 | mat2 output predictable filenames (like yourfile.jpg.cleaned). |
| 39 | This may lead to symlink attack. Please check if you OS prevent | 39 | This may lead to symlink attack. Please check if you OS prevent |
| 40 | against them | 40 | against them |
| 41 | | 41 | |
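The "Symlink attacks" note in this hunk concerns predictable output names such as `yourfile.jpg.cleaned`: if an attacker pre-creates a symlink at that path, a plain `open(path, "w")` follows it and overwrites the link target. One common mitigation, sketched here in Python and not taken from mat2's code, is to create the output with `O_CREAT | O_EXCL`, which fails if anything, including a symlink, already exists at that path. The helper name `write_cleaned` is hypothetical.

```python
import os


def write_cleaned(input_path: str, data: bytes) -> str:
    """Write `data` to `<input_path>.cleaned` without following a planted symlink.

    With O_CREAT | O_EXCL, os.open() raises FileExistsError if the output
    path already exists; POSIX specifies that a symlink at that path is not
    followed in this case, so the write cannot be redirected.
    """
    output_path = input_path + ".cleaned"
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    fd = os.open(output_path, flags, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return output_path
```

The cost of this choice is that an existing `.cleaned` file from a previous run makes the call fail instead of being overwritten; the caller has to remove it or pick another name explicitly.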
@@ -65,10 +65,10 @@ didn't remove any *deep metadata*, like the ones in embedded pictures. This was
| 65 | on of the reason MAT was abandoned: the absence of satisfying solution to | 65 | on of the reason MAT was abandoned: the absence of satisfying solution to |
| 66 | handle PDF. But apparently, people are ok with [pdf redact | 66 | handle PDF. But apparently, people are ok with [pdf redact |
| 67 | tools](https://github.com/firstlookmedia/pdf-redact-tools), that simply | 67 | tools](https://github.com/firstlookmedia/pdf-redact-tools), that simply |
| 68 | transform the PDF into images. So this is what's MAT2 is doing too. | 68 | transform the PDF into images. So this is what's mat2 is doing too. |
| 69 | | 69 | |
| 70 | Of course, it would be possible to detect images in PDf file, and process them | 70 | Of course, it would be possible to detect images in PDf file, and process them |
| 71 | with MAT2, but since a PDF can contain a lot of things, like images, videos, | 71 | with mat2, but since a PDF can contain a lot of things, like images, videos, |
| 72 | javascript, pdf, blobs, … this is the easiest and safest way to clean them. | 72 | javascript, pdf, blobs, … this is the easiest and safest way to clean them. |
| 73 | | 73 | |
| 74 | Images handling | 74 | Images handling |
@@ -81,7 +81,7 @@ XML attacks
| 81 | ----------- | 81 | ----------- |
| 82 | | 82 | |
| 83 | Since our threat model conveniently excludes files crafted to specifically | 83 | Since our threat model conveniently excludes files crafted to specifically |
| 84 | bypass MAT2, fileformats containing harmful XML are out of our scope. | 84 | bypass mat2, fileformats containing harmful XML are out of our scope. |
| 85 | But since MAT2 is using [etree](https://docs.python.org/3/library/xml.html#xml-vulnerabilities) | 85 | But since mat2 is using [etree](https://docs.python.org/3/library/xml.html#xml-vulnerabilities) |
| 86 | to process XML, it's "only" vulnerable to DoS, and not memory corruption: | 86 | to process XML, it's "only" vulnerable to DoS, and not memory corruption: |
| 87 | odds are that the user will notice that the cleaning didn't succeed. | 87 | odds are that the user will notice that the cleaning didn't succeed. |
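The "XML attacks" note above relies on etree being limited to denial of service, for instance entity-expansion ("billion laughs") payloads. A caller that prefers to refuse such inputs outright could reject any document type declaration before handing the bytes to ElementTree, similar in spirit to the `forbid_dtd` option of the third-party defusedxml package. A minimal stdlib-only sketch; `parse_without_dtd` and `DTDForbidden` are hypothetical names, not part of mat2:

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when an XML document carries a <!DOCTYPE ...> declaration."""


def parse_without_dtd(data: bytes) -> ET.Element:
    """Parse `data` with ElementTree, refusing any DOCTYPE declaration.

    Custom entities can only be declared in a DTD, so a pre-pass with a
    bare expat parser that raises on the first DOCTYPE stops entity
    expansion payloads before ElementTree ever sees them.
    """
    checker = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    checker.StartDoctypeDeclHandler = on_doctype
    # Exceptions raised inside expat handlers propagate out of Parse().
    checker.Parse(data, True)
    return ET.fromstring(data)
```

The trade-off is a second parse of every accepted document, and legitimate files that declare a DTD are rejected too.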
diff --git a/doc/mat2.1 b/doc/mat2.1
--- a/doc/mat2.1
+++ b/doc/mat2.1
@@ -1,4 +1,4 @@
| 1 | .TH MAT2 "1" "May 2019" "MAT2 0.9.0" "User Commands" | 1 | .TH mat2 "1" "May 2019" "mat2 0.9.0" "User Commands" |
| 2 | | 2 | |
| 3 | .SH NAME | 3 | .SH NAME |
| 4 | mat2 \- the metadata anonymisation toolkit 2 | 4 | mat2 \- the metadata anonymisation toolkit 2 |
@@ -32,7 +32,7 @@ show program's version number and exit
| 32 | list all supported fileformats | 32 | list all supported fileformats |
| 33 | .TP | 33 | .TP |
| 34 | \fB\-\-check\-dependencies\fR | 34 | \fB\-\-check\-dependencies\fR |
| 35 | check if MAT2 has all the dependencies it needs | 35 | check if mat2 has all the dependencies it needs |
| 36 | .TP | 36 | .TP |
| 37 | \fB\-V\fR, \fB\-\-verbose\fR | 37 | \fB\-V\fR, \fB\-\-verbose\fR |
| 38 | show more verbose status information | 38 | show more verbose status information |
@@ -41,7 +41,7 @@ show more verbose status information
| 41 | how to handle unknown members of archive-style files (policy should be one of: abort, omit, keep) | 41 | how to handle unknown members of archive-style files (policy should be one of: abort, omit, keep) |
| 42 | .TP | 42 | .TP |
| 43 | \fB\-s\fR, \fB\-\-show\fR | 43 | \fB\-s\fR, \fB\-\-show\fR |
| 44 | list harmful metadata detectable by MAT2 without | 44 | list harmful metadata detectable by mat2 without |
| 45 | removing them | 45 | removing them |
| 46 | .TP | 46 | .TP |
| 47 | \fB\-L\fR, \fB\-\-lightweight\fR | 47 | \fB\-L\fR, \fB\-\-lightweight\fR |
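The man page hunk above mentions the policy for unknown members of archive-style files (abort, omit or keep), and the threat model requires a warning when an unknown format is met. A minimal Python sketch of how such a policy could be applied to an archive's member list; the `SUPPORTED_SUFFIXES` set and the `select_members` helper are hypothetical and do not reflect mat2's real list of supported formats.

```python
import warnings
from pathlib import PurePosixPath
from typing import Iterable, List

# Hypothetical set of member types the cleaner knows how to handle.
SUPPORTED_SUFFIXES = {".xml", ".png", ".jpg", ".jpeg"}
POLICIES = {"abort", "omit", "keep"}


class UnknownMemberError(Exception):
    """Raised under the `abort` policy when an unsupported member is found."""


def select_members(names: Iterable[str], policy: str) -> List[str]:
    """Return the archive members to write to the cleaned archive.

    Unknown members trigger a warning, or an exception under `abort`;
    `omit` drops them and `keep` copies them as-is, uncleaned.
    """
    if policy not in POLICIES:
        raise ValueError(f"policy should be one of: {', '.join(sorted(POLICIES))}")
    selected = []
    for name in names:
        if name.endswith("/"):
            # Directory entries carry no content of their own; skip them.
            continue
        if PurePosixPath(name).suffix.lower() in SUPPORTED_SUFFIXES:
            selected.append(name)
            continue
        if policy == "abort":
            raise UnknownMemberError(f"unknown member: {name}")
        warnings.warn(f"unknown member {name!r}, policy={policy}")
        if policy == "keep":
            selected.append(name)
    return selected
```

Usage with a real archive could look like `select_members(zipfile.ZipFile(path).namelist(), "omit")`. Deciding on the member list before writing anything means `abort` never leaves a half-written output archive behind.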
diff --git a/doc/threat_model.md b/doc/threat_model.md
index 31bfe91..8b97c67 100644
--- a/doc/threat_model.md
+++ b/doc/threat_model.md
@@ -3,7 +3,7 @@ Threat Model
| 3 | | 3 | |
| 4 | The Metadata Anonymisation Toolkit 2 adversary has a number | 4 | The Metadata Anonymisation Toolkit 2 adversary has a number |
| 5 | of goals, capabilities, and counter-attack types that can be | 5 | of goals, capabilities, and counter-attack types that can be |
| 6 | used to guide us towards a set of requirements for the MAT2. | 6 | used to guide us towards a set of requirements for the mat2. |
| 7 | | 7 | |
| 8 | This is an overhaul of MAT's (the first iteration of the software) one. | 8 | This is an overhaul of MAT's (the first iteration of the software) one. |
| 9 | | 9 | |
@@ -53,7 +53,7 @@ Adversary
| 53 | user. This is the strongest position for the adversary to | 53 | user. This is the strongest position for the adversary to |
| 54 | have. In this case, the adversary is capable of inserting | 54 | have. In this case, the adversary is capable of inserting |
| 55 | arbitrary, custom watermarks specifically for tracking | 55 | arbitrary, custom watermarks specifically for tracking |
| 56 | the user. In general, MAT2 cannot defend against this | 56 | the user. In general, mat2 cannot defend against this |
| 57 | adversary, but we list it for completeness' sake. | 57 | adversary, but we list it for completeness' sake. |
| 58 | | 58 | |
| 59 | - The adversary created the document for a group of users. | 59 | - The adversary created the document for a group of users. |
@@ -65,7 +65,7 @@ Adversary
| 65 | - The adversary did not create the document, the weakest | 65 | - The adversary did not create the document, the weakest |
| 66 | position for the adversary to have. The file format is | 66 | position for the adversary to have. The file format is |
| 67 | (most of the time) standard, nothing custom is added: | 67 | (most of the time) standard, nothing custom is added: |
| 68 | MAT2 must be able to remove all metadata from the file. | 68 | mat2 must be able to remove all metadata from the file. |
| 69 | | 69 | |
| 70 | | 70 | |
| 71 | Requirements | 71 | Requirements |
@@ -73,28 +73,28 @@ Requirements
| 73 | | 73 | |
| 74 | * Processing | 74 | * Processing |
| 75 | | 75 | |
| 76 | - MAT2 *should* avoid interactions with information. | 76 | - mat2 *should* avoid interactions with information. |
| 77 | Its goal is to remove metadata, and the user is solely | 77 | Its goal is to remove metadata, and the user is solely |
| 78 | responsible for the information of the file. | 78 | responsible for the information of the file. |
| 79 | | 79 | |
| 80 | - MAT2 *must* warn when encountering an unknown | 80 | - mat2 *must* warn when encountering an unknown |
| 81 | format. For example, in a zipfile, if MAT2 encounters an | 81 | format. For example, in a zipfile, if mat2 encounters an |
| 82 | unknown format, it should warn the user, and ask if the | 82 | unknown format, it should warn the user, and ask if the |
| 83 | file should be added to the anonymised archive that is | 83 | file should be added to the anonymised archive that is |
| 84 | produced. | 84 | produced. |
| 85 | | 85 | |
| 86 | - MAT2 *must* not add metadata, since its purpose is to | 86 | - mat2 *must* not add metadata, since its purpose is to |
| 87 | anonymise files: every added items of metadata decreases | 87 | anonymise files: every added items of metadata decreases |
| 88 | anonymity. | 88 | anonymity. |
| 89 | | 89 | |
| 90 | - MAT2 *should* handle unknown/hidden metadata fields, | 90 | - mat2 *should* handle unknown/hidden metadata fields, |
| 91 | like proprietary extensions of open formats. | 91 | like proprietary extensions of open formats. |
| 92 | | 92 | |
| 93 | - MAT2 *must not* fail silently. Upon failure, | 93 | - mat2 *must not* fail silently. Upon failure, |
| 94 | MAT2 *must not* modify the file in any way. | 94 | mat2 *must not* modify the file in any way. |
| 95 | | 95 | |
| 96 | - MAT2 *might* leak the fact that MAT2 was used on the file, | 96 | - mat2 *might* leak the fact that mat2 was used on the file, |
| 97 | since it might be uncommon for some file formats to come | 97 | since it might be uncommon for some file formats to come |
| 98 | without any kind of metadata, an adversary might suspect that | 98 | without any kind of metadata, an adversary might suspect that |
| 99 | the user used MAT2 on certain files. | 99 | the user used mat2 on certain files. |
| 100 | | 100 | |
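The processing requirements above say mat2 must not fail silently and, upon failure, must not modify the file in any way. One way to honour both, sketched here in Python under the assumption of a hypothetical `clean_bytes` callable (not a mat2 function), is to only ever read the input, write the result to a temporary file in the output directory, and move it into place with `os.replace()` once cleaning has succeeded. Any exception is re-raised rather than swallowed.

```python
import os
import tempfile
from typing import Callable


def clean_to_output(input_path: str, output_path: str,
                    clean_bytes: Callable[[bytes], bytes]) -> None:
    """Clean `input_path` into `output_path`, all-or-nothing.

    The input file is only read. The output appears only once
    `clean_bytes` (a hypothetical cleaning function) has succeeded; on any
    error the temporary file is removed and the exception propagates, so
    the failure is never silent.
    """
    with open(input_path, "rb") as handle:
        original = handle.read()
    directory = os.path.dirname(os.path.abspath(output_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(clean_bytes(original))
        # Same directory as the output, so the rename is atomic on POSIX.
        os.replace(tmp_path, output_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Creating the temporary file next to the output, rather than in the system temp directory, keeps `os.replace()` on a single filesystem, which is what makes the final rename atomic.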
