| author | georg | 2019-11-28 02:15:20 +0000 |
|---|---|---|
| committer | jvoisin | 2019-11-30 01:14:41 -0800 |
| commit | 697cb36b814d7e01da336c43b1932264302a2528 (patch) | |
| tree | c14075024469adf5aa30614c95173cada156229c /doc/implementation_notes.md | |
| parent | 6e52661cfb4e79a76a6ff80637d5adf495a15479 (diff) | |
This is mat2, not MAT2
Closes #131
Diffstat
| -rw-r--r-- | doc/implementation_notes.md | 18 |
1 file changed, 9 insertions, 9 deletions
diff --git a/doc/implementation_notes.md b/doc/implementation_notes.md
index 7555d2e..e298646 100644
--- a/doc/implementation_notes.md
+++ b/doc/implementation_notes.md
| @@ -4,7 +4,7 @@ Implementation notes | |||
| 4 | Lightweight cleaning mode | 4 | Lightweight cleaning mode |
| 5 | ------------------------- | 5 | ------------------------- |
| 6 | 6 | ||
| 7 | Due to *popular* request, MAT2 is providing a *lightweight* cleaning mode, | 7 | Due to *popular* request, mat2 is providing a *lightweight* cleaning mode, |
| 8 | that only cleans the superficial metadata of your file, but not | 8 | that only cleans the superficial metadata of your file, but not |
| 9 | the ones that might be in **embedded** resources. Like for example, | 9 | the ones that might be in **embedded** resources. Like for example, |
| 10 | images in a PDF or an office document. | 10 | images in a PDF or an office document. |
| @@ -19,7 +19,7 @@ are entirely removed. | |||
| 19 | deleted. For example journalists that are editing a document to erase | 19 | deleted. For example journalists that are editing a document to erase |
| 20 | mentions sources mentions. | 20 | mentions sources mentions. |
| 21 | 21 | ||
| 22 | - Or they are aware of it, and will likely not expect MAT2 to be able to keep | 22 | - Or they are aware of it, and will likely not expect mat2 to be able to keep |
| 23 | the revisions, that are basically traces about how, when and who edited the | 23 | the revisions, that are basically traces about how, when and who edited the |
| 24 | document. | 24 | document. |
| 25 | 25 | ||
| @@ -27,15 +27,15 @@ are entirely removed. | |||
| 27 | Race conditions | 27 | Race conditions |
| 28 | --------------- | 28 | --------------- |
| 29 | 29 | ||
| 30 | MAT2 does its very best to avoid crashing at runtime. This is why it's checking | 30 | mat2 does its very best to avoid crashing at runtime. This is why it's checking |
| 31 | if the file is valid __at parser creation__. MAT2 doesn't take any measure to | 31 | if the file is valid __at parser creation__. mat2 doesn't take any measure to |
| 32 | ensure that the file is not changed between the time the parser is | 32 | ensure that the file is not changed between the time the parser is |
| 33 | instantiated, and the call to clean or show the metadata. | 33 | instantiated, and the call to clean or show the metadata. |
| 34 | 34 | ||
| 35 | Symlink attacks | 35 | Symlink attacks |
| 36 | --------------- | 36 | --------------- |
| 37 | 37 | ||
| 38 | MAT2 output predictable filenames (like yourfile.jpg.cleaned). | 38 | mat2 output predictable filenames (like yourfile.jpg.cleaned). |
| 39 | This may lead to symlink attack. Please check if you OS prevent | 39 | This may lead to symlink attack. Please check if you OS prevent |
| 40 | against them | 40 | against them |
| 41 | 41 | ||
| @@ -65,10 +65,10 @@ didn't remove any *deep metadata*, like the ones in embedded pictures. This was | |||
| 65 | on of the reason MAT was abandoned: the absence of satisfying solution to | 65 | on of the reason MAT was abandoned: the absence of satisfying solution to |
| 66 | handle PDF. But apparently, people are ok with [pdf redact | 66 | handle PDF. But apparently, people are ok with [pdf redact |
| 67 | tools](https://github.com/firstlookmedia/pdf-redact-tools), that simply | 67 | tools](https://github.com/firstlookmedia/pdf-redact-tools), that simply |
| 68 | transform the PDF into images. So this is what's MAT2 is doing too. | 68 | transform the PDF into images. So this is what's mat2 is doing too. |
| 69 | 69 | ||
| 70 | Of course, it would be possible to detect images in PDf file, and process them | 70 | Of course, it would be possible to detect images in PDf file, and process them |
| 71 | with MAT2, but since a PDF can contain a lot of things, like images, videos, | 71 | with mat2, but since a PDF can contain a lot of things, like images, videos, |
| 72 | javascript, pdf, blobs, … this is the easiest and safest way to clean them. | 72 | javascript, pdf, blobs, … this is the easiest and safest way to clean them. |
| 73 | 73 | ||
| 74 | Images handling | 74 | Images handling |
| @@ -81,7 +81,7 @@ XML attacks | |||
| 81 | ----------- | 81 | ----------- |
| 82 | 82 | ||
| 83 | Since our threat model conveniently excludes files crafted to specifically | 83 | Since our threat model conveniently excludes files crafted to specifically |
| 84 | bypass MAT2, fileformats containing harmful XML are out of our scope. | 84 | bypass mat2, fileformats containing harmful XML are out of our scope. |
| 85 | But since MAT2 is using [etree](https://docs.python.org/3/library/xml.html#xml-vulnerabilities) | 85 | But since mat2 is using [etree](https://docs.python.org/3/library/xml.html#xml-vulnerabilities) |
| 86 | to process XML, it's "only" vulnerable to DoS, and not memory corruption: | 86 | to process XML, it's "only" vulnerable to DoS, and not memory corruption: |
| 87 | odds are that the user will notice that the cleaning didn't succeed. | 87 | odds are that the user will notice that the cleaning didn't succeed. |
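The "Symlink attacks" note says mat2 writes predictable names such as `yourfile.jpg.cleaned` and leaves protection to the OS. As an illustration only (this is not mat2's code), the Python sketch below shows one common mitigation: create the output with `O_CREAT | O_EXCL`, which makes `open()` fail with `FileExistsError` if anything, including a symlink, already sits at that path. The helper name `write_cleaned_output` is hypothetical.

```python
import os
import tempfile


def write_cleaned_output(path: str, data: bytes) -> None:
    """Hypothetical helper: write `data` to `path` only if nothing exists there yet.

    With O_CREAT | O_EXCL, open() fails with FileExistsError when `path` already
    exists, even if it is a (possibly dangling) symlink, so a link pre-planted
    at a predictable name like `yourfile.jpg.cleaned` is never followed.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    # O_NOFOLLOW is POSIX-only; it is redundant with O_EXCL for the last path
    # component, but documents the intent where the platform provides it.
    flags |= getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        victim = os.path.join(d, "victim.txt")
        with open(victim, "wb") as f:
            f.write(b"original")
        # Simulate an attacker planting a symlink at the predictable output name.
        planted = os.path.join(d, "yourfile.jpg.cleaned")
        os.symlink(victim, planted)
        try:
            write_cleaned_output(planted, b"cleaned")
        except FileExistsError:
            print("refused to follow the planted symlink")
```

This only guards the final path component: a symlink swapped in for one of the parent directories is not covered by `O_EXCL`.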
