[Workers] How to receive emails and get attachment

Hi, how could I create a simple worker which receives emails and uploads a PDF attachment to a server?
31 Replies
Cyb3r-Jak3 7mo ago
And an example email worker would be https://github.com/edevil/email_worker_parser and then you just take the attachments from it and upload to your server
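A minimal sketch of the "take the attachments and upload them" step, assuming the email has already been parsed with postal-mime as in the linked repository. The attachment shape used here (filename, mimeType, content) follows the postal-mime README and should be checked against the version you install. pickPdfAttachments, uploadPdfs, the x-filename header and the upload URL are hypothetical names for illustration only.

```javascript
// Hypothetical helper: keep only attachments that look like PDFs.
// Assumes postal-mime-style attachment objects: { filename, mimeType, content }.
function pickPdfAttachments(parsedEmail) {
  const attachments = parsedEmail.attachments ?? [];
  return attachments.filter((attachment) => {
    const mimeType = (attachment.mimeType ?? "").toLowerCase();
    const filename = (attachment.filename ?? "").toLowerCase();
    return mimeType === "application/pdf" || filename.endsWith(".pdf");
  });
}

// Hypothetical upload step: POST each PDF to a server you control.
// fetchImpl is injectable so the function can be exercised without a network.
async function uploadPdfs(parsedEmail, uploadUrl, fetchImpl = fetch) {
  const pdfs = pickPdfAttachments(parsedEmail);
  for (const pdf of pdfs) {
    await fetchImpl(uploadUrl, {
      method: "POST",
      headers: {
        "content-type": "application/pdf",
        // Assumed header name; use whatever your server expects.
        "x-filename": encodeURIComponent(pdf.filename ?? "attachment.pdf"),
      },
      body: pdf.content,
    });
  }
  return pdfs.length; // number of PDFs sent
}
```

Matching on both the MIME type and the file extension is a deliberate choice here, since some mail clients label PDFs as application/octet-stream.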
ThatOtherAndrew 7mo ago
Forgive me for the incredibly basic question, but I can't resolve require("postal-mime"). Is there some way to do that within the basic browser editing environment, or would I need a local project of some kind which I can then publish? Alternatively, looking at the postal-mime page on the NPM site, the following example is used:
const email = await parser.parse(`Subject: My awesome email 🤓
Content-Type: text/html; charset=utf-8

<p>Hello world 😵‍💫</p>`);
Which, along with the repository example above, suggests to me that the attachment data is somehow encoded into event.raw. Would it be sensible to HTTP POST the entire content of event.raw to my webserver and do the parsing there, in an environment a bit less restrictive (and which I'm much more comfortable with) than Workers?
slight bump!
DaniFoldi 7mo ago
The quick editor and the playground don't currently support importing npm modules, so you'd have to work on it locally and use wrangler deploy to publish to the edge.
ThatOtherAndrew 7mo ago
I see, thanks. Now, I'm incredibly lazy, so just a quick question: would there be any harm in just making event.raw the body of an HTTP request and posting that? It may very slightly cut down on CPU time too.
Chaika 7mo ago
No, I'd say that's a good idea. The raw property of the EmailEvent is a ReadableStream, which is fine to set as the body of a request, and it lets you stream the message through the Worker without buffering it, e.g.:
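export default {
  // Email Workers call this handler for each incoming message.
  async email(message, env, ctx) {
    await fetch("https://yourwebserver.example.com", {
      // message.raw is a ReadableStream, so it is streamed rather than buffered.
      body: message.raw,
      method: "POST",
      headers: {
        "email-to": message.to,
        "email-from": message.from, // envelope From
        "apikeyorsomeidentifideryouuse": env.secret,
      },
    });
  },
};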
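A possible extension of the handler above, sketched under assumptions: it checks the upstream response and rejects the email if the server did not accept it. message.setReject is part of the Email Workers message API; env.UPLOAD_URL, the x-api-key header name and the forwardRawEmail function itself are hypothetical.

```javascript
// Sketch: forward the raw message and reject the email if the upload fails.
// env.UPLOAD_URL and env.secret are hypothetical bindings you would configure.
// fetchImpl is injectable so the logic can be tried outside a Worker.
async function forwardRawEmail(message, env, fetchImpl = fetch) {
  const response = await fetchImpl(env.UPLOAD_URL, {
    method: "POST",
    body: message.raw, // ReadableStream, passed through without buffering
    headers: {
      "email-to": message.to,
      "email-from": message.from, // envelope From
      "x-api-key": env.secret, // assumed header name
    },
  });
  if (!response.ok) {
    // Signal to the sending mail server that the message was not accepted.
    message.setReject(`Upstream server responded with ${response.status}`);
    return false;
  }
  return true;
}

// In a Worker this could be wired up roughly as:
// export default { email: (message, env, ctx) => forwardRawEmail(message, env) };
```

Without the status check, a failing server would silently drop the email, since the Worker has already consumed it.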
ThatOtherAndrew 7mo ago
That's perfect, I'll give that a shot - thank you very much!
ThatOtherAndrew 6mo ago
However, it doesn't seem to actually work
ThatOtherAndrew 6mo ago
Since this is the format that message.from appears to take:
[image attachment, no description]
ThatOtherAndrew 6mo ago
Is it safe to just parse for what's between the angle brackets? Is that guaranteed to be the actual sender's email address and it can't be spoofed?
Chaika 6mo ago
That looks like the header From. message.from is the envelope From, and message.headers.get("from") is the header From: https://www.xeams.com/difference-envelope-header.htm
Email is insecure by design, so it's not really a question of "can it be spoofed" but rather "if the sender sets everything up properly on their end, how easy is it for this stuff to be spoofed?" Afaik DKIM alignment checks should make both have to align. Clients like Gmail will usually parse the header From and use that for search and so on. Parsing it isn't so easy though, so you'll probably want to find a library for RFC5322.From parsing.
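To make the two values concrete, here is a small sketch that reads both from a message. It assumes the Email Workers message shape described above (message.from as a string, message.headers as a Headers object). describeSender and the example addresses are made up for illustration.

```javascript
// Read both sender values from an Email Workers message.
// message.from is the envelope From; message.headers.get("from") is the
// header From as written by the sender, display name included.
function describeSender(message) {
  return {
    envelopeFrom: message.from,
    headerFrom: message.headers.get("from"),
  };
}

// Stand-in message object for illustration; real ones are passed to email().
const exampleMessage = {
  from: "bounces@mailer.example.com",
  headers: new Headers({ From: '"Display name" <actual@email.address>' }),
};

console.log(describeSender(exampleMessage));
```

The two values can legitimately differ, for example when mail is sent through a bulk mailing service, which is why an exact string comparison against the header From tends to fail.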
ThatOtherAndrew 6mo ago
oof, this is a little confusing. Sorry, I'm being a little slow here 😅
You were correct in identifying that I used message.header.from instead of message.from. That's my bad, I think I might've nicked it from one of the templates or something.
Can I trust that message.from accurately describes the sender then?
Chaika 6mo ago
Not really. It's not what you'd see in Gmail or any email client. Those all show the header From, and they also use the header From for searches and so on.
ThatOtherAndrew 6mo ago
hmm, why do the examples use it for whitelisting/blacklisting senders then?
Chaika 6mo ago
If they didn't, they'd have to parse it with some parser. The examples used to check the header From, and they just didn't work for people because the check was an exact match.
ThatOtherAndrew 6mo ago
So what would your advice be if I want a worker which only accepts emails from a specific inbox I control? Should I look for the parsing library you mentioned above? Or maybe I should have the worker forward all email and have the validation handled on my server side instead?
Chaika 6mo ago
You control the sending inbox? It matters a lot less in that case. You could just hardcode it to either, since it shouldn't change, and if you have DMARC set up right it would ensure alignment of both.
ThatOtherAndrew 6mo ago
Oh, right. To summarise:
- I want to only accept emails from a specific inbox
- I control the sending inbox
- I want this to be secure enough that no other inbox would be able to pass the filter
Would message.from be the correct attribute for me to access in this case?
Chaika 6mo ago
My understanding is that the header From is the "more secure" option as long as you are using DKIM signing, as the contents of the message, including the header From, are signed with a key on that domain.
ThatOtherAndrew 6mo ago
right... I'm still a bit lost then. Is it Good Enough™️ to just check for the part between the angle brackets, or no?
Chaika 6mo ago
If you have control over the sending inbox, why not match the whole thing? Or do you think you'd change your name? It's kind of silly but it removes any issues with parsing it.
ThatOtherAndrew 6mo ago
I'd rather not hardcode the name part of it, because that means the system would break if I changed my display name in my email client.
yep, the latter. Or rather, not me changing it: it'd be my organisation changing it.
My only concern with that is, does the email protocol allow someone to just insert angle brackets into their display name, thereby tricking the worker into thinking it's a legitimate address? I'm aware that it's very unlikely someone will figure that out, but I don't like the idea of security through obscurity.
Chaika 6mo ago
The format for it is RFC5322.From. I don't know much about it, other than that it's complex to write a parser for.
ThatOtherAndrew 6mo ago
goodness, yeah. I'm hoping that I can take a shortcut, given that I know for certain emails from my inbox will take the format "Display name" <actual@email.address>. Would something like this be secure?
message.headers.get('from').endsWith('<actual@email.address>')
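One caveat with a plain endsWith check: RFC 5322 syntactically allows a From header to list several mailboxes, so a value such as attacker@evil.example, <actual@email.address> would also end with the expected text. Whether such mail would ever reach the Worker is not established in this thread. A stricter sketch is shown below. It is not a full RFC 5322 parser: it deliberately accepts only a single mailbox, either bare or with an optional display name, and returns null for everything else. extractSingleAddress and isAllowedSender are hypothetical helper names.

```javascript
// Simplified, allow-list style check for a single-mailbox From header.
// NOT a full RFC 5322 parser: comments, groups, and multiple mailboxes
// are rejected on purpose by returning null.
function extractSingleAddress(fromHeader) {
  if (typeof fromHeader !== "string") return null;
  const value = fromHeader.trim();
  // Optional display name (a quoted string, or plain text without special
  // characters), followed by exactly one <address> at the very end.
  const named =
    /^(?:"(?:[^"\\]|\\.)*"|[^"<>,;:()\[\]\\@]*)\s*<([^<>\s@,;:()"]+@[^<>\s@,;:()"]+)>$/.exec(value);
  if (named) return named[1].toLowerCase();
  // Bare address with no display name and no angle brackets.
  const bare = /^([^<>\s@,;:()"]+@[^<>\s@,;:()"]+)$/.exec(value);
  return bare ? bare[1].toLowerCase() : null;
}

// Compare case-insensitively (an assumption; local parts are technically
// case-sensitive, but most providers treat them as case-insensitive).
function isAllowedSender(fromHeader, allowedAddress) {
  return extractSingleAddress(fromHeader) === allowedAddress.toLowerCase();
}

// In the Worker this could be used roughly as:
// isAllowedSender(message.headers.get("from"), "actual@email.address")
```

Rejecting anything unexpected is the design choice here: since the sending inbox is under your control and its format is known, false rejections are easier to live with than a permissive parser.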
Chaika 6mo ago
I have no clue. You could try reading over the spec yourself, it's in RFC 5322. I haven't really tried, other than understanding it's too much to do by hand.
Chaika 6mo ago
Apparently I was off about alignment though: DMARC ensures that at least one aligns with the header From. E.g., this is valid: X-Mail-From is the envelope From, which is authorized by records on spfmailtechno.com to send. From: is authenticated via DKIM, and aligns in this case. DMARC only requires one to pass and align for it to work. This describes it better than I can: https://support.google.com/a/answer/10032169?hl=en
Mail servers all have their own additional checks that they put into their spam score though. E.g., Fastmail considers this to be misaligned and that somehow factors in, which is what confused me. Anyway, email is a mess. Afaik the From header is still better as long as you're using DKIM/DMARC and it aligns with what you'd expect.
[image attachment, no description]
ThatOtherAndrew 6mo ago
Goodness, this is confusing. Honestly, I'll just attempt this and hope and pray for the best.
Chaika 6mo ago
lol sorry, it is a bit confusing, but I probably made it sound worse than it is. I would just see it as: the header From is the right thing to use. In order to spoof the header From, a sender would need either the domain's SPF policy to include their IP, or a DKIM record on the domain holding the public key that matches the key that signed the message. Both things involve control over the domain. The envelope From is more free: only SPF cares about it, and you could fail it and still have the mail delivered as long as the DKIM record matched some other domain. The envelope From is used for auth, not identity.
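The "only one has to pass and align" rule can be written down as a small sketch. This is a simplified model, not how any particular mail server implements it: it uses strict alignment (exact domain match) only, whereas real DMARC also supports relaxed alignment on organizational domains, which needs a public-suffix lookup. The function and field names are hypothetical.

```javascript
// Simplified DMARC-style evaluation using strict alignment only.
function sameDomain(a, b) {
  return a.toLowerCase() === b.toLowerCase();
}

// spf:  { pass, envelopeDomain }  result of SPF for the envelope From domain
// dkim: { pass, signingDomain }   result of DKIM for the signature's domain
function dmarcPasses({ headerFromDomain, spf, dkim }) {
  const spfAligned = spf.pass && sameDomain(spf.envelopeDomain, headerFromDomain);
  const dkimAligned = dkim.pass && sameDomain(dkim.signingDomain, headerFromDomain);
  // One mechanism passing AND aligning with the header From is enough.
  return spfAligned || dkimAligned;
}

// Mirrors the case described above: SPF passes for a different envelope
// domain, while DKIM passes and aligns with the header From.
const result = dmarcPasses({
  headerFromDomain: "example.org", // hypothetical header From domain
  spf: { pass: true, envelopeDomain: "spfmailtechno.com" },
  dkim: { pass: true, signingDomain: "example.org" },
});
console.log(result); // true
```

This also shows why the envelope From is a weak identity signal: SPF can pass for a domain unrelated to the header From, and the message can still pass overall.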
ThatOtherAndrew 6mo ago
I see, thank you! Then if I'm understanding this correctly, my above method should actually be secure, right?
Chaika 6mo ago
In theory, in terms of using the right field? Yea. In how you're checking? No clue lol. All the docs on the From header and parsing are here: https://datatracker.ietf.org/doc/html/rfc5322#section-3.6.3
It seems that maybe it would be fine, as long as Cloudflare is validating that the header is well-formed and rejecting anything that isn't. <> are only allowed in quotes.
ThatOtherAndrew 6mo ago
perfect, thank you! that's Good Enough™️ for me then, I'm considering this solved!