Storage, with Frank de Jonge
Matt Stauffer:
Welcome back to The Laravel Podcast Season Four. Today we are talking to Frank de Jonge, the creator of Flysystem, which powers all of Laravel's internal file system goodness and lots of other wonderful things. Stay tuned. All right, welcome back to The Laravel Podcast Season Four, where every single episode is about a specific topic and today we're talking about storage and file systems and putting your files in the system, or in the server. And one of my good friends, one of my oldest friends in Laravel is going to join us today, who has a very important role to play here that I'm not going to tell you right now because we're going to get there eventually, but first I'm just going to say, I'm going to welcome Frank de Jonge.
Matt Stauffer:
And Frank, I'm just going to allow you to be the one to tell us about who you are and where you come from. But I do just want to say that you all use his work every single time you work with storage and file systems. We'll explain that later in the podcast, but just so you know, this guy knows what he's doing. So Frank, if you meet somebody, again, I keep asking this grocery store system thing for people, but in the past or if you meet somebody in any other context, how do you tell people what you do? If they're not programmers, how do you talk about what you do?
Frank de Jonge:
That's an interesting one. I mostly just tell people that I write software.
Matt Stauffer:
Okay, yeah.
Frank de Jonge:
So people ask me what I do and it's like, "do you make websites, then? Is that what you do? Is that sort of the area that you're in?" It's like yeah, well, it's a little bit more than that because you can now have serious software working on web servers that you can use like actual applications for pretty much anything. So in my day to day job, I work at Mollie, which is a company that processes payments, so I do FinTech stuff in PHP.
Matt Stauffer:
Yeah. Yeah.
Frank de Jonge:
So there's probably a lot of people who are like "yeah, I would never do that in PHP," but it's totally possible and yeah.
Matt Stauffer:
I love that you're doing it. And you also do event sourcing stuff, which we won't get into here, but are you doing event sourcing at work or no?
Frank de Jonge:
So some of it, yeah.
Matt Stauffer:
Okay, cool. Because I know that when people talk about when should we use event sourcing, that's a FinTech thing. For anybody who doesn't know, FinTech just means finances. Technology related to finances. So he basically does lots of stuff with... what's the word? Currency and transactions and all that kind of stuff but in PHP. I love it.
Frank de Jonge:
It's an interesting field. Yeah.
Matt Stauffer:
Yeah. It's funny because we were talking before the podcast and I was like, "man, I would love to have you or one of your coworkers on some other time just to talk about doing money in PHP." But that's not today. That's another time. So you work there, so you also... no, again, I'm not talking about why we have you on today, but whatever. You created an open source package with the PHP League, right? Called Flysystem. Was that with the PHP League?
Frank de Jonge:
That's with PHP-
Matt Stauffer:
So with those who are not familiar, because I don't think we've talked about them yet on this podcast, there's a group of packages... I don't know if it's a group of package authors because it's not like all of them are under it, called the PHP League. Is it ThePHPLeague.com? Or PHPLeague.com or something like that.
Frank de Jonge:
ThePHPLeague, yeah.
Matt Stauffer:
ThePHPLeague.com and it says The League of Extraordinary Packages, and I think the primary goal was just to put together some really high quality stuff that targets the PHP environment, right?
Frank de Jonge:
Correct. So initially it started off as a way to have a collective that could together maintain a couple of open source packages and then the togetherness is more in the sense of making sure that we reduce the bus factor on the individual packages.
Matt Stauffer:
Yeah. I like that.
Frank de Jonge:
So if I, god forbid, get run over by a bus, then somebody else can take it over because there's a lot of people depending on packages from the League, and we've had a period in time when we were actively sourcing new packages. That's no longer the case, although we still get a lot of requests for that.
Matt Stauffer:
Yeah.
Frank de Jonge:
But we wanted to focus on everything that was outside of frameworks. So my buddy, Phil Sturgeon and Ben Corlett, at the time were looking at first maintaining frameworks. I used to be on a core team with Phil Sturgeon. We kind of got done with the whole framework thing because we saw it got done over and over and we were getting passed by left and right by Laravel, so that's kind of demotivating to be working on a framework. But yeah, that was the case. But I was working on some other stuff and Phil... I was working on a like a MongoDB abstraction and Phil knew about that and was like, "Hey, do you want to bring that abstraction into the starting group of The PHP League?" And I was like "yeah, sure, but I've also got this other thing. It's a file system abstraction that I'm working on." And he's like, "okay, yeah. Could be something." And fast forward and I think we're eight years into it now.
Matt Stauffer:
How many, millions of downloads?
Frank de Jonge:
Yeah. So I always count my downloads on my website, so I've got a counter there and the accumulated number is now at, I think, over 220,000,000.
Matt Stauffer:
Wow.
Frank de Jonge:
So that's... yeah.
Matt Stauffer:
Yeah.
Frank de Jonge:
It's something I never imagined would be the case.
Matt Stauffer:
Yeah.
Frank de Jonge:
For Flysystem, the main package alone is 133,000,000, which is...
Matt Stauffer:
Yeah.
Frank de Jonge:
Yeah.
Matt Stauffer:
It's a lot. Yeah. So Flysystem is a file system abstraction layer for PHP. It's not for Laravel, right? But at some point it got pulled into Laravel core, which is why we're talking about it today. I was planning on talking about it later, but let's just do it right now. If you had to tell somebody simply what Flysystem is, let's talk about a programmer now, what is Flysystem for a programmer?
Frank de Jonge:
Yeah. So if you're interacting with file systems, every file system has their own API. This means if you are integrating with that API, whether it be FTP or Dropbox or AWS, it doesn't really matter, if you are integrating to that, your code knows the destination where you write to. So this means there is a matter of coupling to this implementation. So for small surfaces, it doesn't really matter, but the more you write files and the bigger your application becomes, if this coupling becomes too much it can become problematic.
Matt Stauffer:
Yeah.
Frank de Jonge:
Also, if there is too much coupling, which you call vendor lock-in, then the cost of moving away from a particular vendor becomes too high to pay.
Matt Stauffer:
Yeah. Exactly.
Frank de Jonge:
So what Flysystem is, is basically a generic API so that regardless of where your files end up, you write the same code to get them there and the destination is just configuration. So this means that you basically rid yourself of this vendor lock-in and you can switch between file storages if you need to. I think at one time, Taylor messaged me about transferring all kinds of files from FTP to AWS. He was just passing it from one end to the other without the code even knowing anything specific about the destination, and all the consuming code and the code around it would just remain untouched. So that's the power of Flysystem. So it's an abstraction to be able to use whatever file system you want and still get all the same behavior and all the same interface.
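A minimal sketch of the idea Frank describes here, written in Python rather than PHP and not Flysystem's actual API. The `Storage` interface, the two storage classes, `make_storage`, and `migrate` are all hypothetical names invented for illustration; `MemoryStorage` merely stands in for a remote destination such as FTP or S3.

```python
from __future__ import annotations

from pathlib import Path
from typing import Protocol


class Storage(Protocol):
    """The single interface application code talks to (hypothetical)."""

    def write(self, path: str, contents: bytes) -> None: ...
    def read(self, path: str) -> bytes: ...
    def list_paths(self) -> list[str]: ...


class LocalStorage:
    """Keeps files under a directory on the local disk."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def write(self, path: str, contents: bytes) -> None:
        target = self.root / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(contents)

    def read(self, path: str) -> bytes:
        return (self.root / path).read_bytes()

    def list_paths(self) -> list[str]:
        return sorted(
            p.relative_to(self.root).as_posix()
            for p in self.root.rglob("*")
            if p.is_file()
        )


class MemoryStorage:
    """Stand-in for a remote store; keeps files in a dict."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}

    def write(self, path: str, contents: bytes) -> None:
        self.files[path] = contents

    def read(self, path: str) -> bytes:
        return self.files[path]

    def list_paths(self) -> list[str]:
        return sorted(self.files)


def make_storage(config: dict) -> Storage:
    """The destination is chosen by configuration, not by the calling code."""
    if config["driver"] == "local":
        return LocalStorage(config["root"])
    if config["driver"] == "memory":
        return MemoryStorage()
    raise ValueError(f"unknown driver: {config['driver']}")


def migrate(source: Storage, destination: Storage) -> int:
    """Copy every file across without knowing what either side really is."""
    count = 0
    for path in source.list_paths():
        destination.write(path, source.read(path))
        count += 1
    return count
```

Because `migrate` only touches the shared interface, moving from one kind of storage to another means changing the configuration passed to `make_storage`, while the consuming code stays as it is.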
Matt Stauffer:
I love that. And we talked about containers in a really recent episode and one of the things we kept getting examples of is let's say you were using Mandrill as an email provider for a long time and then all of a sudden you decided you want to change to use Mailgun because Mandrill started charging money, and we talk about how it's so easy if those clients that you're using are basically using the same interface, then as long as you're making calls against that interface, then if you just swap whether you're using the Mailgun provider or the Mandrill provider, once in the container, we talk about inversion of control. And if you all have not listened to the episode with Christoph Rumpel about the container, I'd definitely take a listen to it because we're going to reference it again here. But that same way that you can swap two different email providers, what Flysystem allows you to do, even before you're in the Laravel world, is to similarly use the same code against the same interface and swap between AWS and DigitalOcean Spaces or whatever else it ends up being.
Matt Stauffer:
And so Flysystem made so much sense. And prior to Flysystem coming to Laravel, we just had this file facade, which is very helpful, but it was mainly a wrapper around the existing PHP functions like fopen and all that kind of stuff. And so we still have that, although I don't know if it's documented, but the more common one we're going to use is when Flysystem got brought into the core and that's the storage facade. And so when we're dealing with storage in Laravel and we say storage blah blah blah, maybe you define the disk and then you call a method, that method is basically just proxying that off to a particular Flysystem driver. And the Flysystem drivers are each the implementations, right? For each of the storage providers, each of the vendors. Yeah.
Frank de Jonge:
So you could say Flysystem uses an adapter pattern, so think of an adapter as you would if you've got European power outlets and you need to plug in an American device. Well, you need a thing that you put in between that adapts one thing to the other.
Matt Stauffer:
Yes.
Frank de Jonge:
Well, for file storage, you basically use the same.
Matt Stauffer:
Yeah.
Frank de Jonge:
So while the thing that you're interacting with or the thing that you're connecting to is not identical to the thing that is on the other end being connected with, that doesn't matter for you anymore because your adapter just fits. So it's pretty much the same concept.
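The power-plug analogy can be sketched in a few lines of Python. This is an illustration of the adapter pattern in general, not Flysystem's code: `VendorBucketClient` is a made-up vendor SDK with its own vocabulary, and `VendorBucketAdapter`, `DictAdapter`, and `save_report` are hypothetical names.

```python
class VendorBucketClient:
    """Hypothetical vendor SDK; its method names differ from what our app expects."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put_object(self, key: str, body: bytes) -> None:
        self._objects[key] = body

    def get_object(self, key: str) -> dict:
        return {"Body": self._objects[key]}


class VendorBucketAdapter:
    """The plug adapter: exposes write/read and translates to the vendor's calls."""

    def __init__(self, client: VendorBucketClient) -> None:
        self.client = client

    def write(self, path: str, contents: bytes) -> None:
        self.client.put_object(key=path, body=contents)

    def read(self, path: str) -> bytes:
        return self.client.get_object(key=path)["Body"]


class DictAdapter:
    """A second adapter with the same write/read shape, backed by a plain dict."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}

    def write(self, path: str, contents: bytes) -> None:
        self.files[path] = contents

    def read(self, path: str) -> bytes:
        return self.files[path]


def save_report(storage, name: str, text: str) -> str:
    """Application code: it only knows write(), whichever adapter is plugged in."""
    path = f"reports/{name}"
    storage.write(path, text.encode("utf-8"))
    return path
```

`save_report` works unchanged with either adapter, which is the sense in which "your adapter just fits".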
Matt Stauffer:
Yeah. I love that.
Frank de Jonge:
And indeed, the file system facade I think is still there. I think also it's still relevant. I think there is a good distinction to be made between having interactions with your local file system and using that file system for storage. So I think the storage name in the facade is actually really smart because it puts a particular view on how you need to treat this.
Matt Stauffer:
Interesting.
Frank de Jonge:
Because as a programmer, you have a lot of interaction with the local file system, like your PHP files are on there, maybe you've got some things that will only ever be local. For example, a dumped routing file, for example.
Matt Stauffer:
Right.
Frank de Jonge:
That gets stored on your local disk, but nobody will ever put that on a remote FTP server and use that for their...
Matt Stauffer:
Right.
Frank de Jonge:
That makes no sense. So that will always ever be local. So having this distinction, I think, is good and making sure that if you think about file storage, you're really focusing around handling stuff that the user uploads, bigger stuff that you need to generate, get inputs, manipulate, and then store somewhere else. But not the things you would have only on that website.
Matt Stauffer:
I love that. Because I'd never actually fully known when should I use file versus storage and you just gave me a great metric. If you can imagine putting this thing on AWS or FTP or Dropbox, usually it's user provided, very frequently it's going to be served to users later. Even if you are temporarily holding it on your local server until you set up AWS or something, that makes sense for the storage facade. If it's something that will never be put in that place because it's much more part of the local environment or part of the local code, that makes sense to use the file facade. That is lovely. Thank you.
Frank de Jonge:
It makes it a lot easier, right?
Matt Stauffer:
I love doing this podcast.
Frank de Jonge:
Yeah. And that mental model matters, I think, because if I look at people trying to use Flysystem for the wrong things, it is them trying to use it when they should have used the local file system.
Matt Stauffer:
Yeah. I love that.
Frank de Jonge:
Flysystem is an 80/20 solution, right?
Matt Stauffer:
Yeah.
Frank de Jonge:
So it does 80 percent of the things very well.
Matt Stauffer:
Yeah.
Frank de Jonge:
And those other 20 percent of the things, you just shouldn't use Flysystem for that.
Matt Stauffer:
Yeah, that's great.
Frank de Jonge:
Either it's going to be overkill or it won't be a good fit.
Matt Stauffer:
Yeah. I love that. So we usually start with you describing the topic to a five year old and I don't think I even asked you that, so let's roll this back a little bit and just say if you were talking about file storage through a framework, not even going to worry about Flysystem, not Laravel, what does it look like for you to describe this topic to a five year old?
Frank de Jonge:
For a five year old, I don't have kids...
Matt Stauffer:
I know you don't have kids, yeah. They watch TV.
Frank de Jonge:
They watch TV. Do they use social media?
Matt Stauffer:
They watch TV, they use iPads.
Frank de Jonge:
They use iPads. All right.
Matt Stauffer:
No social media, though.
Frank de Jonge:
No social media. Okay. So that's a tough one. Do they use file storage?
Matt Stauffer:
Well, a lot of five year olds these days are in virtual school.
Frank de Jonge:
Oh, right.
Matt Stauffer:
So maybe that might give you some context to work with.
Frank de Jonge:
Of course. I bet they need to create assignments that they need to upload.
Matt Stauffer:
You got it.
Frank de Jonge:
So if you go and have to upload your assignment to your teacher, that goes via your browser. You upload it there and there is a machine that needs to handle this, but it needs to also give it to somebody else afterwards because it's not a direct line. So it needs to be stored in a location, this file for somebody else to retrieve it later.
Matt Stauffer:
Yeah.
Frank de Jonge:
So the file storage is basically everything between the process of you uploading that file to the other person being able to access it and seeing that there is a file, reading the file, and being able to download it.
Matt Stauffer:
Love it.
Frank de Jonge:
That's it.
Matt Stauffer:
For someone who wasn't prepared, who doesn't have kids, you did a pretty good job. I love that. Yeah, so file storage for those of us who are a little bit older than five, it's the process of... and again, you're continuing to make this distinction of it's not as much about reading your .env, it's not as much about even opening some stub files you have. You can do those, but that's a little bit less what we're talking about here. We're talking a little bit more about user uploaded stuff. Sometimes it might be stuff that is particular to this environment, but one thing to note is that I believe that Laravel's file storage, the storage drivers always default to the storage directory. So Frank is super involved and very key for influential Laravel stuff but doesn't write Laravel on the day to day right now.
Matt Stauffer:
So I'm going to try to step in and bring any modern Laravel context. So Frank, I'm not putting this expectation on you, but I'm pretty sure that everything sits in the /storage directory, which is also helpful. At times I've been like "oh, I'm working with this storage facade, why is it always assuming everything is in the storage directory?" And I have to make a new disk for local stuff. I should have been using the file facade for that one. So that's kind of the idea there, is if it belongs in the storage directory, which is this more ephemeral temporary uploaded stuff or whatever else it is, whether it's for the users to download and we're putting it in a place temporarily or user uploaded or whatever, that's what this is for. So I love that. That's perfect and you're continuing to teach me how to use this well. So you gave one use case for a student, but as a programmer, as a Laravel programmer, what do you think some of the common use cases are for somebody to use this storage facade, this file system?
Frank de Jonge:
Right. So for anything that involves any type of publishing, you need to store photos, you need to store movies, anything in regards to those kinds of media formats, you need a place for that. If you're doing... like I do financial processing. A lot of that is basically getting CSV files, actually, from FTP servers.
Matt Stauffer:
Yeah.
Frank de Jonge:
So like finance-
Matt Stauffer:
I figured.
Frank de Jonge:
Finance is as non-sexy as it is a sexy business.
Matt Stauffer:
Yeah.
Frank de Jonge:
So we're still dealing with FTP here. So in all those kind of cases, when you have these read/write scenarios from interconnected sources, that's where you use file systems.
Matt Stauffer:
Yeah. And then you mentioned user uploads already. So read/writes from different connected... and I appreciate that because we tend to talk about it the most often as user uploaded content, but it's also stuff that you're pulling in from somewhere else or pushing up to somewhere else and I always forget that there's an FTP driver. I feel like there's some cool stuff that I could do with that that I haven't thought about. I feel like having modern programmatic access to FTP must enable some cool things, or some cool things built on top of old things. Like what you're doing, right? You're taking old financial systems and building new modern tech on it, so now you've got me thinking. So yeah. So common use cases. User uploaded stuff, writing to, reading from external systems. Any other common use cases you can think of or is that about it?
Frank de Jonge:
Well, interacting with legacy systems, this is one. A lot of old legacy systems, they store intermediate state stuff on FTP systems and those are old systems that people don't want to change. They're still making money and the entire world integrates in that way. Stuff that you can do with that is if you have an easy way of accessing it, interact with FTP and then read it and maybe push it on to Kafka or push it onto something else where you use the new tech.
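As a rough sketch of that legacy-bridge workflow, in Python and independent of any real FTP or Kafka client: the `read` callable stands in for "fetch this file from wherever the legacy system dropped it" and `publish` stands in for "hand the record to the newer system". Both, along with the function names, are assumptions made for illustration.

```python
import csv
import io
from typing import Callable, Iterator


def rows_from_csv(contents: bytes) -> Iterator[dict]:
    """Parse CSV bytes (with a header row) into one dict per record."""
    reader = csv.DictReader(io.StringIO(contents.decode("utf-8")))
    for row in reader:
        yield dict(row)


def forward_rows(
    read: Callable[[str], bytes],
    path: str,
    publish: Callable[[dict], None],
) -> int:
    """Read an exported file through the storage layer and publish each record."""
    count = 0
    for row in rows_from_csv(read(path)):
        publish(row)
        count += 1
    return count
```

The bridging code never learns whether the CSV came from FTP, a local disk, or a bucket, so the old system can keep exporting files the way it always has.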
Matt Stauffer:
Yeah.
Frank de Jonge:
So all of this stuff that is pretty ancient, and FTP, I think, should fall under the ancient category...
Matt Stauffer:
Yeah.
Frank de Jonge:
You can basically put a wrapper around that and use all the new and fancy tech on everything that you connect to.
Matt Stauffer:
I like that.
Frank de Jonge:
So I like it for that kind of use case as well.
Matt Stauffer:
Yeah, and that's cool because what is easier: to get them to take their old COBOL or Fortran or whatever it is system and add an API, or for them to build a custom report that exports a CSV and saves it into an FTP-accessible directory? One speaks the language of those old systems, one of them does not, but we now have access to that CSV file through the magic of Flysystem. That's great. I love that. Legacy systems. And we talk about legacy systems a lot at Tighten and I don't think this is something we talk about that often. So I should make sure everybody at Tighten listens to this one.
Frank de Jonge:
I don't think people talk enough about legacy systems.
Matt Stauffer:
Yeah.
Frank de Jonge:
Because I think in general, the systems that make money are mostly legacy systems, so these are older systems, have been around for a while, have maybe not got all the new shiny tech, but that does not mean they are not valuable. I think some people at the company that I work at, we've got some older parts as well and we call those the vintage parts.
Matt Stauffer:
Fantastic. I love that. Vintage. Yeah, that's so much nicer word than legacy, right?
Frank de Jonge:
Right? It exudes some value and there's some finesse to it.
Matt Stauffer:
It's managed to stick around for this long, so obviously it has some value to it.
Frank de Jonge:
Yeah, yeah.
Matt Stauffer:
I like that. Collectible. Collectible code samples. Okay, so we know what this is for, we've actually talked a little bit about the architecture of how Flysystem kind of powers this. And one of the reasons I wanted everybody to hear that, maybe you'll get into it later, I'm not sure, is knowing that if you get stuck in the file system and you're seeing these Flysystem errors, just for people to know, what does the name Flysystem mean? What is it talking about? So off the top of... it's an open source package. Are you still the primary maintainer?
Frank de Jonge:
Yeah.
Matt Stauffer:
I think you are, right?
Frank de Jonge:
Yeah.
Matt Stauffer:
Yeah. So if you get stuck and you're having storage trouble and you want to know what the heck is going on with this Flysystem thing, just go Google Flysystem or Frank de Jonge or maybe I'll just put his phone number up on the internet after this and you can just text him at 3:00 in the morning and ask him, but I just wanted you all to hear that. But let's go back to just using Flysystem primarily, although maybe the file system as well, which is like file_get_contents and stuff like that. Using it day to day as a PHP programmer who is new to this, what trips people up? When you see people getting stuck or maybe coming to Flysystem and asking things and you have to help them figure out a problem, what are the most common ways people get stuck or have to learn a new way of thinking when it comes to this?
Frank de Jonge:
So I think there's two areas which we can look at. One is operating on big files.
Matt Stauffer:
Okay, yeah.
Frank de Jonge:
So that's if you are dealing with videos, for example. These tend to be multi-megabyte things that you need to carry around and so if you're looking at images, those are a couple hundred kilobytes.
Matt Stauffer:
Yeah.
Frank de Jonge:
Sometimes one megabyte. And you can hold those in memory pretty well, but if this number starts to rise and I would even say even if you have a lot of images and you have a lot of traffic, then you want to use something different in terms of you handling that file.
Matt Stauffer:
Yeah.
Frank de Jonge:
So if your user is uploading something, it gets sort of streamed to this temp directory, and so from that you can open that file and you can do file_get_contents, but this will load that entire thing into a string. And with a string, basically everything that you have is also loaded into memory.
Matt Stauffer:
Yeah.
Frank de Jonge:
So if you have a three megabyte file, you will have a three megabyte string.
Matt Stauffer:
Yeah.
Frank de Jonge:
So this means that your memory usage will start jumping up and down, will be very unpredictable and difficult to scale, and sometimes you will hit an out-of-memory error. So one of the things that I see often is people saying, "Well I've got this error from Flysystem that says trying to allocate an X number of bytes and I only have so much free." And they think it's a Flysystem issue because the error says this is from Flysystem.
Matt Stauffer:
Yeah.
Frank de Jonge:
So actually, this is just memory consumption and there is a pretty easy way around it and in part, I think part of the success of Flysystem was that it handles streams very well.
Matt Stauffer:
Yeah.
Frank de Jonge:
So the stream is a more memory-efficient way that's more sort of native to the operating system, where you only hold a chunk of data in memory at a time and you basically are able to pump it to a destination. So you can flush a stream somewhere. And it does it per chunk, so at any given time you will only ever have as much in memory as the chunk is. So I think by default PHP uses one megabyte chunks, so this means your top memory consumption based on that file system operation is one megabyte even if you're transferring a multi-gigabyte file.
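The chunk-at-a-time idea can be sketched as follows, in Python rather than PHP. `copy_stream` is a hypothetical helper, and the chunk size is an arbitrary choice for the sketch, not a statement about PHP's or Flysystem's defaults.

```python
from typing import BinaryIO

# Assumed chunk size for this sketch only; pick whatever suits the workload.
CHUNK_SIZE = 8192


def copy_stream(
    source: BinaryIO,
    destination: BinaryIO,
    chunk_size: int = CHUNK_SIZE,
) -> int:
    """Pump bytes from source to destination one chunk at a time.

    At most `chunk_size` bytes are read per step, so the size of the
    whole file never has to fit in memory at once.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the source is exhausted
            break
        destination.write(chunk)
        total += len(chunk)
    return total
```

Contrast this with reading the whole file into a single string first: there, memory use grows with the file, while here it stays bounded by the chunk size however large the transfer is.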
Matt Stauffer:
Yeah.
Frank de Jonge:
Flysystem does this in a way that also the underlying adapters respect this. So for example, if you're uploading to S3 and you've got a multi-gigabyte file, it will make sure to do that in chunks and it will even do that in parallel, but it will always make sure that the memory consumption will remain low.
Matt Stauffer:
Yeah.
Frank de Jonge:
So whether you're doing that with AWS or you're doing that with Google Cloud storage, it's all the same for Flysystem users and they can just trust that this will respect the memory limits that are set on the machine, so that's quite nice.
Matt Stauffer:
Which we get magically for free because if you use the put or putFile methods on the storage facade, it's already using Flysystem's streaming capacity. So basically if you've got something already, you're writing using streaming. The question, though, is about reading using streaming, and so that's the thing that's most interesting to me. So if somebody uploads a massive file through a web input, they're going to be just stuck with basically a more traditional system, right? Like their browser is going to be hitting the PHP server that's going to be saving it in temp, and that's not something you can control, right? It's when you're moving it from the temp to your final location that you're in control of it, right?
Frank de Jonge:
So there's this thing called fpassthru and you can pass it a stream handle and this will-
Matt Stauffer:
Oh really?
Frank de Jonge:
Make sure that you can also stream the downloads back to the client.
Matt Stauffer:
Okay.
Frank de Jonge:
So of course it depends on your web server configuration, so you will need to do some stuff in Apache or if you use NGINX then you will need to configure some stuff there. I think it's mostly output buffering that you need to get a better handle on. But if you figure this right, you can basically stream it from end to end, so that's pretty nice.
Matt Stauffer:
Okay. Yeah. And there is a... I don't know if it's a function or a macro, but a function on the response helper from Laravel called streamDownload, and again, whatever you put in there will get streamed down to the user. So we can get streaming for free when saving files to your local file system. If you've got it in your control or whatever else, you can stream it to the file system for free just by using the storage facade, and if you want to download to the user you can get it for free using streamDownload. See, this is the cool thing, right? You did all this work to learn all this stuff. Some other smart person, probably Taylor, pulled it into the core and now we have been getting the advantage of the streaming without even realizing we had it all along. Okay, so big files is the number one complaint you get. Are there any other places you see people getting stuck a lot?
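For the download direction, the same principle can be sketched as a generator that hands out one chunk at a time. This is Python, not Laravel's streamDownload, and `iter_chunks` is a hypothetical helper; the assumption is that the surrounding web stack accepts an iterable of byte chunks as a response body, as WSGI applications do.

```python
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield the stream's contents chunk by chunk, closing it when done.

    The consumer (for example a web server writing to the client) pulls
    the next chunk only when it is ready for it, so the full file is
    never held in memory.
    """
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                return
            yield chunk
    finally:
        stream.close()
```

As Frank notes, whether the chunks actually reach the client one by one still depends on the web server, since output buffering in front of the application can collect them again.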
Frank de Jonge:
People trying to put that 20 percent on top of that 80 percent.
Matt Stauffer:
Right. Do you have any examples of the 20 percent of where Flysystem isn't the right fit?
Frank de Jonge:
So for me, I manage a lot of these adapters, so one of the things that I've consciously left out is anything that has to do with URLs. I know that Laravel takes this on because it says for these ones, I support it.
Matt Stauffer:
Yeah.
Frank de Jonge:
And I think that makes a lot of sense for developer experience of a framework to add richness on top of stable stuff, but that does mean for the Flysystem context, it's less relevant.
Matt Stauffer:
Yeah, totally.
Frank de Jonge:
You should not try to get all of the, let's call it, exotic features from everything into this...
Matt Stauffer:
Into all of them.
Frank de Jonge:
One abstraction.
Matt Stauffer:
Yeah.
Frank de Jonge:
Because at the end of the day, that's what Flysystem is, an abstraction, and an abstraction by definition is a simplification of the world. If it did everything, it would not be an abstraction, it would just be the thing it abstracts, right?
Matt Stauffer:
Yeah.
Frank de Jonge:
So there needs to be this commonality and there needs to be an abstract concept that you can replicate over multiple things sometimes with the translation layer. For example, the visibility is an abstraction over Unix file permissions or ACLs in AWS or whatever different kind of file system is used. So this is something that does not directly translate, but you can come up with a concept that does unite it in a useful way.
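A small sketch of what such a translation layer might look like, in Python. The two-value visibility concept comes from the conversation; the specific mode bits and ACL strings below are common conventions used here as assumptions, not a copy of any adapter's real mapping, and the function names are hypothetical.

```python
PUBLIC = "public"
PRIVATE = "private"

# Assumed translations for illustration; a real adapter may make these configurable.
UNIX_FILE_MODES = {PUBLIC: 0o644, PRIVATE: 0o600}
OBJECT_ACLS = {PUBLIC: "public-read", PRIVATE: "private"}


def unix_mode_for(visibility: str) -> int:
    """Translate the abstract visibility into Unix permission bits."""
    try:
        return UNIX_FILE_MODES[visibility]
    except KeyError:
        raise ValueError(f"unknown visibility: {visibility}") from None


def acl_for(visibility: str) -> str:
    """Translate the same abstract visibility into an object-store ACL name."""
    try:
        return OBJECT_ACLS[visibility]
    except KeyError:
        raise ValueError(f"unknown visibility: {visibility}") from None


def visibility_from_mode(mode: int) -> str:
    """Going the other way: treat world-readable files as public."""
    return PUBLIC if mode & 0o004 else PRIVATE
```

The application only ever says "public" or "private"; each backend decides what that means in its own terms, which is why the concept carries across storage systems even though permissions and ACLs do not map onto each other directly.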
Matt Stauffer:
I love that.
Frank de Jonge:
So for these cases, I will introduce additional abstractions and for Flysystem V2, those abstractions got a lot richer because you get a lot more fine-grained control over it. But for the things that are one-offs, so like URLs, or some people wanted the MD5 digest from some cloud provider that only that cloud provider supplies, or an ETag, that's only going to create flakiness around the API and that's what you try to avoid as a maintainer of an abstraction.
Matt Stauffer:
Now I have never actually tried to make a custom driver, but let's say that I was working with AWS and I wanted to do all my core work using Laravel's storage facade but I also wanted to extend it. How hard is it to basically either add a new layer on top of it or make my own custom driver that basically is the AWS driver with a whole bunch of other methods that are very custom to this? Is that kind of what you're talking about or is that...? Like how would you suggest somebody who is just locked into AWS who needs to use all those AWS-specific features, how would you suggest they integrate together with Flysystem?
Frank de Jonge:
Well, my advice would be don't use Flysystem for this.
Matt Stauffer:
Fair enough.
Frank de Jonge:
If you're looking at the chain of dependencies, you're creating an AWS SDK instance, you're passing it to Flysystem, and now you want to do something else with the SDK. But you already had-
Matt Stauffer:
Why not just work with the SDK in the first place?
Frank de Jonge:
Just use the SDK. You already had it. In fact, you put it there.
Matt Stauffer:
Yeah. Fair enough. Okay, cool. So that's a good point. If you're getting to the point where you do need all those things, that may even be a day where it's time for you to start moving over to using the SDK rather than trying to shoehorn all these things into something that was built to be a more generic abstraction like you were talking about earlier.
Frank de Jonge:
Yeah.
Matt Stauffer:
Cool. I get that.
Frank de Jonge:
It's supposed to be a simplification of the thing, not the thing.
Matt Stauffer:
Yeah.
Frank de Jonge:
Yeah.
Matt Stauffer:
I get that. I like that. For anybody else who is a super nerd, in college I studied... the guy... basically there was this philosopher and I think it may be Baudrillard, but I'm not sure, and he did this whole thing about simulacra where it's... and it's fine if this is not something you've experienced because most humans have not, but he had this idea of basically what would it be like if we had a map that covered the whole world that was exactly the same size as the world, and so now you're getting me back to my college days of what is the simulation. So, anyway. And now I'm going to go email my college professor and ask him some questions about this.
Frank de Jonge:
Well, in the DDD community and especially in the mapping community, this is what they reference a lot.
Matt Stauffer:
Oh really?
Frank de Jonge:
You need to capture at the level of detail that helps you solve the problem.
Matt Stauffer:
Yeah.
Frank de Jonge:
So if you need a map to navigate something, then you need a simplification of it because otherwise you will just be standing in the field trying to figure it out.
Matt Stauffer:
Exactly. So I know the simulation simulacra was Baudrillard, but I am thinking that the map may have been Baudrillard or it may have been Borges. So if anybody knows, I bet you my business partner Dan knows. Let me know afterwards and I'll try and throw it in the show notes. But yes, exactly. That's exactly the point. If you had a whole map, you would just be standing on something as big as the world and it wouldn't be doing the job for you. The benefit of a map is it's smaller than the world and gives you enough information, just like you're saying. The benefit of this is it's smaller than the entire API... I don't know. It's smaller than building everything for everything, which is why it's easier, but then of course you don't get everything. Cool.
Frank de Jonge:
Yeah, we often say with models and maps, all of them are wrong but some are useful.
Matt Stauffer:
Who said... I heard that quote recently. Whoever said it, they know what they are talking about.
Frank de Jonge:
Yeah, they're smart people.
Matt Stauffer:
Okay, so are there any other common places where you see people who are new getting stuck or even if it's not something that people get stuck on, if you have one or two pieces of information you really wanted to impart to somebody to set their direction right as they consider working with storage in the future, is there anything else you want to make sure people hear?
Frank de Jonge:
Yeah. So one other important aspect that I put into the design of Flysystem is to make sure that everything is based off of a relative root. So some people, especially if you're used to operating the file system directly, you're always going from your place and then relative to that by cascading up or using an absolute path, and that's often what people prefer.
Matt Stauffer:
Yeah.
Frank de Jonge:
They say absolute paths are the quickest to resolve and you always know where you are, and that's completely fine from that "I'm operating with a local file system" point of view.
Matt Stauffer:
Right.
Frank de Jonge:
Like it would be weird if everything was relative in your PHP application itself. But if you're looking from a portability aspect, if you consider that everything needs to be stored somewhere and if you... those paths may not map across different storage solutions.
Matt Stauffer:
Right.
Frank de Jonge:
So if you make sure that from a given root, it's always relative to that, whatever it's relative to can change.
Matt Stauffer:
Yeah.
Frank de Jonge:
So this allows you to sunset a lot of this vendor lock-in that I talked about earlier because if you're now storing in S3 and you've got one bucket that you maybe share for multiple parts of your application but you just have one bucket, then this relative root is not going to be at the root of the bucket, it's going to be somewhere deeper, and if you change this or you want to be able to restructure, and I think this is also one of the things that I personally like, I like being able to move stuff because I like to grow software as it evolves.
Matt Stauffer:
Yeah.
Frank de Jonge:
And I want to have the least amount of constraint. So if I can apply sort of doctrine in this, and not Doctrine the ORM, but doctrine as in thought, to solutions that work regardless of context, it is make sure that when you write files, always make it relative to something so then you can always move it around. So this is even a good thing if you're looking at storing it on the local disk. I've been on projects where all the file system stuff was done locally, but the application was developed and it was working fine and a couple of years later, those people came back to me and they were like, "Hey, we're running out of disk space."
Matt Stauffer:
Okay.
Frank de Jonge:
And I was like okay, well this is something that we need to fix. So what is often, like if you're not in the cloud or you are in the cloud but you're on VMs for example, what is often the case is that either you will get a bigger disk, but that can also result in some down time, but what's more common is that you will just get new mount points and new places where you can write. So then you'll have to do sort of a migration and your base path changes. So if you can do all that without having to reconfigure all the paths in your application to write to this new destination, then that is a win. So that's one of the cases. Always make sure to write in Flysystem based on a relative path.
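A minimal sketch of the relative-root idea Frank describes, written in Python rather than PHP and not taken from Flysystem itself. The `RootedStorage` class, the `save_invoice` function, and the file names are hypothetical; the point is only that application code mentions relative paths while the root is a single piece of configuration.

```python
import tempfile
from pathlib import Path, PurePosixPath


class RootedStorage:
    """Hypothetical helper: every path is resolved relative to one configurable root."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def _resolve(self, relative_path: str) -> Path:
        relative = PurePosixPath(relative_path)
        # Reject absolute paths and ".." so callers cannot escape the root.
        if relative.is_absolute() or ".." in relative.parts:
            raise ValueError(f"Path must stay inside the root: {relative_path}")
        return self.root.joinpath(*relative.parts)

    def write(self, relative_path: str, contents: str) -> None:
        target = self._resolve(relative_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(contents)

    def read(self, relative_path: str) -> str:
        return self._resolve(relative_path).read_text()


def save_invoice(storage: RootedStorage) -> None:
    # Application code only ever mentions the relative path.
    storage.write("invoices/2020/0001.txt", "paid")


with tempfile.TemporaryDirectory() as old_disk, tempfile.TemporaryDirectory() as new_mount:
    save_invoice(RootedStorage(Path(old_disk)))
    # Moving to a new mount point is a configuration change, not a code change.
    save_invoice(RootedStorage(Path(new_mount)))
```

With this shape, the "we're running out of disk space" migration touches only the value passed as the root; `save_invoice` stays the same.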
Matt Stauffer:
It's so funny because we have a client right now who had the exact same thing but not only did they run out of storage space, but everything was in Git, so like gigabytes of files were in the storage directory of their Git and we were working with them to figure out what does it look like to move this stuff over. And it's funny because we had given them lots of very server based suggestions because I tend to go server based before PHP based so I'm teaching them about SCP and stuff like that. But I was like, "You know, Flysystem does have access to that directory and it does have access to your AWS S3 account and that probably would be the most convenient way to do this" and just hadn't even thought about it. So you're opening my brain up for Flysystem and the storage facade in general as tooling not just for day to day but also for this data transfer, which I don't tend to think of it as.
Frank de Jonge:
You could even do this live. So what you can do, because this is an abstraction, so there's an interface that you can integrate with. So what you can do is basically make sure that the outer adapter is something that first tries your original source and then tries your next source. So there's actually for Flysystem an adapter called the replicate adapter...
Matt Stauffer:
Okay.
Frank de Jonge:
Which will also make sure that if you write, it will go into multiple locations and when it reads it, it will try to read from multiple locations. So because you're integrating against this interface, all sorts of magic can happen underneath. So that would completely solve this case, I think, for you.
Matt Stauffer:
So let me re-say it just to make sure I'm hearing you. One of the options is to not move them all at once, but instead just allow the thing to say "Where is it? Is it in the old system or the new system?" And serve it from whichever. Meanwhile, all the new content gets put into the new system. Wow. There's something.
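The live migration Matt restates here can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the replicate adapter's actual code: `Storage`, `InMemoryStorage`, and `MigratingStorage` are hypothetical names, and the behavior modeled is Matt's summary (reads fall back to the old system, all new writes land in the new one).

```python
from typing import Dict, Protocol


class Storage(Protocol):
    """The interface the application integrates against."""

    def has(self, path: str) -> bool: ...
    def read(self, path: str) -> str: ...
    def write(self, path: str, contents: str) -> None: ...


class InMemoryStorage:
    """Stand-in for a real backend such as a local disk or a bucket."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}

    def has(self, path: str) -> bool:
        return path in self.files

    def read(self, path: str) -> str:
        return self.files[path]

    def write(self, path: str, contents: str) -> None:
        self.files[path] = contents


class MigratingStorage:
    """Hypothetical wrapper: writes go to `new`, reads fall back to `old`."""

    def __init__(self, old: Storage, new: Storage) -> None:
        self.old = old
        self.new = new

    def has(self, path: str) -> bool:
        return self.new.has(path) or self.old.has(path)

    def read(self, path: str) -> str:
        # Prefer the new location; fall back to the old one if it is not there yet.
        if self.new.has(path):
            return self.new.read(path)
        return self.old.read(path)

    def write(self, path: str, contents: str) -> None:
        self.new.write(path, contents)
```

Because `MigratingStorage` exposes the same three methods, the calling code does not need to know a migration is in progress.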
Frank de Jonge:
Right?
Matt Stauffer:
See. Like I said, this is why I have this podcast. Okay. So one of the things I want to say real quick is that I know that when, and you may not know this, but when you're configuring the disks in Laravel, you can say... the local file system disk, you can put a root on it and so you can actually say what's that root going to be. For an S3 disk, you can configure the bucket, but at least by default, there's no root in there as one of the configuration options. I'm wondering if it's something I could add later and I'm looking at the docs right now trying to see. Do you know off the top of your head?
Frank de Jonge:
I think there should be.
Matt Stauffer:
Okay.
Frank de Jonge:
I recently had a couple of interactions with Dries while getting ready for Flysystem V2.
Matt Stauffer:
Oh, cool.
Frank de Jonge:
And he mapped all of those configuration options. I don't think he was surprised by anything there, so I think yeah.
Matt Stauffer:
Cool. It probably can be used. Again, if I find it after the call, I'll throw it in the show notes, everybody. It's not one of the ones that's there by default, but I mean the worst case scenario is you just basically prefix everything in this app with app name slash when you're writing it. But it would be cool if it was something you could actually configure at the file system driver level and I'm pretty darn sure it's possible. So again, I'll throw in the show notes if it is.
Frank de Jonge:
Yeah, and if it's not, registering a custom adapter is also super... or custom driver in Laravel is also super easy and then you can use your own configuration option to put that prefix in there.
Matt Stauffer:
Although if it's not, the first thing I'm going to do is go ask Taylor if he minds if I pull request it because that would be nice, but there may be reasons it's not there. We'll figure it out. Okay. Is there anything that you want to talk about, that you want to share, that you think everybody should hear that's really on your mind that you're really excited about or anything else like that?
Frank de Jonge:
So last week, the Flysystem V2 was released.
Matt Stauffer:
Oh wait, it's live already? How did I miss that? I thought you were going to trickle that it's coming out in two weeks or something like that. My goodness.
Frank de Jonge:
Well, that was two weeks ago.
Matt Stauffer:
Oh, so it's in RC1 though, so it's not the...
Frank de Jonge:
No, no, no. It's the 2.0.0 final release. Yeah, yeah, yeah. It's totally live. It's totally out there.
Matt Stauffer:
All right.
Frank de Jonge:
The main documentation page will guide you to it.
Matt Stauffer:
I love this.
Frank de Jonge:
Yeah.
Matt Stauffer:
So if I am a Laravel programmer who is not pulling in Flysystem on a day to day basis, you just said you were talking with Dries, so do you know what the timeline is for us getting V2 in Laravel?
Frank de Jonge:
I don't know if I'm putting somebody in the hot seat for this.
Matt Stauffer:
Oh, no. I'm not going to say anything else about it. If I get that information in a way I can share it publicly, then I will throw it in the show notes as well.
Frank de Jonge:
Cool.
Matt Stauffer:
But just know that we will get it at some point. So what are you most excited about about Flysystem V2?
Frank de Jonge:
So Flysystem V2 was... so I've been maintaining this for about eight years now, right?
Matt Stauffer:
Yeah. Lots of time.
Frank de Jonge:
So in these eight years, any mistake that I made with the original design, I've...
Matt Stauffer:
Just stuck with it.
Frank de Jonge:
I've got hit in the face with that for eight years.
Matt Stauffer:
Yeah.
Frank de Jonge:
So I also knew that if you are doing a major version, this is your opportunity to make breaking changes, and if you're going to do that, you should make them worth your while, right?
Matt Stauffer:
Yeah, for sure.
Frank de Jonge:
So I went all out.
Matt Stauffer:
All right.
Frank de Jonge:
So the basic premise is still the same. You have a generic interface, there is write string, read string, there is list contents, and there is getters to get some metadata. The content listing is one of the first things I wanted to have different. One of the things that can happen if you have a large file system, in the old implementation, it would just collect in an array, an array of arrays, with all these properties. So it would have to collect all of them before in the end returning it as the output of that function.
Matt Stauffer:
Yeah.
Frank de Jonge:
So this could mean that while you're listing your files, you would be running out of memory because it would allocate too many resources.
Matt Stauffer:
Yeah.
Frank de Jonge:
So this is one of the things that I wanted to change. So what I did, I made the entire thing generator based.
Matt Stauffer:
I was going to ask. Very cool.
Frank de Jonge:
So whether or not it's multiple API calls of getting more data and more data, it's just being streamed back to you.
Matt Stauffer:
I love that.
Frank de Jonge:
So not only are reads and writes streamed, listings are also streamed.
Matt Stauffer:
That's brilliant.
Frank de Jonge:
And so because of the streaming thing underneath, I could now offer similar convenience tooling that Laravel does with collections. So it really can actually do maps and filters over those collections.
Matt Stauffer:
Oh, cool.
Frank de Jonge:
And so what you get back, if you want to get a list of files, then you basically get a content listing and then you map to the file name and then you use toArray and then you have your array of strings and everything is streamed.
Matt Stauffer:
That's super cool.
Frank de Jonge:
And super memory efficient. So if you're doing large migrations of files, this is a breeze in that construct, basically.
Matt Stauffer:
I love that. And for anyone who is not familiar, generators in PHP are... the simplest way I would explain it is sort of like if you're imagining a for-each loop over a really huge array and the array had to be filled before you could do the for-each loop, with a generator, you do a for-each loop over a generator and it only has to fill the item that you're asking for at that given moment. That's the simplest version of it. So a stream, you're interacting with it as it gets each piece rather than once it's gotten the whole thing.
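To make the generator explanation concrete, here is a small Python sketch of a streamed listing with map and filter on top. It is an analogy, not Flysystem's implementation: `fetch_page`, `list_contents`, `Listing`, and the `fetched_pages` counter are all made up for illustration, with `fetch_page` standing in for one API call that returns a page of results.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class Entry:
    path: str
    size: int


# Records which pages were fetched, purely to make the laziness visible.
fetched_pages: List[int] = []


def fetch_page(number: int) -> List[Entry]:
    """Stand-in for one API call that returns a page of results."""
    fetched_pages.append(number)
    return [Entry(f"page-{number}/file-{i}.txt", size=i * 100) for i in range(3)]


def list_contents(page_count: int) -> Iterator[Entry]:
    """Generator: a page is only fetched when the consumer asks for more."""
    for number in range(page_count):
        yield from fetch_page(number)


class Listing(Generic[T]):
    """Thin wrapper offering lazy map/filter over any iterable."""

    def __init__(self, items: Iterable[T]) -> None:
        self._items = items

    def __iter__(self) -> Iterator[T]:
        return iter(self._items)

    def filter(self, predicate: Callable[[T], bool]) -> "Listing[T]":
        return Listing(item for item in self._items if predicate(item))

    def map(self, transform: Callable[[T], U]) -> "Listing[U]":
        return Listing(transform(item) for item in self._items)

    def to_list(self) -> List[T]:
        return list(self._items)


# Nothing is fetched until to_list() starts pulling items through the chain.
paths = (
    Listing(list_contents(page_count=2))
    .filter(lambda entry: entry.size > 0)
    .map(lambda entry: entry.path)
    .to_list()
)
```

Only `to_list()` forces the whole listing into memory; iterating the `Listing` directly keeps one item in flight at a time, which is the property that matters for large migrations.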
Frank de Jonge:
Yeah. Yeah. In the context of file systems, comparing it to a file handle over a full file contents is the perfect analogy.
Matt Stauffer:
Yeah, I love that. So the one thing that I noticed that I like the most, and I'm sure you're going to tell us so much more is switching to exceptions for errors instead of true or false. I don't know why, but I just love that so much. I don't even know if Laravel folks are ever going to get that because I think Laravel abstracts away the error state, but I do love that you did that. But what other things have you... Are there any other ones you wanted to share?
Frank de Jonge:
Well, so the exception part was the next top thing on my list because I rewrote the exception system around four times before committing to the last one. I think this is where I spent the majority of my time because I wanted something that really resonated with also how I look at just generic application design. So I tend to write unconventional exception messages, so I've got sort of a trick I like to apply. It's that when I read it, well it should be clear of course what's happening, but it always needs to relate back to the thing I was trying to do.
Matt Stauffer:
Okay.
Frank de Jonge:
So I try to apply language tricks in order to make sure that I create a name that resonates that way. So what you'll see in the new version of Flysystem is that all the prefixes for exceptions that occur based on an operation that you tried to execute on Flysystem will start with unable to. So it will say unable to write the file, unable to delete a file, unable to delete a directory.
Matt Stauffer:
Yes.
Frank de Jonge:
And so not only is that a lot more logical than having, for example, a database exception. So the exception is the type, so you can remove that, and what are you left with? The database. And what's wrong with the database?
Matt Stauffer:
Yeah.
Frank de Jonge:
So these things, they don't articulate anything so that's what I applied. Over the years, I've had a couple of times where I was lucky enough to work with Ross Tuck and he-
Matt Stauffer:
Smart guy.
Frank de Jonge:
He is one, he is a very smart guy and he had a lot of original thoughts around this and I think even at Mollie, his blog posts have been referenced for a very long time and we're like hey, if you want a good exception design, then this is something that you can look at.
Matt Stauffer:
Yeah.
Frank de Jonge:
And he mainly focused on the named constructors, so those kind of named constructors also apply there. For example, if you say...
Matt Stauffer:
Love this.
Frank de Jonge:
Unable to write a file at location and that is how you write that there is an exception for something.
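A rough Python sketch of the exception style being described: the class name states which operation failed ("unable to ..."), and a named constructor builds the message. The names `StorageError`, `UnableToWriteFile`, `at_location`, and `write_file` are illustrative choices for this sketch, not a copy of Flysystem's PHP classes.

```python
class StorageError(Exception):
    """Base type so callers can catch every storage failure at once."""


class UnableToWriteFile(StorageError):
    def __init__(self, message: str, location: str, reason: str = "") -> None:
        super().__init__(message)
        self.location = location
        self.reason = reason

    @classmethod
    def at_location(cls, location: str, reason: str = "") -> "UnableToWriteFile":
        # Named constructor: the call site reads like a sentence,
        # "unable to write file, at location ...".
        message = f"Unable to write file at location: {location}."
        if reason:
            message += f" {reason}"
        return cls(message, location, reason)


def write_file(path: str, contents: str) -> None:
    try:
        with open(path, "w") as handle:
            handle.write(contents)
    except OSError as error:
        # Translate the low-level error into one that names the failed operation.
        raise UnableToWriteFile.at_location(path, str(error)) from error
```

The original error stays attached as the cause, so the low-level detail is still available when debugging.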
Matt Stauffer:
Yeah. I love that.
Frank de Jonge:
So that's really nice. But the last thing that I also spent quite a lot of time on is actually the testing set up of everything.
Matt Stauffer:
Okay.
Frank de Jonge:
So before, I used a lot of mocking for file system interactions of Flysystem. V1 was mocked all the way through. In my day to day life, I've been struggling with mocks, especially when projects age.
Matt Stauffer:
Yeah.
Frank de Jonge:
So it really makes it hard to evolve a code base if there is extensive mocking.
Matt Stauffer:
Yeah.
Frank de Jonge:
I'm not saying mocking tools are bad. I've had this discussion before and I'll link you to a blog post of me actually advocating against them.
Matt Stauffer:
Yeah.
Frank de Jonge:
But they're not inherently bad, but my experience with it was bad also because people have not enough alternatives.
Matt Stauffer:
Yeah.
Frank de Jonge:
So for file system interaction, I think the biggest proof which you can do is actually interact with a file system.
Matt Stauffer:
Yes. You're speaking my language here. So that's what your tests do?
Frank de Jonge:
All my tests on every run on GitHub, they actually talk to an in-Docker FTP server, an in-Docker SFTP server, an FTPD server. They talk actually to AWS and they actually talk to Google Cloud.
Matt Stauffer:
Wow.
Frank de Jonge:
So all these things are just tested as they are. But it goes one step further. There is a generic test contract that's an abstract class that defines some common use cases for things that we really want to have the same behavior on. So regardless of what implementation you have, if you extend your base test class from this test class and implement the methods, you will have a base set of tests that you are developing.
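The shared test contract can be sketched in Python with `unittest`. This is an assumption-laden illustration rather than Flysystem's own test suite: `StorageContract`, `InMemoryStorage`, `LocalStorage`, and `run_contract_tests` are hypothetical, and only two behaviors are covered. Each implementation gets its own small test class that supplies `create_storage` and inherits every contract test.

```python
import tempfile
import unittest
from pathlib import Path
from typing import Dict


class InMemoryStorage:
    def __init__(self) -> None:
        self.files: Dict[str, str] = {}

    def exists(self, path: str) -> bool:
        return path in self.files

    def write(self, path: str, contents: str) -> None:
        self.files[path] = contents

    def read(self, path: str) -> str:
        return self.files[path]


class LocalStorage:
    def __init__(self, root: Path) -> None:
        self.root = root

    def exists(self, path: str) -> bool:
        return (self.root / path).is_file()

    def write(self, path: str, contents: str) -> None:
        target = self.root / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(contents)

    def read(self, path: str) -> str:
        return (self.root / path).read_text()


class StorageContract:
    """Mixin holding the shared tests; not a TestCase, so it is not collected alone."""

    def create_storage(self):
        raise NotImplementedError

    def test_written_file_can_be_read_back(self) -> None:
        storage = self.create_storage()
        storage.write("a/b.txt", "hello")
        self.assertEqual(storage.read("a/b.txt"), "hello")

    def test_file_exists_only_after_write(self) -> None:
        storage = self.create_storage()
        self.assertFalse(storage.exists("a/b.txt"))
        storage.write("a/b.txt", "hello")
        self.assertTrue(storage.exists("a/b.txt"))


class InMemoryStorageTest(StorageContract, unittest.TestCase):
    def create_storage(self):
        return InMemoryStorage()


class LocalStorageTest(StorageContract, unittest.TestCase):
    def setUp(self) -> None:
        # These tests really touch the disk, inside a throwaway directory.
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)

    def create_storage(self):
        return LocalStorage(Path(self._tmp.name))


def run_contract_tests() -> unittest.TestResult:
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(InMemoryStorageTest))
    suite.addTests(loader.loadTestsFromTestCase(LocalStorageTest))
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Adding a new backend then means writing one class with a `create_storage` method, and the same behavioral checks run against it.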
Matt Stauffer:
That's so cool.
Frank de Jonge:
And this is really the proof of the pudding: you don't just have to trust that everything will be the same, you can prove that everything is the same, and proof is better than trust, right?
Matt Stauffer:
That is cool. And I know not everyone here is fully versed in all this stuff around testing. I mean, one thing to note is that mocks would be something like instead of actually... So what Frank is talking about here is the idea that if you want to test to make sure that your write method works, you check that there's no file and then you run the write method and then you check that there's a file with the right name and the right permissions and the right everything. And that's literally happening in the tests. Files are being written.
Matt Stauffer:
In mocks, it would have probably been something more like tests to make sure that the write is passed correctly to AWS, you would make a mock of the AWS SDK that should receive method write AWS whatever, and so you're testing in some ways with the mock, you're testing to assure that the methods are called that you're calling. And so it's harder to know for sure that it's doing the thing you want because the more you mock, the less you know that it's doing the thing you want and the more you know that you just wrote your mocks the same way you wrote your code.
Frank de Jonge:
You wrote your integration. Yeah, that's basically it. I've been in situations if the underlying SDK changes, and not always... like backwards compatibility is sometimes broken unintentionally. And for those cases, you want to know, especially if you're on the integration boundaries. And if those are your money makers, you want to be able to assert them.
Matt Stauffer:
And we talked with Adam about that on the testing episode and he said "What I do is I write all my code against my service layer and then I have a separate set of tests where my service layer actually tests against the real integration because otherwise how do I know that my service layer..." or whatever you want to call it. But basically, his Dropbox code at some point actually tests against a real Dropbox instance, and then everything else in the rest of the app is just tested against a mock of that Dropbox code. But he at some point has to believe that the Dropbox code is doing what it says it does by actually checking it. So I love this.
Frank de Jonge:
Yeah, so if you're looking at that from, let's say, a school of thought kind of perspective, this is a big component, even something that you call a hexagonal architecture. That you have a port where you integrate with the outside world and you have an adaptation to the outside world that implements the port. So the port is the interface and the concrete implementation of that is the adapter. So for these cases, this always works and Flysystem is built around that same premise, although for a generic use case. But you can apply this holistically as well in application development and that's referenced often as hexagonal architecture.
Matt Stauffer:
Yeah. And Chris Fidao gave a talk about hexagonal architecture years ago at a Laracon, wrote a blog post there. I haven't heard it come up very much, but for anyone who is new to Laravel, new to storage, as you can tell, Frank and I have a history of just kind of going super deep on things. So if you're, I say this often, but if you're overwhelmed by any of the things we're diving deep into, the beginning of this episode, I think, was the most important key. The most important key is for you to understand that you're using this file facade when you're dealing with things that have much more to do with the running of the application, the stuff in your local system, the things that it takes to work.
Matt Stauffer:
You're dealing with the storage facade when you're dealing with storing files, writing files, or taking user uploads and writing them or user downloads or downloading stuff from other servers or whatever. The storage facade is backed by Flysystem and Flysystem is this whole system of adapters that Frank was telling us about. Is there anything else that you feel like, just high level stuff that is worth having a conversation about before we wrap up for today just about dealing with storage? Not even Flysystem, not even Laravel, is there anything else as a web programmer working with file uploads, file downloads, file streaming, is there anything else we haven't covered?
Frank de Jonge:
I think there's one tiny aspect and that is try not to rely too much on metadata from files.
Matt Stauffer:
Okay.
Frank de Jonge:
So a lot of people want to... like if this file was written at this point in time, that means something else, like this time stamp now has a significance to my domain. If it has significance to your domain, store it in a database because if you want to have to rely on your file system for this, you're going to get weird needs from your file system. But also, file systems tend to get corrupted, need to be restored, sometimes migrate, and now all this metadata is basically useless.
Matt Stauffer:
For example, when you look at the created at date on some photos hoping it will tell you when the photo was taken and they all say the date you moved them to your new computer or something like that? No, I totally get that.
Frank de Jonge:
And this is also like the modified time is pretty ubiquitous, but atime, for example, is something that's only available mostly in local file systems. So I wouldn't count on it. If it's important to you, just make sure that you store it in a normal database.
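As a small illustration of "store it in a database", the sketch below keeps a domain-relevant timestamp next to the file's path in SQLite instead of relying on file system metadata. The `uploads` table, its columns, and the function names are hypothetical and chosen only for this Python example.

```python
import sqlite3
from datetime import datetime, timezone


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE uploads (path TEXT PRIMARY KEY, uploaded_at TEXT NOT NULL)"
    )


def record_upload(conn: sqlite3.Connection, path: str, uploaded_at: datetime) -> None:
    # The timestamp lives with the application's data, so it survives
    # restores, migrations, and moves between storage backends.
    conn.execute(
        "INSERT INTO uploads (path, uploaded_at) VALUES (?, ?)",
        (path, uploaded_at.isoformat()),
    )


def get_uploaded_at(conn: sqlite3.Connection, path: str) -> datetime:
    row = conn.execute(
        "SELECT uploaded_at FROM uploads WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        raise KeyError(path)
    return datetime.fromisoformat(row[0])


connection = sqlite3.connect(":memory:")
create_schema(connection)
record_upload(
    connection,
    "photos/cat.jpg",
    datetime(2020, 11, 1, 12, 30, tzinfo=timezone.utc),
)
```

Copying `photos/cat.jpg` to another disk or bucket would reset its file system timestamps, but `get_uploaded_at` keeps returning the recorded value.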
Matt Stauffer:
That's really helpful. Okay so the last question I usually ask before we start wrapping up is, is there anything else... are there any other learning resources that if someone wanted to learn more about this, are there talks that you or somebody else have given, blog posts that you or somebody else have given, or any training things or anything like that? Are there really good places for people to learn about this?
Frank de Jonge:
So I try to capture the most and the majority of it in the documentation of Flysystem itself. So even if it's around topics like writing deterministic code around file systems, I will even have pages around that sort of thing.
Matt Stauffer:
I love that. Yeah.
Frank de Jonge:
I've done a lot of talks over the years about Flysystem, so if you type Flysystem on YouTube, I'm sure you will get a couple of hits.
Matt Stauffer:
Love it.
Frank de Jonge:
So any talks there, yeah. They should be good enough.
Matt Stauffer:
I love it. Okay, well we'll find those and we'll get them. So the last thing is, before you tell everybody how they can follow you or pay you money, one other thing. There's a personal fun moment. I ask something about somebody's history and there's probably things from 20 years in your past I could ask you about, but I'm actually super curious. You recently worked for Schiphol Airport and it is maybe my favorite airport I've ever been in in my entire life and it is modern and it's clean and it's well run. And I was just curious, are there any... first of all, did you have fun working there? But second of all, are there any stories? And maybe no, but are there any stories or interesting things that come away from working for an airport as a programmer that you could share?
Frank de Jonge:
Well, so I should say I was a freelancer before and now I am employed. But hands down, that was the best freelancing place to work at ever.
Matt Stauffer:
Really?
Frank de Jonge:
Nothing comes close. The level of trust from the organization, the level of freedom, the drive for innovation there. We were doing Agile while the rest of the company was at that time doing Waterfall planning because Waterfall planning makes sense if you're... like nobody is scrumming a runway, I can tell you that.
Matt Stauffer:
Yes.
Frank de Jonge:
So that's clever stuff. But we had a literal three by two meter pirate flag on our department where we worked because we were like the programming rebels over there.
Matt Stauffer:
Yeah.
Frank de Jonge:
But it was work hard, play hard. I've never learned so much in my life as in that period. I've never been so excited to take on any assignment there.
Matt Stauffer:
Yeah.
Frank de Jonge:
I actually went back and did another... like I was there for a year and a half and I used to like to move after a year. Already extended that to a year and a half and then came back on a different project within the same company. That was a TypeScript project, so something completely different, but again, turned out to be some of my best programming that I've ever done and the best business successes I've ever been a part of.
Matt Stauffer:
That's really cool.
Frank de Jonge:
So that was amazing. Yeah.
Matt Stauffer:
Yeah. Cool. It's just so funny because I saw it and I just remembered I really liked that airport and just seeing that you were working there was cool. Okay, so if someone wants to follow you, where is the best place to follow you? Is there anything that you want them to buy or download or try or anything like that? What plugs do you have for us?
Frank de Jonge:
Well, of course you can follow me on Twitter, so that's twitter.com/frankdejonge. I'm pretty sure this will be in the show notes, right?
Matt Stauffer:
It will be in the show notes, yeah.
Frank de Jonge:
So you can sponsor my open source efforts. I'm mostly the sole maintainer of Flysystem. I'm at zero issues right now and that's not because it's...
Matt Stauffer:
Hey. Come on.
Frank de Jonge:
It's not because it's perfect software, but I put in the time. I put in the work.
Matt Stauffer:
Putting the work in. Yeah.
Frank de Jonge:
So if you can, please support me. If you want to do something good for the environment, my GitHub Sponsors thing also links to a place where you can donate trees.
Matt Stauffer:
I love it.
Frank de Jonge:
I'm yay for the environment so if I see that, I'll be just as happy seeing the money go that way as if it goes into my pocket.
Matt Stauffer:
I love this. So I've known you for I don't know how long. Not eight years, but close. And I don't think I've ever heard you say your first name before. I've only heard other people say it. And you just said it and I went are you kidding me? Can you say your first name again?
Frank de Jonge:
Frank.
Matt Stauffer:
So I've been saying Frank, like ah, and it's Frank. It's higher in the nasal.
Frank de Jonge:
It's higher.
Matt Stauffer:
How many years have you known me and not told me that I was saying your name wrong? Oh my god. Freaking Americans and just our reputation for getting it wrong.
Frank de Jonge:
I'm a non-confrontational guy. The majority of the people just call me Frank and that's fine for me as well. And for me, I hear you put in effort and that's what I appreciate, man.
Matt Stauffer:
Okay.
Frank de Jonge:
Yeah.
Matt Stauffer:
I got your last name right, at least, so I'll take that for a win.
Frank de Jonge:
Yeah.
Matt Stauffer:
All right. It was so wonderful having you. Thank you for Flysystem. And do you still maintain the CSV package or not?
Frank de Jonge:
No, no, no. That's...
Matt Stauffer:
Or is that somebody else that does that?
Frank de Jonge:
That's somebody else, yeah.
Matt Stauffer:
Okay. I was about to thank you for somebody else's work. Well, thank you for Flysystem. Thank you for the stuff... look, because again, you have done something that is so foundational for us and you were at Laracons and Laravel conversations early days and while you were in FuelPHP, which is not Laravel, your and Phil's work in Fuel was a part of helping the PHP ecosystem grow and stuff like that, and I just want people to know what you've done. So thank you for what you've done. Thank you for continuing to help us. I do hope you get some more trees or sponsors out of this, but either way, thank you so much for joining me. It was a ton of fun.
Frank de Jonge:
You're very welcome, man, and thank you for having me on.
Matt Stauffer:
Of course. See y'all next time.