How to find large files with their size in Linux? find and du command examples

One of the common problems while working in UNIX is finding large files to free up some space. Suppose your file system is full and you are receiving alerts to free up space, or your host has run out of space and your server is not starting up. The first thing you do is find the top 10 largest files and see if you can delete them. Usually, old files and large Java heap dumps are good candidates for removal. If you are running a Java application, e.g. a core Java program or a web application running on Tomcat, then you can remove those heap dump files and free some space, but the big question is how do you find them? How do you know the size of the biggest file in your file system, especially if you don’t know which directory it is in? We’ll try to answer some of those questions in this article.

When I was new to Linux, I didn’t have any other choice but to go to the log directory, look for old files that were larger than the rest, and delete them. This worked well until one day our server died due to a huge cache file. I wasn’t able to locate it because it wasn’t in the log directory. That’s when I came to know about the find command, which lets you search sub-directories for large files as shown below:

$ find . -size +1G

This command will print all the files larger than 1GB in the current directory and any of its subdirectories. The only problem is that it doesn’t print the exact size. That can be solved by using the -printf option, which allows you to specify a format string much like Java’s printf() method. See Linux Command Line Interface (CLI) Fundamentals to learn more about the various options of the find command in Linux.

You can further tweak the command to show the size of each match. Here is the modified UNIX command to find large files with their size:

$ find . -size +1G -printf '%s %p\n'

Here %s is the file size in bytes and %p is the path.
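If raw byte counts are hard to read, the output can be piped through numfmt. The following is only a sketch: it assumes GNU find and the numfmt tool from GNU coreutils are installed, and the throwaway directory, the file names, and the 1k threshold are made up for the demo.

```shell
#!/bin/sh
# Sketch: print matching files with human-readable sizes.
# Assumes GNU find (-printf) and GNU coreutils numfmt.

# Throwaway demo directory (hypothetical file names)
demo=$(mktemp -d)
head -c 3000 /dev/zero > "$demo/heap.dump"
head -c 100 /dev/zero > "$demo/tiny.log"

# %s is the size in bytes; numfmt rewrites only the first field
# into units such as K, M or G and leaves the path untouched
readable=$(find "$demo" -type f -size +1k -printf '%s %p\n' | numfmt --to=iec --field=1)
echo "$readable"
```

On a real server you would replace "$demo" with the directory you want to scan and raise the threshold, e.g. to +1G.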

Alternatively, you can use the -exec option to run ls on each file that the find command returns and print its size, as shown below:

$ find . -size +100M -exec ls -sh {} \;

This is good enough to see which files you can delete to free some space, but the problem is that you have to guess the threshold. If no file is larger than the size you pass, the command prints nothing, and if the size is too low it prints far too many files. Hence I always run this command with some hypothetical large number, e.g. 10GB, but that is just a workaround, not a proper fix. Let’s see what we can do next.
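One way to avoid guessing a size is to drop the -size test, print every file with its size, and let sort do the ranking. This is a minimal sketch, assuming GNU find; largest_files is a hypothetical helper name, and the demo directory and file names are invented for illustration.

```shell
#!/bin/sh
# Sketch: rank files by size instead of guessing a -size threshold.
# Assumes GNU find (-printf); largest_files is a hypothetical helper name.
largest_files() {
    # $1 = directory to scan (default: current), $2 = number of entries (default: 10)
    find "${1:-.}" -type f -printf '%s %p\n' | sort -n -r | head -n "${2:-10}"
}

# Throwaway demo directory (hypothetical file names)
demo=$(mktemp -d)
mkdir "$demo/cache"
head -c 3000 /dev/zero > "$demo/heap.dump"
head -c 1000 /dev/zero > "$demo/cache/data.bin"
head -c 200 /dev/zero > "$demo/app.log"

# Print the two largest files under the demo directory, biggest first
top2=$(largest_files "$demo" 2)
echo "$top2"
```

Running largest_files with no arguments lists the ten largest files under the current directory. Note that this walks every file, so it can be slow on a big file system.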

Btw, you can also use the du (disk usage) command to find large directories and files along with their sizes, as shown below:

$ du -a . | sort -n -r | head -n 10
16095096 .
13785288 ./logs
6095380 ./logs/app
2125252 ./temp
2125244 ./temp/data
2125240 ./temp/data/app

This is the right command: it lists both directories and files. I have combined the output of the du command with the sort command to print the top 10 largest files and directories. This is exactly what we are looking for. In fact, this is also one of the frequently asked Linux interview questions, so if you know this trick you can answer this question in interviews as well.
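The numbers printed by du above are block counts, which are not easy to read at a glance. A possible variation, assuming GNU du and GNU sort, is to ask du for human-readable sizes and sort them with sort -h. The helper name top_usage and the demo files below are hypothetical.

```shell
#!/bin/sh
# Sketch: same ranking as the du command above, with human-readable sizes.
# Assumes GNU du and GNU sort; sort -h compares suffixed sizes such as 4.0K, 2.0M, 1.5G.
top_usage() {
    # $1 = directory to scan (default: current), $2 = number of entries (default: 10)
    du -a -h "${1:-.}" | sort -h -r | head -n "${2:-10}"
}

# Throwaway demo directory (hypothetical file names)
demo=$(mktemp -d)
head -c 2000000 /dev/urandom > "$demo/heap.dump"
head -c 100 /dev/urandom > "$demo/app.log"

# Print the two largest entries: the directory itself and its biggest file
usage=$(top_usage "$demo" 2)
echo "$usage"
```

Because du reports disk usage rather than file length, the sizes can differ slightly from what find -printf '%s' shows for the same file.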

That’s all about how to find large files and directories in Linux. As I said, I used to search for large files by using the find command with the -size option, but that is more or less guesswork because you never know the size of the largest file on a machine. Still, by using a reasonably high size, you can possibly find all the big files in your filesystem. One more command you can use to find large files with their size in Linux is the disk usage or du command, which lists both files and directories.
