problem reading files from a large directory


#1  07-15-2009, 09:00 PM
duckbier


I wrote a cleanup program that goes through some directories and deletes files whose creation date is older than, say, 6 months. It works fine on directories that contain a few thousand small files. However, one directory (which contains small backup files from another program) holds over 300,000 files, and the program locks up as soon as I read in the first file in that directory.

I am convinced the directory has too many files in it to open. The server that the directory is on is slow. It takes half an hour to open the directory even when working on the server itself. I know it will take forever to delete the number of files I want to delete, but I don't understand why it gets stuck and hangs there with no error message.

Here is where I get stuck. ListBox1 holds the directory I'm attempting to access:

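For Each selectFile In My.Computer.FileSystem.GetFiles(ListBox1.Items.Item(Count), FileIO.SearchOption.SearchTopLevelOnly, "*.*")
    ' Keep only the file name, without the directory part.
    compFile = Path.GetFileName(selectFile)
Next ' the rest of the loop body was not included in the post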

This may be confusing, but if anyone has a better idea as to how I can read files from this large directory I would greatly appreciate it. Is there any other information I can provide?

Thank you in advance!

#2  07-16-2009, 12:21 AM
HQcool22

Does it take a long time to open a file off the server?

#3  07-16-2009, 04:48 AM
duckbier

Yeah, it does, and like I said, I kind of feel that is why it's hanging, but I thought there might be a way around that. If I open up that particular directory, I have to wait at least half an hour. The server itself needs to be replaced, but in the meantime I wanted to run this cleanup program on it.

#4  07-16-2009, 07:43 AM
Gruff

Does your server have a lot of space or is it almost full?

I had a similar problem a short while ago. I solved it by creating a new folder structure on the same server; perhaps you could use 100 new folders. After normal work hours, when there is no load on the server, log onto the server itself. Do not use a remote session. Use Windows Explorer and move files based on date into your new folder structure.

Moving a file from one point on a server to another on the same server (as long as both folders are on the same volume) is really just a renaming process; the file's contents are not actually moved. This is many orders of magnitude faster than copying.

Once your files are in this new folder structure, you will likely be able to run your program on each new folder. Or, for that matter, use Windows Explorer, sort by date, then delete blocks of files.

I probably do not have to say this, but do not use drag and drop to move the files. It is much too easy to make a mistake. Use two Explorer windows. Highlight the files you want to move, then cut and paste them into the new folder.

Another option might be to skip the graphical interface (Explorer) altogether and move or delete files via the command line. There is a lot less overhead involved. I hope your server is backed up to tape in case you need to restore.

Finally, after it is cleaned out, you really should get your IT department to create an automated process that runs periodically, backs up the oldest files in that folder to tape, and then removes them so you do not get a repeat of the problem in the future.
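
For anyone who would rather script the date-based move than do it by hand in Explorer, here is a minimal sketch. It assumes one subfolder per creation month (for example "2009-01") is an acceptable layout; the module name, function name, and folder naming are made up for illustration and are not from this thread. Note that it still has to list the whole directory once, so on a slow server that first step will take a while.

```vbnet
Imports System
Imports System.Globalization
Imports System.IO

Module BucketByMonth
    ' Moves each top-level file in sourceDir into a subfolder named after the
    ' month it was created in (hypothetical layout). Returns the number moved.
    Function MoveIntoMonthFolders(ByVal sourceDir As String) As Integer
        Dim moved As Integer = 0
        ' GetFiles returns a complete array up front, so moving files
        ' inside the loop does not disturb the listing.
        For Each filePath As String In Directory.GetFiles(sourceDir)
            Dim bucket As String = File.GetCreationTime(filePath).ToString("yyyy-MM", CultureInfo.InvariantCulture)
            Dim targetDir As String = Path.Combine(sourceDir, bucket)
            Directory.CreateDirectory(targetDir) ' does nothing if it already exists
            ' Same volume, so this is a rename rather than a copy.
            File.Move(filePath, Path.Combine(targetDir, Path.GetFileName(filePath)))
            moved += 1
        Next
        Return moved
    End Function

    Sub Main(ByVal args As String())
        If args.Length <> 1 Then
            Console.WriteLine("Usage: BucketByMonth <directory>")
            Return
        End If
        Console.WriteLine("Moved {0} files.", MoveIntoMonthFolders(args(0)))
    End Sub
End Module
```

Try it on a copy of a small folder first, and make sure the backup mentioned above exists before pointing it at the real directory.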

Last edited by Gruff; 07-16-2009 at 07:56 AM.

#5  07-16-2009, 08:44 AM
duckbier

Gruff,

We're almost at the point where the server will be replaced with a faster server with more disk space. Right now I'm just trying to keep it as clean as possible.

The directory I'm having problems with is receiving around 500 text backup files per day. Nobody in our IT department, including myself, realized so many were being created each day. I like your idea about moving them into folders sorted by date. Once I get caught up, I'll write a small program or script to automate this in the early morning, as you suggested. This seems to be the only solution short of replacing the server. It's outdated, and upgrading it would be a waste of money with little improvement.

This morning I manually deleted all but 6 months' worth of the files, which left me with 200,000 files. I then ran the program; it deleted the few files past my 6-month mark very slowly, but it did finish.

Thanks for your suggestions and help.

#6  07-16-2009, 09:12 AM
AtmaWeapon

You might want to consider using API calls to traverse the directory.

FindFirstFile() can be used to start a search; it fills in a struct describing the first match and returns a search handle, which you pass to FindNextFile() and FindClose() to get the files in a directory one at a time.

My guess is this will be faster. When you call FileSystem.GetFiles(), it calls Directory.GetFiles() (which makes more sense to use anyway). Directory.GetFiles() uses the FindXXXXFile() API to get all the files in the directory that match the wildcard, then returns this information. Since it has to enumerate all of the files before returning, it will take a long time if the directory has hundreds of thousands of files. If you call the API yourself, you don't have to wait so long to start responding. It's still going to take a long time to enumerate all of the files, but it's easier to write a responsive UI this way. I want to make an example, but it will be a while.

I may be wrong that this will be faster. I only say so because you will have information about each file more quickly; it will still take roughly the same amount of time to visit each item in the folder. If you are accessing the folder over the network, you may see a benefit from running the application on the server instead.
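
As a rough illustration of handling files one at a time without hand-written API declarations, here is a sketch that swaps in DirectoryInfo.EnumerateFiles(). That method only exists in .NET Framework 4 and later, so it was not an option when this thread was written, and it is not the FindFirstFile() wrapper discussed here. The module and function names are made up for illustration. Whether deleting during enumeration is safe is not settled in this thread; the sketch simply assumes it is acceptable, so test it on a scratch folder first.

```vbnet
Imports System
Imports System.IO

Module StreamingCleanup
    ' Deletes top-level files in folder that were created before cutoff.
    ' Returns how many files were deleted.
    Function DeleteOlderThan(ByVal folder As String, ByVal cutoff As DateTime) As Integer
        Dim deleted As Integer = 0
        Dim info As New DirectoryInfo(folder)
        ' EnumerateFiles hands back entries as the directory is read, so the
        ' loop body starts running before the full listing is available.
        For Each entry As FileInfo In info.EnumerateFiles("*", SearchOption.TopDirectoryOnly)
            If entry.CreationTime < cutoff Then
                ' Assumption: deleting mid-enumeration is acceptable here.
                entry.Delete()
                deleted += 1
            End If
        Next
        Return deleted
    End Function

    Sub Main(ByVal args As String())
        If args.Length <> 1 Then
            Console.WriteLine("Usage: StreamingCleanup <directory>")
            Return
        End If
        ' Six months matches the cutoff described in the original question.
        Dim removed As Integer = DeleteOlderThan(args(0), DateTime.Now.AddMonths(-6))
        Console.WriteLine("Deleted {0} files.", removed)
    End Sub
End Module
```

This changes when the first file becomes available, not how long the whole directory takes to walk.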

If your only goal is to keep UI responsiveness, it may be easier to put the GetFiles() call on another thread and display a "please wait..." dialog to the user while it does its business. The only disadvantage to this approach is you have to wait for the files to be enumerated before you can operate on them; with the API you might be able to delete files while you are enumerating (though I have no idea if it's safe.)
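
Here is a minimal console sketch of the other-thread idea, assuming a plain System.Threading.Thread is good enough; in a Windows Forms program a BackgroundWorker would be the more usual choice. The module and member names are made up for illustration. The dots printed by the main thread stand in for the "please wait..." dialog.

```vbnet
Imports System
Imports System.IO
Imports System.Threading

Module BackgroundListing
    Private targetFolder As String
    Private fileNames As String()

    ' Runs on the worker thread: the slow, blocking directory listing.
    Private Sub ListFiles()
        fileNames = Directory.GetFiles(targetFolder)
    End Sub

    ' Lists folder on a worker thread and prints a dot every half second
    ' while the listing is still in progress.
    Function ListWithProgress(ByVal folder As String) As String()
        targetFolder = folder
        Dim worker As New Thread(AddressOf ListFiles)
        worker.IsBackground = True
        worker.Start()
        ' Join(500) returns False while the worker is still running.
        While Not worker.Join(500)
            Console.Write(".")
        End While
        Console.WriteLine()
        Return fileNames
    End Function

    Sub Main(ByVal args As String())
        Dim folder As String = If(args.Length > 0, args(0), ".")
        Console.WriteLine("Found {0} files.", ListWithProgress(folder).Length)
    End Sub
End Module
```

As noted above, nothing can be done with the files until the whole listing has come back; the thread only keeps the program from looking frozen.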

#7  07-16-2009, 01:33 PM
AtmaWeapon

I spent a while working on a demo of using FindFirstFile(), but now I'm convinced that it's a red herring. It takes me about a second to enumerate 300,000 files using Directory.GetFiles(), and approximately 50 seconds to use FindFirstFile() and FindNextFile(). Ouch.

It's interesting because Directory.GetFiles() uses these same API methods and is so much faster. My wrapper did a lot of work to make using the API calls feel like using a .NET object, but it looks like all I did was add a ton of extra overhead, so I won't be posting it.
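
For anyone who wants to repeat the Directory.GetFiles() half of this measurement on their own folder, here is a small timing sketch using Stopwatch. It is not the test harness used for the numbers above, and the module and function names are made up for illustration.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.IO

Module ListingTimer
    ' Returns the milliseconds Directory.GetFiles takes to list folder and
    ' reports the number of files found through fileCount.
    Function TimeGetFiles(ByVal folder As String, ByRef fileCount As Integer) As Long
        Dim watch As Stopwatch = Stopwatch.StartNew()
        Dim files As String() = Directory.GetFiles(folder)
        watch.Stop()
        fileCount = files.Length
        Return watch.ElapsedMilliseconds
    End Function

    Sub Main(ByVal args As String())
        Dim folder As String = If(args.Length > 0, args(0), ".")
        Dim count As Integer
        Dim elapsedMs As Long = TimeGetFiles(folder, count)
        Console.WriteLine("{0} files listed in {1} ms", count, elapsedMs)
    End Sub
End Module
```

A second run on the same folder may be much faster because the operating system caches directory data, so the first run is usually the one to compare.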