Xtreme Visual Basic Talk


Jayme65 09-01-2013 05:08 AM

Using Directory.GetFiles() WITH multiple extensions AND sort order
 
Hi,

I have to get a directory file list, filtered on multiple extensions...and sorted!

I use this, which is the fastest way I've found to get dir content filtered on multiple extensions:

Code:

Dim ext As String() = {"*.jpg", "*.bmp", "*.png"}
Dim files As String() = ext.SelectMany(Function(f) Directory.GetFiles(romPath, f)).ToArray
Array.Sort(files)

and then use an array sort.


I was wondering (and this is my question ;)) whether there is a way to do the sorting IN the same main line. Something like:
Code:

Dim files As String() = ext.SelectMany(Function(f) Directory.GetFiles(romPath, f).Order By Name).ToArray
And if so, would I gain any speed by doing this instead of sorting the array at the end? (I will run my own tests and report back as soon as I have a solution!)
Thanks for your help!!

AtmaWeapon 09-01-2013 08:47 AM

Short answer: there is, but it won't help.

There's an OrderBy() method that does the same thing the "SQL syntax" does and works like SelectMany(); in this case it takes a function that returns the value used for sorting.

Unfortunately, I don't think it will make this go any faster. Directory.GetFiles() returns its results all at once, which means you pay the biggest performance penalty up front, before you've even sorted. There's a Directory.EnumerateFiles() that returns them one at a time, but in order to sort, OrderBy() has to iterate over the entire collection anyway.

So as long as "sorted" is a requirement, no variation of the code is going to be significantly faster than any other.
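
For reference, a minimal sketch of the method-syntax version described above, using Directory.EnumerateFiles() (available from .NET 4.0) in place of GetFiles(). The module and function names (FileListing, GetSortedFiles) and their parameters are made up for illustration and do not come from the thread. Since OrderBy() has to buffer every path before it can yield the first one, this is not expected to run noticeably faster than GetFiles() followed by Array.Sort().

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Module FileListing
    ' Hypothetical helper: returns the files in "folder" that match any of the
    ' wildcard patterns (e.g. "*.jpg"), sorted by full path.
    Function GetSortedFiles(folder As String, patterns As IEnumerable(Of String)) As String()
        ' EnumerateFiles streams results lazily, but OrderBy still consumes
        ' the whole sequence before returning its first element.
        Return patterns _
            .SelectMany(Function(p) Directory.EnumerateFiles(folder, p)) _
            .OrderBy(Function(f) f) _
            .ToArray()
    End Function
End Module
```

It would be called as FileListing.GetSortedFiles(romPath, ext), with romPath and ext as in the first post.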

Jayme65 09-01-2013 08:53 AM

AtmaWeapon,

Thanks for your reply!
You're right, I've tested this:
Code:

myFiles = myExtensions.SelectMany(Function(ext) Directory.GetFiles(myPath, ext)).OrderBy(Function(x) x).ToArray
Which gives exactly the same time!

But, for people interested in this topic, I've found that calling GetFiles once and then filtering the results by file extension is far better...especially as the number of extensions to look for grows!

Code:

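' Extensions as returned by Path.GetExtension (leading dot, no wildcard).
Dim supportedExtensions As String() = {".zip", ".aaa", ".bbb", ".ccc", ".ddd"}
Dim files As String() = Directory.GetFiles(romPath, "*.*", SearchOption.AllDirectories)
Array.Sort(files)

For Each fi As String In files
    ' Exact, case-insensitive match. A substring test against "*.zip,*.aaa,..."
    ' would also accept files that have no extension at all.
    If supportedExtensions.Contains(Path.GetExtension(fi), StringComparer.OrdinalIgnoreCase) Then
        ...
    End If
Next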

...takes the same amount of time whatever the number of extensions, which is not the case with my previous code.
In my case, with 20000 files and 6 extension types: 0.2 sec for this method against 0.6 sec for the previous one!
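
A rough sketch of how such a comparison could be timed with System.Diagnostics.Stopwatch. All names here (ListingBenchmark, PerPattern, SinglePass, TimeMs) are hypothetical, and both functions use SearchOption.AllDirectories so that they scan the same set of files, which is an assumption about how the test was run. Directory listings are typically cached by the operating system, so the first run is often slower than later ones; it is probably worth timing each approach several times.

```vb
Imports System
Imports System.Diagnostics
Imports System.IO
Imports System.Linq

Module ListingBenchmark
    ' Approach 1: one GetFiles call per wildcard pattern, e.g. {"*.jpg", "*.bmp"}.
    Function PerPattern(folder As String, patterns As String()) As String()
        Return patterns _
            .SelectMany(Function(p) Directory.GetFiles(folder, p, SearchOption.AllDirectories)) _
            .OrderBy(Function(f) f) _
            .ToArray()
    End Function

    ' Approach 2: a single GetFiles call, then filter by extension, e.g. {".jpg", ".bmp"}.
    Function SinglePass(folder As String, extensions As String()) As String()
        Return Directory.GetFiles(folder, "*.*", SearchOption.AllDirectories) _
            .Where(Function(f) extensions.Contains(Path.GetExtension(f), StringComparer.OrdinalIgnoreCase)) _
            .OrderBy(Function(f) f) _
            .ToArray()
    End Function

    ' Runs the given work once and returns the elapsed time in milliseconds.
    Function TimeMs(work As Action) As Long
        Dim sw As Stopwatch = Stopwatch.StartNew()
        work()
        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function
End Module
```

For example, ListingBenchmark.TimeMs(Sub() ListingBenchmark.SinglePass(romPath, {".zip", ".aaa"})) returns the time taken by one run of the single-pass approach.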

PlausiblyDamp 09-01-2013 02:16 PM

No idea if the timing would be any different but you could try
Code:

' Extensions as returned by Path.GetExtension (leading dot, no wildcard).
Dim supportedExtensions As String() = {".zip", ".aaa", ".bbb", ".ccc", ".ddd"}
Dim files As String() = Directory.GetFiles(romPath, "*.*", SearchOption.AllDirectories)

For Each fi As String In From fi1 In files.OrderBy(Function(x) x) Where supportedExtensions.Contains(Path.GetExtension(fi1), StringComparer.OrdinalIgnoreCase)
    ...
Next

as a pure LINQ solution, which removes the explicit If test and the Array.Sort call.

If you are running a multi-core system on .NET 4.0 or later, you may get a performance improvement by trying
Code:

' Extensions as returned by Path.GetExtension (leading dot, no wildcard).
Dim supportedExtensions As String() = {".zip", ".aaa", ".bbb", ".ccc", ".ddd"}
Dim files As String() = Directory.GetFiles(romPath, "*.*", SearchOption.AllDirectories)

For Each fi As String In From fi1 In files.AsParallel().OrderBy(Function(x) x) Where supportedExtensions.Contains(Path.GetExtension(fi1), StringComparer.OrdinalIgnoreCase)
    ...
Next

Like any multithreaded code, though, performance may or may not improve depending on the size / type of data.

AtmaWeapon 09-01-2013 08:17 PM

In my opinion there's dang near never a reason to use the SQL-like syntax, and especially never a reason to mix it with the functional syntax, but I like the idea of trying to parallelize it.

Intuitively, I think it'd be more efficient to do the Where() first. GetFiles() has to enumerate the hard drive once no matter what. OrderBy() is going to have to sort and is likely O(n log(n)), but if you let the Where clause filter first, you'll end up with a smaller overall n.

Code:

' Extensions as returned by Path.GetExtension (leading dot, no wildcard).
Dim supportedExtensions As String() = {".zip", ".aaa", ".bbb", ".ccc", ".ddd"}
Dim fileSelector = Function(filePath As String) As Boolean
                       Return supportedExtensions.Contains(Path.GetExtension(filePath), StringComparer.OrdinalIgnoreCase)
                   End Function
Dim filteredFiles = Directory.GetFiles(romPath, "*.*", SearchOption.AllDirectories) _
                        .AsParallel() _
                        .Where(fileSelector) _
                        .OrderBy(Function(x) x)

For Each fi As String In filteredFiles
    ...
Next

No clue if that's actually valid; C# is much more sane with respect to line breaks and SQL. I don't have VS at home to test it out. I like moving as much out of the For Each statement as I can, though.
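
A sequential sketch of the same filter-first idea, written as a standalone helper so it can be tried without touching the disk. The names (FilterFirst, FilterThenSort) are hypothetical, and it assumes the extensions are supplied with a leading dot, the form Path.GetExtension() returns. A HashSet is used here as one possible way to get exact, case-insensitive matches; it is a substitution for illustration, not something proposed in the thread.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Module FilterFirst
    ' Hypothetical helper: keeps only the paths whose extension is in the set,
    ' then sorts the survivors, so the sort works on the smaller n.
    Function FilterThenSort(paths As IEnumerable(Of String), extensions As IEnumerable(Of String)) As List(Of String)
        ' Case-insensitive set: ".ZIP" and ".zip" both match; a file with no
        ' extension yields "" from Path.GetExtension and is rejected.
        Dim wanted As New HashSet(Of String)(extensions, StringComparer.OrdinalIgnoreCase)
        Return paths _
            .Where(Function(p) wanted.Contains(Path.GetExtension(p))) _
            .OrderBy(Function(p) p, StringComparer.OrdinalIgnoreCase) _
            .ToList()
    End Function
End Module
```

Inserting .AsParallel() before .Where() would give a parallel variant; whether that helps would have to be measured.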

