Excel: Introduction to Array Formulas
Excel: Introduction to Array Formulas
Excel: Introduction to Array Formulas
Excel: Introduction to Array Formulas
Excel: Introduction to Array Formulas
Excel: Introduction to Array Formulas Excel: Introduction to Array Formulas Excel: Introduction to Array Formulas Excel: Introduction to Array Formulas Excel: Introduction to Array Formulas Excel: Introduction to Array Formulas Excel: Introduction to Array Formulas Excel: Introduction to Array Formulas
Excel: Introduction to Array Formulas Excel: Introduction to Array Formulas
Excel: Introduction to Array Formulas
Go Back  Xtreme Visual Basic Talk > > > > Excel: Introduction to Array Formulas


Reply
 
Thread Tools Display Modes
  #1  
Old 04-29-2008, 10:43 AM
Colin Legg's Avatar
Colin Legg Colin Legg is offline
Out Of Office

Retired Moderator
* Expert *
 
Join Date: Mar 2005
Location: London, UK
Posts: 3,402
Default Excel: Introduction to Array Formulas


Members frequently post questions about array formulas on the Excel board. This tutorial is designed to provide an introduction on this subject and applies to versions of Excel prior to 2007. If you have any questions about array formulas once you have read the tutorial please start a new thread on the Excel board so that we can try to help.

An updated version of this tutorial will soon be available here.

The tutorial is broken up into the following sections:
  1. Terminology, Rules, Pros and Cons
  2. The Challenges Worksheet
  3. Challenge #1
  4. Data Types and Data Type Conversions
  5. Streamlining Challenge #1 Solution
  6. A Brief Discussion on SUMPRODUCT
  7. Challenge #2
  8. Challenge #3
  9. Challenge #4
  10. Excel VBA: FormulaArray Property
  11. Useful Links


Many thanks go to Nate Oliver and Mike Rosenblum for their input and feedback on this article.

Last edited by Colin Legg; 09-13-2011 at 03:58 AM.
Reply With Quote
  #2  
Old 04-29-2008, 10:45 AM
Colin Legg's Avatar
Colin Legg Colin Legg is offline
Out Of Office

Retired Moderator
* Expert *
 
Join Date: Mar 2005
Location: London, UK
Posts: 3,402
Default Terminology, Rules, Pros and Cons

There are quite a few terms and rules on array formulas so I've put together this overview which should cover the most important and common points.

Definition Of An Array
  • An array can be defined as a group or collection of related items.
  • A one dimensional array, or vector, is a collection of items along a single row or single column.
  • A two dimensional array, or matrix, is a collection of items on multiple rows or columns.

Definition Of An Array Formula
  • An array formula is a formula which performs multiple calculations on one or more arrays. By this definition, these three examples could all be considered to be array formulas (this will be explained in more detail later):
    Code:
    =SUM(D1:D3*E1:E3)              'CSE entry
    =SUM({1;2;3}*{1;2;3})          'Standard formula entry
    =SUMPRODUCT(D1:D3,E1:E3)       'Standard formula entry
  • An array formula which has been entered using CTRL+SHIFT+ENTER can be referred to as a CSE formula.
  • An array formula can be classified into two result types. A single result array formula is where the output of the formula is a single value which can be viewed within a single cell. A multiple result array formula is where the output of the formula is an array of values which can be viewed in its entirety within a block of cells with the same dimensions as the output. A block of cells which shares the same multiple result array formula is called an array block or an array range.

General Array Formula Rules
  • You cannot create or use an array that uses a whole column of cells.
  • Array formulas cannot handle arrays with more than two dimensions.
  • Some worksheet functions cannot accept array data types within certain parameters. Examples include COUNTIF(), SUMIF(), COUNTBLANK() and database functions.
  • Only certain functions can aggregate arrays: these include SUM(), COUNT(), AVERAGE(), MAX(), MIN() and PRODUCT(). Note that CONCATENATE() cannot.

Additional Rules For CSE Formulas
  • The process of entering an CSE formula is the same as that of a standard formula except that, when you type the formula into the formula bar, you complete the entry by pressing CTRL+SHIFT+ENTER instead of ENTER. This will surround the array formula with parentheses ("{}"). The parentheses must not be entered manually.
  • You cannot create more than 65,472 CSE formulas that refer to another worksheet.
  • You cannot enter a CSE formula into a cell if it has already been merged. However, if it is a single cell CSE formula, you can merge it with other cells after the CSE formula has been entered.
  • Because of the way they have been registered, SUMPRODUCT(), INDEX() and LOOKUP() are able to convert references to arrays without the need for CTRL+SHIFT+ENTER.
  • You cannot share an Excel workbook that contains a CSE formula that is linked to another workbook. Workaround Here.
  • Some worksheet functions must be entered as CSE formulas to produce the full, correct result. These include the matrix functions (MINVERSE(), MMULT(), MDETERM()) and TRANSPOSE().

Advantages Of Working With Array Formulas
  • Array Formulas can perform complex calculations which are not possible with other, standard formulas.
  • Array Formulas condense multiple calculations into one cell thereby reducing the size of a workbook.

Disadvantages Of Working With Array Formulas
  • Using too many array formulas will reduce the calculation speed of the application.
  • Other users may not understand them.
  • Other alternatives such as database functions and pivot tables may be more efficient.

Array Ranges Rules
  • You have to select the range of cells to hold your results before you input the CSE formula.
  • You cannot change the contents of an individual cell in an array range. To make changes you can convert the array formula into a normal formula by selecting a cell in the array range, pressing CTRL+/ to expand the selection to the entire array range, pressing F2 and then CTRL+ENTER.
  • You cannot move or delete part of an array range.
  • You cannot insert blank cells into an array range.
  • An array range cannot contain merged cells.

Advantages Of Using Array Ranges
  • Parts of array ranges cannot be altered: this reduces the chance of a user accidentally overtyping part of an array range.
  • A range array may calculate more quickly than using individual formulas in each cell inside the range. It will also reduce the size of the workbook.

Last edited by Colin Legg; 07-29-2011 at 06:11 AM.
Reply With Quote
  #3  
Old 04-29-2008, 10:48 AM
Colin Legg's Avatar
Colin Legg Colin Legg is offline
Out Of Office

Retired Moderator
* Expert *
 
Join Date: Mar 2005
Location: London, UK
Posts: 3,402
Default Challenges Worksheet

For all of the challenges in this thread we will assume that we have the following worksheet set up:
Code:
.       Col A   Col B	Col C
Row 1	Name	Group	Score
Row 2	James	1	25
Row 3	Bob	2	31
Row 4	Brian	1	15
Row 5	James	2	20
Row 6	Bob	1	1
Row 7	Brian	2	3
Row 8	James	1	5
Row 9	Bob	2	67
Row 10	Brian	1	8
Row 11	James	2	9
Row 12	Bob	1	10
Row 13	Brian	2	12
Row 14	James	1	15
Row 15	Bob	2	18
Row 16	Brian	1	20
Row 17	James	2	22
Row 18	Bob	1	25
Row 19	Brian	2	28
A solution workbook is attached at the end of each challenge thread. These workbooks will prove to be an important resource as you experiment with the concepts presented in this tutorial: use and refer to them as you read through each challenge.
Reply With Quote
  #4  
Old 04-29-2008, 10:51 AM
Colin Legg's Avatar
Colin Legg Colin Legg is offline
Out Of Office

Retired Moderator
* Expert *
 
Join Date: Mar 2005
Location: London, UK
Posts: 3,402
Default Challenge #1

To demonstrate the mechanics of an array formula I am going to give a CSE example using the SUM() and IF() functions. I am using these functions because most of you will already be familiar with them and will therefore find it accessible. Then, over the next few posts we will examine each component and improve on our example.

Take a scenario where we want to sum data based on multiple conditions across columns. This cannot be achieved using the SUMIF() worksheet function which has a limit of one conditional range which, in turn, cannot be extended using logical worksheet functions such as AND() and OR(). Here we can achieve our goal by using a CSE array formula.

Challenge

Let’s try to SUM score (column C) if both these conditions are satisfied:
  • The name in column A is equal to "James".
  • The group in column B is equal to 1.


Solution

Code:
CSE formula: =SUM(IF(A2:A19="James",1,0)*IF(B2:B19=1,1,0)*(C2:C19))
Value returned: 45

Explanation

Now let's breakdown this formula into its component parts so we can see how it works. You can use the attached workbook as a reference as we work through it and don't forget that when you type an array formula into the formula bar you have to press CTRL + SHIFT + ENTER.

To begin let's examine the first IF() function call in the solution:
Code:
CSE Formula: =IF(A2:A19="James",1,0)
Value returned: {1;0;0;1;0;0;1,0;0;1,0;0;1,0;0;1;0;0}
The array formula above produces a one dimensional, vertical array of 18 elements (or values). Note that the convention in Excel is to represent horizontal elements (across a row) using commas (",") and vertical elements (down a column) using semi-colons (";"). This may vary depending on your regional settings but is the convention used throughout the tutorial.

Excel iterates through the range A2:A19 evaluating whether or not each cell's value is equal to the string "James". If it is equal to "James" then the condition in the IF() function returns TRUE and 1 is allocated to the corresponding array element; if it is not equal to "James" then the condition in the IF() function returns FALSE and 0 is allocated to the corresponding array element.

Notice that the end output of this array formula is an array of results: it is a "multiple result array formula" and therefore entire result matrix cannot be viewed within a single cell in a spreadsheet. If you put this formula in a single cell then you will only see the first element of the array which, in this case, is a 1.

There are three useful tools available to us when we want to debug an array formula:
  • Use an array range, for example:
    • Within your worksheet select a range one column wide and eighteen rows high (F2:F19 in the attached workbook).
    • Type the array formula =IF(A2:A19="James",1,0) into the formula bar and press CTRL+SHIFT+ENTER.
    Now you will see the range populated with 1s and 0s as outlined above. Take note regarding the rules that govern an array range (in Post #2) and how they apply here. I find this to be a very useful technique.
  • Select sections of the formula in the formula toolbar and press F9.
  • Evaluate the formula via Tools --> Formula Auditing --> Evaluate Formula.

Now let's examine the other components of the array formula:
The logic behind the returned value of this part is exactly the same:
Code:
CSE Formula: =IF(B2:B19=1,1,0)
Value returned: {1;0;1;0;1;0;1;0;1;0;1;0;1;0;1;0;1;0}
If the value held in each cell in B2:B19 is 1 then a 1 is returned by the IF() function; if the value is a 0 then 0 is returned by the IF() function.


Here's the final part of the array formula. There's no condition to be met so this will simply return the values in the range C2:C19
Code:
CSE Formula: =(C2:C19)
Value returned: {25;31;15;20;1;3;5;67;8;9;10;12;15;18;20;22;25;28}

Okay, so now we've examined how the three result arrays are produced. Next let's examine what happens when they are multiplied together:
Code:
CSE Formula: =IF(A2:A19="James",1,0)*IF(B2:B19=1,1,0)*(C2:C19)

Here we are multiplying together these arrays:
{1;0;0;1;0;0;1;0;0;1;0;0;1;0;0;1;0;0} * 
{1;0;1;0;1;0;1;0;1;0;1;0;1;0;1;0;1;0} * 
{25;30;15;20;1;3;5;67;8;9;10;12;15;18;20;22;25;28}

The first element in each array is multiplied together, then the second, then the third, and so on:
1*1*25=25
0*0*30=0 
0*1*15=0
1*0*20=0
0*1*1 =0
0*0*3 =0
1*1*5 =5
..... etc ..... 

Value returned: {25;0;0;0;0;0;5;0;0;0;0;0;15;0;0;0;0;0}


The final step is to sum together the results of the multiplication using the SUM() function. By summing up the values to give a single total we are creating a single result array formula: the result can be viewed correctly in a single cell.
Code:
CSE Formula: =SUM(IF(A2:A19="James",1,0)*IF(B2:B19=1,1,0)*(C2:C19))
Value returned: 45
This sums up the elements of the previous result array to give a single value of 45:
Code:
25+0+0+0+0+0+5+0+0+0+0+0+15+0+0+0+0+0 = 45
Attached Files
File Type: zip Challenge 1 Workbook.zip (3.8 KB, 141 views)

Last edited by Colin Legg; 05-09-2009 at 05:02 PM.
Reply With Quote
  #5  
Old 04-29-2008, 10:55 AM
Colin Legg's Avatar
Colin Legg Colin Legg is offline
Out Of Office

Retired Moderator
* Expert *
 
Join Date: Mar 2005
Location: London, UK
Posts: 3,402
Default Data Types and Data Type Conversions

Now that we have a working formula it’s time to discuss how we might simplify it and improve its efficiency. Before we can do this, it is crucial that we have a solid understanding about Excel's data types.

Data Types

Excel classifies data into types including:
  • text (string)
  • logical (boolean)
  • number (double)
  • error
  • array
You can determine the data type of a cell’s value using the TYPE() worksheet function. It’s very important to be aware of the data types because they tell us how Excel will read and treat the data within a cell. Depending on how they are registered, worksheet functions can often only handle particular types of data (this may even vary between parameters within the function). To demonstrate this, in a new worksheet put the following into cells A1:A4 respectively:
Code:
A1	'1
A2	'2
A3	'3
A4      =SUM(A1:A3)
You will see that the SUM() worksheet function returns a value of 0. This is because Excel does not consider the values in A1:A3 to be numbers; it considers them to be text. According to the help file, if an argument passed into the SUM() function is an array or a reference, only numerical data types in that array or reference are summed: text data types are ignored. Therefore we have to convert these text values into numbers before we can sum them with the SUM() function.


Explicit Data Type Conversions

An explicit data type conversion is where we use a worksheet function to convert one data type to another. There are numerous ways we can explicitly tell Excel to convert these text values into number values, one of which is the VALUE() worksheet function. If you look up this function in the help file you will see that its purpose is to convert a text string that represents a number to a number. Put the following formulas into the B column:
Code:
B1	=VALUE(A1)
B2	=VALUE(A2)
B3	=VALUE(A3)
B4	=SUM(B1:B3)
You will now see that the SUM() worksheet function returns a value of 6. The VALUE() function converts the text data types to number data types and then the SUM() function is able to aggregate those numbers.


Implicit Data Type Conversions

If we perform a arithmetic operation on a text data type Excel understands that we want that text to be treated like a number: Excel does the text to number conversion for us. This is called implicit type conversion or coercion. Here are some examples:
Code:
=-A1      negation
=A1^1     exponentiation
=A1*1     multiplication
=A1/1     division
=A1+0     addition
=A1-0     subtraction
Many consider the best way to coerce text to numbers is by negation. Not only is it the most efficient method but it also resides above exponentiation, multiplication, division, addition and subtraction in terms of operator precedence. For information on the operator precedence in formulas please read these links:
MS Help and Support: Order of Operations in Excel Formulas
MS Help and Support: Formula Returns Unexpected Positive Value

It is also important to note that the operator precedence in visual basic for applications is different:
MSDN: Visual Basic for Applications Operator Precedence

Here is an example of using negation. Put the following formulas into the C column:
Code:
C1	=--A1
C2	=--A2
C3	=--A3
C4	=MAX(C1:C3)
You will see that the MAX() worksheet function returns a value of 3.


What Is Negation?

-X is to impose a unary minus on X. It is a mathematical operation as follows:
Code:
-X = 0-X
So -X coerces the text string that represents a number into a negative number data type.

Code:
--X = 0-(0-X)
The second unary minus repeats the process, thereby returning a positive number data type.

The sequence of events for the formula in C1 can be summarized as:
Code:
"1" --------->    -1 --------->    1

Why Is Negation So Efficient?

We are working with signed numeric data types: positive and negative are stored in a Bit within the variable. When you work with byte arrays it becomes very clear that flipping bytes is very fast: flipping a bit, a lighter memory allocation, is very efficient; more so than asking for a derived calculation, which requires you to follow a more rigorous logical process, to the extent where doing it twice is more efficient than requesting X*1 or X^1 or X+0 once.

As a side note, a slightly more concise use of negation (fewer bit flips) than our previous effort might be:
Code:
D1	=-A1
D2	=-A2
D3	=-A3
D4	=-MAX(D1:D3)

How About Converting Other Data Types?

We've already seen how to convert text to numbers, but how do we go about converting between other data types? Below I've listed a few examples.


Logicals To Numbers

When we’re working with array formulas it is very important to be able to convert logical data types into numerical data types because we often find that the worksheet functions we are using are not able to process logical data types whilst they can process number data types. Put the following into cells A10:A13:
Code:
A10	TRUE
A11	FALSE
A12	TRUE
A13	=AVERAGE(A10:A12)
The AVERAGE() function returns an error because it can only aggregate numerical data types (it is trying to divide by zero).

One way to explicitly convert logical data types into number data types is by using the N() worksheet function:
Code:
B10	=N(A10)
B11	=N(A11)
B12	=N(A12)
B13	=AVERAGE(B10:B12)
The N() worksheet function converts the logical data types into number data types. Specifically it converts FALSE to 0 and TRUE to 1. Another worksheet function, SIGN(), works in a similar fashion on logicals. However, the N() and SIGN() worksheet functions are not as simple as a straightforward arithmetic operator so one would expect a slight performance penalty in comparison with coercion techniques.

A preferred method is to coerce logical data types into numbers with negation:
Code:
C10	=--A10
C11	=--A11
C12	=--A12
C13	=AVERAGE(C10:C12)
AVERAGE() returns a value of 2/3.

Numbers To Text

We can explicitly convert a number or logical value to text by using some of the string functions such as CONCATENATE(). We can coerce a number or logical value to text by using the & operator, for example:
Code:
H10	1
H11   =H10&""
H12   =TYPE(H10)           'returns 1 representing a number data type
H13   =TYPE(H11)           'returns 2 representing a text data type

Errors To Logicals

There's no way for us to coerce error values to logicals. Fortunately we have a few worksheet functions available:
  • ISERR()
  • ISERROR()
  • ISNA()



Thanks again to Nate Oliver for the above information he provided on negation!

Last edited by Colin Legg; 05-02-2009 at 06:16 AM.
Reply With Quote
  #6  
Old 04-29-2008, 10:56 AM
Colin Legg's Avatar
Colin Legg Colin Legg is offline
Out Of Office

Retired Moderator
* Expert *
 
Join Date: Mar 2005
Location: London, UK
Posts: 3,402
Default Streamlining Challenge #1 Solution

We can now look to improve the efficiency of our solution to challenge 1.

For reference, here is the original solution again:

Code:
CSE Formula: =SUM(IF(A2:A19="James",1,0)*IF(B2:B19=1,1,0)*(C2:C19))
Value returned: 45

Improving the first part of the solution

Firstly let's consider this part of our original solution.
Code:
CSE Formula: =IF(A2:A19="James",1,0)
Value returned: {1;0;0;1;0;0;1;0;0;1;0;0;1;0;0;1;0;0}

Using the information we have just learnt, we can improve it’s efficiency by rewriting it as this:
Code:
CSE Formula: =--(A2:A19="James")
Value returned: {1;0;0;1;0;0;1;0;0;1;0;0;1;0;0;1;0;0}

That might seem like a bit of a jump so let's consider it in more detail. First of all let's examine the altered version without the negation:
Code:
CSE Formula: =(A2:A19="James")
Value returned: {T;F;F;T;F;F;T;F;F;T;F;F;T;F;F;T;F;F}
Excel works through the range A2:A19 evaluating whether or not each cell's value is equal to the string "James". If it is equal to "James" then the corresponding array element is TRUE (T); else it is FALSE (F).


Next, we can coerce these logical data types into number data types with negation:
Code:
CSE Formula: =--(A2:A19="James")
Value returned: {1;0;0;1;0;0;1;0;0;1;0;0;1;0;0;1;0;0}

Improving the second part of the solution

If we apply the same principle to the second part of the formula we find that we can improve this:
Code:
CSE Formula: =IF(B2:B19=1,1,0)
Value returned: {1;0;1;0;1;0;1;0;1;0;1;0;1;0;1;0;1;0}
to this:
Code:
CSE Formula: =--(B2:B19=1)
Value returned: {1;0;1;0;1;0;1;0;1;0;1;0;1;0;1;0;1;0}

Substituting these improvements into our formula
Code:
CSE Formula: =SUM(--(A2:A19="James")*--(B2:B19=1)*(C2:C19))
Value returned: 45
Now, that’s a more efficient version. But we’re not quite done yet. In this formula you should notice that we are already multiplying the three multiple result arrays together: we are performing a mathematical operation on them. This multiplication is a necessary part of the formula in order to derive the result we want: the multiplication signs are acting like "AND" operators so they cannot be excluded. This mathematical operation also means that Excel will understand that it has to convert the two boolean arrays into numbers. Therefore in this situation negation is not required – it only serves to slow the formula down - and we can streamline our array formula to this:


Final Array Formula Solution

Code:
CSE Formula: =SUM((A2:A19="James")*(B2:B19=1)*(C2:C19))
Value returned: 45
Note that this solution is included in the Challenge #1 workbook.

Last edited by Colin Legg; 04-30-2009 at 05:06 AM.
Reply With Quote
  #7  
Old 04-29-2008, 11:04 AM
Colin Legg's Avatar
Colin Legg Colin Legg is offline
Out Of Office

Retired Moderator
* Expert *
 
Join Date: Mar 2005
Location: London, UK
Posts: 3,402
Default A Brief Discussion On SUMPRODUCT

We’ve now streamlined our SUM() CSE array formula as much as possible: it's as quick as it's ever going to be. However, we have another array formula option to consider here which uses the SUMPRODUCT() worksheet function.

You will see or hear some Excel users stating that they find a SUMPRODUCT() formula easier to understand than it's array formula counterpart. However, the reality is that the logic used by both is extremely similar - so if you understand one then you should understand the other.

The advantages of using SUMPRODUCT() instead of the CSE array formula equivalent are that it often allows us to bypass the necessity of using the clunky CTRL+SHIFT+ENTER key combination to complete the entry of the formula - something which some users complain as being an inconvenience - and very often the SUMPRODUCT() solution calculates more quickly. This tutorial is specifically designed to address CSE formulas so I'm not going to enter a lengthy analysis of SUMPRODUCT(); I will just highlight some important points.

Here are a few examples showing SUMPRODUCT() formulas with their CSE formula counterparts:

Conditional Sum
Code:
Formula: =SUMPRODUCT(-(A2:A19="James"),-(B2:B19=1),(C2:C19))
Value returned: 45

CSE formula: =SUM((A2:A19="James")*(B2:B19=1)*(C2:C19))
Value returned: 45
Testing shows that the standard formula is much more efficient. It is also simpler and is therefore the better choice.

Conditional Max
Code:
Formula: =SUMPRODUCT(MAX((A2:A19="James")*(B2:B19=1)*(C2:C19)))
Value returned: 25

CSE formula: =MAX((A2:A19="James")*(B2:B19=1)*(C2:C19))
Value returned: 25
Here the use of SUMPRODUCT() seems frivolous: an excuse not to press CTRL+SHIFT+ENTER. However, the SUMPRODUCT() formula is actually marginally quicker than the CSE formula despite the additional function nesting. My vote therefore goes with the CSE formula although there is no clear winner here.

Conditional Large
Code:
Formula: =SUMPRODUCT(LARGE((A2:A19="James")*(B2:B19=1)*(C2:C19),2))
Value returned: 15

CSE formula: =LARGE((A2:A19="James")*(B2:B19=1)*(C2:C19),2)
Value returned: 15
Here the CSE formula is quicker and simpler than its SUMPRODUCT() counterpart.



Summary on CSE Formulas vs SUMPRODUCT()

The general rule of thumb I suggest is this:
Quote:
If you are SUMMING or COUNTING then, all else being equal, use SUMPRODUCT() in preference to a CSE formula. In all other situations use the CSE array formula.


SUMPRODUCT() Formula Variations

I will briefly discuss four possible solutions using SUMPRODUCT(). All four of these formulas are valid working alternatives to our array formula solution from challenge #1.

They are all more efficient than our streamlined CSE formula solution, but which one is the best? Well, all four of these are popular variations, each with its own advantages and disadvantages.

Code:
1 Formula: =SUMPRODUCT(N(A2:A19="James"),N(B2:B19=1),(C2:C19))

2 Formula: =SUMPRODUCT((A2:A19="James")*(B2:B19=1)*(C2:C19))

3 Formula: =SUMPRODUCT(--(A2:A19="James"),--(B2:B19=1),(C2:C19))

4 Formula: =SUMPRODUCT(-(A2:A19="James"),-(B2:B19=1),(C2:C19))
Formula 1
This formula explicitly converts the boolean arrays into numbers via the N worksheet function. This conversion is required because SUMPRODUCT treats array entries that are not numeric as if they were zeros. However, I mentioned earlier that the N worksheet function can be unreliable. Additionally, the other formulas are more efficient than this one since using N adds the cost of additional function calls. My opinion is that this formula is not as good as the others.

Formulas 2 and 3
Choosing a preference between formulas 2 and 3 is not easy.

Testing in VBA shows that formula 2 is marginally less efficient than formula 3. The gain in efficiency is at a cost of increased complexity. In formula 3, the user has to demonstrate understanding that the two boolean arrays must be coerced into numbers before they are passed into SUMPRODUCT. Introducing negation into the fray makes formula 3 look more complicated and this is evidenced by how frequently users ask what the -- is all about (if you're not sure then read the post in this thread about Data Type Conversions)! In formula 2, the coercion is done for the user by using the multiplication operators so, in many ways, it has the closest resemblence to the SUM array formula equivalent. Despite this, formula 3 is the most popular variation on the online forums whilst formula 2 is the second most popular. The most important distinction between these two is that there are certain situations where one will work when the other one does not.

Formula 4
Formula 4 is the variation I tend to use and is the most efficient of all since it uses less negation operations than formula 3. However, it's not as popular as formula 3 for a number of reasons including:
(1) If there are an odd number of conditions then the output of the SUMPRODUCT() function also needs to be negated to return the correct result which adds an extra level of consideration.
(2) Some users also feel that a single unary minus looks more like a typo; a double unary minus looks more deliberate.

Last edited by Colin Legg; 09-10-2009 at 07:38 AM.
Reply With Quote
  #8  
Old 04-29-2008, 11:07 AM
Colin Legg's Avatar
Colin Legg Colin Legg is offline
Out Of Office

Retired Moderator
* Expert *
 
Join Date: Mar 2005
Location: London, UK
Posts: 3,402
Default Challenge #2

Now let’s look at another example; this time we will use the LARGE worksheet function to aggregate the result array.


Challenge

Find the third largest score when either of these conditions is satisfied:
  • The name in column A is equal to "James"
  • The group in column B is equal to 1.

Solution

Code:
CSE formula: =LARGE(IF(IF(A2:A19="James",1,0)+IF(B2:B19=1,1,0)>0,1,0)*(C2:C19),3)
Value returned: 22

Explanation

Code:
CSE Formula: =IF(A2:A19="James",1,0)+IF(B2:B19=1,1,0)
Value returned: {2;0;1;1;1;0;2;0;1;1;1;0;2;0;1;1;1;0}
Excel evaluates whether or not each condition is met and then adds the two result arrays together.

If the value in the above array is greater than 0 then this indicates that one of the conditions has been met. Now, we now intend to multiply this array with an array containing the corresponding scores but, before we can do this calculation, we must adjust the result array so it only contains 1s and 0s: otherwise, where a 2 has been returned, the corresponding score will be incorrectly doubled. We can achieve this by using another IF function:

Code:
CSE Formula: =IF(IF(A2:A19="James",1,0)+IF(B2:B19=1,1,0)>0,1,0)
Value returned: {1;0;1;1;1;0;1;0;1;1;1;0;1;0;1;1;1;0}
We can return the scores with this simple CSE formula:
Code:
CSE Formula: =(C2:C19)
Value returned: {25;31;15;20;1;3;5;67;8;9;10;12;15;18;20;22;25;28}
Now we can do the multiplication:
Code:
CSE Formula: =IF(IF(A2:A19="James",1,0)+IF(B2:B19=1,1,0)>0,1,0)*(C2:C19)
Value returned: {25;0;15;20;1;0;5;0;8;9;10;0;15;0;20;22;25;0}
And finally we find the third largest number in the array with the LARGE function:
Code:
CSE formula: =LARGE(IF(IF(A2:A19="James",1,0)+IF(B2:B19=1,1,0)>0,1,0)*(C2:C19),3)
Value returned: 22

Streamlining Challenge #2 Solution

Using the same ideas presented at the data type conversions section, we can adjust our solution to make it more efficient as follows:
Code:
CSE formula: =LARGE(((A2:A19="James")+(B2:B19=1)>0)*(C2:C19),3)
Value returned: 22
Since we are performing an adding the two boolean arrays together, we are performing a mathematical operation on them and therefore negation is not required. The streamlined solution is included in the attached workbook.
Attached Files
File Type: zip Challenge 2 Workbook.zip (3.2 KB, 83 views)

Last edited by Colin Legg; 04-30-2009 at 05:16 AM.
Reply With Quote
  #9  
Old 04-29-2008, 11:13 AM
Colin Legg's Avatar
Colin Legg Colin Legg is offline
Out Of Office

Retired Moderator
* Expert *
 
Join Date: Mar 2005
Location: London, UK
Posts: 3,402
Default Challenge #3

Here's a third example. This one uses the MAX function to aggregate the result array and I am also going to introduce the concept of using array constants in array formulas: something which can be quite handy.

Challenge

Determine the MAX score (column C) if both these conditions are satisfied:
  • The name in column A is equal to either "James" OR "Bob".
  • The group in column B is equal to 1.


Solution

Code:
Solution 1:
CSE formula: = MAX(ISNUMBER(MATCH(A2:A19,{"James","Bob"},0))*(B2:B19=1)*(C2:C19))
Value returned: 25

By utilising the concept introduced in challenge #2 we can use addition to represent an OR operator:
Solution 2:
CSE formula: = MAX(((A2:A19="James")+(A2:A19="Bob"))*(B2:B19=1)*(C2:C19))
Value returned: 25

Since there are only three different names in total, this solution can also be used 
Solution 3:
CSE formula: = MAX((A2:A19<>"Brian")*(B2:B19=1)*(C2:C19))
Value returned: 25

Solution 3 is the most efficient in this situation, but is only applicable because there are just three different names in the list.

If we consider a situation where there could be many different names, then should we use solution 1 or solution 2? Solution 1 is a better approach for the following reasons:
  • Solution 1 is a faster formula than solution 2.
  • If the qualifying names list of James and Bob was to increase to many names, then this formula is easily extended by simply adding more names to the array constant. However, with solution 3 we have to add an additional (A2:A19="Name") each time so the formula quickly becomes unwieldy.


Explanation of Solution 1

Again, let's break the solution down into it's component parts.

Code:
CSE Formula: =(MATCH(A2:A19,{"James","Bob"},0))
Value returned: {1;2;#N/A;1;2;#N/A;1;2;#N/A;1;2;#N/A;1;2;#N/A;1;2;#N/A}
This array formula passes an array constant (a primer on array constants can be found in the links section) into the MATCH() worksheet function. It returns a vertical array of eighteen elements:
  • The 1's tell us that "James" was found (the first element in the array constant).
  • The 2's tell us that "Bob" was found (the second element in the array constant).
  • The #N/As tell us that neither James nor Bob was found.
Code:
CSE Formula: =ISNUMBER(MATCH(A2:A19,{"James","Bob"},0))
Value returned: {T;T;F;T;T;F;T;T;F;T;T;F;T;T;F;T;T;F}
The next step is to wrap the previous expression with the ISNUMBER() worksheet function. This serves two purposes:
  1. It converts those #N/A errors, which would be problematic later, into FALSE boolean values.
  2. It converts all matches (1s and 2s) into TRUE boolean values. This is handy because when we later multiply this against the Score array, the scores for "Bob" will not be incorrectly doubled.
The remaining two parts are similar to that in previous challenges:
Code:
CSE Formula: =(B2:B19=1)
Value returned: {T;F;T;F;T;F;T;F;T;F;T;F;T;F;T;F;T;F}
Code:
CSE Formula: =(C2:C19)
Value returned: {25;31;15;20;1;3;5;67;8;9;10;12;15;18;20;22;25;28}


Mutliplying the three arrays together returns scores where the criteria have been met:
Code:
CSE Formula: =ISNUMBER(MATCH(A2:A19,{"James","Bob"},0))*(B2:B19=1)*(C2:C19)
Value returned: {25;0;0;0;1;0;5;0;0;0;10;0;15;0;0;0;25;0}
Then we wrap the expression with the MAX() worksheet function to find maximum value within all of the elements of the above array:
Code:
CSE formula: =MAX(ISNUMBER(MATCH(A2:A19,{"James","Bob"},0))*(B2:B19=1)*(C2:C19))
Value returned: 25
All three solutions are included in the attached workbook.
Attached Files
File Type: zip Challenge 3 Workbook.zip (9.4 KB, 45 views)

Last edited by Colin Legg; 04-30-2009 at 05:27 AM.
Reply With Quote
  #10  
Old 04-29-2008, 11:18 AM
Colin Legg's Avatar
Colin Legg Colin Legg is offline
Out Of Office

Retired Moderator
* Expert *
 
Join Date: Mar 2005
Location: London, UK
Posts: 3,402
Default Challenge #4

For the last example I'm going to demonstrate how to find the highest score (column C) in a filtered range given a certain critereon.

A question similar to this recently came up in the Excel board which is what prompted me to include it. After some considerable searching I found the authors of the original solution on that thread: credit to Andrew Poulsom and Aladin Akyurek. There is also a very good article by Dick Kusleika and Laurent Longre under the "Arrays with Offset()" link in the links section of this thread which is where this technique originated.

Let’s suppose for our example that Column A has been filtered to show only "Brian". If you try a formula such as this:
Code:
=MAX(C2:C19)
it will return 67 because it considers all of the cells in the range whether they are visible or not. To find the max visible value in a filtered range you can use the SUBTOTAL function as follows:
Code:
=SUBTOTAL(4,C2:C19)
If you check out this function in the helpfile you will see that 4 represents the function MAX. But what if we want to find the max value in the filtered range given a critereon which hasn't been filtered for?


Challenge

Find the MAX visible score (column C) where the group (column B) is 2 and when the data could be filtered in any way (for our example we will assume that the Name (column A) has been filtered to show only "Brian").


Solution
Code:
CSE formula: =MAX(SUBTOTAL(4,OFFSET(C2,ROW(C2:C19)-ROW(C2),,1))*(B2:B19=2))
Value returned: 28

Explanation

Let's start by examining the part of the formula which is different to previous examples:
Code:
CSE Formula: =SUBTOTAL(4,OFFSET(C2,ROW(C2:C19)-ROW(C2),,1))
Value returned: {0;0;15;0;0;3;0;0;8;0;0;12;0;0;20;0;0;28}
This part of the formula acts as a "filter indicator". It iterates through the range C2:C19 and returns a value of 0 to cells which have been hidden by filtering and the actual values of the cells that are visible. The first argument in the subtotal function is 4 which represents the Max worksheet function. In fact, 9 could also be successfully passed to derive the same result - when passing arrays into the rows/columns parameters of the OFFSET() function, it returns a structure representing ranges and we can pull off the corresponding numbers in those ranges by using N() or maths functions such as MAX() or SUM() - but I am using 4 [MAX()]here because it 'feels' more consistent.

So, the essential point here is that the SUBTOTAL() worksheet function ignores hidden rows that result from a list being filtered. Have a look at the attached workbook to see how the indicators change from *cell value* to 0 as the respective cells are hidden by filtering. If you need to then break this formula down further to see how each part works.

We retrieve a logical array indicating if each score belongs to group 2 as follows:
Code:
CSE Formula: =(B2:B19=2)
Value returned: {F;T;F;T;F;T;F;T;F;T;F;T;F;T;F;T;F;T}
Now we can do the multiplication:
Code:
CSE Formula: =SUBTOTAL(4,OFFSET(C2,ROW(C2:C19)-ROW(C2),,1))*(B2:B19=2)
Value returned: {0;0;0;0;0;3;0;0;0;0;0;12;0;0;0;0;0;28}
And finally we find the highest value in that result array with the MAX function:
Code:
CSE formula: =MAX(SUBTOTAL(4,OFFSET(C2,ROW(C2:C19)-ROW(C2),,1))*(B2:B19=2))
Value returned: 28
Attached Files
File Type: zip Challenge 4 Workbook.zip (3.1 KB, 52 views)

Last edited by Colin Legg; 11-03-2009 at 03:52 AM.
Reply With Quote
  #11  
Old 04-29-2008, 11:22 AM
Colin Legg's Avatar
Colin Legg Colin Legg is offline
Out Of Office

Retired Moderator
* Expert *
 
Join Date: Mar 2005
Location: London, UK
Posts: 3,402
Default Excel VBA: FormulaArray Property

Inserting a CSE array formula into a worksheet via VBA is a simple task achieved by using the Range.FormulaArray property.


Single Result CSE Formulas

This is how you would add the streamlined Challenge 1 CSE formula solution to the worksheet using VBA:
Code:
Sub Example1()

    'put our CSE formula in cell E2
    Sheet1.Range("E2").FormulaArray = "=SUM((A2:A19=""James"")*(B2:B19=1)*(C2:C19))"
End Sub

Multiple Result CSE Formulas

You can insert multiple result CSE formulas into an array range as follows:
Code:
Sub Example2()

    'put our CSE formula in range E2:E19
    Sheet1.Range("E2:E19").FormulaArray = "=(A2:A19=""James"")"
End Sub

Problem-Shooting

You may get the Run-time error '1004' message 'Unable to set the FormulaArray property of the Range class'. The message doesn't contain a lot of useful information so determining the cause of the problem can be quite tough. Here are some reasons why you may be getting this error message:


You Are Trying To Change Part Of An Array Range

For example, this code will fail:
Code:
Sub ErrorExample1A()

    With Sheet1
        'create an array range
        .Range("E2:E19").FormulaArray = "=(A2:A19=""James"")"
    
        'try to change part of an array range --> ERROR
        .Range("E2:E12").FormulaArray = "=SUM((A2:A19=""James"")*(B2:B19=1)*(C2:C19))"
    End With
End Sub
You have to clear the array range first or change the entire array range at the same time. You can determine if a cell is part of an array range as follows:
Code:
Sub ErrorExample1B()
    
    With Sheet1
    'create an array range for this example
        .Range("E1:E20").FormulaArray = "=Sum(R1C1:R3C3)"
    
        With .Range("E1")
            'check if E1 is part of an array range
            If .HasArray Then
        
                'what is the full array range?
                VBA.MsgBox .CurrentArray.Address
            End If
        End With
    End With
End Sub

Your CSE Formula Exceeds 255 Characters

This issue is described on the following MS Support article:
http://support.microsoft.com/kb/213181

More specifically:
  • If you are using A1 notation then the R1C1 equivalent must be less than 255 characters.
  • If you are using R1C1 notation then the formula must be less than 256 characters.
A workaround is using the Range.Replace method as demonstrated here:
http://www.dailydoseofexcel.com/archives/2005/01/10/entering-long-array-formulas-in-vba/E.


Your CSE Formula Contains A Syntax Error Such As A Missing Or Invalid Argument

Code:
Sub ErrorExample3()
    
    With Sheet1.Range("E2")
        'this will give an error because argument in SUM() function missing
        .FormulaArray = "=SUM()"
    
        'this will give an error because SUMIF() cannot accept an array data type
        'passed into its 1st or 3rd parameters: http://support.microsoft.com/kb/214286/
        .FormulaArray = "=SUMIF((A2:A19=1)*(B2:B19),B2,C2:C19)"
    End With
End Sub

You Are Trying To Put A CSE Formula Into A Merged Cell
Code:
Sub ExampleError4A()
   
    With Sheet1
        'merge the cells so we get an error when we try to add the CSE formula
        .Range("E2:F2").Merge
        .Range("E2").FormulaArray = "=SUM((A2:A19=""James"")*(B2:B19=1)*(C2:C19))"
    End With
End Sub
You can check if a cell is part of a merged range as follows:
Code:
Sub ExampleError4B()
    
    With Sheet1
        'merge cells A1:B1 for this example
        .Range("A1:B1").Merge
    
        'check if A1 is part of a merged range
        With .Range("A1")
    
            If .MergeArea.Address = .Address Then
                VBA.MsgBox "Cell not merged"
            Else
                VBA.MsgBox "Cell is merged, merged range = " & .MergeArea.Address
            End If
        End With
    End With
End Sub

A Note On Reference Styles

According to the Excel VBA helpfile and the link below, the formula must use the R1C1 reference style and not the A1 reference style.
http://msdn2.microsoft.com/en-us/library/bb208529.aspx

In this thread I have given examples using both these styles. A1 notation is perfectly acceptable provided that the R1C1 equivalent does not exceed the character limit.

An updated version of this post on the Range.FormulaArray can be found here.

Last edited by Colin Legg; 06-13-2011 at 06:25 AM.
Reply With Quote
  #12  
Old 04-29-2008, 11:23 AM
Colin Legg's Avatar
Colin Legg Colin Legg is offline
Out Of Office

Retired Moderator
* Expert *
 
Join Date: Mar 2005
Location: London, UK
Posts: 3,402
Default Useful Links

Here are some useful links with array formula relevant material. If you have any other useful information or links then please send me a PM and I will add to the list.


Array Formula Tutorials And Information
  1. Microsoft Office Online: Introducing Array Formulas In Excel
  2. Microsoft Office Online: Putting Basic Array Formulas To Work
  3. Microsoft Office Online: Putting Advanced Array Formulas To Work
  4. Microsoft Office Online: Introducing Array Constants In Excel
  5. Bo's Array Formula Page
  6. Charles Williams' Array Formula Page
  7. Tushar Mehta's Array Formula Page
  8. Bob Phillip's SUMPRODUCT() Page
  9. John McGimpsey's "Why use -- in SUMPRODUCT()" Page
  10. Dick Kusleika's 'Anatomy Of An Array Formula' Blog
  11. Bob Umlas' Array Formula Page
  12. Philipp von Wartburg's Array Page
  13. Chip Pearson's Array Formula Page



Array Formula Solutions
  1. Sum values based on multiple conditions
  2. Sum Every Nth Cell
  3. Sum a range that contains error values
  4. Sum Integer Portions Only
  5. Compute an average that excludes zeros
  6. Finding the nth largest values in a range
  7. Finding the nth smallest value in a range
  8. Find the location of the maximum value in a range
  9. Are There Duplicate Values In Two Ranges?
  10. Extracting Just The Numeric Portion Of A Cell
  11. Extracting All Numbers From Text In A Cell *****
  12. Finding The Closest Value To The Average
  13. Count the number of error values in a range
  14. Count the number of differences between two ranges of cells
  15. Count occurrences of substrings in a range of cells
  16. Count characters in a range of cells
  17. Count Unique Items In A Column *****
  18. Create A List Of Unique Items
  19. Find longest text string in a range of cells
  20. Most Common String in a Range Excluding Zeros and Blanks
  21. Creating a calendar with a single array-entered formula
  22. Is Cell Text A Palindrome?
  23. Multiple Criteria LookUps *****
  24. Sorting a List
  25. Arrays with Offset()

Last edited by Colin Legg; 04-30-2009 at 05:27 AM.
Reply With Quote
Reply

Tags
array, array formula, array formula example, array formula links, array formula tutorial, array formulas explained, coercion, conditional lookup, conditional sum, cse formula, data types, excel, formulaarray, matrix formula, negation, sumif and, sumif or, sumproduct


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump

Advertisement:





Free Publications
The ASP.NET 2.0 Anthology
101 Essential Tips, Tricks & Hacks - Free 156 Page Preview. Learn the most practical features and best approaches for ASP.NET.
subscribe
Programmers Heaven C# School Book -Free 338 Page eBook
The Programmers Heaven C# School book covers the .NET framework and the C# language.
subscribe
Build Your Own ASP.NET 3.5 Web Site Using C# & VB, 3rd Edition - Free 219 Page Preview!
This comprehensive step-by-step guide will help get your database-driven ASP.NET web site up and running in no time..
subscribe
Excel: Introduction to Array Formulas
Excel: Introduction to Array Formulas
Excel: Introduction to Array Formulas Excel: Introduction to Array Formulas
Excel: Introduction to Array Formulas
Excel: Introduction to Array Formulas
Excel: Introduction to Array Formulas Excel: Introduction to Array Formulas Excel: Introduction to Array Formulas Excel: Introduction to Array Formulas Excel: Introduction to Array Formulas Excel: Introduction to Array Formulas Excel: Introduction to Array Formulas
Excel: Introduction to Array Formulas
Excel: Introduction to Array Formulas
 
Excel: Introduction to Array Formulas
Excel: Introduction to Array Formulas
 
-->