Chapter 6: Collecting Data with Streams

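What is a collector?

collect takes as its argument a Collector, which describes how elements are accumulated, and uses it to reduce the stream's elements into a final result.

> In other words, collect can perform advanced reductions.

- The strength of collectors is that they let you define how collect gathers its result in a simple yet flexible way.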

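Three kinds of functionality provided by the Collectors factory methods

1. Reducing and summarizing stream elements into a single value

2. Grouping elements

3. Partitioning elements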
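
As a rough preview of these three families, the sketch below applies one collector of each kind to the same small list. The class name and the word list are hypothetical and used only for illustration; the Collectors calls themselves are standard.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CollectorOverview {
    // Hypothetical sample data standing in for real stream elements.
    static final List<String> WORDS = List.of("pork", "beef", "rice", "pizza", "salmon");

    // 1. Reduce and summarize: total number of characters across all words.
    static int totalLength() {
        return WORDS.stream().collect(Collectors.summingInt(String::length));
    }

    // 2. Group: words keyed by their length.
    static Map<Integer, List<String>> byLength() {
        return WORDS.stream().collect(Collectors.groupingBy(String::length));
    }

    // 3. Partition: words longer than four characters versus the rest.
    static Map<Boolean, List<String>> longWords() {
        return WORDS.stream().collect(Collectors.partitioningBy(w -> w.length() > 4));
    }

    public static void main(String[] args) {
        System.out.println(totalLength());
        System.out.println(byLength());
        System.out.println(longWords());
    }
}
```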

1. Reducing and summarizing

1) The counting factory method returns the number of elements.

@Test
@Description("Count the items in the menu list")
void collectTest6_2_0() throws Exception {
	// given
	setMenu();

	// when
	long streamCount = menu.stream().count();

	// Stream.collect() can rebuild the stream's items into a collection or a summary value
	// Collectors.counting() yields the number of elements in the stream
	long collectCount = menu.stream().collect(Collectors.counting());

	// then
	assertEquals(9, streamCount);
	assertEquals(9, collectCount);
}

2) maxBy and minBy find the maximum and minimum elements.

@Test
@Description("Find the dishes with the most and the fewest calories on the menu")
void collectTest6_2_1() throws Exception {
	// given
	setMenu();

	// Comparator that orders dishes by calories
	Comparator<Dish> dishCaloriesComparator = Comparator.comparingInt(Dish::getCalories);

	// when
	// The result is an Optional rather than null, because the stream may be empty
	Optional<Dish> mostCaloriesMenu = menu.stream().collect(maxBy(dishCaloriesComparator));

	Optional<Dish> leastCaloriesMenu = menu.stream().collect(minBy(dishCaloriesComparator));
	// then
	assertEquals("pork", mostCaloriesMenu.get().getName());
	assertEquals("season fruit", leastCaloriesMenu.get().getName());
}

3) summingInt/Long/Double compute the sum of the elements.

@Test
@Description("Compute the total calories of the menu list (summarization)")
void collectTest6_2_2() throws Exception{
	// given
	setMenu();

	// Total calories of the menu list, computed with a plain loop
	int totalCalories = 0;
	for (Dish dish : menu) {
		totalCalories += dish.getCalories();
	}

	// when
	int result = menu.stream().collect(summingInt(Dish::getCalories));

	// then
	System.out.println("total : "+totalCalories);
	assertEquals(totalCalories, result);
}

4) averagingInt/Long/Double compute the average of the elements.
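
The post gives no test for averaging, so here is a minimal sketch of averagingInt. The Dish class and the sample menu are stand-ins with made-up calorie values (the post's own Dish and setMenu() are not shown), so treat them as assumptions.

```java
import java.util.List;
import java.util.stream.Collectors;

class AveragingExample {
    // Minimal stand-in for the post's Dish class; its real definition is not shown, so this shape is assumed.
    static final class Dish {
        private final String name;
        private final int calories;

        Dish(String name, int calories) {
            this.name = name;
            this.calories = calories;
        }

        String getName() { return name; }
        int getCalories() { return calories; }
    }

    // Hypothetical sample menu; the calorie values are invented for illustration.
    static final List<Dish> MENU = List.of(
            new Dish("pork", 800),
            new Dish("rice", 350),
            new Dish("salmon", 450),
            new Dish("season fruit", 120));

    // averagingInt returns a Double even for int input, and 0.0 for an empty stream.
    static double averageCalories(List<Dish> menu) {
        return menu.stream().collect(Collectors.averagingInt(Dish::getCalories));
    }

    public static void main(String[] args) {
        System.out.println("average : " + averageCalories(MENU));
    }
}
```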

5) summarizingInt computes the count, sum, average, maximum, and minimum of the elements in a single pass.
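
A small sketch of summarizingInt follows. It returns an IntSummaryStatistics object whose getters expose each statistic. As before, the Dish class and menu are assumed stand-ins, not the post's actual test fixture.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class SummarizingExample {
    // Minimal stand-in for the post's Dish class (assumed shape).
    static final class Dish {
        private final String name;
        private final int calories;

        Dish(String name, int calories) {
            this.name = name;
            this.calories = calories;
        }

        String getName() { return name; }
        int getCalories() { return calories; }
    }

    // Hypothetical sample menu with invented calorie values.
    static final List<Dish> MENU = List.of(
            new Dish("pork", 800),
            new Dish("rice", 350),
            new Dish("salmon", 450),
            new Dish("season fruit", 120));

    // One traversal of the stream produces count, sum, min, max, and average together.
    static IntSummaryStatistics calorieStatistics(List<Dish> menu) {
        return menu.stream().collect(Collectors.summarizingInt(Dish::getCalories));
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = calorieStatistics(MENU);
        System.out.println("count : " + stats.getCount());
        System.out.println("sum : " + stats.getSum());
        System.out.println("min : " + stats.getMin());
        System.out.println("max : " + stats.getMax());
        System.out.println("average : " + stats.getAverage());
    }
}
```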

6) joining concatenates strings, optionally separated by a delimiter.

@Test
@Description("Return all menu names as a single String")
void collectTest6_2_3() throws Exception{
	// given
	setMenu();

	// when
	// Map each dish to its name, then concatenate the names with Collectors.joining
	String menuNames = menu.stream().map(Dish::getName).collect(Collectors.joining(", "));
	// then
	System.out.println("All menu : "+menuNames);
}

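2. General-purpose reducing and summarization

The collectors above are convenient specializations of the general-purpose reducing factory method. Using a specialized collector instead of the general one mostly improves programmer convenience and readability.

reducing(first argument, second argument, third argument)

First argument: the starting value of the reduction, which is also the result when the stream is empty

Second argument: the transformation function applied to each element

Third argument: a BinaryOperator that combines two transformed values into one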

@Test
@Description("Compute the total calories of the menu (general-purpose reducing)")
void collectTest6_2_4() throws Exception{
	// given
	setMenu();

	// when
	// Using the general-purpose reducing collector
	int totalCalories1 = menu.stream().collect(reducing(0, Dish::getCalories, (a, b) -> a+b));

	int totalCalories2 = menu.stream().collect(reducing(0, Dish::getCalories, Integer::sum));

	int totalCalories3 = menu.stream().mapToInt(Dish::getCalories).sum();

	// then
	System.out.println("totalCalories : "+totalCalories1);
	System.out.println("totalCalories : "+totalCalories2);
	System.out.println("totalCalories : "+totalCalories3);

	assertEquals(totalCalories1, totalCalories2);
	assertEquals(totalCalories1, totalCalories3);
}
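
To make the link between the specialized collectors and reducing concrete, the sketch below rewrites counting and maxBy in terms of reducing. It shows how they can be expressed, not how the JDK implements them. The Dish class and menu are assumed stand-ins with invented values.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class ReducingSpecializations {
    // Minimal stand-in for the post's Dish class (assumed shape).
    static final class Dish {
        private final String name;
        private final int calories;

        Dish(String name, int calories) {
            this.name = name;
            this.calories = calories;
        }

        String getName() { return name; }
        int getCalories() { return calories; }
    }

    // Hypothetical sample menu with invented calorie values.
    static final List<Dish> MENU = List.of(
            new Dish("pork", 800),
            new Dish("rice", 350),
            new Dish("salmon", 450));

    // Equivalent of counting(): map every dish to 1L and add the ones up, starting from 0L.
    static long countWithReducing(List<Dish> menu) {
        return menu.stream().collect(Collectors.reducing(0L, dish -> 1L, Long::sum));
    }

    // Equivalent of maxBy(...): the one-argument reducing has no starting value,
    // so the result is an Optional that is empty for an empty stream.
    static Optional<Dish> mostCaloricWithReducing(List<Dish> menu) {
        return menu.stream().collect(Collectors.reducing(
                (d1, d2) -> d1.getCalories() >= d2.getCalories() ? d1 : d2));
    }

    public static void main(String[] args) {
        System.out.println("count : " + countWithReducing(MENU));
        System.out.println("most caloric : " + mostCaloricWithReducing(MENU).get().getName());
    }
}
```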

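reduce can be used in place of collect(toList()), but misusing it this way causes both a semantic and a practical problem.

Semantically, reduce is meant to combine two values into a new one, whereas collect is designed to mutate a container that accumulates the result. Practically, if several threads modify the same data structure concurrently, the list itself can be corrupted, so such a reduction cannot run in parallel.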
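
The contrast can be sketched as follows. The class and method names are hypothetical. The reduce version is shown only as an example of the misuse: it happens to work on a sequential stream, but it is not safe on a parallel one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReduceVersusCollect {
    // Misuse: reduce should combine values into a NEW value, but this version
    // mutates the identity list in place. On a parallel stream every thread
    // would start from, and write to, the same ArrayList.
    static List<Integer> withMutatingReduce(Stream<Integer> stream) {
        return stream.reduce(
                new ArrayList<Integer>(),
                (List<Integer> acc, Integer e) -> { acc.add(e); return acc; },
                (List<Integer> a, List<Integer> b) -> { a.addAll(b); return a; });
    }

    // Intended approach: collect creates a separate container per substream
    // and merges them, so it stays correct when the stream is parallel.
    static List<Integer> withCollect(Stream<Integer> stream) {
        return stream.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sequential stream only: the mutating reduce is not safe in parallel.
        System.out.println(withMutatingReduce(Stream.of(1, 2, 3, 4)));
        System.out.println(withCollect(Stream.of(1, 2, 3, 4).parallel()));
    }
}
```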

Flexibility of the collection framework

1) The same operation can be performed in several different ways.

> You can choose the solution that best fits your situation.

3. Grouping (groupingBy)

With Java 8's functional style, grouping can be written in a single line.

@Test
@Description("Group dishes by type")
void collectTest6_3_0() throws Exception{
	// given
	setMenu();

	// when
	Map<Dish.Type, List<Dish>> dishesByType = menu.stream().collect(Collectors.groupingBy(Dish::getType));

	// then
	System.out.println(dishesByType.toString());
}

private enum CaloricType { DIET, NORMAL, FAT};

@Test
@Description("Classify dishes as DIET (400 calories or fewer), NORMAL (over 400 and under 700), or FAT (700 or more)")
void collectTest6_3_0_ver2() throws Exception{
	// given
	setMenu();

	// when
	Map<CaloricType, List<Dish>> dishesByCaloricType = menu.stream().collect(Collectors.groupingBy(
			dish -> {
				if (dish.getCalories() <= 400) return CaloricType.DIET;
				else if (dish.getCalories() > 400 && dish.getCalories() < 700) return CaloricType.NORMAL;
				else return CaloricType.FAT;
			}
	));

	// then
	System.out.println(dishesByCaloricType.toString());
}

@Test
@Description("Group only the dishes above 500 calories by type (manipulating grouped elements)")
void collectTest6_3_1() throws Exception{
	// given
	setMenu();

	// when
	// Filtering before grouping drops the FISH key entirely, because no fish dish passes the filter
	Map<Dish.Type, List<Dish>> dishesByType = menu.stream()
			.filter(dish -> dish.getCalories() > 500)
			.collect(Collectors.groupingBy(Dish::getType));

	// The overloaded groupingBy with a downstream filtering collector (Java 9+) keeps the FISH key, mapped to an empty list
	Map<Dish.Type, List<Dish>> dishesByType2 = menu.stream()
			.collect(Collectors.groupingBy(Dish::getType, filtering(dish -> dish.getCalories() > 500, toList())));

	// then
	assertEquals(null, dishesByType.get(Dish.Type.FISH));
	assertEquals(0, dishesByType2.get(Dish.Type.FISH).size());
}

1) Multilevel grouping

Passing a second groupingBy to Collectors.groupingBy as the downstream collector groups items at multiple levels.

@Test
@Description("Classify dishes by calorie level within each dish type (multilevel grouping)")
void collectTest6_3_2() throws Exception{
	// given
	setMenu();

	// when
	Map<Dish.Type, Map<CaloricType, List<Dish>>> dishedByType = menu.stream().collect(
			groupingBy(Dish::getType, groupingBy(dish -> {
						if (dish.getCalories() <= 400) return CaloricType.DIET;
						else if (dish.getCalories() > 400 && dish.getCalories() < 700) return CaloricType.NORMAL;
						else return CaloricType.FAT;
					})
			)
	);

	// then
	System.out.println(dishedByType.toString());
}

2) Collecting data in subgroups

@Test
@Description("Count the dishes of each type (collecting data in subgroups)")
void collectTest6_3_3() throws Exception{
	// given
	setMenu();

	// when
	Map<Dish.Type, Long> dishesByType = menu.stream().collect(
			groupingBy(Dish::getType, counting()));

	// then
	System.out.println(dishesByType.toString());
}
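
counting is only one possible downstream collector. As a hedged sketch, the example below passes summingInt and a collectingAndThen-wrapped maxBy as the second argument of groupingBy. The Dish class and menu are assumed stand-ins with invented values.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class SubgroupCollectors {
    // Minimal stand-in for the post's Dish class (assumed shape).
    static final class Dish {
        enum Type { MEAT, FISH, OTHER }

        private final String name;
        private final int calories;
        private final Type type;

        Dish(String name, int calories, Type type) {
            this.name = name;
            this.calories = calories;
            this.type = type;
        }

        String getName() { return name; }
        int getCalories() { return calories; }
        Type getType() { return type; }
    }

    // Hypothetical sample menu with invented calorie values.
    static final List<Dish> MENU = List.of(
            new Dish("pork", 800, Dish.Type.MEAT),
            new Dish("chicken", 400, Dish.Type.MEAT),
            new Dish("rice", 350, Dish.Type.OTHER),
            new Dish("pizza", 550, Dish.Type.OTHER),
            new Dish("salmon", 450, Dish.Type.FISH));

    // Total calories of each dish type.
    static Map<Dish.Type, Integer> caloriesByType(List<Dish> menu) {
        return menu.stream().collect(
                Collectors.groupingBy(Dish::getType, Collectors.summingInt(Dish::getCalories)));
    }

    // Highest-calorie dish of each type. maxBy yields an Optional, and collectingAndThen
    // unwraps it; that is safe here because a group only exists if it has at least one dish.
    static Map<Dish.Type, Dish> mostCaloricByType(List<Dish> menu) {
        return menu.stream().collect(
                Collectors.groupingBy(Dish::getType,
                        Collectors.collectingAndThen(
                                Collectors.maxBy(Comparator.comparingInt(Dish::getCalories)),
                                Optional::get)));
    }

    public static void main(String[] args) {
        System.out.println(caloriesByType(MENU));
        mostCaloricByType(MENU).forEach((type, dish) -> System.out.println(type + " : " + dish.getName()));
    }
}
```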

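4. Partitioning (partitioningBy)

Partitioning is a special case of grouping that uses a predicate, called the partitioning function, as the classification function. The keys of the resulting Map are Booleans.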
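
A minimal sketch of partitioningBy, assuming a calorie threshold as the predicate. The threshold, the Dish stand-in, and the sample menu are illustrative assumptions rather than the post's own test.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitioningExample {
    // Minimal stand-in for the post's Dish class (assumed shape).
    static final class Dish {
        private final String name;
        private final int calories;

        Dish(String name, int calories) {
            this.name = name;
            this.calories = calories;
        }

        String getName() { return name; }
        int getCalories() { return calories; }
    }

    // Hypothetical sample menu with invented calorie values.
    static final List<Dish> MENU = List.of(
            new Dish("pork", 800),
            new Dish("chicken", 400),
            new Dish("rice", 350),
            new Dish("pizza", 550),
            new Dish("salmon", 450));

    // The predicate is the partitioning function: true -> 400 calories or fewer, false -> the rest.
    static Map<Boolean, List<Dish>> partitionByLowCalorie(List<Dish> menu) {
        return menu.stream().collect(Collectors.partitioningBy(dish -> dish.getCalories() <= 400));
    }

    public static void main(String[] args) {
        Map<Boolean, List<Dish>> partitioned = partitionByLowCalorie(MENU);
        partitioned.get(true).forEach(dish -> System.out.println("low : " + dish.getName()));
        partitioned.get(false).forEach(dish -> System.out.println("high : " + dish.getName()));
    }
}
```

Unlike groupingBy, the resulting map always contains both the true and the false key, even when one side is empty.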

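Advantages of partitioning

1. It keeps both lists of stream elements: those for which the partitioning function returns true and those for which it returns false.

> The overloaded partitioningBy accepts a second collector that performs a further reduction within each partition.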
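
The two-argument form can be sketched as below, with counting and groupingBy as downstream collectors. The calorie threshold, the Dish stand-in, and the menu values are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitioningDownstream {
    // Minimal stand-in for the post's Dish class (assumed shape).
    static final class Dish {
        enum Type { MEAT, FISH, OTHER }

        private final String name;
        private final int calories;
        private final Type type;

        Dish(String name, int calories, Type type) {
            this.name = name;
            this.calories = calories;
            this.type = type;
        }

        String getName() { return name; }
        int getCalories() { return calories; }
        Type getType() { return type; }
    }

    // Hypothetical sample menu with invented calorie values.
    static final List<Dish> MENU = List.of(
            new Dish("pork", 800, Dish.Type.MEAT),
            new Dish("chicken", 400, Dish.Type.MEAT),
            new Dish("rice", 350, Dish.Type.OTHER),
            new Dish("pizza", 550, Dish.Type.OTHER),
            new Dish("salmon", 450, Dish.Type.FISH));

    // Number of dishes on each side of the partition.
    static Map<Boolean, Long> countByLowCalorie(List<Dish> menu) {
        return menu.stream().collect(
                Collectors.partitioningBy(dish -> dish.getCalories() <= 400, Collectors.counting()));
    }

    // Two-level map: first partitioned by the predicate, then grouped by dish type.
    static Map<Boolean, Map<Dish.Type, List<Dish>>> typesByLowCalorie(List<Dish> menu) {
        return menu.stream().collect(
                Collectors.partitioningBy(dish -> dish.getCalories() <= 400,
                        Collectors.groupingBy(Dish::getType)));
    }

    public static void main(String[] args) {
        System.out.println(countByLowCalorie(MENU));
        System.out.println(typesByLowCalorie(MENU).get(false).keySet());
    }
}
```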

Partitioning numbers into primes and nonprimes

boolean isPrime(int candidate){
	// Only divisors up to the square root of the candidate need to be tested
	int candidateRoot = (int)Math.sqrt((double)candidate);

	return IntStream.rangeClosed(2, candidateRoot).noneMatch(i -> candidate % i == 0);

}

Map<Boolean, List<Integer>> partitionPrimes(int n){
	return IntStream.rangeClosed(2, n).boxed().collect(partitioningBy(candidate -> isPrime(candidate)));
}

@Test
@Description("Partition the natural numbers from 2 to n into primes and nonprimes (advantage of partitioning)")
void collectTest6_4_2() throws Exception{
	// given

	// when
	Map<Boolean, List<Integer>> result = partitionPrimes(20);
	
    // then
	System.out.println(result.toString());
}

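5. The Collector interface

The Collector interface consists of a set of methods that define how a reduction is implemented.

By implementing the Collector interface yourself, you can therefore build a collector that solves a particular problem more efficiently.

A look at the methods of the Collector interface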

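1) supplier method: creating a new result container

- Returns a parameterless function that creates an empty accumulator instance for the collection process

2) accumulator method: adding an element to the result container

- Returns the function that performs the reduction step (because this function changes the accumulator's internal state as the elements are traversed, the accumulator's value cannot be known in advance)

3) finisher method: applying the final transformation to the result container

- Must return the function that is called once the stream has been fully traversed, to turn the accumulator object into the final result and end the accumulation process.

4) combiner method: merging two result containers

- Defines how the accumulators are merged when different subparts of the stream are processed in parallel.

5) characteristics method: an immutable set of Characteristics that describe the collector's behavior

- These hints determine whether the stream can be reduced in parallel and which optimizations may be applied.

UNORDERED: the result does not depend on the order in which elements are traversed or accumulated

CONCURRENT: the accumulator can be called from multiple threads on the same container, so the collector can reduce a stream in parallel

IDENTITY_FINISH: the finisher is the identity function, so the accumulator object is used directly as the final result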
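
Putting the five methods together, a list-collecting Collector might look like the sketch below. The class name ToListCollector is illustrative, and only IDENTITY_FINISH is declared: CONCURRENT is deliberately left out here, on the assumption that the underlying ArrayList is not thread-safe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class ToListCollector<T> implements Collector<T, List<T>, List<T>> {

    // supplier: creates the empty result container.
    @Override
    public Supplier<List<T>> supplier() {
        return ArrayList::new;
    }

    // accumulator: adds one element to the container, modifying it in place.
    @Override
    public BiConsumer<List<T>, T> accumulator() {
        return List::add;
    }

    // combiner: merges two containers built from different parts of a parallel stream.
    @Override
    public BinaryOperator<List<T>> combiner() {
        return (left, right) -> {
            left.addAll(right);
            return left;
        };
    }

    // finisher: the container already is the final result, so nothing is transformed.
    @Override
    public Function<List<T>, List<T>> finisher() {
        return Function.identity();
    }

    // characteristics: only IDENTITY_FINISH, because ArrayList must not be shared between threads.
    @Override
    public Set<Characteristics> characteristics() {
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.IDENTITY_FINISH));
    }

    public static void main(String[] args) {
        List<Integer> sequential = Stream.of(1, 2, 3).collect(new ToListCollector<Integer>());
        List<Integer> parallel = IntStream.rangeClosed(1, 1000).boxed().parallel()
                .collect(new ToListCollector<Integer>());
        System.out.println(sequential);
        System.out.println(parallel.size());
    }
}
```

Even without CONCURRENT the collector still works on a parallel stream: each substream gets its own container from the supplier, and the combiner merges them.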

※ Note

toList = Collectors.toList(), which collects all elements into a list

#Reference

Modern Java in Action, by Raoul-Gabriel Urma, Mario Fusco, and Alan Mycroft

Publisher: Hanbit Media
Published: August 1, 2019