Write a function named reportConflicted that processes a file of phrases, converts those phrases into acronyms, and reports the acronyms that could be expanded to more than one unique phrase.
Your function accepts a string representing the input file name as its parameter.
The file contains a list of phrases, one phrase per line.
The words are separated by single spaces and the important words in each phrase are capitalized:
Vice Provost of Teaching and Learning
University of South Carolina
University of Southern California
Marc Tessier Lavigne
University of Spoiled Children
Leland Stanford Junior University Marching Band
Marc Tessier Lavigne
Leland Stanford Junior University Marching Band
Motion Time Lapse
University of South Carolina
An acronym is an abbreviation that concatenates the initial letters of the important capitalized words.
For example, the first line has the acronym "VPTL".
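The acronym rule above can be sketched as a small helper. This is only an illustration under stated assumptions: the name acronymOf is hypothetical (it is not part of the problem), it uses std::istringstream from the standard library in place of stringSplit, and it treats a word as "important" when its first character is an uppercase letter.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper (not named in the problem statement): builds an acronym
// from the first letter of every word that starts with an uppercase letter.
std::string acronymOf(const std::string& phrase) {
    std::istringstream words(phrase);  // splits on whitespace
    std::string word;
    std::string acronym;
    while (words >> word) {
        // Assumption: "important" means the word begins with a capital letter.
        if (std::isupper(static_cast<unsigned char>(word[0]))) {
            acronym += word[0];
        }
    }
    return acronym;
}
```

With this sketch, acronymOf("Vice Provost of Teaching and Learning") yields "VPTL", since "of" and "and" are skipped.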
For this problem we will define an acronym as conflicted if it could be expanded to form more than one phrase from the file.
"VPTL" can only expand to the first line's phrase, so we say this acronym is not conflicted.
Two acronyms in this file are conflicted: "MTL" can expand to two distinct phrases and "USC" to three.
You must process a file of phrases like the one above and report all acronyms that are conflicted, in alphabetical order.
Output one acronym per line, followed by a colon, a space, and the number of distinct phrases it can expand to.
On the above file, the console output would be:
MTL: 2
USC: 3
Notice that duplicates can occur in the file, such as University of South Carolina and Leland Stanford Junior University Marching Band.
We are concerned with unique phrases; the acronym "LSJUMB" appears twice but it is for the same phrase, so it is not conflicted.
Likewise, the two occurrences of Marc Tessier Lavigne count only once toward MTL's count of 2.
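One possible approach, offered as a sketch and not as the intended solution, maps each acronym to the set of distinct phrases that produce it. It assumes the standard library's std::map and std::set stand in for the course collections, and it introduces a hypothetical helper, reportConflictedFrom, that works on streams so the logic can be exercised without a file on disk.

```cpp
#include <cctype>
#include <fstream>
#include <iostream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: reads one phrase per line from `in` and writes each
// conflicted acronym with its count of distinct phrases to `out`.
void reportConflictedFrom(std::istream& in, std::ostream& out) {
    // A sorted map gives alphabetical order; a set discards duplicate phrases.
    std::map<std::string, std::set<std::string>> phrasesByAcronym;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string word;
        std::string acronym;
        while (words >> word) {
            // Assumption: a capitalized first letter marks an important word.
            if (std::isupper(static_cast<unsigned char>(word[0]))) {
                acronym += word[0];
            }
        }
        phrasesByAcronym[acronym].insert(line);
    }
    for (const auto& entry : phrasesByAcronym) {
        if (entry.second.size() > 1) {  // more than one distinct phrase
            out << entry.first << ": " << entry.second.size() << '\n';
        }
    }
}

// The function the problem asks for, written here with std::string.
void reportConflicted(const std::string& filename) {
    std::ifstream input(filename);  // the problem says the file exists
    reportConflictedFrom(input, std::cout);
}
```

Storing whole phrases in a set means repeated lines such as the second Marc Tessier Lavigne add nothing to the count, which matches the uniqueness rule described above.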
Your solution may use any of the standard collections.
You can use the stringSplit function to split a string into words and return the words as a vector.
You may assume that the file exists, that no lines are blank, and that each line contains at least one capitalized word.