Anya, 'Sleeper'
A thread to discuss naming threads, board policy, new thread suggestions, and anything else that has to do with board administration and maintenance. Guaranteed to include lively debate and polls. Natter discouraged, but not deleted.
Current Stompy Feet: ita, Jon B, DXMachina, P.M. Marcontell, Liese S., amych
I'm just curious why they want web data-- what is written here seems different from both spoken English and formal writing.
My guess is that that's exactly the reason.
Can we turn -age words into gerunds?
t /ducks
We can make new words?
*starts searching for George W. Bush interviews and speeches*
CROMULENT!
I'm all for the idea.
Oh my god, that's so COOOOL.
I'm all for it.
Foamy foamy foamy! AIFG!
I t counts hastily eleventibillion the YES!
That sounds very cool. Is this a situation in which we could approve certain threads for the samples to be taken from (or exclude a given thread)? I don't have a particular concern, but I could see where some people might.
I am so freaking pleased with this idea, and I'm still going to probably argue against it. I'm the privacy freak. I know that it sounds terrific, and I'm sure it will be used responsibly, but my inner paranoia bells are going off something awful.
But I gotta say I love the idea of affecting dictionary verbiage for all eternity. Or, you know, ten years. Whichever comes first.
Corpsified. Definitely Corpsified.
Wow, glad to see so many responses already.
It would be the words in context -- otherwise they're not data, they're just anecdotes.
We could certainly put the kibosh on releasing certain threads -- I was thinking Bitches, for example, has more personal info than any of the others. Natter, the Music, Fic, and Movie threads, and the show/spoiler threads would probably be the most valuable.
The data can be anonymized so that no user name, personal name, or place name appears. So that "I hung out in Somerville with Emily and VWbug last night" would appear as "I hung out in PLACENAME with PERSONNAME and PERSONNAME last night." Actual replacement strings would vary.
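For the curious, here's a rough sketch of how that kind of replacement could work. This is not the corpus researchers' actual tool, just an illustration in Python. It assumes someone has already compiled lists of the user names, personal names, and place names to scrub; the function name `anonymize` and the fixed placeholder strings are made up for the example.

```python
import re

def anonymize(text, person_names, place_names):
    """Replace known names with placeholder tokens (hypothetical helper).

    Assumes the name lists are supplied by hand; a real pipeline would
    need some way to discover names it has not been told about.
    """
    def substitute(s, names, token):
        if not names:
            return s
        # Try longer names first so "Mary Ann" wins over "Mary".
        ordered = sorted(names, key=len, reverse=True)
        alternatives = "|".join(re.escape(n) for n in ordered)
        # Whole-word, case-insensitive match on any listed name.
        pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
        return pattern.sub(token, s)

    text = substitute(text, place_names, "PLACENAME")
    text = substitute(text, person_names, "PERSONNAME")
    return text

sample = "I hung out in Somerville with Emily and VWbug last night"
print(anonymize(sample, {"Emily", "VWbug"}, {"Somerville"}))
```

A list-based approach like this only catches names it has been given, so nicknames, misspellings, and names that double as ordinary words would still need a human check.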
Let me know what other questions you have! Remember, I can't put foamy in the dictionary until I can show use ... like, in a major corpus of American English ...
The corpus researchers (and lexicographers) want this data specifically because it has not been professionally edited, and because it's so wide-ranging. Linguists go to great lengths to get this kind of data -- one project gave free phone calls to grad students as long as they let themselves be recorded, in order to get spoken language data.