Last modified: 2014-10-20 11:44:28 UTC
a deep copy "of Entity is made whenever calling Revision::getContent(). the deep copy is done with "unserialize( serialize( $this ) );" which is expensive, time and memory wise. When saving an item, getContent() is called at least 15 times: 5x - WikiPage::getContent 1x - WikiPage::doEditContent 1x - WikiPage::prepareContentForEdit 1x - RepoHooks::onNewRevisionFromEditComplete 1x - WikiPage::isRedirect 1x - WikiPage::isCountable 2x - ChangeNotifier::notifyOnPageModification 2x - Revision::getText 2x - EchoDiscussionParser::getChangeInterpretationForRevision 1x - AbuseFilterHooks::filterEdit 1x - EntityChange::setRevisionInfo 1x - RepoHooks::notifyEntityStoreWatchersOnUpdate 1x - WikiPage::doEditUpdates 1x - Revision::checkContentModel 1x - WikiPage::updateRevisionOn getContent might be called more times in various other extensions that I do not have enabled currently. When viewing or purging the item, getContent is called 5 times: 1x - WikiPage::isRedirect 1x - RepoHooks::onOutputPageBodyAttributes 1x - ViewEntityAction::show (via ContentRetriever::getContentForRequest) 1x - ViewEntityAction::isPlainView (again via ContentRetriever::getContentForRequest) 1x - Article::fetchContentObject possible action items are: 1) reduce the number of times getContent() is called. 2) maybe lazy initialize the full entity parameter? perhaps we can have some sort of stubbed Entity / EntityContent object, with the entity serialization as a variable, and then full serialize when getEntity is called? stuff like redirect target and entity id and type might be made more easily accessible without needing the entire entity object. 3) investigate alternatives for deep cloning the entity that are better performance-wise, not needing to do full serialize/unserialize.
with corrected indentation: 5x - WikiPage::getContent * 1x - WikiPage::doEditContent * 1x - WikiPage::prepareContentForEdit * 1x - RepoHooks::onNewRevisionFromEditComplete * 1x - WikiPage::isRedirect * 1x - WikiPage::isCountable 2x - ChangeNotifier::notifyOnPageModification 2x - Revision::getText * 2x - EchoDiscussionParser::getChangeInterpretationForRevision 1x - AbuseFilterHooks::filterEdit 1x - EntityChange::setRevisionInfo 1x - RepoHooks::notifyEntityStoreWatchersOnUpdate 1x - WikiPage::doEditUpdates 1x - Revision::checkContentModel 1x - WikiPage::updateRevisionOn
Change 167116 had a related patch set uploaded by Aude: Remove duplicate getContentForRequest call in ViewEntityAction https://gerrit.wikimedia.org/r/167116
Change 167127 had a related patch set uploaded by Aude: Don't getContent in OutputPageBodyAttributes handler https://gerrit.wikimedia.org/r/167127
For cases where getContent() is called to look at the EntityContent, but getEntity() is never called on it, we could could give EntityContent a flag that tells it to copy the Entity object when getEntity() is called (or even copy the Entity every time). EntityContent::copy could then just return $this. We could also defer the deserialization of the Entity until getEntity is called (EntityContent would get some sort of SerializedEntity object instead of an actual Entity). We could further have special logic in EntityContent::getEntityId() that would extract the ID from the serialized version without actually unserializing the full Entity.
Change 167116 merged by jenkins-bot: Remove duplicate getContentForRequest call in ViewEntityAction https://gerrit.wikimedia.org/r/167116
Change 167127 merged by jenkins-bot: Don't getContent in OutputPageBodyAttributes handler https://gerrit.wikimedia.org/r/167127
what's still missing? "a deep copy "of Entity is made whenever calling Revision::getContent(). the deep copy is done with "unserialize( serialize( $this ) );" which is expensive, time and memory wise." : daniel is making a patch to help avoid this when the entity is not needed. we need to see if/how much this helps. "When saving an item, getContent() is called at least 15 times: 5x - WikiPage::getContent 1x - WikiPage::doEditContent 1x - WikiPage::prepareContentForEdit 1x - RepoHooks::onNewRevisionFromEditComplete 1x - WikiPage::isRedirect 1x - WikiPage::isCountable 2x - ChangeNotifier::notifyOnPageModification 2x - Revision::getText 2x - EchoDiscussionParser::getChangeInterpretationForRevision 1x - AbuseFilterHooks::filterEdit 1x - EntityChange::setRevisionInfo 1x - RepoHooks::notifyEntityStoreWatchersOnUpdate 1x - WikiPage::doEditUpdates 1x - Revision::checkContentModel 1x - WikiPage::updateRevisionOn getContent might be called more times in various other extensions that I do not have enabled currently." "When viewing or purging the item, getContent is called 5 times: 1x - WikiPage::isRedirect 1x - RepoHooks::onOutputPageBodyAttributes" : removed! "1x - ViewEntityAction::show (via ContentRetriever::getContentForRequest) 1x - ViewEntityAction::isPlainView (again via ContentRetriever::getContentForRequest)" : removed! "1x - Article::fetchContentObject possible action items are: 1) reduce the number of times getContent() is called." : removed 2 places, but more to investigate and do. "2) maybe lazy initialize the full entity parameter? perhaps we can have some sort of stubbed Entity / EntityContent object, with the entity serialization as a variable, and then full serialize when getEntity is called? stuff like redirect target and entity id and type might be made more easily accessible without needing the entire entity object." : in progress "3) investigate alternatives for deep cloning the entity that are better performance-wise, not needing to do full serialize/unserialize."